HANDBOOK OF HUMAN FACTORS AND ERGONOMICS
Fifth Edition
Edited by
Gavriel Salvendy and Waldemar Karwowski
University of Central Florida
Orlando, Florida
This book is printed on acid-free paper.

Copyright © 2021 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor the author shall be liable for damages arising herefrom.

For general information about our other products and services, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley publishes in a variety of print and electronic formats and by print-on-demand.
Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com. Library of Congress Cataloging-in-Publication Data is Available: 9781119636083 (hardback) 9781119636106 (epdf) 9781119636090 (epub) Cover Design: Wiley Cover Image: © alvarez/Getty Images; SciePro/Getty Images; John Gress Media Inc/Shutterstock.com; Scharfsinn/Shutterstock.com; Bloomicon/Shutterstock.com
CONTENTS
About the Editors
Contributors
Foreword
Preface

Part 1. Human Factors Function
1. The Discipline of Human Factors and Ergonomics
   Waldemar Karwowski and Wei Zhang
2. Human Systems Integration and Design
   Guy A. Boy

Part 2. Human Factors Fundamentals
3. Sensation and Perception
   Robert W. Proctor and Janet D. Proctor
4. Selection and Control of Action
   Robert W. Proctor and Kim-Phuong L. Vu
5. Information Processing
   Christopher D. Wickens and C. Melody Carswell
6. Decision-Making Models, Decision Support, and Problem Solving
   Mark R. Lehto and Gaurav Nanda
7. Mental Workload
   G.M. Hancock, L. Longo, M.S. Young, and P.A. Hancock
8. Social and Organizational Foundation of Ergonomics: Multi-Level Systems Approaches
   Pascale Carayon
9. Emotional Design
   Feng Zhou, Yangjian Ji, and Roger Jianxin Jiao
10. Cross-Cultural Design
   Tom Plocher, Pei-Luen Patrick Rau, Yee-Yin Choong, and Zhi Guo

Part 3. Design of Equipment, Tasks, Jobs, and Environments
11. Three-Dimensional (3D) Anthropometry and Its Applications in Product Design
   Liang Ma and Jianwei Niu
12. Basic Biomechanics and Workplace Design
   William S. Marras and Waldemar Karwowski
13. The Changing Nature of Task Analysis
   Erik Hollnagel
14. Workplace Design
   Nicolas Marmaras and Dimitris Nathanael
15. Job and Team Design
   Frederick P. Morgeson and Michael A. Campion
16. Design, Delivery, Evaluation, and Transfer of Effective Training Systems
   Tiffany M. Bisbey, Rebecca Grossman, Kareem Panton, Chris W. Coultas, and Eduardo Salas
17. Situation Awareness
   Mica R. Endsley

Part 4. Design for Health, Safety, and Comfort
18. Sound and Noise: Measurement and Design Guidance
   John G. Casali
19. Vibration and Motion
   Neil J. Mansfield and Michael J. Griffin
20. Human Errors and Human Reliability
   Peng Liu, Renyou Zhang, Zijian Yin, and Zhizhong Li
21. Occupational Safety and Health Management
   Jeanne Mager Stellman, Sonalee Rau, and Pratik Thaker
22. Managing Low-Back Disorder Risk in the Workplace
   William S. Marras and Waldemar Karwowski
23. Manual Materials Handling: Evaluation and Practical Considerations
   Fadi A. Fathallah and Ira Janowitz
24. Warnings and Hazard Communications
   Michael S. Wogalter, Christopher B. Mayhorn, and Kenneth R. Laughery, Sr.
25. Use of Personal Protective Equipment
   Grażyna Bartkowiak, Krzysztof Baszczyński, Anna Bogdan, Agnieszka Brochocka, Anna Dąbrowska, Rafał Hrynyk, Emilia Irzmańska, Danuta Koradecka, Emil Kozłowski, Katarzyna Majchrzycka, Krzysztof Makowski, Anna Marszałek, Magdalena Młynarczyk, Rafał Młyński, Grzegorz Owczarek, and Jan Zera

Part 5. Human Performance Modeling
26. Mathematical Modeling in Human–Machine System Design and Evaluation
   Changxu Wu and Yili Liu
27. Modeling and Simulation of Human Systems
   Gunther E. Paul
28. Human Supervisory Control of Automation
   Thomas B. Sheridan
29. Digital Human Modeling in Design
   Vincent G. Duffy
30. Extended Reality (XR) Environments
   Kay M. Stanney, Hannah Nye, Sam Haddad, Kelly S. Hale, Christina K. Padron, and Joseph V. Cohn
31. Neuroergonomics
   Hasan Ayaz and Frédéric Dehais

Part 6. System Evaluation
32. Accident and Incident Investigation
   Patrick G. Dempsey
33. Human Factors and Ergonomics Audits
   Colin G. Drury and Patrick G. Dempsey
34. Cost/Benefit Analysis for Human Systems Investments
   William B. Rouse and Dennis K. McBride

Part 7. Human–Computer Interaction
35. Data Visualization
   Sumanta N. Pattanaik and R. Paul Wiegand
36. Representation Design
   John M. Flach, Kevin B. Bennett, Jonathan W. Butler, and Michael A. Heroux
37. Collecting and Analyzing User Insights
   Matthias Peissner, Kathrin Pollmann, and Nora Fronemann
38. Usability and User Experience: Design and Evaluation
   James R. Lewis and Jeff Sauro
39. Website Design and Evaluation
   Kim-Phuong L. Vu, Robert W. Proctor, and Ya-Hsin Hung
40. Mobile Systems Design and Evaluation
   June Wei and Siyi Dong
41. Human Factors in Ambient Intelligence Environments
   Constantine Stephanidis, Margherita Antona, and Stavroula Ntoa
42. Human-Centered Design of Artificial Intelligence
   George Margetis, Stavroula Ntoa, Margherita Antona, and Constantine Stephanidis
43. Cybersecurity, Privacy, and Trust
   Abbas Moallem
44. Human–Robot Interaction
   Jessie Y.C. Chen and Michael J. Barnes
45. Human Factors in Social Media
   Qin Gao and Yue Chen

Part 8. Design for Individual Differences
46. Design for All in Digital Technologies
   Constantine Stephanidis
47. Design for People Experiencing Functional Limitations
   Gregg C. Vanderheiden, J. Bern Jordan, and Jonathan Lazar
48. Design for Aging
   Jia Zhou and Qin Gao
49. Design of Digital Technologies for Children
   Panos Markopoulos, Janet C. Read, and Michail Giannakos

Part 9. Selected Applications
50. Human Factors and Ergonomics Standards
   Waldemar Karwowski, Redha Taiar, David Rodrick, Bohdana Sherehiy, and Robert R. Fox
51. Data Analytics in Human Factors
   Matt Holman, Guy Walker, Melissa Bedinger, Annie Visser-Quinn, Kerri McClymont, Lindsay Beevers, and Terry Lansdown
52. Human Factors and Ergonomics in Design of A³: Automation, Autonomy, and Artificial Intelligence
   Ben D. Sawyer, Dave B. Miller, Matthew Canham, and Waldemar Karwowski
53. Human Factors and Ergonomics in Health Care
   Pascale Carayon, Kathryn Wust, Bat-Zion Hose, and Megan E. Salwei
54. Human Factors and Ergonomics in Digital Manufacturing
   Dieter Spath and Martin Braun
55. Human Factors and Ergonomics in Aviation
   Steven J. Landry
56. Human Side of Space Exploration and Habitation
   Kevin R. Duda, Dava J. Newman, Joanna Zhang, Nicolas Meirhaeghe, and H. Larissa Zhou
57. Human Factors and Ergonomics for Sustainability
   Klaus Fischer, Andrew Thatcher, and Klaus J. Zink

Index
ABOUT THE EDITORS
Gavriel Salvendy is University Distinguished Professor at the University of Central Florida and founding president of the Academy of Science, Engineering, and Medicine of Florida. He is also professor emeritus of Industrial Engineering at Purdue University and was the Founding Head (2001–2011) of the Department of Industrial Engineering at Tsinghua University, China. From 1984 to 1999, he was the NEC Corporation's private chair holder at Purdue University. He is the author of nearly 600 research publications, including over 320 journal papers. He has been the major professor to 68 PhD students. His main research deals with the human aspects of design, operation, and management of advanced engineering systems. In 1990, he became the first member of the Human Factors and Ergonomics Society to be elected to the National Academy of Engineering (NAE). In 1995, he received an Honorary Doctorate from the Chinese Academy of Sciences. He is the fourth person in all fields of science and engineering in the Academy's 45 years to receive this award. In 2006, he received the Friendship Award presented by the People's Republic of China, the highest honor the Chinese government confers on foreign experts. In 2007, he received the American Association of Engineering Societies' John Fritz Medal, the engineering profession's highest award. Special issues of the journals Ergonomics (2003), Computers in Industry (2010), and Intelligent Manufacturing (2011) were published in his honor. He is an Honorary Fellow and life member of the Ergonomics Society and a Fellow of the Human Factors and Ergonomics Society, the Institute of Industrial and Systems Engineers, and the American Psychological Association. He earned his Ph.D. in Engineering Production at the University of Birmingham, United Kingdom.

Waldemar Karwowski is Pegasus Professor and Chairman, Department of Industrial Engineering and Management Systems, University of Central Florida, USA. He holds an M.S. in Production Engineering and Management from the University of Technology Wroclaw, Poland, and a Ph.D. in Industrial Engineering from Texas Tech University, USA. He was awarded a D.Sc. in management science by the Institute for Organization and Management in Industry, Warsaw, and received the National Professorship title from the President of Poland (2012). Three Central European universities have also awarded him Doctor Honoris Causa degrees. Dr. Karwowski served on the Board on Human Systems Integration, National Research Council, USA (2007–2011). He is currently Co-Editor-in-Chief of the journal Theoretical Issues in Ergonomics Science, Editor-in-Chief of the journal Human-Intelligent Systems Integration, and Field Chief Editor of the journal Frontiers in Neuroergonomics. Dr. Karwowski has over 550 research publications, including over 200 journal papers focused on ergonomics and safety, human performance, neuro-fuzzy systems, nonlinear dynamics, human-centered AI, and neuroergonomics. He is a Fellow of the Ergonomics Society (UK), the Human Factors and Ergonomics Society (HFES), the Institute of Industrial and Systems Engineers (IISE), and the International Ergonomics Association (IEA), and has served as President of both HFES (2006–2007) and the IEA (2000–2003). He received the William Floyd Award from the Chartered Institute of Ergonomics & Human Factors (United Kingdom) in 2017 and the David F. Baker Distinguished Research Award from the Institute of Industrial and Systems Engineers, Atlanta, USA, in 2020.
CONTRIBUTORS
Margherita Antona, Principal Researcher, HCI Lab, Institute of Computer Science – FORTH, Heraklion, Crete, Greece
Hasan Ayaz, Provost Solutions Fellow; Associate Professor, School of Biomedical Engineering, Science and Health Systems, and Department of Psychology, College of Arts and Sciences; Drexel Solutions Institute, Drexel University
Michael J. Barnes, Senior Research Psychologist, U.S. Army Research Laboratory, Adelphi, Maryland
Grażyna Bartkowiak, Assistant Professor and Head of the Laboratory, Department of Personal Protective Equipment, Central Institute for Labour Protection – National Research Institute, Warsaw, Poland
Krzysztof Baszczyński, Assistant Professor and Head of the Laboratory, Department of Personal Protective Equipment, Central Institute for Labour Protection – National Research Institute, Warsaw, Poland
Melissa Bedinger, Research Associate, Institute for Infrastructure and Environment, School of Energy, Geoscience, Infrastructure and Society, Heriot-Watt University, Edinburgh, UK
Lindsay Beevers, Professor of Water Management, Institute for Infrastructure and Environment, School of Energy, Geoscience, Infrastructure and Society, Heriot-Watt University, Edinburgh, UK
Kevin B. Bennett, Professor, Human Factors and Industrial Organizational PhD Program, Wright State University, Dayton, Ohio
Tiffany M. Bisbey, Graduate Student, Department of Psychological Sciences, Rice University, Texas
Anna Bogdan, Associate Professor and Vice-Dean, General and Scientific Affairs, Faculty of Building Services, Hydro and Environmental Engineering, Warsaw University of Technology, Warsaw, Poland
Guy A. Boy, FlexTech Chair Institute Professor, CentraleSupélec, Paris Saclay University; ESTIA Institute of Technology, Paris, France
Martin Braun, Senior Researcher at Fraunhofer Institute for Industrial Engineering (IAO); Lecturer at Institute for Human Factors and Technology Management (IAT), University of Stuttgart, Stuttgart, Germany
Agnieszka Brochocka, Assistant Professor and Head of the Laboratory, Department of Personal Protective Equipment, Central Institute for Labour Protection – National Research Institute, Warsaw, Poland
Jonathan W. Butler, Senior UX/UI Designer, Mile Two, LLC, Dayton, Ohio
Michael A. Campion, Herman C. Krannert Distinguished Professor of Management, Krannert Graduate School of Management, Purdue University, West Lafayette, Indiana
Matthew Canham, Research Professor, Institute for Simulation and Training, University of Central Florida, Orlando, Florida
Pascale Carayon, Leon and Elizabeth Janssen Professor in the College of Engineering, Department of Industrial & Systems Engineering, Wisconsin Institute for Healthcare Systems Engineering, University of Wisconsin-Madison, Madison, Wisconsin
C. Melody Carswell, Professor Emeritus of Psychology, University of Kentucky, Lexington, Kentucky
John G. Casali, Grado Chaired Professor; Director, Auditory Systems Laboratory, Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, Virginia; Founder and Chief Technology Officer, Hearing, Ergonomics & Acoustics Resources (H.E.A.R.) LLC
Jessie Y.C. Chen, Senior Research Scientist (ST) for Soldier Performance, U.S. Army Research Laboratory, Adelphi, Maryland
Yue Chen, Postdoctoral Researcher, Department of Industrial Engineering, Tsinghua University, Beijing, China
Yee-Yin Choong, Research Scientist, National Institute of Standards and Technology, Gaithersburg, Maryland
Joseph V. Cohn, Chief, Research Program Administration Division, Defense Health Agency
Chris W. Coultas, Director of Science & Research; Senior Consultant, Leadership Worth Following, LLC
Anna Dąbrowska, Research Associate, Department of Personal Protective Equipment, Central Institute for Labour Protection – National Research Institute, Warsaw, Poland
Frédéric Dehais, Professor, ISAE-SUPAERO, Université de Toulouse; School of Biomedical Engineering, Science and Health Systems, Drexel University
Patrick G. Dempsey, Chief, Workplace Health Branch, Pittsburgh Mining Research Division, National Institute for Occupational Safety and Health, Pittsburgh, Pennsylvania
Siyi Dong, Researcher, Center for Research on Zhejiang Digital Development and Governance; Doctoral Candidate, School of Management, Zhejiang University, China
Colin G. Drury, SUNY Distinguished Professor Emeritus, Department of Industrial and Systems Engineering, University at Buffalo: SUNY, Buffalo, New York
Kevin R. Duda, Group Lead, Space & Mission Critical Systems, Systems Engineering Directorate, The Charles Stark Draper Laboratory, Inc.
Vincent G. Duffy, Professor, School of Industrial Engineering and Department of Agricultural and Biological Engineering, Purdue University, West Lafayette, Indiana
Mica R. Endsley, President, SA Technologies, Marietta, Georgia
Fadi A. Fathallah, Professor, Department of Biological and Agricultural Engineering, University of California, Davis, California
Klaus Fischer, FOM University of Applied Science, Mannheim, Germany
John M. Flach, Senior Cognitive Systems Engineer, Mile Two, LLC; Emeritus Professor, Psychology Department, Wright State University, Dayton, Ohio
Robert R. Fox, General Motors Technical Fellow for Ergonomics, Global Ergonomics & Virtual Human Simulation, General Motors Company
Nora Fronemann, Team Lead User Experience, Fraunhofer Institute for Industrial Engineering IAO, Germany
Qin Gao, Associate Professor, Department of Industrial Engineering, Tsinghua University, Beijing, China
Michail N. Giannakos, Professor and Head of the Learner-Computer Interaction Lab, Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
Michael J. Griffin, Professor, Human Factors Research Unit, Institute of Sound and Vibration Research, University of Southampton, Southampton, UK
Rebecca Grossman, Associate Professor of Industrial/Organizational Psychology, Department of Psychology, Hofstra University, New York
Zhi Guo, Senior User Researcher, Institute of Human Factors and Ergonomics, Department of Industrial Engineering, Tsinghua University, Beijing, China
Sam Haddad, Engineering Director; Augmented Reality Technical Fellow, Design Interactive, Inc., Oviedo, Florida
Kelly S. Hale, Principal Member of Technical Staff, User Experience and Performance Division, Draper
G. M. Hancock, Assistant Professor, Department of Psychology, California State University, Long Beach, Long Beach, California
P. A. Hancock, Provost Distinguished Research Professor, Department of Psychology and Institute for Simulation & Training, University of Central Florida, Orlando, Florida
Michael A. Heroux, Director of Design, Mile Two, LLC, Dayton, Ohio
Erik Hollnagel, Senior Professor of Patient Safety, University of Jönköping, Sweden; Visiting Professorial Fellow, Macquarie University
Matt Holman, Doctoral Student in Human Factors, School of Energy, Geoscience, Infrastructure and Society, Heriot-Watt University, Edinburgh, UK
Bat-Zion Hose, Doctoral Student and Research Assistant, Department of Industrial & Systems Engineering, Wisconsin Institute for Healthcare Systems Engineering, University of Wisconsin-Madison, Madison, Wisconsin
Rafał Hrynyk, Department of Personal Protective Equipment, Central Institute for Labour Protection – National Research Institute, Warsaw, Poland
Ya-Hsin Hung, Postdoctoral Research Assistant, Department of Psychological Sciences, Purdue University, West Lafayette, Indiana
Emilia Irzmańska, Assistant Professor and Head of the Laboratory, Department of Personal Protective Equipment, Central Institute for Labour Protection – National Research Institute, Warsaw, Poland
Ira Janowitz, Consultant in Ergonomics (Ret.)
Yangjian Ji, Professor, School of Mechanical Engineering, Zhejiang University, China
Roger Jianxin Jiao, Associate Professor, The George W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, Atlanta, Georgia
J. Bern Jordan, Co-PI, Trace Center; Assistant Research Scientist, College of Information Studies, University of Maryland, Baltimore, Maryland
Waldemar Karwowski, Pegasus Professor and Department Chair, Department of Industrial Engineering and Management Systems, University of Central Florida, Orlando, Florida
Danuta Koradecka, Professor and Director, Central Institute for Labour Protection – National Research Institute, Warsaw, Poland
Emil Kozłowski, Research Associate, Department of Vibroacoustic Hazards, Central Institute for Labour Protection – National Research Institute, Warsaw, Poland
Steven J. Landry, Professor and Peter and Angela Dal Pezzo Chair and Department Head, The Harold & Inge Marcus Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, Pennsylvania
Terry Lansdown, Associate Professor, Psychology, School of Social Sciences, Heriot-Watt University, Edinburgh, UK
Kenneth R. Laughery, Sr., Professor Emeritus, Psychology Department, Rice University, Houston, Texas
Jonathan Lazar, Associate Director, Trace Center; Professor, College of Information Studies, University of Maryland, Baltimore, Maryland
Mark R. Lehto, Professor, School of Industrial Engineering, Purdue University, West Lafayette, Indiana
James R. Lewis, Distinguished User Experience Researcher, MeasuringU, Denver, Colorado
Zhizhong Li, Professor, Department of Industrial Engineering, Tsinghua University, Beijing, China
Peng Liu, Associate Professor, Zhejiang University, Hangzhou, China
Yili Liu, Arthur F. Thurnau Professor, Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan
Luca Longo, Assistant Professor, School of Computer Science, College of Sciences & Health, Technological University Dublin, Ireland
Liang Ma, Associate Professor, Department of Industrial Engineering, Tsinghua University, Beijing, China
Katarzyna Majchrzycka, Associate Professor and Department Head, Department of Personal Protective Equipment, Central Institute for Labour Protection – National Research Institute, Warsaw, Poland
Krzysztof Makowski, Department of Personal Protective Equipment, Central Institute for Labour Protection – National Research Institute, Warsaw, Poland
Neil J. Mansfield, Professor of Human Factors Engineering; Head of Department of Engineering, Nottingham Trent University, Nottingham, UK
George Margetis, Postdoctoral Researcher, HCI Lab, Institute of Computer Science – FORTH, Heraklion, Crete, Greece
Panos Markopoulos, Professor of Design for Behaviour Change; Vice-Dean, Department of Industrial Design, Eindhoven University of Technology, Eindhoven, the Netherlands
Nicolas Marmaras, Professor of Ergonomics and Dean of the School, School of Mechanical Engineering, National Technical University of Athens, Athens, Greece
William S. Marras, Honda Chair Professor and Director, College of Engineering Spine Research Institute, Department of Integrated Systems Engineering, The Ohio State University, Columbus, Ohio
Anna Marszałek, Assistant Professor, Department of Ergonomics, Central Institute for Labour Protection – National Research Institute, Warsaw, Poland
Christopher B. Mayhorn, Professor and Head, Psychology Department, North Carolina State University, Raleigh, North Carolina
Dennis K. McBride, Chief Strategy Officer, Source America
Kerri McClymont, Doctoral Student in Civil Engineering, Institute for Infrastructure and Environment, School of Energy, Geoscience, Infrastructure and Society, Heriot-Watt University, Edinburgh, UK
Nicolas Meirhaeghe, Doctoral Candidate, Harvard-MIT Division of Health Sciences and Technology, Bioastronautics Training Program, Massachusetts Institute of Technology, Cambridge, Massachusetts
Dave B. Miller, Postdoctoral Associate, Department of Industrial Engineering and Management Systems, University of Central Florida, Orlando, Florida
Magdalena Młynarczyk, Assistant Professor; Head of the Laboratory of Thermal Load, Department of Ergonomics, Central Institute for Labour Protection – National Research Institute, Warsaw, Poland
Rafał Młyński, Assistant Professor, Department of Vibroacoustic Hazards, Central Institute for Labour Protection – National Research Institute, Warsaw, Poland
Abbas Moallem, Adjunct Professor, Department of Industrial Engineering and Department of Applied Data Science, San Jose State University, San Jose, California
Frederick P. Morgeson, Eli Broad Professor of Management, The Broad College of Business, Michigan State University, East Lansing, Michigan
Gaurav Nanda, Assistant Professor of Practice, School of Engineering Technology, Purdue University, West Lafayette, Indiana
Dimitris Nathanael, Assistant Professor, School of Mechanical Engineering, National Technical University of Athens, Athens, Greece
Dava J. Newman, Apollo Program Professor of Astronautics, Harvard-MIT Health Sciences and Technology; MacVicar Faculty Fellow, Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, Massachusetts
Jianwei Niu, Associate Professor, Department of Logistics Engineering, School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, China
Stavroula Ntoa, Postdoctoral Researcher, HCI Lab, Institute of Computer Science – FORTH, Heraklion, Crete, Greece
Hannah Nye, Senior User Experience Designer; Extended Reality Design Lead, Design Interactive, Inc.
Grzegorz Owczarek, Assistant Professor and Head of the Laboratory, Department of Personal Protective Equipment, Central Institute for Labour Protection – National Research Institute, Warsaw, Poland
Christina K. Padron, Director of DOD Programs, Dynepic, Inc.
Kareem Panton, Graduate Student, Department of Psychology, Hofstra University, New York
Sumanta N. Pattanaik, Associate Professor, Department of Computer Science, University of Central Florida, Orlando, Florida
Gunther E. Paul, Associate Professor; Principal Research Fellow OHS, Australian Institute of Tropical Health and Medicine, James Cook University, North Queensland, Australia
Matthias Peissner, Head of Research Area Human-Technology Interaction, Fraunhofer Institute for Industrial Engineering IAO, Stuttgart, Germany
Tom Plocher, Principal Investigator, Moai Technologies, LLC
Kathrin Pollmann, User Experience Researcher, Fraunhofer Institute for Industrial Engineering IAO, Stuttgart, Germany
Janet D. Proctor, Senior Academic Advisor, Psychological Sciences, College of Health and Human Sciences, Purdue University, West Lafayette, Indiana
Robert W. Proctor, Distinguished Professor, Department of Psychological Sciences, Purdue University, West Lafayette, Indiana
Pei-Luen Patrick Rau, Professor, Department of Industrial Engineering; Vice Dean, Tsinghua Global Innovation Exchange (GIX) Institute, Tsinghua University, Beijing, China
Sonalee Rau, Department of Health Policy & Management, Mailman School of Public Health, Columbia University, New York
Janet C. Read, Professor of Child Computer Interaction, School of Psychology and Computer Science, University of Central Lancashire, Preston, UK
David Rodrick, Health Scientist Administrator, Center for Quality Improvement and Patient Safety, Agency for Healthcare Research and Quality
William B. Rouse, Research Professor, McCourt School of Public Policy, Georgetown University, Washington, DC
Eduardo Salas, Allyn R. & Gladys M. Cline Professor of Psychology and Department Chair, Department of Psychological Sciences, Rice University, Houston, Texas
Megan E. Salwei, Postdoctoral Research Fellow, Department of Biomedical Informatics, Center for Research and Innovation in Systems Safety, Vanderbilt University Medical Center, Nashville, Tennessee
Jeff Sauro, Founding Principal, MeasuringU, Denver, Colorado
Ben D. Sawyer, Assistant Professor, Department of Industrial Engineering and Management Systems, University of Central Florida, Orlando, Florida
Bohdana Sherehiy, Senior Consultant, EurekaFacts, Washington, DC
Thomas B. Sheridan, Ford Professor of Engineering and Applied Psychology Emeritus, Department of Mechanical Engineering; Professor of Aeronautics and Astronautics Emeritus, Massachusetts Institute of Technology, Cambridge, Massachusetts
Dieter Spath, Professor, Institute for Human Factors and Technology Management (IAT), University of Stuttgart; Head of Fraunhofer Institute for Industrial Engineering (IAO); President of Acatech – German National Academy of Science and Engineering, Stuttgart, Germany
Kay M. Stanney, CEO & Founder, Design Interactive, Inc., Oviedo, Florida
Jeanne Mager Stellman, Professor Emerita and Special Lecturer, Department of Health Policy & Management, Mailman School of Public Health, Columbia University, New York
Constantine Stephanidis, Professor of HCI, Department of Computer Science, University of Crete; Head of HCI Lab and Ambient Intelligence Program, Institute of Computer Science – FORTH, Heraklion, Crete, Greece
Redha Taiar, Professor, Department of Sport Sciences, Université de Reims Champagne Ardenne, Reims, France
Pratik Thaker, Corporate Director, Environmental Health & Safety, NewYork-Presbyterian Hospital, New York
Andrew Thatcher, Chair of Industrial/Organisational Psychology, Department of Psychology, University of the Witwatersrand, Johannesburg, South Africa
Gregg C. Vanderheiden, Founder and Director, Trace Center; Professor, College of Information Studies, University of Maryland, Baltimore, Maryland
Annie Visser-Quinn, Research Associate, Institute for Infrastructure & Environment, School of Energy, Geoscience, Infrastructure and Society, Heriot-Watt University, Edinburgh, UK
Kim-Phuong L. Vu, Professor, Department of Psychology, California State University, Long Beach, Long Beach, California
Guy Walker, Professor of Human Factors and EGIS Leader of Pioneering Education, School of Energy, Geoscience, Infrastructure and Society, Heriot-Watt University, Edinburgh, UK
June Wei, Professor, Department of Management and MIS, College of Business, University of West Florida, Pensacola, Florida
Christopher D. Wickens, Adjunct Professor of Psychology, Colorado State University; Professor Emeritus of Aviation and of Psychology, Institute of Aviation and Department of Psychology, University of Illinois at Urbana-Champaign, Urbana, Illinois
R. Paul Wiegand, Assistant Professor, Computer Science & Quantitative Methods, Winthrop University, Rock Hill, South Carolina
Michael S. Wogalter, Professor Emeritus, Psychology Department, North Carolina State University, Raleigh, North Carolina
Changxu Wu, Professor, Department of Industrial Engineering, Tsinghua University, Beijing, China
Kathryn Wust, Doctoral Student and Research Assistant, Department of Industrial & Systems Engineering, Wisconsin Institute for Healthcare Systems Engineering, University of Wisconsin-Madison, Madison, Wisconsin
Zijian Yin, Doctoral Student, Department of Industrial Engineering, Tsinghua University, Beijing, China
Mark S. Young, Visiting Professor, Loughborough Design School, Loughborough University, Loughborough, UK
Jan Zera, Professor and Head of the Electroacoustics Division, Institute of Radioelectronics and Multimedia Technology, Faculty of Electronics and Information Technology, Warsaw University of Technology, Warsaw, Poland
Joanna Zhang, Associate Systems Engineer, Northrop Grumman
Renyou Zhang, Postdoctoral Research Associate, Department of Industrial Engineering, Tsinghua University, Beijing, China; Assistant Professor, Department of Safety Engineering, Beijing Institute of Petrochemical Technology
Wei Zhang, Professor, Department of Industrial Engineering, Tsinghua University, Beijing, China
Feng Zhou, Assistant Professor, Department of Industrial and Manufacturing Systems Engineering, University of Michigan-Dearborn, Dearborn, Michigan
H. Larissa Zhou, Doctoral Student in Materials Science/Mechanical Engineering; NASA Space Technology Research Fellow, Harvard University, School of Engineering and Applied Sciences, Cambridge, Massachusetts
Jia Zhou, Associate Professor, School of Management Science and Real Estate, Chongqing University, China
Klaus J. Zink, Senior Research Professor and Scientific Director, Institute for Technology and Work, University of Kaiserslautern, Kaiserslautern, Germany
FOREWORD
Review of the fourth edition and comment on the fifth edition by Donald A. Norman, Director and Co-Founder, University of California, San Diego Design Lab

This review was written by me for my website (www.jnd.org): hence the informal writing style. Although it is not in the format I would have provided had I written a normal foreword for this Handbook, I have given permission to reprint it here. I could write it more substantively, with more words and deeper analysis, but the message would stay the same: This is an essential book for professionals and students alike. Maybe the message is even stronger in this shorter, less formal format.

I'm often asked for reading suggestions, especially for references to the literature on Human Factors and Ergonomics. In the past few months, I have been reading chapters of one book that has it all: Gavriel Salvendy's massive tome, the Handbook of Human Factors and Ergonomics. It is huge, with over 1,500 pages and 61 chapters. It takes 2 pages just to list the advisors and 10 pages to list the authors of the chapters. It is also expensive: $250. Buy it.

The articles are all excellent. They all reflect up-to-date reviews of the areas they cover. They are wonderful self-study material, wonderful references, and would make excellent material in multiple courses. Yes, it is obscenely expensive, but this one book is the equivalent of 10 normal books. Consider it an essential piece of professional equipment. Buy it. Use it.

If you don't know human factors, this is a great way to find the parts relevant to your work. And even if you are an expert, this book will be valuable because it is unlikely that you are expert at all the topics covered here, yet very likely you will need some of the ones you are not (yet) expert at. I follow my own advice. I consider myself an expert (I am a Fellow of the Human Factors Society), but I still learn each time I read from these pages. So, yes, grit your teeth and buy the book.
The 5th edition has new – and very important – chapters written by the authorities in each topic. It has kept up with the times and become even more valuable as both a text and a reference.
Review of the fifth edition by Thomas B. Sheridan, Ford Professor Emeritus of Engineering and Applied Psychology, Massachusetts Institute of Technology
The fifth edition of the Handbook of Human Factors and Ergonomics is the most authoritative and comprehensive reference work in the field.

From the Foreword to the second edition by John F. Smith, Jr., Chairman of the Board, Chief Executive Officer and President, General Motors Corporation
The publication of this second Handbook of Human Factors and Ergonomics is very timely. It is a comprehensive guide that contains practical knowledge and technical background on virtually all aspects of physical, cognitive, and social ergonomics. As such, it can be a valuable source of information for any individual or organization committed to providing competitive, high-quality products and safe, productive work environments.

From the Foreword to the first edition by E. M. Estes, Retired President, General Motors Corporation
Regardless of what phase of the economy a person is involved in, this handbook is a very useful tool. Every area of human factors from environmental conditions and motivation to the use of new communications systems, robotics, and business systems is well covered in the handbook by experts in every field.
PREFACE
The Handbook of Human Factors and Ergonomics (HFE) provides scientifically based, practical information applied to the design of systems, including hardware, software, facilities, and environments, for effective human use, safety, and comfort, resulting in high-quality, productive work performance and in products and services that customers like. The Handbook's first four editions have received strong endorsements from captains of industry and leading scientists worldwide. Some of the previous editions have also been published in Japanese and Russian and have won the Institute of Industrial Engineers' Joint Publishers Book of the Year Award. The HFE discipline is well recognized worldwide, with over 50 scientific societies working under the umbrella of the International Ergonomics Association. HFE professionals play a critical role in the design and operation of products, processes, and systems to benefit humankind. When HFE is effectively implemented, it can improve the quality, productivity, safety, and well-being of people worldwide. The 57 chapters were authored by 142 experts from four continents. In creating this Handbook, the authors gathered information from 10,193 references and presented 619 figures and 269 tables to provide theoretically based and practically oriented HFE knowledge for practitioners, educators, and researchers. The chapters have been completely revised and updated, and 17 new chapters have been included to account for the rapidly expanding theory, methods, and applications of the HFE discipline. These new chapters discuss the following subjects:
• Human Systems Integration and Design
• Emotional Design
• Three-Dimensional (3D) Anthropometry and Its Applications in Product Design
• Manual Materials Handling: Evaluation and Practical Considerations
• Modeling and Simulation of Human Systems
• Neuroergonomics
• Representation Design
• Mobile Systems Design and Evaluation
• Human-Centered Design of Artificial Intelligence
• Cybersecurity, Privacy, and Trust
• Human–Robot Interaction
• Human Factors in Social Media
• Design of Digital Technology for Children
• Data Analytics in Human Factors
• Human Factors and Ergonomics in Design of A³: Automation, Autonomy, and Artificial Intelligence
• Human Side of Space Exploration and Habitation
• Human Factors and Ergonomics for Sustainability
The main aim of this Handbook is to serve the needs of human factors and ergonomics researchers, practitioners, educators, and others who need to apply HFE knowledge to the effective design and operation of products, services, and systems utilized for the benefit of mankind. The many contributing authors came through magnificently. We thank them all most sincerely for agreeing so willingly to create this Handbook with us.

Gavriel Salvendy and Waldemar Karwowski
October 2020
PART 1
HUMAN FACTORS FUNCTION
CHAPTER 1
THE DISCIPLINE OF HUMAN FACTORS AND ERGONOMICS
Waldemar Karwowski
University of Central Florida
Orlando, Florida
Wei Zhang
Tsinghua University
Beijing, China
1 INTRODUCTION
2 HUMAN–SYSTEM INTERACTIONS
3 HUMAN FACTORS AND SYSTEM COMPATIBILITY
4 CHALLENGES OF HUMAN FACTORS DISCIPLINE
5 PARADIGMS OF ERGONOMICS
6 ERGONOMICS COMPETENCY AND LITERACY
7 ERGONOMICS DESIGN
7.1 Axiomatic Design: Design Axioms
7.2 Theory of Axiomatic Design in Ergonomics
7.3 Axiomatic Design Approach in Ergonomics: Applications
8 THEORETICAL ERGONOMICS: SYMVATOLOGY
9 CONGRUENCE BETWEEN MANAGEMENT AND ERGONOMICS
10 HUMAN-CENTERED DESIGN OF SERVICE SYSTEMS
11 HUMAN–SYSTEMS INTEGRATION
12 BOARD ON HUMAN–SYSTEMS INTEGRATION OF THE NATIONAL RESEARCH COUNCIL
13 THE INTERNATIONAL ERGONOMICS ASSOCIATION
14 THE FOUNDATION FOR PROFESSIONAL ERGONOMICS
15 FUTURE OPPORTUNITIES
15.1 Developing the Discipline and Profession
15.2 Human Factors and Global Trends
15.3 Human Factors and Sustainability
15.4 Cyberergonomics: The Ergonomics of the Artificial
REFERENCES
The purpose of science is mastery over nature.
F. Bacon (Novum Organum, 1620)

1 INTRODUCTION
Over the last 70 years, human factors, a term that is used here synonymously with ergonomics and denoted as human factors ergonomics (HFE), has been evolving as a unique and independent discipline that originated with a focus on the nature of human–artifact interactions. Such interactions are viewed from the unified perspective of the science, engineering, design, technology, and management of human-compatible systems, including a variety of natural and artificial products, processes, and living environments (Karwowski, 2005). The various dimensions of the ergonomics discipline so defined are shown in Figure 1. The International Ergonomics Association (IEA, 2003) defines ergonomics (or human factors) as the scientific discipline concerned with the understanding of the interactions among humans and other elements of a system, and the profession that applies theory, principles, data, and methods to design in order to optimize human well-being and overall system performance. Human factors professionals contribute to the design and evaluation of tasks, jobs, products, environments, and systems in order to make them compatible with the needs, abilities, and limitations of people. The HFE discipline promotes a holistic, human-centered approach to systems design that considers the physical, cognitive, neural, social, emotional, organizational, developmental, ecological, environmental, and other factors relevant to the socio-economic development and well-being of the global society (Ayaz & Dehais, 2018; Bridger, 2006; Chapanis, 1995, 1999; Drury, 2008; Edholm & Murrell, 1973; Falzon, 2014; Grandjean, 1986; Hancock, 2017; Jaworek, Marek, & Karwowski, 2020; Karwowski, 2001; Kroemer, 2017; Moray, 2000; Parasuraman, 2003; Salvendy, 1997; Sanders & McCormick, 1993; Stanton et al., 2004; Vicente, 2004; Wilson, 2014; Wilson & Corlett, 1995).

Historically, ergonomics (ergon + nomos), or "the study of work," was originally proposed and defined by the Polish scientist B. W. Jastrzebowski (1857a–d) as a scientific discipline with a very broad scope and a wide range of interests and applications, encompassing all aspects of human activity, including labor, entertainment, reasoning, and dedication (Karwowski, 1991, 2001).

Figure 1 General dimensions of ergonomics discipline (diagram elements: ergonomics discipline; philosophy (social needs); theory; design; technology/environment; management; practice and education). (Source: Karwowski, 2005. © 2005 Taylor & Francis.)

In his paper published in the journal Nature and Industry (1857a–d), Jastrzebowski divided work into two main categories: useful work, which brings improvement for the common good, and harmful work, which brings deterioration (discreditable work). Useful work, which aims to improve things and people, is classified into physical, aesthetic, rational, and moral work. According to Jastrzebowski, such work requires the use of motor forces, sensory forces, forces of reason (thinking and reasoning), and the spiritual force. The four main benefits of useful work are exemplified by property, ability, perfection, and felicity.

The contemporary ergonomics discipline, independently introduced by Murrell in 1949 (Edholm & Murrell, 1973), was viewed at that time as an applied science, a technology, or sometimes both. British scientists founded the Ergonomics Research Society in 1949. The development of ergonomics internationally can be linked to a project initiated by the European Productivity Agency (EPA), a branch of the Organization for European Economic Cooperation, which first established a Human Factors Section in 1955 (Kuorinka, 2000). Under the EPA project, in 1956, specialists from European countries visited the United States to observe human factors research. In 1957, the EPA organized a technical seminar on "Fitting the Job to the Worker" at the University of Leiden, The Netherlands, during which a set of proposals was presented to form an international association of work scientists. A steering committee consisting of H.S. Belding, G.C.E. Burger, S. Forssman, E. Grandjean, G. Lehman, B. Metz, K.U. Smith, and R.G. Stansfield was charged with developing a specific proposal for such an association. The committee decided to adopt the name International Ergonomics Association (Koningsveld, 2019).
At the meeting in Paris in 1958, it was decided to proceed with forming the new association. The steering committee designated itself the Committee for the International Association of Ergonomic Scientists and elected G.C.E. Burger as its first president, K.U. Smith as treasurer, and E. Grandjean as secretary. The Committee for the International Association of Ergonomic Scientists met in Zurich in 1959 during a conference organized by the EPA and decided to retain the name International Ergonomics Association. On April 6, 1959, at the meeting in Oxford, E. Grandjean declared the founding of the IEA (Koningsveld, 2019). The committee met again in Oxford later in 1959 and agreed on a set of bylaws, or statutes, of the IEA. These were formally approved by the IEA General Assembly at the first International Congress of Ergonomics, held in Stockholm in 1961.

Historically, the most often cited domains of specialization within HFE have been physical, cognitive, social, and organizational ergonomics and the area of human–computer interaction. Physical ergonomics is mainly concerned with human anatomical, anthropometric, physiological, and biomechanical characteristics as they relate to physical activity (Chaffin, Anderson, & Martin, 2006; Karwowski & Marras, 1999; Kroemer et al., 1994; Marras, 2008; National Research Council, 2001; Pheasant, 1986). Cognitive ergonomics focuses on mental processes such as perception, memory, information processing, reasoning, and motor response as they affect interactions among humans and other elements of a system (Diaper & Stanton, 2004; Hollnagel, 2003; Vicente, 1999). Organizational ergonomics (also known as macroergonomics) is concerned with the optimization of sociotechnical systems, including their organizational structures, policies, and processes (Hendrick & Kleiner, 2001, 2002a, 2002b; Holman et al., 2003; Nemeth, 2004; Reason, 1997). Examples of relevant topics include communication, crew resource management, design of working times, teamwork, participatory work design, community ergonomics, computer-supported cooperative work, new work paradigms, virtual organizations, telework, and quality management. These traditional domains, as well as new domains, are listed in Table 1. Based on the above discussion, the paramount objective of HFE is to understand the interactions between people and everything that surrounds us and, based on such knowledge, to optimize human well-being and overall system performance.
Table 2 provides a summary of the specific HFE objectives as originally discussed by Chapanis (1995). According to the National Academy of Engineering (2004), future developments in engineering will expand toward tighter connections between technology and the human experience, including new products customized to the physical dimensions and capabilities of the user, and the ergonomic design of engineered products.
Table 1 Exemplary Domains of Disciplines of Medicine, Psychology, and Human Factors/Ergonomics
Medicine: Cardiology, Dermatology, Gastroenterology, Neurology, Radiology, Endocrinology, Pulmonology, Gerontology, Neuroscience, Nephrology, Oncology, Ophthalmology, Urology, Psychiatry, Internal medicine, Community medicine, Physical medicine, Other
Psychology: Applied psychology, Child psychology, Clinical psychology, Cognitive psychology, Community psychology, Counseling psychology, Developmental psychology, Experimental psychology, Educational psychology, Environmental psychology, Forensic psychology, Health psychology, Positive psychology, Organizational psychology, Social psychology, Quantitative psychology, Neuropsychology
Human factors/ergonomics: Physical ergonomics, Cognitive ergonomics, Macroergonomics, Knowledge ergonomics, Health care ergonomics, Participatory ergonomics, Human–computer interaction, Neuroergonomics, Affective ergonomics, Ecological ergonomics, Forensic ergonomics, Consumer ergonomics, Human–systems integration, Ergonomics of aging, Information ergonomics, Cyberergonomics, Human factors of artificial intelligence, Virtual ergonomics, Community ergonomics, Nanoergonomics, Service ergonomics, Systems ergonomics, Green ergonomics
Table 2 Original Objectives of Human Factors and Ergonomics Discipline Proposed by Chapanis (1995)
Basic operational objectives: Reduce errors; Increase safety; Improve system performance
Objectives bearing on reliability, maintainability, and availability (RMA) and integrated logistic support (ILS): Increase reliability; Improve maintainability; Reduce personnel requirements; Reduce training requirements
Objectives affecting users and operators: Improve the working environment; Reduce fatigue and physical stress; Increase ease of use; Increase user acceptance; Increase aesthetic appearance
Other objectives: Reduce losses of time and equipment; Increase economy of production
Source: Chapanis, 1995. © 1995 John Wiley & Sons.
2 HUMAN–SYSTEM INTERACTIONS
While in the past, the HFE discipline has followed developments in technology (reactive design approach), in the future, contemporary HFE should drive the progress of technology
(proactive design approach) (Bridger, 2006; Wilson, 2014). While technology is a product and a process involving both science and engineering, science aims to understand the “why” and “how” of nature through a process of scientific inquiry that generates knowledge about the natural world (Mitchem, 1994; National Research Council, 2001). Engineering is the “design under constraints” of cost, reliability, safety, environmental impact, ease of use, available human and material resources, manufacturability, government regulations, laws, and politics (Wulf, 1998). As a body of knowledge of design and creation of human-made products and a process for solving problems, engineering seeks to shape the natural world to meet human needs and wants. In general, HFE discovers and applies knowledge about human behavior, abilities, limitations, and other characteristics to the design of tools, machines, systems, tasks, jobs, and environments for productive, safe, comfortable, and effective human use (Bridger, 2006; Helander, 1997; Sanders & McCormick, 1993). In this context, HFE deals with a broad scope of problems relevant to the design and evaluation of systems, consumer products, and working environments, in which human– machine-organization-environment interactions affect human performance, well-being. and safety, as well as user experience and system usability (Bedny & Karwowski, 2007; Bisantz & Burns, 2009; Carayon, 2011; Dekker, 2007; Karwowski, 2006; Karwowski, Salvendy, & Ahram, 2010; Reason, 2008; Sears & Jacko, 2009; Weick & Sutcliffe, 2007; Wogalter, 2006). The broad scope of issues addressed by the HFE discipline is presented in Table 3. Figure 2 illustrates the evolution of the scope of HFE concerning the nature of human-system interactions and applications of human–system integration in a large variety of domains (Chebbykin et al., 2008; Cook & Durso, 2008; Dekker, 2007; Guerin et al., 2007; Kaber & Boy, 2010; Karwowski, 2007; Lehto & Buck, 2007; Marek, Karwowski,
6
HUMAN FACTORS FUNCTION
Table 3 Classification Scheme for Human Factors/Ergonomics 1. General Human Characteristics 2. 3. 4. 5. 6. 7.
Psychological aspects Physiological and anatomical aspects Group factors Individual differences Psychophysiological state variables Task-related factors Information Presentation and Communication
8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20.
Visual communication Auditory and other communication modalities Choice of communication media Person–machine dialogue mode System feedback Error prevention and recovery Design of documents and procedures User control features Language design Database organization and data retrieval Programming, debugging, editing, and programming aids Software performance and evaluation Software design, maintenance, and reliability
21. 22. 23. 24. 25.
Input devices and controls Visual displays Auditory displays Other modality displays Display and control characteristics
Display and Control Design
Workplace and Equipment Design 26. General workplace design and buildings 27. Workstation design 28. Equipment design
Table 3
(continued)
39. 40. 41. 42. 43. 44. 45. 46.
Job attitudes and job satisfaction Job design Payment systems Selection and screening Training Supervision Use of support Technological and ergonomic change
47. 48. 49. 50.
General health and safety Etiology Injuries and illnesses Prevention
Health and Safety
Social and Economic Impact of the System 51. 52. 53. 54. 55. 56. 57. 58. 59. 60. 61.
Trade unions Employment, job security, and job sharing Productivity Women and work Organizational design Education Law Privacy Family and home life Quality of working life Political comment and ethical considerations Methods and Techniques
62. Approaches and methods 63. Techniques 64. Measures Source: Based on Ergonomics Abstracts, 2004.
Human–technology relationships
Environment 29. 30. 31. 32. 33. 34. 35.
Illumination Noise Vibration Whole-body movement Climate Altitude, depth, and space Other environmental issues
Technology–system relationships
Human–system relationships
System Characteristics 36. General system features Work Design and Organization 37. Total system design and evaluation 38. Hours of work
Human–machine relationships Figure 2 Expanded view of the human–technology relationships. (Source: Modified from Meister, 1999.)
THE DISCIPLINE OF HUMAN FACTORS AND ERGONOMICS Table 4
7
Taxonomy of Human Factors and Ergonomics Elements: The Human Factor
Human elements Physical/sensory Cognitive Motivational/emotional Human conceptualization Stimulus–response orientation (limited) Stimulus–conceptual–response orientation (major) Stimulus–conceptual–motivational–response orientation (major) Human technological relationships Controller relationship Partnership relationship Client relationship Effects of Technology on the Human Performance effects Goal accomplishment Goal non-accomplishment Error/time discrepancies Feeling effect Technology acceptance Technology indifference Technology rejection Demand effects Resource mobilization Stress/trauma
Effects of the Human on technology Improvement in technology effectiveness Absence of effect Reduction in technological effectiveness Human Operations in Technology Equipment operation Equipment maintenance System management Type/degree of human involvement Direct (operation) Indirect (recipient) Extensive Minimal None
Source: Meister, 1999. © 1999 Taylor & Francis.
& Rice 2010; Marras, 2008; Marras & Karwowski, 2006a, 2006b; Pew & Mavor, 2008; Rouse, 2007; Salas, Goodwin, & Burke, 2008; Salvendy & Karwowski, 2010; Schmorrow & Stanney, 2008;Vicente, 2004; Zacharias, McMillian, & Van Hemel, 2008). Initially, HFE focused on the local human–machine interactions, while today, the main focus is broadly defined in terms of human–technology–organization–environment interactions (Bridger, 2006; Wilson, 2014). Tables 4 and 5 present the taxonomy of the human-related and technology-related components, respectively, which are important to HFE discipline. According to Meister (1987), the traditional concept of the human–machine system is an organization of people and the machines they operate and maintain to perform assigned jobs that implement the purpose for which the system was developed. In this context, a system is a construct whose characteristics are manifested in Table 5 Taxonomy of Human Factors and Ergonomics Elements: Technology Technology Elements Components Tools Equipment Systems Degree of Automation Mechanization Computerization Artificial intelligence System Characteristics Dimensions Attributes Variables
Effects of Technology on the Human Changes in human role Changes in human behavior Organization–Technology Relationships Definition of organization Organizational variables
Source: Meister, 1999. © 1999 Taylor & Francis.
physical and behavioral phenomena Meister (1991). The system is critical to HFE theorizing because it describes the substance of the human-technology relationship. General system variables of interest to HFE discipline are shown in Table 6. The scope of HFE factors that need to be considered in the design, testing, and evaluation of any human–system interactions is shown in Table 7 in the form of the exemplary ergonomics checklist. It should be noted that such checklists also reflect practical application of the discipline. According to the Board of Certification in Professional Ergonomics (BCPE), a practitioner of ergonomics is a person who (1) has a mastery of a body of ergonomics knowledge; (2) has a command of the Table 6 1. 2. 3. 4. 5. 6.
7. 8. 9. 10.
11.
General System Variables
Requirement constraints imposed on the system Resources required by the system Nature of its internal components and processes Functions and missions performed by the system Nature, number, and specificity of goals Structural and organizational characteristics of the system (e.g. its size, number of subsystems and units, communication channels, hierarchical levels, and amount of feedback) Degree of automation Nature of the environment in which the system functions System attributes (e.g. complexity, sensitivity, flexibility, vulnerability, reliability, and determinacy) Number and type of interdependencies (human–machine interactions) within the system and type of interaction (degree of dependency) Nature of the system’s terminal output(s) or mission effects
Source: Meister, 1999. © 1999 Taylor & Francis.
8
HUMAN FACTORS FUNCTION
Table 7
Examples of Factors to Be Used in Ergonomics Checklists I. Anthropometric, Biomechanical, and Physiological Factors
1. 2. 3. 4. 5. 6. 7. 8. 9. 10.
Are the differences in human body size accounted for by the design? Have the right anthropometric tables been used for specific populations? Are the body joints close to neutral positions? Is the manual work performed close to the body? Are there any forward-bending or twisted trunk postures involved? Are sudden movements and force exertion present? Is there a variation in worker postures and movements? Is the duration of any continuous muscular effort limited? Are the breaks of sufficient length and spread over sthe duration of the task? Is the energy consumption for each manual task limited? II. Factors Related to Posture (Sitting and Standing)
1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13.
Is sitting/standing alternated with standing/sitting and walking? Is the work height dependent on the task? Is the height of the work table adjustable? Are the height of the seat and backrest of the chair adjustable? Is the number of chair adjustment possibilities limited? Have good seating instructions been provided? Is a footrest used where the work height is fixed? Has the work above the shoulder or with hands behind the body been avoided? Are excessive reaches avoided? Is there enough room for the legs and feet? Is there a sloping work surface for reading tasks? Have the combined sit–stand workplaces been introduced? Are handles of tools bent to allow for working with the straight wrists? III. Factors Related to Manual Materials Handling (Lifting, Carrying, Pushing, and Pulling Loads)
1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11.
Have tasks involving manual displacement of loads been limited? Have optimum lifting conditions been achieved? Is anybody required to lift more than 23 kg? Have lifting tasks been assessed using the NIOSH (1991) method? Are handgrips fitted to the loads to be lifted? Is more than one person involved in lifting or carrying tasks? Are there mechanical aids for lifting or carrying available and used? Is the weight of the load carried limited according to the recognized guidelines? Is the load held as close to the body as possible? Are pulling and pushing forces limited? Are trolleys fitted with appropriate handles and handgrips? IV. Factors Related to Design of Tasks and Jobs
1. 2. 3. 4. 5. 6. 7. 8. 9.
Does the job consist of more than one task? Has a decision been made about allocating tasks between people and machines? Do workers performing the tasks contribute to problem solving? Are the difficult and easy tasks performed interchangeably? Can workers decide independently on how the tasks are carried out? Are there sufficient possibilities for communication between workers? Is there sufficient information provided to control the assigned tasks? Can the group take part in management decisions? Are the shift workers given enough opportunities to recover?
THE DISCIPLINE OF HUMAN FACTORS AND ERGONOMICS Table 7
9
(continued) V. Factors Related to Information and Control Tasks
Information 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.
Has an appropriate method of displaying information been selected? Is the information presentation as simple as possible? Has the potential confusion between characters been avoided? Has the correct character/letter size been chosen? Have texts with capital letters only been avoided? Have familiar typefaces been chosen? Is the text/background contrast good? Are the diagrams easy to understand? Have the pictograms been properly used? Are sound signals reserved for warning purposes?
Control 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.
Is the sense of touch used for feedback from controls? Are differences between controls distinguishable by touch? Is the location of controls consistent and is sufficient spacing provided? Have the requirements for the control-display compatibility been considered? Is the type of cursor control suitable for the intended task? Is the direction of control movements consistent with human expectations? Are the control objectives clear from the position of the controls? Are controls within easy reach of female workers? Are labels or symbols identifying controls properly used? Is the use of color in controls design limited?
Human–Computer Interaction 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12.
Is the human–computer dialogue suitable for the intended task? Is the dialogue self-descriptive and easy to control by the user? Does the dialogue conform to the expectations on the part of the user? Is the dialogue error tolerant and suitable for user learning? Has command language been restricted to experienced users? Have detailed menus been used for users with little knowledge and experience? Is the type of help menu fitted to the level of the user’s ability? Has the QWERTY layout been selected for the keyboard? Has a logical layout been chosen for the numerical keypad? Is the number of function keys limited? Have the limitations of speech in human–computer dialogue been considered? Are touch screens used to facilitate operation by inexperienced users? VI. Environmental Factors
Noise and Vibration 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.
Is the noise level at work below 80 dBA? Is there an adequate separation between workers and source of noise? Is the ceiling used for noise absorption? Are the acoustic screens used? Are hearing conservation measures fitted to the user? Is personal monitoring to noise/vibration used? Are the sources of uncomfortable and damaging body vibration recognized? Is the vibration problem being solved at the source? Are machines regularly maintained? Is the transmission of vibration prevented? (continued overleaf)
10
HUMAN FACTORS FUNCTION
Table 7
(continued)
Illumination 1. 2. 3. 4. 5. 6. 7.
Is the light intensity for normal activities in the range of 200–800 lux? Are large brightness differences in the visual field avoided? Are the brightness differences between task area, close surroundings, and wider surroundings limited? Is the information easily legible? Is ambient lighting combined with localized lighting? Are light sources properly screened? Can the light reflections, shadows, or flicker from the fluorescent tubes be prevented?
Climate 1. 2. 3. 4. 5. 6. 7. 8.
Are workers able to control the climate themselves? Is the air temperature suited to the physical demands of the task? Is the air prevented from becoming either too dry to too humid? Are draughts prevented? Are the materials/surfaces that have to be touched neither too cold nor too hot? Are the physical demands of the task adjusted to the external climate? Are undesirable hot and cold radiation prevented? Is the time spent in hot or cold environments limited?
Source: Based on Dul & Weerdmeester, 1993.
methodologies used by ergonomists in applying that knowledge to the design of a product, system, job, or environment; and (3) has applied their knowledge to the analysis, design, testing, and evaluation of products, systems, and environments. The areas of current practice in the field can best be described by examining the focus of the 26 Technical Groups of the Human Factors and Ergonomics Society (as of 2020), as illustrated in Table 8.
3 HUMAN FACTORS AND SYSTEM COMPATIBILITY
The HFE discipline advocates the systematic use of knowledge concerning relevant human characteristics in order to achieve compatibility in the design of interactive systems of people, machines, environments, and devices of all kinds to ensure specific goals (Human Factors and Ergonomics Society, 2003). Typically, such goals include improved (system) effectiveness, productivity, safety, ease of performance, and the contribution to overall human well-being and quality of life. Although the term compatibility is a keyword in the above definition, it has mainly been used in a narrow sense, often in the context of the design of displays and controls, including studies of spatial (location) compatibility or the intention–response–stimulus compatibility related to the movement of controls (Wickens & Carswell, 1997). Karwowski (1997) introduced the term human-compatible systems to focus on the need for a comprehensive treatment of compatibility in the human factors discipline. Karwowski and his co-workers (Genaidy et al., 2005; Karwowski, 1991, 1995; Karwowski, Marek, & Noworol, 1988) advocated using the compatibility concept in the broader context of workplace and work system design. The American Heritage Dictionary of the English Language (Morris, 1978) defines “compatible” as (1) capable of living or performing in harmonious, agreeable, or congenial combination with another or others and (2) capable of orderly, efficient integration and operation with other elements in a system. From the beginning of contemporary ergonomics, the measurements of
compatibility between the system and the human and the evaluation of the results of ergonomics interventions were based on the measures that best suited specific purposes (Karwowski, 2001). Such measures included specific psychophysiological responses of the human body (e.g. heart rate, EMG, perceived human exertion, satisfaction, comfort or discomfort) as well as a number of indirect measures, such as the incidence of injury, economic losses or gains, system acceptance, or operational effectiveness, quality, or productivity. The lack of a universal matrix to quantify and measure human–system compatibility is an important obstacle in demonstrating the value of ergonomics science and the profession (Karwowski, 1997). However, even though ergonomics was perceived by some in the past (e.g. see Howell, 1986) as a highly unpredictable area of human scientific endeavor, today HFE has positioned itself as a unique, design-oriented discipline, independent of engineering and medicine (Bridger, 2006; Helander, 1997; Karwowski, 1991, 2003; Moray, 1994; Sanders & McCormick, 1993; Wilson, 2014). Figure 4 illustrates the human–system compatibility approach to ergonomics in the context of quality of working life and system (an enterprise or business entity) performance (Genaidy et al., 2005). This approach reflects the complex compatibility relationships between the human operator (human capacities and limitations), technology (products, machines, devices, processes, and computer-based systems), and the broadly defined environment (business processes, organizational structure, nature of work systems, and effects of multiple work-related stressors). The operator’s performance is an outcome of the compatibility matching between individual human characteristics (capacities and limitations) and the requirements and affordances of both the technology and the environment.
The quality of working life and the system (enterprise) performance are affected by matching the positive and negative outcomes of the complex compatibility relationships between the human operator, technology, and environment (Genaidy et al., 2007). Positive outcomes include such measures as work productivity, performance times, product quality, and subjective psychological (desirable) behavioral outcomes such as job satisfaction, employee morale, human well-being, and commitment. The negative outcomes include both human and system-related errors, loss of productivity, poor quality, accidents, injuries, physiological stresses, and subjective psychological (undesirable) behavioral outcomes such as job dissatisfaction, job/occupational stress, and discomfort.
THE DISCIPLINE OF HUMAN FACTORS AND ERGONOMICS
Table 8 Subject Interests of Technical Groups of Human Factors and Ergonomics Society (as of 2020)
Technical group: Description/areas of concern
Aerospace systems: Application of human factors to the development, design, certification, operation, and maintenance of human–machine systems in aviation and space environments. The group addresses issues for civilian and military systems in the realms of performance and safety.
Aging: Human factors applications appropriate to meeting the emerging needs of older people and special populations in a wide variety of life settings.
Augmented cognition: Fostering the development and application of real-time physiological and neurophysiological sensing technologies that can ascertain a human’s cognitive state while interacting with computing-based systems; data classification and integration architectures that enable closed-loop system applications; mitigation (adaptive) strategies that enable efficient and effective system adaptation based on a user’s dynamically changing cognitive state; individually tailored training systems.
Children’s Issues: Focus on research, design, and application concerning human factors and ergonomics (HF/E) issues related to children’s emerging development from birth to 18. The topic of children inevitably includes caregivers and educators, who are also a main focus, particularly with respect to their perceptions and physical and cognitive tasks. The scope includes not only products intended for children but also other products in the environment that present hazards to children.
Cognitive engineering and decision making: Research on human cognition and decision making and the application of this knowledge to the design of systems and training programs. Emphasis is on considerations of descriptive models, processes, and characteristics of human decision making, alone or in conjunction with other individuals or intelligent systems; factors that affect decision making and cognition in naturalistic task settings; technologies for assisting, modifying, or supplementing human decision making; and training strategies for assisting or influencing decision making.
Communications: All aspects of human-to-human communication, with an emphasis on communication mediated by telecommunications technology, including multimedia and collaborative communications, information services, and interactive broadband applications. Design and evaluation of both enabling technologies and infrastructure technologies in education, medicine, business productivity, and personal quality of life.
Computer systems: Human factors in the design of computer systems. This includes the user-centered design of hardware, software, applications, documentation, work activities, and the work environment. Practitioners and researchers in the CSTG community take a holistic, systems approach to the design and evaluation of all aspects of user–computer interactions. Some goals are to ensure that computer systems are useful, usable, safe, and, where possible, fun, and to enhance the quality of work life and recreational/educational computer use by ensuring that computer interface, function, and job design are interesting and provide opportunities for personal and professional growth.
Consumer products: Development of consumer products that are useful, usable, safe, and desirable. Application of the principles and methods of human factors, consumer research, and industrial design to ensure market success.
Cybersecurity: Studying humans in the context of cyberspace, cybersecurity, and information security (InfoSec). The CYTG promotes the study and observation of how human interaction affects any facet of cybersecurity, at any level in the system, from end-users of email to military cyber defense teams. Cybersecurity human factors includes the scientific application of all human factors and cognitive as well as emotive concepts, including awareness, workload, stress, teaming, signal detection, decision-making, and attention research.
Education: Education and training of human factors and ergonomics specialists. This includes undergraduate, graduate, and continuing education needs, issues, techniques, curricula, and resources. In addition, a forum is provided to discuss and resolve issues involving professional registration and accreditation.
Environmental design: Relationship between human behavior and the designed environment. Common areas of research and interest include ergonomic and macroergonomic aspects of design within home, office, and industrial settings. An overall objective of this group is to foster and encourage the integration of ergonomics principles into the design of environments.
Forensics professional: Application of human factors knowledge and technique to “standards of care” and accountability established within legislative, regulatory, and judicial systems. The emphasis is on providing a scientific basis to issues being interpreted by legal theory.
Healthcare: Maximizing the contributions of human factors and ergonomics to medical systems effectiveness and the quality of life of people who are functionally impaired.
Human–AI–Robot Teaming (HART): Focus on studying the teamwork among humans, artificial intelligence (AI), and robots, including the design, development, implementation, and evaluation of the human–AI–robot team and team components.
Individual differences: A wide range of personality and individual difference variables that are believed to mediate performance.
Industrial ergonomics: Application of ergonomics data and principles for improving safety, productivity, and quality of work in industry. Concentration on service and manufacturing processes, operations, and environments.
Internet: Human factors aspects of user interface design of Web content, Web-based applications, Web browsers, Webtops, Web-based user assistance, and Internet devices; behavioral and sociological phenomena associated with distributed network communication; human reliability in administration and maintenance of data networks; and accessibility of Web-based products.
Macroergonomics: Organizational design and management issues in human factors and ergonomics as well as work system design and human–organization interface technology. The Technical Group is committed to improving work system performance (e.g. productivity, quality, health and safety, quality of work life) by promoting work system analysis and design practice and the supporting empirical science concerned with the technological subsystem, personnel subsystem, external environment, organizational design, and their interactions.
Perception and performance: Perception and its relation to human performance. Areas of concern include the nature, content, and quantification of sensory information and the context in which it is displayed; the physics and psychophysics of information display; perceptual and cognitive representation and interpretation of displayed information; assessment of workload using tasks having a significant perceptual component; and actions and behaviors that are consequences of information presented to the various sensory systems.
Product design: Developing consumer products that are useful, usable, safe, and desirable. By applying the principles and methods of human factors, consumer research, and industrial design, the group works to ensure the success of products sold in the marketplace.
Safety: Development and application of human factors technology as it relates to safety in all settings and attendant populations. These include, but are not limited to, aviation, transportation, industry, military, office, public building, recreation, and home environments.
System development: Fostering research and exchanging information on the integration of human factors and ergonomics into the development of systems. Members are concerned with defining human factors/ergonomics activities and integrating them into the system development process in order to enable systems that meet user requirements. Specific topics of interest include the system development process itself; developing tools and methods for predicting and assessing human capabilities and limitations, notably modeling and simulation; creating principles that identify the role of humans in the use, operation, maintenance, and control of systems; applying human factors and ergonomics data and principles to the design of human–system interfaces; the full integration of human requirements into system and product design through the application of HSI methods to ensure technical and programmatic integration of human considerations into systems acquisition and product development processes; and the impact of increasing computerization and stress and workload effects on performance.
Surface transportation: Human factors related to the international surface transportation field. Surface transportation encompasses numerous mechanisms for conveying humans and resources: passenger, commercial, and military vehicles, on- and off-road; mass transit; maritime transportation, including vessel traffic services (VTSs); rail transit; pedestrian and bicycle traffic; and highway and infrastructure systems, including intelligent transportation systems (ITSs).
Test and evaluation: All aspects of human factors and ergonomics as applied to the evaluation of systems. Evaluation is a core skill for all human factors professionals and includes measuring performance, workload, situational awareness, safety, and acceptance of personnel engaged in operating and maintaining systems. Evaluation is conducted during system development, when prototype equipment and systems are being introduced to operational usage, and at intervals thereafter during the operational life of these systems.
Training: Fosters information and interchange among people interested in the fields of training and training research.
Virtual environment: Human factors issues associated with human–virtual environment interaction. These issues include maximizing human performance efficiency in virtual environments, ensuring health and safety, and circumventing potential social problems through proactive assessment. For VE/VR systems to be effective and well received by their users, researchers need to focus significant efforts on addressing human factors issues.
Source: Based on Technical Groups | HFES - Human Factors and Ergonomics Society.
Figure 3 Evolution in development of human factors and ergonomics discipline. [Diagram: successive stages labeled Early developments, Phase I, Phase II, Phase III, Phase IV, and Contemporary status, each combining elements of philosophy, practice, design, and theory.] (Source: Karwowski, 2005. © 2005 Taylor & Francis.)
4 CHALLENGES OF HUMAN FACTORS DISCIPLINE
The HFE discipline has faced many challenges, some of which stem from distinguishing features that continue to affect its development and progress today (Karwowski, 2005):
• HFE experiences the continuing evolution of its philosophy, including diverse and ever-expanding human-centered design criteria (from safety to comfort, productivity, usability, or affective needs like job satisfaction or life happiness).
• HFE covers extremely diverse subject matters, similar to medicine, engineering, and psychology (see Table 1).
• HFE deals with very complex and often non-linear phenomena that are not easily understood and cannot be simplified by making indefensible assumptions about their nature.
• Historically, HFE has been developing from the “philosophy of fit” toward practice. The focus of contemporary HFE is on developing a sound theoretical basis for design and practical applications (see Figure 3).
• In the quest for immediate and useful solutions, HFE tends to bypass the need for a fundamental understanding of human–system interactions that is separate from considerations of the practical utility of that knowledge (also see Figure 5).
• HFE has enjoyed limited recognition by decision makers, the general public, and politicians of the value it can bring to global society at large, especially in the context of facilitating socio-economic development.
• HFE still has a relatively limited professional educational base.
• The potential impact of HFE is limited by the lack of human factors and ergonomics literacy among students and professionals in other disciplines, the mass media, and the public at large.
The contemporary HFE discipline is interested in the fundamental understanding of the interactions between people and their environments at the local and global levels (Thatcher, Nayak, & Waterson, 2020). Also central to HFE interests is an understanding of how human–system interactions should be designed given rapid technological progress, machine intelligence, shifting work paradigms, and environmental concerns. At the same time, HFE also falls under the category of applied research. The taxonomy of research efforts with respect to the quest for a fundamental understanding and the consideration of use, initially proposed by Stokes (1997), differentiates the main categories of research as follows: (1) pure basic research; (2) use-inspired basic research; and (3) pure applied research. Figure 5 illustrates the interpretation of these categories for HFE theory, design, and applications. Table 9 presents relevant specialties and subspecialties in HFE research as outlined by Meister (1999), who classified them into three main categories: (1) system/technology-oriented specialties; (2) process-oriented specialties; and (3) behaviorally oriented specialties. In addition, Table 10 presents a list of contemporary HFE research methods that can be used to advance knowledge discovery and its utilization in practical applications.
5 PARADIGMS OF ERGONOMICS
The paradigms for any scientific discipline include theory, abstraction, and design (Pearson & Young, 2002). Theory is the foundation of the mathematical sciences. Abstraction (modeling) is the foundation of the natural sciences, where progress is achieved by formulating hypotheses and systematically following the modeling process to verify and validate them.
Design is the basis for engineering, where progress is achieved primarily by posing problems and systematically following the design process to construct systems that solve them. In view of the above, Karwowski (2005) discussed the paradigms for the HFE discipline: (1) ergonomics theory; (2) ergonomics abstraction; and (3) ergonomics design. Ergonomics theory is concerned with the ability to identify, describe, and evaluate human–system interactions. Ergonomics abstraction is concerned with the ability to use those interactions to make predictions that can be compared with the real world. Ergonomics design is concerned with the ability to implement knowledge about those interactions and use it to develop systems that satisfy customer needs and relevant human compatibility requirements. Furthermore, the pillars of any scientific discipline include a definition, a teaching paradigm, and an educational base (National Research Council, 2001). A definition of the ergonomics discipline and profession adopted by the IEA (2003) emphasizes fundamental questions and significant accomplishments, recognizing that the HFE field is constantly changing. A teaching paradigm for ergonomics should conform to established scientific standards, emphasize the development of competence in the field, and integrate theory, experimentation, design, and practice. Finally, an introductory course sequence in ergonomics should be based on the curriculum model and the disciplinary description.
Figure 4 Human–system compatibility approach to ergonomics. [Diagram: the human operator (perceptual, motor, cognitive, affective, and other processes; human capabilities and limitations), technology (products, devices, machines, processes, computer-based systems; technological requirements and affordances), the environmental system (business processes, organizational structure, work systems, work-related stressors; environmental requirements and affordances), and other systems, linked by matching of compatibility relationships that shape human performance, positive and negative outcomes, quality of working life, and system (enterprise) performance.] (Source: Karwowski, 2005. © 2005 Taylor & Francis.)
Figure 5 Considerations of fundamental understanding and use in ergonomics research. (Source: Karwowski, 2005. © 2005 Taylor & Francis.)
Quest for fundamental understanding: yes; considerations of use: no. Pure basic research (symvatology, theoretical ergonomics).
Quest for fundamental understanding: yes; considerations of use: yes. Use-inspired basic research (ergonomics design).
Quest for fundamental understanding: no; considerations of use: yes. Pure applied research (applied ergonomics).
Table 9 Specialties and Subspecialties in Human Factors and Ergonomics Research
System/Technology-Oriented Specialties
1. Aerospace: civilian and military aviation and outer space activities.
2. Automotive: automobiles, buses, railroads, transportation functions (e.g. highway design, traffic signs, ships).
3. Communication: telephone, telegraph, radio, direct personnel communication in a technological context.
4. Computers: anything associated with the hardware and software of computers.
5. Consumer products: other than computers and automobiles, any commercial product sold to the general public (e.g. pens, watches, TV).
6. Displays: equipment used to present information to operators (e.g. HMD, HUD, meters, scales).
7. Environmental factors/design: the environment in which human–machine system functions are performed (e.g. offices, noise, lighting).
8. Special environment: this turns out to be underwater.
Process-Oriented Specialties
The emphasis is on how human functions are performed and methods of improving or analyzing that performance:
1. Biomechanics: human physical strength as it is manifested in such activities as lifting, pulling, and so on.
2. Industrial ergonomics (IE): papers related primarily to manufacturing processes and resultant problems (e.g. carpal tunnel syndrome).
3. Methodology/measurement: papers that emphasize ways of answering HFE questions or solving HFE problems.
4. Safety: closely related to IE but with a major emphasis on analysis and prevention of accidents.
5. System design/development: papers related to the processes of analyzing, creating, and developing systems.
6. Training: papers describing how personnel are taught to perform functions/tasks in the human–machine system.
Behaviorally Oriented Specialties
1. Aging: the effect of this process on technological performance.
2. Human functions: emphasizes perceptual-motor and cognitive functions. The latter differs from training in the sense that training also involves cognition but is the process of implementing cognitive capabilities. (The HFE specialty called cognitive ergonomics/decision making has been categorized under this heading.)
3. Visual performance: how people see. This differs from displays in that the latter relates to equipment for seeing, whereas the former deals with the human capability and function of seeing.
Source: Meister, 1999. © 1999 Taylor & Francis.
Table 10 Contemporary Human Factors and Ergonomics Research Methods
Physical Methods
PLIBEL: method assigned for identification of ergonomic hazards; musculoskeletal discomfort surveys used at NIOSH; Dutch musculoskeletal questionnaire (DMQ); quick exposure checklist (QEC) for assessment of workplace risks for work-related musculoskeletal disorders (WMSDs); rapid upper limb assessment (RULA); rapid entire body assessment; strain index; posture checklist using personal digital assistant (PDA) technology; scaling experiences during work: perceived exertion and difficulty; muscle fatigue assessment: functional job analysis technique; psychophysical tables: lifting, lowering, pushing, pulling, and carrying; lumbar motion monitor; occupational repetitive-action (OCRA) methods: OCRA index and OCRA checklist; assessment of exposure to manual patient handling in hospital wards: MAPO index (movement and assistance of hospital patients).
Psychophysiological Methods
Electrodermal measurement; electromyography (EMG); estimating mental effort using heart rate and heart rate variability; ambulatory EEG methods and sleepiness; assessing brain function and mental chronometry with event-related potentials (ERPs); EMG and functional magnetic resonance imaging (fMRI); ambulatory assessment of blood pressure to evaluate workload; monitoring alertness by eyelid closure; measurement of respiration in applied human factors and ergonomics research.
Behavioral and Cognitive Methods
Observation; heuristics; applying interviews to usability assessment; verbal protocol analysis; repertory grid for product evaluation; focus groups; hierarchical task analysis (HTA); allocation of functions; critical decision method; applied cognitive work analysis (ACWA); systematic human error reduction and prediction approach (SHERPA); predictive human error analysis (PHEA); hierarchical task analysis; mental workload; multiple resource time sharing; critical path analysis for multimodal activity; situation awareness measurement and situation awareness; keystroke level model (KLM); GOMS; link analysis; global assessment technique.
Team Methods
Team training; distributed simulation training for teams; synthetic task environments for teams: CERTT’s UAV-STE; event-based approach to training (EBAT); team building; measuring team knowledge; team communications analysis; questionnaires for distributed assessment of team mutual awareness; team decision requirement exercise: making team decision requirements explicit; targeted acceptable responses to generated events or tasks (TARGETs); behavioral observation scales (BOS); team situation assessment training for adaptive coordination; team task analysis; team workload; social network analysis.
Environmental Methods
Thermal conditions measurement; cold stress indices; heat stress indices; thermal comfort indices; indoor air quality: chemical exposures; indoor air quality: biological/particulate-phase contaminant exposure assessment methods; olfactometry: human nose as detection instrument; context and foundation of lighting practice; photometric characterization of luminous environment; evaluating office lighting; rapid sound quality assessment of background noise; noise reaction indices and assessment; noise and human behavior; occupational vibration: concise perspective; habitability measurement in space vehicles and Earth analogs.
Macroergonomic Methods
Macroergonomic organizational questionnaire survey (MOQS); interview method; focus groups; laboratory experiment; field study and field experiment; participatory ergonomics (PE); cognitive walk-through method (CWM); Kansei Engineering; HITOP analysis™; TOP-Modeler©; CIMOP System©; anthropotechnology; systems analysis tool (SAT); macroergonomic analysis of structure (MAS); macroergonomic analysis and design (MEAD).
Source: Based on Stanton et al. 2004.
6 ERGONOMICS COMPETENCY AND LITERACY
As pointed out by the National Academy of Engineering (Pearson & Young, 2002), many consumer products and services promise to make people’s lives easier, more enjoyable, more efficient, or healthier but very often do not deliver on this promise. The design of interactions with technological artifacts and work systems requires the involvement of ergonomically competent people: people with ergonomics proficiency in a certain area, although not generally in other areas of application, as is also the case in medicine or engineering. One of the critical issues in this context is the ability of users to understand the utility and limitations of technological artifacts. Ergonomics literacy prepares individuals to perform their roles in the workplace and outside the working environment. Ergonomically literate people can learn enough about how technological systems operate to protect themselves by making informed choices and by making use of the beneficial affordances of artifacts and the environment. People trained in ergonomics typically possess a high level of knowledge and skill related to one or more specific areas of ergonomics application. Ergonomics literacy is a prerequisite to ergonomics competency. The following can be proposed as dimensions of ergonomics literacy:
1. Ergonomics knowledge and skills. An individual has basic knowledge of the philosophy of human-centered design and the principles for accommodating human limitations.
2. Ways of thinking and acting. An individual seeks information about the benefits and risks of artifacts and systems (consumer products, services, etc.) and participates in decisions about the purchase, use, and/or development of artifacts/systems.
3. Practical ergonomics capabilities. An individual can identify and solve simple task-related design problems at work or home and can apply basic concepts of ergonomics to make informed judgments about the usability of artifacts and the related risks and benefits of their use.
Table 11 presents a list of 10 standards for ergonomics literacy which were proposed by Karwowski (2003) in parallel to a model of technological literacy reported by the NAE (Pearson & Young, 2002). Eight of these standards are related to developing an understanding of the nature, scope, attributes, and role of the HFE discipline in modern society, while two of them refer to the need to develop the abilities to apply the ergonomics design process and evaluate the impact of artifacts on human safety and well-being.
Table 11 Standards for Ergonomics Literacy: Ergonomics and Technology
An understanding of:
Standard 1: characteristics and scope of ergonomics
Standard 2: core concepts of ergonomics
Standard 3: connections between ergonomics and other fields of study and relationships among technology, environment, industry, and society
Standard 4: cultural, social, economic, and political effects of ergonomics
Standard 5: role of society in the development and use of technology
Standard 6: effects of technology on the environment
Standard 7: attributes of ergonomics design
Standard 8: role of ergonomics research, development, invention, and experimentation
Abilities to:
Standard 9: apply the ergonomics design process
Standard 10: assess the impact of products and systems on human health, well-being, system performance, and safety
Source: Karwowski, 2003. © 2007 Human Factors and Ergonomics Society.
7 ERGONOMICS DESIGN
Ergonomics is a design-oriented discipline (Helander, 1994). However, as discussed, ergonomists do not design systems; rather, HFE professionals design the interactions between artifact systems and humans in the context of the specific environment (Karwowski, 2005). One of the fundamental problems in such design is that there are typically multiple functional system–human compatibility requirements that must be satisfied at the same time. To address this issue, structured design methods for complex human–artifact systems are needed. From this perspective, ergonomics design can be defined in general as a mapping from human capabilities and limitations to system (technology–environment) requirements and affordances (Figure 6), or, more specifically, from system–human compatibility needs to relevant human–system interactions. Suh (1990, 2001) proposed a framework for axiomatic design that uses four different domains reflecting the mapping between the identified needs (“what one wants to achieve”) and the ways to achieve them (“how to satisfy the stated needs”). These domains include (1) customer requirements (customer needs or desired attributes); (2) the functional domain (functional requirements and constraints); (3) the physical domain (physical design parameters); and (4) the process domain (processes and resources).
Karwowski (2003) adapted these domains for ergonomics design purposes, as illustrated in Figure 7, using the concepts of compatibility requirements and compatibility mappings between four domains:
(1) HFE requirements, meaning goals in terms of human needs and system performance;
(2) functional requirements and constraints, expressed in terms of human capabilities and limitations;
(3) the physical domain, in terms of the design of compatibility, expressed through human–system interactions and specific work system design solutions;
(4) the process domain, defined as the management of compatibility (see Figure 8).
HUMAN FACTORS FUNCTION
Figure 6 Ergonomics design process: compatibility mapping from human capabilities and limitations (functional requirements) to technology–environment requirements and affordances (design parameters). Equivalently, the mapping runs from system–human compatibility needs to technology–environment compatibility requirements and affordances. (Source: Karwowski, 2005. © 2005 Taylor & Francis.)
7.1 Axiomatic Design: Design Axioms
The axiomatic design process is described as a mapping from functional requirements (FRs) to design parameters (DPs). The relationship between the two vectors FR and DP is
$$\{FR\} = [A]\{DP\}$$
where [A] is the design matrix that characterizes the product design. For three functional requirements (FRs) and three design parameters (DPs), the design matrix is
$$[A] = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix}$$
The following two design axioms, proposed by Suh (1991), are the basis of the formal design methodology: (1) the independence axiom and (2) the information axiom.
7.1.1 Axiom 1: The Independence Axiom
This axiom stipulates that the FRs be independent. The FRs are defined as the minimum set of independent requirements that characterize the design goals (defined by DPs).
7.1.2 Axiom 2: The Information Axiom
This axiom stipulates minimizing the information content of the design. Among the designs that satisfy the independence axiom, the one with the smallest information content is the best design. The information content $I_i$ for a given functional requirement $FR_i$ is defined in terms of the probability $P_i$ of satisfying $FR_i$:
$$I_i = \log_2 \frac{1}{P_i} = -\log_2 P_i \quad \text{bits}$$
The information content is additive when many functional requirements must be satisfied simultaneously. If the FRs are probabilistically independent, then $I_{sys} = \sum_i I_i$. In the general case of $m$ FRs, the information content of the entire system is
$$I_{sys} = -\log_2 C_{\{m\}}$$
where $C_{\{m\}}$ is the joint probability that all $m$ FRs are satisfied.
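The information axiom can be made concrete with a short calculation. The sketch below is a minimal example and assumes the FRs are probabilistically independent, so that the joint probability $C_{\{m\}}$ is the product of the individual $P_i$. The function names and the probabilities are hypothetical and used only for illustration.

```python
import math


def information_content(p: float) -> float:
    """I_i = -log2(P_i) in bits, where p is the probability of satisfying FR_i."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log2(p)


def system_information_content(probs: list[float]) -> float:
    """I_sys = -log2(C_{m}). Assumption: the FRs are independent, so the joint
    probability C_{m} is the product of the individual probabilities."""
    joint = math.prod(probs)
    return information_content(joint)


# Hypothetical probabilities of satisfying three FRs.
probs = [0.5, 0.25, 0.125]
per_fr = [information_content(p) for p in probs]
print(per_fr)                             # [1.0, 2.0, 3.0]
print(system_information_content(probs))  # 6.0, equal to sum(per_fr)
```

Under the independence assumption, the system value equals the sum of the per-requirement values. This additivity is the reason the logarithmic measure is used.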
THE DISCIPLINE OF HUMAN FACTORS AND ERGONOMICS
Figure 7 Four domains of design in ergonomics, linked by compatibility mappings:
the customer domain (CA): usability, productivity, performance, human well-being, safety and health;
the functional domain (FR): human capabilities and limitations (physical, cognitive, affective, organizational);
the physical domain (DP): design of compatibility, covering workplace, products, and work system design solutions;
the process domain (PV): management of compatibility, covering process design and management.
(Source: Karwowski, 2003.)
Figure 8 Axiomatic approach to ergonomics design. Elements shown: science of design, technology of design, ergonomics science, methodologies for ergonomics design, development of principles of work design, practical workplace improvements, axiomatic design, and optimal design of products and systems. (Source: Karwowski, 2003.)
The above axioms can be adapted for ergonomics design purposes as follows.
7.1.3 Axiom 1: The Independence Axiom
This axiom stipulates that the functional compatibility requirements (FCRs) be independent. The FCRs are defined as the minimum set of independent compatibility requirements that characterize the design goals (defined by ergonomics design parameters, EDPs).
7.1.4 Axiom 2: The Human Incompatibility Axiom
This axiom stipulates minimizing the incompatibility content of the design. Among the designs that satisfy the independence axiom, the one with the smallest incompatibility content is the best design. As discussed by Karwowski (2001, 2003), this axiom can be interpreted for ergonomics design as follows. The human incompatibility content $I_i$ of the design for a given functional requirement $FR_i$ is defined in terms of the compatibility index $C_i$ for satisfying $FR_i$:
$$I_i = \log_2 \frac{1}{C_i} = -\log_2 C_i \quad \text{ints}$$
where $I$ denotes the incompatibility content of a design.
7.2 Theory of Axiomatic Design in Ergonomics
The need to remove system–human incompatibility (or ergonomics entropy) plays a central role in ergonomics design (Karwowski, 1991, 2001, 2003). In this view, the second axiom of axiomatic design can be adapted for ergonomics theory as follows. The incompatibility content $I_i$ of the design for a given functional compatibility requirement $FCR_i$ is defined in terms of the compatibility index $C_i$ that satisfies this $FCR_i$:
$$I_i = \log_2 \frac{1}{C_i} = -\log_2 C_i \quad [\text{ints}]$$
Here $I$ denotes the incompatibility content of a design. The compatibility index $C_i$ ($0 < C_i \le 1$) is defined according to the specific design goals, that is, the applicable ergonomics design criterion used for system design or evaluation. To minimize system–human incompatibility, one can take either of two approaches:
(1) minimize exposure to the negative (undesirable) influence of a given design parameter on system–human compatibility; or
(2) maximize the positive influence (adaptability) of a desirable design parameter on system–human compatibility.
The first design scenario, minimizing exposure to the negative influence of a given design parameter $A_i$, typically occurs when $A_i$ exceeds some maximum exposure value $R_i$. An example is when the compressive force on the human spine (lumbosacral joint) due to manual lifting of loads exceeds the accepted (maximum) reference value. If $A_i < R_i$, then $C_i$ can be set to 1, and the incompatibility due to the considered design variable is zero.
The second design scenario, maximizing the positive influence (adaptability) of a desirable feature (design parameter $A_i$) on system–human compatibility, typically occurs when $A_i$ falls below some desired or required value $R_i$ (i.e., a minimum reference value). An example is when the range of chair height adjustability is smaller than the recommended (reference) range needed to accommodate 90% of the mixed (male/female) population.
If $A_i > R_i$, then $C_i$ can be set to 1, and the incompatibility due to the considered design variable is zero. In both cases, the human–system incompatibility content can be assessed as follows.
1. Ergonomics design criterion: minimize exposure when $A_i > R_i$. The compatibility index $C_i$ is defined by the ratio $R_i/A_i$, where $R_i$ is the maximum exposure (standard) for design parameter $i$ and $A_i$ is the actual value of design parameter $i$:
$$C_i = \frac{R_i}{A_i}, \qquad I_i = -\log_2 C_i = -\log_2 \frac{R_i}{A_i} = \log_2 \frac{A_i}{R_i} \quad \text{ints}$$
If $A_i < R_i$, then $C_i$ can be set to 1, and the incompatibility content $I_i$ is zero.
2. Ergonomics design criterion: maximize adaptability when $A_i < R_i$. The compatibility index $C_i$ is defined by the ratio $A_i/R_i$, where $A_i$ is the actual value of design parameter $i$ and $R_i$ is the desired, required, or ideal reference value (standard) for design parameter $i$:
$$C_i = \frac{A_i}{R_i}, \qquad I_i = -\log_2 C_i = -\log_2 \frac{A_i}{R_i} = \log_2 \frac{R_i}{A_i} \quad \text{ints}$$
If $A_i > R_i$, then $C_i$ can be set to 1, and the incompatibility content $I_i$ is zero.
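The two criteria can be sketched in code as a minimal example. Each function returns $I_i$ in ints and applies the rule that $C_i$ is set to 1 (so $I_i = 0$) once the reference value is met. The function names and the numeric values are hypothetical. Summing ints across design parameters is also an assumption, made by analogy with the additivity of bits.

```python
import math


def incompatibility_minimize_exposure(actual: float, max_ref: float) -> float:
    """Criterion 1: C_i = R_i / A_i, so I_i = log2(A_i / R_i) ints.
    If A_i does not exceed R_i, C_i is set to 1 and I_i = 0."""
    if actual <= 0 or max_ref <= 0:
        raise ValueError("design parameter values must be positive")
    if actual <= max_ref:
        return 0.0
    return math.log2(actual / max_ref)


def incompatibility_maximize_adaptability(actual: float, min_ref: float) -> float:
    """Criterion 2: C_i = A_i / R_i, so I_i = log2(R_i / A_i) ints.
    If A_i reaches or exceeds R_i, C_i is set to 1 and I_i = 0."""
    if actual <= 0 or min_ref <= 0:
        raise ValueError("design parameter values must be positive")
    if actual >= min_ref:
        return 0.0
    return math.log2(min_ref / actual)


# Hypothetical values, for illustration only.
spine = incompatibility_minimize_exposure(actual=6800.0, max_ref=3400.0)  # newtons
chair = incompatibility_maximize_adaptability(actual=5.0, min_ref=20.0)   # cm of range
print(spine, chair, spine + chair)  # 1.0 2.0 3.0
```

Because the measure is logarithmic, doubling a spinal load above its limit adds one int. Providing only a quarter of the required adjustment range adds two ints.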
As discussed by Karwowski (2005), the proposed unit of measurement for system–human incompatibility (the int) is parallel to, and numerically identical with, the unit of information (the bit). The information content of the design is expressed as the (ergonomics) incompatibility of design parameters with the optimal, ideal, or desired reference values. These reference values are stated as ergonomics design parameters, such as:
the range of table height or chair height adjustability;
the maximum acceptable load of lift;
the maximum compression on the spine;
the optimal number of choices;
the maximum number of hand repetitions per cycle time on a production line;
the minimum required decision time;
the maximum heat load exposure per unit of time.
The general relationships between the technology of design and the science of design are illustrated in Figure 8, and Figure 9 depicts these relationships for the HFE discipline. In the context of axiomatic design in ergonomics, the functional requirements are the human–system compatibility requirements, and the design parameters are the human–system interactions. Therefore, ergonomics design can be defined as a mapping from the human–system compatibility requirements to the human–system interactions. More generally, HFE can be defined as the science of design, testing, evaluation, and management of human–system interactions according to the human–system compatibility requirements.
7.3 Axiomatic Design Approach in Ergonomics: Applications
Helander (1994, 1995) was the first to conceptualize the second design axiom in ergonomics. He did so by considering the selection of a chair based on the information content of specific chair design parameters. Karwowski (2003) introduced the concept of system incompatibility measurement and the measure of incompatibility for ergonomics design and evaluation.
Karwowski (2003) also illustrated an application of the first design axiom, adapted to the needs of ergonomics design. The example is the rear-light system that signals brake application in a passenger car, illustrated in Figure 10. In this highway safety example, the FRs and DPs of the rear-lighting (braking display) system were defined as follows:
FR1 = Provide early warning to maximize lead response time (MLRT) (information that the car in front is applying its brakes)
FR2 = Assure safe braking (ASB)
The traditional (old) design solution is based on two DPs:
DP1 = Two rear brake lights on the sides (TRLS)
DP2 = Efficient braking mechanism (EBM)
The design matrix of the traditional rear-lighting system (TRLS) is
$$\begin{Bmatrix} FR_1 \ (\text{MLRT}) \\ FR_2 \ (\text{ASB}) \end{Bmatrix} = \begin{bmatrix} X & 0 \\ X & X \end{bmatrix} \begin{Bmatrix} DP_1 \ (\text{TRLS}) \\ DP_2 \ (\text{EBM}) \end{Bmatrix}$$
This rear-lighting warning system (old solution) can be classified as a decoupled design, and it is not an optimal design. The reason is that an efficient braking mechanism cannot compensate for the lack of time in the driver's response when the car in front brakes because of a sudden traffic slowdown. In other words, this rear-lighting system does not provide the early warning that would allow drivers to maximize their lead response time (MLRT) to braking.
Figure 9 Science, technology, and design in ergonomics. Elements shown: science of ergonomics, theoretical basis of ergonomics, technology of ergonomics, practice of ergonomics, axiomatic design in ergonomics, and human-compatible products and systems. (Source: Karwowski, 2003.)
Figure 10 Illustration of the redesigned rear-light system of an automobile: the traditional side lights plus an additional center light.
The solution that was implemented two decades ago uses a new concept for the rear lighting of the braking system (NRLS). The new design adds a third brake light, positioned in the center. Its height lets a driver see it on the car two vehicles ahead, through the windows of the car immediately in front (Figure 10). This new design solution has two DPs:
DP1 = A new rear-lighting system (NRLS)
DP2 = Efficient braking mechanism (EBM) (the same as before)
Formally, the new solution is an uncoupled design. The design matrix for this new design is
$$\begin{Bmatrix} FR_1 \ (\text{MLRT}) \\ FR_2 \ (\text{ASB}) \end{Bmatrix} = \begin{bmatrix} X & 0 \\ 0 & X \end{bmatrix} \begin{Bmatrix} DP_1 \ (\text{NRLS}) \\ DP_2 \ (\text{EBM}) \end{Bmatrix}$$
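The uncoupled/decoupled labels used for the two rear-light designs follow the usual axiomatic design convention:
a diagonal design matrix is uncoupled;
a triangular matrix is decoupled;
any other matrix is coupled.
The sketch below checks this mechanically as a minimal example. `True` marks an "X" entry. The function name is hypothetical. The sketch only inspects the matrix in the given row and column order and does not search for a reordering that would make it triangular.

```python
from typing import Sequence


def classify_design(matrix: Sequence[Sequence[bool]]) -> str:
    """Classify a square design matrix from {FR} = [A]{DP}.

    True stands for an 'X' entry (the DP affects the FR), False for '0'.
    Rows are FRs, columns are DPs, in the order given (no reordering is tried).
    """
    n = len(matrix)
    off_diagonal = [(i, j) for i in range(n) for j in range(n)
                    if i != j and matrix[i][j]]
    if not off_diagonal:
        return "uncoupled"   # diagonal: each FR is set by its own DP
    if all(i > j for i, j in off_diagonal) or all(i < j for i, j in off_diagonal):
        return "decoupled"   # triangular: FRs can be satisfied in a fixed sequence
    return "coupled"         # neither diagonal nor triangular


X, ZERO = True, False
traditional = [[X, ZERO],   # FR1 (MLRT) depends on DP1 (TRLS)
               [X, X]]      # FR2 (ASB) depends on DP1 and DP2 (EBM)
new_design = [[X, ZERO],    # FR1 (MLRT) depends on DP1 (NRLS)
              [ZERO, X]]    # FR2 (ASB) depends on DP2 (EBM) only
print(classify_design(traditional), classify_design(new_design))  # decoupled uncoupled
```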
The original (traditional) rear-lighting system (TRLS) is a decoupled design. This old design $[DP_{1,O}]$ does not provide the early warning that would maximize a driver's lead response time (MLRT) whenever braking is needed. It therefore violates the second functional requirement ($FR_2$), safe braking. The design matrix of the new system (NRLS) describes an uncoupled design that satisfies the independence of functional requirements (the independence axiom). This uncoupled design, $[DP_{1,N}]$, fulfills the requirement of maximizing lead response time (MLRT) whenever braking is needed and does not violate $FR_2$ (the safe braking requirement).
8 THEORETICAL ERGONOMICS: SYMVATOLOGY
System–human interactions often represent complex phenomena with dynamic compatibility requirements. They are often nonlinear and can be unstable (chaotic), and modeling them requires a specialized approach. Karwowski (2001) indicated a need for symvatology as a corroborative science that can help build solid foundations for ergonomics science. This proposed subdiscipline is the science of artifact–human (system) compatibility. Symvatology aims to discover the laws of artifact–human compatibility, to propose theories of artifact–human compatibility, and to develop a quantitative matrix for measuring such compatibility. Karwowski (2001) coined the term symvatology by joining two Greek words: symvatotis (compatibility) and logos (logic, or reasoning about). Symvatology is the systematic study of the interaction processes that define, transform, and control compatibility relationships between artifacts (systems) and people. This study includes theory, analysis, design, implementation, and application. An artifact system is defined as the set of all artifacts (objects made by human work) and the natural elements of the environment, together with their interactions occurring in the time and space afforded by nature. A human system is defined as the human (or humans) with all the characteristics (physical, perceptual, cognitive, emotional, etc.) that are relevant to an interaction with the artifact system. To optimize both human and system well-being and performance, system–human compatibility should be considered at all levels, including the physical, perceptual, cognitive, emotional, social, organizational, managerial, environmental, and political. This requires a way to measure the inputs and outputs that characterize the set of system–human interactions (Karwowski, 1991). The goal of quantifying artifact–human compatibility can be realized only if we understand its nature. Symvatology aims to observe, identify, describe, and empirically investigate the natural phenomena of artifact–human compatibility, and to produce theoretical explanations of them. As such, symvatology should help advance the ergonomics discipline. It does so by providing a methodology both for design for compatibility and for design of compatibility between artificial systems (technology) and humans.
From this perspective, the goal of ergonomics should be to optimize both human and system well-being and their mutually dependent performance. As pointed out by Hancock (1997), it is not enough to assure the well-being of the human. One must also optimize the well-being of the system (i.e., the artifact-based technology and nature) to make the proper uses of life. Because of the nature of these interactions, an artifact system is often a dynamic system with a high level of complexity that exhibits nonlinear behavior. The American Heritage Dictionary of the English Language (Morris, 1978) defines "complex" as consisting of interconnected or interwoven parts. Karwowski et al. (1988) proposed representing the artifact–human system (S) as a construct containing:
the human subsystem (H);
an artifact subsystem (A);
an environmental subsystem (E);
a set of interactions (I) occurring between different elements of these subsystems over time (t).
In this framework, compatibility is a dynamic, natural phenomenon. It is affected by the artifact–human system structure, its inherent complexity, and its entropy, that is, the level of incompatibility between the system's elements. Since the structure of system interactions (I) determines the complexity and related compatibility relationships in a given system, compatibility should be considered in relation to the system's complexity. The system space, denoted here as an ordered set of (complexity, compatibility) pairs, is defined by four pairs: (high, high), (high, low), (low, high), and (low, low).
In the best scenario, that is, the optimal state of system design, the artifact–human system exhibits high compatibility and low complexity. A transition from a high to a low level of system complexity does not necessarily lead to an improved (higher) level of system compatibility. Also, in most artifact–human systems, improved (higher) system compatibility can often be achieved only at the expense of increased system complexity. The lack of compatibility, or ergonomics incompatibility (EI), is defined as the degradation (disintegration) of the artifact–human system. It is reflected in the system's measurable inefficiency and associated human losses. To express the innate relationship between a system's complexity and compatibility, Karwowski et al. (1988) and Karwowski (1991) proposed the complexity–incompatibility principle, which can be stated as follows: As the (artifact–human) system complexity increases, the incompatibility between the system elements, as expressed through their ergonomic interactions at all system levels, also increases, leading to greater ergonomic (nonreducible) entropy of the system and decreasing the potential for effective ergonomic intervention. Karwowski (1995) illustrated this principle using the design of an office chair as an example (see Figure 11). Karwowski (1992a) also discussed the complexity–compatibility paradigm in the context of organizational design. This principle reflects natural phenomena that others in the field have described as the difficulties humans encounter when interacting with consumer products and technology in general. For example, according to Norman (1988), the paradox of technology is that added functionality in an artifact typically comes at the cost of increased complexity. This added complexity often increases human difficulty and frustration when interacting with such artifacts. One reason is that technology with more features also has less feedback. Moreover, Norman noted that added complexity cannot be avoided when functions are added. It can only be minimized through good design that follows natural mappings between system elements (i.e., control–display compatibility).
Following Ashby’s (1964) law of requisite variety, Karwowski (1995) proposed the corresponding law, called the “law of requisite complexity,” which states that only design complexity can reduce system complexity. The above means that only the added complexity of the regulator (R = re/design),
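Figure 11 Complexity–incompatibility principle illustrated with the design of a simple chair [1] and a complex chair [2]. Elements shown:
system entropy E(S1), E(S2) plotted against complexity;
E(H1), E(H2);
design (regulator) entropy E(R1), E(R2), representing ergonomic intervention (compatibility requirements);
the relation E(S) ≥ E(H) − E(R).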
(>), where A > B means that A is preferred to B;
(=), where A = B means that A and B are equivalent;
(≥), where A ≥ B means that B is not preferred to A.
B. Transitivity of preference. If $A_1 \ge A_2$ and $A_2 \ge A_3$, then $A_1 \ge A_3$.
C. Quantification of judgment. The relative likelihood of each possible consequence that might result from an alternative action can be specified.
D. Comparison of alternatives. If two alternatives yield the same consequences, the alternative yielding the greater chance of the preferred consequence is preferred.
E. Substitution. If $A_1 > A_2 > A_3$, the decision maker will be willing to accept a gamble $[p(A_1) \text{ and } (1 - p)(A_3)]$ as a substitute for $A_2$ for some value of $p \ge 0$.
F. Sure-thing principle. If $A_1 \ge A_2$, then for all $p$, the gamble $[p(A_1) \text{ and } (1 - p)(A_3)] \ge [p(A_2) \text{ and } (1 - p)(A_3)]$.
DECISION-MAKING MODELS, DECISION SUPPORT, AND PROBLEM SOLVING
2.1.2 Dominance
Dominance is perhaps the most fundamental normative decision rule. Dominance occurs between two alternative actions, $A_i$ and $A_j$, when two conditions hold: $A_i$ is at least as good as $A_j$ for all events $E$, and $A_i$ is preferred to $A_j$ for at least one event $E_k$. For example, one investment might yield a better return than another regardless of whether the stock market goes up or down. Dominance can also be described for the case where the consequences are multidimensional. In that case, for all events $E$, the $k$th consequence associated with action $i$ ($C_{ik}$) and action $j$ ($C_{jk}$) must satisfy $C_{ik} \ge C_{jk}$ for all $k$, and $C_{ik} > C_{jk}$ for at least one consequence. For example, a physician choosing between alternative treatments has an easy decision if one treatment is both cheaper and more effective for all patients. Dominance is clearly a normative decision rule, since a dominated alternative can never be better than the alternative that dominates it. Dominance is also conceptually simple, but it can be difficult to detect when there are many alternatives or many possible consequences to consider. The use of tests for dominance by decision makers in naturalistic settings is discussed further in Section 2.3.
2.1.3 Lexicographic Ordering and EBA
The lexicographic ordering principle (see Fishburn, 1974) considers the case where alternatives have multiple consequences. For example, a purchasing decision might be based on both the cost and the performance of the product under consideration. The different consequences are first ordered by importance. Returning to the above example, performance might be considered more important than cost. The decision maker then compares the alternatives sequentially, beginning with the most important consequence. If one alternative is better than the others on the first consequence, it is immediately selected. If no alternative is best on the first dimension, the alternatives are compared on the next most important consequence. This process continues until an alternative is selected or all the consequences have been considered without a choice being made. The latter situation can happen only if the alternatives have the same consequences. The elimination by aspects (EBA) rule (Tversky, 1972) is similar to the lexicographic decision rule. It differs in that the consequences used to compare the alternatives are selected in random order, where the probability of selecting a consequence dimension is proportional to its importance. Both EBA and lexicographic ordering are noncompensatory decision rules, since the decision is made using a single consequence dimension.
2.1.4 Minimum Aspiration Level and Satisficing
The minimum aspiration level, or satisficing, decision rule assumes that the decision maker screens the alternative actions sequentially until an action is found that is good enough. For example, a person considering the purchase of a car might stop looking once he or she finds an attractive deal, instead of comparing every model on the market. More formally, the comparison of alternatives stops once a choice is found that exceeds a minimum aspiration level $S_{ik}$ for each of its consequences $C_{ik}$ over the possible events $E_k$.
Satisficing can be a normative decision rule in three situations:
(1) the expected benefit of exceeding the aspiration level is small;
(2) the cost of evaluating alternatives is high;
(3) the cost of finding new alternatives is high.
More often, however, it is viewed as an alternative to maximizing decision rules. From this view, people cope with incomplete or uncertain information and their limited rationality by satisficing instead of optimizing in many settings (Simon, 1955, 1983).
2.1.5 Minimax (Cost and Regret) and the Value of Information
The minimax cost decision rule selects the best alternative ($A_i$) by first identifying the worst possible outcome for each alternative. The worst outcomes are then compared across alternatives, and the alternative with the minimum worst-case cost is selected. Formally, the preferred action $A_i$ is the action for which, over the events $k$, $\max_k(C_{ik}) = \min_i[\max_k(C_{ik})]$. Minimax cost corresponds to assuming the worst. It therefore makes sense as a strategy when an adverse opponent is able to control the events (von Neumann & Morgenstern, 1947). Along these lines, an airline executive considering whether to reduce fares might assume that a competitor will also cut prices, leading to a no-win situation. Minimax regret is a similar decision rule, but the calculations are performed using regret instead of cost (Savage, 1954). Regret is calculated by first identifying which alternative is best for each possible event. The regret $R_{ik}$ associated with each consequence $C_{ik}$, for the combination of event $E_k$ and alternative $A_i$, then becomes
$$R_{ik} = \max_i(C_{ik}) - C_{ik} \qquad (1)$$
The preferred action $A_i$ is the action that, over all events $k$, has the minimum maximum regret. That is,
$$\max_k(R_{ik}) = \min_i[\max_k(R_{ik})] \qquad (2)$$
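The rules in Sections 2.1.2 to 2.1.5 can be sketched as small functions over a table of alternatives. This is a minimal sketch, not a reference implementation. Higher numbers are assumed to be better everywhere except in `minimax_cost`. Regret follows Eq. (1) with $C_{ik}$ read as a payoff. The lexicographic function assumes the usual reading in which only the alternatives tied for best carry over to the next dimension. A single aspiration level per consequence dimension is a simplification of $S_{ik}$. All function names and numbers are hypothetical.

```python
from typing import Optional, Sequence

# Rows are alternatives A_i. Columns are events E_k (minimax rules) or
# consequence dimensions (dominance, lexicographic, satisficing).
Matrix = Sequence[Sequence[float]]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    pairs = list(zip(a, b))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def lexicographic(alts: Matrix, order: Sequence[int]) -> Optional[int]:
    """Compare on dimensions in order of importance, keeping only alternatives tied
    for best; return the first uniquely best alternative, else None."""
    candidates = list(range(len(alts)))
    for dim in order:
        top = max(alts[i][dim] for i in candidates)
        candidates = [i for i in candidates if alts[i][dim] == top]
        if len(candidates) == 1:
            return candidates[0]
    return None


def satisfice(alts: Matrix, aspiration: Sequence[float]) -> Optional[int]:
    """Screen alternatives in order; return the first whose every consequence
    exceeds its minimum aspiration level, else None."""
    for i, row in enumerate(alts):
        if all(c > s for c, s in zip(row, aspiration)):
            return i
    return None


def minimax_cost(costs: Matrix) -> int:
    """Alternative whose worst-case (largest) cost over the events is smallest."""
    worst = [max(row) for row in costs]
    return worst.index(min(worst))


def minimax_regret(payoffs: Matrix) -> int:
    """Eq. (1): R_ik = max_i(C_ik) - C_ik; Eq. (2): smallest maximum regret."""
    n_events = len(payoffs[0])
    best = [max(row[k] for row in payoffs) for k in range(n_events)]
    max_regret = [max(best[k] - row[k] for k in range(n_events)) for row in payoffs]
    return max_regret.index(min(max_regret))


payoffs = [[3, 8], [5, 5], [2, 4]]  # hypothetical returns: market down / market up
print(dominates(payoffs[1], payoffs[2]), minimax_regret(payoffs))  # True 0
```

In the example, the maximum regrets are 2, 3, and 4, so minimax regret picks the first alternative. The third alternative is dominated by both of the others.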
Note that the minimax cost and minimax regret principles do not always suggest the same choice. Minimax cost is easily interpreted as a conservative strategy. Minimax regret is more difficult to judge from an objective or normative perspective (Savage, 1954). As pointed out by Bell (1982), regret can explain human preferences that cost alone cannot explain. Regret is also closely related to the value of information. With hindsight, decision makers may regret their choice if they did not select the alternative giving the best result for the event ($E_k$) that actually took place. With perfect information, the decision maker would always choose the alternative $A_i$ that gives the best outcome given $E_k$. Consequently, the regret $R_{ik}$ associated with having chosen alternative $A_i$ measures the value of having perfect information, that is, of knowing ahead of time that event $E_k$ would occur. When each event $E_k$ occurs with probability $P_k$, it becomes possible to calculate the expected value of both perfect and imperfect information (Raiffa, 1968). The value of imperfect (or sample) information provides a normative rule for deciding whether to collect additional information. For example, a decision to perform a survey before introducing a product can be made by comparing the cost of the survey with the expected value of the information obtained. It is often assumed that decision makers are biased when they fail to seek out additional information. The above discussion shows that not obtaining information is justified when the information costs too much. From a practical perspective, the value of information can guide decisions about providing information to product users (Lehto & Papastavrou, 1991).
2.1.6 Maximum Expected Value
From elementary probability theory, expected return is maximized by selecting the alternative with the greatest expected value.
The expected value of an action $A_i$ is calculated by weighting the decision maker's preference $V(C_{ik})$ for its consequences $C_{ik}$ over all events $k$ by the probability $P_{ik}$ that the event will occur. The expected value of a given action $A_i$ is therefore
$$EV[A_i] = \sum_k P_{ik} V(C_{ik}) \qquad (3)$$
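Equation (3) and the link between regret and the value of perfect information can be illustrated together. The sketch below is a minimal example. It assumes the event probabilities $P_k$ do not depend on the action chosen, which simplifies $P_{ik}$ in Eq. (3). It computes the expected value of perfect information (EVPI) as the expected best outcome with foresight minus the best expected value without it. The function names and numbers are hypothetical.

```python
from typing import Sequence


def expected_values(values: Sequence[Sequence[float]],
                    probs: Sequence[float]) -> list[float]:
    """EV[A_i] = sum_k P_k * V(C_ik) (Eq. 3), assuming P_k is the same for every A_i."""
    return [sum(p * v for p, v in zip(probs, row)) for row in values]


def expected_value_of_perfect_information(values: Sequence[Sequence[float]],
                                          probs: Sequence[float]) -> float:
    """EVPI = E_k[max_i V(C_ik)] - max_i EV[A_i], i.e., what knowing E_k in advance
    is worth; it equals the expected regret of the best expected-value action."""
    n_events = len(probs)
    with_info = sum(probs[k] * max(row[k] for row in values) for k in range(n_events))
    return with_info - max(expected_values(values, probs))


values = [[3, 8], [5, 5]]  # hypothetical V(C_ik): rows = actions, columns = events
probs = [0.5, 0.5]
print(expected_values(values, probs))                        # [5.5, 5.0]
print(expected_value_of_perfect_information(values, probs))  # 1.0
```

The best expected-value action here has regrets of 2 and 0, so its expected regret is 1.0, matching the EVPI. Under these assumptions, collecting information that costs more than this amount would not pay off.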
HUMAN FACTORS FUNDAMENTALS
Monetary value is a common value function. For example, lives lost, units sold, or air quality might all be converted into monetary values. More generally, however, value reflects preference, as illustrated by ordinary concepts such as the value of money or the attractiveness of a work setting. When the decision maker has large resources and repeated opportunities to make the choice, choices based on expected monetary value are intuitively justifiable. A large company might make nearly all of its decisions on the basis of expected monetary value. Insurance buying and many other rational forms of behavior cannot, however, be justified on the basis of expected monetary value. It has long been recognized that rational decision makers make choices not easily explained by expected monetary value (Bernoulli, 1738). Bernoulli cited the St. Petersburg paradox. In this lottery, the prize is $2^n$, where $n$ is the number of times a flipped coin turns up heads before a tail is observed. The probability of $n$ heads before the first tail is observed is $0.5^n$. The expected value of this lottery becomes
$$EV[L] = \sum_k P_{ik} V(C_{ik}) = \sum_{n=0}^{\infty} 0.5^n \, 2^n \to \infty \qquad (4)$$
Figure 2 Utility functions for differing risk attitudes. Panels: risk averse, risk neutral, risk seeking, and mixed risk aversion; each plots utility u(x) against x.
The interesting twist is that the expected value of the lottery above is infinite. Bernoulli concluded that preference cannot be a linear function of monetary value, since a rational decision maker would never pay more than a finite amount to play the lottery. Furthermore, the value of the lottery can vary between decision makers. According to utility theory, this variability, described in terms of utility, reflects rational differences in preference between decision makers for uncertain consequences.
2.1.7 Subjective Expected Utility (SEU) Theory
Expected utility theory extended expected value theory to better describe how people make uncertain economic choices (von Neumann & Morgenstern, 1947). In this approach, monetary values are first transformed into utilities using a utility function $u(x)$. The utility of each outcome is then weighted by its probability of occurrence to obtain an expected utility. SEU theory added the notion that uncertainty about outcomes could be represented with subjective probabilities (Savage, 1954). It was postulated that these subjective estimates could be combined with evidence using Bayes' rule to infer the probabilities of outcomes.⁷ This group of assumptions corresponds to the Bayesian approach to statistics. Following this approach, the SEU of an alternative $A_i$, given subjective probabilities $S_{ik}$ and consequences $C_{ik}$ over events $E_k$, becomes
$$SEU[A_i] = \sum_k S_{ik} U(C_{ik}) \qquad (5)$$
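Equation (5) can be evaluated directly once a utility function is chosen. The sketch below is a minimal example. It compares a hypothetical 50/50 gamble on $0 or $100 with a sure $50, first under a linear utility and then under a square-root utility. The square root is an assumed example of a concave function, not one proposed in the text. With the linear utility, SEU reduces to the expected value of Eq. (3), and the two options tie. With the concave utility, the sure amount is preferred.

```python
import math
from typing import Callable, Sequence


def seu(outcomes: Sequence[float], subjective_probs: Sequence[float],
        utility: Callable[[float], float]) -> float:
    """SEU[A_i] = sum_k S_ik * U(C_ik) (Eq. 5)."""
    if not math.isclose(sum(subjective_probs), 1.0):
        raise ValueError("subjective probabilities should sum to 1")
    return sum(s * utility(c) for s, c in zip(subjective_probs, outcomes))


def linear(x: float) -> float:
    """Linear utility: SEU then equals expected value."""
    return x


# Hypothetical alternatives: a 50/50 gamble on $0 or $100, and a sure $50.
gamble = ([0.0, 100.0], [0.5, 0.5])
sure = ([50.0], [1.0])

print(seu(*gamble, linear), seu(*sure, linear))        # 50.0 50.0
print(seu(*gamble, math.sqrt), seu(*sure, math.sqrt))  # 5.0 and roughly 7.07
```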
Note the similarity between formulation (5) for SEU and equation (3) for expected value. EV and SEU are equivalent if the value function equals the utility function and the subjective probabilities equal the event probabilities. Methods for eliciting value and utility functions differ in nature (Section 4.1). Preferences elicited for uncertain outcomes measure utility.⁸ Preferences elicited for certain outcomes measure value. Accordingly, it has often been assumed that value functions differ from utility functions. However, there are reasons to treat value and utility functions as equivalent (von Winterfeldt & Edwards, 1986). These authors claim that the differences between elicited value and utility functions are small and that "severe limitations constrain those relationships, and only a few possibilities exist, one of which is that they are the same."
When people are presented with choices that have uncertain outcomes, they react in different ways. In some situations, people find gambling pleasurable. In others, people will pay money to reduce uncertainty, for example, when they buy insurance. SEU theory distinguishes between risk-neutral, risk-averse, risk-seeking, and mixed forms of behavior. These types of behavior are described by the shape of the utility function (Figure 2):
for a risk-neutral decision maker, expected u(gamble) = u(gamble's expected value);
for a risk-averse decision maker, expected u(gamble) < u(gamble's expected value);
for a risk-seeking decision maker, expected u(gamble) > u(gamble's expected value).
At any given point of a utility function, attitudes toward risk are described formally by the coefficient of risk aversion:
$$CRA = \frac{u''(x)}{u'(x)} \qquad (6)$$
where u′(x) and u″(x) are, respectively, the first and second derivatives of u(x) taken with respect to x. Note that when u(x) is a linear function of x [i.e., u(x) = ax + b], CRA = 0. At any point of the utility function, if CRA < 0, the utility function depicts risk-averse behavior, and if CRA > 0, it depicts risk-seeking behavior. The coefficient of risk aversion therefore describes attitudes toward risk at each point of the utility function, provided that the utility function is twice differentiable. SEU theory consequently provides a powerful tool for describing how people might react to uncertain or risky outcomes. However, some commonly observed preferences between risky alternatives cannot be explained by SEU. Section 2.2.2 focuses on experimental findings showing deviations from the predictions of SEU. A major contribution of SEU is that it represents differing attitudes toward risk and provides a normative model of decision making under uncertainty. The prescriptions of SEU are also clear and testable. Consequently, SEU has played a major role in fields other than economics, both as a tool for improving human decision making and as a stepping stone for developing models that describe how people make decisions when outcomes are uncertain. As discussed further in Section 2.2, much of this work has been done in psychology.

2.1.8 Multiattribute Utility (MAUT) Theory

Multiattribute utility theory (MAUT) (Keeney & Raiffa, 1976) extends SEU to the case where the decision maker has multiple objectives. The approach is equally applicable for describing
DECISION-MAKING MODELS, DECISION SUPPORT, AND PROBLEM SOLVING
utility and value functions. Following this approach, the utility (or value) of an alternative A with multiple attributes x is described with the multiattribute utility (or value) function $u(x_1, \ldots, x_n)$, where $u(x_1, \ldots, x_n)$ is some function $f(x_1, \ldots, x_n)$ of the attributes x. In the simplest case, MAUT describes the utility of an alternative as an additive function of the single-attribute utility functions $u_i(x_i)$. That is,

$$u(x_1, \ldots, x_n) = \sum_{i=1}^{n} k_i\,u_i(x_i) \tag{7}$$

where the constants $k_i$ are used to weight each single-attribute utility function $u_i$ in terms of its importance.9 Assuming that an alternative has three attributes, x, y, and z, an additive utility function is $u(x, y, z) = k_x u_x(x) + k_y u_y(y) + k_z u_z(z)$. Along these lines, a community considering building a bridge across a river versus building a tunnel or continuing to use the existing ferry system might consider the attractiveness of each option in terms of the attributes of economic benefits, social benefits, and environmental benefits. More complex multiattribute utility functions include multiplicative forms and functions that combine utility functions for subsets of two or more attributes (Keeney & Raiffa, 1976). An example of a simple multiplicative function would be $u(x, y) = u_x(x)\,u_y(y)$. A function that combines utility functions for subsets would be $u(x, y, z) = k_{xy} u_{xy}(x, y) + k_z u_z(z)$. The latter type of function becomes useful when utility independence is violated. Utility independence is violated when the utility function for one attribute depends on the value of another attribute. For instance, when assessing $u_{xy}(x, y)$, it might be found that $u_x(x)$ depends on the value of y. People’s reaction to the level of crime in their own neighborhood might, for example, depend on the level of crime in a nearby suburb. In this case, it is probably better to measure $u_{xy}$ directly (with x the crime level in one’s own neighborhood and y the crime level in a nearby suburb) than to estimate it from the single-attribute functions. Assessment of utility and value functions is discussed in Section 4.1. MAUT has been applied to a wide variety of problems (Clemen, 1996; Keeney & Raiffa, 1976; Saaty, 1990; von Winterfeldt & Edwards, 1986; Wallenius et al., 2008). An advantage of MAUT is that it helps structure complex decisions in a meaningful way. Alternative choices and their attributes often naturally divide into hierarchies.
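A minimal Python sketch of the additive form in equation (7), applied to the community’s choice among a bridge, a tunnel, and the existing ferry. All weights and single-attribute utilities are invented for illustration, since the text gives no numbers; the utilities are assumed to be already scaled to [0, 1] and the weights to sum to 1.

```python
from typing import Dict, List, Tuple

# Hypothetical importance weights k_i (chosen to sum to 1).
WEIGHTS: Dict[str, float] = {"economic": 0.5, "social": 0.3, "environmental": 0.2}

# Hypothetical single-attribute utilities u_i(x_i), already scaled to [0, 1].
OPTIONS: Dict[str, Dict[str, float]] = {
    "bridge": {"economic": 0.9, "social": 0.7, "environmental": 0.3},
    "tunnel": {"economic": 0.6, "social": 0.8, "environmental": 0.6},
    "ferry": {"economic": 0.4, "social": 0.5, "environmental": 0.9},
}


def additive_utility(utilities: Dict[str, float], weights: Dict[str, float]) -> float:
    """Equation (7): u(x_1, ..., x_n) = sum over i of k_i * u_i(x_i)."""
    return sum(k * utilities[attr] for attr, k in weights.items())


def rank(options: Dict[str, Dict[str, float]],
         weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (option, overall utility) pairs, best first."""
    scored = {name: additive_utility(u, weights) for name, u in options.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank(OPTIONS, WEIGHTS):
        print(f"{name:7s} {score:.2f}")
```

With these hypothetical numbers the bridge scores 0.72, the tunnel 0.66, and the ferry 0.53. Because the additive form is compensatory, moving enough weight to the environmental attribute (for example, weights of 0.2, 0.2, and 0.6) puts the ferry first.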
The MAUT approach encourages such divide-and-conquer strategies and, especially in its additive form, provides a straightforward means of recombining weights into a final ranking of alternatives. The MAUT approach is also a compensatory strategy that allows normative trade-offs between attributes in terms of their importance.

2.2 Behavioral Economics

As a normative ideal, classical decision theory has influenced the study of decision making in a major way. This began with the emergence of the field of behavioral decision theory in the 1950s. Much of the earlier work in behavioral decision theory compared human behavior to the prescriptions of classical decision theory (Edwards, 1954; Einhorn & Hogarth, 1981; Slovic et al., 1977). Numerous departures were found, including the influential finding that people use heuristics during judgment tasks (Tversky & Kahneman, 1974). On the basis of such research, psychologists have concluded that other approaches are needed to describe the process of human decision making. More recently, this research has become the foundation of a new field commonly referred to as behavioral economics.10 Behavioral economics tries to explore the underlying reasons behind irrational behavior in human decision making based on principles from psychology and economics. It is a subdiscipline of economics that emerged as a reaction against neoclassical economics, whose main tenet is a strong reliance on the rationality of human decision makers (Camerer & Loewenstein, 2004; Simon, 1987). One of the psychological foundations of behavioral economics is the concept of bounded rationality proposed by Herbert Simon, which states that human rationality is bounded by limitations of available information, thinking capacity, and the time available for decision making (Simon, 1955, 1982). In the middle of the twentieth century, it was shown that many rationality axioms underlying decision models are often violated by actual decision makers. In addition to being limited in their information processing (Simon, 1982), people care more about fairness than about self-interest (ultimatum game) (Thaler, 1988); people weight risky outcomes in a nonlinear fashion (prospect theory) (Kahneman & Tversky, 1979); and people value a thing more after they possess it (endowment effect) (Thaler, 1980). Such evidence led many economists to reappreciate the value of psychology for modeling and predicting decision-making behavior, which led to the growth of behavioral economics as a strong subdiscipline (Pesendorfer, 2006). However, behavioral economics is neither a totally new idea nor a drastic change in the core idea of neoclassical economics. Human psychology was already well understood by Adam Smith, who is often regarded as the father of modern economics. His books include not only The Wealth of Nations but also The Theory of Moral Sentiments, which is less known but offers profound insights about human psychology. It is interesting to see how eloquently he described loss aversion in that book: “We suffer more … when we fall from a better to a worse situation, than we ever enjoy when we rise from a worse to a better” (Smith, 1875, p. 331). In addition, it is difficult to say that behavioral economics changed the core idea of neoclassical economics.
That is, maximizing expected value is still worthwhile as a core framework to help understand many economic behaviors. Behavioral economics instead provides more explanatory capability and external validity by adding parameters to traditional economic models and theories. One of the most exciting developments is the finding that fast and frugal heuristics can perform very well even when compared to sophisticated optimization models (Gigerenzer, 2008). Behavioral economics has its critics as well. First, it has been claimed that behavioral economics has not provided a unified and elegant theory of rational agents. However, the lack of a unified model not only reflects the diversity and complexity of human decision making but also demonstrates the incompleteness of traditional economic models based on rationality (Kahneman, 2003). Second, some doubt that findings in behavioral economics have significant influence on real-life situations, such as policy making (Brannon, 2008). However, some studies (Ariely, 2008; Levitt & Dubner, 2005; Thaler & Sunstein, 2008) have demonstrated direct implications for policy making and economics. More specifically, behavioral economics has been used to understand behavior in macroeconomics, saving, labor economics, and finance. For example, many people do not mind a decrease in their real (inflation-adjusted) wage as long as their nominal wage does not decrease (Shafir et al., 1997). This behavioral pattern is called “the money illusion.” In this section, the main concepts of behavioral economics and topics related to descriptive decision-making models are covered. Behavioral economics research focuses on two main aspects of behavioral decision making: (1) judgment, that is, how people estimate probabilities; and (2) choice, that is, how people select among the available alternatives based on their judgment (Camerer et al., 2004).
We discuss some of the important concepts related to “judgment” in Section 2.2.1 and topics related to “choice” in Section 2.2.2. More comprehensive reviews of behavioral economics and behavioral finance can be found in Barberis and Thaler (2003), Camerer et al. (2004), Glaser et al. (2004), and Samson (2016, 2019).

2.2.1 Human Judgment

In this subsection, we first briefly consider human abilities and limitations for statistical estimation and inference. Attention then shifts to several heuristics that people may use to cope with their limitations and how their use can cause certain biases. We then briefly consider the role of memory and selective processing of information from a similar perspective. Next, we turn to mathematical models of human judgment that provide insight into how people judge probabilities, the biases that might occur, and how people learn to perform probability judgment tasks. In the final part, we summarize findings on debiasing human judgments.

Human Abilities and Limitations for Statistical Estimation and Inference

The ability of people to perceive, learn, and draw inferences accurately from uncertain sources of information has been a topic of much research. A good review of recent research in this area can be found in Wickens et al. (2020) and of earlier research in von Winterfeldt and Edwards (1986). Research conducted in the early 1960s tested the notion that people behave as “intuitive statisticians” who gather evidence and apply it in accordance with the Bayesian model of inference (Peterson & Beach, 1967). Much of the earlier work focused on how good people are at estimating statistical parameters such as means, variances, and proportions. Other studies have compared human inferences obtained from probabilistic evidence to the prescriptions of Bayes’ rule. This research shows that people can be fairly good at estimating means, variances, or proportions from sample data. Sedlmeier et al. (1998, p. 754) point out that “there seems to be broad agreement with” the conclusion of Jonides and Jones (1992) that people can give answers that reflect the actual relative frequencies of many kinds of events with great fidelity.
However, as discussed by von Winterfeldt and Edwards (1986), subjective probability estimates, like other psychophysical measures, are noisy. Their accuracy will depend on how carefully they are elicited and on many other factors. Studies have shown that people are especially likely to have trouble accurately estimating the probability of unlikely events, such as nuclear plant explosions. For example, when people were asked to estimate the risk associated with the use of consumer products (Dorris & Tabrizi, 1978; Rethans, 1980) or various technologies (Lichtenstein et al., 1978), the estimates obtained were often only weakly related to accident data. Weather forecasters are one of the few groups of people who have been documented as being able to estimate high and low probabilities accurately (Winkler & Murphy, 1973). Similarly, experienced physicians have been observed to provide fairly accurate estimates of the diagnosis time for different patients (Wu, Lehto, Yih, Saleem, & Doebbeling, 2007). See Table 2 for findings related to human abilities and limitations for statistical estimation and inference, and Table 3 in the next subsection for related heuristics and biases. Part of the issue is that when events occur rarely, people will not be able to base their judgments on a representative sample of their own observations. Most of the information they receive about unlikely events will come from secondary sources, such as media reports, rather than from their own experience. This tendency might explain why risk estimates are often related more strongly to factors other than likelihood, such as catastrophic potential or familiarity (Lehto et al., 1994; Lichtenstein et al., 1978; Slovic, 1978, 1987). Media reporting focuses on “newsworthy” events, which tend to be more catastrophic and unfamiliar. Consequently, judgments based on media reports
HUMAN FACTORS FUNDAMENTALS
might reflect the latter factors instead of likelihood. Weber (1994) provides additional evidence that subjective probabilities are related to factors other than likelihood and argues that people will overestimate the chance of a highly positive outcome because of their desire to obtain it. Weber also argues that people will overestimate the chance of a highly undesirable outcome because of their fear of receiving it. Traditional methods of decision analysis separately elicit and then recombine subjective probabilities with utilities, as discussed earlier, and assume that subjective probabilities are independent of consequences. A finding of dependency therefore casts serious doubt on the normative validity of this commonly accepted approach.

When studies of human inference are considered, several other trends become apparent (Table 2). In particular, several significant deviations from the Bayesian model have been found:
1. Decision makers tend to be conservative in that they do not give as much weight to probabilistic evidence as does Bayes’ rule (Edwards, 1968).
2. Decision makers do not consider base rates or prior probabilities adequately (Tversky & Kahneman, 1974).
3. Decision makers tend to ignore the reliability of the evidence (Tversky & Kahneman, 1974).
4. Decision makers tend to overestimate the probability of conjunctive events and underestimate the probability of disjunctive events (Bar-Hillel, 1973).
5. Decision makers tend to seek out confirming evidence rather than disconfirming evidence and place more emphasis on confirming evidence when it is available (Baron, 1985; Einhorn & Hogarth, 1978). The order in which the evidence is presented also influences human judgments (Hogarth & Einhorn, 1992).
6. Decision makers are overconfident in their predictions (Fischhoff et al., 1977), especially in hindsight (Christensen-Szalanski & Willham, 1991; Fischhoff, 1982).
7. Decision makers show a tendency to infer illusionary causal relations (Tversky & Kahneman, 1973).

A lively literature has developed regarding these deviations and their significance (Caverni et al., 1990; Doherty, 2003; Evans, 1989; Klein et al., 1993; Wickens, 1992).11 From one perspective, these deviations demonstrate inadequacies of human reason and are a source of societal problems (Baron, 1998; and many others). From the opposite perspective, it has been held that the foregoing findings are more or less experimental artifacts that do not reflect the true complexity of the world (Cohen, 1993). A compelling argument for the latter point of view is given by Simon (1955, 1983). From this perspective, people do not use Bayes’ rule to compute probabilities in their natural environments because it makes unrealistic assumptions about what is known or knowable. Simply put, the limitations of the human mind and time constraints make it nearly impossible for people to use principles such as Bayes’ rule to make inferences in their natural environments. To compensate for their limitations, people use simple heuristics or decision rules that are adapted to particular environments. The use of such strategies does not mean that people will be unable to make accurate inferences, as emphasized both by Simon and by researchers embracing the ecological12 (e.g., Gigerenzer, Todd, & the ABC Research Group, 1999; Hammond, 1996) and naturalistic (e.g., Klein et al., 1993) models of decision making. In fact, as discussed further in this section, the use of simple heuristics in rich environments can lead to inferences that are in many cases more accurate than those made using Naïve Bayes or linear regression (Gigerenzer et al., 1999).

Table 2 Sample Findings on the Ability of People to Estimate and Infer Statistical Quantities
Finding | Reference
Statistical estimation: estimation of mean and variance
Accurate estimation of sample means | Peterson & Beach (1967)
Sample size bias in estimation of means | Smith & Price (2010); Allik, Toom, Raidvee, Averin, & Kreegipuu (2013); Smith, Rule, & Price (2017)
Variance estimates correlated with mean | Lathrop (1967)
Variance biases not found | Levin (1975)
Variance estimates are based on range; people tend to focus more on outlier data points | Pitz (1980)
Underestimation of variability | Wickens et al. (2020)
Statistical estimation: estimation of event frequencies and probabilities
Accurate estimation of event frequency | Estes (1976); Hasher & Zacks (1984); Jonides & Jones (1992); Kane & Woehr (2006)
Accurate estimates of sample proportions between 0.25 and 0.75 | Edwards (1954)
Severe overestimates of high probabilities; severe underestimates of low proportions | Fischhoff et al. (1977); Lichtenstein et al. (1982)
Reluctance to report extreme events | Du Charme (1970)
Weather forecasters provided accurate probabilities | Winkler & Murphy (1973)
Poor estimates of expected severity | Dorris & Tabrizi (1978)
Correlation of 0.72 between subjective and objective measures of injury frequency | Rethans (1980)
Risk estimates lower for self than for others | Weinstein (1980, 1987); Prater, Kirytopoulos, & Ma (2017)
Risk estimates related to catastrophic potential, degree of control, familiarity | Lichtenstein et al. (1978)
Evaluations of outcomes and probabilities are dependent | Weber (1994)
Overweighting of the probability of rare events in decisions from description; underweighting in decisions from experience | Hertwig et al. (2004); Barron & Ursino (2013)
Statistical inference
Conservative aggregation of evidence | Edwards (1968)
Nearly optimal aggregation of evidence in naturalistic setting | Lehto et al. (2000)
Failure to consider base rates | Tversky & Kahneman (1974)
Base rates considered | Birnbaum & Mellers (1983)
Increasing the salience of base-rate information increases its consideration | Obrecht & Chesney (2018)
Overestimation of conjunctive events; underestimation of disjunctive events; subjects considered variability of data when judging probabilities; people insensitive to information missing from fault trees; overconfidence in estimates; illusionary correlations | Bar-Hillel (1973); Brockner, Paruchuri, Idson, & Higgins (2002); Costello (2009); Kahneman & Tversky (1973); Evans & Pollard (1985); Fischhoff et al. (1978); Fischhoff (1982)
Decision makers are insufficiently sensitive to the relevance of the information | Hsee, Yang, & Li (2019)
Gambler’s fallacy: irrational belief that events frequent in the past will be infrequent in the future | Tversky & Kahneman (1974); Barron & Leider (2010)
Subjects with a greater tendency to gamble had a higher tendency to perceive illusionary correlations | Wilke, Scheibehenne, Gaissmaier, McCanney, & Barrett (2014)
Misestimation of covariance between items | Arkes (1981)
Misinterpretation of regression to the mean | Tversky & Kahneman (1974); Morton & Torgerson (2003)

There is an emerging body of literature that shows, on the one hand, that deviations from Bayes’ rule can in fact be justified in certain cases from a normative view and, on the other hand, that these deviations may disappear when people are provided with richer information or problems in more natural contexts. For example, drivers performing a simulated passing task combined their own observations of the driving environment with imperfect information provided by a collision-warning system, as predicted by a distributed signal detection theoretic model of optimal team decision making (Lehto, Papastavrou, Ranney, & Simmons, 2000). Other researchers have pointed out that:
1. A tendency toward conservatism can be justified when evidence is not conditionally independent (Navon, 1979).
2. Subjects do use base-rate information and consider the reliability of evidence in slightly modified experimental settings (Birnbaum & Mellers, 1983; Koehler, 1996). In particular, providing subjects with natural frequencies instead of probabilities can improve performance greatly (Gigerenzer & Hoffrage, 1995; Krauss et al., 1999).
3. A tendency to seek out confirming evidence can offer practical advantages (Cohen, 1993) and may reflect cognitive failures, due to a lack of understanding of how to falsify hypotheses, rather than an entirely motivational basis (Evans, 1989; Klayman & Ha, 1987).
4. Subjects prefer stating subjective probabilities with vague verbal expressions rather than precise numerical values (Wallsten, Zwick, Kemp, & Budescu, 1993), demonstrating that they are not necessarily overconfident in their predictions.
5. There is evidence that the hindsight bias can be moderated by familiarity with both the task and the type of outcome information provided (Christensen-Szalanski & Willham, 1991).

Based on such results, numerous researchers have questioned the practical relevance of the large literature showing different types of biases. One reason that this literature may be misleading is that researchers overreport findings of bias (Cohen, 1993; Evans, 1989). A more significant concern is that studies showing bias are almost always conducted in artificial settings where people are provided information about an unfamiliar topic. Furthermore, the information is often given in a form that forces use of Bayes’ rule or another form of abstract reasoning to get the correct answer.
For example, consider the simple case where a person is asked to predict how likely it is that a woman has breast cancer, given a positive mammogram (Martignon & Krauss, 2003). In the typical study looking for bias, the subject might be told to assume that (1) the probability that a 40-year-old woman has breast cancer is 1%; (2) the probability of a positive mammogram given that a woman has cancer is 0.9; and (3) the probability of a positive mammogram given that a woman does not have cancer is 0.1. Although the correct answer can easily be calculated using Bayes’ rule, it is not at all surprising that people unfamiliar with probability theory have difficulty determining it. In the real world, it seems much more likely that a person would simply keep track of how many women with a positive mammogram actually had breast cancer. The probability that a woman has breast cancer, given a positive mammogram, is then determined by dividing the number of women with a positive mammogram who actually had breast cancer by the total number of women with a positive mammogram. The latter calculation gives essentially the same answer as Bayes’ rule and is much easier to do. The implications of the example above are obvious: First, people can duplicate the predictions of Bayes’ rule by keeping track of the right relative frequencies. Second, if the right relative frequencies are known, accurate inferences can be made using very simple decision rules. Third, people will have trouble making accurate inferences if they do not know the right relative frequencies. Recent studies and reevaluations of older studies provide additional perspective. The finding that subjects are much better at integrating information when they are provided data in the form of natural frequencies instead of probabilities (Gigerenzer & Hoffrage, 1995; Krauss et al., 1999) is particularly interesting.
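The equivalence between Bayes’ rule and simple counting in the mammogram example can be checked directly. The Python sketch below uses the three probabilities given in the example and computes the posterior once with Bayes’ rule and once by counting a hypothetical cohort of 1,000 women; the cohort size and function names are illustrative choices, and exact fractions are used so the two answers can be compared without rounding error.

```python
from fractions import Fraction


def posterior_bayes(prior, p_pos_if_cancer, p_pos_if_healthy):
    """P(cancer | positive mammogram) from Bayes' rule."""
    p_pos = prior * p_pos_if_cancer + (1 - prior) * p_pos_if_healthy
    return prior * p_pos_if_cancer / p_pos


def posterior_by_counting(n_women, prior, p_pos_if_cancer, p_pos_if_healthy):
    """Same question answered with natural frequencies: count the women
    with a positive mammogram and see what share of them have cancer."""
    with_cancer = n_women * prior
    true_positives = with_cancer * p_pos_if_cancer
    false_positives = (n_women - with_cancer) * p_pos_if_healthy
    share = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, share


if __name__ == "__main__":
    prior, hit, false_alarm = Fraction(1, 100), Fraction(9, 10), Fraction(1, 10)
    print(posterior_bayes(prior, hit, false_alarm))
    tp, fp, share = posterior_by_counting(1000, prior, hit, false_alarm)
    print(tp, fp, share, float(share))
```

Both routes give 1/12, or about 8.3%. In the counting version, 10 of the 1,000 women have cancer and 9 of them test positive, while 99 of the 990 healthy women also test positive, so only 9 of the 108 positive results belong to women with cancer.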
One conclusion that might be drawn from the latter work is that people are Bayesians after all if they are provided adequate information in appropriate representations (Martignon & Krauss, 2003). Other support for the proposition that people are not as bad at inference as it once seemed includes Dawes and Mulford’s (1996) review of the literature supporting the overconfidence effect or bias, in which they conclude that the methods used to measure this effect are logically flawed and that the empirical support is inadequate to conclude that it really exists. Part of the issue is that much of the psychological research on the overconfidence effect “overrepresents those situations where cue-based inferences fail” (Juslin & Olsson, 1999). When people rate objects that are selected randomly from a natural environment, overconfidence is reduced. Koehler (1996) provides a similarly compelling reexamination of the base-rate fallacy. He concludes that the literature does not support the conventional wisdom that people routinely ignore base rates. On the contrary, he states that base rates are almost always used and that their degree of use depends on task structure and representation as well as on their reliability compared to other sources of information. Because such conflicting results can be obtained depending on the setting in which human decision making is observed, researchers embracing the ecological (e.g., Gigerenzer et al., 1999; Hammond, 1996) and naturalistic (e.g., Klein et al., 1993; Klein, 1998) models of decision making strongly emphasize the need to conduct ecologically valid research in rich, realistic decision environments.

Heuristics and Biases

Tversky and Kahneman (1973, 1974) made a key contribution to the field when they showed that many of the above-mentioned discrepancies between human estimates of probability and Bayes’ rule could be explained by the use of three heuristics.
Heuristics can be defined as cognitive shortcuts or rules of thumb that simplify the decision process, often in the absence of sufficient information, time, or thinking capacity. The three heuristics they proposed were representativeness, availability, and anchoring and adjustment. The representativeness heuristic holds that the probability of an item A belonging to some category B is judged by considering how representative A is of B. For example, a person is typically judged more likely to be a librarian than a farmer when described as “a meek and tidy soul who has a desire for order and structure and a passion for detail.” Applications of this heuristic will often lead to good probability estimates but can lead to systematic biases. Tversky and Kahneman (1974) give several examples of such biases. In each case, representativeness influenced estimates more than other, more statistically oriented information. In the first study, subjects ignored base-rate information (given by the experimenter) about how likely a person was to be either a lawyer or an engineer. Their judgments seemed to be based entirely on how representative the description seemed to be of either occupation. Tversky and Kahneman (1973) found that people overestimated conjunctive probabilities in a similar experiment. Here, after being told that “Linda is 31 years old, single, outspoken, and very bright,” most subjects said it was more likely she was both a bank teller and active as a feminist than simply a bank teller, even though a conjunction of two events can never be more probable than either event alone. In a third study, most subjects felt that the probability of more than 60% male births on a given day was about the same for both large and small hospitals (Tversky & Kahneman, 1974). Apparently, the subjects felt that large and small hospitals were equally representative of the population, ignoring the fact that small samples are more likely than large ones to deviate from the population proportion.
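The hospital problem has an exact answer that a few lines of Python make concrete. The sketch assumes, for illustration, 15 births per day in the small hospital and 45 in the large one, and a probability of 0.5 that any given birth is a boy; it computes the binomial probability that more than 60% of a day’s births are boys. The function name and default arguments are hypothetical choices.

```python
from fractions import Fraction
from math import comb


def prob_male_share_above(n_births, share=Fraction(3, 5), p_male=0.5):
    """Exact binomial probability that more than `share` of n_births are boys."""
    return sum(
        comb(n_births, k) * p_male ** k * (1 - p_male) ** (n_births - k)
        for k in range(n_births + 1)
        if Fraction(k, n_births) > share  # exact comparison, no float rounding
    )


if __name__ == "__main__":
    for births_per_day in (15, 45):  # assumed sizes of the small and large hospital
        p = prob_male_share_above(births_per_day)
        print(f"{births_per_day} births/day: P(>60% boys) = {p:.3f}")
```

Under these assumptions the probability is roughly 0.15 for the small hospital and 0.07 for the large one, so the small hospital should record more than twice as many such days over a year.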
Other behaviors explained in terms of representativeness by Tversky and Kahneman (1974) included the gambler’s fallacy, insensitivity to predictability, illusions of validity, and misconceptions of statistical regression to the mean. With regard to the gambler’s fallacy, they note that people may feel that long sequences of heads or tails when flipping coins are unrepresentative of normal behavior. After a sequence of heads, a tail therefore seems more representative. Insensitivity to predictability refers to a tendency for people to predict future performance without considering the reliability of the information on which they base the prediction. For example, a person might expect an investment to be profitable solely on the basis of a favorable description or some positive characteristics without considering whether the description or characteristic has any predictive value (Chen, Kim, Nofsinger, & Rui, 2007). In other words, a good description is believed to be representative of high profits, even if it states nothing about profitability. The illusion of validity occurs when people use highly correlated evidence to draw a conclusion. Despite the fact that the evidence is redundant, the presence of many representative pieces of evidence increases confidence greatly. Misconception of regression to the mean occurs when people react to unusual events and then infer a causal linkage when the process returns to normality on its own. For example, a manager might incorrectly conclude that punishment works after seeing that unusually poor performance improves to normal levels following punishment. The same manager might also conclude that rewards do not work after seeing that unusually good performance drops after receiving a reward.

The availability heuristic holds that the probability of an event is determined by how easy it is to remember the event happening. Tversky and Kahneman state that perceived probabilities will therefore depend on familiarity, salience, effectiveness of memory search, and imaginability. The implication is that people will judge events as more likely when the events are familiar, highly salient (such as an airplane crash), or easily imaginable. Events will also be judged more likely if there is a simple way to search memory. For example, it is much easier to search for words in memory by the first letter rather than by the third letter.
It is easy to see how each item above affecting the availability of information can influence judgments. Biases should increase when people lack experience or when their experiences are too focused. The anchoring-and-adjustment heuristic holds that people start from an initial estimate and then adjust it to reach a final value. The point chosen initially has a major impact on the final value selected when adjustments are insufficient. Tversky and Kahneman (1974) refer to this source of bias as an anchoring Table 3
169
effect. They show how this effect can explain under- and overestimates of disjunctive and conjunctive events. This happens if the subject starts with a probability estimate of a single event. The probability of a single event is, of course, less than that for the disjunctive event and greater than that for the conjunctive event. If adjustment is too small, under- and overestimates occur, respectively, for the disjunctive and conjunctive events. Tversky and Kahneman also discuss how anchoring and adjustment may cause biases in subjective probability distributions. The notion of heuristics and biases has had a particularly formative influence on decision theory. A substantial body of work has emerged that focuses on applying research on heuristics and biases (Heath et al., 1994; Kahneman, Slovic, & Tversky, 1982). Applications include medical judgment and decision making, affirmative action (Education, personality assessment, legal decision making, mediation, and policy making. It seems clear that this approach is excellent for describing many general aspects of decision making in the real world. Heuristics are often believed to cause cognitive biases and irrationality, but opinion is divided on this. The “fast and frugal” decision making approach views heuristics as an ecologically rational strategy to take decisions under limited information, time, or computing capacity (Goldstein & Gigerenzer, 2002). Research on heuristics and biases has also been criticized as being pretheoretical (Slovic, Fischhoff, & Lichtenstein, 1977) and, as pointed out earlier, has contributed to overselling of the view that people are biased. The latter point is interesting, as Tversky and Kahneman (1973) have claimed all along that using these heuristics can lead to good results. However, nearly all the research conducted in this framework has focused on when they might go wrong. 
While some heuristics, such as availability and representativeness, are widely applicable, others are domain specific; in social and consumer psychology, for example, brand name, price, and scarcity serve as heuristics (Shah & Oppenheimer, 2008). Some commonly observed heuristics and biases are listed in Table 3; as the table shows, these go well beyond the heuristics and biases originally suggested by Kahneman and Tversky. Many of these biases, such as confirmation bias, the Dunning-Kruger effect, and herd behavior, have serious implications for society as a whole, especially in this connected modern era, where people use the internet to communicate and acquire information. Confirmation bias, for example, is of particular concern, as the tendency of people to seek out evidence on the web that confirms their existing beliefs leads to polarization and susceptibility to misinformation, as discussed later in this chapter.

Table 3 List of Common Heuristics and Biases in Decision Making
Name | Description | Reference
Representativeness heuristic | The probability of an item A belonging to some category B is judged by considering how representative A is of B. For example, a consumer infers higher quality of a product if its packaging is similar to that of a high-quality product. | Tversky & Kahneman (1974); Kardes et al. (2004)
Availability heuristic | People judge the likelihood of an event on the basis of how easily they can remember it. For example, consumers form an image of a store as lower priced based on how many low-priced items from the store they can recall. | Tversky & Kahneman (1974); Ofir, Raghubir, Brosh, Monroe, & Heiman (2008)
Anchoring heuristic | A priming effect in which the first exposure to an option results in treating that option as a reference point, and all subsequent options are evaluated relative to that reference. For example, judging the price of a house relative to the price of the first house that the agent showed. | Tversky & Kahneman (1974); Furnham & Boo (2011)
Affect heuristic | People make quick decisions based on their current emotion or feeling associated with the stimulus (the person, object, or activity involved) rather than considering the longer-term pros and cons of the decision. The emotion or feeling is often based on past experiences with the stimulus. Example: people in a good mood may consider only the positive aspects of a product, or may associate a positive previous experience with it. | Slovic, Finucane, Peters, & MacGregor (2002)
Cognitive dissonance | The mental discord that arises between two simultaneous and conflicting ideas, feelings, or decision options. It is often caused when a person engages in behavior that is inconsistent with their public image or with the person they want to be. People try to reduce this discord by changing their beliefs or actions. Example: smokers rationalizing smoking by believing that there is not enough medical evidence of its harmful effects. | Festinger (1957); Chapman et al. (1993)
Confirmation bias | The tendency to selectively seek out information that supports one's existing beliefs and opinions about an issue and to ignore opposing views. This can lead to suboptimal decisions in various domains, such as (1) science: people collecting evidence selectively to support their theories; (2) e-commerce: people selectively looking for positive reviews of a brand they already like; and (3) the internet and social media: people selectively following news and joining groups that align with their political leaning and other beliefs. | Einhorn & Hogarth (1978); Baron (1985); Chapman et al. (1993); Nickerson (1998); Kappes, Harvey, Lohrenz, Montague, & Sharot (2020); Knobloch-Westerwick, Mothes, & Polavin (2017)
Dunning-Kruger effect | A delusion of competence in which people overestimate their abilities and believe they have the aptitude or training to handle a situation or make a decision effectively. Kruger and Dunning (1999) observed that the difference between perceived and actual competence explained different types of suboptimal decisions. Example: people overestimating their technical competence in the workplace. | Kruger & Dunning (1999); Gibbs, Moore, Steel, & McKinnon (2017); Pennycook, Ross, Koehler, & Fugelsang (2017)
Herd behavior | The tendency to follow what a majority of other people are doing instead of using one's own independent judgment based on the available information. Examples: irrational buying or selling behavior in stock markets; e-commerce decisions based only on online reviews. | Bikhchandani, Hirshleifer, & Welch (1992); Banerjee (1992); Koch (2016); Shen, Zhang, & Zhao (2016)
Hindsight bias | Also known as the ‘knew-it-all-along effect’: being provided with new information changes the recollection of an original thought or memory into something different. Because the outcome of an event seems more predictable than it really was, this can distort judgments about the probability of the event's occurrence. It may also distort memory for judgments of factual knowledge. | Fischhoff et al. (1977); Mazzoni & Vannucci (2007); Christensen-Szalanski & Willham (1991); Harley (2007)
Optimism bias | The tendency to overestimate the probability of positive outcomes and underestimate the probability of negative events. Underlying reasons can include a heightened sense of control, good mood, etc. Examples: unrealistic project completion schedules; underestimating one's risk, relative to other people, of being in a car accident or of a loss on an investment. | Armor & Taylor (2002); Shepperd, Carroll, Grace, & Terry (2002); Prater, Kirytopoulos, & Ma (2017); Sharot (2011)
Present bias | The tendency, in a trade-off situation, to overvalue smaller payoffs in the immediate or near future relative to larger payoffs later in the future. | O’Donoghue & Rabin (1999); Frederick, Loewenstein, & O’Donoghue (2002)
Status quo bias | The preference to stay in the same state by not taking any action, or to stay committed to a previously made decision. This typically happens when the decision is important and the transition costs involved are not large. Examples include not switching to a different brand of product, resistance to changing health insurance plans, and slow adoption of new technology in the workplace. | Samuelson & Zeckhauser (1988); Dean, Kıbrıs, & Masatlioglu (2017); Li, Liu, & Liu (2016)
Take-the-first heuristic | Among the available decision alternatives, the one that is recognized fastest is preferred. This heuristic is typically used when the situation, for example time pressure, does not permit careful analysis of all the information. | Samson (2016); Gigerenzer & Gaissmaier (2011); Johnson & Raab (2003)
Take-the-best heuristic | Among the available decision alternatives, selection is based on the single attribute that most effectively discriminates among the alternatives, ignoring other attributes. Example: buying a car based only on fuel efficiency. | Gigerenzer & Gaissmaier (2011)

Memory Effects and Selective Processing of Information
The heuristics-and-biases framework has been criticized by many researchers for its failure to adequately address more fundamental cognitive processes that might explain biases (Dougherty et al., 2003). This criticism arises because the availability and representativeness heuristics can both be described in terms of more fundamental memory processes. For example, the availability heuristic proposes that the probability of an event is determined by how easy it is to remember the event happening. Ease of recall, however, depends on many things, such as what is stored in memory, how it is represented, how well it is encoded, and how well a cue item matches the memory representation. Dougherty et al. (2003) note that three aspects of memory can explain many of the findings on human judgment: (1) how information is stored or represented; (2) how information is retrieved; and (3) experience and domain knowledge. The first aspect pertains to what is actually stored when people experience events. The simplest models assume that people store a record of each instance of an experienced event and, in some cases, additional information such as the frequency of the event (Hasher & Zacks, 1984) or ecological cue validities (Brehmer & Joyce, 1988). More complex models assume that people store an abstract representation or summary of the event (Pennington & Hastie, 1988), in some cases at multiple levels of abstraction (Reyna & Brainerd, 1995). The way information is stored or represented can explain several of the observed findings on human judgment.
First, there is strong evidence that people are often excellent at storing frequency information13 and the process by which this is done is fairly automatic (Gigerenzer, Hoffrage, & Kleinbolting, 1991; Hasher & Zacks, 1984). Gigerenzer et al. conclude that with repeated experience people should also be able to store ecological cue validities. The accuracy of these stored representations would, of course, depend on how large and representative the sample of encoded observations is. Such effects can be modeled with simple adding models that might include the effects of forgetting (or memory trace degradation) or other factors, such as selective sampling or the amount of attention devoted to the information at the time it is received. As pointed out by Dougherty et al. (2003) and many others, many of the biases in human judgment follow directly from considering how well the events are encoded in memory. In particular, except for certain sensory qualities which are encoded automatically, encoding quality is assumed to depend on attention. Consequently, some biases should reflect the tendency of highly salient stimuli to capture attention. Another completely different type of bias might reflect the fact that the person was exposed to an unrepresentative sample of events. Lumping these two very different biases together, as is done by the availability heuristic, is obviously debatable. Other aspects of human memory mentioned by Dougherty et al. (2003) that can explain certain findings on human judgment include the level of abstraction of the stored representation and retrieval methods. One interesting observation is that people often find it preferable to reason with gist-based representations rather than verbatim descriptions of events (Reyna & Brainerd, 1995). When the gist does not contain adequate detail, the reasoning may lead to flawed conclusions. 
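The "simple adding models" mentioned above can be illustrated with a short sketch. This is a hypothetical model written for illustration, not one taken from Dougherty et al. (2003) or Hasher and Zacks (1984): each observed event adds an attention weight to a stored frequency trace for its category, and every trace decays by a retention factor at each time step to mimic forgetting. The names `encode_stream`, `attention`, and `retention`, and their values, are assumptions.

```python
"""Sketch of a simple adding model of frequency memory.

Hypothetical assumptions: encoding strength equals an attention weight
per category, and all stored traces decay by `retention` each time step.
"""
from collections import defaultdict


def encode_stream(events, attention, retention=0.95):
    """Return decayed, attention-weighted frequency traces per category.

    events    -- sequence of category labels, one per time step
    attention -- dict mapping category -> attention weight (default 1.0)
    retention -- fraction of each trace that survives one time step
    """
    traces = defaultdict(float)
    for label in events:
        for key in traces:  # forgetting: every existing trace fades
            traces[key] *= retention
        # Encoding quality depends on attention paid to this event.
        traces[label] += attention.get(label, 1.0)
    return dict(traces)


def judged_relative_frequency(traces):
    """Normalize traces into judged relative frequencies (sum to 1)."""
    total = sum(traces.values())
    return {k: v / total for k, v in traces.items()}


if __name__ == "__main__":
    # Equal true frequencies; "vivid" events are assumed to draw more attention.
    stream = ["vivid", "dull"] * 50
    traces = encode_stream(stream, attention={"vivid": 1.0, "dull": 0.5})
    print(judged_relative_frequency(traces))
```

Even though both categories occur equally often, the more salient category ends up with the larger judged frequency, which is one way salience at encoding could bias later frequency judgments. Setting `retention=1.0` and equal attention reduces the sketch to plain frequency counting.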
Some of the differences observed between highly skilled experts and novices might correspond to situations where experts have stored a large
number of relevant instances and their solutions in memory, whereas novices have only gist-based representations. In such situations, novices will be forced to reason using the information provided. Experts, on the other hand, might be able to solve the problem with little or no reasoning, simply by retrieving the solution from memory. The latter situation would correspond to Klein’s (1989, 1998) recognition-primed decision making. However, there is also reason to believe that people are more likely to develop abstract gist-type representations of events with experience (Reyna & Brainerd, 1995). This might explain the findings in some studies that people with less knowledge and experience sometimes outperform experts. A particularly interesting demonstration is given by Gigerenzer et al. (1999), who discuss a study where a simple recognition heuristic based on the collective recognition of the names of companies by 180 German laypeople resulted in a phenomenally high yield of 47% and outperformed the DAX 30 market index by 10%. It outperformed several mutual funds managed by professionals by an even greater margin. Biases in human judgment, which in some but not all cases are memory-related, can also be explained by models of how information is processed during task performance. Along these lines, Evans (1989) argues that factors which cause people to process information selectively or to attend to irrelevant information are the major cause of biases in human judgment. Evans’s model of selective processing of information is consistent with other explanations of biases. Among such explanations, information overload has been cited as a reason for impaired decision making by consumers (Jacoby, 1977). The tendency of highly salient stimuli to capture attention during inference tasks has also been noted by several researchers (Nisbett & Ross, 1980; Payne, 1980).
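The recognition heuristic discussed above, together with the take-the-best heuristic listed in Table 3, can be sketched as a paired-comparison procedure. This is a simplified, hypothetical implementation: recognition is checked first, and only if both options are recognized are cues searched in an assumed order of validity, stopping at the first cue that discriminates. Treating a cue as discriminating only when it is positive for one option and not for the other is a simplifying assumption, and the firm names, cue names, and validity order are invented for illustration.

```python
import random
from typing import Dict, List, Optional, Set

# Cue profile for one option: cue name -> 1 (positive), 0 (negative), None (unknown).
CueProfile = Dict[str, Optional[int]]


def choose(a: str, b: str, recognized: Set[str],
           cues: Dict[str, CueProfile], cue_order: List[str],
           rng: random.Random) -> str:
    """Infer which of two options scores higher on some criterion."""
    # Step 1 (recognition heuristic): if exactly one option is recognized, pick it.
    if (a in recognized) != (b in recognized):
        return a if a in recognized else b
    # Neither option recognized: nothing to go on, so guess.
    if a not in recognized:
        return rng.choice([a, b])
    # Step 2 (take-the-best): search cues in the assumed validity order and
    # stop at the first cue that is positive for one option but not the other.
    for cue in cue_order:
        va = cues.get(a, {}).get(cue)
        vb = cues.get(b, {}).get(cue)
        if va == 1 and vb != 1:
            return a
        if vb == 1 and va != 1:
            return b
    # No cue discriminates: guess.
    return rng.choice([a, b])


if __name__ == "__main__":
    rng = random.Random(0)
    recognized = {"FirmA", "FirmB"}  # hypothetical recognition data
    cues = {
        "FirmA": {"in_index": 1, "exports": 0},
        "FirmB": {"in_index": 1, "exports": 1},
        "FirmC": {"in_index": 0, "exports": 1},
    }
    order = ["in_index", "exports"]  # assumed validity ranking
    print(choose("FirmA", "FirmC", recognized, cues, order, rng))  # recognition decides
    print(choose("FirmA", "FirmB", recognized, cues, order, rng))  # 'exports' decides
```

The procedure is "frugal" because it stops at the first discriminating piece of information and never weighs or sums cues; this is what lets ignorance (non-recognition) carry information in the stock-picking study described above.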
Nisbett and Ross suggest that the vividness of information is determined by its emotional content, concreteness and imageability, and temporal and spatial proximity. As noted by Evans and many others, these factors have also been shown to affect the memorability of information. The conclusion is that biases due to salience can occur in at least two different ways: (1) people might focus on salient but irrelevant items while performing the task; and (2) people might draw incorrect inferences when the contents of memory are biased due to salience effects during earlier task performance. The dual-process model of human decision making has also been used to explain certain heuristics and biases. System 1 thinking is automatic, fast, and nonconscious, whereas System 2 thinking is more controlled, slow, and conscious. Many heuristics and cognitive biases seem to result from intuitions, impressions, or automatic thoughts generated by System 1 (Kahneman, 2011). Factors that make System 1 processes more dominant in decision making include cognitive busyness, distraction, time pressure, and positive mood. On the other hand, System 2 processes tend to be enhanced when the decision involves an important object or has heightened personal relevance, and when the decision maker is held accountable by others (Samson & Voyer, 2012, 2014; Strack & Deutsch, 2015).
Debiasing or Aiding Human Judgments
The notion that many biases (or deviations from normative models) in statistical estimation and inference can be explained has led researchers to consider the possibility of debiasing (a better term might be improving) human judgments (Keren, 1990). Part of the issue is that the heuristics people use often work very well. The nature of the heuristics also suggests some obvious generic strategies for improving decision making.
One conclusion that follows directly from the earlier discussion is that biases related to the availability and representativeness heuristics might be reduced if people were provided better, more representative samples of information. Other strategies that follow directly from the earlier discussion include making ecologically valid cues more salient,
providing both outcome and cognitive feedback, and helping people do analysis. These strategies can be implemented in training programs or used to guide the development of decision aids.14 An encouraging result is that some studies of naturalistic decision making support the conclusion that decision-making skills can be improved through training (Fallesen & Pounds, 2001; Pliske & Klein, 2003; Pliske, McCloskey, & Klein, 2001). The use of computer-based training to develop task-specific decision-making skills is one very interesting development (Sniezek et al., 2002). Decision-making games (Pliske et al., 2001) and cognitive simulation (Satish & Streufert, 2002) are other approaches that have been applied successfully to improve decision-making skills. Other research shows that training in statistics reduces biases in judgment (Fong, Krantz, & Nisbett, 1986); in that study, people were significantly more likely to consider sample size after training. These results supplement some of the findings discussed earlier, indicating that judgment biases can be moderated by familiarity with the task and by the type of outcome information provided. The earlier results included evidence that providing feedback on the accuracy of weather forecasts may help weather forecasters (Winkler & Murphy, 1973), and research showing that cognitive feedback about cues and their relationship to the effects inferred leads to quicker learning than does feedback about outcomes (Balzer, Doherty, & O’Connor, 1989). Other studies have shown that simply asking people to write down reasons for and against their estimates of probabilities can improve calibration and reduce overconfidence (Koriat, Lichtenstein, & Fischhoff, 1980). This, of course, supports the conclusion that judgments will be less likely to be biased if people think carefully about their answers.
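Calibration, overconfidence, and forecast-accuracy feedback can all be computed from a list of probability judgments paired with outcomes. The sketch below is an assumed, minimal scoring scheme: overconfidence is taken as mean stated confidence minus proportion correct, the calibration table reports the proportion correct at each stated confidence level, and the Brier score (mean squared difference between stated probability and outcome, a measure long used to score weather forecasts) summarizes accuracy. The function names and data are hypothetical.

```python
"""Sketch: scoring calibration and overconfidence of probability judgments.

Each judgment is (stated probability of being correct, whether it was correct).
"""
from collections import defaultdict
from typing import Dict, List, Tuple

Judgment = Tuple[float, bool]


def overconfidence(judgments: List[Judgment]) -> float:
    """Mean stated confidence minus proportion correct (> 0 = overconfident)."""
    n = len(judgments)
    mean_conf = sum(p for p, _ in judgments) / n
    hit_rate = sum(1 for _, ok in judgments if ok) / n
    return mean_conf - hit_rate


def calibration_table(judgments: List[Judgment]) -> Dict[float, float]:
    """Proportion correct at each stated confidence level, sorted by level."""
    outcomes: Dict[float, List[bool]] = defaultdict(list)
    for p, ok in judgments:
        outcomes[p].append(ok)
    return {p: sum(v) / len(v) for p, v in sorted(outcomes.items())}


def brier_score(judgments: List[Judgment]) -> float:
    """Mean squared error between stated probability and outcome (0 = perfect)."""
    return sum((p - float(ok)) ** 2 for p, ok in judgments) / len(judgments)


if __name__ == "__main__":
    # Invented data: a judge who says 90% but is right only half the time.
    data = [(0.9, True), (0.9, False), (0.9, True), (0.9, False),
            (0.7, True), (0.7, False)]
    print("overconfidence:", round(overconfidence(data), 3))
    print("calibration:", calibration_table(data))
    print("Brier score:", round(brier_score(data), 3))
```

A perfectly calibrated judge would show a calibration table whose values match its keys; feeding these scores back to a judge is one concrete form of the outcome feedback discussed above.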
Other research showed that subjects were less likely to be overconfident if they expressed subjective probabilities verbally instead of numerically (Wallsten et al., 1993; Zimmer, 1983). Conservatism, or the failure to modify probabilities adequately after obtaining evidence, was also reduced in Zimmer’s study. The results above support the conclusion that it might be possible to improve or aid human judgment. On the other hand, many biases, such as optimistic beliefs regarding health risks, have been difficult to modify (Weinstein & Klein, 1995). In confirmation bias, people show a tendency to seek out information that supports their personal views (Weinstein, 1979) and are quite resistant to information that contradicts strongly held beliefs (McGuire, 1966; Nisbett & Ross, 1980). Evans (1989) concludes that “pre-conceived notions are likely to prejudice the construction and evaluation of arguments.” Other evidence shows that experts may have difficulty providing accurate estimates of subjective probabilities even when they receive feedback. For example, many efforts to reduce both overconfidence in probability estimates and the hindsight bias have been unsuccessful (Fischhoff, 1982). One problem is that people may not pay attention to feedback (Fischhoff & MacGregor, 1982). They also may attend only to feedback that supports their hypothesis, leading to poorer performance and, at the same time, greater confidence (Einhorn & Hogarth, 1978). Several training-based efforts to reduce confirmation bias (the tendency to search for confirming rather than disconfirming evidence) have also been unsuccessful (Evans, 1989).15 The conclusion is that debiasing human judgments is difficult but not impossible. Some perspective can be obtained by considering that most studies showing biases have focused on statistical inference and have generally involved people who were not particularly knowledgeable about statistics and were not using decision aids such as computers or calculators.
It naturally may be expected that people will perform poorly on such tasks, given their lack of training and forced reliance on mental calculations (von Winterfeldt & Edwards, 1986). The finding that people can improve their abilities on such tasks after training in statistics is particularly telling
and also encouraging. Another encouraging finding is that biases are occasionally reduced when people process information verbally instead of numerically. This result might be expected given that most people are more comfortable with words than with numbers. Along with debiasing, the decision-making process can be aided by providing concise, higher-quality information, particularly in the age of the internet. For example, providing too many choices to decision makers such as e-shoppers leads to “overchoice,” or choice overload, which can reduce the quality of decisions. Choice overload can occur with either a large number of choice attributes or a large number of alternatives. The greater the number or complexity of the choices offered, the more likely the decision maker is to apply heuristics. Overchoice has been associated with behavioral outcomes such as unhappiness (Schwartz, 2004), decision fatigue, and choosing the default option, as well as choice deferral, that is, avoiding making a decision altogether, such as not buying a product (Iyengar & Lepper, 2000). Choice overload can be counteracted by simplifying choice attributes or reducing the number of available options (Johnson et al., 2012). Similarly, information overload is becoming an increasing concern in the online environment with the growing use of e-commerce and social media platforms. In e-commerce interactions, information overload (often due to poor user interface design) can lead to confusion and reduce the quality of decisions (Ocón Palma, Seeger, & Heinzl, 2020). Social media platforms, which typically increase exposure to many types of unverified information, can lead to information overload, particularly when people search for health-related conditions (Khaleel et al., 2020) or during developing crisis situations such as a natural disaster or pandemic outbreak (Bode & Vraga, 2018; Kaufhold, Rupp, Reuter, & Habdank, 2020; Matthes, Karsay, Schmuck, & Stevic, 2020).
Similar to Evans’s (1989) argument, a recent empirical study by Goette, Han, and Leung (2020) found that information overload can also contribute to confirmation bias. Approaches to reducing information overload on the web include less cluttered and better designed user interfaces, clustering information into logical groups, and presenting information in stages, e.g., a summary of information on the main page, with detail provided on demand through an “expand” icon on the website. Along with information overload, misinformation on the internet is another growing cause of concern that can lead to biases, wrong conclusions, and suboptimal decisions. Misinformation typically involves the sharing of false content intentionally put on the web, usually with one or more of these motivations: money, political influence, or simply causing trouble. With largely unverified information transmission on social media, misinformation has been used for various purposes, such as spreading political opinions or campaigns and posting negative reviews of competitors’ products (Shin, Jian, Driscoll, & Bar, 2018; Wardle, 2019). To prevent the spread of misinformation, various measures have been examined with some success, such as adding fact-checking capabilities to social media platforms, making the information source visible to users, and adding a verification score to the information (Lewandowsky, Ecker, Seifert, Schwarz, & Cook, 2012; Pennycook & Rand, 2019; Tambuscio, Oliveira, Ciampaglia, & Ruffo, 2018; Walter & Tukachinsky, 2019). One approach to helping or influencing decision makers is nudging, which means “altering people’s behavior in a predictable way without forbidding any options or significantly changing their economic incentives” (Thaler & Sunstein, 2008, p. 6).
Some examples include: (1) setting the default option, which takes effect when the user does not take any action because of status quo bias or decision fatigue; (2) framing choices, i.e., wording them differently to highlight the positive or negative
aspects of the same decision, which often changes their relative attractiveness; and (3) the decoy effect, i.e., adding options that change the relative attractiveness of previously available choices, since humans typically evaluate choices relative to each other.
2.2.2 Preference and Choice
Much of the research on human preference and choice has focused on comparing observed preferences to the predictions of SEU theory (Goldstein & Hogarth, 1997). Early work examining SEU as a descriptive theory drew generally positive conclusions. However, it soon became apparent that people’s preferences for risky or uncertain alternatives often violated basic axioms of SEU theory. The finding that people’s preferences change when the outcomes are framed in terms of costs, as opposed to benefits, has been particularly influential. Several other common deviations from SEU have been observed. One potentially serious deviation is that preferences can be influenced by sunk costs or prior commitment to a particular alternative. Preferences change over time and may depend on which alternatives are being compared or even the order in which they are compared. The regret associated with making the “wrong” choice seems to play a major role when people compare alternatives. Relatedly, the satisfaction people derive from obtaining particular outcomes after making a decision is influenced by positive and negative expectations prior to making the decision. Other research on human preference and choice has shown that people choose between and apply different decision strategies depending on the cognitive effort required to apply a decision strategy successfully, the needed level of accuracy, and time pressure. Certain strategies are more likely than others to lead to choices consistent with those prescribed by SEU theory.
Alternative models, such as prospect theory and random-utility theory, were consequently developed to explain human preferences under risk or uncertainty.16 The following discussion will first summarize some common violations of the axioms underlying SEU theory before moving on to framing effects and preference reversals. Attention will then shift to models of choice and preference, beginning with prospect theory before addressing other models of labile or conditional preferences. Decision-making strategies, and how people choose between them, are covered in later sections.
Violation of the Rationality Axioms
Several studies have shown that people’s preferences between uncertain alternatives can be inconsistent with the axioms underlying SEU theory. One fundamental violation of the assumptions is that preferences can be intransitive (Budescu & Weiss, 1987; Tversky, 1969). Also, as mentioned earlier, subjective probabilities may depend on the values of consequences (violating the independence axiom), and, as discussed in the next section, the framing of a choice can affect preference. Another violation is given by the Myers effect (Myers et al., 1965), where preference reversals between high-variance (H) and low-variance (L) gambles can occur when the gambles are compared to a certain outcome, depending on whether the certain outcome is positive (H preferred to L) or negative (L preferred to H). The latter effect violates the assumption of independence because the ordering of the two gambles depends on the certain outcome. Another commonly cited violation of SEU theory is that people show a tendency toward uncertainty avoidance, which can lead to behavior inconsistent with the “sure-thing” axiom. The Ellsberg and Allais paradoxes (Allais, 1953; Ellsberg, 1961) both involve violations of the sure-thing axiom (see Table 1) and seem to be caused by people’s desire to avoid
uncertainty. The Allais paradox is illustrated by the following set of gambles. In the first gamble, a person is asked to choose between gambles \(A_1\) and \(B_1\), where gamble \(A_1\) results in $1 million for sure, and gamble \(B_1\) results in $2.5 million with a probability of 0.1, $1 million with a probability of 0.89, and $0 with a probability of 0.01. In the second gamble, the person is asked to choose between gambles \(A_2\) and \(B_2\), where gamble \(A_2\) results in $1 million with a probability of 0.11 and $0 with a probability of 0.89, and gamble \(B_2\) results in $2.5 million with a probability of 0.1 and $0 with a probability of 0.9. Most people prefer gamble \(A_1\) to \(B_1\) and gamble \(B_2\) to \(A_2\). It is easy to see that this set of preferences violates expected utility theory. First, if \(A_1 \succ B_1\), then \(u(A_1) > u(B_1)\), meaning that
\[ u(\$1\text{ million}) > 0.1\,u(\$2.5\text{ million}) + 0.89\,u(\$1\text{ million}) + 0.01\,u(\$0). \]
If a utility of 0 is assigned to receiving $0 and a utility of 1 to receiving $2.5 million, this reduces to \(0.11\,u(\$1\text{ million}) > 0.1\), that is, \(u(\$1\text{ million}) > 10/11\). However, from the preference \(B_2 \succ A_2\), it follows that
\[ 0.1\,u(\$2.5\text{ million}) + 0.9\,u(\$0) > 0.11\,u(\$1\text{ million}) + 0.89\,u(\$0), \]
that is, \(u(\$1\text{ million}) < 10/11\). Obviously, no utility function can assign $1 million a value both greater than and less than 10/11. Equivalently, \(EU(A_1) - EU(B_1) = EU(A_2) - EU(B_2) = 0.11\,u(\$1\text{ million}) - 0.1\), because within each pair both gambles share a common 0.89 chance of the same outcome ($1 million in the first pair, $0 in the second); expected utility theory therefore requires the same direction of preference in both pairs. Savage (1954) mentioned that the set of gambles above can be reframed in a way that shows that these preferences violate the sure-thing principle. After doing so, Savage found that his initial tendency toward choosing \(A_1\) over \(B_1\) and \(B_2\) over \(A_2\) disappeared. As noted by Stevenson et al. (1993), this example is one of the first cases cited of a preference reversal caused by reframing a decision, the topic discussed below.
Framing of Decisions and Preference Reversals
A substantial body of research has shown that people’s preferences can shift dramatically depending on the way a decision is represented.
The best-known work on this topic was conducted by Tversky and Kahneman (1981), who showed that preferences between medical intervention strategies changed dramatically depending on whether the outcomes were posed as losses or gains. The following question, worded in terms of benefits, was presented to one set of subjects: Imagine that the United States is preparing for the outbreak of an unusual Asian disease, which is expected to kill 600 people. Two alternative programs to combat the disease have been proposed. Assume that the exact scientific estimate of the consequences of the programs are as follows: If program A is adopted, 200 people will be saved. If program B is adopted, there is a 1/3 probability that 600 people will be saved and a 2/3 probability that no people will be saved. Which of the two programs would you favor? The results showed that 72% of subjects preferred program A. The second set of subjects was given the same cover story but worded in terms of costs: If program C is adopted, 400 people will die. If program D is adopted, there is a 1/3 probability that nobody will die and a 2/3 probability that 600 people will die. Which of the two programs would you favor? The results now showed that 78% of subjects preferred program D. Since program D is equivalent to B and program A is equivalent to C, the preferences for the two groups of subjects were strongly reversed. Tversky and Kahneman concluded that this
reversal illustrated a common pattern in which choices involving gains are risk averse and choices involving losses are risk seeking. The interesting result was that the way the outcomes were worded caused a shift in preference for identical alternatives. Tversky and Kahneman called this tendency the reflection effect. A body of literature has since developed showing that the framing of decisions can have practical effects for both individual decision makers (Heath et al., 1994; Kahneman et al., 1982) and group decisions (Paese, Bieser, & Tubbs, 1993). On the other hand, recent research shows that the reflection effect can be reversed by certain outcome wordings (Kuhberger, 1995); more importantly, Kuhberger provides evidence that the reflection effect observed in the classic experiments can be eliminated by fully describing the outcomes (i.e., referring to the paragraph above, a more complete description would state: “If program C is adopted, 400 people will die and 200 will live”). Other recent research has explored the theory that perceived risk and perceived attractiveness of risky outcomes are psychologically distinct constructs (Weber et al., 1992). In the latter study, it was concluded that perceived risk and attractiveness are “closely related but distinct phenomena.” Related research has shown weak negative correlations between the perceived risk and value of indulging in alcohol-related behavior for adolescent subjects (Lehto et al., 1994). The latter study also showed that the rated propensity to indulge in alcohol-related behavior was strongly correlated with perceived value (R = 0.8) but weakly correlated with perceived risk (R = –0.15). Both findings are consistent with the theory that perceived risk and attractiveness are distinct constructs, but the latter finding indicates that perceived attractiveness may be the better predictor of behavior. Lehto et al.
conclude that intervention methods attempting to lower preferences for alcohol-related behavior should focus on lowering perceived value rather than on increasing perceived risk. Prospect Theory Prospect theory (Kahneman & Tversky, 1979) attempts to account for behavior not consistent with the SEU model by including the framing of decisions as a step in the judgment of preference between risky alternatives. Prospect theory assumes that decision makers tend to be risk averse with regard to gains and risk seeking with regard to losses. This leads to a value function that weights losses disproportionately. As such, the model is still equivalent to SEU, assuming a utility function expressing mixed risk aversion and risk seeking. Prospect theory, however, assumes that the decision maker’s reference point can change. With shifts in the reference point, the same returns can be viewed as either gains or losses.17 The latter feature of prospect theory, of course, is an attempt to account for the framing effect discussed above. Prospect theory also deviates significantly from SEU theory in the way in which probabilities are addressed. To describe human preferences more closely, perceived values are weighted by a function π(p) instead of the true probability, p. Compared to the untransformed form of p, π(p) overweights very low probabilities and underweights moderate and high probabilities. The function π(p) is also generally assumed to be discontinuous and poorly defined for probability values close to 0 or 1. Prospect theory assumes that the choice process involves an editing phase and an evaluation phase. The editing phase involves reformulation of the options to simplify subsequent evaluation and choice. Much of this editing process is concerned with determining an appropriate reference point in a step called coding. 
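The framing reversal and the value and weighting functions described above can be made concrete with a short calculation. The sketch below is hypothetical: the chapter specifies no functional forms, so it assumes the power value function and the one-parameter inverse-S weighting function, with the estimates (α = β = 0.88, λ = 2.25, γ = 0.61) that Tversky and Kahneman later reported for cumulative prospect theory (1992). For simplicity it applies the gain-domain γ to losses as well, weights each outcome separately, skips the editing phase, and treats π(p) as continuous up to 0 and 1, which the original theory does not assume.

```python
# Assumed parameters (Tversky & Kahneman, 1992); the chapter gives no numbers.
ALPHA = 0.88   # curvature of the value function for gains
BETA = 0.88    # curvature of the value function for losses
LAMBDA = 2.25  # loss aversion: losses loom larger than equal-sized gains
GAMMA = 0.61   # curvature of the probability weighting function pi(p)


def value(x: float) -> float:
    """S-shaped value of outcome x, coded relative to a reference point of 0."""
    return x ** ALPHA if x >= 0 else -LAMBDA * (-x) ** BETA


def weight(p: float) -> float:
    """Inverse-S decision weight: above p for small p, below p for larger p."""
    return p ** GAMMA / (p ** GAMMA + (1 - p) ** GAMMA) ** (1 / GAMMA)


def prospect_value(prospect):
    """Sum of pi(p) * v(x) over (outcome, probability) pairs."""
    return sum(weight(p) * value(x) for x, p in prospect)


def expected_value(prospect):
    """Ordinary probability-weighted mean outcome, for comparison."""
    return sum(p * x for x, p in prospect)


# Gain frame: outcomes coded as lives saved (reference point: 600 deaths).
A = [(200, 1.0)]
B = [(600, 1 / 3), (0, 2 / 3)]
# Loss frame: outcomes coded as lives lost (reference point: nobody dies).
C = [(-400, 1.0)]
D = [(0, 1 / 3), (-600, 2 / 3)]

for name, prospect in [("A", A), ("B", B), ("C", C), ("D", D)]:
    print(f"{name}: expected value {expected_value(prospect):7.1f}, "
          f"prospect value {prospect_value(prospect):7.1f}")
```

Within each frame the expected values are identical (200 lives saved, or 400 lives lost), yet the prospect values favor A (about 106 versus 94 for B) and D (about −321 versus −439 for C), matching the majority choices. With these assumed parameters the reversal comes mostly from the concave gain and convex loss segments of the value function, since π(1/3) is only slightly above 1/3.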
Other editing steps that may occur include segregating the riskless components of the decision, combining the probabilities of events with identical outcomes, simplifying by rounding off probabilities and outcome measures, and searching for dominance. In the evaluation phase, the perceived values are weighted by the function π(p). The alternative with the greatest weighted value is then selected. Several other modeling approaches that differentially weight utilities in risky decision making have been proposed (Goldstein & Hogarth, 1997). As in prospect theory, such models often assume that the subjective probabilities, or decision weights, are a function of outcome sign (i.e., positive, neutral, or negative), rank (i.e., first, second, etc.), or magnitude. Other models focus on display effects (i.e., single-stage vs. multistage arrangements) and distribution effects (i.e., two-outcome lotteries vs. multiple-outcome lotteries). Prospect theory and other approaches also address how the value or utility of particular outcomes can change between decision contexts, as discussed below. For example, Hertwig et al. (2004) and Hertwig (2012) (also see Fox & Hadar, 2006) compared decisions from description, where subjects are given a full description of the probabilities of risky events, with decisions from experience, where decision makers learn the probabilities from experience. They reported that these two contexts can lead to dramatically different choice behavior. In decisions from description, people tended to overweight the probability of rare events, as predicted by prospect theory. In contrast, in decisions from experience, people tended to underweight the probability of rare events.
Labile Preferences
There is no doubt that human preferences often change after an outcome is received. After losing money, an investor may become risk averse. In other cases, an investor may escalate his or her commitment to an alternative after an initial loss, even if better alternatives are available. From the most general perspective, any biological organism becomes satiated after satisfying a basic need, such as hunger. Preferences also change over time or between decision contexts. For example, a 30-year-old decision maker considering whether to put money into a retirement fund may currently have a very different utility function than at retirement.
The latter case is consistent with SEU theory but obviously complicates analysis. Both economists and behavioral researchers have developed mathematical models of choice processes to explain intransitive or inconsistent preference orderings of alternatives (Goldstein & Hogarth, 1997). Game theory provides interesting insight into this issue. From this perspective, the preferences of the human decision maker are modeled as the collective decisions obtained by a group of internal agents, or selves, each of which is assumed to have distinct preferences (see Elster, 1986). Intransitive preferences and other violations of rationality on the part of the human decision maker then arise from interactions between competing selves.18 Along these lines, Ainslie (1975) proposed that impulsive preference switches (often resulting in risky or unhealthy choices) arise as the outcome of a struggle between selves representing conflicting short- and long-term interests. Another area of active research has focused on how experiencing outcomes can cause shifts in preference. One robust finding is that people tend to be more satisfied if an outcome exceeds their expectations and less satisfied if it does not (e.g., Connolly et al., 1997; Feather, 1966). Expectations therefore provide a reference point against which outcomes are compared. A related result found in marketing studies is that negative experiences often have a much larger influence on product preferences and future purchasing decisions than positive experiences (Baumeister et al., 2001; Oldenburger et al., 2007). Other studies have shown that people in a wide variety of settings often consider sunk costs when deciding whether to escalate their commitment to an alternative by investing additional resources (Arkes & Blumer, 1985; Arkes & Hutzel, 2000).
From the perspective of prospect theory, sunk costs cause people to frame their choice in terms of losses instead of gains, resulting in risk-taking behavior and, consequently, escalating commitment. Other plausible explanations for escalating commitment include a desire to avoid waste or to avoid blame for an initially bad decision to invest in the first place. Interestingly, some recent evidence suggests that people may deescalate commitment in response to sunk costs (Heath, 1995). The latter effect is also contrary to classical economic theory, which holds that decisions should be based solely on marginal costs and benefits. Heath explains such effects in terms of mental accounting. Escalation is held to occur when a mental budget is not set or expenses are difficult to track. Deescalation is held to occur when people exceed their mental budget, even if the marginal benefits exceed the marginal costs.
Other approaches treat value and utility as random variables within models of choice to explain intransitive or inconsistent preference orderings of alternatives. The random utility model (Iverson & Luce, 1998) describes the probability $P_{a,A}$ of choosing a given alternative $a$ from a set of options $A$ as
$$P_{a,A} = \Pr\left(U_a \ge U_b \ \text{for all } b \in A\right) \tag{8}$$
where $U_a$ is the uncertain utility of alternative $a$ and $U_b$ is the uncertain utility of alternative $b$. The most basic random utility models assign a utility to each alternative by sampling a single value from a known distribution. The sampled utility of each alternative then remains constant throughout the choice process. Basic random utility models can predict a variety of preference reversals and intransitive preferences for single- and multiple-attribute comparisons of alternatives (e.g., Tversky, 1972). Sequential sampling models extend this approach by assuming that preferences can be based on more than one observation. Preferences for particular alternatives are accumulated over time by integrating or otherwise summing the sampled utilities. The utility of an alternative at a particular time is proportional to this sum. A choice is made when the summed preferences for a particular alternative exceed some threshold, which itself may vary over time or depend on situational factors (Busemeyer & Townsend, 1993; Wallsten, 1995). Sequential sampling models can also explain speed–accuracy tradeoffs in signal detection tasks (Stone, 1960) as well as shifts in preferences due to time pressure (Busemeyer & Townsend, 1993; Wallsten, 1995) if it is assumed that people adjust their threshold downward under time pressure. That is, under time pressure, people sample less information before making a choice. In the following section we explore further how and why decision strategies might change over time and between decision contexts.
2.2.3 Adaptive Decision Behavior
The main idea of adaptive decision behavior, or contingent decision behavior, is that an individual decision maker uses different strategies in different situations (Payne et al., 1993). In some cases, people will follow strategies that include shortcuts or heuristics that reduce the complexity of the problem while increasing the chance of making suboptimal choices. Various decision strategies have been identified (Bettman, Luce, & Payne, 1998; Wright, 1975). Some strategies impose little cognitive burden but are also less accurate; others are more cognitively burdensome but can be more accurate. Along these lines, cognitive continuum theory (Hammond, 1980) places judgments on a cognitive continuum ranging from highly intuitive to highly analytical decisions. Hammond (1993) summarizes earlier research showing that task characteristics cause decision makers to vary on this continuum. A tendency toward analysis increases, and reliance on intuition decreases, when (1) the number of cues increases; (2) cues are measured objectively instead of subjectively; (3) cues are of low redundancy; (4) decomposition of the task is high; (5) certainty is high; (6) cues are weighted unequally in the environmental model; (7) relations are nonlinear; (8) an organizing principle is available; (9) cues are displayed sequentially instead of simultaneously; and (10) the time period for evaluation is long. Intuitive methods can be better than analytical methods in some situations (Hammond, Hamm, Grassia, & Pearson, 1987). The theory of contingent decision making (Beach & Mitchell, 1978; Payne, Bettman, & Johnson, 1993) is similar to cognitive continuum theory in that it holds that people use different decision strategies, depending upon the characteristics of the task and the decision context. Payne et al. limit their modeling approach to tasks that require choices to be made (simple memory tasks are excluded from consideration). They also add the assumption that people make choices about how to make choices. Choices between decision strategies are assumed to be made rationally by comparing their cost (in terms of cognitive effort) against their benefits (in terms of accuracy). Cognitive effort and accuracy (of a decision strategy) are both assumed to depend upon task characteristics, such as task complexity, response mode, and method of information display. Cognitive effort and accuracy also are assumed to depend upon contextual characteristics, such as the similarity of the compared alternatives, attribute ranges and correlations, the quality of the considered options, reference points, and decision frames. Payne et al. place much emphasis on measuring the cognitive effort of different decision strategies in terms of the number of elementary information processes (EIPs) that must be performed for different tasks and contexts. The accuracy of a strategy is measured in relative terms; “that is, the quality of the choice expected from a rule [strategy] is measured against the standard of accuracy provided by a normative model like the weighted additive rule [WADD]” (Payne et al., 1993, p. 93). (Note: Brackets have been added for terminology consistency.) Payne et al. relate the accuracy of different decision strategies to task characteristics and contexts and also present research showing that people will shift decision strategies to reduce cognitive effort, increase accuracy, or in response to time pressure. This approach is illustrated in Figure 3, which compares the accuracy and effort of WADD against five other decision strategies (EBA, EQW, MCD, LEX, random choice).19 As shown in Figure 3, WADD is the most accurate, but most costly (in terms of effort), decision strategy. In contrast, apart from random choice, EBA is the least accurate but also the least costly strategy. Related work includes a study by Todd and Benbasat (1991, 1993) comparing the effectiveness of computer-based decision aids. They showed that “when a more accurate normative strategy [WADD] is made less effortful to use, it is used” (Todd & Benbasat, 2000, p. 91). (Note: Brackets have been added for terminology consistency.)
Figure 3 Trade-off between relative accuracy and effort. Relative accuracy (%WADD, 0 to 1.0) is plotted against effort (total EIPs, 0 to 200) for WADD, EQW, MCD, LEX, EBA, and random choice. (Source: Adapted from Payne et al., 1993.)
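The strategies compared in Figure 3 can be sketched as simple choice rules over an alternatives-by-attributes table: WADD (weighted additive), EQW (equal weight), LEX (lexicographic), and EBA (elimination by aspects). This Python sketch is illustrative only. The three alternatives, their attribute values, the importance weights, and the EBA cutoff are invented; EBA is written in a deterministic form that screens attributes in order of importance (Tversky's original version selects aspects probabilistically); MCD is omitted; and effort is approximated by counting the attribute values each rule reads, not by tallying EIPs as Payne et al. do, so the counts will not reproduce the figure.

```python
from typing import Dict, List, Tuple

# Hypothetical decision problem: alternative -> attribute values (higher is better).
Alternatives = Dict[str, List[float]]


def wadd(alts: Alternatives, weights: List[float]) -> Tuple[str, int]:
    """Weighted additive rule: weight and sum every value, pick the best total."""
    scores = {name: sum(w * v for w, v in zip(weights, vals)) for name, vals in alts.items()}
    looked = sum(len(vals) for vals in alts.values())
    return max(scores, key=scores.get), looked


def eqw(alts: Alternatives) -> Tuple[str, int]:
    """Equal-weight rule: ignore importance and sum the raw values."""
    scores = {name: sum(vals) for name, vals in alts.items()}
    looked = sum(len(vals) for vals in alts.values())
    return max(scores, key=scores.get), looked


def importance_order(weights: List[float]) -> List[int]:
    """Attribute indices from most to least important."""
    return sorted(range(len(weights)), key=lambda i: -weights[i])


def lex(alts: Alternatives, weights: List[float]) -> Tuple[str, int]:
    """Lexicographic rule: best on the most important attribute; later
    attributes are read only to break ties."""
    remaining, looked = list(alts), 0
    for i in importance_order(weights):
        looked += len(remaining)
        top = max(alts[name][i] for name in remaining)
        remaining = [name for name in remaining if alts[name][i] == top]
        if len(remaining) == 1:
            break
    return remaining[0], looked  # if ties persist, the first listed is returned


def eba(alts: Alternatives, weights: List[float], cutoff: float) -> Tuple[str, int]:
    """Deterministic elimination by aspects: drop alternatives below the cutoff,
    attribute by attribute in order of importance, until one survives."""
    remaining, looked = list(alts), 0
    for i in importance_order(weights):
        looked += len(remaining)
        passing = [name for name in remaining if alts[name][i] >= cutoff]
        if passing:  # assumption: an aspect that would eliminate everyone is skipped
            remaining = passing
        if len(remaining) == 1:
            break
    return remaining[0], looked  # if several survive, the first listed is returned


problem = {"A": [7, 3, 6], "B": [5, 8, 7], "C": [8, 2, 4]}
weights = [0.5, 0.3, 0.2]
results = {
    "WADD": wadd(problem, weights),
    "EQW": eqw(problem),
    "LEX": lex(problem, weights),
    "EBA": eba(problem, weights, cutoff=6),
}
for rule, (choice, looked) in results.items():
    print(f"{rule}: chooses {choice} after reading {looked} values")
```

On this made-up problem WADD and EQW both choose B after reading all 9 values, LEX settles on C after reading only the 3 values of the most important attribute, and EBA ends with A after 7 reads. The cheaper rules can thus depart from the normative WADD choice, which is the accuracy cost that Figure 3 summarizes.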
2.3 Naturalistic Decision Models
In a dynamic and realistic environment, a decision maker takes actions sequentially in time. Taking actions can change the environment, resulting in a new set of decisions. The decisions might be made under time pressure and stress by groups or by single decision makers. This process might be performed on a routine basis or might involve severe conflict. For example, either a group of soldiers or an individual officer might routinely identify marked vehicles as friends or foes. When a vehicle has unknown or ambiguous markings, the decision changes to a conflict-driven process. Naturalistic decision theory has emerged as a field that focuses on such decisions in real-world environments (Klein, 1998; Klein, Orasanu, Calderwood, & Zsambok, 1993). The notion that most decisions are made in a routine, nonanalytical way is the driving force of this approach. Areas where such behavior seems prominent include juror decision making, troubleshooting of complex systems, medical diagnosis, and management decisions, among many others. For many years, it has been recognized that decision making in natural environments often differs greatly between decision contexts (Beach, 1993; Hammond, 1993). In addressing this topic, the researchers involved often question the relevance and validity of both classical decision theory and behavioral research not conducted in real-world settings (Cohen, 1993). Numerous naturalistic models have been proposed (Klein et al., 1993). These models assume that people rarely weigh alternatives and compare them in terms of expected value or utility. Each model is also descriptive rather than prescriptive. Perhaps the most general conclusion that can be drawn from this work is that people use different decision strategies, depending on their experience, the task, and the decision context.
Several of the models also postulate that people choose between decision strategies by trading off effectiveness against the effort required. In the following discussion we address several models of dynamic and naturalistic decision making: (1) levels of task performance (Rasmussen, 1983); (2) recognition-primed decisions (Klein, 1989); (3) dominance structuring (Montgomery, 1989); and (4) explanation-based decision making (Pennington & Hastie, 1988).
2.3.1 Levels of Task Performance
There is growing recognition that most decisions are made on a routine basis in which people simply follow past behavior patterns (Beach, 1993; Rasmussen, 1983; Svenson, 1990). Rasmussen (1983) follows this approach to distinguish among skill-based, rule-based, and knowledge-based levels of task performance. Lehto (1991) further considers judgment-based behavior as a fourth level of performance. Performance is said to be at either a skill-based or a rule-based level when tasks are routine in nature. Skill-based performance involves the smooth, automatic flow of actions without conscious decision points. As such, skill-based performance describes the decisions made by highly trained operators performing familiar tasks. Rule-based performance involves the conscious perception of environmental cues, which trigger the application of rules learned on the basis of experience. As such, rule-based performance corresponds closely to recognition-primed decisions (Klein, 1989). Knowledge-based performance is said to occur during learning or problem-solving activity, during which people cognitively simulate the influence of various actions and develop plans for what to do. The judgment-based level of performance occurs when affective reactions of a decision maker cause a change in goals or priorities between goals (Etzioni, 1988; Janis & Mann, 1977; Lehto, 1991). Distinctive types of errors in decision making occur at each of the four levels (Lehto, 1991; Reason, 1990). Along these lines, Wagenaar (1992) discusses several case studies in which people engaging in risky behavior do not seem to be consciously evaluating the risk. Drivers, in particular, seem habitually to take risks. Wagenaar explains such behavior in terms of faulty rules derived on the basis of benign experience. In other words, drivers get away with leaving small safety margins most of the time and consequently learn to run risks on a routine basis. Drucker (1985) points out several cases where organizational decision makers failed to recognize that the generic principles they had applied in the past were no longer appropriate, resulting in catastrophic consequences. Errors also occur because of cognitive limitations or faulty mental models or because of inappropriate affective reactions, such as anger or fear (Lehto, 1991). As noted by Isen (1993), there also is growing recognition that positive affect can influence decision making. For example, positive affect can promote the efficiency and thoroughness of decision making but may cause people to avoid negative materials. Positive affect also seems to encourage risk-averse preferences. Decision making itself can be anxiety-provoking, resulting in violations of rationality (Janis & Mann, 1977).
A study involving drivers arrested for drinking and driving (McKnight, Langston, McKnight, & Lange, 1995) provides an interesting perspective on how the sequential nature of naturalistic decisions can lead people into traps. The study also shows how errors can occur at multiple levels of performance. In this example, decisions made well in advance of the final decision to drive while impaired played a major role in creating situations where drivers were almost certain to drive impaired. For example, the driver may have chosen to bring along friends and therefore have felt pressured to drive home because the friends were dependent on him or her. This initial failure by drivers to predict the future situation could be described as a failure to shift up from a rule-based level to a knowledge-based level of performance. In other words, the driver never stopped to think about what might happen if he or she drank too much. The final decision to drive, however, would correspond to an error (or violation) at the judgment-based level if the driver’s choice was influenced by an affective reaction (perceived pressure) to the presence of friends wanting a ride.
2.3.2 Recognition-Primed Decision Making
Klein (1998, 2004) developed the theory of recognition-primed decision making on the basis of observations of firefighters and other professionals in their naturalistic environments. He found that up to 80% of the decisions made by firefighters involved some sort of situation recognition, in which the decision makers simply followed a past behavior pattern once they recognized the situation. The model he developed distinguishes among three basic cases. In the simplest case, the decision maker recognizes the situation and takes the obvious action. A second case occurs when the decision maker consciously simulates the action to check whether it should work before taking it. In the third and most complex case, the action is found to be deficient during the mental simulation and is consequently rejected. An important point of the model is that decision makers do not begin by comparing all the options. Instead, they begin with options that seem feasible based on their experience. This tendency, of course, differs from the SEU approach but is comparable to applying the satisficing decision rule (Simon, 1955) discussed earlier. Situation assessment is well recognized as an important element of decision making in naturalistic environments (Klein et al., 1993). Recent research by Klein and his colleagues has examined the possibility of enhancing situation awareness through training (Klein & Wolf, 1995). Klein and his colleagues have also applied methods of cognitive task analysis to naturalistic decision-making problems. In these efforts they have focused on identifying (1) critical decisions, (2) the elements of situation awareness, (3) critical cues indicating changes in situations, and (4) alternative courses of action (Klein, 1995). Accordingly, practitioners of naturalistic decision making tend to focus on process-tracing methods and behavioral protocols (Ericsson & Simon, 1984) to document the processes people follow when they make decisions.20
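Klein's three cases, and the satisficing logic behind them, can be expressed as a small control loop. This is a hypothetical sketch rather than Klein's own formalization: the function name, the `experience` table mapping a recognized situation to typical actions in order of familiarity, and the `simulate` callback standing in for mental simulation are all invented for illustration. Options are generated one at a time and the first workable one is taken; they are never compared side by side.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins: experience maps a recognized situation to typical
# actions (most familiar first); a Simulator plays the role of mental simulation.
Experience = Dict[str, List[str]]
Simulator = Callable[[str, str], bool]


def recognition_primed_decision(situation: str, experience: Experience,
                                simulate: Simulator, check: bool) -> Optional[str]:
    """Return the first workable action for a situation, or None."""
    options = experience.get(situation, [])
    if not options:
        return None  # situation not recognized: some other strategy is needed
    if not check:
        return options[0]  # case 1: recognize the situation, take the obvious action
    for action in options:
        if simulate(situation, action):
            return action  # case 2: mental simulation says the action should work
        # case 3: simulation finds the action deficient, so it is rejected
    return None


experience = {"kitchen fire": ["attack from inside", "attack from outside", "contain and evacuate"]}


def simulate(situation: str, action: str) -> bool:
    # Invented judgment: the crew imagines that an interior attack would fail.
    return action != "attack from inside"


print(recognition_primed_decision("kitchen fire", experience, simulate, check=False))
print(recognition_primed_decision("kitchen fire", experience, simulate, check=True))
```

The first call takes the obvious action without checking it; the second simulates that action, rejects it, and accepts the next option that passes. In this sketch an unrecognized situation returns None, marking the point at which recognition alone no longer suffices.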
2.3.3 Dominance Structuring
Dominance structuring (Montgomery, 1989; Montgomery & Willen, 1999) holds that decision making in real contexts involves a sequence of four steps. The process begins with a pre-editing stage in which alternatives are screened from further analysis. The next step involves selecting a promising alternative from the set of alternatives that survive the initial screening. A test is then made to check whether the promising alternative dominates the other surviving alternatives. If dominance is not found, the information regarding the alternatives is restructured in an attempt to force dominance. This process involves both bolstering and deemphasizing information in a way that eliminates the disadvantages of the promising alternative. Empirical support can be found for each of the four stages of this process (Montgomery & Willen, 1999). Consequently, this theory may have value as a description of how people make nonroutine decisions.
2.3.4 Explanation-Based Decision Making
Explanation-based decision making (Oskarsson, Van Boven, McClelland, & Hastie, 2009; Pennington & Hastie, 1986, 1988) assumes that people begin their decision-making process by constructing a mental model that explains the facts they have received. While constructing this explanatory model, people are also assumed to generate potential alternatives to choose between. The alternatives are then compared to the explanatory model rather than to the facts from which it was constructed. Pennington and Hastie have applied this model to juror decision making and obtained experimental evidence that many of its assumptions seem to hold. They note that juror decision making requires consideration of a massive amount of data that is often presented in haphazard order over a long time period. Jurors seem to organize this information in terms of stories describing causation and intent. As part of this process, jurors are assumed to evaluate stories in terms of their uniqueness, plausibility, completeness, or consistency. To determine a verdict, jurors then judge the fit between the choices provided by the trial judge and the various stories they use to organize the information. Jurors’ certainty about their verdict is assumed to be influenced both by their evaluation of the stories and by the perceived goodness of fit between the stories and the verdict.
3 GROUP DECISION MAKING
Much research has been done over the past 25 years or so on decision making by groups and teams. Most of this work has focused on groups as opposed to teams. In a team it is assumed that the members are working toward a common goal and have some degree of interdependence, defined roles and responsibilities, and task-specific knowledge (Orasanu & Salas, 1993). Team performance is a major area of interest in the field of naturalistic decision theory (Klein, 1998; Klein et al., 1993), as discussed earlier. Group performance has traditionally been an area of study in the fields of organizational behavior and industrial psychology. Traditional decision theory has also devoted some attention to group decision making (Keeney & Raiffa, 1976; Raiffa, 1968). We first discuss briefly some of the ways that group decisions differ from those made by isolated decision makers who need to consider only their own preferences. In particular, ethics and social norms play a much more prominent role when decisions are made by or within groups. Attention will then shift to group processes and how they affect group decisions. In the last section we address methods of supporting or improving group decision making.
3.1 Ethics and Social Norms
When decisions are made by or within groups, a number of issues arise that have not been touched on in the earlier portions of this chapter. To start, there is the complication that preferences may vary between members of a group. It often is impossible to maximize the preferences of all members of the group, meaning that tradeoffs must be made and issues such as fairness must be addressed to obtain acceptable group decisions. Another complication is that the return to individual decision makers can depend on the actions of others. Game theory21 distinguishes two common variations of this situation. In competitive games, individuals are likely to take “self-centered” actions that maximize their own return but reduce returns to other members of the group. Behavior of group members in this situation may be well described by the minimax decision rule discussed in Section 2.1.5. In cooperative games, the members of the group take actions that maximize returns to the group as a whole. Members of groups may choose cooperative solutions that are better for the group as a whole for many different reasons (Dawes, van de Kragt, & Orbell, 1988). Groups may apply numerous forms of coercion to punish members who deviate from the cooperative solutions. Group members may apply decision strategies such as reciprocal altruism. They also might conform because of their social conscience, a need for self-esteem, or feelings of group identity. Fairness considerations can in some cases explain preferences and choices that seem to conflict with economic self-interest (Bazerman, 1998). Changes in the status quo, such as increasing the price of bottled water immediately after a hurricane, may be viewed as unfair even if they are economically justifiable based on supply and demand. People are often willing to incur substantial costs to punish “unfair” opponents and reward their friends or allies. The notion that costs and benefits should be shared equally is one fairness-related heuristic that people use (Messick, 1991). Consistent results were found by Guth et al. (1982) in a simple bargaining game where player 1 proposes a split of a fixed amount of cash and player 2 either accepts the offer or rejects it. If player 2 rejects the offer, both players receive nothing. Classical economics predicts that player 2 will accept any positive amount (i.e., player 2 should always prefer something to nothing). Consequently, player 1 should offer player 2 a very small amount greater than zero. The results showed that, contrary to the predictions of classical economics, subjects tended to offer a substantial proportion of the cash (the average offer was 30%). Some of the subjects rejected positive offers. Others accepted offers of zero. Further research, summarized by Bolton and Chatterjee (1996), confirms that people care about whether they receive their fair share. Ethics clearly plays an important role in decision making. Some choices are viewed by nearly everyone as immoral or wrong (e.g., violations of the law, dishonesty, and numerous other behaviors that conflict with basic societal values or behavioral norms). Many corporations and other institutions formally specify codes of ethics prescribing values such as honesty, fairness, compliance with the law, reliability, consideration or sensitivity to cultural differences, courtesy, loyalty, respect for the environment, and avoiding waste. It is easy to visualize
scenarios where it is in the best interest of a decision maker to choose economically undesirable options (at least in the short term) to comply with ethical codes. According to Kidder (1995), the “really tough choices … don’t center on right versus wrong. They involve right versus right.” Kidder refers to four dilemmas of right versus right that he feels qualify as paradigms: (1) truth versus loyalty (e.g., whether to divulge information provided in confidence); (2) individual versus community; (3) short term versus long term; and (4) justice versus mercy. At least three principles, which in some cases provide conflicting solutions, have been proposed for resolving ethical dilemmas. These include (1) utilitarianism, selecting the option with the best overall consequences; (2) rule-based ethics, following a rule regardless of its current consequences (e.g., waiting for a stop light to turn green even if no cars are coming); and (3) fairness, doing what you would want others to do for you. Numerous social dilemmas also occur in which the payoffs to each participant result in individual decision strategies harmful to the group as a whole. The tragedy of the commons (Hardin, 1968) is illustrative of social dilemmas in general. For example, as discussed in detail by Baron (1998), consider the crash of the East Coast commercial fishing industry, brought about by overfishing. Baron suggests that individual fishers may reason that if they do not catch the fish, someone else will. Each fisher then attempts to catch as many fish as possible, even if this will cause the fish stocks to crash. Although cooperative solutions, such as regulating the catch, are obviously better than the current situation, individual fishers continue to resist such solutions. Regulations are claimed to infringe on personal autonomy, to be unfair, or to be based on inadequate knowledge.
Similar examples include littering, wasteful use of natural resources, pollution, social free riding, and corporations expecting governments or affected consumers to pay for the cost of accidents (e.g., oil spills and the subprime mortgage crisis). These behaviors can all be explained in terms of the choices faced by the offending individual decision maker (Schelling, 1978). Simply put, the individual decision maker enjoys the benefits of the offensive behavior, however small they may be, but the costs are incurred by the entire group.
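The payoff structure behind the overfishing example, in which each fisher keeps the benefit of overfishing while its cost is spread over the whole group, can be checked with a few lines of arithmetic. The numbers below are hypothetical and are not taken from Baron (1998): 10 fishers, a private gain of 3 units for each fisher who overfishes, and 5 units of stock damage per overfisher, shared equally by everyone. Any values satisfying shared cost / N < private gain < shared cost produce the same dilemma.

```python
# Hypothetical payoffs for an N-person commons dilemma (illustrative numbers).
N_FISHERS = 10
PRIVATE_GAIN = 3.0  # extra catch kept by a fisher who overfishes
SHARED_COST = 5.0   # stock damage caused by each overfisher, borne by everyone


def payoff(overfishes: bool, others_overfishing: int) -> float:
    """Net payoff to one fisher, given own choice and how many others overfish."""
    overfishers = others_overfishing + (1 if overfishes else 0)
    gain = PRIVATE_GAIN if overfishes else 0.0
    return gain - SHARED_COST * overfishers / N_FISHERS


# Individually, overfishing always pays: it adds 3.0 - 5.0 / 10 = 2.5
# no matter what the other fishers do, so it is a dominant strategy.
for k in range(N_FISHERS):
    assert payoff(True, k) > payoff(False, k)

# Collectively, universal overfishing leaves everyone worse off than restraint.
all_restrain = N_FISHERS * payoff(False, 0)
all_overfish = N_FISHERS * payoff(True, N_FISHERS - 1)
print(all_restrain, all_overfish)  # 0.0 -20.0
```

Each fisher bears only a tenth of the damage he or she causes, so overfishing is individually rational, yet when everyone follows that reasoning the group as a whole loses 20 units relative to universal restraint.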
A lack of conflict-settling procedures and separation or lack of contact between groups can also contribute to conflict. Conflict becomes especially likely during a crisis and often escalates when the issues are perceived to be important or after resistance or retaliation occurs. Polarization, loyalty to one’s own group, lack of trust, and cultural and socioeconomic factors are often contributing factors to conflict and conflict escalation. Ellis and Fisher (1994) distinguish between affective and substantive forms of conflict. Affective conflict corresponds to emotional clashes between individuals or groups; substantive conflict involves opposition at the intellectual level. Substantive conflict is especially likely to have positive effects on group decisions by promoting better understanding of the issues involved. Affective conflict can also improve group decisions by increasing interest, involvement, and motivation among group members and, in some cases, cohesiveness. On the other hand, affective conflict may cause significant ill-will, reduced cohesiveness, and withdrawal by some members from the group process. Baron (1998) provides an interesting discussion of violent conflict and how it is related to polarized beliefs, group loyalty, and other biases. Defection and the formation of coalitions are a commonly observed effect of conflict, or power struggles, within groups. Coalitions often form when certain members of the group can gain by following a common course of action at the expense of the long-run objectives of the group as a whole. Rapidly changing coalitions between politicians and political parties are obviously a fact of life. Coalitions, and their formation, have been examined from decision-analytic and game theory perspectives (Bolton & Chatterjee, 1996; Raiffa, 1982). 
These approaches make predictions about which coalitions will form, depending on whether the parties are cooperating or competing, and these predictions have been tested in a variety of experiments (Bolton & Chatterjee, 1996). These experiments have revealed that the formation of coalitions is influenced by expected payoffs, equity issues, and the ease of communication. However, Bazerman (1998) notes that the availability heuristic, overconfidence, and sunk cost effects are likely to explain how coalitions actually form in the real world.
3.2 Group Processes
A large amount of research has focused on groups and their behavior. Accordingly, many models have been developed that describe how groups make decisions. A common observation is that groups tend to move through several phases as they go through the decision-making process (Ellis & Fisher, 1994). One of the more classic models (Tuckman, 1965) describes this process with four words: forming, storming, norming, and performing. Forming corresponds to initial orientation, storming to conflict, norming to developing group cohesion and expressing opinions, and performing to obtaining solutions. As implied by Tuckman’s choice of terms, there is a continual interplay between socioemotive factors and rational, task-oriented behavior throughout the group decision-making process. Conflict, despite its negative connotations, is a normal, expected aspect of the group decision process and can in fact serve a positive role (Ellis & Fisher, 1994). In the following discussion we first address causes and effects of group conflict, then shift to conflict resolution.
3.2.1 Conflict
Whenever people or groups have different preferences, conflict can occur. As pointed out by Zander (1994), conflict between groups becomes more likely when groups have fuzzy or potentially antagonistic roles or when one group is disadvantaged (or perceives that it is not being treated fairly).
3.2.2 Conflict Resolution
Groups resolve conflict in many different ways. Discussion and argument, voting, negotiation, arbitration, and other forms of third-party intervention are all methods of resolving disputes. Discussion and argument are clearly the most common methods followed within groups to resolve conflict. Other methods of conflict resolution normally play a complementary rather than a primary role in the decision process. That is, the latter methods are relied on when groups fail to reach consensus after discussion and argument or they simply serve as the final step in the process. Group discussion and argument are often viewed as constituting a less than rational process. Along these lines, Brashers et al. (1994) state that the literature suggests that
argument in groups is a social activity, constructed and maintained in interaction, and guided perhaps by different rules and norms than those that govern the practice of ideal or rational argument. Subgroups speaking with a single voice appear to be a significant force … Displays of support, repetitive agreement, and persistence all appear to function as influence mechanisms in consort with, or perhaps in place of, the quality or rationality of the arguments offered. Brashers et al. also suggest that members of groups appear uncritical because their arguments tend to be consistent with
DECISION-MAKING MODELS, DECISION SUPPORT, AND PROBLEM SOLVING
social norms rather than the rules of logic: “[S]ocial rules such as: (a) submission to higher status individuals, (b) experts’ opinions are accepted as facts on all matters, (c) the majority should be allowed to rule, (d) conflict and confrontation are to be avoided whenever possible.” A number of approaches to conflict management have been suggested that attempt to address many of the issues raised by Brashers et al. These approaches include seeking consensus rather than allowing decisions to be posed as win–lose propositions, encouraging and training group members to be supportive listeners, deemphasizing status, depersonalizing decision making, and using facilitators (Likert & Likert, 1976). Other approaches that have been proposed include directing discussion toward clarifying the issues, promoting an open and positive climate for discussion, facilitating face-saving communications, and promoting the development of common goals (Ellis & Fisher, 1994). Conflicts can also be resolved through voting and negotiation, as discussed further in Section 3.3. Negotiation becomes especially appropriate when the people involved have competing goals and some form of compromise is required. A typical example would be a dispute over pay between a labor union and management. Strategic concerns play a major role in negotiation and bargaining (Schelling, 1960). Self-interest on the part of the involved parties is the driving force throughout a process involving threats and promises, proposals and counterproposals, and attempts to discern how the opposing party will respond. Threats and promises are a means of signaling what the response will be to actions taken by an opponent and consequently become rational elements of a decision strategy (Raiffa, 1982). Establishing the credibility of signals sent to an opponent becomes important.
Methods of attaining credibility include establishing a reputation, the use of contracts, cutting off communication, burning bridges, leaving an outcome beyond control, moving in small steps, and using negotiating agents (Dixit & Nalebuff, 1991). Given the fundamentally adversarial nature of negotiation, conflict may move from a substantive basis to an affective, highly emotional state. At this stage, arbitration and other forms of third-party intervention may become appropriate, due to a corresponding tendency for the negotiating parties to take extreme, inflexible positions.
3.3 Group Performance and Biases
The quality of the decisions made by groups in a variety of different settings has been seriously questioned. Part of the issue here is the phenomenon of groupthink, which has been blamed for several disastrous public policy decisions (Hart, Stern, & Sundelius, 1997; Janis, 1972). Eight symptoms of groupthink cited by Janis and Mann (1977) are the illusion of invulnerability, rationalization (discounting of warnings and negative feedback), belief in the inherent morality of the group, stereotyping of outsiders, pressure on dissenters within the group, self-censorship, illusion of unanimity, and the presence of mindguards who shield the group from negative information. Janis and Mann proposed that the results of groupthink include failure to consider all the objectives and alternatives, failure to reexamine choices and rejected alternatives, incomplete or poor search for information, failure to adequately consider negative information, and failure to develop contingency plans. Groupthink is one of the most cited characteristics of how group decision processes can go wrong. Given the prominence of groupthink as an explanation of group behavior, it is somewhat surprising that only a few studies have evaluated this theory empirically.
Empirical evaluation of the groupthink effect and the development of alternative modeling approaches continue to be active areas of research (Hart et al., 1997).
Other research has attempted to measure the quality of group decisions in the real world against rational, or normative, standards. Viscusi (1991) cites several examples of apparent regulatory complacency and regulatory excess in government safety standards in the United States. He also discusses a variety of inconsistencies in the amounts awarded in product liability cases. Baron (1998) provides a long list of what he views as errors in public decision making and their very serious effects on society. These examples include collective decisions resulting in the destruction of natural resources and overpopulation, strong opposition to useful products such as vaccines, violent conflict between groups, and overzealous regulations, such as the Delaney clause. He attributes these problems to commonly held, and at first glance innocuous, intuitions such as “do no harm,” “nature knows best,” and “be loyal to your own group,” as well as the need for retribution (an eye for an eye) and a desire for fairness. A significant amount of laboratory research is also available that compares the performance of groups to that of individual decision makers (Davis, 1992; Kerr, MacCoun, & Kramer, 1996). Much of the early work showed that groups were better than individuals on some tasks. Later research indicated that group performance is less than the sum of its parts. Groups tend to be better than individuals on tasks where the solution is obvious once it is advocated by a single member of the group (Davis, 1992; Kerr et al., 1996). Another commonly cited finding is that groups tend to be more willing than individuals to select risky alternatives, but in some cases the opposite is true. One explanation is that group interactions cause people within the group to adopt more polarized opinions (Moscovici, 1976). Large groups seem especially likely to reach polarized, or extreme, conclusions (Isenberg, 1986).
Groups also tend to overemphasize the common knowledge of members, at the expense of underemphasizing the unique knowledge certain members have (Gruenfeld, Mannix, Williams, & Neale, 1996; Stasser & Titus, 1985). A more recent finding indicates that groups were more rational than individuals when playing the ultimatum game (Bornstein & Yaniv, 1998). Duffy (1993) notes that teams can be viewed as information processors and cites team biases and errors that can be related to information-processing limitations and the use of heuristics, such as framing. Topics such as mediation and negotiation, jury decision making, and public policy are now being evaluated from the latter perspective (Heath et al., 1994). Much of this research has focused on whether groups use the same types of heuristics and are subject to the same biases as individuals. This research has shown (1) framing effects and preference reversals (Paese, Bieser, & Tubbs, 1993); (2) overconfidence (Sniezek, 1992); (3) use of heuristics in negotiation (Bazerman & Neale, 1983); and (4) increased performance with cognitive feedback (Harmon & Rohrbaugh, 1990). One study indicated that biasing effects of the representativeness heuristic were greater for groups than for individuals (Argote, Seabright, & Dyer, 1986). The conclusion is that group decisions may be better than those of individuals in some situations but are subject to many of the same problems.
3.4 Prescriptive Approaches
A wide variety of prescriptive approaches have been proposed for improving group decision making. The approaches address some of the foregoing issues, including the use of agendas and rules of order, idea-generating techniques such as brainstorming, nominal group and Delphi techniques, decision structuring, and methods of computer-mediated decision making. As noted by Ellis and Fisher (1994), there is conflicting evidence regarding the effectiveness of such approaches.
On the negative side, prescriptive approaches might stifle creativity in some situations and can be sabotaged by dissenting members of groups.
On the positive side, prescriptive approaches make the decision process more orderly and efficient, promote rational analysis and participation by all members of the group, and help ensure implementation of group decisions. In the following discussion we briefly review some of these tools for improving group decision making.
3.4.1 Agendas and Rules of Order
Agendas and rules of order are often essential to the orderly functioning of groups. As noted by Welch (1994), an agenda “conveys information about the structure of a meeting: time, place, persons involved, topics to be addressed, perhaps suggestions about background material or preparatory work.” Agendas are especially important when the members of a group are loosely coupled or do not have common expectations. Without an agenda, group meetings are likely to dissolve into chaos (Welch, 1994). Rules of order, such as Robert’s Rules of Order (Robert, 1990), play a similarly important role, by regulating the conduct of groups to ensure fair participation by all group members, including absentees. Rules of order also specify voting rules and means of determining consensus. Decision rules may require unanimity, plurality, or majority vote for an alternative. Attaining consensus offers an advantage over voting, because voting encourages the development of coalitions by posing the decision as a win–lose proposition (Ellis & Fisher, 1994). Members of the group who voted against an alternative are often unlikely to support it. Voting procedures can also play an important role (Davis, 1992).
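The decision rules named above differ in how much support an alternative needs. A minimal sketch, assuming simple single-choice ballots; the function `decide` and the convention that a tie or an unmet rule yields no decision (`None`) are hypothetical choices for illustration, not taken from any particular rules of order:

```python
from collections import Counter
from typing import List, Optional


def decide(votes: List[str], rule: str) -> Optional[str]:
    """Apply a group decision rule to single-choice votes.

    Returns the winning alternative, or None if the rule is not satisfied.
    """
    if not votes:
        return None
    counts = Counter(votes)
    leader, top = counts.most_common(1)[0]
    if rule == "unanimity":
        # Every member must vote for the same alternative.
        return leader if top == len(votes) else None
    if rule == "majority":
        # The leader needs more than half of all votes cast.
        return leader if top > len(votes) / 2 else None
    if rule == "plurality":
        # The leader needs more votes than any other alternative (no tie for first).
        tied = [alt for alt, n in counts.items() if n == top]
        return leader if len(tied) == 1 else None
    raise ValueError(f"unknown decision rule: {rule!r}")


votes = ["A"] * 4 + ["B"] * 3 + ["C"] * 3
for rule in ("unanimity", "majority", "plurality"):
    print(rule, decide(votes, rule))
```

With these ten hypothetical ballots, alternative A wins under plurality but fails under both majority and unanimity, which shows how the choice of rule alone can determine whether a group reaches a decision.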
HUMAN FACTORS FUNDAMENTALS
3.4.2 Idea Generation Techniques
A variety of approaches have been developed for improving the creativity of groups in the early stages of decision making. Brainstorming is a popular technique for quickly generating ideas (Osborn, 1937). In this approach, a small group (of no more than 10 people) is given a problem to solve. The members are asked to generate as many ideas as possible. Members are told that no idea is too wild and are encouraged to build on the ideas submitted by others. No evaluation or criticism of the ideas is allowed until after the brainstorming session is finished. Buzz group analysis is a similar approach, more appropriate for large groups (Ellis & Fisher, 1994). Here, a large group is first divided into small groups of four to six members. Each small group goes through a brainstorming-like process to generate ideas. They then present their best ideas to the entire group for discussion. Other commonly applied idea-generating techniques include focus group analysis and group exercises intended to inspire creative thinking through role playing (Clemen, 1996; Ellis & Fisher, 1994). Brainstorming and the other idea-generating methods mentioned above will normally yield a substantial number of suggestions, some of them creative, especially when participants build on each other’s ideas. However, personality factors and group dynamics can also lead to undesirable results. Simply put, some people are much more willing than others to participate in such exercises. Group discussions consequently tend to center on the ideas put forth by certain more forceful individuals. Group norms, such as deferring to participants with higher status and power, may also lead to undue emphasis on the opinions of certain members.
3.4.3 The Nominal Group Technique and the Delphi Technique
The nominal group technique (NGT) and the Delphi technique attempt to alleviate some of the disadvantages of working in groups (Delbecq, Van de Ven, & Gustafson, 1975). The nominal group technique consists of asking each member of a group to write down and think about his or her ideas independently. A group moderator then asks each member to present one or more of his or her ideas. Once all of the ideas have been posted, the moderator allows discussion to begin. After the discussion is finished, each participant rates or ranks the ideas presented. These ratings are then used to develop a score for each idea. The nominal group technique is intended to increase participation by group members and is based on the idea that people will be more comfortable presenting their ideas if they have a chance to think about them first (Delbecq et al., 1975). The Delphi technique allows participants to comment anonymously, at their leisure, on proposals made by other group members. Normally, the participants do not know who proposed the ideas they are commenting on. The first step is to send an open-ended questionnaire to members of the group. The results are then used to generate a series of follow-up questionnaires in which more specific questions are asked. The anonymous nature of the Delphi process theoretically reduces the effect of participant status and power. Separating the participants also increases the chance that members will provide opinions “uncontaminated” by the opinions of others.
3.4.4 Structuring Group Decisions
As discussed earlier in this chapter, the field of decision analysis has devised several methods for organizing or structuring the decision-making process. The rational reflection model (Siebold, 1992) is a less formal, six-step procedure that serves a similar function. Group members are asked first to define and limit the problem by identifying goals, available resources, and procedural constraints. After defining and limiting the problem, the group is asked to analyze the problem, collect relevant information, and establish the criteria that a solution must meet. Potential solutions are then discussed in terms of the agreed-upon decision criteria. After further discussion, the group selects a solution and determines how it should be implemented. The focus of this approach is on forcing the group to confine its discussion to the issues that arise at each step in the decision-making process. As such, this method is similar to specifying an agenda. Raiffa (1982) provides a somewhat more formal decision-analytic approach for structuring negotiations. The approach begins by assessing (1) the alternatives to a negotiated settlement; (2) the interests of the involved parties; and (3) the relative importance of each issue. This assessment allows the negotiators to think analytically about mutually acceptable solutions. In certain cases, a bargaining zone is available. For example, an employer may be willing to pay more than the minimum salary acceptable to a potential employee. In this case, the bargaining zone is the range between the minimum salary a potential employee is willing to accept and the maximum salary the employer is willing to pay. The negotiator may also think about means of expanding the available resources to be divided, potential trading issues, or new options that satisfy the interests of the concerned parties. Other methods for structuring group preferences are discussed in Keeney and Raiffa (1976). The development of group utility functions is one such approach. A variety of computer-mediated methods for structuring group decisions are also available.
4 DECISION SUPPORT AND PROBLEM SOLVING
The preceding sections of this chapter have much to say about how to help decision makers make better decisions. To summarize that discussion briefly: (1) classical decision theory
provides optimal prescriptions for how decisions should be made; (2) decision analysis provides a set of tools for structuring decisions and evaluating alternatives; and (3) studies of human judgment and decision making, in both laboratory settings and naturalistic environments, help identify the strengths and weaknesses of human decision makers. These topics directly mirror important elements of decision support. That is, decision support should have an objective (i.e., optimal or satisfactory choices, easier choices, more justifiable choices, etc.). It must also have a means (i.e., decision analysis or another method of decision support) and a current state (i.e., the decision quality, effort expended, knowledge, etc., of the supported decision makers). The effectiveness of decision support can then be defined in terms of how well the means move the current state toward the objective. The focus of this section is on providing an overview of commonly used methods of computer-based decision support²² for individuals, groups, and organizations. Throughout this discussion, an effort is made to address the objectives of each method of support and its effectiveness. Somewhat surprisingly, less information is available on the effectiveness of these approaches than might be expected given their prevalence (see also Yates, Veinott, & Patalano, 2003), so the latter topic is not addressed in much detail. The discussion begins with a brief introduction to the field of decision analysis. Attention then shifts to the topics of decision support systems (DSSs), expert systems, and neural networks. These systems can be designed to support the intelligence, design, or choice phases of decision making (Simon, 1977). The intelligence phase involves scanning and searching the environment to identify problems or opportunities. The design phase entails formulating models for generating possible courses of action.
The choice phase refers to finding an appropriate course of action for the problem or opportunity. In practice, the boundary between the design and choice phases is often unclear. Decision support systems and expert systems can be used to support all three phases of decision making, whereas neural networks tend to be better suited for the design and choice phases. For example, DSSs can be designed to help with interpreting economic conditions, while expert systems can diagnose problems. Neural networks can learn a problem domain, after which they can serve as a powerful aid for decision making. Attention then shifts to methods of supporting decisions by groups and organizations. The latter discussion first addresses the use by groups of DSSs and other tools similar to those used by individuals. In the sections that follow, we address approaches specifically designed for use by groups before briefly discussing the implications of problem-solving research for decision-making research.
4.1 Decision Analysis
The application of classical decision theory to improve human decision making is the goal of decision analysis (Howard, 1968, 1988; Keeney & Raiffa, 1976; Raiffa, 1968). Decision analysis requires inputs from decision makers, such as goals, preference and importance measures, and subjective probabilities. Elicitation techniques have consequently been developed that help decision makers provide these inputs. Particular focus has been placed on methods of quantifying preferences, trade-offs between conflicting objectives, and uncertainty (Keeney & Raiffa, 1976; Raiffa, 1968). As a first step in decision analysis, it is necessary to do some preliminary structuring of the decision, which then guides the elicitation process. The following discussion first presents methods of structuring decisions and then covers techniques for assessing subjective probabilities, utility functions, and preferences.
4.1.1 Structuring Decisions
The field of decision analysis has developed many useful frameworks for representing what is known about a decision (Clemen, 1996; Howard, 1968; von Winterfeldt & Edwards, 1986). In fact, these authors and others have stated that structuring the decision is often the greatest contribution of going through the process of decision analysis. Among the many tools used, decision matrices and trees provide a convenient framework for comparing decisions on the basis of expected value or utility. Value trees provide a helpful method of structuring the sometimes complex relationships among objectives, attributes, goals, and values and are used extensively in multiattribute decision-making problems. Event trees, fault trees, inference trees, and influence diagrams are useful for describing probabilistic relationships between events and decisions. Each of these approaches is discussed briefly below.
Decision Matrices and Trees
Decision matrices are often used to represent single-stage decisions (Figure 4). The simplicity of decision matrices is their primary advantage. They also provide a very convenient format for applying the decision rules discussed in Section 2.1. Decision trees are also commonly used to represent single-stage decisions (Figure 5) and are particularly useful for describing multistage decisions (Raiffa, 1968). Note that in a multistage decision tree, the probabilities of later events are conditioned on the result of earlier events. This leads to the important insight that the results of earlier events provide information regarding future events.²³ Following this approach, decisions may be stated in conditional form. An optimal decision, for example, might be to do a market survey first, then market the product only if the survey is positive. Analysis of a single- or multistage decision tree involves two basic steps, averaging out and folding back (Raiffa, 1968).
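The conditional market-survey decision mentioned above can be sketched as a small two-stage tree. This is a minimal sketch in which every number is hypothetical: a 0.4 prior chance of market success, payoffs of 100 for success and −60 for failure, and a survey costing 5 that comes back positive with probability 0.8 after a success and 0.2 after a failure. The names `rollback` and `best_action` are illustrative. At chance nodes the tree is averaged out (probability-weighted means); at decision nodes it is folded back (only the best action's value is kept). Survey results enter the tree as posterior probabilities computed with Bayes' rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Outcome:
    value: float


@dataclass
class Chance:
    branches: List[Tuple[float, "Node"]]  # (probability, child) pairs


@dataclass
class Decision:
    options: Dict[str, "Node"]  # action name -> child


Node = Union[Outcome, Chance, Decision]


def rollback(node: Node) -> float:
    """Expected value of a node via averaging out and folding back."""
    if isinstance(node, Outcome):
        return node.value
    if isinstance(node, Chance):  # averaging out
        return sum(p * rollback(child) for p, child in node.branches)
    return max(rollback(child) for child in node.options.values())  # folding back


def best_action(node: Decision) -> str:
    return max(node.options, key=lambda a: rollback(node.options[a]))


# Hypothetical inputs.
P_SUCCESS, GAIN, LOSS, SURVEY_COST = 0.4, 100.0, -60.0, 5.0
P_POS_IF_SUCCESS, P_POS_IF_FAILURE = 0.8, 0.2


def market_decision(p_success: float, cost: float) -> Decision:
    return Decision({
        "market": Chance([(p_success, Outcome(GAIN - cost)),
                          (1 - p_success, Outcome(LOSS - cost))]),
        "don't market": Outcome(-cost),
    })


# Bayes' rule: later chance nodes are conditioned on the survey result.
p_pos = P_SUCCESS * P_POS_IF_SUCCESS + (1 - P_SUCCESS) * P_POS_IF_FAILURE
after_positive = market_decision(P_SUCCESS * P_POS_IF_SUCCESS / p_pos, SURVEY_COST)
after_negative = market_decision(P_SUCCESS * (1 - P_POS_IF_SUCCESS) / (1 - p_pos), SURVEY_COST)
root = Decision({
    "survey": Chance([(p_pos, after_positive), (1 - p_pos, after_negative)]),
    "no survey": market_decision(P_SUCCESS, 0.0),
})

print(best_action(root), round(rollback(root), 2))  # survey 19.8
print(best_action(after_positive), "/", best_action(after_negative))  # market / don't market
```

With these assumed numbers the survey is worth its cost (an expected value of 19.8, compared with 4.0 for marketing without it). The folded-back policy is the conditional decision described in the text: market only after a positive survey.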
These steps occur at chance and decision nodes, respectively.²⁴ Averaging out occurs when the expected value (or utility) at each chance node is calculated. In Figure 5 this corresponds to calculating the expected values of A1 and A2, respectively. Folding back refers to choosing the action with the greatest expected value at each decision node. Decision trees thus provide a straightforward way of comparing alternatives in terms of expected value or SEU. However, their development requires significant simplification of most decisions and the provision of numbers, such as measures of preference and subjective probabilities, that decision makers may have difficulty determining. In certain contexts, decision makers struggling with this issue may find it helpful to develop value trees, event trees, or influence diagrams, as expanded on below.
Figure 4 Decision matrix representation of a single-stage decision.
      E1 (P)   E2 (1 − P)
A1    C11      C12
A2    C21      C22
Figure 5 Decision tree representation of a single-stage decision. (Action A1 leads to event E1 with probability P1, giving C11, or E2 with probability 1 − P1, giving C12; action A2 leads to E1 with probability P2, giving C21, or E2 with probability 1 − P2, giving C22.)
Value Trees
Value trees hierarchically organize objectives, attributes, goals, and values (Figure 6). From this perspective, an objective corresponds to satisficing or maximizing a goal or set of goals. When there is more than one goal, the decision maker will have multiple objectives, which may differ in importance. Objectives and goals are both measured on a set of attributes. Attributes may provide (1) objective measures of a goal, such as when fatalities and injuries are used as a measure of highway safety; (2) subjective measures of a goal, such as when people are asked to rate the quality of life in the suburbs versus the city; or (3) proxy or indirect measures of a goal, such as when the quality of ambulance service is measured in terms of response time. In generating objectives and attributes, it becomes important to consider their relevance, completeness, and independence. Desirable properties of attributes (Keeney & Raiffa, 1976) include:
1. Completeness: the extent to which the attributes measure whether an objective is met.
2. Operationality: the degree to which the attributes are meaningful and feasible to measure.
3. Decomposability: whether the whole is described by its parts.
4. Nonredundancy: avoiding correlated attributes, which can give misleading results.
5. Minimum size: avoiding irrelevant attributes, which are expensive to consider and may be misleading.
Once a value tree has been generated, various methods can be used to assess preferences directly between the alternatives.
Figure 6 Generic value tree. (Levels: objectives; attributes 1–3 with goals 1–3; value for alternatives 1–3.)
Event Trees or Networks
Event trees or networks show how a sequence of events can lead from primary events to one or more outcomes. Human reliability analysis (HRA) event trees are a classic example of this approach (Figure 7). If probabilities are attached to the primary events, it becomes possible to calculate the probability of outcomes, as illustrated in Section 4.1.2. This approach has been used in the field of risk assessment to estimate the reliability of human operators and other elements of complex systems (Gertman & Blackman, 1994). Fault trees work backward from a single undesired event to its causes (Figure 8). Fault trees are commonly used in risk assessment to help infer the chance of an accident occurring (Gertman & Blackman, 1994; Hammer, 1993). Inference trees relate a set of hypotheses at the top level of the tree to evidence depicted at the lower levels. The latter approach has been used by expert systems such as Prospector (Duda, Hart, Konolige, & Reboh, 1979). Prospector applies a Bayesian approach to infer the presence of a mineral deposit from uncertain evidence.
Figure 7 HRA event tree. (Branches: operator detects or doesn’t detect the alarm; operator notifies or doesn’t notify the supervisor.) (Source: Adapted from Gertman & Blackman, 1994.)
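If probabilities are attached to the branches of an HRA event tree such as Figure 7, each outcome's probability is the product of the branch probabilities along its path. This is a minimal sketch that assumes hypothetical success probabilities (0.99 for detecting the alarm, 0.95 for notifying the supervisor once the alarm is detected). It also assumes the sequence ends when the alarm is missed. These values are illustrative and are not taken from Gertman and Blackman (1994), and `hra_event_tree` is a hypothetical name.

```python
from typing import Dict


def hra_event_tree(p_detect: float, p_notify_given_detect: float) -> Dict[str, float]:
    """Outcome probabilities for a two-step HRA event tree like Figure 7.

    Each outcome's probability is the product of the branch probabilities on
    its path. Assumes the sequence ends if the alarm is not detected.
    """
    return {
        "detects alarm, notifies supervisor": p_detect * p_notify_given_detect,
        "detects alarm, does not notify": p_detect * (1.0 - p_notify_given_detect),
        "does not detect alarm": 1.0 - p_detect,
    }


# Hypothetical success probabilities, for illustration only.
outcomes = hra_event_tree(p_detect=0.99, p_notify_given_detect=0.95)
for path, p in outcomes.items():
    print(f"{path}: {p:.4f}")
failure = 1.0 - outcomes["detects alarm, notifies supervisor"]
print(f"P(supervisor not notified) = {failure:.4f}")
```

The outcome probabilities sum to 1 because the paths are mutually exclusive and exhaustive. The overall failure probability is therefore the sum of the two failure paths.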
Figure 8 Fault tree for operators. (Top event: operators fail to isolate RCS from DHR, an OR of: operators fail to restore signal power; operators fail to restore power to control circuits; operators fail to take appropriate control actions. The last is an AND of: operator fails to close valve 1; operator fails to close valve 2.) (Source: Adapted from Gertman & Blackman, 1994.)
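A fault tree is evaluated by combining the probabilities of its basic events through its gates, working upward toward the top event. The sketch below follows the structure of Figure 8. It assumes hypothetical basic-event probabilities and independent basic events. Under those assumptions, an AND gate multiplies its input probabilities, and an OR gate is one minus the product of the complements. Treating the two valve errors as independent is a simplification, since errors by the same operator may well be correlated.

```python
from math import prod


def and_gate(*probs: float) -> float:
    """Output event occurs only if every input event occurs (independence assumed)."""
    return prod(probs)


def or_gate(*probs: float) -> float:
    """Output event occurs if at least one input occurs: 1 - P(no input occurs)."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical basic-event probabilities, for illustration only.
p_valve_1 = 0.01
p_valve_2 = 0.02
p_signal_power = 0.001
p_control_power = 0.002

# Work upward from the basic events to the top event of Figure 8.
p_control_actions = and_gate(p_valve_1, p_valve_2)
p_top = or_gate(p_signal_power, p_control_power, p_control_actions)
print(f"P(fail to take control actions) = {p_control_actions:.4f}")
print(f"P(fail to isolate RCS from DHR) = {p_top:.6f}")
```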
Influence Diagrams and Cognitive Mapping
Influence diagrams are often used in the early stages of a decision to show how events and actions are related. Their use in the early stages of a decision is referred to as knowledge (or cognitive) mapping (Howard, 1988). Links in an influence diagram depict causal and temporal relations between events and decision stages.²⁵ A link leading from event A to event B implies that the probability of obtaining event B depends on whether event A has occurred. A link leading from a decision to an event implies that the probability of the event depends on the choice made at that decision stage. A link leading from an event to a decision implies that the decision maker knows the outcome of the event at the time the decision is made. One advantage of influence diagrams in comparison to decision trees is that influence diagrams show the relationships between events more explicitly. Consequently, influence diagrams are often used to represent complicated decisions where events interactively influence the outcomes. For example, the influence diagram in Figure 9 shows that the true state of the machine affects both the probability of the warning signal and the consequence of the operator’s decision. This linkage would be hidden within a decision tree.²⁶ Influence diagrams have been used to structure medical decision-making problems (Holtzman, 1989) and are emphasized in modern texts on decision analysis (Clemen, 1996). Howard (1988) states that influence diagrams are the greatest advance he has seen in the communication, elicitation, and detailed representation of human knowledge. Part of the reason is that influence diagrams allow people who do not have deep knowledge of probability to describe complex conditional relationships with simple linkages between events. Once these linkages are defined, the decision becomes well defined and can be formally analyzed.
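The formal analysis can be sketched for the machine-shutdown decision of Figure 9. This is a minimal sketch with hypothetical numbers: a 0.9 prior that the machine is in tolerance, a warning-signal probability of 0.1 when it is in tolerance and 0.8 when it is not, and the payoff table in the code. All names and values are illustrative. Because a link runs from the warning signal to the shutdown decision, the operator picks an action for each signal value and never observes the machine state directly. Each action is scored by weighting the payoffs with the joint probability of each machine state and the observed signal.

```python
from typing import Dict, Tuple

# Hypothetical numbers for Figure 9.
P_IN_TOLERANCE = 0.9
P_SIGNAL = {True: 0.1, False: 0.8}  # P(warning signal | machine in tolerance?)
PAYOFF: Dict[Tuple[bool, bool], float] = {
    # (shut down?, machine in tolerance?) -> payoff
    (False, True): 100.0,    # keep running a good machine
    (False, False): -200.0,  # keep running a faulty machine
    (True, True): -20.0,     # needless shutdown
    (True, False): 0.0,      # shutdown that prevents damage
}


def joint(in_tolerance: bool, signal: bool) -> float:
    """P(machine state, signal), following the state -> signal link."""
    p_state = P_IN_TOLERANCE if in_tolerance else 1.0 - P_IN_TOLERANCE
    p_sig = P_SIGNAL[in_tolerance]
    return p_state * (p_sig if signal else 1.0 - p_sig)


def solve() -> Tuple[Dict[bool, bool], float]:
    """Best shutdown choice for each signal value, and the policy's expected payoff.

    Weighting payoffs by joint probabilities equals P(signal) times the
    conditional expected payoff, so the best action per signal is unchanged.
    """
    policy: Dict[bool, bool] = {}
    expected = 0.0
    for signal in (True, False):
        value = {
            shut: sum(joint(s, signal) * PAYOFF[(shut, s)] for s in (True, False))
            for shut in (True, False)
        }
        policy[signal] = max(value, key=value.get)
        expected += value[policy[signal]]
    return policy, expected


policy, expected = solve()
print("shut down if signal:", policy[True], "| shut down if no signal:", policy[False])
print("expected payoff:", round(expected, 2))
```

With these assumed numbers, the best policy is to shut the machine down only when the warning appears, for an expected payoff of about 75.2. Changing the signal reliabilities or the payoffs changes the policy without changing the diagram's structure.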
Figure 9 Influence diagram representation of a single-stage decision. (Nodes: Machine in tolerance?, Warning signal?, Shut down machine?, and Payoff.)
4.1.2 Utility Function Assessment
Standard methods for assessing utility functions (Raiffa, 1968) include (1) the variable probability method and (2) the certainty equivalent method. In the variable probability method, the decision maker is asked to give the probability of winning at which they are indifferent between a gamble and a certain outcome (Figure 10). A utility function is then mapped out by repeating this question as the certain outcome, or certainty equivalent (CE), is varied over the range of outcomes. Returning to Figure 10, the value of P at which the decision maker is indifferent between the gamble and the certain loss of $50 gives the value for u(−$50). For the utility function in Figure 11, the decision maker gave a value of about 0.5 in response to this question.
Figure 10 Standard gamble used in the variable probability method of eliciting utility functions. (A gamble paying $100 with probability P and −$100 with probability 1 − P is compared with the certain outcome CE = −$50.)
Figure 11 Typical utility function. (Utility u(x), from 0 to 1.0, plotted against outcomes x from −$100 to $100.)
The certainty equivalent method uses lotteries in a similar way. The major difference is that the probability of winning or losing the lottery is held constant while the amount won or lost is changed. In most cases the lottery provides an equal chance of winning and losing (p = 0.5). The method begins by asking the decision maker to give a certainty equivalent for the original lottery (CE1). The value chosen has a utility of 0.5, because the utility of the best outcome is assigned a value of 1 and the worst is given a utility of 0. The utility of the original gamble is therefore
$$u(\mathrm{CE}_1) = p\,u(\text{best}) + (1 - p)\,u(\text{worst}) = p(1) + (1 - p)(0) = p = 0.5 \tag{9}$$
The decision maker is then asked to give certainty equivalents for two new lotteries. Each uses the CE from the previous lottery as one of the potential prizes. The other prizes used in the two lotteries are the best and worst outcomes from the original lottery, respectively. The utility of the certainty equivalent (CE2) for the lottery using the best outcome and CE1 is given by
$$u(CE_2) = p\,u(\text{best}) + (1-p)\,u(CE_1) = p(1) + (1-p)(0.5) = 0.75 \tag{10}$$
The utility of the certainty equivalent (CE3) for the lottery using CE1 and the worst outcome is given by
$$u(CE_3) = p\,u(CE_1) + (1-p)\,u(\text{worst}) = p(0.5) + (1-p)(0) = 0.25 \tag{11}$$
This process is continued until the utility function is specified in sufficient detail. A problem with the certainty equivalent method is that errors are compounded as the analysis proceeds, because the utility assigned in the first preference assessment (i.e., u(CE1)) is used throughout the subsequent preference assessments. A second issue is that the CE method uses different ranges in the indifference lotteries, so the CEs are compared against different reference values. This might create inconsistencies since, as discussed in Section 2.2, attitudes toward risk usually change depending on whether outcomes are viewed as gains or losses, and the use of different reference points may cause the same outcome to be viewed as either a loss or a gain. Utilities may also vary over time. Some of these issues are discussed further in Section 2.2.2.
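The chaining in Equations (9) to (11) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `respond` stands in for the human judgment, and here it is a simulated respondent whose hidden utility function `hidden_u` is hypothetical, chosen only so that the numbers are easy to check. Each elicited CE is assigned 0.5 u(high) + 0.5 u(low), exactly as in the equations, so the CE obtained in the first round feeds every later point, which is how early errors propagate.

```python
import math


def certainty_equivalent_method(best, worst, respond, rounds=2):
    """Chain even-chance (p = 0.5) lotteries as in Equations (9)-(11).

    respond(high, low) must return the respondent's certainty equivalent for a
    lottery paying `high` or `low` with equal probability. Each new CE becomes a
    prize in the next round, so an early judgment error propagates to later points.
    """
    utility = {best: 1.0, worst: 0.0}  # endpoints fixed by convention
    lotteries = [(best, worst)]
    for _ in range(rounds):
        next_lotteries = []
        for high, low in lotteries:
            ce = respond(high, low)
            # Expected-utility identity: u(CE) = 0.5 * u(high) + 0.5 * u(low)
            utility[ce] = 0.5 * utility[high] + 0.5 * utility[low]
            next_lotteries += [(high, ce), (ce, low)]
        lotteries = next_lotteries
    return dict(sorted(utility.items()))


# Hypothetical respondent with a concave (risk-averse) utility over -$100..$100.
def hidden_u(x):
    return math.sqrt((x + 100) / 200)


def hidden_u_inverse(u):
    return 200 * u ** 2 - 100


def simulated_respondent(high, low):
    return hidden_u_inverse(0.5 * hidden_u(high) + 0.5 * hidden_u(low))


if __name__ == "__main__":
    for x, u in certainty_equivalent_method(100, -100, simulated_respondent).items():
        print(f"u({x:+.1f}) = {u:.3f}")
```

With this hypothetical respondent, the first round returns CE1 = −$50 (utility 0.5), and the second returns CE2 = $12.50 (0.75) and CE3 = −$87.50 (0.25). Calling the function with `rounds=3` would add four more points, each resting on the earlier answers.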
HUMAN FACTORS FUNDAMENTALS
4.1.3 Preference Assessment
Methods for measuring strength of preference include indifference methods, direct assessment, and indirect measurement (Keeney & Raiffa, 1976; von Winterfeldt & Edwards, 1986). Indifference methods modify one of two sets of stimuli until subjects feel that they are indifferent between the two. Direct-assessment methods ask subjects to rate or otherwise assign numerical values to attributes, which are then used to obtain preferences for alternatives. Indirect-measurement techniques avoid decomposition and simply ask for preference orderings between alternatives. Each of these approaches is expanded upon below, and examples are given illustrating how they can be used. There has been some movement toward evaluating the effectiveness of particular methods for measuring preferences (Birnbaum, Coffey, Mellers, & Weiss, 1992; Huber, Wittink, Fiedler, & Miller, 1993). Indifference Methods Indifference methods are illustrated by the variable probability and certainty equivalent methods of eliciting utility functions presented in Section 4.1.2. There, indifference points were obtained by varying either probabilities or values of outcomes. Similar approaches have been applied to develop multiattribute utility or value functions. This approach involves four steps: (1) develop the single-attribute utility or value functions; (2) assume a functional form for the multiattribute function; (3) assess the indifference point between various multiattribute alternatives; and (4) calculate the substitution rate or relative importance of one attribute compared to the other. The single-attribute functions might be developed by indifference methods (i.e., the variable probability or certainty equivalent methods) or by direct-assessment methods, as discussed later. Indifference points between multiattribute outcomes are obtained through an interactive process in which the values of attributes are increased or decreased systematically.
Substitution rates are then obtained from the indifference points. For example, consider two alternative traffic safety policies, A1 and A2. Each policy has two attributes, x = lives lost and y = money spent. Assume that the decision maker is indifferent between A1 and A2, meaning the decision maker feels that v(x1, y1) = v(20,000 deaths; $1 trillion) is equivalent to v(x2, y2) = v(10,000 deaths; $1.5 trillion). For the sake of simplicity, assume an additive value function, v(x, y) = (1 − k)vx(x) + kvy(y). Given this functional form, the indifference point A1 = A2 is used to derive the relation
$$(1-k)\,v_x(20{,}000\ \text{deaths}) + k\,v_y(\$1 \times 10^{12}) = (1-k)\,v_x(10{,}000\ \text{deaths}) + k\,v_y(\$1.5 \times 10^{12}) \tag{12}$$
This results in the substitution rate
$$\frac{k}{1-k} = \frac{v_x(20{,}000\ \text{deaths}) - v_x(10{,}000\ \text{deaths})}{v_y(\$1.5 \times 10^{12}) - v_y(\$1 \times 10^{12})} \tag{13}$$
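Equation (13) can be evaluated directly once single-attribute value functions are chosen. The sketch below assumes the linear functions vx(x) = −x and vy(y) = −y that the text uses next; the helper names `substitution_ratio` and `weight_from_ratio` are ours, not part of any standard library.

```python
def substitution_ratio(v_x, v_y, a1, a2):
    """k / (1 - k) from Equation (13) for two alternatives judged indifferent.

    a1 and a2 are (x, y) pairs: x = lives lost, y = dollars spent.
    """
    (x1, y1), (x2, y2) = a1, a2
    return (v_x(x1) - v_x(x2)) / (v_y(y2) - v_y(y1))


def weight_from_ratio(ratio):
    """Solve r = k / (1 - k) for k."""
    return ratio / (1.0 + ratio)


A1 = (20_000, 1.0e12)   # 20,000 deaths; $1 trillion
A2 = (10_000, 1.5e12)   # 10,000 deaths; $1.5 trillion

ratio = substitution_ratio(lambda x: -x, lambda y: -y, A1, A2)
k = weight_from_ratio(ratio)
# Implied trade-off at the indifference point: extra dollars per life saved.
dollars_per_life = (A2[1] - A1[1]) / (A1[0] - A2[0])
print(f"k/(1-k) = {ratio:.3g}, k = {k:.3g}, ${dollars_per_life:,.0f} per life")
```

The tiny weight on dollars reflects the units: at this indifference point, the decision maker implicitly accepts $0.5 trillion of extra spending to save 10,000 lives, or $50 million per life.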
If vx = −x and vy = −y, Equation (13) gives k/(1 − k) = (−10,000)/(−5 × 10^11) = 2 × 10^−8, so k ≈ 2 × 10^−8. The procedure becomes somewhat more complex when nonadditive forms are assumed for the multiattribute function (Keeney & Raiffa, 1976). Direct-Assessment Methods Direct-assessment methods include curve fitting and various numerical rating methods (von Winterfeldt & Edwards, 1986). Curve fitting is perhaps the simplest approach. Here, the decision maker first orders the levels of an attribute and then simply draws a curve assigning values to them. For example, an expert might draw a curve relating levels
of traffic noise (measured in decibels) to their level of annoyance (on a scale of 0–1). Rating methods, as discussed earlier in reference to subjective probability assessment, include direct numerical measures on rating scales and relative ratings. The analytic hierarchy process (AHP) provides one of the more implementable methods of this type (Saaty, 1990). In this approach, the decision is first structured as a value tree (see Figure 6). Then the attributes are compared in terms of importance in a pairwise rating process. When entering the ratings, decision makers can enter numerical ratios (e.g., an attribute might be twice as important as another) or use the subjective verbal anchors mentioned earlier in reference to subjective probability assessment. The AHP program uses the ratings to calculate a normalized eigenvector assigning importance or preference weights to each attribute. Each alternative is then compared on the separate attributes. For example, two houses might first be compared in terms of cost and then be compared in terms of attractiveness. This results in another set of eigenvectors describing how well each alternative satisfies each attribute. These two sets of eigenvectors are then combined, by weighting each alternative's score on an attribute by that attribute's weight and summing, into a single vector that orders alternatives in terms of preference. The subjective multiattribute rating technique (SMART) developed by Edwards (see von Winterfeldt & Edwards, 1986) provides a similar, easily implemented approach. Both techniques are computerized, making the assessment process relatively painless. Direct-assessment approaches have been relied on extensively by product designers who would like to improve perceived product quality. For example, the automotive industry in the United States has for many years made extensive use of a structured approach for improving product and service quality called quality function deployment (QFD).
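Returning to the AHP computation described above (pairwise ratios, a normalized eigenvector for each matrix, then a weighted combination), a minimal sketch follows. The criteria, houses, and ratio judgments are hypothetical, and the eigenvector is found by simple power iteration, which is one common way to compute it; AHP software may differ in details and typically also reports a consistency ratio, which is omitted here.

```python
def principal_eigenvector(matrix, max_iter=100, tol=1e-12):
    """Normalized principal eigenvector (entries sum to 1) of a positive
    pairwise-comparison matrix, found by power iteration."""
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(max_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        w = [x / total for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


# Hypothetical house-buying example: entry [i][j] says how many times more
# important (or preferred) item i is than item j.
CRITERIA = ["cost", "attractiveness", "location"]
CRITERIA_MATRIX = [[1, 2, 6], [1 / 2, 1, 3], [1 / 6, 1 / 3, 1]]
HOUSE_MATRICES = {                      # house A vs. house B on each criterion
    "cost": [[1, 3], [1 / 3, 1]],
    "attractiveness": [[1, 1 / 2], [2, 1]],
    "location": [[1, 1], [1, 1]],
}


def ahp_scores():
    weights = principal_eigenvector(CRITERIA_MATRIX)
    local = {c: principal_eigenvector(HOUSE_MATRICES[c]) for c in CRITERIA}
    # Synthesis: weight each alternative's local score by the criterion weight.
    overall = [sum(w * local[c][i] for w, c in zip(weights, CRITERIA)) for i in range(2)]
    return weights, overall


if __name__ == "__main__":
    weights, overall = ahp_scores()
    print([round(w, 3) for w in weights], [round(s, 3) for s in overall])
```

Because these hypothetical matrices are perfectly consistent (for example, 2 × 3 = 6 in the first row), the criteria weights come out as 0.6, 0.3, and 0.1, and house A scores about 0.6 against 0.4 for house B.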
QFD is defined as "converting the consumers' demands into quality characteristics and developing a design quality for the finished product by systematically deploying the relationships between the demands and the characteristics, starting with the quality of each functional component and extending the deployment to the quality of each part and process" (Akao, 2004, p. 5). The House of Quality in Figure 12, based on Lehto and Lehto (2019), succinctly represents how QFD works for hotel wellness quality, although QFD is not merely a diagram but rather a quality improvement procedure. The matrix at the center of Figure 12 combines customer attributes (consumers' demands, such as cleanliness and physical activity) as rows with engineering characteristics (such as housekeeping staff and exercise facilities) as columns. By associating these two sets of attributes in a matrix, it is possible to identify which characteristics of a product or service should be improved in order to increase consumer satisfaction. Above the engineering characteristics, a triangular matrix resembling the roof of a house shows the relationships between different engineering characteristics. Sometimes two engineering characteristics can be improved together, but at other times they conflict (e.g., power vs. fuel efficiency of an automobile). Understanding the relationships between engineering characteristics can therefore support informed decision making. Various methods have been used to capture customer attributes (consumers' needs). First, the most important user segment is usually defined; existing customer databases are then analyzed, and market research is conducted using focus groups, interviews, and surveys. During this phase, consumers' needs are often described in their own words, such as "no wind noise in room." These descriptions can be fuzzy, and fuzzy set theory can help quantify the fuzziness so that further quantitative analysis can be done (Lehto & Buck, 2007). Web technologies have also been used to collect and pool customer ratings efficiently, an approach called collaborative filtering (e.g., Rashid et al., 2002; Schafer, Konstan, & Riedl, 2001). These techniques have been actively applied on e-commerce websites, such as amazon.com and netflix.com.
DECISION-MAKING MODELS, DECISION SUPPORT, AND PROBLEM SOLVING
Figure 12 Example House of Quality for hotel wellness. (Source: Based on Lehto & Lehto, 2019.) [Rows: customer attributes grouped as room condition, facilities & environment, and service (e.g., comfortable, safe & secure, clean, restful/quiet, physical activity, walkability, helpful, friendly, responsive). Columns: engineering characteristics (wellness quality requirements) such as room amenities, air filtering, air conditioning, bedding, lighting, noise control, housekeeping staff, security staff, exercise facilities, swimming pool, and urban environment conditions. Roof: strong/medium positive and negative relationships. Side and bottom panels: customer perception (1 is poor; 5 is very good) and technical competitive benchmark evaluation, our hotel vs. Hotel X.]
Indirect Measurement Indirect-measurement techniques avoid asking people to rate or directly rank the importance of
factors that affect their preferences. Instead, subjects simply state or order their preferences for different alternatives. A variety of approaches can then be used to determine how individual factors influence preference. Conjoint analysis provides one such approach for separating the effects of multiple factors when only their joint effects are known. Conjoint analysis is "a technique for measuring trade-offs for analyzing survey responses concerning preferences and intentions to buy, and it is a method for simulating how consumers might react to changes in current products or to new products introduced into an existing competitive array" (Green, Krieger, & Wind, 2001, p. S57). It has been used successfully in both academia and industry to understand consumers' preferences and intentions (Green et al., 2001). For example, Marriott's Courtyard Hotels (Wind, Green, Shifflet, & Scarbrough, 1989) and the New York EZ-Pass system (Vavra, Green, & Krieger, 1999) were designed using conjoint analysis, and both illustrate its usefulness. There are different types of conjoint analysis. One of the simplest is the full-profile study. In a full-profile study, profiles of a product (e.g., a home) having a relatively small number of attributes (four to five) are shown to survey respondents, who sort or rate the profiles. The order or rating scores of the profiles are used to investigate which attributes are more likely to influence the respondent's decision. Figure 13 shows two examples of profiles. Because each attribute can have multiple levels and the profiles should cover all combinations, the number of profiles that a respondent needs to compare grows exponentially even with a small number of attributes. For example, if each of five attributes has three levels, the number of profiles becomes 243 (= 3^5). This combinatorial nature restricts the number of attributes in full-profile studies. Common solutions to this problem are to use a partial set of profiles (Green & Krieger, 1990) or to ask respondents which attributes are more important or which attribute levels are more desirable, to decrease the number of profiles (Green & Krieger, 1987). More recent advances include adaptive conjoint analysis (Huber & Zwerina, 1996) and fast polyhedral adaptive conjoint estimation (Toubia et al., 2003), both of which adaptively cut down the number of questions in a conjoint analysis using the respondent's responses.
Figure 13 Profile cards describe services that a credit card could offer. (Source: Adapted from Green et al., 2001.)
Card 1: annual price $20; cash rebate none; retail purchase insurance none; rental car insurance $30,000; baggage insurance $25,000; airport club admission $2 per visit; medical–legal no; airport limousine not offered.
Card 2: annual price $50; cash rebate 0.5%; retail purchase insurance none; rental car insurance $30,000; baggage insurance none; airport club admission $5 per visit; medical–legal yes; airport limousine 20% discount.
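To make the full-profile idea concrete, the sketch below enumerates a full factorial design and recovers part-worths from one respondent's ratings. The attributes and levels are hypothetical (loosely echoing Figure 13), the respondent is simulated with known additive part-worths, and the estimator (level mean minus grand mean) is valid only for a complete, balanced factorial with a main-effects model; partial designs or choice data would need regression or choice models instead.

```python
from itertools import product
from statistics import mean

# Hypothetical attributes and levels (illustrative only).
LEVELS = {
    "annual_price": ["$20", "$50", "$80"],
    "cash_rebate": ["none", "0.5%", "1%"],
    "airport_limousine": ["not offered", "20% discount", "free"],
}


def full_factorial(levels):
    """Every combination of attribute levels, one dict per profile."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


def estimate_part_worths(levels, profiles, ratings):
    """Main-effects part-worths for a full factorial design: mean rating at each
    level minus the grand mean (equivalent to effects-coded least squares when
    every level combination appears exactly once)."""
    grand = mean(ratings)
    return {
        attr: {
            level: mean(r for p, r in zip(profiles, ratings) if p[attr] == level) - grand
            for level in attr_levels
        }
        for attr, attr_levels in levels.items()
    }


def relative_importance(part_worths):
    """Range of each attribute's part-worths as a share of the summed ranges."""
    ranges = {a: max(pw.values()) - min(pw.values()) for a, pw in part_worths.items()}
    total = sum(ranges.values())
    return {a: r / total for a, r in ranges.items()}


# A simulated respondent whose ratings follow hypothetical additive part-worths.
TRUE_PART_WORTHS = {
    "annual_price": {"$20": 1.0, "$50": 0.0, "$80": -1.0},
    "cash_rebate": {"none": -0.5, "0.5%": 0.0, "1%": 0.5},
    "airport_limousine": {"not offered": -0.2, "20% discount": 0.0, "free": 0.2},
}

profiles = full_factorial(LEVELS)
ratings = [5 + sum(TRUE_PART_WORTHS[a][p[a]] for a in p) for p in profiles]
part_worths = estimate_part_worths(LEVELS, profiles, ratings)
importance = relative_importance(part_worths)
print(len(profiles), 3 ** 5)  # 27 profiles here; 243 for five 3-level attributes
print({a: round(v, 3) for a, v in importance.items()})
```

Three attributes with three levels already give 27 profiles; five such attributes would give 3^5 = 243, the combinatorial growth described above. The relative-importance shares rank price first, then rebate, then limousine service, mirroring the hypothetical part-worths.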
Related applications include the dichotomy-cut method, used to obtain decision rules for individuals and groups from ordinal rankings of multiattribute alternatives (Stanoulov, 1994). However, in spite of more than three decades of success and evolution, conjoint analysis still has room for improvement (Bradlow, 2005). Respondents may change their preference structure, but this aspect has not been systematically considered in many conjoint analysis studies. Conjoint analysis often burdens research participants by asking too many questions, even though various techniques (e.g., adaptive conjoint analysis and choice-based conjoint analysis) have been suggested to cut down the number of questions to be answered. Findings in behavioral research have also not yet been fully reflected in conjoint analysis. For example, consumers actually use various heuristics and strategies to cut down the number of candidates (or profiles, in the context of conjoint analysis), but many conjoint analysis studies do not accommodate these aspects.
The policy-capturing approach used in social judgment theory (Hammond, 1993; Hammond et al., 1975) is another indirect approach for describing human judgments of both preferences and probability. The policy-capturing approach uses multivariate regression or similar techniques to relate preferences to attributes for one or more decision makers. The equations obtained correspond to policies followed by particular decision makers; for example, an equation might relate medical symptoms to a physician's diagnosis. It has been argued that the policy-capturing approach measures the influence of factors on human judgments more accurately than do decomposition methods. Captured weights might be more accurate because decision makers may have little insight into the factors that affect their judgments (Valenzi & Andrews, 1973). People may also weigh certain factors in ways that reflect social desirability rather than actual influence on their judgments (Brookhouse, Guion, & Doherty, 1986). For example, people comparing jobs might rate pay as being lower in importance than intellectual challenge, whereas their preferences between jobs might be predicted entirely by pay. Caution must also be exercised when interpreting regression weights as indicating importance, since regression coefficients are influenced by correlations between factors, their variability, and their validity (Stevenson, Busemeyer, & Naylor, 1993). There has been some movement toward evaluating the effectiveness of particular methods for measuring preferences (Birnbaum et al., 1992; Huber et al., 1993). However, the validity of direct versus indirect assessment remains an area of continuing controversy.
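Policy capturing can be illustrated with the job-choice example above. The sketch fits an ordinary least-squares equation relating a simulated respondent's ratings to two cues; the job profiles, the rating rule, and the helper names are hypothetical. The cues are deliberately uncorrelated (a full 3 × 3 grid), because, as noted above, correlated cues make regression weights hard to read as importance.

```python
def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def capture_policy(cues, judgments):
    """Ordinary least squares via the normal equations: judgment ~ b0 + b1*cue1 + ..."""
    X = [[1.0, *row] for row in cues]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, judgments)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical job offers described by (pay, intellectual challenge) on a 1-9 scale.
jobs = [(pay, challenge) for pay in (3, 5, 7) for challenge in (1, 5, 9)]
# Simulated respondent whose ratings depend only on pay.
ratings = [2.0 + 0.5 * pay for pay, _ in jobs]

intercept, w_pay, w_challenge = capture_policy(jobs, ratings)
print(f"intercept={intercept:.2f}, pay={w_pay:.2f}, challenge={w_challenge:.2f}")
```

Even if this respondent said that intellectual challenge mattered most, the captured policy puts a weight of 0.5 on pay and essentially zero on challenge.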
One conclusion that might be drawn is that it is not clear that any of the quantitative methods described above has adequate descriptors, factors, and methods to account for the dynamic characteristics (e.g., emerging consumer knowledge and reactions between competing opinions) of complex issues, such as making decisions about energy sources and consumption patterns in response to climate change and environmental concerns.
4.2 Individual Decision Support
The concept of decision support systems (DSSs) dates back to the early 1970s. It was first articulated by Little (1970) under the term decision calculus and by Scott-Morton (1977) under the term management decision systems. DSSs are interactive computer-based systems that help decision makers use data and models to solve unstructured or semistructured problems (Keen & Scott-Morton, 1978; Scott-Morton, 1977). Given the unstructured nature of these problems, the goal of such systems is to support, rather than replace, human decision making. The three key components of a DSS are (1) a model base, (2) a database, and (3) a user interface. The model base
comprises quantitative models (e.g., financial or statistical models) that provide the analysis capabilities of DSSs. The database manages and organizes the data in meaningful formats that can be extracted or queried. The user interface component manages the dialogue between the DSS and its users; for example, visualization tools can be used to facilitate communication between the DSS and the users. DSSs are generally classified into two types: model-driven and data-driven. Model-driven DSSs use a collection of mathematical and analytical models for decision analysis. Examples include forecasting and planning models, optimization models, and sensitivity analysis models (i.e., for asking "what-if" questions). The analytical capabilities of such systems are powerful because they are based on strong theories or models. Data-driven DSSs, in contrast, analyze large quantities of data to extract useful information. The data may be derived from transaction-processing systems, enterprise systems, data warehouses, or Web warehouses, and online analytical processing and data mining can be used to analyze them. Multidimensional data analysis enables users to view the same data in different ways using multiple dimensions, such as product, salesperson, price, region, and time period. Data mining refers to a variety of techniques that can be used to find hidden patterns and relationships in large databases and to infer rules from them to guide decision making and predict future behavior. Data mining can yield information on associations, sequences, classifications, clusters, and forecasts (Laudon & Laudon, 2003). Associations are occurrences linked to a single event (e.g., beer is purchased along with diapers); sequences are events linked over time (e.g., the purchase of a new oven after the purchase of a house).
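Associations such as the beer-and-diapers example are commonly quantified by support (the share of transactions containing all the items) and confidence (the share of transactions containing the antecedent that also contain the consequent). The sketch below computes both for hypothetical shopping baskets and brute-forces all one-item rules; it illustrates the measures only, not a production algorithm such as Apriori, and the thresholds are arbitrary.

```python
from itertools import permutations

# Hypothetical market-basket transactions (illustrative only).
TRANSACTIONS = [
    frozenset({"beer", "diapers", "chips"}),
    frozenset({"beer", "diapers"}),
    frozenset({"diapers", "milk"}),
    frozenset({"beer", "chips"}),
    frozenset({"milk", "bread"}),
]


def support(itemset, transactions=TRANSACTIONS):
    """Share of transactions containing every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions=TRANSACTIONS):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def pair_rules(min_support=0.4, min_confidence=0.6, transactions=TRANSACTIONS):
    """All one-item -> one-item rules meeting both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support({a, b}, transactions)
        if s >= min_support:
            c = confidence({a}, {b}, transactions)
            if c >= min_confidence:
                rules.append((a, b, s, c))
    return rules


for a, b, s, c in pair_rules():
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

In these baskets, beer and diapers appear together in 40% of transactions, and two thirds of beer purchases include diapers.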
Classifications refer to recognizing patterns and rules to assign an item or object to a predefined group (e.g., customers who are likely to default on loans); clustering refers to sorting items or objects into groups that have not yet been defined (e.g., identifying customers with similar preferences). Data mining can also be used for forecasting (e.g., projecting sales demand). Despite the popularity of DSSs, few data are available documenting that they improve decision making (Yates, Veinott, & Patalano, 2003). It does seem logical that DSSs should play a useful role in reducing biases (see Section 2.2.2) and otherwise improving decision quality, because a well-designed DSS will increase both the amount and quality of information available to the decision maker. A well-designed DSS will also make it easier to analyze the information with sophisticated modeling techniques. Ease of use is another important consideration. As discussed earlier, Payne et al. (1993) identify two factors influencing the selection of a decision strategy: (1) the cognitive effort the strategy requires and (2) the accuracy of the strategy in yielding a "good" decision. Todd and Benbasat (1991, 1992, 1999) found that DSS users adapted their strategy selection to the type of decision aids available in such a way as to reduce effort. In other words, effort minimization was a more important consideration for DSS users than decision quality. Effort may therefore have a direct impact on DSS effectiveness and must be taken into account in the design of DSSs.
4.2.1 Expert Systems
Expert systems are developed to capture knowledge for a very specific and limited domain of human expertise.
Expert systems can provide the following benefits: cost reduction, increased output, improved quality, consistency of employee output, reduced downtime, captured scarce expertise, flexibility in providing services, easier operation of equipment, increased reliability, faster response, ability to work with incomplete and uncertain information, improved training, increased ability to solve complex problems, and better use of expert time. Organizations routinely use expert systems to enhance the productivity and skill of human knowledge workers across a spectrum of business and professional domains. Expert systems are computer programs capable of performing specialized tasks based on an understanding of how human experts perform the same tasks, and they typically operate in narrowly defined task domains. Despite the name, few expert systems are targeted at replacing their human counterparts; most are designed to function as assistants or advisers to human decision makers. Indeed, the most successful expert systems (those that actually address mission-critical business problems) are not "experts" as much as "advisors" (LaPlante, 1990). An expert system is organized so that knowledge about the problem domain is separated from general problem-solving knowledge. The collection of domain knowledge is called the knowledge base, whereas the general problem-solving knowledge is called the inference engine. The knowledge base stores domain-specific knowledge in the form of facts and rules. The inference engine operates on the knowledge base by performing logical inferences, deducing new knowledge when it applies rules to facts. Expert systems are also capable of providing explanations to users. As pointed out by Yates et al. (2003), the large number of expert systems in actual use suggests that they are by far the most popular form of computer-based decision support. However, as for DSSs, few data are available showing that expert systems improve decision quality. Ease of use is probably one of the main reasons for their popularity, because the user of an expert system can take a relatively passive role in the problem-solving process.
That is, the expert system asks a series of questions, which the user simply answers if he or she can. The ability of most expert systems to answer questions and explain their reasoning can also help users understand what the system is doing and confirm the validity of its recommendations. Such give-and-take may make users more comfortable with an expert system than with models that make sophisticated mathematical calculations that are difficult to verify.
4.2.2 Machine Learning-Based Decision Support
With recent advances in computing capabilities and the availability of cloud-based systems such as Amazon Web Services or Microsoft Azure, machine learning-based decision support systems are becoming increasingly popular. Popular machine learning models such as naïve Bayes, support vector machines, neural networks, and logistic regression have been found to be effective for classification tasks involving large volumes of multidimensional data. Trained on a set of historical cases, these algorithms can be used to solve or analyze a multitude of problems that would take considerably more time and effort to handle manually. The outputs or recommendations of a machine learning-based system can then be used to support human decision making. Such systems can not only suggest the most probable answers but also estimate the probability that each suggested answer is correct, which helps decision makers judge how far to rely on it. Machine learning-based systems have been developed to predict customer responses to direct marketing (Cui & Wong, 2004), to forecast stock returns (Jasic & Wood, 2004; Olson & Mossman, 2003; Sapena et al., 2003), to assess product quality in the metallurgical industry (Zhou & Xu, 1999), to categorize discussion forum posts (Cui & Wise, 2015; Nanda & Douglas, 2019), and to support decision making on sales forecasting (Kuo & Xue, 1998). Machine learning-based decision support
systems are particularly useful for classification tasks that need high accuracy and are time sensitive, such as assigning diagnostic codes to emergency room hospitalization records (Nanda, Vallmuur, & Lehto, 2019, 2020). These systems can reduce the time it takes to process simple decision cases and help experts focus on complicated cases that require more attention.
4.2.3 Visual Analytics
As the amount and complexity of available information continue to grow, selecting and understanding relevant information become more and more challenging. To deal with massive, complex, and heterogeneous information, researchers have tried to exploit vision, the highest-bandwidth human sensory channel, and "visual analytics" has recently been proposed as a separate research field. A commonly accepted definition of visual analytics is "the science of analytical reasoning facilitated by interactive visual interfaces" (Thomas & Cook, 2005, p. 4). More specifically, Keim et al. (2008a, p. 4) detailed the goals of visual analytics as follows:
• Synthesize information and derive insight from massive, dynamic, ambiguous, and often conflicting data.
• Detect the expected and discover the unexpected.
• Provide timely, defensible, and understandable assessments.
• Communicate assessment effectively for action.
Visual analytics overlaps substantially with many disciplines, such as information visualization, human factors, data mining and management, decision making, and statistical analysis, to name a few. Keim et al. (2008b) specifically pointed out the crucial role of human factors in understanding interaction, cognition, perception, collaboration, presentation, and dissemination issues in employing visual analytics. There have been some early successes in this endeavor. Jigsaw (Stasko et al., 2008) and IN-SPIRE (http://in-spire.pnl.gov/) are visual analytics tools that support investigative analysis of text data. VisAware (Livnat, Agutter, Moon, & Foresti, 2005) was built to raise situation awareness in the context of network intrusion detection. Map of the Market (Wattenberg, 1999) and FinDEx (Keim et al., 2006) are visualization techniques for analyzing the stock market and assets. Figure 14 shows a screenshot of Map of the Market for the S&P 500 (finviz: Map of the Market, 2020), which encodes increases and decreases of stock prices in color and market capitalization in size. It also supports details on demand through simple interaction (e.g., a user can drill down to a specific market segment by hovering over it, as shown in Figure 14 for the Internet Retail segment). With the increased computing capabilities of cloud-based software tools such as SAS, Tableau, and Microsoft Azure, interactive visual analytics reports from large-scale multidimensional data can be developed easily. For example, Figure 15 shows a screenshot of an interactive dashboard by SAS (SAS Visual Analytics: Warranty Analysis, 2020) providing an overview of how the warranty costs for cars are distributed among various labor groups (e.g., engine, electrical systems) for different car models. Such an analysis can help decision makers identify the high-impact areas where warranty costs can be significantly reduced and formulate an appropriate strategy. In spite of these interesting and successful examples, researchers in visual analytics face several challenges. Some visual analytic tools use quite complex visualization techniques that may not be intuitively understood by users who are not visualization savvy (Berinato, 2016). The lack of comprehensive guidelines on how to create intuitive visualization techniques has long been a problem. Evaluating visual analytic tools has also been challenging (Plaisant, 2004). Visual analytics tasks tend to require high-level expertise and are dynamic, complex, and uncertain. Hence, investigating the effectiveness of visual analytic tools is very time-consuming and ambiguous. These problems are further aggravated by the explosive growth of information: although visual analytic approaches help users deal with larger data sets, it is difficult to keep pace with the rapid growth in the volume of information. For a more comprehensive list of challenges, refer to Keim et al. (2008b), Thomas and Cook (2005), and Thomas and Kielman (2009).
Figure 14 Screenshot of Map of the Market taken on June 20, 2020 from finviz.com.
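The size and color encodings described for Map of the Market can be sketched with a basic treemap layout. The sketch below uses the simple slice-and-dice algorithm, in which rectangles are split alternately left-to-right and top-to-bottom at each level of the hierarchy; it was chosen for brevity and is not necessarily the layout algorithm Map of the Market itself uses. The sectors, tickers, sizes, and price changes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    label: str
    x: float
    y: float
    w: float
    h: float
    color: str


def color_for(change):
    """Color encoding: green for a price increase, red for a decrease."""
    if change > 0:
        return "green"
    if change < 0:
        return "red"
    return "gray"


def total_size(node):
    if "children" in node:
        return sum(total_size(child) for child in node["children"])
    return node["size"]


def slice_and_dice(node, x, y, w, h, vertical=True, tiles=None):
    """Size encoding: each leaf gets area proportional to its size. Children split
    the parent's rectangle left-to-right or top-to-bottom, alternating by level."""
    if tiles is None:
        tiles = []
    if "children" not in node:
        tiles.append(Tile(node["label"], x, y, w, h, color_for(node["change"])))
        return tiles
    total = total_size(node)
    offset = 0.0
    for child in node["children"]:
        frac = total_size(child) / total
        if vertical:
            slice_and_dice(child, x + offset * w, y, frac * w, h, False, tiles)
        else:
            slice_and_dice(child, x, y + offset * h, w, frac * h, True, tiles)
        offset += frac
    return tiles


# Hypothetical market: sectors contain stocks with a market cap and a daily % change.
MARKET = {"label": "market", "children": [
    {"label": "Tech", "children": [
        {"label": "AAA", "size": 300, "change": 1.2},
        {"label": "BBB", "size": 100, "change": -0.5},
    ]},
    {"label": "Retail", "children": [
        {"label": "CCC", "size": 100, "change": 0.0},
    ]},
]}

for tile in slice_and_dice(MARKET, 0, 0, 100, 50):
    print(tile)
```

Each tile's area is proportional to its size (AAA covers 3,000 of the 5,000 square units), and its color follows the sign of the price change; an interactive tool would add details on demand, such as a tooltip on hover.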
Figure 15 Screenshot of SAS Visual Analytics Dashboard for Warranty Analysis taken on June 20, 2020 from sas.com.
4.2.4 Other Forms of Individual Decision Support
Other forms of individual decision support can be developed using machine learning, fuzzy logic, intelligent agents, case-based reasoning, and genetic algorithms. Fuzzy logic refers to the use of membership functions to express imprecision and to an approach to approximate reasoning in which the rules of inference are approximate rather than exact. Garavelli and Gorgoglione (1999) used fuzzy logic to design a DSS with improved robustness under uncertainty, and Coma et al. (2004) developed a fuzzy DSS to support a design for assembly methodology. Collan and Liu (2003) combined fuzzy logic with agent technologies to develop a fuzzy agent-based DSS for capital budgeting. Intelligent agents use built-in or learned rules to make decisions. In a multiagent marketing DSS, the final solution is obtained through cooperative and competitive interactions among intelligent agents acting in a distributed mode (Aliev et al., 2000). Intelligent agents can also be used to provide real-time decision support on airport gate assignment (Lam et al., 2003). Case-based reasoning, which relies on past cases to arrive at a decision, has been used by Lari (2003) to assist in making corrective and preventive actions for solving quality problems and by Belecheanu et al. (2003) to support decision making on new product development. Genetic algorithms are robust algorithms that can search through large spaces quickly by mimicking the Darwinian principle of "survival of the fittest." They can be used to increase the effectiveness of simulation-based DSSs (Fazlollahi & Vahidov, 2001).
4.3 Group and Organizational Decision Support
Computer tools have been developed to assist in group and organizational decision making. Some of them implement the approaches discussed in Section 3.
The spectrum of such tools ranges from traditional tools used in decision analysis, such as the analytic hierarchy process (Saaty, 1990; Basak & Saaty, 1993), to electronic meeting places or group DSSs (DeSanctis & Gallupe, 1987; Nunamaker et al., 1991), to negotiation support systems (Bui et al., 1990; Lim & Benbasat, 1993). We will discuss the use of individual decision support tools for group support, group DSSs, negotiation support systems, enterprise system support, and other forms of group and organizational support.
4.3.1 Using Individual Decision Support Tools for Group Support Traditional single-user tools can be used to support groups in decision making. A survey by Satzinger and Olfman (1995) found that groups perceived traditional single-user tools to be more useful than group support tools. Sharda et al. (1988) assessed the effectiveness of a DSS for supporting business simulation games and found that groups with access to the DSS made significantly more effective decisions than their non-DSS counterparts. The DSS groups took more time to make their decisions than the non-DSS groups at the beginning of the experiment, but decision times converged in later periods. The DSS teams also exhibited higher confidence in their decisions than the non-DSS groups. Knowledge-based systems (or expert support systems) are effective in supporting group decision making, more so for novices than for experts (Nah & Benbasat, 2004). Groups using such a system also make better decisions than individuals provided with the same system (Nah, Mao, & Benbasat, 1999). Hence, empirical findings have shown that traditional single-user tools can be effective in supporting group decision making. 4.3.2 Group Decision Support Systems Group decision support systems (GDSSs) combine communication, computing, and decision support technologies to facilitate the formulation and solution of unstructured problems by a group of people (DeSanctis & Gallupe, 1987). DeSanctis and Gallupe defined three levels of GDSS. Level 1 GDSSs provide technical features aimed at removing common communication barriers, such as large screens for instantaneous display of ideas, voting solicitation and compilation, anonymous input of ideas and preferences, and electronic message exchange among members. In other words, a level 1 GDSS is only a communication medium. Level 2 GDSSs provide decision modeling or group decision techniques aimed at reducing the uncertainty and “noise” that occur in the group’s decision process.
These techniques include automated planning tools (e.g., program evaluation and review technique (PERT), critical path method (CPM), and Gantt charts), structured decision aids for the group process (e.g., automation of Delphi, nominal group, or other idea-gathering and compilation techniques), and decision analytic aids for the task (e.g., statistical methods, social judgment models). Level 3 GDSSs are characterized by machine-induced group communication patterns and can include expert advice in selecting and arranging the rules to be applied during a meeting. To date, there has been little research on level 3 GDSSs because of the difficulty of automating the process of group decision making. GDSSs facilitate computer-mediated group decision making and provide several potential benefits (Brashers et al., 1994), including (1) enabling all participants to work simultaneously (e.g., they do not have to wait for their turn to speak, which eliminates the need to compete for air time); (2) enabling participants to stay focused and be very productive in idea generation (i.e., eliminating production blocking caused by attending to others); (3) providing a more equal and potentially anonymous opportunity to be heard (i.e., reducing the negative effects caused by power distance); and (4) providing a more systematic and structured decision-making environment (i.e., facilitating a more linear process and better control of the agenda). GDSSs also make it easier to control and manage conflict through the use of facilitators and convenient voting procedures. The meta-analysis by Dennis et al. (1996) suggests that, in general, GDSSs improve decision quality, increase the time needed to make decisions, and have no effect on participant satisfaction. They also found that larger groups provided with a GDSS had higher satisfaction and experienced greater improvement in performance than smaller groups with GDSSs. The findings from McLeod’s (1992) and Benbasat and Lim’s (1993) meta-analyses show that GDSSs increase decision quality, time to reach decisions, and equality of participation but decrease consensus and satisfaction.
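The vote compilation offered by level 1 GDSSs can be illustrated with a small anonymous tally. The sketch below uses a Borda count over ranked ballots; this is only one of many possible compilation rules, chosen here as an assumption because neither DeSanctis and Gallupe nor the meta-analyses above prescribe a specific method. The options and ballots are hypothetical, and no voter identities are stored.

```python
from collections import Counter


def borda_tally(ballots):
    """Each ballot ranks every option best-first; position p of n earns n - 1 - p points."""
    options = set(ballots[0])
    scores = Counter()
    for ballot in ballots:
        if set(ballot) != options or len(ballot) != len(options):
            raise ValueError("every ballot must rank each option exactly once")
        n = len(ballot)
        for position, option in enumerate(ballot):
            scores[option] += n - 1 - position
    return scores


# Hypothetical anonymous ballots submitted during an electronic meeting.
ballots = [
    ["relocate", "expand", "outsource"],
    ["expand", "relocate", "outsource"],
    ["expand", "outsource", "relocate"],
    ["relocate", "expand", "outsource"],
    ["outsource", "expand", "relocate"],
]

print(borda_tally(ballots).most_common())  # [('expand', 7), ('relocate', 5), ('outsource', 3)]
```

Note that "relocate" has the most first-place votes (tied with "expand" at two each), yet "expand" wins because it is ranked highly by everyone; the choice of compilation rule can therefore change the group outcome.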
To resolve inconsistencies in the GDSS literature (such as those relating to satisfaction), Dennis and his colleagues (Dennis et al., 2001; Dennis & Wixom, 2002) carried out further meta-analyses to test a fit–appropriation model and identify additional moderators of these effects. The results show that both fit (between GDSS structures and the task) and appropriation support (i.e., training, facilitation, and software restrictiveness) are necessary for GDSSs to increase the number of ideas generated, reduce the time taken for the task, and increase user satisfaction (Dennis et al., 2001). The fit–appropriation profile is adapted from Zigurs and Buckland (1998). Computer-supported collaborative systems provide features beyond those of GDSSs, such as project and calendar management, group authoring, audio and video conferencing, and group and organizational memory management. They facilitate collaborative work beyond simply decision making and are typically referred to as computer-supported collaborative work (CSCW) systems. These systems are particularly helpful for supporting group decision making in a distributed and asynchronous manner (Nanda, Lehto, & Nof, 2014; Nanda, Tan, Auyeung, & Lehto, 2012). 4.3.3 Negotiation Support Systems Negotiation support systems (NSSs) are used to assist people in activities that are competitive or involve conflicts of interest. The need for negotiation can arise from differences in interests or objectives, or even from cognitive limitations.
To understand and analyze a negotiation activity, eight elements must be taken into account (Holsapple et al., 1998): (1) the issue or matter of contention; (2) the set of participants involved; (3) the participants’ regions of acceptance; (4) the participants’ locations (preferences) within their regions of acceptance; (5) the strategies for negotiation (e.g., coalition); (6) the participants’ movements from one location to another; (7) the rules of negotiation; and (8) assistance from an intervenor (e.g., mediator, arbitrator, or facilitator). NSSs should be designed to support all eight of these components. The configuration of basic NSSs comprises two main components (Lim & Benbasat, 1993): (1) a DSS for each negotiating party and (2) an electronic linkage between these systems to
HUMAN FACTORS FUNDAMENTALS
enable electronic communication between the negotiators. Full-feature session-oriented NSSs should also offer group process structuring techniques, support for an intervenor, and documentation of the negotiation (Foroughi, 1998). Nego-Plan is an expert system shell that can be used to represent negotiation issues and decompose negotiation goals to help analyze consequences of negotiation scenarios (Holsapple & Whinston, 1996; Matwin et al., 1989). A Web-based NSS called Inspire is used in teaching and training (Kersten & Noronha, 1999). Espinasse et al. (1997) developed a multiagent NSS architecture that can support a mediator in managing the negotiation process. To provide comprehensive negotiation support, NSSs should provide features of level 3 GDSSs, such as the ability to do the following: (1) perform analysis of conflict contingencies; (2) suggest appropriate process structuring formats or analytical models; (3) monitor the semantic content of electronic communications; (4) suggest settlements with high joint benefits; and (5) provide automatic mediation (Foroughi, 1998). In general, NSSs can support negotiation either by assisting participants or by serving as a participant (intervenor). 4.3.4 Enterprise Systems for Decision Support Enterprise-wide support can be provided by enterprise systems (ESs) and executive support systems (ESSs) (Turban & Aronson, 2001). ESSs are designed to support top executives, whereas ESs can be designed to support top executives or to serve a wider community of users. ESSs are comprehensive support systems that go beyond flexible DSSs by providing communication capabilities, office automation tools, decision analysis support, advanced graphics and visualization capabilities, and access to external databases and information in order to facilitate business intelligence and environmental scanning. For example, intelligent agents can be used to assist in environmental scanning. 
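Two of the negotiation elements listed in Section 4.3.3 (regions of acceptance) and one of the level 3 features (suggesting settlements with high joint benefits) can be sketched together. The example below assumes, hypothetically, that each party's preferences are additive point scores over discrete issue levels and that a party's region of acceptance is every package scoring at or above a reservation level. It enumerates all packages, keeps those acceptable to both parties, and suggests the one with the highest joint score. Maximizing the sum is only one possible criterion; a Nash-product criterion is a common alternative.

```python
from itertools import product

# Hypothetical two-issue negotiation between a buyer and a seller.
ISSUES = {"price": [100, 110, 120], "delivery_days": [10, 20, 30]}
BUYER = {"price": {100: 50, 110: 25, 120: 0}, "delivery_days": {10: 30, 20: 15, 30: 0}}
SELLER = {"price": {100: 0, 110: 40, 120: 80}, "delivery_days": {10: 0, 20: 5, 30: 10}}
RESERVATION = {"buyer": 40, "seller": 40}  # minimum acceptable scores


def utility(scores, package):
    """Additive score of a package (one level chosen per issue) for one party."""
    return sum(scores[issue][level] for issue, level in package.items())


def suggest_settlement(issues, buyer, seller, reservation):
    """Return (joint, package, buyer_score, seller_score) for the best mutually acceptable package, or None."""
    names = list(issues)
    best = None
    for levels in product(*(issues[name] for name in names)):
        package = dict(zip(names, levels))
        ub, us = utility(buyer, package), utility(seller, package)
        if ub < reservation["buyer"] or us < reservation["seller"]:
            continue  # outside at least one party's region of acceptance
        joint = ub + us
        if best is None or joint > best[0]:
            best = (joint, package, ub, us)
    return best


print(suggest_settlement(ISSUES, BUYER, SELLER, RESERVATION))
# (95, {'price': 110, 'delivery_days': 10}, 55, 40)
```

The suggested package trades a mid-range price for fast delivery, which the buyer values far more than the seller. This is the kind of integrative settlement that is easy to miss when each issue is negotiated separately.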
The ability to use ESs, also known as enterprise resource planning (ERP) systems, for decision support is made possible by data warehousing and online analytical processing. ESs integrate all the functions, as well as the transaction processing and information needs, of an organization. These systems can bring significant competitive advantage to organizations if they are integrated with supply chain management and customer relationship management systems, thus providing comprehensive information along the entire value chain to key decision makers and facilitating their planning and forecasting. Advanced planning and scheduling packages can be incorporated to help optimize production and ensure that the right materials are in the right warehouse at the right time to meet customers’ demands (Turban & Aronson, 2001). 4.3.5 Crowdsourcing Surowiecki (2004) popularized the concept of the wisdom of crowds through his book The Wisdom of Crowds: Why the Many Are Smarter Than the Few, which argues that aggregating the information or opinions of crowds can result in better decisions than those made by expert individuals or groups. The example Surowiecki used to open his book is Galton’s surprising result at a weight-judging competition for a dressed ox at the annual show of the West of England Fat Stock and Poultry Exhibition (Galton, 1907). Galton analyzed the 787 votes collected in the competition and found that their median was 1207 lb, only 9 lb off from the true value of 1198 lb, demonstrating the power of aggregated information. Similar evidence has been collected in many other cases, such as locating a lost submarine, predicting the winner in sports betting, and predicting the future in investigative organizations (Surowiecki, 2004).
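Galton reported the median of the votes rather than the mean. The short sketch below, using invented estimates (not Galton’s data) and a hypothetical true value, shows why that choice matters: a single wild guess drags the mean far from the truth, while the median stays close and beats most individual estimates.

```python
import statistics

TRUE_VALUE = 1200  # hypothetical true quantity being estimated
# Hypothetical independent estimates; one participant's guess is wildly off.
estimates = [1150, 1180, 1195, 1205, 1210, 1230, 1260, 2000]

crowd_median = statistics.median(estimates)
crowd_mean = statistics.mean(estimates)
median_error = abs(crowd_median - TRUE_VALUE)
individual_errors = [abs(e - TRUE_VALUE) for e in estimates]

print(crowd_median, median_error)                  # 1207.5 7.5
print(crowd_mean, abs(crowd_mean - TRUE_VALUE))    # 1303.75 103.75
worse = sum(err > median_error for err in individual_errors)
print(worse, "of", len(estimates), "individuals are farther from the truth than the median")
```

The benefit depends on the conditions discussed next: if the estimates were not independent (e.g., if everyone copied the outlier), aggregation would not cancel out individual errors.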
DECISION-MAKING MODELS, DECISION SUPPORT, AND PROBLEM SOLVING
However, simply collecting the opinions of crowds does not guarantee a better decision. The wisdom of crowds breaks down in the following situations (Surowiecki, 2004): (1) when the crowd becomes homogeneous, it fails to collect information from diverse perspectives; (2) when an organization is too centralized or too divided, it fails to collect information from the individual members who directly confront the situation, and the collected information cannot be communicated within the organization; and (3) when individuals in the crowd imitate others’ opinions or are emotionally influenced by others, only a few members of the crowd act as information collectors or decision makers. Thus, Surowiecki suggests that the wisdom of crowds functions properly when the crowd has the following characteristics: diversity of opinion, independence, decentralization, and a mechanism to aggregate information. There have been efforts to construct functioning crowds more systematically. The prediction market (also known as an information market, decision market, or event futures market) is a market that predicts future events using a mechanism similar to that of financial markets (Wolfers & Zitzewitz, 2004). For example, a betting exchange, Tradesports.com, listed a security paying $100 if Admiral John Poindexter, head of the Information Awareness Office of the Defense Advanced Research Projects Agency (DARPA), resigned by the end of August 2003. The price of the security reflected the probability of the event, so it fluctuated as more information became available. A prediction market of this kind provides a platform for collecting information from individuals who have proper incentives. Various studies have also reported that prediction markets are highly accurate (Berg et al., 2008), and they have been actively applied in various areas, such as predicting influenza outbreaks (Holden, 2007).
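The link between a contract’s price and the probability of the event can be made explicit. The sketch below assumes, as a common simplification, that traders are risk neutral and ignores fees and interest. Under that assumption a contract paying $100 that trades at $35 implies a probability of 0.35, and a trader who believes the probability is higher expects to profit by buying. Those purchases push the price toward the trader’s belief, which is how the market aggregates information. The function names and numbers are illustrative only.

```python
PAYOUT = 100.0  # the contract pays $100 if the event occurs and $0 otherwise


def implied_probability(price: float, payout: float = PAYOUT) -> float:
    """Probability implied by the price, assuming risk-neutral traders and no fees."""
    if not 0.0 <= price <= payout:
        raise ValueError("price must lie between 0 and the payout")
    return price / payout


def expected_profit(price: float, belief: float, payout: float = PAYOUT) -> float:
    """Expected profit from buying one contract for a trader whose subjective probability is `belief`."""
    return belief * payout - price


print(implied_probability(35.0))    # 0.35
print(expected_profit(35.0, 0.5))   # 15.0: this trader should buy, pushing the price up
print(expected_profit(35.0, 0.2))   # -15.0: this trader would rather sell
```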
4.3.6 Other Forms of Group and Organizational Decision Support We have discussed how individual decision support tools, GDSSs, NSSs, ESSs, and ESs can facilitate and support group and organizational decision making. Other techniques drawn from the field of artificial intelligence, such as neural networks, expert systems, fuzzy logic, genetic algorithms, case-based reasoning, and intelligent agents, can also be used to enhance the decision support capabilities of these systems. Knowledge management practices can also benefit groups and organizations by capitalizing on existing knowledge to create new knowledge, codifying existing knowledge in ways that make it readily accessible to others, and facilitating knowledge sharing and distribution throughout an enterprise (Davenport & Prusak, 2000). Since knowledge is a key asset of organizations and is regarded as the only source of sustainable competitive strength (Drucker, 1995), the use of technologies for knowledge management is a high priority in most organizations. For example, knowledge repositories (e.g., intranets) can be created to facilitate knowledge sharing and distribution, focused knowledge environments (e.g., expert systems) can be developed to codify expert knowledge to support decision making, and knowledge work systems (e.g., computer-aided design, virtual reality simulation systems, and powerful investment workstations) can be used to facilitate knowledge creation. By making existing knowledge more available, these systems can help groups and organizations make better-informed decisions. 4.4 Problem Solving Although the term problem solving is commonly used, defining it is not an easy task (Hunt, 1998). Problem solving can be “understood as the bridging of the gap between an initial state of affairs and a desired state where no predetermined operation or strategy is known to the individual” (Öllinger & Goel, 2010, p. 4), and most definitions of problem solving proposed so far include three core components: an initial state, a goal state, and paths between the two states (Mayer, 1983). The path is often unknown, so problem solving is largely a search for the path. However, these descriptions and characterizations do not clearly specify what problem solving is and is not. Problem solving covers various topics, including, but not limited to, reading, writing, calculation, managerial problem solving, problem solving in electronics, game playing (e.g., chess), and problem solving for innovation and invention (Sternberg & Frensch, 1991). Some problems are well structured (e.g., the Tower of Hanoi), but others are ill structured (e.g., preparing a good dinner for guests) (Reitman, 1964). Thus, we might need an even more inclusive definition of problem solving, such as the one Anderson et al. (1985) suggested: any goal-directed sequence of cognitive operations. Because problem solving includes such a wide spectrum of topics, drawing a clear boundary between problem solving and decision making is almost meaningless. Although Simon et al. (1987) provided an elegant separation of the two fields of research (i.e., problem solving covers fixing agendas, setting goals, and designing actions, while decision making covers evaluating and choosing), one can easily argue against this division. Virtually all decision-making activities could be regarded as problem solving, since decision making moves from a state of not having a selection toward a state with a selection. Conversely, some problem-solving activities, such as choosing a path from among potential paths or generating alternatives, would be considered decision-making activities. Thus, it is more appropriate to view decision making and problem solving as largely overlapping, and proposing a theoretical distinction between them may not be fruitful.
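The view of problem solving as a search for a path from an initial state to a goal state can be illustrated with the well-structured Tower of Hanoi. The sketch below represents each state as a tuple giving the peg of every disk and finds a path using breadth-first search. This is a generic search technique chosen for illustration, not a model of how people actually solve the puzzle. Because breadth-first search returns a shortest path, the number of moves found for three disks matches the known minimum of 2^3 − 1 = 7.

```python
from collections import deque


def hanoi_moves(state):
    """Yield successor states; state[d] is the peg (0-2) holding disk d, disk 0 being smallest."""
    n = len(state)
    for src in range(3):
        on_src = [d for d in range(n) if state[d] == src]
        if not on_src:
            continue
        top = min(on_src)  # the smallest disk on a peg is the one on top
        for dst in range(3):
            if dst == src:
                continue
            on_dst = [d for d in range(n) if state[d] == dst]
            if on_dst and min(on_dst) < top:
                continue  # a larger disk may not be placed on a smaller one
            yield state[:top] + (dst,) + state[top + 1:]


def solve_by_search(n):
    """Breadth-first search from all disks on peg 0 (initial state) to all on peg 2 (goal state)."""
    start, goal = (0,) * n, (2,) * n
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in hanoi_moves(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None


path = solve_by_search(3)
print(len(path) - 1)  # 7
```

Ill-structured problems, such as preparing a good dinner, resist this treatment precisely because their states, operators, and goal test cannot be written down this cleanly.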
Kepner and Tregoe (1976) even noted that the terms problem solving and decision making are often used interchangeably. In spite of the fuzzy boundary between the two fields, decision making and problem solving have distinctive lineages. While decision-making research was largely led by economists, statisticians, and mathematicians until descriptive approaches became more prominent, problem solving has a relatively longer and distinctive history of research conducted mainly by psychologists (Simon et al., 1987). Because of this difference, researchers in problem solving introduced several interesting research methods. Over time, research in problem solving and decision making has increasingly overlapped, and approaches successfully employed in one domain are quickly adopted by the other. For example, information-processing theory has been one of the important paradigms driving the development of decision theory in the last half century (Payne & Bettman, 2004). Cognitive architectures, such as ACT-R, have been employed to understand biases and heuristics used in decision making (e.g., Altmann & Burns, 2005; Belavkin, 2006; Dickison & Taatgen, 2007). Neuroimaging techniques have also been employed to study decision-making tasks (e.g., Ernst et al., 2002; Trepel, Fox, & Poldrack, 2005). The interaction between the two research communities is expected to accelerate as the range of research questions widens.
5 CONCLUSION Beach (1993) discusses four revolutions in behavioral decision theory. The first took place when it was recognized that the evaluation of alternatives is seldom extensive. It is illustrated by the use of the satisficing rule (Simon, 1955) and heuristics (Gigerenzer et al., 1999; Tversky & Kahneman, 1974) rather than optimizing. The second occurred when it was recognized that people choose among strategies for making decisions. It is marked by the development of contingency theory (Beach, 1990) and cognitive continuum theory (Hammond, 1980). The third is currently occurring. It involves the realization that people rarely make choices and instead rely on prelearned procedures. This perspective is illustrated by the dual-processing models of human decision making (Kahneman, 2011), the levels-of-processing approach (Rasmussen, 1983), and recognition-primed decisions (Klein, 1989). The fourth is just beginning. It is marked by the emergence of behavioral economics, behavioral finance, and behavioral game theory, and involves recognizing that decision-making research must abandon a single-minded focus on the economic view of decision making and include approaches drawn from relevant developments and research in cognitive psychology, organizational behavior, and systems theory. The discussion within this chapter parallels this view of decision making. The integrative model presented at the beginning of the chapter shows how the various approaches fit together as a whole. Each path through the model is distinguished by specific sources of conflict, the methods of conflict resolution followed, and the types of decision rules used to analyze the results of conflict resolution processes. The different paths through the model correspond to fundamentally different ways of making decisions, ranging from routine decisions driven by situation assessment to satisficing, analysis of single- and multiattribute expected utility, and even obtaining the consensus of multiple decision makers in group contexts. Numerous other strategies and potential methods of decision support discussed in this chapter also correspond to particular paths through the model. This chapter goes beyond simply describing methods of decision making by pointing out reasons why people and groups may have difficulty making good decisions.
These include cognitive limitations, inadequacies of various heuristics used, biases and inadequate knowledge of decision makers, and task-related factors such as risk, time pressure, and stress. The discussion also provides insight into the effectiveness of approaches for improving human decision making. The models of selective attention point to the value of providing only truly relevant information to decision makers. Irrelevant information might be considered simply because it is there, especially if it is highly salient. Methods of highlighting or emphasizing relevant information are therefore warranted. The models of selective information also indicate that methods of helping decision makers cope with working memory limitations will be of value. There also is reason to believe that providing feedback to decision makers in dynamic decision-making situations will be useful. Cognitive rather than outcome feedback is indicated as being particularly helpful when decision makers are learning. Training decision makers also seems to offer potentially large benefits. One reason for this conclusion is that the studies of naturalistic decision making revealed that most decisions are made on a routine, nonanalytical basis. Studies of debiasing also partially support the potential benefits of training and feedback. On the other hand, the many failures to debias expert decision makers imply that decision aids, methods of persuasion, and other approaches intended to improve decision making are no panacea. Part of the problem is that people tend to start with preconceived notions about what they should do and show a tendency to seek out and bolster confirming evidence. Consequently, people may become overconfident with experience and develop strongly held beliefs that are difficult to modify, even if they are hard to defend rationally.
NOTES 1 No single book covers all the topics addressed here. More detailed sources of information are referenced throughout the chapter. Sources such as von Neumann and Morgenstern (1947), Savage (1954), Luce and Raiffa (1957), Shafer (1976), and Friedman (1990) are useful texts for readers desiring an introduction to normative decision theory. Raiffa (1968), Keeney and Raiffa (1976), and Clemen and Reilly (2014) are applied texts on decision analysis. Kahneman et al. (1982), von Winterfeldt and Edwards (1986), Payne et al. (1993), Svenson and Maule (1993), Heath et al. (1994), Yates (1992), Koehler and Harvey (2004), and Camerer et al. (2004), among numerous others, are texts addressing elements of behavioral decision theory. Klein et al. (1993) and Klein (1998, 2004) provide introductions to naturalistic decision making. The notion that the best decision strategy varies between decision contexts is a fundamental assumption of the theory of contingent decision making (Payne et al., 1993), cognitive continuum theory (Hammond, 1980), and other approaches discussed later in the chapter. Conflict has been recognized as an important determinant of what people do in risky decision-making contexts (Janis & Mann, 1977). Janis and Mann focus on the stressful nature of conflict and on how affective reactions in stressful situations can affect decision strategies. The distinction between routine and conflict-driven decision making made here is similar to Rasmussen’s (1983) distinction between (1) routine skill- or rule-based levels of control and (2) nonroutine knowledge-based levels of control in information-processing tasks. Note that multiple sources of conflict are possible for a given decision context. An attempt to resolve one source of conflict may also make the decision maker aware of other conflicts that must first be resolved. For example, decision makers may realize they need to know what the alternatives are before they can determine their aspiration levels. Clemen (1996) includes a chapter on creativity and decision structuring. Some practitioners claim that structuring the decision is the greatest contribution of the decision analysis process.
When no evidence is available concerning the likelihood of different events, it was postulated that each event should be assumed to be equally likely. The Laplace decision rule makes this assumption and then compares alternatives on the basis of expected value or utility. Note that classical utility theory assumes that utilities are constant. Utilities may, of course, fluctuate. The random-utility model (Bock & Jones, 1968) allows for such fluctuation. To develop the multiattribute utility function, the single-attribute utility functions ($u_n$) and the importance weights ($k_n$) are determined by assessing preferences between alternatives. Methods of doing so are discussed in Section 4.1.3. Closely related fields (or subfields) include behavioral finance and behavioral game theory. Doherty (2003) groups researchers on human judgment and decision making into two camps. The optimists focus on the success of imperfect human beings in a complex world. The pessimists focus on the deficiencies of human reasoning compared to normative models. As noted by Gigerenzer et al. (1999, p. 18), because of environmental challenges, “organisms must be able to make inferences that are fast, frugal, and accurate.” Similarly, Hammond (1996) notes that a close correspondence between subjective beliefs and environmental states will provide an adaptive advantage. This point directly confirms Tversky and Kahneman’s (1973) original assumption that the availability heuristic should often result in good predictions. These strategies seem to be especially applicable to the design of information displays and decision support systems. Computer-based decision support is addressed in Section 4.2. These strategies also overlap with decision analysis. As discussed in Section 4.1, decision analysis focuses on the use of analytic methods to improve decision quality. Engineers, designers, and other real-world decision makers may well question whether the confirmation bias is really a bias.
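The Laplace decision rule described in these notes amounts to an expected-value calculation with uniform probabilities over the states. The sketch below applies it to a hypothetical payoff table (the alternatives, states, and payoffs are invented for illustration); the entries could equally be utilities rather than monetary payoffs.

```python
# Hypothetical payoff table: rows are alternatives, columns are states of nature
# (e.g., low, medium, and high demand) whose probabilities are unknown.
PAYOFFS = {
    "no expansion": [40, 40, 40],
    "small plant": [50, 60, 70],
    "large plant": [-20, 80, 160],
}


def laplace_rule(payoffs):
    """Treat every state as equally likely and rank alternatives by mean payoff."""
    expected = {alt: sum(row) / len(row) for alt, row in payoffs.items()}
    best = max(expected, key=expected.get)
    return best, expected


best, expected = laplace_rule(PAYOFFS)
print(best)  # large plant
print({alt: round(v, 2) for alt, v in expected.items()})
```

Because every state receives weight 1/3, the large plant’s high payoff in the favorable state outweighs its loss in the unfavorable one; a decision maker with any evidence about the states would replace the uniform weights with subjective probabilities.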
Searching for disconfirming evidence obviously makes sense in hypothesis testing. That is, a single negative instance is enough to disprove a logical conjecture. In real-world settings, however, checking for evidence that supports a hypothesis can be very efficient. Singleton and Hovden (1987) and Yates (1992) are useful sources for the reader interested in additional details on risk perception, risk acceptability, and risk-taking behavior. Section 2.3 is also relevant to this topic. The notion of a reference point against which outcomes are compared has similarities to the notion of making decisions on the basis of regret (Bell, 1982). Regret, however, assumes comparison to the best outcome. The notion of different reference points is also related to the well-known finding that the buying and selling prices of assets often differ for a decision maker (Raiffa, 1968). As discussed further in Section 4.3, group decisions, even though they are made by rational members, are subject to numerous violations of rationality. For example, consider the case where the decision maker has three selves that are, respectively, risk averse, risk neutral, and risk seeking. Assume that the decision maker is choosing between alternatives A, B, and C. Suppose that the risk-averse self rates the alternatives in the order A, B, C; the risk-neutral self rates them in the order B, C, A; and the risk-seeking self rates them in the order C, A, B. Also assume that the selves are equally powerful. Then two of the three selves always agree that A > B, B > C, and C > A. This ordering is, of course, nontransitive. The weighted additive (WADD) strategy requires the decision maker to first weight the importance of each attribute of the compared alternatives. The sum of the weighted attribute values is then used to compare the alternatives. The equally weighted additive (EQW) strategy is similar to WADD but assigns the same weight to each attribute. The lexicographic (LEX) strategy compares the alternatives sequentially, beginning with the most important attribute. If an alternative is found that is better than the others on the first attribute, it is selected immediately. If no alternative is best on the first attribute, the alternatives are compared on the next most important attribute. This process continues until an alternative is selected or all the attributes have been considered without making a choice. Elimination by aspects (EBA) is similar to LEX but assumes that the attributes are selected in random order, where the probability of selecting an attribute is proportional to its importance. Goldstein and Hogarth (1997) describe a similar trend in judgment and decision-making research. Friedman (1990) provides an excellent introduction to game theory. Over the years, many different approaches have been developed for aiding or supporting decision makers (see von Winterfeldt & Edwards, 1986; Yates et al., 2003). Some of these approaches have already been covered earlier in this chapter and consequently are not addressed further in this section. In particular, decision analysis provides both tools and perspectives on how to structure a decision and evaluate alternatives. Decision analysis software is also available and commonly used. In fact, textbooks on decision analysis normally discuss the use of spreadsheets and other software; software may even be provided along with the textbook (e.g., see Clemen, 1996). Debiasing, discussed earlier in this chapter, is another technique for aiding or supporting decision makers. For example, the first event in a decision tree might be the result of a test. The test result then provides information useful in making the final decision. Note that the standard convention uses circles to denote chance nodes and squares to denote decision nodes (Raiffa, 1968). As for decision trees, the convention for influence diagrams is to depict events with circles and decisions with squares. The conditional probabilities in a decision tree would reflect this linkage, but the structure of the tree itself does not show the linkage directly.
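The WADD, EQW, and LEX strategies, together with the three-selves example, are simple enough to sketch directly. The alternatives, attribute scores, and weights below are hypothetical and were chosen so that the three strategies select three different alternatives. EBA is omitted because it is probabilistic. The `SELVES` rankings reproduce the risk-averse, risk-neutral, and risk-seeking orderings from the notes (their A, B, and C are unrelated to the scored alternatives), and pairwise majority voting over them yields the nontransitive cycle A > B, B > C, C > A.

```python
# Hypothetical alternatives scored 0-10 on three attributes (higher is better).
ALTERNATIVES = {
    "A": {"cost": 9, "quality": 3, "speed": 3},
    "B": {"cost": 7, "quality": 8, "speed": 2},
    "C": {"cost": 4, "quality": 6, "speed": 9},
}
WEIGHTS = {"cost": 0.5, "quality": 0.3, "speed": 0.2}  # hypothetical importance weights


def wadd(alts, weights):
    """Weighted additive: choose the alternative with the largest weighted sum."""
    return max(alts, key=lambda a: sum(weights[attr] * v for attr, v in alts[a].items()))


def eqw(alts):
    """Equally weighted additive: WADD with identical weights."""
    return max(alts, key=lambda a: sum(alts[a].values()))


def lex(alts, weights):
    """Lexicographic: keep the best on the most important attribute; break ties on the next."""
    remaining = list(alts)
    for attr in sorted(weights, key=weights.get, reverse=True):
        top = max(alts[a][attr] for a in remaining)
        remaining = [a for a in remaining if alts[a][attr] == top]
        if len(remaining) == 1:
            return remaining[0]
    return None  # still tied after every attribute: no choice is made


def majority_prefers(rankings, x, y):
    """True if a strict majority of rankings (best-first lists) place x ahead of y."""
    ahead = sum(1 for r in rankings if r.index(x) < r.index(y))
    return 2 * ahead > len(rankings)


# Risk-averse, risk-neutral, and risk-seeking selves from the three-selves example.
SELVES = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]

print(wadd(ALTERNATIVES, WEIGHTS), eqw(ALTERNATIVES), lex(ALTERNATIVES, WEIGHTS))  # B C A
print([majority_prefers(SELVES, x, y) for x, y in [("A", "B"), ("B", "C"), ("C", "A")]])
# [True, True, True]
```

LEX picks A on cost alone, WADD trades cost against quality and picks B, and EQW, which ignores importance, favors C’s balanced scores. The same data can therefore support different choices depending on the strategy used.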
Also, the decision tree would use the flipped probability tree using P(warning) at the first stage and P(machine down|warning) at the second stage. It seems more natural for operators to think about the problem in terms of P(machine down) and P(warning|machine down), which is the way the influence diagram in Figure 8 depicts the relationship.
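Flipping between the two orderings is an application of Bayes’ rule. The sketch below starts from the influence-diagram quantities P(machine down) and P(warning|machine down), plus P(warning|machine up), which is also needed. It then derives the decision-tree quantities P(warning) and P(machine down|warning), along with P(machine down|no warning) for the other branch. The probabilities used are hypothetical.

```python
def flip(p_down, p_warn_given_down, p_warn_given_up):
    """Convert P(down), P(warning|down), P(warning|up) into the decision-tree
    quantities P(warning), P(down|warning), and P(down|no warning) via Bayes' rule."""
    p_warn = p_warn_given_down * p_down + p_warn_given_up * (1.0 - p_down)
    if not 0.0 < p_warn < 1.0:
        raise ValueError("P(warning) must lie strictly between 0 and 1")
    p_down_given_warn = p_warn_given_down * p_down / p_warn
    p_down_given_no_warn = (1.0 - p_warn_given_down) * p_down / (1.0 - p_warn)
    return p_warn, p_down_given_warn, p_down_given_no_warn


# Hypothetical values: machine down 10% of the time; warning fires on 90% of
# breakdowns but also gives a false alarm 20% of the time when the machine is up.
p_warn, p_dw, p_dnw = flip(0.10, 0.90, 0.20)
print(f"{p_warn:.2f} {p_dw:.3f} {p_dnw:.4f}")  # 0.27 0.333 0.0137
```

Even with a fairly sensitive warning, only about one warning in three corresponds to an actual breakdown in this example. Such base-rate effects are easier to see once the probabilities are laid out in the decision-tree order.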
REFERENCES
Ainslie, G. (1975). Specious reward: A behavioral theory of impulsiveness and impulse control. Psychological Bulletin, 82, 463–509.
Akao, Y. (2004). Quality function deployment: Integrating customer requirements into product design. New York: Productivity Press.
Aliev, R. A., Fazlollahi, B., & Vahidov, R. M. (2000). Soft computing based multi-agent marketing decision support system. Journal of Intelligent and Fuzzy Systems, 9(1–2), 1–9.
Allais, M. (1953). Le comportement de l’homme rationnel devant le risque: Critique des postulats et axiomes de l’école américaine. Econometrica, 21, 503–546.
Allik, J., Toom, M., Raidvee, A., Averin, K., & Kreegipuu, K. (2013). An almost general theory of mean size perception. Vision Research, 83, 25–39.
Altmann, E. M., & Burns, B. D. (2005). Streak biases in decision making: Data and a memory model. Cognitive Systems Research, 6, 5–16.
Anderson, J. R., Boyle, C. F., & Reiser, B. J. (1985). Intelligent tutoring systems. Science, 228(4698), 456–462.
Argote, L., Seabright, M. A., & Dyer, L. (1986). Individual versus group: Use of base-rate and individuating information. Organizational Behavior and Human Decision Processes, 38, 65–75.
Ariely, D. (2008). Predictably irrational: The hidden forces that shape our decisions. New York: Harper.
Arkes, H. R. (1981). Impediments to accurate clinical judgment and possible ways to minimize their impact. Journal of Consulting and Clinical Psychology, 49, 323–330.
Arkes, H. R., & Blumer, C. (1985). The psychology of sunk cost. Organizational Behavior and Human Decision Processes, 35, 124–140.
Arkes, H. R., & Hutzel, L. (2000). The role of probability of success estimates in the sunk cost effect. Journal of Behavioral Decision Making, 13(3), 295–306.
Armor, D. A., & Taylor, S. E. (2002). When predictions fail: The dilemma of unrealistic optimism. In T. Gilovich, D. W. Griffin, & D. Kahneman (Eds.), Heuristics and biases: The psychology of intuitive judgment (pp. 334–347). Cambridge: Cambridge University Press.
Balzer, W. K., Doherty, M. E., & O’Connor, R. O., Jr. (1989). Effects of cognitive feedback on performance. Psychological Bulletin, 106, 410–433.
Banerjee, A. (1992). A simple model of herd behavior. Quarterly Journal of Economics, 107, 797–817.
Barberis, N., & Thaler, R. (2003). A survey of behavioral finance. In G. M. Constantinides, M. Harris, & R. M. Stulz (Eds.), Financial markets and asset pricing (Vol. 1, pp. 1053–1128). Amsterdam: Elsevier.
Bar-Hillel, M. (1973). On the subjective probability of compound events. Organizational Behavior and Human Performance, 9, 396–406.
Baron, J. (1985). Rationality and intelligence. Cambridge: Cambridge University Press.
Baron, J. (1998). Judgment misguided: Intuition and error in public decision making. New York: Oxford University Press.
Barron, G., & Leider, S. (2010). The role of experience in the Gambler’s Fallacy. Journal of Behavioral Decision Making, 23(1), 117–129. https://doi.org/10.1002/bdm.676
Barron, G., & Ursino, G. (2013). Underweighting rare events in experience based decisions: Beyond sample error. Journal of Economic Psychology, 39, 278–286.
Basak, I., & Saaty, T. (1993). Group decision making using the analytic hierarchy process. Mathematical and Computer Modelling, 17, 101–109.
Baumeister, R. F., Bratslavsky, E., Finkenauer, C., & Vohs, K. D. (2001). Bad is stronger than good. Review of General Psychology, 5(4), 323–370.
Bazerman, M. (1998). Judgment in managerial decision making (4th ed.). New York: Wiley.
Bazerman, M. H., & Neale, M. A. (1983). Heuristics in negotiation: Limitations to effective dispute resolution. In M. H. Bazerman & R. Lewicki (Eds.), Negotiating in organizations. Beverly Hills, CA: Sage.
Beach, L. R. (1990). Image theory: Decision making in personal and organizational contexts. Chichester: Wiley.
Beach, L. R. (1993). Four revolutions in behavioral decision theory. In M. M. Chemers & R. Ayman (Eds.), Leadership theory and research. San Diego, CA: Academic.
Beach, L. R., & Mitchell, T. R. (1978). A contingency model for the selection of decision strategies. Academy of Management Review, 3, 439–449.
Belavkin, R. V. (2006). Towards a theory of decision-making without paradoxes. In Proceedings of the Seventh International Conference on Cognitive Modeling (pp. 38–43). Trieste, Italy.
Belecheanu, R., Pawar, K. S., Barson, R. J., & Bredehorst, B. (2003). The application of case-based reasoning to decision support in new product development. Integrated Manufacturing Systems, 14(1), 36–45.
Bell, D. (1982). Regret in decision making under uncertainty. Operations Research, 30, 961–981.
Benbasat, I., & Lim, L. H. (1993). The effects of group, task, context, technology variables on the usefulness of group support systems: A meta-analysis of experimental studies. Small Group Research, 24(4), 430–462.
Berg, J., Forsythe, R., Nelson, F., & Rietz, T. (2008). Results from a dozen years of election futures markets research. In C. R. Plott & V. L. Smith (Eds.), Handbook of experimental economics results (Vol. 1, pp. 742–751). Amsterdam: Elsevier.
Berinato, S. (2016). Good charts: The HBR guide to making smarter, more persuasive data visualizations. Boston: Harvard Business Review Press.
Bernoulli, D. (1738). Exposition of a new theory of the measurement of risk. St. Petersburg, Russia: Imperial Academy of Science.
Bettman, J. R., Luce, M. F., & Payne, J. W. (1998). Constructive consumer choice processes. Journal of Consumer Research, 25(3), 187–217.
Bikhchandani, S., Hirshleifer, D., & Welch, I. (1992). A theory of fads, fashion, custom and cultural change as informational cascades. Journal of Political Economy, 100, 992–1026.
Birnbaum, M. H., & Mellers, B. A. (1983). Bayesian inference: Combining base rates with opinions of sources who vary in credibility. Journal of Personality and Social Psychology, 37, 792–804.
Birnbaum, M. H., Coffey, G., Mellers, B. A., & Weiss, R. (1992). Utility measurement: Configural-weight theory and the judge’s point of view. Journal of Experimental Psychology: Human Perception and Performance, 18, 331–346.
Bock, R. D., & Jones, L. V. (1968). The measurement and prediction of judgment and choice. San Francisco, CA: Holden-Day.
Bode, L., & Vraga, E. K. (2018). See something, say something: Correction of global health misinformation on social media. Health Communication, 33(9), 1131–1140.
Bolton, G. E., & Chatterjee, K. (1996). Coalition formation, communication, and coordination: An exploratory experiment. In R. J. Zeckhauser, R. L. Keeney, & J. K. Sebenius (Eds.), Wise choices: Decisions, games, and negotiations. Boston, MA: Harvard University Press.
Bornstein, G., & Yaniv, I. (1998). Individual and group behavior in the ultimatum game: Are groups more rational players? Experimental Economics, 1, 101–108.
Bradlow, E. T. (2005). Current issues and a ‘wish list’ for conjoint analysis. Applied Stochastic Models in Business and Industry, 21(4–5), 319–323.
Brannon, I. (2008). Review: Nudge: Improving Decisions about Health, Wealth, and Happiness. The Cato Journal, 28(3), 562–565.
Brashers, D. E., Adkins, M., & Meyers, R. A. (1994). Argumentation and computer-mediated group decision making. In L. R. Frey (Ed.), Group communication in context. Mahwah, NJ: Lawrence Erlbaum Associates.
Brehmer, B., & Joyce, C. R. B. (1988). Human judgment: The SJT view. Amsterdam: North-Holland.
Brockner, J., Paruchuri, S., Idson, L. C., & Higgins, E. T. (2002). Regulatory focus and the probability estimates of conjunctive and disjunctive events. Organizational Behavior and Human Decision Processes, 87(1), 5–24.
Brookhouse, J. K., Guion, R. M., & Doherty, M. E. (1986). Social desirability response bias as one source of the discrepancy between subjective weights and regression weights. Organizational Behavior and Human Decision Processes, 37, 316–328.
Budescu, D., & Weiss, W. (1987). Reflection of transitive and intransitive preferences: A test of prospect theory. Organizational Behavior and Human Performance, 39, 184–202.
Bui, T. X., Jelassi, T. M., & Shakun, M. F. (1990). Group decision and negotiation support systems. European Journal of Operational Research, 46(2), 141–142.
Busemeyer, J. R., & Townsend, J. T. (1993). Decision field theory: A dynamic–cognitive approach to decision making in an uncertain environment. Psychological Review, 100, 432–459.
Camerer, C. F., & Loewenstein, G. (2004). Behavioral economics: Past, present, future. In C. F. Camerer, G. Loewenstein, & M. Rabin (Eds.), Advances in behavioral economics (pp. 3–51). Princeton, NJ: Princeton University Press.
Camerer, C. F., Loewenstein, G., & Rabin, M. (2004). Advances in behavioral economics. Princeton, NJ: Princeton University Press.
Caverni, J. P., Fabre, J. M., & Gonzalez, M. (1990). Cognitive biases. Amsterdam: North-Holland.
Chapman, S., Wong, W. L., & Smith, W. (1993). Self-exempting beliefs about smoking and health: Differences between smokers and ex-smokers. American Journal of Public Health, 83(2), 215–219.
Chen, G., Kim, K. A., Nofsinger, J. R., & Rui, O. M. (2007). Trading performance, disposition effect, overconfidence, representativeness bias, and experience of emerging market investors. Journal of Behavioral Decision Making, 20, 425–451.
Christensen-Szalanski, J. J., & Willham, C. F. (1991). The hindsight bias: A meta-analysis. Organizational Behavior and Human Decision Processes, 48, 147–168.
Clemen, R. T. (1996). Making hard decisions: An introduction to decision analysis (2nd ed.). Belmont, CA: Duxbury.
Clemen, R. T., & Reilly, T. (2014). Making hard decisions with decision tools. Boston: Cengage Learning.
Cohen, M. S. (1993). The naturalistic basis of decision biases. In G. A. Klein, J. Orasanu, R. Calderwood, & C. E. Zsambok (Eds.), Decision making in action: Models and methods (pp. 51–99). Norwood, NJ: Ablex.
Collan, M., & Liu, S. (2003). Fuzzy logic and intelligent agents: Towards the next step of capital budgeting decision support. Industrial Management and Data Systems, 103(6), 410–422.
Coma, O., Mascle, O., & Balazinski, M. (2004). Application of a fuzzy decision support system in a design for assembly methodology. International Journal of Computer Integrated Manufacturing, 17(1), 83–94.
Connolly, T., Ordóñez, L. D., & Coughlan, R. (1997). Regret and responsibility in the evaluation of decision outcomes. Organizational Behavior and Human Decision Processes, 70, 73–85.
Costello, F. J. (2009). Fallacies in probability judgments for conjunctions and disjunctions of everyday events. Journal of Behavioral Decision Making, 22(3), 235–251.
Cui, G., & Wong, M. L. (2004). Implementing neural networks for decision support in direct marketing. International Journal of Market Research, 46(2), 235–254.
Cui, Y., & Wise, A. F. (2015). Identifying content-related threads in MOOC discussion forums. In Proceedings of the Second (2015) ACM Conference on Learning @ Scale - L@S ’15 (pp. 299–303). New York: ACM.
Davenport, T. H., & Prusak, L. (2000). Working knowledge: How organizations manage what they know. Boston: Harvard Business School Press.
Davis, J. H. (1992). Some compelling intuitions about group consensus decisions, theoretical and empirical research, and interpersonal aggregation phenomena: Selected examples, 1950–1990. Organizational Behavior and Human Decision Processes, 52, 3–38.
Dawes, R. M., & Mulford, M. (1996). The false consensus effect and overconfidence: Flaws in judgment or flaws in how we study judgment? Organizational Behavior and Human Decision Processes, 65, 201–211.
Dawes, R. M., van de Kragt, A. J. C., & Orbell, J. M. (1988). Not me or thee but we: The importance of group identity in eliciting cooperation in dilemma situations: Experimental manipulations. Acta Psychologica, 68, 83–97.
Dean, M., Kıbrıs, Ö., & Masatlioglu, Y. (2017). Limited attention and status quo bias. Journal of Economic Theory, 169, 93–127. https://doi.org/10.1016/j.jet.2017.01.009
Delbecq, A. L., Van de Ven, A. H., & Gustafson, D. H. (1975). Group techniques for program planning. Glenview, IL: Scott, Foresman.
Dennis, A. R., Haley, B. J., & Vandenberg, R. J. (1996). A meta-analysis of effectiveness, efficiency, and participant satisfaction in group support systems research. In Proceedings of the International Conference on Information Systems (pp. 278–289). Cleveland, OH.
Dennis, A. R., & Wixom, B. H. (2002). Investigating the moderators of the group support systems use with meta-analysis. Journal of Management Information Systems, 18(3), 235–258.
Dennis, A. R., Wixom, B. H., & Vandenberg, R. J. (2001). Understanding fit and appropriation effects in group support systems via meta-analysis. MIS Quarterly, 25(2), 167–193.
DeSanctis, G., & Gallupe, R. B. (1987). A foundation for the study of group decision support systems. Management Science, 33(5), 589–609.
Dickison, D., & Taatgen, N. A. (2007). ACT-R models of cognitive control in the abstract decision making task. In Proceedings of the Eighth International Conference on Cognitive Modeling (pp. 79–84). Ann Arbor, MI.
Dixit, A., & Nalebuff, B. (1991). Making strategies credible. In R. J. Zeckhauser (Ed.), Strategy and choice (pp. 161–184). Cambridge, MA: MIT Press.
Doherty, M. E. (2003). Optimists, pessimists, and realists. In S. Schneider & J. Shanteau (Eds.), Emerging perspectives on judgment and decision research (pp. 643–679). New York: Cambridge University Press.
Dorris, A. L., & Tabrizi, J. L. (1978). An empirical investigation of consumer perception of product safety. Journal of Products Liability, 2, 155–163.
Dougherty, M. R. P., Gronlund, S. D., & Gettys, C. F. (2003). Memory as a fundamental heuristic for decision making. In S. Schneider & J. Shanteau (Eds.), Emerging perspectives on judgment and decision research (pp. 125–164). New York: Cambridge University Press.
Drucker, P. (1985). The effective executive. New York: Harper & Row.
Drucker, P. (1995). The information executives truly need. Harvard Business Review, 73(1), 54–62.
Du Charme, W. (1970). Response bias explanation of conservative human inference. Journal of Experimental Psychology, 85, 66–74.
Duda, R. O., Hart, K., Konolige, K., & Reboh, R. (1979). A computer-based consultant for mineral exploration. Technical Report, SRI International, Stanford, CA.
Duffy, L. (1993). Team decision making biases: An information processing perspective. In G. A. Klein, J. Orasanu, R. Calderwood, & C. E. Zsambok (Eds.), Decision making in action: Models and methods. Norwood, NJ: Ablex.
Edwards, W. (1954). The theory of decision making. Psychological Bulletin, 51, 380–417.
Edwards, W. (1968). Conservatism in human information processing. In B. Kleinmuntz (Ed.), Formal representation of human judgment (pp. 17–52). New York: Wiley.
Einhorn, H. J., & Hogarth, R. M. (1978). Confidence in judgment: Persistence of the illusion of validity. Psychological Review, 85, 395–416.
Einhorn, H. J., & Hogarth, R. M. (1981). Behavioral decision theory: Processes of judgment and choice. Annual Review of Psychology, 32, 53–88.
Ellis, D. G., & Fisher, B. A. (1994). Small group decision making: Communication and the group process (4th ed.). New York: McGraw-Hill.
Ellsberg, D. (1961). Risk, ambiguity, and the Savage axioms. Quarterly Journal of Economics, 75, 643–699.
Elster, J. (Ed.). (1986). The multiple self. Cambridge: Cambridge University Press.
Ericsson, K. A., & Simon, H. A. (1984). Protocol analysis: Verbal reports as data. Cambridge, MA: MIT Press.
Ernst, M., Bolla, K., Mouratidis, M., Contoreggi, C., Matochik, J. A., Kurian, V., Cadet, J. L., et al. (2002). Decision-making in a risk-taking task: A PET study. Neuropsychopharmacology, 26(5), 682–691.
Espinasse, B., Picolet, G., & Chouraqui, E. (1997). Negotiation support systems: A multi-criteria and multi-agent approach. European Journal of Operational Research, 103(2), 389–409.
Estes, W. (1976). The cognitive side of probability learning. Psychological Review, 83, 37–64.
Etzioni, A. (1988). Normative-affective factors: Toward a new decision-making model. Journal of Economic Psychology, 9, 125–150.
Evans, J. B. T. (1989). Bias in human reasoning: Causes and consequences. London: Lawrence Erlbaum Associates.
Evans, J. B. T., & Pollard, P. (1985). Intuitive statistical inferences about normally distributed data. Acta Psychologica, 60, 57–71.
Fallesen, J. J., & Pounds, J. (2001). Identifying and testing a naturalistic approach for cognitive skill training. In E. Salas & G. Klein (Eds.), Linking expertise and naturalistic decision making (pp. 55–70). Mahwah, NJ: Lawrence Erlbaum Associates.
Fazlollahi, B., & Vahidov, R. (2001). Extending the effectiveness of simulation-based DSS through genetic algorithms. Information and Management, 39(1), 53–64.
Feather, N. T. (1966). Effects of prior success and failure on expectations of success and failure. Journal of Personality and Social Psychology, 3, 287–298.
Festinger, L. (1957). A theory of cognitive dissonance. Evanston, IL: Row, Peterson.
finviz: Map of the market. (2020). Retrieved from https://finviz.com/map.ashx
Fischhoff, B. (1982). For those condemned to study the past: Heuristics and biases in hindsight. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases. Cambridge: Cambridge University Press.
Fischhoff, B., & MacGregor, D. (1982). Subjective confidence in forecasts. Journal of Forecasting, 1, 155–172.
Fischhoff, B., Slovic, P., & Lichtenstein, S. (1977). Knowing with certainty: The appropriateness of extreme confidence. Journal of Experimental Psychology: Human Perception and Performance, 3, 552–564.
Fischhoff, B., Slovic, P., & Lichtenstein, S. (1978). Fault trees: Sensitivity of estimated failure probabilities to problem representation. Journal of Experimental Psychology: Human Perception and Performance, 4, 330–344.
Fishburn, P. C. (1974). Lexicographic orders, utilities and decision rules: A survey. Management Science, 20(11), 1442–1471.
Fong, G. T., Krantz, D. H., & Nisbett, R. E. (1986). The effects of statistical training on thinking about everyday problems. Cognitive Psychology, 18, 253–292.
Foroughi, A. (1998). Minimizing negotiation process losses with computerized negotiation support systems. Journal of Applied Business Research, 14(4), 15–26.
Fox, C. R., & Hadar, L. (2006). Decisions from experience = sampling error + Prospect Theory: Reconsidering Hertwig, Barron, Weber & Erev (2004). Judgment and Decision Making, 1(2), 159–161.
Frederick, S., Loewenstein, G., & O’Donoghue, T. (2002). Time discounting and time preference: A critical review. Journal of Economic Literature, 40, 351–401.
Friedman, J. W. (1990). Game theory with applications to economics. New York: Oxford University Press.
Frisch, D., & Clemen, R. T. (1994). Beyond expected utility: Rethinking behavioral decision research. Psychological Bulletin, 116(1), 46–54.
Furnham, A., & Boo, H. C. (2011). A literature review of the anchoring effect. The Journal of Socio-Economics, 40(1), 35–42.
Galton, F. (1907). Vox populi. Nature, 75(1949), 450–451.
Garavelli, A. C., & Gorgoglione, M. (1999). Fuzzy logic to improve the robustness of decision support systems under uncertainty. Computers and Industrial Engineering, 27(1–2), 477–480.
Gertman, D. I., & Blackman, H. S. (1994). Human reliability and safety analysis data handbook. New York: Wiley.
Gibbs, S., Moore, K., Steel, G., & McKinnon, A. (2017). The Dunning-Kruger effect in a workplace computing setting. Computers in Human Behavior, 72, 589–595.
Gigerenzer, G. (2008). Why heuristics work. Perspectives on Psychological Science, 3(1), 20–29.
Gigerenzer, G., & Gaissmaier, W. (2011). Heuristic decision making. Annual Review of Psychology, 62(1), 451–482.
Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction: Frequency formats. Psychological Review, 102, 684–704.
Gigerenzer, G., Hoffrage, U., & Kleinbölting, H. (1991). Probabilistic mental models: A Brunswikian theory of confidence. Psychological Review, 98, 506–528.
Gigerenzer, G., Todd, P., & the ABC Research Group (1999). Simple heuristics that make us smart. New York: Oxford University Press.
Glaser, M., Nöth, M., & Weber, M. (2004). Behavioral finance. In D. J. Koehler & N. Harvey (Eds.), Blackwell handbook of judgment and decision making (pp. 527–546). Hoboken, NJ: Wiley-Blackwell.
Goette, L., Han, H. J., & Leung, B. T. K. (2020). Information overload and confirmation bias. https://doi.org/10.17863/CAM.52487
Goldstein, D. G., & Gigerenzer, G. (2002). Models of ecological rationality: The recognition heuristic. Psychological Review, 109(1), 75–90.
Goldstein, W. M., & Hogarth, R. M. (1997). Judgment and decision research: Some historical context. In W. M. Goldstein & R. M. Hogarth (Eds.), Research on judgment and decision making: Currents, connections, and controversies (pp. 3–65). Cambridge: Cambridge University Press.
Green, P. E., & Krieger, A. M. (1987). A consumer-based approach to designing product line extensions. Journal of Product Innovation Management, 4(1), 21–32.
Green, P. E., & Krieger, A. M. (1990). A hybrid conjoint model for price-demand estimation. European Journal of Operational Research, 44(1), 28–38.
Green, P. E., Krieger, A. M., & Wind, Y. (2001). Thirty years of conjoint analysis: Reflections and prospects. Interfaces, 31(3), S56–S73.
Gruenfeld, D. H., Mannix, E. A., Williams, K. Y., & Neale, M. A. (1996). Group composition and decision making: How member familiarity and information distribution affect process and performance. Organizational Behavior and Human Decision Processes, 67(1), 1–15.
Güth, W., Schmittberger, R., & Schwarze, B. (1982). An experimental analysis of ultimatum bargaining. Journal of Economic Behavior and Organization, 3, 367–388.
Hammer, W. (1993). Product safety management and engineering (2nd ed.). Chicago: American Society of Safety Engineers.
Hammond, K. R. (1980). Introduction to Brunswikian theory and methods. In K. R. Hammond & N. E. Wascoe (Eds.), Realizations of Brunswik’s experimental design. San Francisco, CA: Jossey-Bass.
Hammond, K. R. (1993). Naturalistic decision making from a Brunswikian viewpoint: Its past, present, future. In G. Klein, J. Orasanu, R. Calderwood, & C. E. Zsambok (Eds.), Decision making in action: Models and methods (pp. 205–227). Norwood, NJ: Ablex.
Hammond, K. R. (1996). Human judgment and social policy: Irreducible uncertainty, inevitable error, unavoidable injustice. New York: Oxford University Press.
Hammond, K. R., Hamm, R. M., Grassia, J., & Pearson, T. (1987). Direct comparison of the efficacy of intuitive and analytical cognition in expert judgment. IEEE Transactions on Systems, Man, and Cybernetics, 17, 753–770.
Hammond, K. R., Stewart, T. R., Brehmer, B., & Steinmann, D. O. (1975). Social judgment theory. In M. F. Kaplan & S. Schwartz (Eds.), Human judgment and decision processes (pp. 271–312). New York: Academic.
Hardin, G. (1968). The tragedy of the commons. Science, 162, 1243–1248.
Harley, E. M. (2007). Hindsight bias in legal decision making. Social Cognition, 25(1), 48–63.
Harmon, J., & Rohrbaugh, J. (1990). Social judgement analysis and small group decision making: Cognitive feedback effects on individual and collective performance. Organizational Behavior and Human Decision Processes, 46, 34–54.
Hart, P., Stern, E. K., & Sundelius, B. (1997). Beyond groupthink: Political group dynamics and foreign policy-making. Ann Arbor, MI: University of Michigan Press.
Hasher, L., & Zacks, R. T. (1984). Automatic processing of fundamental information: The case of frequency of occurrence. American Psychologist, 39, 1372–1388.
Heath, C. (1995). Escalation and de-escalation of commitment in response to sunk costs: The role of budgeting in mental accounting. Organizational Behavior and Human Decision Processes, 62, 38–54.
Heath, L., Tindale, R. S., Edwards, J., Posavac, E. J., Bryant, F. B., Henderson-King, E., Suarez-Balcazar, Y., & Myers, J. (1994). Applications of heuristics and biases to social issues. New York: Plenum.
Hertwig, R. (2012). The psychology and rationality of decisions from experience. Synthese, 187(1), 269–292.
Hertwig, R., Barron, G., Weber, E. U., & Erev, I. (2004). Decisions from experience and the effect of rare events in risky choice. Psychological Science, 15(8), 534–539.
Hogarth, R. M., & Einhorn, H. J. (1992). Order effects in belief updating: The belief-adjustment model. Cognitive Psychology, 24, 1–55.
Holden, C. (2007). Bird flu futures. Science, 315(5817), 1345b.
Holsapple, C. W., Lai, H., & Whinston, A. B. (1998). A formal basis for negotiation support system research. Group Decision and Negotiation, 7(3), 203–227.
Holsapple, C. W., & Whinston, A. B. (1996). Decision support systems: A knowledge-based approach. St. Paul, MN: West Publishing.
Holtzman, S. (1989). Intelligent decision systems. Reading, MA: Addison-Wesley.
Howard, R. A. (1968). The foundations of decision analysis. IEEE Transactions on Systems Science and Cybernetics, 4, 211–219.
Howard, R. A. (1988). Decision analysis: Practice and promise. Management Science, 34, 679–695.
Hsee, C. K., Yang, Y., & Li, X. (2019). Relevance insensitivity: A new look at some old biases. Organizational Behavior and Human Decision Processes, 153, 13–26.
Huber, J., Wittink, D. R., Fiedler, J. A., & Miller, R. (1993). The effectiveness of alternative preference elicitation procedures in predicting choice. Journal of Marketing Research, 30, 105–114.
Huber, J., & Zwerina, K. (1996). The importance of utility balance in efficient choice designs. Journal of Marketing Research, 33(3), 307–317.
Hunt, E. (1998). Problem solving. In R. J. Sternberg (Ed.), Thinking and problem solving (pp. 215–232). Amsterdam: Elsevier.
Isen, A. M. (1993). Positive affect and decision making. In M. Lewis & J. M. Haviland (Eds.), Handbook of emotions (pp. 261–277). New York: Guilford.
Isenberg, D. J. (1986). Group polarization: A critical review and meta analysis. Journal of Personality and Social Psychology, 50, 1141–1151.
Iverson, G., & Luce, R. D. (1998). The representational measurement approach to psychophysical and judgmental problems. In M. H. Birnbaum (Ed.), Measurement, judgment, and decision making (pp. 1–79). San Diego, CA: Academic.
Iyengar, S., & Lepper, M. (2000). When choice is demotivating: Can one desire too much of a good thing? Journal of Personality and Social Psychology, 79, 995–1006.
Jacoby, J. (1977). Information load and decision quality: Some contested issues. Journal of Marketing Research, 14, 569–573.
Janis, I. L. (1972). Victims of groupthink. Boston, MA: Houghton Mifflin.
Janis, I. L., & Mann, L. (1977). Decision making: A psychological analysis of conflict, choice, and commitment. New York: Free Press.
Jasic, T., & Wood, D. (2004). The profitability of daily stock market indices trades based on neural network predictions: Case study for the S&P 500, the DAX, the TOPIX and the FTSE in the period 1965–1999. Applied Financial Economics, 14(4), 285–297.
Johnson, E. J., Shu, S. B., Dellaert, B. G. C., Fox, C. R., Goldstein, D. G., Häubl, G., Larrick, R. P., Payne, J. W., Peters, E., Schkade, D., Wansink, B., & Weber, E. U. (2012). Beyond nudges: Tools of a choice architecture. Marketing Letters, 23, 487–504.
Johnson, J. G., & Raab, M. (2003). Take the first: Option generation and resulting choices. Organizational Behavior and Human Decision Processes, 91, 215–229.
Jonides, J., & Jones, C. M. (1992). Direct coding for frequency of occurrence. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 368–378.
Juslin, P., & Olsson, H. (1999). Computational models of subjective probability calibration. In Judgment and decision making: Neo-Brunswickian and process-tracing approaches (pp. 67–95). Mahwah, NJ: Lawrence Erlbaum Associates.
Kahneman, D. (2003). Maps of bounded rationality: Psychology for behavioral economics. American Economic Review, 93(5), 1449–1475.
Kahneman, D. (2011). Thinking, fast and slow. London: Allen Lane.
Kahneman, D., Slovic, P., & Tversky, A. (Eds.) (1982). Judgment under uncertainty: Heuristics and biases. Cambridge: Cambridge University Press.
Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80, 251–273.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291.
Kane, J. S., & Woehr, D. J. (2006). Performance measurement reconsidered: An examination of frequency estimation as a basis for assessment. In W. Bennett, Jr., C. E. Lance, & D. J. Woehr (Eds.), Performance measurement: Current perspectives and future challenges (pp. 77–110). Mahwah, NJ: Lawrence Erlbaum Associates.
Kappes, A., Harvey, A. H., Lohrenz, T., Montague, P. R., & Sharot, T. (2020). Confirmation bias in the utilization of others’ opinion strength. Nature Neuroscience, 23(1), 130–137.
Kardes, F. R., Posavac, S. S., & Cronley, M. L. (2004). Consumer inference: A review of processes, bases, and judgment contexts. Journal of Consumer Psychology, 14(3), 230–256.
Kaufhold, M.-A., Rupp, N., Reuter, C., & Habdank, M. (2020). Mitigating information overload in social media during conflicts and crises: Design and evaluation of a cross-platform alerting system. Behaviour & Information Technology, 39(3), 319–342.
Keen, P. G. W., & Scott-Morton, M. S. (1978). Decision support systems: An organizational perspective. Reading, MA: Addison-Wesley.
Keeney, R. L., & Raiffa, H. (1976). Decisions with multiple objectives: Preferences and value tradeoffs. New York: Wiley.
Keim, D., Andrienko, G., Fekete, J., Görg, C., Kohlhammer, J., & Melançon, G. (2008a). Visual analytics: Definition, process, and challenges. Information Visualization, 56, 154–175.
Keim, D., Mansmann, F., Schneidewind, J., Thomas, J., & Ziegler, H. (2008b). Visual analytics: Scope and challenges. Visual Data Mining, 98, 76–90.
Keim, D., Nietzschmann, T., Schelwies, N., Schneidewind, J., Schreck, T., & Ziegler, H. (2006). A spectral visualization system for analyzing financial time series data. In Proceedings of the Eurographics/IEEE-VGTC Symposium on Visualization (EuroVis), Lisbon, Portugal.
Kepner, C. H., & Tregoe, B. B. (1976). The rational manager: A systematic approach to problem solving and decision making. Princeton, NJ: Kepner-Tregoe.
Keren, G. (1990). Cognitive aids and debiasing methods: Can cognitive pills cure cognitive ills? In J. P. Caverni, J. M. Fabre, & M. Gonzalez (Eds.), Cognitive biases. Amsterdam: North-Holland.
Kerr, N. L., MacCoun, R. J., & Kramer, G. P. (1996). Bias in judgment: Comparing individuals and groups. Psychological Review, 103, 687–719.
Kersten, G. E., & Noronha, S. J. (1999). WWW-based negotiation support systems: Design, implementation, and use. Decision Support Systems, 25(2), 135–154.
Khaleel, I., Wimmer, B. C., Peterson, G. M., Zaidi, S. T. R., Roehrer, E., Cummings, E., & Lee, K. (2020). Health information overload among health consumers: A scoping review. Patient Education and Counseling, 103(1), 15–32.
Kidder, R. M. (1995). How good people make tough choices. New York: HarperCollins.
Klayman, J., & Ha, Y. W. (1987). Confirmation, disconfirmation, and information in hypothesis testing. Psychological Review, 94(2), 211–228.
Klein, G. A. (1989). Recognition-primed decisions. In W. Rouse (Ed.), Advances in man–machine system research. Greenwich, CT: JAI Press.
Klein, G. A. (1995). The value added by cognitive analysis. In Proceedings of the Human Factors and Ergonomics Society 39th Annual Meeting (pp. 530–533). San Diego, CA.
Klein, G. A. (1998). Sources of power: How people make decisions. Cambridge, MA: MIT Press.
Klein, G. A. (2004). The power of intuition: How to use your gut feelings to make better decisions at work. New York: Doubleday.
Klein, G. A., Orasanu, J., Calderwood, R., & Zsambok, C. E. (Eds.) (1993). Decision making in action: Models and methods. Norwood, NJ: Ablex.
Klein, G. A., & Wolf, S. (1995). Decision-centered training. In Proceedings of the Human Factors and Ergonomics Society 39th Annual Meeting (pp. 1249–1252). San Diego, CA.
Knobloch-Westerwick, S., Mothes, C., & Polavin, N. (2017). Confirmation bias, ingroup bias, and negativity bias in selective exposure to political information. Communication Research, 47(1), 104–124.
Koch, A. (2016). Herd behavior and mutual fund performance. Management Science, 63(11), 3849–3873. https://doi.org/10.1287/mnsc.2016.2543
Koehler, D. J., & Harvey, N. (2004). Blackwell handbook of judgment and decision making. Hoboken, NJ: Wiley-Blackwell.
198 Koehler, J. J. (1996). The base rate fallacy reconsidered: Descriptive, normative, and methodological challenges. Behavioral and Brain Sciences, 19, 1–53. Koriat, A., Lichtenstein, S., & Fischhoff, B. (1980). Reasons for confidence. Journal of Experimental Psychology: Human Learning and Memory, 6, 107–118. Krauss, S., Martignon, L., & Hoffrage, U. (1999). Simplifying Bayesian inference: The general case. In, L. Magnani, N. Nersessian, & P. Thagard (Eds.), Model-based reasoning in scientific discovery (pp. 165–179). New York: Plenum. Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121–1134. Kuhberger, A. (1995). The framing of decisions: A new look at old problems. Organizational Behavior and Human Decision Processes, 62, 230–240. Kuo, R. J., & Xue, K. C. (1998). A decision support system for sales forecasting through fuzzy neural networks with asymmetric fuzzy weights. Decision Support Systems, 24(2), 105–126. Lam, S-H., Cao, J-M., & Fan, H. (2003). Development of an intelligent agent for airport gate assignment. Journal of Air Transportation, 8(1), 103–114. LaPlante, A. (1990). Bring in the expert: Expert systems can’t solve all problems, But they’re learning. InfoWorld, 12(40), 55–64. Lari, A. (2003). A decision support system for solving quality problems using case-based reasoning. Total Quality Management and Business Excellence, 14(6), 733–745. Lathrop, R. G. (1967). Perceived variability. Journal of Experimental Psychology, 23, 498–502. Laudon, K. C., & Laudon, J. P. (2003). Management information systems: Managing the digital firm (8th ed.). Upper Saddle River, NJ: Prentice Hall. Lehto, M. R. (1991). A proposed conceptual model of human behavior and its implications for design of warnings. Perceptual and Motor Skills, 73, 595–611. Lehto, M. R., & Buck, J. R. (2007). 
Introduction to human factors and ergonomics for engineers. Boca Raton, FL: CRC Press. Lehto, M. R., James, D. S., & Foley, J. P. (1994). Exploratory factor analysis of adolescent attitudes toward alcohol and risk. Journal of Safety Research, 25, 197–213. Lehto, X. Y., & Lehto, M. R. (2019). Vacation as a public health resource: Toward a wellness-centered tourism design approach. Journal of Hospitality & Tourism Research, 43(7), 935–960. Lehto, M. R., & Papastavrou, J. (1991). A distributed signal detection theory model: implications to the design of warnings. In Proceedings of the 1991 Automatic Control Conference (pp. 2586–2590). Boston, MA. Lehto, M. R., Papastavrou, J. P., Ranney, T. A., & Simmons, L. (2000). An experimental comparison of conservative vs. optimal collision avoidance system thresholds. Safety Science, 36(3), 185–209. Levin, L. P. (1975). Information integration in numerical judgements and decision processes. Journal of Experimental Psychology: General, 104, 39–53. Levitt, S. D., & Dubner, S. J. (2005). Freakonomics: A rogue economist explores the hidden side of everything. New York: William Morrow & Co. Lewandowsky, S., Ecker, U. K. H., Seifert, C. M., Schwarz, N., & Cook, J. (2012). Misinformation and its correction: Continued influence and successful debiasing. Psychological Science in the Public Interest, 13(3), 106–131. Li, J., Liu, M., & Liu, X. (2016). Why do employees resist knowledge management systems? An empirical study from the status quo bias and inertia perspectives. Computers in Human Behavior, 65, 189–200. Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1982). Calibration of probabilities: The state of the art to 1980. In D. Kahneman, P.
HUMAN FACTORS FUNDAMENTALS
Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 306–334). Cambridge: Cambridge University Press. Lichtenstein, S., Slovic, P., Fischhoff, B., Layman, M., & Coombs, B. (1978). Judged frequency of lethal events. Journal of Experimental Psychology: Human Learning and Memory, 4, 551–578. Likert, R., & Likert, J. G. (1976). New ways of managing conflict. New York: McGraw-Hill. Lim, L.-H., & Benbasat, I. (1993). A theoretical perspective of negotiation support systems. Journal of Management Information Systems, 9(3), 27–44. Little, J. D. C. (1970). Models and managers: The concept of a decision calculus. Management Science, 16(8), 466–485. Livnat, Y., Agutter, J., Moon, S., & Foresti, S. (2005). Visual correlation for situational awareness. In Proceedings of IEEE Symposium on Information Visualization (pp. 95–102). Los Alamitos, CA. Luce, R. D., & Raiffa, H. (1957). Games and decisions. New York: Wiley. Martignon, L., & Krauss, S. (2003). Can l’homme éclairé be fast and frugal? Reconciling Bayesianism and bounded rationality. In S. Schneider & J. Shanteau (Eds.), Emerging perspectives on judgment and decision research (pp. 108–122). New York: Cambridge University Press. Matthes, J., Karsay, K., Schmuck, D., & Stevic, A. (2020). “Too much to handle”: Impact of mobile social networking sites on information overload, depressive symptoms, and well-being. Computers in Human Behavior, 105, 106217. Matwin, S., Szpakowicz, S., Koperczak, Z., Kersten, G. E., & Michalowski, W. (1989). Negoplan: An expert system shell for negotiation support. IEEE Expert, 4(4), 50–62. Mayer, R. E. (1983). Thinking, problem solving. In Cognition. New York: W. H. Freeman. Mazzoni, G., & Vannucci, M. (2007). Hindsight bias, the misinformation effect, and false autobiographical memories. Social Cognition, 25(1), 203–220. McGuire, W. J. (1966). Attitudes and opinions. Annual Review of Psychology, 17, 475–514. McKnight, A. J., Langston, E.
A., McKnight, A. S., & Lange, J. E. (1995). The bases of decisions leading to alcohol impaired driving. In C. N. Kloeden & A. J. McLean (Eds.), Proceedings of the 13th International Conference on Alcohol, Drugs, and Traffic Safety (pp. 143–147). Adelaide, Australia, August 13–18. McLeod, P. L. (1992). An assessment of the experimental literature on electronic support of group work: Results of a meta-analysis. Human–Computer Interaction, 7, 257–280. Messick, D. M. (1991). Equality as a decision heuristic. In B. Mellers (Ed.), Psychological issues in distributive justice. New York: Cambridge University Press. Montgomery, H. (1989). From cognition to action: The search for dominance in decision making. In H. Montgomery & O. Svenson (Eds.), Process and structure in human decision making. Chichester: Wiley. Montgomery, H., & Willen, H. (1999). Decision making and action: The search for a good structure. In Judgment and decision making: Neo-Brunswikian and process-tracing approaches (pp. 147–173). Mahwah, NJ: Lawrence Erlbaum Associates. Morton, V., & Torgerson, D. J. (2003). Effect of regression to the mean on decision making in health care. BMJ, 326(7398), 1083–1084. Moscovici, S. (1976). Social influence and social change. London: Academic Press. Myers, J. L., Suydam, M. M., & Gambino, B. (1965). Contingent gains and losses in a risky decision situation. Journal of Mathematical Psychology, 2, 363–370. Nah, F., & Benbasat, I. (2004). Knowledge-based support in a group decision making context: An expert–novice comparison. Journal of the Association for Information Systems, 5(3), 125–150.
DECISION-MAKING MODELS, DECISION SUPPORT, AND PROBLEM SOLVING
Nah, F. H., Mao, J., & Benbasat, I. (1999). The effectiveness of expert support technology for decision making: Individuals versus small groups. Journal of Information Technology, 14(2), 137–147. Nanda, G., & Douglas, K. A. (2019). Machine learning based decision support system for categorizing MOOC discussion forum posts. In Proceedings of the 12th International Conference on Educational Data Mining (EDM 2019), Montréal, Canada, July 2–5, 2019. Nanda, G., Lehto, M. R., & Nof, S. Y. (2014). User requirement analysis for an online collaboration tool for senior industrial engineering design course. Human Factors and Ergonomics in Manufacturing & Service Industries, 24(5), 557–573. Nanda, G., Tan, J., Auyeung, P., & Lehto, M. (2012). Evaluating HUBzero(TM) as a collaboration platform for reliability engineering. In IIE Annual Conference. Proceedings. Nanda, G., Vallmuur, K., & Lehto, M. (2019). Semi-automated text mining strategies for identifying rare causes of injuries from emergency room triage data. IISE Transactions on Healthcare Systems Engineering, 1–15. Nanda, G., Vallmuur, K., & Lehto, M. (2020). Intelligent human-machine approaches for assigning groups of injury codes to accident narratives. Safety Science, 125, 104585. Navon, D. (1979). The importance of being conservative. British Journal of Mathematical and Statistical Psychology, 31, 33–48. Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2, 175–220. Nisbett, R., & Ross, L. (1980). Human inference: Strategies and shortcomings of social judgment. Englewood Cliffs, NJ: Prentice-Hall. Nunamaker, J. F., Dennis, A. R., Valacich, J. S., Vogel, D. R., & George, J. F. (1991). Electronic meeting systems to support group work: Theory and practice at Arizona. Communications of the ACM, 34(7), 40–61. Obrecht, N. A., & Chesney, D. L. (2018).
Tasks that prime deliberative processes boost base rate use. In Proceedings of the 40th Annual Conference of the Cognitive Science Society (pp. 2180–2185). Cognitive Science Society. Ocón Palma, M. del C., Seeger, A.-M., & Heinzl, A. (2020). Mitigating information overload in e-commerce interactions with conversational agents. In F. D. Davis, R. Riedl, J. vom Brocke, P.-M. Léger, A. Randolph, & T. Fischer (Eds.), Information systems and neuroscience (pp. 221–228). Cham: Springer International Publishing. O’Donoghue, T., & Rabin, M. (1999). Doing it now or later. American Economic Review, 89(1), 103–124. Ofir, C., Raghubir, P., Brosh, G., Monroe, K. B., & Heiman, A. (2008). Memory-based store price judgments: The role of knowledge and shopping experience. Journal of Retailing, 84(4), 414–423. Oldenburger, K., Lehto, X., Feinberg, R., Lehto, M., & Salvendy, G. (2007). Critical purchasing incidents in E-Business. Behaviour and Information Technology, 27(1), 63–77. Öllinger, M., & Goel, V. (2010). Problem solving. In B. Glatzeder, V. Goel, & A. von Müller (Eds.), Towards a theory of thinking (pp. 3–21). Berlin: Springer. Olson, D., & Mossman, C. (2003). Neural network forecasts of Canadian stock returns using accounting ratios. International Journal of Forecasting, 19(3), 453–465. Orasanu, J., & Salas, E. (1993). Team decision making in complex environments. In G. A. Klein, J. Orasanu, R. Calderwood, & E. Zsambok (Eds.), Decision making in action: Models and methods. Norwood, NJ: Ablex. Osborn, F. (1937). Applied imagination. New York: Charles Scribner & Sons. Oskarsson, A. T., Van Boven, L., McClelland, G. H., & Hastie, R. (2009). What’s next? Judging sequences of binary events. Psychological Bulletin, 135(2), 262–285. Paese, P. W., Bieser, M., & Tubbs, M. E. (1993). Framing effects and choice shifts in group decision making. Organizational Behavior and Human Decision Processes, 56, 149–165.
Payne, J. W. (1980). Information processing theory: Some concepts and methods applied to decision research. In T. S. Wallsten (Ed.), Cognitive processes in choice and decision research. Hillsdale, NJ: Lawrence Erlbaum Associates. Payne, J. W., & Bettman, J. R. (2004). Walking with the scarecrow: The information-processing approach to decision research. In D. J. Koehler & N. Harvey (Eds.), Blackwell handbook of judgment and decision making (pp. 110–132). Hoboken, NJ: Wiley-Blackwell. Payne, J. W., Bettman, J. R., & Johnson, E. J. (1993). The adaptive decision maker. Cambridge: Cambridge University Press. Pennington, N., & Hastie, R. (1986). Evidence evaluation in complex decision making. Journal of Personality and Social Psychology, 51(2), 242–258. Pennington, N., & Hastie, R. (1988). Explanation-based decision making: Effects of memory structure on judgment. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14(3), 521–533. Pennycook, G., & Rand, D. G. (2019). Fighting misinformation on social media using crowdsourced judgments of news source quality. Proceedings of the National Academy of Sciences, 116(7), 2521–2526. https://doi.org/10.1073/pnas.1806781116 Pennycook, G., Ross, R. M., Koehler, D. J., & Fugelsang, J. A. (2017). Dunning–Kruger effects in reasoning: Theoretical implications of the failure to recognize incompetence. Psychonomic Bulletin & Review, 24(6), 1774–1784. https://doi.org/10.3758/s13423-017-1242-7 Pesendorfer, W. (2006). Behavioral economics comes of age: A review essay on ‘Advances in Behavioral Economics’. Journal of Economic Literature, 44(3), 712–721. Peterson, C. R., & Beach, L. R. (1967). Man as an intuitive statistician. Psychological Bulletin, 68, 29–46. Pitz, G. F. (1980). The very guide of life: The use of probabilistic information for making decisions. In T. S. Wallsten (Ed.), Cognitive processes in choice and decision behavior. Mahwah, NJ: Lawrence Erlbaum Associates. Plaisant, C. (2004).
The challenge of information visualization evaluation. In Proceedings of the Working Conference on Advanced Visual Interfaces (pp. 109–116). Gallipoli, Italy: ACM. Pliske, R., & Klein, G. (2003). The naturalistic decision-making perspective. In S. Schneider & J. Shanteau (Eds.), Emerging perspectives on judgment and decision research (pp. 108–122). New York: Cambridge University Press. Pliske, R. M., McCloskey, M. J., & Klein, G. (2001). Decision skills training: Facilitating learning from experience. In E. Salas & G. Klein (Eds.), Linking expertise and naturalistic decision making (pp. 37–53). Mahwah, NJ: Lawrence Erlbaum Associates. Prater, J., Kirytopoulos, K., & Ma, T. (2017). Optimism bias within the project management context: A systematic quantitative literature review. International Journal of Managing Projects in Business, 10(2), 370–385. Raiffa, H. (1968). Decision analysis. Reading, MA: Addison-Wesley. Raiffa, H. (1982). The art and science of negotiation. Cambridge, MA: Harvard University Press. Rashid, A. M., Albert, I., Cosley, D., Lam, S. K., McNee, S. M., Konstan, J. A., & Riedl, J. (2002). Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Conference on Intelligent User Interfaces (pp. 127–134). San Francisco, CA: ACM. Rasmussen, J. (1983). Skills, rules, knowledge: Signals, signs, and symbols and other distinctions in human performance models. IEEE Transactions on Systems, Man, and Cybernetics, 13(3), 257–267. Reason, J. (1990). Human error. Cambridge: Cambridge University Press. Reitman, W. R. (1964). Heuristic decision procedures, open constraints, and the structure of ill-defined problems. In M. W. Shelly & G. L. Bryan (Eds.), Human judgments and optimality (pp. 282–315). New York: Wiley.
Rethans, A. J. (1980). Consumer perceptions of hazards. In PLP-80 Proceedings (pp. 25–29). Reyna, V. F., & Brainerd, C. J. (1995). Fuzzy-trace theory: An interim synthesis. Learning and Individual Differences, 7, 1–75. Robert, H. M. (1990). Robert’s rules of order newly revised (9th ed.). Glenview, IL: Scott, Foresman. Saaty, T. L. (1990). Multicriteria decision making: The analytic hierarchy process. Pittsburgh, PA: RWS Publications. Samson, A. (Ed.) (2016). The behavioral economics guide 2016 (with an introduction by Gerd Gigerenzer). Retrieved from https://www.behavioraleconomics.com. Samson, A. (Ed.) (2019). The behavioral economics guide 2019 (with an introduction by Uri Gneezy). Retrieved from https://www.behavioraleconomics.com. Samson, A., & Voyer, B. G. (2012). Two minds, three ways: Dual system and dual process models in consumer psychology. AMS Review, 2(2–4), 48–71. Samson, A., & Voyer, B. G. (2014). Emergency purchasing situations: Implications for consumer decision-making. Journal of Economic Psychology, 44, 21–33. Samuelson, W., & Zeckhauser, R. J. (1988). Status quo bias in decision making. Journal of Risk and Uncertainty, 1, 7–59. Sapena, O., Botti, V., & Argente, E. (2003). Application of neural networks to stock prediction in ‘pool’ companies. Applied Artificial Intelligence, 17(7), 661–673. SAS Visual Analytics: Warranty Analysis. (2020). Retrieved from https://www.sas.com/en_us/software/visual-analytics/demo/warrantyanalysis.html Satish, U., & Streufert, S. (2002). Value of a cognitive simulation in medicine: Towards optimizing decision making performance of healthcare personnel. Quality and Safety in Health Care, 11(2), 163–167. Satzinger, J., & Olfman, L. (1995). Computer support for group work: Perceptions of the usefulness of support scenarios and specific tools. Journal of Management Systems, 11(4), 115–148. Savage, L. J. (1954). The foundations of statistics. New York: Wiley. Schafer, J. B., Konstan, J. A., & Riedl, J. (2001).
E-Commerce recommendation applications. Data Mining and Knowledge Discovery, 5(1), 115–153. Schelling, T. (1960). The strategy of conflict. Cambridge, MA: Harvard University Press. Schelling, T. (1978). Micromotives and macrobehavior. New York: W. W. Norton. Schwartz, B. (2004). The paradox of choice: Why more is less. New York: Ecco. Scott-Morton, M. S. (1977). Management decision systems: Computer-based support for decision making. Cambridge, MA: Harvard University Press. Sedlmeier, P., Hertwig, R., & Gigerenzer, G. (1998). Are judgments of the positional frequencies of letters systematically biased due to availability? Journal of Experimental Psychology: Learning, Memory, and Cognition, 24(3), 754–770. Shafer, G. (1976). A mathematical theory of evidence. Princeton, NJ: Princeton University Press. Shafir, E., Diamond, P., & Tversky, A. (1997). Money illusion. Quarterly Journal of Economics, 112(2), 341–374. Shah, A. K., & Oppenheimer, D. M. (2008). Heuristics made easy: An effort-reduction framework. Psychological Bulletin, 134(2), 207–222. Sharda, R., Barr, S. H., & McDonnell, J. C. (1988). Decision support system effectiveness: A review and an empirical test. Management Science, 34(2), 139–159. Sharot, T. (2011). The optimism bias. Current Biology, 21(23), R941–R945. Shen, X.-L., Zhang, K. Z. K., & Zhao, S. J. (2016). Herd behavior in consumers’ adoption of online reviews. Journal of the Association for Information Science and Technology, 67(11), 2754–2765.
Shepperd, J. A., Carroll, P., Grace, J., & Terry, M. (2002). Exploring the causes of comparative optimism. Psychologica Belgica, 42, 65–98. Shin, J., Jian, L., Driscoll, K., & Bar, F. (2018). The diffusion of misinformation on social media: Temporal pattern, message, and source. Computers in Human Behavior, 83, 278–287. Siebold, D. R. (1992). Making meetings more successful: Plans, formats, and procedures for group problem-solving. In R. Cathcart & L. Samovar (Eds.), Small group communication (6th ed., pp. 178–191). Dubuque, IA: Wm. C. Brown. Simon, H. A. (1955). A behavioral model of rational choice. Quarterly Journal of Economics, 69, 99–118. Simon, H. A. (1977). The new science of management decisions. Englewood Cliffs, NJ: Prentice-Hall. Simon, H. A. (1982). Models of bounded rationality. Cambridge, MA: MIT Press. Simon, H. A. (1983). Alternative visions of rationality. In H. A. Simon (Ed.), Reason in human affairs (pp. 3–36). Palo Alto, CA: Stanford University Press. Simon, H. A. (1987). Behavioural economics. In The new Palgrave: A dictionary of economics (Vol. 1, pp. 221–225). New York: Palgrave. Simon, H. A., Dantzig, G. B., Hogarth, R., Plott, C. R., Raiffa, H., Schelling, T. C., Shepsle, K. A., et al. (1987). Decision making and problem solving. Interfaces, 17(5), 11–31. Singleton, W. T., & Hovden, J. (1987). Risk and decisions. New York: Wiley. Slovic, P. (1978). The psychology of protective behavior. Journal of Safety Research, 10, 58–68. Slovic, P. (1987). Perception of risk. Science, 236, 280–285. Slovic, P., Finucane, M. L., Peters, E., & MacGregor, D. G. (2002). The affect heuristic. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.), Heuristics and biases: The psychology of intuitive judgment (pp. 397–420). New York: Cambridge University Press. Slovic, P., Fischhoff, B., & Lichtenstein, S. (1977). Behavioral decision theory. Annual Review of Psychology, 28, 1–39. Smith, A. (1875).
The theory of moral sentiments: To which is added, a dissertation on the origin of languages. London: George Bell & Sons. Smith, A. R., & Price, P. C. (2010). Sample size bias in the estimation of means. Psychonomic Bulletin & Review, 17(4), 499–503. Smith, A. R., Rule, S., & Price, P. C. (2017). Sample size bias in retrospective estimates of average duration. Acta Psychologica, 176, 39–46. Sniezek, J. A. (1992). Groups under uncertainty: An examination of confidence in group decision making. Organizational Behavior and Human Decision Processes, 52, 124–155. Sniezek, J. A., Wilkins, D. C., Wadlington, P. L., & Baumann, M. R. (2002). Training for crisis decision-making: Psychological issues and computer-based solutions. Journal of Management Information Systems, 18(4), 147–168. Stanoulov, N. (1994). Expert knowledge and computer-aided group decision making: Some pragmatic reflections. Annals of Operations Research, 51, 141–162. Stasko, J., Görg, C., & Spence, R. (2008). Jigsaw: Supporting investigative analysis through interactive visualization. Information Visualization, 7(2), 118–132. Stasser, G., & Titus, W. (1985). Pooling of unshared information in group decision making: Biased information sampling during discussion. Journal of Personality and Social Psychology, 48, 1467–1478. Sternberg, R. J., & Frensch, P. A. (1991). Complex problem solving: Principles and mechanisms. Mahwah, NJ: Lawrence Erlbaum Associates. Stevenson, M. K., Busemeyer, J. R., & Naylor, J. C. (1993). Judgment and decision-making theory. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (Vol. 1, 2nd ed.). Palo Alto, CA: Consulting Press.
Stone, M. (1960). Models for reaction time. Psychometrika, 25, 251–260. Strack, F., & Deutsch, R. (2015). The duality of everyday life: Dual-process and dual system models in social psychology. In APA handbook of personality and social psychology, Vol. 1: Attitudes and social cognition (pp. 891–927). Washington, DC: American Psychological Association. Stokey, E., & Zeckhauser, R. (1978). Decision analysis. In A primer for policy analysis (pp. 201–254). New York: W. W. Norton. Surowiecki, J. (2004). The wisdom of crowds: Why the many are smarter than the few. New York: Doubleday. Svenson, O. (1990). Some propositions for the classification of decision situations. In K. Borcherding, O. Larichev, & D. Messick (Eds.), Contemporary issues in decision making (pp. 17–31). Amsterdam: North-Holland. Svenson, O., & Maule, A. J. (1993). Time pressure and stress in human judgment and decision making. New York: Plenum. Tambuscio, M., Oliveira, D. F. M., Ciampaglia, G. L., & Ruffo, G. (2018). Network segregation in a model of misinformation and fact-checking. Journal of Computational Social Science, 1(2), 261–275. https://doi.org/10.1007/s42001-018-0018-9 Thaler, R. H. (1980). Toward a positive theory of consumer choice. Journal of Economic Behavior & Organization, 1(1), 39–60. Thaler, R. H. (1988). Anomalies: The ultimatum game. Journal of Economic Perspectives, 2(4), 195–206. Thaler, R. H., & Sunstein, C. R. (2008). Nudge: Improving decisions about health, wealth, and happiness. New Haven, CT: Yale University Press. Thomas, J. J., & Cook, K. A. (2005). Illuminating the path: The research and development agenda for visual analytics. Washington, DC: IEEE Computer Society. Thomas, J. J., & Kielman, J. (2009). Challenges for visual analytics. Information Visualization, 8(4), 309–314. Todd, P., & Benbasat, I. (1991).
An experimental investigation of the impact of computer based decision aids on decision making strategies. Information Systems Research, 2(2), 87–115. Todd, P., & Benbasat, I. (1992). The use of information in decision making: An experimental investigation of the impact of computer based DSS on processing effort. MIS Quarterly, 16(3), 373–393. Todd, P., & Benbasat, I. (1993). An experimental investigation of the relationship between decision makers, decision aids and decision making effort. INFOR, 31(2), 80–100. Todd, P., & Benbasat, I. (1999). Evaluating the impact of DSS, cognitive effort, and incentives on strategy selection. Information Systems Research, 10(4), 356–374. Todd, P., & Benbasat, I. (2000). Inducing compensatory information processing through decision aids that facilitate effort reduction: An experimental assessment. Journal of Behavioral Decision Making, 13(1), 91–106. Toubia, O., Simester, D. I., Hauser, J. R., & Dahan, E. (2003). Fast polyhedral adaptive conjoint estimation. Marketing Science, 22(3), 273–303. Trepel, C., Fox, C. R., & Poldrack, R. A. (2005). Prospect theory on the brain? Toward a cognitive neuroscience of decision under risk. Cognitive Brain Research, 23(1), 34–50. Tuckman, B. W. (1965). Developmental sequence in small groups. Psychological Bulletin, 63, 384–399. Turban, E., & Aronson, J. E. (2001). Decision support systems and intelligent systems (6th ed.). Upper Saddle River, NJ: Prentice Hall. Tversky, A. (1969). Intransitivity of preferences. Psychological Review, 76, 31–48. Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79, 281–299. Tversky, A., & Kahneman, D. (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5, 207–232. Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131.
Tversky, A., & Kahneman, D. (1981). The framing of decisions and the psychology of choice. Science, 211, 453–458. Valenzi, E., & Andrews, I. R. (1973). Individual differences in the decision processes of employment interviews. Journal of Applied Psychology, 58, 49–53. Vavra, T. G., Green, P. E., & Krieger, A. M. (1999). Evaluating EZ-pass. Marketing Research, 11(3), 5–16. Viscusi, W. K. (1991). Reforming products liability. Cambridge, MA: Harvard University Press. von Neumann, J., & Morgenstern, O. (1947). Theory of games and economic behavior. Princeton, NJ: Princeton University Press. von Winterfeldt, D., & Edwards, W. (1986). Decision analysis and behavioral research. Cambridge: Cambridge University Press. Wagenaar, W. A. (1992). Risk taking and accident causation. In J. F. Yates (Ed.), Risk-taking behavior (pp. 257–281). New York: Wiley. Wallenius, J., Dyer, J. S., Fishburn, P. C., Steuer, R. E., Zionts, S., & Deb, K. (2008). Multiple criteria decision making, multiattribute utility theory: Recent accomplishments and what lies ahead. Management Science, 54(7), 1336–1349. Wallsten, T. S. (1995). Time pressure and payoff effects on multidimensional probabilistic inference. In O. Svenson & J. Maule (Eds.), Time pressure and stress in human judgment. New York: Plenum. Wallsten, T. S., Zwick, R., Kemp, S., & Budescu, D. V. (1993). Preferences and reasons for communicating probabilistic information in verbal and numerical terms. Bulletin of the Psychonomic Society, 31, 135–138. Walter, N., & Tukachinsky, R. (2019). A meta-analytic examination of the continued influence of misinformation in the face of correction: How powerful is it, why does it happen, and how to stop it? Communication Research, 47(2), 155–177. Wardle, C. (2019, September). Misinformation has created a new world disorder. Scientific American. Retrieved from https://www.scientificamerican.com/article/misinformation-has-created-a-new-world-disorder/ Wattenberg, M. (1999). Visualizing the stock market.
In CHI ’99 Extended Abstracts on Human Factors in Computing Systems (pp. 188–189). Pittsburgh, PA: ACM. Weber, E. (1994). From subjective probabilities to decision weights: The effect of asymmetric loss functions on the evaluation of uncertain outcomes and events. Psychological Bulletin, 115, 228–242. Weber, E., Anderson, C. J., & Birnbaum, M. H. (1992). A theory of perceived risk and attractiveness. Organizational Behavior and Human Decision Processes, 52, 492–523. Weinstein, N. D. (1979). Seeking reassuring or threatening information about environmental cancer. Journal of Behavioral Medicine, 2, 125–139. Weinstein, N. D. (1980). Unrealistic optimism about future life events. Journal of Personality and Social Psychology, 39, 806–820. Weinstein, N. D. (1987). Unrealistic optimism about illness susceptibility: Conclusions from a community-wide sample. Journal of Behavioral Medicine, 10, 481–500. Weinstein, N. D., & Klein, W. M. (1995). Resistance of personal risk perceptions to debiasing interventions. Health Psychology, 14, 132–140. Welch, D. D. (1994). Conflicting agendas: Personal morality in institutional settings. Cleveland, OH: Pilgrim Press. Welford, A. T. (1976). Skilled performance. Glenview, IL: Scott, Foresman. Wickens, C. D. (1992). Engineering psychology and human performance. New York: HarperCollins. Wickens, C. D., Clegg, B. A., Witt, J. K., Smith, C. A. P., Herdener, N., & Spahr, K. S. (2020). Model of variability estimation: Factors influencing human prediction and estimation of variability in continuous information. Theoretical Issues in Ergonomics Science, 21(2), 220–238.
Wilke, A., Scheibehenne, B., Gaissmaier, W., McCanney, P., & Barrett, H. C. (2014). Illusionary pattern detection in habitual gamblers. Evolution and Human Behavior, 35(4), 291–297. https://doi.org/10.1016/j.evolhumbehav.2014.02.010 Wind, J., Green, P. E., Shifflet, D., & Scarbrough, M. (1989). ‘Courtyard by Marriott’: Designing a hotel facility with consumer-based marketing models. Interfaces, 19(1), 25–47. Winkler, R. L., & Murphy, A. H. (1973). Experiments in the laboratory and the real world. Organizational Behavior and Human Performance, 10, 252–270. Wolfers, J., & Zitzewitz, E. (2004). Prediction markets. Journal of Economic Perspectives, 18(2), 107–126. Wright, P. (1975). Consumer choice strategies: Simplifying vs. optimizing. Journal of Marketing Research, 12(1), 60–67. Wu, S. J., Lehto, M., Yih, Y., Saleem, J. J., & Doebbeling, B. N. (2007). Relationship of estimated resolution time and computerized clinical reminder adherence. AMIA … Annual Symposium Proceedings. AMIA Symposium, 2007 (pp. 334–338). Yates, J. F. (Ed.) (1992). Risk-taking behavior. New York: Wiley.
Yates, J. F., Veinott, E. S., & Patalano, A. L. (2003). Hard decisions, bad decisions: On decision quality and decision aiding. In S. Schneider & J. Shanteau (Eds.), Emerging perspectives on judgment and decision research (pp. 13–63). New York: Cambridge University Press. Zander, A. (1994). Making groups effective (2nd ed.). San Francisco: Jossey-Bass. Zey, M. (Ed.) (1992). Decision making: Alternatives to rational choice models. London: Sage. Zhou, M., & Xu, S. (1999). Dynamic recurrent neural networks for a hybrid intelligent decision support system for the metallurgical industry. Expert Systems, 16(4), 240–247. Zigurs, I., & Buckland, B. K. (1998). A theory of task/technology fit and group support systems effectiveness. MIS Quarterly, 22(3), 313–334. Zimmer, A. (1983). Verbal versus numerical processing of subjective probabilities. In R. W. Scholz (Ed.), Decision making under uncertainty. Amsterdam: North-Holland.
CHAPTER 7
MENTAL WORKLOAD
G. M. Hancock, California State University, Long Beach, Long Beach, California
L. Longo, School of Computer Science, Technological University Dublin, Dublin, Ireland
M. S. Young, Loughborough Design School, Loughborough University, Loughborough, England
P. A. Hancock, Institute for Simulation and Training, University of Central Florida, Orlando, Florida
1 INTRODUCTION
2 WHAT IS MENTAL AND COGNITIVE WORKLOAD?
2.1 A Brief History of Workload
2.2 The Theoretical Foundations of Workload and Its Assessment
3 HOW IS MENTAL WORKLOAD MEASURED?
3.1 Primary Task Performance Measures
3.2 Secondary Task Performance Measures
3.3 Subjective Measures of Mental Workload
3.4 Physiological Measures of Mental Workload
4 HOW MENTAL WORKLOAD CAN BE MODELED
4.1 Computational Aspects and Aggregation Strategies
4.2 Computational Modeling Frameworks
5 SOME CURRENT CHALLENGES TO MENTAL WORKLOAD AND ITS ASSESSMENT
5.1 The Practical Importance of Mental Workload
5.2 Future Issues in Mental Workload
6 CONCLUSION
7 ACKNOWLEDGMENTS
REFERENCES
INTRODUCTION
Understanding and assessing mental workload (MWL) have a long and involved history in Human Factors and Ergonomics (HFE). There have been a number of influential texts (see, e.g., Hancock & Meshkati, 1988; Longo & Chiara-Leva, 2017; Moray, 1979), as well as a series of comprehensive chapters that have reviewed the state of the art across the decades (see, e.g., Gopher & Donchin, 1986; O’Donnell & Eggemeier, 1986). Several of the most influential of these chapters appeared in earlier editions of the present handbook (i.e., Tsang & Vidulich, 2006; Tsang & Wilson, 1997; Vidulich & Tsang, 2012). Here, we build on these foundations to assess the current state of understanding of MWL. In so doing, we identify the challenges that workload assessment now faces, and we look ahead to the future of MWL at a time when ever more automated and autonomous systems are coming to dominate the world of work itself. These technologies are significantly altering the nature of the mental workload imposed on the human users who remain in such systems.
We begin with an assessment of the nature and characteristics of mental workload and of how it has been defined over the years. We then delve, albeit briefly, into the theoretical foundations that have underwritten mental workload assessment. These foundations draw on a number of theories derived directly from experimental psychology; most especially, there is a strong link between mental workload assessment and advances in theories of attention (see, e.g., Kahneman, 1973; Wickens, 1979, 2002). We reference these linkages throughout the chapter. Yet, at its heart, any handbook chapter is a highly practical form of communication and a tool for its user. Thus, the majority of the present work addresses the ways in which workload has been, and can be, assessed. Here, we review the major techniques, their relative advantages and disadvantages, and how they are applied in the many operational domains to which they are relevant (e.g., Gaba & Lee, 1990). Having specified the major methods, we briefly examine alternative approaches, including some that have fallen out of favor and others that, at present, are only candidate proposals with some promise for application.
We then look to communicate how workload assessment has specifically been employed in a growing range of practical enterprises. These exemplars do not purport to give an exhaustive listing. However, they do provide useful case studies which show the practitioner how others have used workload assessment to their advantage in their own realms of concern. We fully acknowledge that although workload and its assessment are a mature area of study, it is not without its challenges, both in terms of evolving theory and in issues related to measurement methodology. We set these issues before the reader so that they can judge for themselves the contemporary barriers which presently beset workload and its assessment. In concluding our chapter, we take a look at workload and its assessment in the broader context of humans and their interaction with developing and evolving forms of technology. Most especially, we consider where workload stands in relation to pressing issues such as human teaming with ever-more autonomous systems. We ask critical questions such as, can one assess the global workload level of multiple human–machine teams? How do specific advances in team workload assessment influence the answer to the prior question? What of underload? If humans are relegated solely to passive supervisory roles, as is often touted at the present time, can workload assessment be used to keep individuals “involved” with such work? And, should workload assessment be used to achieve this latter goal? As systems gain in autonomy, will even these residual “supervisory” roles really be necessary? Finally, we look a little beyond the strict lines of sterile workload assessment alone to ask questions about the affective aspects of work in general. How can workload assessment techniques of the present and those proposed for the future ensure enjoyable and challenging work for human operators? Are the present measurement scales equipped to answer these important and emerging questions? 
So, for those looking to employ workload assessment in their own work enterprise, can we provide better, more comprehensive, and even more applicable human-centered tools to promote both the efficiency and the enjoyment of human work? Such questions are those which a handbook chapter must address in order to meet the challenges that the future promises to pose.
2 WHAT IS MENTAL AND COGNITIVE WORKLOAD?
Referring to mental and cognitive workload might seem to imply that we are examining two potential constructs, one being mental workload (MWL) and the other cognitive workload (CWL). However, we want, explicitly, to dispel any such potential confusion. We use the terms cognitive workload and mental workload synonymously, and we see no advantage whatsoever in pursuing any differentiation between them. Indeed, it is important to warn against the issues and ambiguities that can arise when they are set in opposition to one another. In short, the two terms address one issue and should be treated as meaning exactly the same thing. This being said, MWL has a history that goes back to around the time of World War II, when the exigencies of combat and conflict brought concerns about performance capacities to evident pre-eminence. While we do not dwell extensively on historical antecedents, we think it important for the reader to have at least some understanding of these groundings, in order to appreciate that the MWL problems of the past are still relevant today and promise to remain relevant in the foreseeable future.
2.1 A Brief History of Workload
While the assessment of human capacities for work goes back to antiquity, the history of mental workload and its assessment lies
HUMAN FACTORS FUNDAMENTALS
within a much more limited time horizon. If we are concerned with the assessment of physical work capacity, the history of differing scientific forms of such investigation goes back to the middle stages of the Industrial Revolution of the nineteenth century; but for MWL assessment, we are better justified in beginning a historical account at the dawn of the information age, around the end of World War II. As we shall find, many of the beginnings of MWL assessment are bound up with progress in aviation, and it is therefore appropriate to start with that connection. One challenge that especially faced those involved in the test and evaluation of new aircraft was to provide some numerical designation for the handling qualities of what were essentially unique flight vehicles. As well as the engineering data, assessors wanted to know how the pilot felt about the aircraft they were flying. To provide this evaluation, two NASA researchers, Cooper and Harper (1969), devised a decision tree that allowed pilots to do just this (see Figure 1). Here, the assessing pilot is faced with a series of questions as to how they felt about the qualities of the aircraft they had flown. Through those decisions, the rating scale derives a value ranging from 1 for the best situation to 10 for the worst. The scale could be applied to parts of performance, as well as to performance as a whole. Early HFE researchers adapted this scale into their own Modified Cooper-Harper Scale (MCHS) and used it to report some of the initial work concerning the subjective assessment of MWL (see e.g., Wierwille & Casali, 1983; and see Cummings, Myers, & Scott, 2006, for a more recent application).
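The decision tree in Figure 1 can be read as three yes/no questions that narrow the rating to a band before the pilot picks a value from the verbal descriptors. The Python sketch below encodes only that branching logic. The function names `cooper_harper_band` and `rating_is_consistent` are hypothetical, and choosing the exact value inside a band (which depends on the descriptors) is deliberately left to the rater.

```python
from typing import Tuple


def cooper_harper_band(controllable: bool,
                       adequate_with_tolerable_workload: bool,
                       satisfactory_without_improvement: bool) -> Tuple[int, int]:
    """Return the (best, worst) rating band reached through the decision tree.

    Questions are checked in the order the pilot answers them, starting with
    controllability. Picking the single rating inside the band still
    requires matching the verbal descriptors in Figure 1.
    """
    if not controllable:
        return (10, 10)  # improvement mandatory
    if not adequate_with_tolerable_workload:
        return (7, 9)    # deficiencies require improvement
    if not satisfactory_without_improvement:
        return (4, 6)    # deficiencies warrant improvement
    return (1, 3)        # satisfactory without improvement


def rating_is_consistent(rating: int, controllable: bool,
                         adequate_with_tolerable_workload: bool,
                         satisfactory_without_improvement: bool) -> bool:
    """Check that a final 1-10 rating lies inside the band the answers permit."""
    best, worst = cooper_harper_band(controllable,
                                     adequate_with_tolerable_workload,
                                     satisfactory_without_improvement)
    return best <= rating <= worst


# A controllable aircraft whose handling is adequate but not satisfactory:
print(cooper_harper_band(True, True, False))        # (4, 6)
print(rating_is_consistent(5, True, True, False))   # True
print(rating_is_consistent(2, True, True, False))   # False
```

Keeping the band logic separate from the final choice mirrors the figure: the tree constrains the answer, while the descriptors resolve it.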
As we shall see in the specific section on subjective assessment techniques of MWL, the original Cooper-Harper Scale helped encourage further developments such as the well-known NASA Task Load Index (NASA-TLX) and the Subjective Workload Assessment Technique (SWAT), founded in aviation research at NASA and the U.S. Air Force, respectively. It was from such beginnings, largely stimulated by very practical, real-world questions, that workload research began to blossom (and see Moray, 1979). At the same time as early subjective forms of assessment began to exert their influence, there was a parallel and constant concern for how well the task was actually being performed. This line of insight was codified in the workload realm as primary task performance assessment; however, it had its origins in the very foundations of human behavioral assessment, in dimensions such as the speed and accuracy of any task response. Also co-opted into the pantheon of workload assessment were various physiological techniques which, as we shall see, first featured measures derived from the peripheral nervous system and have, over the years, moved increasingly toward measures reflecting central nervous system activity. Progress in workload assessment has drawn its impetus partly from basic theoretical advances and progressively superior forms of neurophysiological assessment, and partly from very practical demands to know, for example, just how much cognitive effort an individual can exert before they begin to fail or effectively become incapacitated by the demands set in front of them. These historical trends have been captured in the texts that have focused on workload across the years (see e.g., Hancock & Meshkati, 1988).
2.2 The Theoretical Foundations of Workload and Its Assessment
Given this complex and multidisciplinary history, it is unsurprising that a single, universally accepted operational definition of mental workload remains elusive. Various conceptualizations hold that mental workload can consist of (1) an input load inherent in the task to be completed, or it can be identified with (2) the effort associated with an operator’s internal cognitive processes,
MENTAL WORKLOAD
Figure 1 Original Cooper-Harper Scale (Handling Qualities Rating Scale).
Pilot decisions, answered in sequence starting from controllability:
Is it controllable? No: improvement mandatory (rating 10). Yes: continue.
Is adequate performance attainable with a tolerable pilot workload? No: deficiencies require improvement (ratings 7 to 9). Yes: continue.
Is it satisfactory without improvement? No: deficiencies warrant improvement (ratings 4 to 6). Yes: ratings 1 to 3.
Pilot rating, aircraft characteristics, and demands on the pilot in selected task or required operation*:
1 Excellent, highly desirable: pilot compensation not a factor for desired performance.
2 Good, negligible deficiencies: pilot compensation not a factor for desired performance.
3 Fair, some mildly unpleasant deficiencies: minimal pilot compensation required for desired performance.
4 Minor but annoying deficiencies: desired performance requires moderate pilot compensation.
5 Moderately objectionable deficiencies: adequate performance requires considerable pilot compensation.
6 Very objectionable but tolerable deficiencies: adequate performance requires extensive pilot compensation.
7 Major deficiencies: adequate performance not attainable with maximum tolerable pilot compensation.
8 Major deficiencies: considerable pilot compensation is required for control.
9 Major deficiencies: intense pilot compensation is required to retain control.
10 Major deficiencies: control will be lost during some portion of required operation.
* Definition of required operation involves designation of flight phase and/or subphases with accompanying conditions.
or even (3) an output resulting from the performance of work (Johannsen, 1979). More recently, these disparate suppositions have been synthesized and then expanded upon. Common definitions of mental workload now share several defining attributes: (1) an interaction between the operator’s neural capacities and task demands; (2) the dynamic allocation, monitoring, and expenditure of cognitive resources; and (3) the triggering of an associated subjective response (Van Acker, Parmentier, Vlerick, & Saldien, 2018). A particularly useful definition would therefore operationalize mental workload as “the operator’s allocation of limited processing capacity or resources to meet task demands; that is, the balance of internal resources and external demands” (Matthews & Reinerman-Jones, 2017, pp. 3–4). This definition rests on an effective understanding of “cognitive resources”: their nature, expenditure, and restoration. These resources have generally been intimately linked with the construct of attention, given its vital role in the accurate and efficient processing of information. Literally dozens of constructs have shaped the theoretical bases for the nature and underlying mechanisms of mental workload (Cain, 2007; and see Longo & Pandian, in press). For the sake of both clarity and brevity, here we examine only two of the most prominent of these theories: Resource Theory and Multiple Resource Theory. Resource Theory (RT), first proposed by Kahneman (1973) and expanded upon by Gopher and Donchin (1986), uses the term resources to represent a metaphorical reserve of energy upon which operators draw to generate and sustain cognitive processing. The higher the mental workload in terms of the complexity or severity of task demand or the novelty of the
information to be processed, the greater the draw on these limited resources. The common metaphor is a single pool of fixed dimensions and volume from which resources can be extracted (and see Figure 6). To illustrate the relationship between such cognitive resources and performance, Norman and Bobrow (1975) introduced the performance resource function (PRF). PRFs show the predicted performance level as a function of the amount of resources allocated to a task’s execution. The ascending portion is described as resource-limited, because performance there is primarily contingent on the operator’s capacity to allocate and disburse resources for task execution; the flatter plateau represents data-limited processing, because performance there depends on the quality of the necessary data (i.e., environmental information, memories, etc.), and so any additional expenditure of resources would not produce gains in performance (cf. Navon & Gopher, 1979; Norman & Bobrow, 1975). PRFs feature again in our discussion of primary and secondary task performance in the assessment methods section below. Multiple Resource Theory (MRT; Wickens, 1984, 2008) takes the pool-and-siphon analogy of Resource Theory and adds further dimensions to account for the multifaceted nature of attention and associated workload. It also embraces the similarity principle and the necessary distinction between serial and parallel processing. In RT, there is a single pool of resources from which the operator can draw energy for performing cognitive processes. In MRT, by contrast, there are multiple pools of resources that can be tapped for resource allocation, depending on the nature of the processing required, the stage of information processing, and the modality by which the information is processed (Wickens, 1984). These multiple
pools may be tapped sequentially when task demands draw from a common pool, or simultaneously and in parallel when demands require attention from different pools. Tasks that drain common pools of resources are consequently more difficult to perform in tandem and increase the likelihood of overload. Though they are partitioned differently, resources in this model are still limited, and performance suffers when an insufficient supply is available to meet the task demands of the moment (Wickens, 1984). Such conditions of overload, in which the operator has allocated all available resources and has no spare capacity for meeting the imposed demands, most often lead to performance decrements and even outright failure (Hancock & Caird, 1993). Because of its critical operational significance, the crucial issue of failure thresholds has constituted much, if not the lion’s share, of research on mental workload (Hancock, 1989; Young, Brookhuis, Wickens, & Hancock, 2015). Assessment methods have consequently focused on establishing the thresholds between levels of mental workload: underload (available resources vastly outstrip demands), moderate workload (the operator has sufficient resources to meet challenging demands), and overload (demands exceed the operator’s capacity to meet them; Matthews & Reinerman-Jones, 2017). These workload “red lines” have naturally been the topic of much discussion and debate in the area.
3 HOW IS MENTAL WORKLOAD MEASURED?
3.1 Primary Task Performance Measures
The measurement of mental workload reflects the multidimensional nature of the concept, and so a spectrum of assessment techniques is available. Indeed, researchers in applied domains tend to favor using a battery of several of these measures together to assess workload, rather than any one measure alone (Gopher & Kimchi, 1989; Hockey, Briner, Tattersall, & Wiethoff, 1989).
This diverse usage helps to ascertain the different aspects of the underlying construct that are involved. The main categories of workload measures are primary task performance measures, secondary task performance measures, physiological measures, and subjective ratings (see also Brookhuis & De Waard, 2000; Eggemeier & Wilson, 1991). The first, and by far the most used, category of measures is based on direct evaluation of the operator’s performance on variables associated with the primary task. In the applied domain, the primary task is almost universally the focus of interest: it is the task for which we are trying to gauge the level of MWL. For example, primary task performance measures in driving reflect the efficiency of vehicle handling (i.e., lateral and longitudinal control, such as demands associated with steering, acceleration, and braking). The basic premise is that a task with higher levels of demand will prove more difficult, resulting in degraded performance compared with a lower-demand task. Mental workload levels are then inferred from these levels of response efficiency. However, following from attentional resource models of MWL (see e.g., Young et al., 2015), an increase in task-load difficulty on a primary task may not necessarily lead to performance deficits if the increase is still within the overall capacity of the operator (and see Hancock & Caird, 1993). In situations of low MWL, primary task performance is more likely to be data-limited than resource-limited (i.e., the performance ceiling is determined by the quality of data input, rather than by the individual’s capacity; Norman & Bobrow, 1975). If performance is data-limited, fluctuations in primary task performance alone would not reflect MWL. It is at this juncture that another form of task assessment, secondary task assessment, comes into play.
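To make the resource-limited and data-limited regions concrete, the following minimal Python sketch assumes a PRF that rises linearly and then plateaus. That functional form, and the `difficulty` and `data_limit` parameters, are illustrative assumptions rather than anything Norman and Bobrow (1975) prescribed. The sketch shows why two primary tasks of different difficulty can yield identical primary task performance as long as both stay within the operator's capacity.

```python
def prf(resources: float, difficulty: float, data_limit: float = 1.0) -> float:
    """Illustrative piecewise-linear performance-resource function.

    Performance rises in proportion to the resources invested (the
    resource-limited region) until it reaches a ceiling set by data quality
    (the data-limited region). A more difficult task has a shallower slope,
    so it needs more resources to reach the same ceiling.
    """
    if resources < 0 or difficulty <= 0:
        raise ValueError("resources must be >= 0 and difficulty > 0")
    return min(resources / difficulty, data_limit)


def region(resources: float, difficulty: float, data_limit: float = 1.0) -> str:
    """Label which part of the PRF the operator is working in."""
    if resources / difficulty < data_limit:
        return "resource-limited"
    return "data-limited"


capacity = 1.0  # assume the operator invests all available resources
for difficulty in (0.4, 0.8, 2.0):
    print(difficulty, prf(capacity, difficulty), region(capacity, difficulty))
# 0.4 1.0 data-limited
# 0.8 1.0 data-limited
# 2.0 0.5 resource-limited
```

Doubling difficulty from 0.4 to 0.8 leaves performance at the ceiling; only when the task asks for more than the operator can invest (difficulty 2.0) does the primary task measure move.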
Most often, primary tasks are associated directly with task output. When these outputs are visible and measurable, no real barrier to their assessment exists. So, we might often observe that the speed and accuracy with which a particular task is performed are both informative as to production concerns and helpful in understanding associated workload. However, not all tasks are so easily indexed, nor do they always lend themselves to simple measures of completion time and completion accuracy. There is another, especially important, wrinkle here that often affects real-world operations. Laboratory studies frequently ask individuals to respond as quickly and as accurately as possible, and helpful participants often seek to do so for the hour or more that they are in the experiment. The real world is different. We cannot expect people to sustain absolutely peak performance over intervals of months, years, and even decades. Simon (1996) has labeled the difference between these sorts of motivation as optimizing versus satisficing forms of action. The example of driving mentioned above is helpful here. Professional race car drivers engaged in premier races strain every sinew to ensure that they are performing at the very peak of their possible capacity. They push both their vehicle and themselves to elicit the maximum performance of which they are capable. This is optimization, the search for the absolute best. But now consider normal driving, even by a professional commercial driver. While they have a schedule to keep, they do not focus on reducing the lateral deviation of the vehicle to zero. They may well stop and take a break; they might even enjoy their journey; and how much more so when the driving task is being done by the everyday driver. These latter conditions feature satisficing, that is, doing well enough to get the task done. Many real-world tasks are of this sort.
Think of office work, where demands might still be high but the individual has some degree of control over their schedule. These different work imperatives have implications for mental workload assessment. While much of our experimental insight derives from instructions to optimize response, many of the same results are then applied to work where satisficing predominates. This means there is something of a disconnect between primary task performance as a measure of work output and primary task performance as a reflection of mental workload. It is one of the principal reasons why we advocate using more than one measure wherever possible in actual operational conditions. Such measures might derive from the category of secondary tasks, to which we now turn.
3.2 Secondary Task Performance Measures
A secondary task is one which is designed to compete for the same attentional resources, as identified in MRT, as the primary task itself. Theoretically, the efficiency of secondary task performance can be used to infer the level of MWL required by the primary task. The secondary task serves here as a surrogate for spare attentional capacity, reflecting the resources left unused by the primary task. Therefore, as workload on the primary task increases, spare capacity is used up, and so performance on the secondary task decreases (see Figure 2). In the secondary task technique, participants are instructed to maintain consistent performance on the primary task and to attempt the secondary task only when their primary task demands allow them to do so. By keeping primary task performance constant, differences in workload between different primary tasks are then reflected in performance variations on the secondary task. Following the tenets of MRT, the same secondary task can then serve as a common yardstick across differing work scenarios, provided it draws on the same resources as the primary task (e.g., if the primary task is driving, the secondary task should also be visuo-spatial, with a manual response).
This ensures that the technique really is measuring spare capacity (and thus MWL) rather than drawing on a separate pool of resources. Sure enough,
Figure 2 A representation of the secondary task rationale (y-axis: resources demanded as a percentage of capacity, 0% to 100%; bars: a low demand primary task and a high demand primary task; legend: primary task, secondary task). The black shading represents the resources demanded by the primary task; white shading represents the spare capacity available for a secondary task. As both primary tasks are within the total capacity of the operator (100% on the y-axis), they would therefore in theory show no differences in performance. However, performance on the secondary task would reflect the differences in spare capacity.
secondary tasks designed in this way appear to be more sensitive to changes in demand (e.g., Baber, 1991; Liu, 1996). A dilemma arises, however, because this design also raises the problem of intrusiveness between two tasks competing for the same attentional resources. Thus, the specific choice of secondary task is critical to ensure construct validity as a measure of MWL (Kantowitz, 2000). Various authors have used visual tasks (Brouwer, Waterink, van Wolffelaar, & Rothengatter, 1991; Ephrath & Young, 1981; Verwey & Veltman, 1996; Wickens et al., 2000; Young & Stanton, 2002), forced-choice tasks (Thornton et al., 1992), and mental calculation (Harms, 1991; Recarte & Nunes, 2002). However, this choice is not without its accompanying problems. Various studies have observed interference between primary and secondary tasks when they draw on the same resource pools (e.g., Baber, 1991; Verwey & Veltman, 1996), while others have equally demonstrated non-interference when the tasks are designed to occupy different resource pools (e.g., Baber, 1991; Zeitlin, 1995). This would seem to justify the argument for using the latter type of secondary task; but, in that case, is that manipulation truly measuring MWL? To some extent, the answer is probably “yes.” It seems likely that attentional capacity is, in actuality, some hybrid blend of unitary and multiple resources, such that the separate dimensions in Wickens’s (2002) model are not exclusively separate, but are instead perhaps still fed by some common reservoir (cf. Young & Stanton, 2007). This would explain why a difficult secondary task in a different processing code from the primary task still interferes with primary task performance (Young & Stanton, 2007). While that yet again raises the specter of the intrusiveness of the secondary task, on closer inspection it could also merely reflect structural competition at the input modality stage (visual/auditory) or at the response stage (vocal/manual).
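The spare-capacity rationale of Figure 2 can be sketched in a few lines of Python. This is a single-pool simplification that deliberately ignores the multiple-resource and hybrid considerations just discussed; the demand values, the function names, and the `underload_below` cut-off used to label the underload, moderate, and overload zones are all hypothetical.

```python
def spare_capacity(primary_demand: float, capacity: float = 1.0) -> float:
    """Resources left over once the primary task's demand is met (never negative)."""
    return max(capacity - primary_demand, 0.0)


def secondary_performance(primary_demand: float, secondary_demand: float,
                          capacity: float = 1.0) -> float:
    """Proportion (0-1) of the secondary task's demand that spare capacity can cover."""
    if secondary_demand <= 0:
        raise ValueError("secondary_demand must be positive")
    return min(spare_capacity(primary_demand, capacity) / secondary_demand, 1.0)


def workload_zone(total_demand: float, capacity: float = 1.0,
                  underload_below: float = 0.3) -> str:
    """Classify demand relative to capacity; the underload cut-off is hypothetical."""
    ratio = total_demand / capacity
    if ratio > 1.0:
        return "overload"
    if ratio < underload_below:
        return "underload"
    return "moderate"


# Two primary tasks, both within capacity, so primary performance is unaffected;
# only the secondary task (demand 0.5) reveals the difference in spare capacity.
for primary in (0.25, 0.75):
    print(primary, spare_capacity(primary), secondary_performance(primary, 0.5))
# 0.25 0.75 1.0
# 0.75 0.25 0.5
```

Note that the low-demand condition saturates the secondary measure at 1.0, so any further reduction in primary demand would go undetected; a secondary task that is too easy loses sensitivity in this way.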
Indeed, the solution to the problem of interference from a secondary task would seem to lie in using a different input modality (e.g., Zeitlin, 1995) or alternative response modes (e.g., Brouwer et al., 1991). As a further alternative, one might consider a secondary task that is embedded in the overall task environment (Schlegel, 1993). Such an embedded task is ostensibly part of the primary task, but not essential to its performance (e.g., providing readings from an instrument panel while driving). Another implementation of the secondary task technique uses concurrent performance on, for example, a Peripheral Detection Task (PDT). The PDT is based on the premise that visual attention narrows
as workload increases. Participants wear a headband with an LED light, which lights up randomly every three to five seconds. They are instructed to press a switch attached to their index finger as soon as they see the LED signal. As workload increases, both the response time to such signals and the chance of missing them increase. Workload is then measured by monitoring response times and the number of missed signals (see also Schaap et al., 2008, 2013). The PDT technique has been successfully applied during driving (Van Winsum, Martens, & Herland, 1999) and so might prospectively be used in a number of tasks that emphasize visual perception and manual input for task resolution. A simple secondary task can be advantageous when primary task measurement is unfeasible or uneconomical (Wierwille et al., 1977). Secondary task methods have thus proved useful in many real-world, operational environments (e.g., Harms, 1991). They have been used to discriminate MWL levels on the flight deck (Ephrath & Young, 1981; Thornton, Braun, Bowers, & Morgan, 1992; Wickens, Gempler, & Morphew, 2000) and across varying forms of driving demand (Hancock, Wulf, Thom, & Fassnacht, 1990; Harms, 1991; Verwey & Veltman, 1996; Young & Stanton, 2002). Furthermore, secondary tasks have been shown to assess different aspects of MWL. Whereas primary task measures are effective for assessing workload and performance over long periods, overload, and resource competition (Hockey et al., 1989), the secondary task is useful for quantifying acute changes in workload (Verwey & Veltman, 1996) and for evaluating individual differences in attentional resources (Brown, 1978) and automaticity (Liu & Wickens, 1994). In combination with eye movement data, a secondary task has even been used as a proxy for overall attentional capacity (Young & Stanton, 2002). Nevertheless, there are limitations in the application of these secondary task approaches.
They have been criticized for problems of sensitivity, being effective only in assessing rather gross changes in difficulty (Wierwille, Gutmann, Hicks, & Muto, 1977). Where multiple primary task measures are accessible, these can be more sensitive than a single secondary task (Wierwille & Gutmann, 1978). Meanwhile, particularly for researchers examining MWL associated with automation, it may not be possible to study underload using a secondary task if the secondary task itself contributes to MWL regardless of the primary task demands (cf. Young & Stanton, 2007; Liu, 2003). These latter researchers found that a mobile phone task actually improved primary driving performance in what were otherwise low-workload situations. As noted above, then, the main argument against the practical use of secondary tasks is the problem of intrusiveness. The basic assumption of the technique is that only spare capacity is directed to the secondary task; ideally, therefore, there would be no concomitant effect on primary task performance. While there is evidence that intentional prioritization of the primary task can attenuate interference (Temprado, Zanone, Monno, & Laurent, 2001), Kantowitz (2000) has criticized experiments using secondary tasks in driving on the basis that the assumption of no interference with the primary task may not hold true, especially in the driving domain. Despite the best efforts of experimenters to emphasize their instructions about maintaining priority on the primary task, it has been shown (e.g., Young & Stanton, 2007) that the use of a secondary task can have detrimental effects on primary task performance. Furthermore, interference can be particularly manifest at low workload levels (Wierwille & Gutmann, 1978).
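The intrusiveness check implied by this assumption is a comparison of primary task performance with and without the secondary task. The Python sketch below computes that proportional dual-task decrement. It assumes higher scores mean better performance, and the 5% `tolerance` is a hypothetical threshold rather than a published criterion; a real study would test the difference statistically instead of applying a fixed cut-off.

```python
from statistics import mean
from typing import Sequence


def primary_task_decrement(single_task: Sequence[float],
                           dual_task: Sequence[float]) -> float:
    """Proportional drop in mean primary-task score once the secondary task is added.

    Assumes higher scores mean better performance; invert measures such as
    lateral deviation (where lower is better) before calling.
    """
    baseline = mean(single_task)
    if baseline == 0:
        raise ValueError("single-task baseline must be non-zero")
    return (baseline - mean(dual_task)) / baseline


def looks_intrusive(single_task: Sequence[float], dual_task: Sequence[float],
                    tolerance: float = 0.05) -> bool:
    """Flag a secondary task whose decrement exceeds a (hypothetical) tolerance."""
    return primary_task_decrement(single_task, dual_task) > tolerance


single = [90, 80, 100]  # hypothetical primary-task scores, primary task alone
dual = [80, 70, 90]     # the same participants with the secondary task added
print(round(primary_task_decrement(single, dual), 3))  # 0.111
print(looks_intrusive(single, dual))                    # True
```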
Meanwhile, there is evidence that other qualitative aspects of secondary tasks can determine the extent of their interference effects, such as whether they are forced-pace or interruptible (Lansdown, Brook-Carter, & Kersloot, 2004; Noy, Lemoine, Klachon, & Burns, 2004), or whether they are actually perceived as a subset of the primary
task (Cnossen, Meijman, & Rothengatter, 2004). There may also be individual differences in interference effects, such as those associated with gender (Lansdown, 2002; Lansdown et al., 2004; Lesch & Hancock, 2004) and, particularly, with a person’s level of expertise. Since skilled performers have more spare capacity, they are less susceptible to interference from secondary tasks than novices are (e.g., Beilock, Wierenga, & Carr, 2002; Lansdown, 2002). Mental workload assessment nearly always involves some measurement of the primary task response profile since, after all, this is the reason the work is being undertaken in the first place. However, it is not the only type of measure we can take, and we proceed to the next category, which assesses subjective response.
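The Peripheral Detection Task described earlier in this section lends itself to a simple scoring routine. The Python sketch below schedules LED onsets at random three-to-five-second intervals, as described above, and scores a session as the mean response time to detected signals plus the proportion of missed signals. The 2 s response window, the treatment of anticipatory presses as misses, and all names (`led_onsets`, `PdtTrial`, `score_pdt`) are assumptions for illustration.

```python
import random
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


def led_onsets(duration_s: float, rng: random.Random) -> List[float]:
    """Schedule LED onsets at random intervals of 3 to 5 s within a session."""
    onsets: List[float] = []
    t = rng.uniform(3.0, 5.0)
    while t < duration_s:
        onsets.append(t)
        t += rng.uniform(3.0, 5.0)
    return onsets


@dataclass
class PdtTrial:
    onset_s: float               # when the LED lit up
    response_s: Optional[float]  # when the switch was pressed, or None


def score_pdt(trials: List[PdtTrial],
              max_rt_s: float = 2.0) -> Tuple[Optional[float], float]:
    """Return (mean response time of detected signals, proportion missed).

    Responses that are absent, not later than the onset, or later than the
    hypothetical max_rt_s window all count as misses.
    """
    if not trials:
        raise ValueError("no trials to score")
    rts: List[float] = []
    for trial in trials:
        if trial.response_s is None:
            continue
        rt = trial.response_s - trial.onset_s
        if 0.0 < rt <= max_rt_s:
            rts.append(rt)
    miss_rate = (len(trials) - len(rts)) / len(trials)
    return (mean(rts) if rts else None), miss_rate


trials = [PdtTrial(4.0, 4.5), PdtTrial(8.0, 8.25),
          PdtTrial(12.0, None), PdtTrial(16.0, 19.0)]
print(score_pdt(trials))  # (0.375, 0.5)
```

Rising mean response times and miss rates across conditions would then be read as rising workload, in line with the narrowing-attention premise of the PDT.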
3.3 Subjective Measures of Mental Workload
Owing largely to their ease of use, subjective measures have proved to be the most widely used instruments for workload assessment (Estes, 2015; Matthews et al., 2015). While primary task performance measures are, by definition, rooted in the application being studied, most subjective measures are more generic and can be used across most domains. As such, they are linked more to the person doing the work than to the nature of the specific task itself. Nevertheless, as we shall see, some researchers have come full circle and developed bespoke assessment instruments for specific purposes. We begin our examination of subjective methods, however, by considering their theoretical foundation as reflections of mental workload. We saw in an earlier section that the literature contains a wide diversity of definitions of mental workload. Some of these allude to MWL being essentially a subjective experience. Drawing an analogy with physical load, Schlegel (1993) defined MWL as comprising two components: stress (i.e., task demands) and strain (the resulting impact on the individual). More recently, Charles and Nixon (2019) expressed this in similar terms by distinguishing objective elements of the work (“taskload”) from the subjective perception of workload, which can be moderated by factors such as experience or time constraints. Meanwhile, Wilson and colleagues (2011) defined work stress as being experienced when perceived resources are outweighed by demands, thus explicitly characterizing mental workload as a subjective experience. On these accounts, mental workload is a subjective state, leading many to believe that subjective ratings may well be the only “true” index of the experience of MWL (see e.g., Hart & Staveland, 1988; Longo, 2015). There are, however, many nuances associated with subjective MWL measurement.
It seems that perceptions of MWL become distorted at high and low levels of demand (Desmond et al., 1998; Foy & Chapman, 2018). As well as its potential effect on subjective ratings, this distortion can have serious consequences for performance, as people mismatch their effort to the task demands (cf. Desmond & Hoyes, 1996; Hockey, 1997). Estes (2015) elaborated this idea into a model of subjective ratings of workload; the model predicts a nonlinear, S-shaped relationship (see Figure 3) which, significantly, contrasts with the common assumption of linearity in the literature. Hancock and Caird (1993) had made the same observation earlier in their own nonlinear model of mental workload. It is certainly true that people can express themselves in words or via marks and indications on scales presented in post-task intervals (Zijlstra, 1993). These reflect a form of metacognition about their own conscious processing demands rather than implicit cognitive activity (Tsang & Vidulich, 2006). However, Hart (2006) cautioned that different people are likely to apply the workload label to quite different aspects of their experience (and see Hancock & Volante, 2020). This is particularly the case in a field where even the experts cannot agree on a
Increasing Workload Figure 3 The subjective workload curve. Source: Estes (2015).
unified definition. We discuss the reliability and validity of subjective metrics below, but the lesson from this is that rating scales must be defined clearly and simply for users, otherwise variability of response between differing individuals obscures any picture that we are trying to understand. One commonality in the literature’s definitions is the multidimensional nature of MWL (Evans & Fendley, 2017). This is reflected in the variety of subjective instruments available. 3.3.1 Specific Measures of Subjective Mental Workload There are myriad techniques for measuring subjective mental workload (see Gawron, 2008, for an extensive review), which consider various aspects of MWL (Longo, 2015). These techniques broadly fall into one or more of the following categories (cf. Gawron, 2008; Tsang & Vidulich, 2006): • Paired comparisons: In which two tasks are compared in a relative manner to determine which has the higher workload. • Unidimensional assessment: Which use a simple visual analogue scale or a decision tree to arrive at a single number rating of workload. • Multidimensional assessments: Which most frequently consist of a set of subscales that be treated individually to diagnose the source(s) of MWL can either, or combined for an overall workload score The unidimensional and multidimensional scales can be used in either an absolute or relative manner, to rate tasks independently or to directly compare a series of tasks one against the other. Similarly, a relative rating could be benchmarked against a standardised reference task, or multiple conditions could be compared against each other in a repeated-measures experimental design (e.g., Young & Stanton, 2004). Unidimensional tools tend to be simpler to apply and analyze, but do not offer the diagnosticity of multidimensional techniques. These are often the forms of tradeoff the practitioner has to consider when selecting the scale that they wish to use. 
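Paired-comparison judgments of the kind described above are commonly turned into relative workload scores by arranging them in a reciprocal judgment matrix. The sketch below uses the row geometric-mean method, a standard approximation to the priority weights of the Analytic Hierarchy Process; we assume, rather than claim, that any particular instrument (such as SWORD) scores its matrix in exactly this way, and the function name, task names, and judgment values are all hypothetical.

```python
import math

def relative_workload(tasks, judgments):
    """Derive normalized relative workload scores from paired comparisons.

    judgments maps (task_a, task_b) -> how many times more demanding task_a
    was judged than task_b (e.g., 3 means "a is three times as demanding").
    The mirrored cell receives the reciprocal; the diagonal stays at 1.
    """
    n = len(tasks)
    index = {t: i for i, t in enumerate(tasks)}
    matrix = [[1.0] * n for _ in range(n)]
    for (a, b), value in judgments.items():
        i, j = index[a], index[b]
        matrix[i][j] = value
        matrix[j][i] = 1.0 / value
    # Geometric mean of each row, then normalize so the scores sum to 1.
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo)
    return {t: g / total for t, g in zip(tasks, geo)}

# Hypothetical judgments comparing three levels of automation.
scores = relative_workload(
    ["manual", "partial_auto", "full_auto"],
    {
        ("manual", "partial_auto"): 3,
        ("manual", "full_auto"): 5,
        ("partial_auto", "full_auto"): 2,
    },
)
print(scores)
```

With these made-up judgments the "manual" condition receives roughly two thirds of the relative workload. A geometric rather than arithmetic mean is used because the judgments are ratios, so a judgment of 3 and its reciprocal 1/3 carry symmetric weight.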
Although many such scales exist, only a handful are commonly used in the research literature. Table 1 lists a selection of these major approaches, with an indication of their popularity based on a recent Google Scholar search of their citations.
Table 1 Selection of the Most Commonly Used Subjective MWL Measurement Techniques

| Technique | Category | Reference | Google Scholar Citations |
|---|---|---|---|
| NASA Task Load Index (TLX) | Multidimensional | Hart & Staveland (1988) | 13,000 |
| Subjective Workload Assessment Technique (SWAT) | Multidimensional | Reid & Nygren (1988) | 2,790 |
| Cooper-Harper Rating Scale* | Unidimensional | Cooper & Harper (1969) | 1,300 |
| Workload Profile | Multidimensional | Tsang & Velazquez (1996) | 1,000 |
| Instantaneous Self-Assessment (ISA) | Unidimensional | Tattersall & Foord (1996) | 989 |
| Bedford Workload Scale | Unidimensional | Roscoe (1984) | 763 |
| Rating Scale Mental Effort (RSME) | Unidimensional | Zijlstra (1993) | 718 |
| Subjective Workload Dominance (SWORD) | Paired comparison | Vidulich (1989) | 629 |
| Analytic Hierarchy Process (AHP) | Paired comparison | Lidderdale (1987) | 550 |

Source: Based on Wierwille & Casali, 1983.
Note: *The search for the Cooper-Harper Rating Scale would also have included results for the Modified Cooper-Harper Scale (MCHS; Wierwille & Casali, 1983).
Table 2 NASA-TLX Rating Scale Definitions

| Scale Title | Endpoints | Description |
|---|---|---|
| Mental Demand | Low / High | How much mental and perceptual activity was required (e.g., thinking, deciding, calculating, remembering, looking, searching, etc.)? Was the task easy or demanding, simple or complex, exacting or forgiving? |
| Physical Demand | Low / High | How much physical activity was required (e.g., pushing, pulling, turning, controlling, activating, etc.)? Was the task easy or demanding, slow or brisk, slack or strenuous, restful or laborious? |
| Temporal Demand | Low / High | How much time pressure did you feel due to the rate or pace at which the tasks or task elements occurred? Was the pace slow and leisurely or rapid and frantic? |
| Performance | Good / Poor | How successful do you think you were in accomplishing the goals of the task set by the experimenter (or yourself)? How satisfied were you with your performance in accomplishing these goals? |
| Effort | Low / High | How hard did you have to work (mentally and physically) to accomplish your level of performance? |
| Frustration | Low / High | How insecure, discouraged, irritated, stressed and annoyed versus secure, gratified, content, relaxed and complacent did you feel during the task? |

Source: From Hart & Staveland, 1988. © 1988 Elsevier.
The NASA-TLX is by far the most popular technique (Estes, 2015; Hart, 2006; Rizzo & Longo, 2017; Wilson et al., 2011). In a review of the NASA-TLX's application twenty years after its development, Hart (2006) examined 550 studies that had used or reviewed the technique across a range of behavioural domains. She found that the TLX had been translated into more than a dozen languages, and there is now even software support for it via its own dedicated website and associated smartphone application. As a multidimensional method, the NASA-TLX has six subscales: Mental Demand (MD), Physical Demand (PD), Temporal Demand (TD), Own Performance (PE), Effort (EF), and Frustration (FR). Each dimension is rated on a visual analogue scale running from 0 on the left to 100 on the right in five-point steps. Participants are given clear definitions of the rating scales to help them make consistent assessments (see Table 2). In our experience of using the TLX (e.g., Young & Stanton, 2004), it is worth taking time to explain "temporal demand," which borders on jargon for many participants. It is also useful to highlight the endpoints of the Performance subscale, which runs from Good at the left end to Poor at the right (so that all of the scoring is consistent, with low scores representing low workload). Despite its popularity, the TLX is not without criticism. The fact that the subscales often correlate with each other (which Hart (2006) offered as evidence for its construct validity) could be used to support an argument that the combined, overall
workload score is only a little better than simply asking "how demanding was the task?" (Wilson et al., 2011). Meanwhile, the dominance of effort in subjective ratings (cf. Tsang & Vidulich, 2006) may support the use of unidimensional scales such as the RSME (which is conceptually similar to the TLX Effort subscale; Ghanbary Sartang et al., 2016).
Validity and Reliability
Any metric is only useful if it works, and if it works consistently. For subjective workload metrics, there are numerous criteria for evaluating validity and reliability. These include, but are not limited to: sensitivity (to changes in workload), diagnosticity (for sources of variation in MWL), selectivity (to changes specifically associated with demand, rather than unrelated changes), intrusiveness (on the task being assessed), implementation requirements (such as the need for equipment and resources), acceptability (or face validity), and reliability (both within and between tests; Longo, 2015; Matthews et al., 2015). In general, subjective MWL scores have proved sensitive to perceived difficulty (Liu & Wickens, 1994), resource demand, and changes in effort (Hockey et al., 1989), but less so to resource competition (Tsang & Vidulich, 2006). Multidimensional techniques in particular satisfy the diagnosticity criterion (Wilson et al., 2011). While the reliability and sensitivity of subjective MWL ratings have been demonstrated in western populations, Johnson and Widyanti (2011) caution that cultural differences may affect such ratings. This is not just a question of
language barriers, scale titles, or how definitions should be appropriately translated (Hart, 2006). In eastern cultures, for instance, differences in factors such as individualism and power distance can influence a respondent's willingness to openly report high workload, as doing so may be seen as a criticism or an admission of difficulty. Consequently, subjective MWL scores on the RSME and NASA-TLX have shown higher sensitivity to task demand in western than in eastern samples, despite comparable absolute levels of task performance. Unsurprisingly, given the NASA-TLX's prominence in Table 1, much of the research validating subjective MWL measures has involved it. As well as the different language versions, Hart (2006) described several methodological developments and modifications of the original instrument, such as adding, removing, or redefining subscales for particular applications. Given the extensive validation work undertaken to develop the subscales, Hart (2006) noted, any such modification should itself be revalidated. The most common modification, which has been extensively tested, is to remove the lengthy and complex weighting procedure for the six subscales that was part of the original version. Research (e.g., Hendy et al., 1993; Hill et al., 1992; Nygren, 1991) suggests that this procedure is superfluous and may be safely omitted without compromising the measure. Omitting it can also make the instrument more acceptable to participants by reducing administration time, thereby increasing the likelihood of genuine responses (Hill et al., 1992). Many now use the raw scores for each subscale and/or their arithmetic mean as an overall workload (OWL) score; this version is often referred to as the "RTLX".
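The difference between the original weighted TLX and the RTLX is easiest to see in the scoring arithmetic. In the original procedure, participants make 15 pairwise choices between the six subscales; each subscale's weight is the number of times it was chosen (0 to 5, summing to 15), and the overall score is the weighted sum of ratings divided by 15. The RTLX simply averages the six ratings. The minimal sketch below assumes ratings already entered on the 0 to 100 scale in five-point steps, with Performance scored from Good (0) to Poor (100); the function names and the example ratings and choices are hypothetical.

```python
SUBSCALES = ("MD", "PD", "TD", "PE", "EF", "FR")

def check_rating(value):
    """Reject values that are not on the 0-100 scale in five-point steps."""
    if not (0 <= value <= 100 and value % 5 == 0):
        raise ValueError(f"invalid TLX rating: {value}")
    return value

def raw_tlx(ratings):
    """RTLX: unweighted mean of the six subscale ratings."""
    return sum(check_rating(ratings[s]) for s in SUBSCALES) / len(SUBSCALES)

def weighted_tlx(ratings, chosen):
    """Original weighted TLX.

    chosen lists, for each of the 15 pairwise comparisons of subscales,
    the subscale the participant judged more relevant to workload.
    A subscale's weight is the number of times it was chosen.
    """
    if len(chosen) != 15 or any(c not in SUBSCALES for c in chosen):
        raise ValueError("expected 15 choices, each naming a subscale")
    weights = {s: chosen.count(s) for s in SUBSCALES}
    return sum(weights[s] * check_rating(ratings[s]) for s in SUBSCALES) / 15

# Hypothetical participant: ratings plus the winner of each of the 15 pairs.
ratings = {"MD": 70, "PD": 10, "TD": 60, "PE": 30, "EF": 65, "FR": 40}
chosen = ["MD"] * 5 + ["TD"] * 4 + ["EF"] * 3 + ["FR"] * 2 + ["PE"]

print(raw_tlx(ratings))               # RTLX
print(weighted_tlx(ratings, chosen))  # weighted TLX
```

With these illustrative numbers the weighted score (about 59.7) exceeds the raw mean (about 45.8), because the heavily weighted subscales happen to be rated high; with other weight patterns the two can be close, which is consistent with the mixed comparisons between the RTLX and the original that Hart (2006) reported.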
However, in close to thirty studies comparing the RTLX to the original, Hart (2006) found a mix of results, with the RTLX being more, less, or equally sensitive to workload, and concluded that the weighting improves sensitivity and inter-rater reliability.
3.3.2 Comparisons Between Different Subjective Measures
Some research has supported the validity of the NASA-TLX relative to other MWL measures. The obvious comparison is with the SWAT. While both instruments offer good sensitivity, the TLX is slightly more sensitive at low levels of MWL (Tsang & Vidulich, 2006). Beyond subjective metrics, the TLX has generally also correlated more strongly with primary task and physiological measures of MWL (Charles & Nixon, 2019; Evans & Fendley, 2017; Fallahi et al., 2016a; Matthews et al., 2015; Yan et al., 2019). More widely, numerous studies have compared subjective metrics with primary task and/or physiological measures of MWL. In several of these, the subjective ratings are not reflected in task performance or physiological indices (e.g., Foy & Chapman, 2018; Hilburn, 1997; Matthews et al., 2015; Myrtek et al., 1994; Vidulich & Wickens, 1986). From the perspective of convergent validity (i.e., how closely different measures of the same underlying construct are related), this dissociation (Yeh & Wickens, 1988) appears to present a concern, and even challenges the notion of a general workload construct (Matthews et al., 2015). However, dissociation can actually be informative, as the different methods may be tapping into different aspects of the multidimensional construct that is MWL (Hilburn, 1997; Tsang & Vidulich, 2006). For example, subjective ratings are particularly sensitive to multitasking (Matthews et al., 2015; Tsang & Vidulich, 2006) and effort (Hockey et al., 1989; Tsang & Vidulich, 2006).
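Convergent validity of the kind discussed above is typically examined by correlating scores from two measures across the same participants or conditions. The following is a minimal sketch of a Pearson correlation written out explicitly; the paired values (mean TLX scores and mean HRV in milliseconds for four conditions) are invented purely for illustration. Because HRV tends to fall as workload rises (Section 3.4.2), a strong negative correlation would be expected if both measures tracked the same demand, whereas a weak correlation would be one signature of dissociation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two paired lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when a measure has no variance")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical condition means: subjective workload vs. heart rate variability.
tlx_means = [25, 40, 55, 70]
hrv_means = [62, 55, 50, 41]
print(pearson_r(tlx_means, hrv_means))
```

Here r is close to -1, the pattern expected under convergence. With so few data points the coefficient is descriptive only; an actual study would also consider the design (within- versus between-participant) and statistical inference.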
Since effort may be invested to maintain constant performance (Hart, 2006; Hockey, 1997), it follows that subjective and primary task measures would, in these circumstances, dissociate. Moreover, at extremes of MWL, dissociation will occur when task performance is at ceiling (low MWL) and variations in effort have no effect, or conversely when invested effort is at its maximum (high MWL), so that further increases in demand instead affect performance (Hancock & Caird, 1993; Tsang & Vidulich, 2006). Interestingly, the subscales of the NASA-TLX can reflect this, with the Performance subscale being the biggest contributor to overall workload under low task demands (as people try to maintain their performance), while Mental Demand becomes more prominent as demands increase (Fallahi et al., 2016a). These forms of association, insensitivity, and dissociation represent an important challenge for mental workload assessment, which is why we expand upon them in the sections that follow.
3.3.3 Usage and Applications
Most subjective MWL measurement tools are administered post-task. As such, they should be completed immediately after the experience, or as soon as possible after each trial (e.g., Hilburn, 1997), so that they benefit from the participant's freshest memory (Tsang & Vidulich, 2006). Nevertheless, ratings for long trials may be adversely influenced by recency effects, or by aggregation across the trial, thereby losing the granularity of peaks and troughs in workload throughout that trial (and see Hancock, 2017). The Instantaneous Self-Assessment (ISA; see e.g., Tattersall & Foord, 1996) is one example of a method that can be administered as many times as necessary during the task, owing to its simple, unidimensional scale. Even so, care must be taken that completing the ISA does not noticeably interfere with execution of the primary task. Conversely, using a secondary task alongside subjective measures can interfere, and has interfered, with the subjective ratings, artificially inflating scores (Young & Stanton, 2007).
To mitigate this effect, it is important to instruct participants to rate only the primary task under investigation, not the combination of primary and secondary tasks. The elusive "red line" of acceptable MWL limits has been proposed for at least one subjective scale (Hart, 2006); indeed, it is almost explicit at the upper end of some unidimensional methods such as the Cooper-Harper scale. However, since subjective ratings are often made in a relative sense and may be nonlinear (cf. Estes, 2015), defining thresholds is likely to remain problematic, at least in the near term. In terms of application areas, the simplicity and flexibility of subjective techniques have seen them used in a wide variety of domains, such as aviation, military operations, healthcare, automotive, consumer products, and, in particular, automation (Evans & Fendley, 2017; Hart, 2006). The fact that subjective ratings correlate with some physiological measures of workload can be used to develop real-time operator monitoring systems based on selected physiological sensors (Evans & Fendley, 2017; Foy & Chapman, 2018). We explore this opportunity in Section 3.4 on physiological measures of MWL. MWL evaluation has, as we shall see, percolated into many operational domains. The healthcare and automotive areas, for example, have each developed their own derivatives of the NASA-TLX: the Surgery Task Load Index (SURG-TLX; Wilson et al., 2011) and the Driving Activity Load Index (DALI; Pauzié, 2008), respectively. The DALI has even been further refined for application to competitive motorsport (Brown et al., 2020). In these cases, the original TLX subscales have been modified to suit the characteristics of each domain.
For instance, in healthcare, subjective ratings have been shown to be sensitive to fatigue (Ghanbary Sartang et al., 2016; Wilson et al., 2011); meanwhile, the TLX scores of novice drivers correlate with errors, so reducing MWL through practice and training may help to reduce collisions (Chi et al., 2019; Yan et al., 2019).
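The concern raised in Section 3.3.3, that a single post-task rating of a long trial can lose the peaks and troughs of workload, can be illustrated with repeated in-task ratings such as the ISA. This sketch assumes a 1 to 5 ISA scale with 5 as the highest workload; the function name, the trial data, and the "rated 4 or above" threshold are hypothetical choices made for illustration.

```python
def summarize_isa(ratings):
    """Summarize in-task ISA ratings (assumed 1-5 scale, 5 = highest workload)."""
    if not ratings or any(r not in range(1, 6) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return {
        "mean": sum(ratings) / len(ratings),
        "peak": max(ratings),
        # Proportion of prompts rated 4 or 5 (an arbitrary "high" threshold).
        "time_at_or_above_4": sum(r >= 4 for r in ratings) / len(ratings),
    }

# Hypothetical trial: ratings prompted at fixed intervals, with a brief
# high-demand phase in the middle.
trial = [2, 2, 2, 5, 5, 2, 2, 2, 2, 2]
summary = summarize_isa(trial)
print(summary)
```

The mean of 2.6 suggests a modest trial overall, whereas the peak of 5 and the 20% of prompts rated 4 or above reveal a brief period of very high workload that a single aggregate rating would obscure.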
3.3.4 Advantages and Disadvantages
We conclude this section of the chapter with a summary of the advantages and disadvantages of subjective MWL measurement, to help practitioners select the instrument best suited to their own conditions.
Advantages
• Easy and inexpensive to administer and analyse (Gawron, 2008; Ghanbary Sartang et al., 2016; Longo, 2015), especially for large numbers of participants (Evans & Fendley, 2017).
• When administered post-task (as most subjective techniques are), unobtrusive with respect to the primary task or other measures (Gawron, 2008).
• Multidimensional measures can offer diagnosticity in determining the source of MWL (Longo, 2015).
• High face validity (Gawron, 2008).
• Represents the operator's own sense of overload, regardless of what objective measures say (Peterson & Kozhokar, 2017).
• Transferable to a wide range of tasks (Gawron, 2008).
Disadvantages
• As most subjective techniques can only be administered post-task, ratings depend on short-term memory (Gawron, 2008) and are thus vulnerable to recency effects (cf. Peterson & Kozhokar, 2017); consequently, any variation of MWL during the task is lost in the ratings (Yan et al., 2019), which reduces their reliability for long task durations (Longo, 2015).
• Conversely, subjective ratings cannot be recorded in real time without disrupting the primary task (Evans & Fendley, 2017; Foy & Chapman, 2018; Tattersall & Foord, 1996), which limits the utility of subjective techniques for applications such as adaptive automation (Matthews et al., 2015).
• Subjective ratings on a task may be influenced (anchored) by experiences on preceding tasks (Hart, 2006) or by raters' perception of their own performance on that task (Charles & Nixon, 2019; Peterson & Kozhokar, 2017).
• Other metacognitive limitations can cloud accurate reporting (Longo, 2015; Petrusic & Cloutier, 1992; Praetorius & Duncan, 1988); for instance, novices lack a frame of reference against which to assess demands (Chi et al., 2019), while information that is processed unconsciously cannot be rated subjectively (Gawron, 2008).
• Inter-rater reliability might be affected by raters' interpretations of the scale definitions (Gawron, 2008), making it difficult to draw absolute comparisons between participants (Longo, 2015).
• All subjective techniques are susceptible to psychometric issues such as anchor effects or central tendency on the scales (Hart, 2006; Peterson & Kozhokar, 2017), as well as other biases (e.g., demand characteristics) that might influence responses (Matthews et al., 2015).
3.4 Physiological Measures of Mental Workload
Given the conceptual issues arising from the multidimensional nature of workload as both an input (task-imposed demand) and an output (subjective experience of the operator; Xie & Salvendy, 2000b; Young, Brookhuis, Wickens & Hancock, 2015), we now consider the growing popularity of psychophysiological assessment techniques as objective indices of workload. However, owing to great variability in the manner and degree to which the human body reacts to mental workload, as well as the associated individual differences in this complex process, no single physiological measure conclusively evaluates mental workload (Charles & Nixon, 2019). Nevertheless, when chosen in keeping with theory and in accordance with the experimental design, a number of measures prove valid, reliable, and sensitive to workload manipulations (Matthews et al., 2015). What follows presents the major psychophysiological methods and measures, the physiological systems they assess, and how such activity reflects workload. First, however, is a brief but necessary discussion of the theoretical and practical implications of selecting and implementing such methods. Neuroergonomics, a recent cross-disciplinary pursuit, is the study of the relationship between human performance and the neural mechanisms underlying perception, cognition, and action (Mehta & Parasuraman, 2013; Parasuraman & Hancock, 2004; Parasuraman & Wilson, 2008). Neuroergonomic assessment of mental workload is advantageous as it is objective, quantitative, continuous, non-invasive, less subject to bias, and can help contextualize the associations, dissociations, and insensitivities that often result from interpreting performance and self-report measures alone (and see Section 5 on the current challenges to workload; Fairclough, 2009; Hancock & Matthews, 2019; Matthews & Reinerman-Jones, 2017). When considering their use, these advantages must be weighed against the constraints of time and resources, and the possibility of confounding variables.
Researchers and practitioners must invest effort not only in learning how to implement these methods but also in post-processing the physiological data, which can be quite time-consuming. While technology continues to become more portable and affordable, the necessary hardware and software for data collection, post-processing, and analysis still entail a sizeable investment. Finally, the method of data collection (i.e., sensor placement, wires, etc.) has the potential to introduce problematic confounds such as distraction and/or a restricted range of motion for operators. For this reason, the requirements of the specific investigative design are one of several key factors that must be taken into account when selecting psychophysiological methods. It is also always advisable to incorporate self-report measures of workload to help provide context for performance and physiological results. An important series of works (Eggemeier, 1988; Eggemeier, Wilson, Kramer, & Damos, 1991; O'Donnell & Eggemeier, 1986) established seven criteria by which to judge the effectiveness and utility of workload metrics, some of which have already been mentioned: sensitivity, diagnosticity, selectivity, reliability, intrusiveness, practical constraints, and operator acceptance. Sensitivity refers to a measure's ability to differentiate among levels of mental workload, while diagnosticity is its capacity to distinguish the source(s) of mental workload (and see O'Donnell & Eggemeier, 1986). A selective measure is sensitive only to changes in cognitive demands, rather than to other factors that may or may not be related (e.g., general physiological arousal, physical workload; Matthews & Reinerman-Jones, 2017). Reliability, of course, refers to the consistency of assessments.
As discussed above, intrusiveness is the degree to which the psychophysiological data collection process interferes with the valid and effective assessment of the primary task performance under analysis. The practical constraints of time, equipment, and training will all shape the decision as to when, how, and which physiological measures to adopt. Finally, operator acceptance should also be considered: participants' comfort with the placement, application, and weight of equipment can influence the quantity and quality of data collected as well as attrition rates (Eggemeier et al., 1991).
Having evaluated these criteria and concluded that psychophysiological measures would be a useful addition to an empirical design or application study, the investigator must choose one or more measures from among several candidates. Factors to consider in this decision include task type, response type, activation type, and temporal resolution (Hughes, Hancock, Marlow, Stowers, & Salas, 2019). Many measures (particularly those that assess activity of the heart, skin, and eyes) have been shown to accurately distinguish mental workload between task types (Charles & Nixon, 2019). They are more useful when gauging the effects of cognitive tasks (i.e., those requiring the interpretation of processed information) than of perceptual tasks that require only a sensory response (Hughes et al., 2019). Cognition depends on inter-neuronal communication, which is both electrical and neurochemical in nature (Pereda, 2014). As these electrical and chemical processes in the brain work cooperatively to induce changes in functioning throughout the body to produce perception and performance, researchers can choose a method that focuses on either electrical or metabolic functioning. Furthermore, one method may be better suited than another depending on whether the performance of interest needs to be assessed relative to a particular event in time (i.e., time-locked) or over time (i.e., continuously).
With regard to activation type, methods can focus either on the central nervous system (i.e., the brain and spinal cord), conceptualizing workload as an imposed input to which the brain must react and respond in a coordinated way, or on the peripheral nervous system (the nerves and ganglia outside the brain and spinal cord), assessing workload as an output (i.e., the extent to which mental workload elicits a systemic shift in functioning and arousal). Finally, the timescale of performance must be considered. Certain methods can assess mental workload effects on the order of milliseconds, such as event-related potentials via electroencephalography (Sur & Sinha, 2009), while others can require hours of data recording to be derived accurately (Shaffer & Ginsburg, 2017).
The human body is composed of many physiological systems, some of which offer more insight than others when assessing cognitive workload and its effects. Here we discuss the functioning (under normal and MWL conditions) of the nervous, cardiovascular, muscular, and integumentary systems, and methods for assessing such activity. Although multiple measures can be derived from each of these methods, for the sake of brevity only one candidate measure per method is put forward; references are provided for discussions of other measures for each method.
3.4.1 The Nervous System, Electroencephalography, and Event-Related Potentials
The nervous system comprises two major divisions: the central (CNS) and peripheral (PNS) nervous systems (see Figure 4). The central nervous system includes the brain and spinal cord. The brain is the primary organ of cognition, as it is responsible for processing and integrating sensory information; encoding, storing, and retrieving memories; generating emotions; and regulating life-sustaining functions (Carlson, 2013). The spinal cord conducts nerve signals between the brain and other organs, and both coordinates and controls various reflex functions (Kolb & Whishaw, 2009). Given their critical importance to maintaining life, these two structures are protected by bone, encased respectively in the skull and vertebral column. The peripheral nervous system comprises all nervous tissue outside the CNS, most of which is not protected by bone. The PNS's sensory pathways carry signals from the sensory organs (i.e., the organs serving the five senses) to the brain, while its motor pathways conduct commands from the brain to muscles, organs, and glands (Kolb & Whishaw, 2009). The motor side of the PNS is divided into the somatic nervous system (SNS) and the autonomic nervous system (ANS).
The somatic nervous system controls voluntary movements, such as those of the muscles executing a performance response, while the autonomic nervous system controls involuntary responses, such as digestion. The ANS is further divided into the sympathetic and parasympathetic divisions. The sympathetic division is predominantly active when the body needs to mobilize resources, adapt, and respond; high activation of this system is typified by the fight-or-flight response. The parasympathetic division is most active when the body seeks to conserve resources and energy, facilitating necessary rest and recovery functions (Carlson, 2013). Although these two divisions are typically predominant under opposing conditions, their relative activation does not have a perfect inverse relationship. Individual differences in physiological responses to the imposition of stress (e.g., increasing mental workload) arise from this complex interplay between the various nervous systems (Allen & Crowell, 1989).

Figure 4 The systems and divisions composing the human nervous system (CNS: brain, which receives and processes sensory information, initiates responses, stores memories, and generates thoughts and emotions; spinal cord, which conducts signals to and from the brain and controls reflex activities. PNS: sensory neurons, from sensory organs to CNS; motor neurons, from CNS to muscles and glands, serving the somatic nervous system, which controls voluntary movements, and the autonomic nervous system, which controls involuntary responses through its sympathetic ("fight or flight") and parasympathetic ("rest or digest") divisions).

One of the more credible estimates of the cellular composition of the human brain holds that it contains some 85 billion neurons (Herculano-Houzel, 2009; Williams & Herrup, 1988). Within each neuron, signals are conducted electrically (i.e., as action potentials), while communication between neurons (synaptic transmission) is largely chemical, using neurotransmitters (Carlson, 2013). Electroencephalography (EEG) is a psychophysiological method that records, in real time via electrodes on the scalp, voltage fluctuations arising from ionic currents in cerebral neurons. This recorded activity reflects the summed electrical activity of large populations of cortical neurons as they process sensory and/or cognitive information or execute motor responses (Schomer & Lopes da Silva, 2011). Analysis of the location, timing, and magnitude of electrical activity in the cerebral cortex therefore gives researchers insight into the physiological processes driving such sensory, cognitive, and motor experiences. Using EEG methods, one can measure event-related potentials (ERPs).
ERPs are electrophysiological responses to sensory, cognitive, and motor stimuli, visualized as characteristic positive and negative voltage fluctuations (see Figure 5). Each peak and trough is referred to as a component, and components are loosely named for their polarity and for their latency after the event that provoked them. The ERP component most widely used in the mental workload literature is the P3, or P300, so named because it is a positive deflection occurring, on average, 300 milliseconds post-stimulus (see Figure 5). Research has established an inverse relationship between mental workload and P300 amplitude: as mental workload increases, P300 amplitude decreases (Brouwer et al., 2012; Käthner, Wriessnegger, Müller-Putz, Kübler, & Halder, 2014; Prinzel, Freeman, Scerbo, Mikulka, & Pope, 2003). Solís-Marcos and Kircher (2019), for example, examined ERP components as indices of mental workload while participants used an in-vehicle information system to perform various tasks. P300 amplitude diminished proportionately as the number of concurrent tasks (i.e., mental workload) increased from one to three.
Figure 5 Event-related potential components.
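ERP components such as the P300 emerge only after averaging many stimulus-locked EEG epochs, because activity that is not time-locked to the stimulus tends to cancel out. The pure-Python sketch below baseline-corrects each epoch, averages across trials, and takes the maximum voltage in a post-stimulus search window as the P300 amplitude. The 250 to 500 ms window, the 100 ms baseline, the function names, and the synthetic data are assumptions for illustration, not parameters taken from the studies cited above; in practice, window, electrode site, and filtering choices should follow the relevant literature.

```python
import math
import random

def p300_amplitude(epochs, fs, window_ms=(250, 500), baseline_ms=100):
    """Peak of the trial-averaged, baseline-corrected ERP within a window.

    epochs: list of trials, each a list of voltages (microvolts) for one
    electrode, each trial starting baseline_ms before stimulus onset.
    fs: sampling rate in Hz.
    """
    onset = round(baseline_ms * fs / 1000)
    n = len(epochs[0])
    erp = [0.0] * n
    for trial in epochs:
        base = sum(trial[:onset]) / onset  # pre-stimulus mean for this trial
        for i, v in enumerate(trial):
            erp[i] += (v - base) / len(epochs)
    start = onset + round(window_ms[0] * fs / 1000)
    stop = onset + round(window_ms[1] * fs / 1000)
    return max(erp[start:stop])

def synthetic_epochs(peak_uv, n_trials=100, fs=500, seed=0):
    """Hypothetical data: a Gaussian bump 300 ms post-stimulus plus noise."""
    rng = random.Random(seed)
    n_pre, n_total = round(0.1 * fs), round(0.8 * fs)
    times = [(i - n_pre) / fs for i in range(n_total)]  # -100 ms to ~700 ms
    wave = [peak_uv * math.exp(-((t - 0.3) / 0.05) ** 2) for t in times]
    return [[w + rng.gauss(0.0, 2.0) for w in wave] for _ in range(n_trials)]

low_load = p300_amplitude(synthetic_epochs(peak_uv=8.0), fs=500)
high_load = p300_amplitude(synthetic_epochs(peak_uv=4.0, seed=1), fs=500)
print(low_load, high_load)
```

Because the hypothetical high-workload condition is generated with half the underlying amplitude, its measured peak is correspondingly lower, mirroring the inverse relationship described above. Peak picking is sensitive to residual noise; the mean amplitude within the window is a common, more robust alternative.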
3.4.2 The Cardiovascular System, Electrocardiography, and Heart Rate Variability
The cardiovascular (circulatory) system works in close cooperation with the respiratory system. The circulatory system comprises the heart, arteries, veins, and all other blood vessels, and serves to transport blood, oxygen, hormones, nutrients, and other resources throughout the body. The respiratory system comprises the lungs, bronchi, trachea, and other associated structures. Its primary purpose is the effective exchange of respiratory gases: the absorption of oxygen for the body's use and the expulsion of carbon dioxide to prevent fatal hypercapnia. When stressed (e.g., under greater mental workload), these systems work in tandem as a unified cardiorespiratory system to mobilize critical physiological resources for the tissues and structures needed to formulate an adaptive response. The efficiency with which the heart pumps blood is therefore a useful, indirect measure of the load experienced by the human operator. Electrocardiography (ECG, also abbreviated EKG) is the method of recording the electrical activity of the heart. The most common HF/E measure derived from ECG data to assess the influence of workload on cardiovascular functioning is heart rate variability (HRV). Depending on the literature, synonyms include cycle-length variability and R-R variability. HRV is the variation in time between heartbeats, often quantified as the standard deviation, in milliseconds, of the intervals between corresponding cardiac components (Kitamura et al., 2016; Stauss, 2003). The cardiac component most commonly used for analysis is the R-spike or R-wave, which indicates depolarization of the ventricles, because it has the largest amplitude of all cardiac components (Keene, Clayton, Berke, Loof, & Bolls, 2017). HRV has an inverse relationship with sympathetic nervous system arousal.
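The standard-deviation quantification of HRV described above (often called SDNN) can be sketched in a few lines. The sketch assumes R-peak times have already been detected and cleaned of ectopic beats and artifacts, and it uses the sample standard deviation; the function name and the peak times are hypothetical. Other HRV indices (e.g., RMSSD or frequency-domain measures) are computed differently and are not shown.

```python
import statistics

def sdnn_ms(r_peak_times_s):
    """Standard deviation of successive R-R intervals, in milliseconds.

    r_peak_times_s: increasing R-peak times in seconds, assumed to be
    artifact-free normal beats.
    """
    if len(r_peak_times_s) < 3:
        raise ValueError("need at least three R-peaks (two intervals)")
    rr_ms = [
        (later - earlier) * 1000.0
        for earlier, later in zip(r_peak_times_s, r_peak_times_s[1:])
    ]
    return statistics.stdev(rr_ms)

# Hypothetical R-peak times (s): a perfectly steady rhythm vs. a variable one.
steady = [0.0, 0.8, 1.6, 2.4, 3.2]
variable = [0.0, 0.8, 1.65, 2.45, 3.3, 4.1]
print(sdnn_ms(steady), sdnn_ms(variable))
```

The steady rhythm yields an SDNN near zero and the variable rhythm about 27 ms; under the inverse relationship described in this section, the lower value would be read as the higher-arousal state, all else being equal. SDNN also depends on recording length, so comparisons should use recordings of equal duration.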
As arousal increases (due to rising MWL), HRV decreases. The reduced variation in cardiac functioning reflects the body’s more efficient and consistent mobilization of resources to the organs, tissues, and structures needed to cope with the stressor. Lower heart rate variability is consequently indicative of greater stress and workload (Kim, Cheon, Bai, Lee & Koo, 2018). Wei, Zhuang, Wanyan, Liu, and Zhang (2014) had participants fly a simulated aircraft through demanding flight phases (i.e., take-off, climb, cruise, approach, and landing) under varying levels of mental workload, which were manipulated through the number of flight indicators to consult and the refresh frequency of the displayed information. HRV was sensitive to these levels of MWL: the authors observed HRV decrease with each successive rise in MWL demand. Hsu, Wang, and Chen (2015) reported similar decreases in HRV as mental workload increased under comparable multitasking conditions.
3.4.3 The Muscular System, Electromyography, and EMG Amplitude
The muscular system comprises the muscle tissues (or muscle fibers) of the body. These fibers contract and relax in response to electrical neurological stimulation to produce movement. There are three primary types of muscular tissue. Cardiac tissue is located exclusively in the heart and produces involuntary movements (i.e., heartbeats). Smooth muscle is likewise responsible for involuntary movements and is found in the walls of internal organs, the digestive tract, and blood vessels (Shier, Butler, & Lewis, 2006). The third type, skeletal muscle, is the main focus of HFE because it is consciously controlled and is therefore responsible for voluntary, coordinated actions and, as a consequence, for performance efficiency. Electromyography (EMG) is the method by which electrodes record the electrical potentials driving the contraction of these skeletal muscle fibers. Surface electrodes, as the name suggests, record muscle activity through the skin and are therefore better suited to superficial muscle groups (Ng, Kippers & Richardson, 1998). Deep muscles, those located closer to the bone than to the skin, can also be recorded, but this requires invasive electrode placement, typically with a needle (Menkes & Pierce, 2019). Electrode size can vary with the number of motor units of experimental interest; however, at least two electrodes must always be placed, because EMG records the voltage difference between electrodes (Merletti & Muceli, 2019). Before an experimental or data collection session, researchers must record reference measurements to interpret EMG data appropriately: the participant performs no fewer than two maximum voluntary contractions (MVCs) of the muscle group of interest, and the average across these MVCs typically serves as the basis for comparison. Other important considerations for the accurate interpretation of EMG data include the confounding effect of subcutaneous fat tissue, which attenuates the recorded signal, the elasticity of the skin, and muscle crosstalk, whereby the electrode registers activity from a neighboring muscle because of its proximity to the muscle of interest (Mesin, 2020). With these factors controlled, measures of relative muscular activation (EMG amplitude, measured in microvolts) can provide insight into stress, tension, other affective states, and, of course, mental workload. Fallahi and colleagues studied performance and physiological functioning in a field study of traffic monitoring and observed EMG amplitude increase in proportion to mental workload (Fallahi et al., 2016b).
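The MVC-based normalization described above can be illustrated as follows. This is a sketch under stated assumptions: it uses root-mean-square (RMS) amplitude as the amplitude estimator, assumes the raw signal has already been band-pass filtered, and uses hypothetical function names and toy sample values. It expresses task-period EMG amplitude as a percentage of the mean amplitude of the MVC trials (%MVC).

```python
import math


def rms_uv(samples_uv):
    """Root-mean-square amplitude of an EMG window, in microvolts.

    Assumes the window has already been band-pass filtered and de-meaned.
    """
    if not samples_uv:
        raise ValueError("empty EMG window")
    return math.sqrt(sum(x * x for x in samples_uv) / len(samples_uv))


def percent_mvc(task_window_uv, mvc_windows_uv):
    """Task EMG amplitude as a percentage of the averaged MVC amplitude."""
    if len(mvc_windows_uv) < 2:
        raise ValueError("record at least two MVC trials")
    mvc_reference_uv = sum(rms_uv(w) for w in mvc_windows_uv) / len(mvc_windows_uv)
    return 100.0 * rms_uv(task_window_uv) / mvc_reference_uv


# Toy windows (microvolts): two MVC trials and one task segment.
mvc_trial_1 = [400.0, -400.0, 400.0, -400.0]
mvc_trial_2 = [600.0, -600.0, 600.0, -600.0]
task_segment = [50.0, -50.0, 50.0, -50.0]

print(f"{percent_mvc(task_segment, [mvc_trial_1, mvc_trial_2]):.1f} %MVC")  # 10.0 %MVC
```

Normalizing to %MVC makes amplitudes comparable across electrode sites and participants, which raw microvolt values are not, because of the tissue and skin factors noted above.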
3.4.4 The Integumentary System, Skin Conductance, and Skin Conductance Response
The integumentary system comprises the skin and its appendages; the skin is the largest organ of the body (Kanitakis, 2002). As the body’s primary protective barrier, the skin adapts readily and reflects changes in physiological state. The electrical conductance of the skin (measured in microsiemens) depends on the moisture level of the skin’s surface. Sweat gland activity, which alters this moisture level, increases with activation of the sympathetic nervous system (Roth, 1983). Because increased mental workload is associated with heightened mental effort and escalated physiological arousal (Gawron, Schiflett, & Miller, 1989; Hockey, 1986; Mulder & Mulder, 1987), skin conductance can serve as an index of workload. Electrodermal measures have been described by alternative names depending on discipline and era, including galvanic skin response and psychogalvanic reflex; a distinction is commonly drawn between the slowly varying (tonic) skin conductance level (SCL) and the transient (phasic) skin conductance response (SCR). The advantages of this method include its relatively low cost, its effectiveness for biofeedback, and its suitability for meaningful interpretation over both short and extended time periods. Its disadvantages include a 1–3-second lag between the neural activity that stimulates the response and the response registering, as well as susceptibility to extraneous factors such as drug use and environmental temperature and humidity. Kajiwara (2014) found that skin conductance levels significantly reflected mental workload, manipulated by operating speed, in a simulated driving study. Mehler, Reimer, Coughlin, and Dusek (2009) also studied skin conductance as a function of cognitive workload in a simulated driving paradigm, manipulating mental workload with an n-back task performed as a secondary task to driving. Results revealed significant elevations in skin conductance levels in response to increasing cognitive demands (Mehler et al., 2009).
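To make the SCL/SCR distinction and the 1–3-second lag concrete, the sketch below scores a single phasic response as the trough-to-peak rise in conductance, with the trough searched for 1–3 s after the stimulus, and computes the tonic level as a simple mean. It is illustrative only: the function names, the 3-second rise window, the 0.05 µS detection threshold, and the synthetic signal are assumptions, not values taken from the studies cited.

```python
def scr_amplitude_us(signal_us, fs_hz, stim_time_s,
                     latency_s=(1.0, 3.0), rise_s=3.0):
    """Trough-to-peak skin conductance response (microsiemens) to one stimulus.

    The response onset (trough) is searched for within the latency window after
    the stimulus; the peak is the maximum within `rise_s` seconds after the trough.
    """
    start = int(round((stim_time_s + latency_s[0]) * fs_hz))
    stop = int(round((stim_time_s + latency_s[1]) * fs_hz))
    onset_window = signal_us[start:stop + 1]
    if not onset_window:
        raise ValueError("latency window lies outside the recording")
    trough_index = start + min(range(len(onset_window)), key=onset_window.__getitem__)
    peak_stop = min(len(signal_us), trough_index + int(round(rise_s * fs_hz)) + 1)
    peak_us = max(signal_us[trough_index:peak_stop])
    return peak_us - signal_us[trough_index]


def skin_conductance_level_us(signal_us):
    """Tonic level: mean conductance over the segment."""
    return sum(signal_us) / len(signal_us)


# Synthetic 4 Hz recording: flat at 2.0 uS, a phasic rise starting about 1.5 s
# after a stimulus at t = 0, then a plateau at about 2.6 uS.
FS_HZ = 4
conductance_us = [2.0] * 6 + [2.0 + 0.1 * k for k in range(1, 7)] + [2.6] * 8

amplitude = scr_amplitude_us(conductance_us, FS_HZ, stim_time_s=0.0)
MIN_SCR_US = 0.05  # assumed detection threshold
print(f"SCR amplitude: {amplitude:.2f} uS, detected: {amplitude >= MIN_SCR_US}")
print(f"SCL: {skin_conductance_level_us(conductance_us):.3f} uS")
```

Searching for the onset only after the latency window has elapsed keeps the score from attributing pre-existing drift to the stimulus; real pipelines typically also low-pass filter the signal and handle overlapping responses.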
HUMAN FACTORS FUNDAMENTALS
3.4.5 Compound Measurement Approaches
It is quite possible, and indeed advisable, to combine more than one workload assessment method to evaluate an individual’s or a group’s workload, and many modern, applied investigations do exactly this. Such combinations give us the opportunity to say more about a method of workload assessment we have so far only touched upon: the secondary task. Secondary tasks have also been referred to as loading tasks or interpolated tasks, although, as we point out below, the terms are not exactly equivalent. The secondary-task method derived largely from the single attentional resource theory of Kahneman (1973) (see Figure 6). In this groundbreaking work, Kahneman postulated that attention was not mediated by gate mechanisms interposed along a series of information-processing stages, as the accepted model had held. Rather, there was a single, fluid pool of resources that could be devoted to one task or another, contingent upon their specific demands. The simplified illustration (the inset at right of Figure 6) shows that these resources could be provided to multiple tasks (here marked A and B). One task (A) could be designated of primary concern and the other (B) of lesser importance. The reasoning then runs that the workload demanded by the primary task can be assessed by examining performance efficiency on the secondary task: because the resources are finite, those devoted to one task cannot be given to the other. We have previously illustrated this logic in Figure 2. Although this logic was reasonable at the time, two developments caused secondary tasks largely to fall out of favor in practical workload assessment. One was theoretical, the other practical, and they emerged at about the same time. The practical challenge was rather evident.
Especially in critical performance situations, was it wise, or even feasible, to impose a secondary, or loading, task on busy operators? Indeed, for safety-critical operators, was it even legal to ask them to do additional work when the primary performance (e.g., air traffic control) was so critical? One potential way around this issue was to assess interpolated tasks: tasks that the operator often had to do anyway (e.g., providing clearances to aircraft). However, in this specific case, clearances are vocal utterances, whereas maintaining aircraft separation on a radar display is largely a spatio-motor demand. This mismatch led to the second objection, which was driven largely by advances in attention theory. Wickens (1980) proposed, and then experimentally supported, the notion that attention could be parsed into multiple attentional pools (see Section 2.2 on MRT). Consequently, a secondary or loading task would have to draw on the same specific pool as the primary task, and even interpolated tasks could not be guaranteed to do so. These theoretical concerns, allied to the practical difficulties of administering secondary tasks, led to their relative demise. Each of the major methods we have already discussed has proved much easier to administer. As a consequence, secondary tasks have largely gone out of fashion and, for the most part, have not been revived. More recently, however, a new avenue of attack on workload assessment has opened, and it is to this innovative model-based approach that we now turn.
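Although secondary tasks have fallen out of favor, the single-pool reasoning behind them (Figure 6) is easy to state computationally. The toy model below is our own illustration, not Kahneman’s formalism: it assumes one pool of capacity in arbitrary units, gives the primary task first claim on it, and predicts secondary-task performance as the fraction of the secondary task’s demand that the residual capacity can meet. All names and numbers are hypothetical.

```python
def spare_capacity(total_capacity, primary_demand):
    """Residual attention left over once the primary task is served."""
    return max(0.0, total_capacity - primary_demand)


def secondary_performance(total_capacity, primary_demand, secondary_demand):
    """Fraction of the secondary task's demand that can be met (1.0 = unimpaired)."""
    if secondary_demand <= 0:
        raise ValueError("secondary_demand must be positive")
    return min(1.0, spare_capacity(total_capacity, primary_demand) / secondary_demand)


# Capacity of 10 units, a secondary task needing 4 units, rising primary demand.
for primary_demand in (2, 5, 8, 10):
    score = secondary_performance(10, primary_demand, 4)
    print(f"primary demand {primary_demand:>2}: secondary performance {score:.2f}")
```

Secondary-task performance stays at ceiling until the primary task consumes enough capacity, and only then falls. Reading primary-task workload off the secondary task is valid only if both tasks draw on the same pool, which is precisely the assumption that multiple resource theory called into question.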
4 HOW MENTAL WORKLOAD CAN BE MODELED
4.1 Computational Aspects and Aggregation Strategies
As we have seen, a number of mental workload assessment procedures exist in the literature, which shows that formalizing and measuring this construct is a non-trivial challenge. For unidimensional procedures, the problem
MENTAL WORKLOAD
Figure 6 Kahneman’s (1973) original proposition of a single pool of attentional resources and (inset) how the model can be used to understand cognitive workload response. [Diagram labels: Miscellaneous Determinants; Arousal; Miscellaneous Manifestations of Arousal; Available Capacity; Enduring Dispositions; Allocation Policy; Momentary Intentions; Evaluation of Demands on Capacity; Possible Activities; Responses. Inset: Resources Consumed (Workload) and Available Resources (Residual Attention) for Task A and Task B.]
of aggregating attributes believed to influence mental workload does not arise: the single attribute accounted for is assumed to represent the construct of mental workload entirely. For multidimensional procedures, however, there is the issue of how to represent multiple attributes quantitatively and how to aggregate them into a representative, meaningful operational index of mental workload that can easily be employed for practical and statistical purposes. One formal way to address these issues is workload modeling, and in what follows we report the present state of the art in this emerging area of workload assessment. In the NASA Task Load Index (Hart, 2006), for instance, subjective ratings are expressed as discrete numbers from 0 to 100, while in the Subjective Workload Assessment Technique (Reid & Nygren, 1988), ratings take discrete values from 1 to 3. The dimension of “intention” can be described as a single real value in the interval [−1, 1] (Longo & Barrett, 2010a, 2010b). Beyond these commonly adopted ranges and scales, more complex computational representations exist for expressing an operator’s subjective judgment. For example, the dimensions of “cognitive ability” and “context bias” can be modeled as functions that take three attributes and return a real value from 0 to 1. Similarly, the concept of “arousal” can be designed as a taxonomy of sub-factors organized as a unidirectional tree, in which leaf nodes hold subjective judgments and internal nodes aggregate their children. Unidirectional weighted edges link child nodes to parent nodes up to a root node, whose value, a real number in [0, 1], expresses the final degree of arousal (see Longo & Barrett, 2010a, 2010b). Moray et al. (1988) proposed the use of fuzzy sets from fuzzy set theory (Zadeh, 1965) as a way of transforming vague, qualitative verbal judgments expressed by humans into a precise, computable form.
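In computational terms, a fuzzy set is represented by its membership function, which maps each element of a universe of discourse to a degree in [0, 1]. The sketch below is our own illustration rather than Moray et al.’s procedure: it models the verbal terms “low,” “medium,” and “high” mental demand on a 0–100 rating scale with triangular membership functions, and both the triangular shape and the breakpoints are assumptions a designer would choose.

```python
def triangular(a, b, c):
    """Triangular membership function rising from a to a peak at b, falling to c.

    Setting a == b or b == c gives a left or right 'shoulder'.
    """
    def membership(x):
        if x <= a or x >= c:
            return 1.0 if x == b else 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return membership


# Hypothetical fuzzy sets over a 0-100 "mental demand" rating.
demand_terms = {
    "low": triangular(0, 0, 50),
    "medium": triangular(0, 50, 100),
    "high": triangular(50, 100, 100),
}

rating = 70
degrees = {term: mu(rating) for term, mu in demand_terms.items()}
print(degrees)  # {'low': 0.0, 'medium': 0.6, 'high': 0.4}
```

A single rating of 70 thus belongs partly to “medium” and partly to “high,” a graded judgment that a crisp set could not express. Each verbal term is, in this sense, modeled as a fuzzy set.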
Fuzzy sets are sets whose elements have degrees of membership. In classical set theory, the membership of an element in a set is binary: the element either belongs to the set or does not. In fuzzy set theory, by contrast, membership can be graded. This graded membership is formally represented by a membership function. For a fuzzy set A on the universe of discourse X (for example, “mental demand”), this function is defined as $\mu_A: X \to [0, 1]$, where each element of X is mapped, usually by a human designer, to a value between 0 and 1. This value, called the “membership value” or “degree of truth,” models vaguely defined concepts such as those used in mental workload modeling. Besides the representation of each single attribute included in a multidimensional representation of mental workload, there is also the problem of aggregating these attributes into a meaningful, representative operational index that can be used for practical and statistical purposes. The use of different scales and methods evidently complicates this aggregation. Over the last five decades, several computational aggregation strategies have emerged, each with its own advantages and limitations.
4.1.1 Additive Aggregation
In the Workload Profile assessment procedure, the workload dimensions are based upon the multiple resource theory proposed by Wickens (1984, 2008). After task completion, subjects rate the proportion of attentional resources used to perform a given task on each dimension, using a value in the real range 0–1. A rating of 0 means that the task placed no demand on the dimension being rated, while 1 indicates that it required maximum attention (Tsang & Velazquez, 1996). On the one hand, the advantage of this aggregation strategy lies in its simplicity, as the overall score is simply the sum of the 8 ratings. On the other hand, from a computational perspective, its limitation is that it implies that each dimension has the same strength in affecting overall mental workload.
4.1.2 Weighted Aggregation and Preferentiality
To deal with the limitation of simple additive models of mental workload, some authors have incorporated the notion of preferentiality of attributes. Here, the assumption is that the state of an operator, his or her previous knowledge, and external factors, among others, might all have different influences on mental workload. A well-known model that embeds this assumption is the NASA Task Load Index instrument (Hart, 2006). Here, the factors believed to influence mental workload are combined not by a simple sum but by a weighted average. Each factor is quantified through a subjective judgment on a discrete 20-interval scale, and its weight is computed via a paired comparison procedure. For each possible pair of the 6 attributes, participants decide which of the two contributed more to their workload during the task (e.g., “Temporal or Physical Demand,” “Physical Demand or Frustration,” and so forth), giving a total of 15 preferences: $\binom{6}{2} = \frac{6!}{2!\,(6-2)!} = 15$. The weight of each dimension is the number of times it was preferred in the 15-answer set, ranging from 0 (not relevant) to 5 (more important than any other attribute). Finally, the overall mental workload score is computed as a weighted average: the subjective rating of each attribute is multiplied by the corresponding weight, and the products are summed and divided by 15. Although intuitive, the main issue with this aggregation approach is that adding a new dimension makes the paired comparison procedure more tedious, because it requires more judgments from participants. With 9 or 10 dimensions, 36 and 45 comparisons, respectively, are required, which can be too cumbersome for an operator.
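The additive and weighted strategies can be contrasted in a short sketch. It is an illustration only: the function names, the participant’s choices (simulated here by a fixed priority order), and the ratings are hypothetical, and the NASA-TLX ratings are taken on the 0–100 scale. The weighted score divides the sum of rating × weight by the 15 comparisons, and `math.comb` shows how the number of paired comparisons grows with the number of dimensions.

```python
from itertools import combinations
from math import comb

TLX_DIMENSIONS = ("mental", "physical", "temporal",
                  "performance", "effort", "frustration")


def workload_profile(ratings):
    """Additive aggregation: sum of the eight Workload Profile ratings in [0, 1]."""
    if len(ratings) != 8 or not all(0.0 <= r <= 1.0 for r in ratings):
        raise ValueError("expected eight ratings in [0, 1]")
    return sum(ratings)


def tlx_weights(choose):
    """Count how often each dimension wins over all C(6, 2) = 15 paired comparisons.

    `choose(a, b)` returns the dimension the participant judged to have
    contributed more to workload.
    """
    weights = dict.fromkeys(TLX_DIMENSIONS, 0)
    for a, b in combinations(TLX_DIMENSIONS, 2):
        weights[choose(a, b)] += 1
    return weights


def tlx_weighted_score(ratings, weights):
    """Weighted NASA-TLX score: sum of rating x weight divided by the comparison count."""
    return sum(ratings[d] * weights[d] for d in TLX_DIMENSIONS) / sum(weights.values())


# Hypothetical participant who always prefers the dimension earlier in this list.
PRIORITY = ["mental", "temporal", "effort", "frustration", "performance", "physical"]


def prefer_by_priority(a, b):
    return a if PRIORITY.index(a) < PRIORITY.index(b) else b


ratings = {"mental": 80, "physical": 10, "temporal": 70,
           "performance": 40, "effort": 60, "frustration": 30}
weights = tlx_weights(prefer_by_priority)

print(weights)  # weights run from 5 (mental) down to 0 (physical)
print(tlx_weighted_score(ratings, weights))  # 64.0
print(sum(ratings.values()) / len(ratings))  # unweighted mean, about 48.3
print([comb(n, 2) for n in (6, 9, 10)])  # [15, 36, 45]
print(workload_profile([0.5] * 8))  # 4.0
```

The weighted score (64.0) sits well above the unweighted mean because the dimensions this participant judged most important also received the highest ratings; the additive Workload Profile, by construction, cannot express such preferences.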
Various authors have acknowledged the burden of this paired-comparison procedure, and simpler additive versions of the NASA-TLX have been proposed (Nygren, 1991).
4.1.3 Ranking-Based Aggregation
In the Subjective Workload Assessment Technique (SWAT), three workload attributes (time, effort, and stress) are modeled using discrete numbers from 1 to 3. Each number has an associated qualitative description indicating a possible degree for that attribute. A pre-task procedure, referred to as scale development, requires a subject to rank 27 cards, one for each combination of the three dimensions at the three discrete levels ($3^3 = 27$), according to his or her perception of increasing mental workload, from the card representing the lowest workload to that representing the highest. These rankings are used to produce a scaling solution, tailored to the workload perception of an individual or group, in the form of an interval scale ranging from 0 to 100. The subsequent step, referred to as event scoring, is the actual rating of mental workload for a given task. Subjects rate the task on the three dimensions, and the scale value associated with this combination, obtained in the scale development phase, is assigned as the mental workload score for that task: a value between 0 and 100 on the interval scale developed in the first step. The advantage of this aggregation strategy is the high diagnosticity and content validity that can be reached, since the operator is highly involved in providing rich information (Rubio et al., 2004; Vidulich & Tsang, 1986). Its disadvantage, however, is the cumbersome and tedious procedure required of subjects to obtain the workload ratings. If a mental workload modeler wanted to add another attribute believed to influence mental workload, the ranking procedure would become too long and thus inappropriate: with 4 attributes at three levels each, the number of cards would be $3^4 = 81$, with evident negative repercussions on the comparison and ranking procedure. Another disadvantage is that SWAT does not consider how the three attributes of mental workload interact with each other.
4.1.4 Ad Hoc Aggregations
Hancock and Chignell (1988) employed the construct of mental workload as a means of investigating the capability of operators interacting with machines through interfaces. Their theoretical formulation of mental workload, inspired by the power functions widely applied to fit psychological data, includes the skill of operators, the time pressure they are exposed to, and the effort exerted to execute a task. Their formalism is:
$$\mathrm{MWL} = \frac{1}{e^{\,t(s-1)}}$$
where e is the effort exerted by an individual operator, t the actual time available for action, and s the operator’s skill degree. Despite the very precise theoretical formalization, with expected high diagnosticity, validity, and sparsity of workload scores in empirical contexts, some disadvantages exist. First, as the authors also acknowledge, the function does not solve the problem of workload assessment, because the degrees of effort, skill, and temporal constraint must be quantified and scaled using the same data range. Second, the formalism is ad hoc rather than extensible: it is extremely hard to add further attributes believed to influence mental workload. Finally, there is no account of how the three attributes interact with each other.
4.2 Computational Modeling Frameworks
In more recent work, it has been argued that MWL could be better defined in a framework consisting of multiple indices of workload rather than a single index such as overall mental workload (Xie & Salvendy, 2000a). Specifically, the authors proposed the following indexes: instantaneous workload, peak workload, average workload, accumulated workload, and overall workload (Figure 7).
This proposal has been further articulated by Hancock (2017) in terms of the properties of the analog workload signal. Instantaneous workload captures the dynamics of workload, a process characterized by constant change rather than stasis (Rouse, Edwards, & Hammer, 1993). Most physiological measures capture it, and it can be assessed at any time during task execution. Subjective and performance measures usually do not consider instantaneous workload; instead, an overall index is computed after task completion. However, some studies have shown that instantaneous workload can be measured even with these measures for short-period tasks (Verwey & Veltman, 1996). The peak workload is the maximum value of instantaneous workload detected within a task. If this value exceeds the maximum assumed mental workload limit, as defined by the red-line threshold, the operator may be overloaded and performance starts to degrade. The accumulated workload is the total amount of workload experienced by an operator during task execution (the area below the instantaneous workload curve in Figure 7). The average workload is a measure of workload intensity: the mean of the instantaneous workload values, that is, the accumulated workload per unit time. Normally, a limit is assumed for the average workload, and if the average workload exceeds it, performance suffers. Because mental workload is related to task duration, both the average workload and the accumulated workload are necessary; their combination might lead to an accurate measure of workload in both long-term and short-term tasks. Finally, the overall mental workload can be derived from the previous indexes; it represents the individual’s experience of mental workload based upon the working procedure as a whole. Specifically, Xie and Salvendy (2000a) suggest that the overall workload is the mapping, in the brain of an operator, of either the instantaneous workload or the accumulated and average workload. The relationship between instantaneous workload and overall workload can be described with a mapping function f1(). Similarly, the relationship between the average and accumulated workload and the overall workload can be represented by another mapping function f2(). If the time interval of the task is fixed, the accumulated workload as well as the average workload should be proportional to overall workload. These relationships among the indexes of mental workload are depicted in Figure 8.
Figure 7 Indexes of mental workload in the framework by Xie and Salvendy (2000b). [Plot of instantaneous workload against time, with peak, average, and accumulated workload marked.]
Formally:
$$W_{peak} = \max_{t}\,\{W_{inst}(t)\}$$
$$W_{acc}(t) = \int_0^t W_{inst}(u)\,du$$
$$W_{avg}(t) = \frac{1}{t}\,W_{acc}(t)$$
$$MWL_{tot} = f_1[W_{inst}(t)] = f_2[W_{acc}(t), W_{avg}(t)]$$
where t indicates time, $W_{inst}$ is the instantaneous workload, $W_{peak}$ the peak workload, $W_{acc}$ the accumulated workload, $W_{avg}$ the average workload, and $MWL_{tot}$ the overall workload. f1() and f2() are the mapping functions; both depend on the given task and the particular individual. Xie and Salvendy argue that considering and incorporating individual factors in the assessment of mental workload is essential. They therefore proposed an extension of their framework to make its predictive capacity consistent with an individual’s subjective experience of mental workload.
Figure 8 Relationships among the indexes of mental workload in the framework by Xie and Salvendy (2000b). [Diagram nodes: instantaneous, peak, average, accumulated, and overall workload.]
Intuitively, people cannot accomplish a given task with full efficiency, because they are often distracted and therefore prone to errors and mistakes. Thus, not all the effort a subject exerts contributes to accomplishing the task. As a consequence, mental workload has been divided into effective and ineffective workload. Effective workload directly contributes to the fulfillment of a task; it is the workload an operator bears while executing a task correctly and most efficiently. Ineffective workload does not contribute positively to the fulfillment of the task; it is person-specific, and an operator can decrease it significantly through training and learning. Given these two new indexes, the formulas above can be updated as:
$$W_{acc} = W_{eff} + W_{ineff}, \qquad W_{avg} = \frac{W_{acc}}{T}$$
with T the time available to perform a task, and $W_{eff}$ and $W_{ineff}$ the effective and ineffective workloads, respectively. This model assumes that an operator can concentrate fully during task execution, but in real-world circumstances this is often not the case. For instance, even if a task is extraordinarily complex and cumbersome, a subject can still withdraw from cognitive engagement and decide to do nothing to accomplish it. The workload can then be null, which implies that some other factor is at play. Xie and Salvendy (2000b) refer to this factor as the degrading factor (DF), a number bounded in the range 0–1, where 0 indicates a total lack of willingness to perform the task, 1 indicates full concentration on it, and intermediate values correspond either to partial degrees of concentration or to full attention to only some parts of a task. Accordingly, the formula can be updated as:
$$W_{acc} = DF \times (W_{eff} + W_{ineff})$$
The implication of this model is that mental workload could be reduced and, as a consequence, the efficiency of an operator (EFC) improved, by controlling the factors responsible for the formation of ineffective workload. Efficiency is defined as:
$$EFC = \frac{W_{eff}}{W_{eff} + W_{ineff}}$$
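The indexes of the Xie and Salvendy framework that have explicit formulas can be computed directly from a sampled workload trace. The sketch below is our illustration under stated assumptions: the instantaneous workload is available as samples in arbitrary units, the integral for the accumulated workload is approximated with the trapezoidal rule, and the trace and the effective/ineffective split are hypothetical. The mapping functions f1() and f2() are left out because the framework leaves their form unspecified.

```python
def workload_indices(times_s, w_inst):
    """Peak, accumulated, and average workload from a sampled W_inst(t) trace."""
    if len(times_s) != len(w_inst) or len(times_s) < 2:
        raise ValueError("need at least two paired (time, workload) samples")
    w_peak = max(w_inst)
    # Trapezoidal approximation of the integral of W_inst from the first to last sample.
    w_acc = sum((t1 - t0) * (w0 + w1) / 2.0
                for t0, t1, w0, w1 in zip(times_s, times_s[1:], w_inst, w_inst[1:]))
    w_avg = w_acc / (times_s[-1] - times_s[0])  # accumulated workload per unit time
    return w_peak, w_acc, w_avg


def degraded_accumulated(w_eff, w_ineff, df):
    """W_acc = DF * (W_eff + W_ineff), with the degrading factor DF in [0, 1]."""
    if not 0.0 <= df <= 1.0:
        raise ValueError("DF must lie in [0, 1]")
    return df * (w_eff + w_ineff)


def efficiency(w_eff, w_ineff):
    """EFC = W_eff / (W_eff + W_ineff)."""
    return w_eff / (w_eff + w_ineff)


# Hypothetical trace sampled once per second.
times = [0, 1, 2, 3, 4]
trace = [2, 4, 6, 4, 2]
peak, accumulated, average = workload_indices(times, trace)
print(peak, accumulated, average)  # 6 16.0 4.0

# Hypothetical split of the accumulated workload into effective and ineffective parts.
print(efficiency(12.0, 4.0))  # 0.75
print(degraded_accumulated(12.0, 4.0, df=0.5))  # 8.0
```

With a fixed task duration, the accumulated and average workload move together, which is why the framework treats them as a pair; the peak is reported separately because a brief overload can cross the red-line threshold without changing the average much.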
Effective workload, ineffective workload, and the degrading factor can be affected by other attributes such as stress, knowledge, motivation, attitude, task complexity, uncertainty, and task duration. These factors are domain-specific and user-specific, and they are important for defining a robust model of mental workload. Similarly, to define a task precisely, the environment in which it is executed should be accounted for. So far, the model above suits single-task environments. However, it can easily be expanded to multi-task environments, where each task might influence the others and an operator must exert further mental effort to manage them. This effort is referred to as management load (ML), and it is needed to control the concurrent tasks, their scheduling, and switching. Writing $W_{eff,i}$ and $W_{ineff,i}$ for the effective and ineffective workload of task i:
$$W_{eff} = \sum_{i=1}^{n} W_{eff,i}, \qquad W_{ineff} = \sum_{i=1}^{n} W_{ineff,i} + ML$$
with n the number of tasks performed simultaneously and ML the management load. Finally, the overall multi-task mental workload score can be computed as:
$$MWL_{tot-multi-t} = \sum_{i=1}^{n} MWL_{tot,i} + EFC + ML$$
where $MWL_{tot-multi-t}$ is the overall multi-task workload, $MWL_{tot,i}$ the overall workload of task i, EFC the efficiency, and ML the management load. The implication of this extended model is that an operator performing parallel tasks always experiences higher workload than the simple sum of the overall mental workloads of the same tasks performed individually. A practical application of this model can be found in Xie and Salvendy (2000a). Although this framework is an important contribution and a step towards a better definition of mental workload, it remains a theoretical proposal that needs to be validated empirically. In addition, although it appears to be a complete framework for comprehensive modeling of mental workload, it does not specify how a designer can embed user-specific, task-specific, and context-specific attributes of workload in it, nor does it consider how to model each of them formally. This lack of formalization at the attribute level is accompanied by an absence of consideration of the attributes’ relationships and potential interactions. Finally, as the authors themselves acknowledge, the two mapping functions are unknown in the literature and represent only a theoretical proposal (Xie & Salvendy, 2000a). The specific issues mentioned above (lack of formalization at the attribute level and absence of consideration of attribute interactions) have been tackled in recent works by Longo (2014, 2015). These works assume that (1) mental workload is a complex construct built upon a network of pieces of knowledge with different strengths, and (2) accounting for the relationships among these pieces of knowledge, and resolving the potential inconsistencies arising from their interaction, are essential in modeling MWL.
In formal logic, these assumptions are the key components of a defeasible concept: a concept built upon a set of interacting pieces of knowledge (reasons) that can be defeated by additional ones. The term “defeasible” is borrowed from the multidisciplinary field of defeasible reasoning, which studies the way humans reason under uncertainty and with contradictory and incomplete knowledge. In other words, it is a form of reasoning built upon reasons that are not infallible: a conclusion or claim derived from previous knowledge can be retracted in the light of new knowledge. Argumentation theory, an important multidisciplinary topic in artificial intelligence that incorporates elements of philosophy, psychology, and sociology and that studies how people reason and express their arguments, provides a computational implementation of defeasible reasoning. This implementation has proved useful for implementing reasoning activities and modeling complex constructs such as mental workload (Rizzo & Longo, 2018). An argumentation framework (Figure 9) can be seen as a graph in which each node represents an argument and arrows represent contradictions among arguments. An argument is a particular formalism built upon premises that support a claim. In the context of mental workload modeling, for example, premises can be one or more mental workload attributes that support the inference of a certain claim, for instance a certain level of overall mental workload (low, optimal, or high). Arrows can be seen as attack relations and can model exceptions among arguments: particular cases in which an assessment of mental workload from an argument cannot be brought forward. An example of an argumentation framework for mental workload modeling is shown in Figure 9. Here, arguments A and B contradict each other, as they support opposite mental workload levels. Argument C, unlike all the others, is built upon two premises. Argument F attacks argument B by highlighting a contradiction: logically, with high engagement, an operator cannot experience low mental demand. Similarly, argument E rules out the low effort on which argument D relies, and argument G reflects that, if high performance has been achieved, it is unlikely that an operator is highly frustrated. Arguments can thus interact with each other, forming inconsistencies and sometimes cycles, so a strategy for resolving these inconsistencies is needed. Readers are referred to Longo and Dondio (2014) for further formalisms and theoretical details on defeasible reasoning, argumentation theory, argument formation, and the resolution of inconsistencies, and to Longo (2014, 2015) for practical examples of how this framework can be employed for empirical mental workload modeling.
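As one concrete resolution strategy, the Figure 9 example can be evaluated under grounded semantics, a standard semantics in abstract argumentation; this is our choice for illustration, and other strategies exist. The attack relation below is an assumption read off the argument definitions listed with Figure 9: A and B attack each other, F attacks B, E attacks D (D relies on low effort), and G attacks C (C relies on high frustration). Arguments E, F, and G carry no workload claim of their own; they only encode exceptions. The function names are hypothetical.

```python
# Claims of the Figure 9 arguments; None marks pure exceptions (attack-only arguments).
CLAIMS = {
    "A": "high",  # high temporal demand -> high workload
    "B": "low",   # low mental demand -> low workload
    "C": "high",  # high frustration AND high engagement -> high workload
    "D": "low",   # low effort -> low workload
    "E": None,    # high engagement -> NOT low effort
    "F": None,    # high engagement -> NOT low mental demand
    "G": None,    # high performance -> NOT high frustration
}

# Assumed attack relation: (attacker, attacked).
ATTACKS = {("A", "B"), ("B", "A"), ("F", "B"), ("E", "D"), ("G", "C")}


def grounded_extension(arguments, attacks):
    """Iterate to a fixed point: accept an argument once all its attackers are
    defeated, and defeat every argument attacked by an accepted one."""
    attackers = {arg: {a for (a, b) in attacks if b == arg} for arg in arguments}
    accepted, defeated = set(), set()
    changed = True
    while changed:
        changed = False
        for arg in arguments:
            if arg not in accepted and arg not in defeated and attackers[arg] <= defeated:
                accepted.add(arg)
                changed = True
        for attacker, target in attacks:
            if attacker in accepted and target not in defeated:
                defeated.add(target)
                changed = True
    return accepted


def workload_levels(accepted, claims):
    """Workload levels supported by the accepted arguments."""
    return {claims[arg] for arg in accepted if claims[arg] is not None}


accepted = grounded_extension(CLAIMS, ATTACKS)
print(sorted(accepted))  # ['A', 'E', 'F', 'G']
print(workload_levels(accepted, CLAIMS))  # {'high'}
```

Under these assumptions, A is the only workload-bearing argument that survives, so the framework concludes high workload. A different attack relation or a different semantics could lead to a different conclusion, which is exactly the modeling freedom the framework leaves to the designer.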
One advantage of this framework is that it allows mental workload designers with different backgrounds, research experience, and knowledge to reason over mental workload and formally build their own models, which can eventually be exploited and empirically tested with real-world data. Additionally, this knowledge-based framework makes no assumptions about how mental workload should be shaped, which attributes should be included, which scales should be used to quantify them, or whether any form of preferentiality among them should be considered. It does, however, provide human designers with a practical tool for modeling individual attributes and inferring a mental workload level from them, through the precise notion of an argument as a rule linking premises to a claim. It also provides a concrete formalism for accounting for inconsistencies or exceptions, through the precise notion of attack, together with strategies for resolving the potential inconsistencies arising from argument interaction, towards a meaningful, rational, and justifiable index of overall mental workload. The framework leaves modelers free to choose which attributes to include in an argument, according to their own background, discipline, experience, intuitions, purposes, and theoretical assumptions, and how to make counterarguments explicit in the form of attack relations. The limitation of this framework, however, is the initial effort required to translate a human designer’s knowledge base into a set of interacting arguments
A: high temporal demand –> high workload B: low mental demand –> low workload C: high frustration AND high engagement –> high mental workload D: low effort –> low mental workload E: high engagement –> NOT low effort F: high engagement –> NOT low mental demand G: high performance –> NOT high frustration
Example of an argumentation framework for mental workload modeling. Source: Longo (2014n 2015), Rizzo et al. (2017).
MENTAL WORKLOAD
219
(for example in Figure 9), similarly to all the knowledge-based approaches to inference under uncertainty within the discipline of artificial intelligence. Another recent approach for modeling mental workload without the assumptions of how relevant attributes should be modeled and how their interaction should be formalized was presented in Moustafa et al. (2017; Moustafa, 2018). Here, machine learning was used as the method for learning from data. Machine learning is an application of artificial intelligence that automatically creates models from data without the need to explicitly and formally represent them. In other words, a mental workload designer can select a set of attributes, believed to influence mental workload, and let a machine learning technique learn their importance and how, and to what extent they interact with each other. This approach can be used with subjective measures of workload, for example to fit an objective task performance measure, as in Longo (2018). It can also be used with physiological measures, as in the case of electroencephalography, without the need to define the list of mental workload attributes, and let a deep learning technique to extract the most salient ones (Hefron et al. 2018; Yin & Zhang, 2018). In summary, modeling mental workload as a computational concept is not a trivial task. Table 3 highlights the various advantages and disadvantages of different mental workload models and frameworks. Here, extensibility refers to the effort required to extend a given model by incorporating additional attributes for mental workload modeling. Preferentiality is the characteristic of a model to account for differences in attribute influence on overall mental workload. Sparsity refers to the expected variation in mental workload scores while explainability is the characteristic of a model to be interpretable by a researcher or designer. Dynamism is the characteristic of a model or framework to account for temporal variation of
workload and attribute interaction refers to the capability of a model to account for interaction of attributes. Eventually, knowledge-based, and data-driven models are respectively those that are built upon the declarative knowledge of a designer and those who are formed by an automatic learning procedure from data.
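Preferentiality, as defined above, is easiest to see by contrasting an additive index with a weighted one. The sketch below assumes the commonly described scoring of two instruments in Table 3: the Workload Profile, in which ratings from 0 to 1 on eight multiple-resource dimensions are summed with equal weight, and the weighted NASA-TLX, in which six subscale ratings from 0 to 100 are weighted by tallies from 15 pairwise comparisons. The function names, ratings, and weights are hypothetical and chosen only for illustration.

```python
# Eight Workload Profile dimensions (stages, codes, and modalities of
# multiple resource theory), each rated 0-1 by the operator.
WP_DIMENSIONS = (
    "perceptual_central", "response", "spatial", "verbal",
    "visual", "auditory", "manual", "speech",
)

# Six NASA-TLX subscales, each rated 0-100.
TLX_SCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def workload_profile(ratings):
    """Additive index: every dimension counts equally (no preferentiality)."""
    return sum(ratings[d] for d in WP_DIMENSIONS)


def raw_tlx(ratings):
    """Unweighted mean of the six subscale ratings."""
    return sum(ratings[s] for s in TLX_SCALES) / len(TLX_SCALES)


def weighted_tlx(ratings, weights):
    """Weighted index: each weight counts how often (0-5) a subscale was chosen
    in the 15 pairwise comparisons, so the weights sum to 15."""
    if sum(weights[s] for s in TLX_SCALES) != 15:
        raise ValueError("weights must come from 15 pairwise comparisons")
    return sum(ratings[s] * weights[s] for s in TLX_SCALES) / 15


if __name__ == "__main__":
    wp = dict(zip(WP_DIMENSIONS, (0.6, 0.3, 0.5, 0.2, 0.7, 0.1, 0.4, 0.0)))
    tlx = {"mental": 70, "physical": 10, "temporal": 80,
           "performance": 40, "effort": 60, "frustration": 30}
    w = {"mental": 5, "physical": 0, "temporal": 4,
         "performance": 2, "effort": 3, "frustration": 1}
    print(round(workload_profile(wp), 2), round(raw_tlx(tlx), 1), weighted_tlx(tlx, w))
    # 2.8 48.3 64.0
```

Because this hypothetical operator weighted mental and temporal demand most heavily, the weighted score (64.0) sits well above the unweighted mean (about 48.3); an additive index, by construction, cannot express such differences in attribute influence.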
5 SOME CURRENT CHALLENGES TO MENTAL WORKLOAD AND ITS ASSESSMENT
Even within the relatively large scope of a handbook chapter, it is not possible to consider all of the current challenges associated with the concept of mental workload. Thus, we have been constrained here to focus only upon some of the more pressing issues. We see no more pressing issue than that encapsulated by the non-converging pattern of evidence derived from multiple measures of workload. It is one that has been referred to as the AIDs (associations, insensitivities, and dissociations) of workload (see Hancock & Matthews, 2019). As we have explained above, there are three major methods of assessing mental workload. They are derived from primary performance measures of the task at hand, from the performer’s subjective assessment of their own affective state, and from the suite of physiological measures that can be derived from a variety of associated assessment procedures. Optimally, we would like to see these measures tell a coherent story: as performance improves, the person’s subjective appreciation reflects this, as do their associated physiological patterns, derived from multiple sources. This converging and reinforcing pattern has been referred to as “association” (see Hancock & Matthews, 2019). In practical circumstances where we get such associations, we are confident that our various reflections of mental workload are telling us a consistent story, and so we can use them to help predict future states, design system changes, and develop work strategies with a fair degree of confidence in success. However, associations are not always the patterns we see. Occasionally, performance will show the individual improving, but subjectively they feel no difference. In such cases the subjective dimension is “insensitive” with respect to performance change. Such associations, insensitivities, and dissociations can be expressed in terms of each of the three major methods of assessment, as shown in Figure 10. This illustration shows the 3³ (27) combinations that can accrue. Of course, as we have seen, there are many individual measures of primary task performance, subjective response, and physiological response. Thus, the expression of these varying AIDs patterns can be much more complicated than the matrix in Figure 10 illustrates. It is important to note that “dissociations”, especially between primary performance and subjective response, are not a new concern (see Yeh & Wickens, 1988). More than two decades ago, Hancock (1996) pointed to the various patterns that could result when taking multiple measures of mental workload.

Table 3 An Account of the Advantages and Disadvantages of Computational Models and Frameworks for Mental Workload Modeling
Models/frameworks | Reference | Advantage | Disadvantage
Workload Profile | Tsang & Velazquez (1996); Wickens & Hollands (1999) | Simplicity due to additivity; some extensibility; moderate explainability | No preferentiality; no attribute interaction
NASA-TLX | Hart (2006) | Preferentiality; moderate explainability | Low extensibility; no attribute interaction
SWAT | Vidulich & Tsang (1986) | Preferentiality | No extensibility; no attribute interaction
Power function | Hancock et al. (1988) | Expected sparsity | No extensibility
Dynamic MWL | Xie & Salvendy (2000a) | Some extensibility; dynamism | No attribute formalism
Argument-based | Longo (2014, 2015) | Knowledge-based; high extensibility; attribute interaction; high explainability | High effort for knowledge-based formation
Machine learning | Yin & Zhang (2018); Moustafa et al. (2017); Moustafa & Longo (2018); Hefron et al. (2018) | High extensibility; no workload assumption; attribute interaction; data-driven | Knowledge-unaware; low/medium explainability

5.1 The Practical Importance of Mental Workload
5.1.1 Workload Assessment and Its Role in Evolving Work
One of the most common methods of managing the stress associated with high mental workload has always been to provide the operator with computerized assistance, handing over task elements either in part (adaptive aiding) or in full (adaptive task allocation; Parasuraman, Mouloua, & Hilburn, 1999). Thanks to advances in computer science, materials science, and human factors/ergonomics, operational domains are rapidly and ubiquitously implementing automation in the workplace. With growing technological sophistication, automated systems are capable of performing more complex cognitive operations than ever before. As a result, the nature of work for the operator in the human–machine dyad is fundamentally changing. Our theoretical examinations of the nature of workload, as well as our techniques for accurately assessing it, must change right along with it. Where once the human was an active operator, now, with semi- and fully autonomous systems deployed in the work arena, the human’s role has changed to that of a teammate, a supervisor, or a monitor. This fundamental role shift does not equate to an automatic reduction in mental workload, but rather to a shift in its nature. Whereas automation was once a tool to aid in the recovery from, or prevention of, overload in the operator, the widespread use of highly autonomous systems now presents the dangers of underload (boredom, inattentiveness, fatigue, etc.; Parasuraman, Cosenzo, & De Visser, 2009; Raufi, 2019; Young & Stanton, 2002). A greater understanding and empirical examination of the task demands, operator characteristics, and environmental conditions that influence underload are consequently warranted, and a renewed focus on its valid and reliable assessment is critical for designing future automated systems that require monitoring. Our understanding and assessment of overload must also be revisited for the role of a supervisor overseeing the functioning of multiple semi- and fully autonomous systems (e.g., urban air mobility control, unmanned aerial vehicle operations; see also Mouloua, Gilson, Kring, & Hancock, 2001). Research therefore remains necessary across the spectrum of mental workload, as automation is altering both the nature of mental workload and the thresholds at which humans are able to adapt effectively to it. Moreover, there remains complex cognitive work which, even with the most sophisticated automated assistance, can only be completed by humans (Parasuraman & Wickens, 2008).

[Figure 10 graphic: a three-dimensional matrix whose axes are Primary Task (Measures), Subjective (Measures), and Physiological (Measures), each taking the values +, o, and −; the cells A+ and A− are marked.]
Figure 10 An overarching workload matrix illustrating three disparate measures in which the respective methods can reflect either increasing, decreasing, or stable workload responses. Clearly, the opportunity exists for double associations (in which all methods agree, labeled A+ and A−). There can also be single associations (two methods agree, but one is either insensitive or dissociates from the others). Equally, there may be double dissociations (in which all methods disagree), or single dissociations (such that only two methods disagree; and see single associations). Insensitivities (methods show no change in relation to varying external task demand) can be plotted within each of the remaining spaces after the association/dissociation relationships have been established. Source: after Hancock (2017).
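The 27 cells of the matrix in Figure 10 can be enumerated mechanically. The sketch below generates every combination of increasing (+), stable (o), and decreasing (-) responses for the three assessment methods and tags each cell with a deliberately simple rule based on how many methods agree. The labels are illustrative assumptions; they do not reproduce the finer distinctions drawn in the Figure 10 caption or by Hancock and Matthews (2019).

```python
from collections import Counter
from itertools import product

METHODS = ("primary_task", "subjective", "physiological")
# Response of each method to a change in task demand:
# "+" increasing, "o" stable (insensitive), "-" decreasing workload.
RESPONSES = ("+", "o", "-")


def classify(cell):
    """Illustrative tag for one (primary, subjective, physiological) cell."""
    tally = Counter(cell)
    if len(tally) == 1:  # all three methods respond the same way
        return "all insensitive" if cell[0] == "o" else "double association"
    if len(tally) == 3:  # one +, one o, one -
        return "all disagree"
    majority = tally.most_common(1)[0][0]  # the value shared by two methods
    odd = next(v for v in tally if v != majority)
    if majority == "o":
        return "two insensitive"
    if odd == "o":
        return "single association, one insensitive"
    return "single association, one dissociates"


cells = list(product(RESPONSES, repeat=len(METHODS)))
summary = Counter(classify(c) for c in cells)

if __name__ == "__main__":
    print(len(cells))  # 27, i.e., 3 ** 3
    for label, n in sorted(summary.items()):
        print(f"{label}: {n}")
```

Only 2 of the 27 cells are double associations, which is one way of seeing why fully converging evidence across all three methods is the exception rather than the rule; with many individual measures per method, the space of patterns grows much faster.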
5.1.2 Application Realms of Workload Evaluation
As automated systems, with their increasing capacities, have spread into a host of operational domains, they have also broadened the horizons for the application of mental workload and its evaluation. This applies not only to the aforementioned new circumstances of underload and overload, but also to the rapid transitions between the two states. Perhaps the most prolific and problematic area now experiencing these issues is that of self-driving vehicles. Manufacturers are marketing these vehicles with underload as a selling point, highlighting the spare capacity that operators can use for other activities (e.g., reading, sleeping, watching television). However, at the speeds these vehicles are capable of traveling, take-over procedures (wherein the human must recover control of the vehicle because the automation has encountered conditions under which it cannot operate effectively) may sometimes be required within seconds or less (Eriksson & Stanton, 2017). Such a rapid transition from low to high workload undoubtedly places great demand on the operator, and presents researchers designing effective and dynamic assessment techniques with a great challenge: how to evaluate and model such swift fluctuations in state. As with other disciplines interested in distributed cognition (Hutchins, 2001), that is, in modeling cognitive structures, sub-systems, and the relationships among them that produce cognition, research on any human cognitive process (especially mental workload) is of interest to the field of artificial intelligence (AI). AI researchers have made great strides in weak artificial intelligence, which models a narrow, specialized set of human cognitive abilities (such as information processing and mental workload; Wang & Siau, 2019).
Advances in weak AI are furthermore necessary to achieve strong AI, or genuine human-like intelligence in a non-human agent (Wang & Siau, 2019).
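The swift low-to-high fluctuations described in Section 5.1.2 for take-over procedures suggest summarizing a workload trace with more than one number. The sketch below computes peak, accumulated (area under the trace), and time-averaged workload, quantities of the kind distinguished in dynamic workload frameworks such as Xie and Salvendy (2000a), plus a hypothetical rate-of-change flag for abrupt transitions. It is purely illustrative: the function name, the 0 to 100 rating scale, the trace, and the threshold of 30 points per second are all assumptions, not definitions from any published model.

```python
def summarize_workload(times, workload, rate_threshold):
    """Summarize a sampled workload trace (times in seconds, ratings 0-100).

    Returns the peak, the accumulated workload (trapezoidal area under the
    trace), the time-averaged workload, and the intervals whose rate of
    change exceeds rate_threshold in magnitude (a crude transition flag).
    """
    if len(times) != len(workload) or len(times) < 2:
        raise ValueError("need at least two paired samples")
    accumulated = 0.0
    rapid = []
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        accumulated += (workload[i - 1] + workload[i]) / 2 * dt
        if abs(workload[i] - workload[i - 1]) / dt > rate_threshold:
            rapid.append((times[i - 1], times[i]))
    return {
        "peak": max(workload),
        "accumulated": accumulated,
        "average": accumulated / (times[-1] - times[0]),
        "rapid_transitions": rapid,
    }


if __name__ == "__main__":
    # Hypothetical take-over: low workload while automated, then a jump at t = 3 s.
    trace = summarize_workload([0, 1, 2, 3, 4, 5], [10, 10, 12, 80, 85, 60], rate_threshold=30)
    print(trace)
```

On this invented trace the average (about 44) hides a jump of 68 points within one second, which is exactly the kind of transition a single overall score would miss.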
5.2 Future Issues in Mental Workload
Although our present chapter is focused upon the current state of understanding, predicated upon the collective research of the past, we cannot neglect the opportunity to offer some observations about where workload assessment will be going in the near future. These trends are of interest both to the theoretician, who must recognize the direction in which the cutting edge of research is proceeding, and to the practitioner, who must enact new methods and insights in real-world realms. We begin this brief assessment with the increasing capacities of measurement methodologies. It is almost incontrovertible that we will see ever more sophisticated measurement techniques, most especially in physiological and neurophysiological assessment. In addition to new and innovative measures, the revolution in micro-electronics and wireless capabilities now means that these signals can be elicited and broadcast with ever greater facility. Hand-held devices, and soon various forms of in-dwelling electronic sensors, will mean that vast amounts of data will be generated, even by individuals involved in simple, everyday activities. For those involved in the control of safety-critical systems, such assessment capacities are liable to be even greater. The challenge for the researcher will be to understand how these foundational physiological signals map specifically onto context-relevant workload assessment. The challenge for the practitioner will be to refine and incorporate these signals for real-time, dynamic usage. The task here, as elsewhere in the world of advanced technology, will be to distill the information relevant to critical insight from the avalanche of ever-increasing data; in essence, a signal-to-noise challenge. Nor will the proliferation of available data be confined to physiological signals. Rather, the metrics that calibrate primary task performance will themselves become more plentiful and more prevalent.
The issue with primary task performance has always been its retrospective nature. While these measures serve well to tell us what has happened, they are rather restricted in telling us what will happen. Advanced data collection, modeling, and “quickened” signal processing will look to address this shortfall. Such efforts will also benefit greatly from the advancing diagnosticity of the aforementioned suite of physiological reflections of cognitive demand. In consequence of these developments, we anticipate significant advances in workload modeling and prediction that will provide at least a first-pass, reasonable prospective assessment of what to expect from exposed human operators. Of course, to accomplish this, we have to resolve the presently recognized issue of associations, insensitivities, and dissociations that we raised in more detail previously (see Hancock & Matthews, 2019). However, these technological and modeling developments are not the only ones that we foresee. Other trends will, we believe, be very much contingent upon the evolving nature of work itself. We elaborate on this aspect of the future by first looking at an optimistic prognostication of coming work. In the above observations on technological developments, we did not expound upon the utility of advancing subjective assessment measures. Although these will also surely see some advance, our concern here is with the nature and quality of work that will be required of the human partner in human–machine teams. In the sciences concerned with the design of work, such as human factors and ergonomics, the principle advocated most strongly is one that features and supports a human-as-operator-centered theme. From this viewpoint, how the individual worker feels, what stresses they experience, and what levels of fatigue they encounter are all central to the consideration of design and operations.
These are laudable efforts, worthy of general approbation and collective social support (Hancock, 2009). However, it is fairly clear that a financial system riven with the mantra of “profit at all cost” is not itself centrally concerned with, or even marginally motivated by, the quality of such human experiences. Rather the converse. Although considerable lip service is paid to worker safety, comfort, and associated wellness (and it should be acknowledged that for many organizations these concerns may be genuine enough), the behemoth of the “bottom line” always seems to dominate actuarial calculus, and so the inherent hypocrisy of these assertions always hovers over the exposed workforce. What we do need to recognize is that experienced workload is critically conditioned upon affect and appraisal. We therefore observe that there is no necessary reason why human occupations cannot be made truly hedonic in nature, simply by the redesign of the work process (see Hancock, Pepe, & Murphy, 2005). Optimistically, advanced workload assessment capacities can embrace and emphasize this notion of “hedonomics” to ensure that all human workers are presented with interesting, challenging, and rewarding work tasks. However, we are not sanguine about such an eventuality, and our concerns are founded in the tidal wave of automation and incipient autonomy, a topic discussed in Section 5.1 that we briefly revisit here to conclude our glance at the future of mental workload and its assessment. There is little dispute about the growth of automation in technology and, further, few would disagree that we are also seeing a strong move toward greater autonomy in such systems. The question that challenges us presently concerns the human role with respect to these advancing technologies. Some envisage these systems as forms of companion or teammate, with a strong emphasis on collaborative control. Such a line of development would still leave much room for human cognitive workload assessment, at least within the next two or three decades.
Others have begun to catalogue how advances in embodied automation, such as robots, may well serve to replace current human contributions totally (Hancock, 2017). Indeed, one report lists the degree of such redundancy and the expected cessation of human participation, couched in terms of assessed probabilities over time (Frey & Osborne, 2013). Here, human cognitive workload assessment might be largely obviated by developments while, interestingly, the assessment of computer “cognitive” load may actually burgeon in importance. The latter field, however, is much more likely to be the province of computer scientists than of behavioral researchers. In actuality, as with all forms of technological advance, we are most likely to see a general palimpsest of developments in which highly independent automation will dominate in some domains, while in other realms humans will remain central to successful activity (see Parasuraman & Wickens, 2008). Elsewhere (Hancock, Nourbakhsh, & Stewart, 2019), it has been argued that the principal public “battlefield” of this evolutionary conflict will be the area of advanced and automated transportation. More especially, the individual, personal vehicle will be the location in which most individuals encounter these pressing contentions. To what degree people will be willing to hand over control is a most pressing current issue. It is more than likely that perceived workload, and the effort associated with vehicle control, will be a factor that influences consumer choice and so the nature of the technology that is proposed and developed. This is also a generational issue. While individuals brought up with driver-controlled vehicles may be reluctant to hand over control to automation, younger generations, for whom personal vehicle ownership might even become rather anachronistic, may well prefer to spend their time and effort on tasks other than steering and braking.
Whether they will be equally ready to hand over other aspects of their professional pursuits to autonomous systems very much remains to be seen. When automation is supportive, it can help the individual achieve their goals. When autonomy is replacive, it can reduce the active individual to penury and despair. Workload assessment has its role in this flux of advances, especially in conceiving and designing companion, collaborative systems. Yet when it becomes an instrument of oppression, we in HFE need to be vocal advocates of its human-featured priority. Whether we live and prosper in the groves of academe or toil and labor in the “dark, satanic mills” of practical workplaces, our allegiance to a human-centered priority has to remain steadfast for, ultimately, good ergonomics and good economics do not necessarily covary.
6 CONCLUSION
The assessment of cognitive or mental workload has a relatively long history, compared, say, with more recent HFE concepts such as situation awareness, operational transparency, or even trust. Like many issues in HFE, workload assessment arose from a combination of very practical, real-world necessities allied to important contemporary scientific advances, mostly in the basic foundational theories of experimental psychology, and especially attention. For constituencies such as NASA and the U.S. Air Force, it became essential to provide some form of quantitative, as well as qualitative, assessment of exactly how hard their pilots and other personnel were working, especially as they approached the edge of what was cognitively tolerable. The same strictures affected many other constituencies, such as surgeons in medical facilities, nuclear power plant operators, firefighters, emergency responders of all sorts, and even financial managers. The inability to cope with the cognitive demands of such work, i.e., going over the workload “red line”, could, and still does, lead to performance failure and, worse, even disaster and fatality. Little wonder such professionals turned to both engineering and psychological models and theories to provide answers to what are often referred to as “wicked” problems. Psychology, in particular, reached into its tool bag to provide both quantifiable subjective assessments and physiological indicators, while engineering reached back to its roots in time-and-motion techniques to provide its own particular insight. There is no “right” or “wrong” answer among these techniques; they simply prove more or less efficacious under differing circumstances, and we have sought here to provide some guidance as to when each is of most use.
However, the growth of micro-electronics and remote processing capabilities means that the modern HFE professional does not have to choose among these techniques but now has the luxury of accessing and using all of them, and potentially others, in their explorations. The challenge now comes in the form of interpreting these combined data.
7 ACKNOWLEDGMENTS
We would very much like to thank Alejandro Arca for all his assistance with the preparation of this chapter.
REFERENCES
Allen, M. T., & Crowell, M. D. (1989). Patterns of autonomic response during laboratory stressors. Psychophysiology, 26(5), 603–614.
Baber, C. (1991). Speech technology in control room systems: A human factors perspective. Chichester: Ellis Horwood.
Beilock, S. L., Wierenga, S. A., & Carr, T. H. (2002). Expertise, attention, and memory in sensorimotor skill execution: Impact of novel task constraints on dual task performance and episodic memory. Quarterly Journal of Experimental Psychology, 55A(4), 1211–1240.
Brookhuis, K. A., & De Waard, D. (2000). Assessment of drivers’ workload: Performance, subjective and physiological indices. In P. A. Hancock & P. A. Desmond (Eds.), Stress, workload and fatigue (pp. 321–333). Mahwah, NJ: Lawrence Erlbaum.
Brookhuis, K. A., & de Waard, D. (2010). Monitoring drivers’ mental workload in driving simulator using physiological measures. Accident Analysis and Prevention, 42, 898–903.
Brouwer, A. M., Hogervorst, M. A., Van Erp, J. B., Heffelaar, T., Zimmerman, P. H., & Oostenveld, R. (2012). Estimating workload using EEG spectral power and ERPs in the n-back task. Journal of Neural Engineering, 9(4), 045008.
Brouwer, W. H., Waterink, W., van Wolffelaar, P. C., & Rothengatter, T. (1991). Divided attention in experienced young and older drivers: Lane tracking and visual analysis in a dynamic driving simulator. Human Factors, 33(5), 573–582.
Brown, I. D. (1978). Dual task methods of assessing work-load. Ergonomics, 21, 221–224.
Brown, J. W. H., Revell, K. M., & Stanton, N. A. (2020). Quantifying mental workload in performance driving: The motor racing load index (MRLIN). In R. Charles & D. Golightly (Eds.), Contemporary ergonomics and human factors 2020 (pp. 109–112). Chartered Institute of Ergonomics and Human Factors.
Cain, B. (2007). A review of the mental workload literature (Tech Report #RTO-TR-HFM-121-Part-II). Toronto, Canada: North Atlantic Treaty Organisation/Defence Research and Development Toronto.
Carlson, N. R. (2013). Physiology of behavior (11th ed.). Boston, MA: Pearson.
Charles, R. L., & Nixon, J. (2019). Measuring mental workload using physiological measures: A systematic review. Applied Ergonomics, 74, 221–232.
Chi, C.-F., Cheng, C.-C., Shih, Y.-C., Sun, I.-S., & Chang, T.-C. (2019). Learning rate and subjective mental workload in five truck driving tasks. Ergonomics, 62(3), 391–405.
Cnossen, F., Meijman, T., & Rothengatter, T. (2004). Adaptive strategy changes as a function of task demands: A study of car drivers. Ergonomics, 47(2), 218–236.
Cooper, G. E., & Harper, R. P. (1969). The use of pilot ratings in the evaluation of aircraft handling qualities. Advisory Group for Aerospace Research and Development (AGARD), Report 567, NATO. London: Technical Editing and Reproduction Ltd.
Cummings, M. L., Myers, K., & Scott, S. D. (2006). Modified Cooper Harper evaluation tool for unmanned vehicle displays. In Proceedings of UVS Canada: Conference on Unmanned Vehicle Systems Canada.
Desmond, P. A., & Hoyes, T. W. (1996). Workload variation, intrinsic risk and utility in a simulated air traffic control task: Evidence for compensatory effects. Safety Science, 22(1–3), 87–101.
Desmond, P. A., Hancock, P. A., & Monette, J. L. (1998). Fatigue and automation-induced impairments in simulated driving performance. Transportation Research Record, 1628, 8–14.
Eggemeier, F. T. (1988). Properties of workload assessment techniques. In P. A. Hancock & N. Meshkati (Eds.), Human mental workload (pp. 41–62). Amsterdam: Elsevier.
Eggemeier, F. T., & Wilson, G. F. (1991). Performance-based and subjective assessment of workload in multi-task environments. In D. L. Damos (Ed.), Multiple-task performance (pp. 217–278). London: Taylor & Francis.
Eggemeier, F. T., Wilson, G. F., Kramer, A. F., & Damos, D. L. (1991). General considerations concerning workload assessment in multi-task environments. In D. L. Damos (Ed.), Multiple task performance (pp. 207–216). London: Taylor & Francis.
Ephrath, A. R., & Young, L. R. (1981). Monitoring vs. man-in-the-loop detection of aircraft control failures. In J. Rasmussen & W. B. Rouse (Eds.), Human detection and diagnosis of system failures (pp. 143–154). New York: Plenum Press.
Eriksson, A., & Stanton, N. A. (2017). Takeover time in highly automated vehicles: Noncritical transitions to and from manual control. Human Factors, 59(4), 689–705.
Estes, S. (2015). The workload curve: Subjective mental workload. Human Factors, 57(7), 1174–1187.
Evans, D. C., & Fendley, M. (2017). A multi-measure approach for connecting cognitive workload and automation. International Journal of Human-Computer Studies, 97, 182–189.
Fairclough, S. H. (2009). Fundamentals of physiological computing. Interacting with Computers, 21(1–2), 133–145.
Fallahi, M., Motamedzade, M., Heidarimoghadam, R., Soltanian, A. R., & Miyake, S. (2016a). Assessment of operators’ mental workload using physiological and subjective measures in cement, city traffic and power plant control centers. Health Promotion Perspectives, 6(2), 96–103.
Fallahi, M., Motamedzade, M., Heidarimoghadam, R., Soltanian, A. R., & Miyake, S. (2016b). Effects of mental workload on physiological and subjective responses during traffic density monitoring: A field study. Applied Ergonomics, 52, 95–103.
Foy, H. J., & Chapman, P. (2018). Mental workload is reflected in driver behaviour, physiology, eye movements and prefrontal cortex activation. Applied Ergonomics, 73, 90–99.
Frey, C. B., & Osborne, M. A. (2013). The future of employment: How susceptible are jobs to computerization. Oxford: Oxford Martin School, University of Oxford.
Gaba, D. M., & Lee, T. (1990). Measuring the workload of the anesthesiologist. Anesthesia and Analgesia, 71, 354–361.
Gawron, V. J. (2008). Human performance, workload, and situational awareness measures handbook (2nd ed.). Boca Raton, FL: CRC Press.
Gawron, V. J., Schiflett, S. G., & Miller, J. C. (1989). Measures of in-flight workload. In R. S. Jensen (Ed.), Aviation psychology (pp. 240–287). Aldershot: Brookfield.
Ghanbary Sartang, A., Ashnagar, M., Habibi, E., & Sadegi, S. (2016). Evaluation of Rating Scale Mental Effort (RSME) effectiveness for mental workload assessment in nurses. Journal of Occupational Health and Epidemiology, 5(4), 211–217.
Gopher, D., & Braune, R. (1984). On the psychophysics of workload: Why bother with subjective ratings? Human Factors, 26, 519–532.
Gopher, D., & Donchin, E. (1986). Workload: An examination of the concept. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance, Vol. II, Cognitive processes and performance. New York: Wiley.
Gopher, D., & Kimchi, R. (1989). Engineering psychology. Annual Review of Psychology, 40, 431–455.
Hancock, P. A. (1989). The effect of performance failure and task demand on the perception of mental workload. Applied Ergonomics, 20(3), 197–205.
Hancock, P. A. (1996). Effect of control order, augmented feedback, input device and practice on tracking performance and perceived workload. Ergonomics, 39, 1146–1162.
Hancock, P. A. (2009). Mind, machine and morality: Toward a philosophy of human-technology symbiosis. Chichester: Wiley.
Hancock, P. A. (2017). Whither workload? Mapping a path for its future development. In L. Longo & M. C. Leva (Eds.), Human mental workload: Models and applications (pp. 3–17). Cham: Springer.
Hancock, P. A. (2020). The humanity of humanless systems. Ergonomics in Design, in press. https://doi.org/10.1177/1064804619880047
Hancock, P. A., & Caird, J. K. (1993). Experimental evaluation of a model of mental workload. Human Factors, 35(3), 413–429.
Hancock, P. A., & Chignell, M. H. (1987). Adaptive control in human-machine systems. In P. A. Hancock (Ed.), Human factors psychology (pp. 305–345). Amsterdam: North-Holland.
Hancock, P. A., & Chignell, M. H. (1988). Mental workload dynamics in adaptive interface design. IEEE Transactions on Systems, Man, and Cybernetics, 18(4), 647–658.
Hancock, P. A., & Desmond, P. A. (Eds.). (2001). Stress, workload, and fatigue. Mahwah, NJ: Lawrence Erlbaum.
Hancock, P. A., & Matthews, G. (2019). Workload and performance: Associations, insensitivities, and dissociations. Human Factors, 61(3), 374–392.
Hancock, P. A., & Meshkati, N. (Eds.). (1988). Human mental workload. Amsterdam: North-Holland.
Hancock, P. A., Nourbakhsh, I., & Stewart, J. (2019). On the future of transportation in an era of automated and autonomous vehicles. Proceedings of the National Academy of Sciences, 116(16), 7684–7691.
Hancock, P. A., Pepe, A. A., & Murphy, L. L. (2005). Hedonomics: The power of positive and pleasurable ergonomics. Ergonomics in Design, 13(1), 8–14.
Hancock, P. A., & Volante, W. (2020). Quantifying the qualities of language. PLoS ONE, 15(5), e0232198.
Hancock, P. A., Wulf, G., Thom, D., & Fassnacht, P. (1990). Driver workload during differing driving maneuvers. Accident Analysis and Prevention, 22(3), 281–290.
Harms, L. (1991). Variation in drivers’ cognitive load: Effects of driving through village areas and rural junctions. Ergonomics, 34(2), 151–160.
Hart, S. G. (2006). NASA-task load index (NASA-TLX); 20 years later. In Proceedings of the Human Factors and Ergonomics Society, 5, 211–217.
Hart, S. G., & Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In P. A. Hancock & N. Meshkati (Eds.), Human mental workload (pp. 139–183). Amsterdam: North-Holland.
Hefron, R., Borghetti, B., Schubert Kabban, C., Christensen, J., & Estepp, J. (2018). Cross-participant EEG-based assessment of cognitive workload using multi-path convolutional recurrent neural networks. Sensors, 18(5), 1339.
Hendy, K. C., Hamilton, K. M., & Landry, L. N. (1993). Measuring subjective workload: When is one scale better than many? Human Factors, 35, 579–601.
Herculano-Houzel, S. (2009). The human brain in numbers: A linearly scaled-up primate brain. Frontiers in Human Neuroscience, 3, 31.
Hilburn, B. (1997). Dynamic decision aiding: The impact of adaptive automation on mental workload. In D. Harris (Ed.), Engineering psychology and cognitive ergonomics. Vol. I, Transportation systems (pp. 193–200). Aldershot: Ashgate.
Hill, S. G., Iavecchia, H. P., Byers, J. C., Bittner, A. C., Zakland, A. L., & Christ, R. E. (1992). Comparison of four subjective workload rating scales. Human Factors, 34, 429–439.
Hockey, G. R. J. (1986). Changes in operator efficiency as a function of environmental stress, fatigue, and circadian rhythms. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance (pp. 44.1–44.49). New York: Wiley.
Hockey, G. R. J. (1997). Compensatory control in the regulation of human performance under stress and high workload: A cognitive-energetical framework. Biological Psychology, 45, 73–93.
Hockey, G. R. J., Briner, R. B., Tattersall, A. J., & Wiethoff, M. (1989). Assessing the impact of computer workload on operator stress: The role of system controllability. Ergonomics, 32(11), 1401–1418.
Hsu, B-W., Wang, M-J. J., & Chen, C-Y. (2015). Effective indices for monitoring mental workload while performing multiple tasks. Perceptual & Motor Skills: Learning & Memory, 121(1), 94–117.
Hughes, A. M., Hancock, G. M., Marlow, S. L., Stowers, K., & Salas, E. (2019). Cardiac measures of cognitive workload: A meta-analysis. Human Factors, 61(3), 393–414.
Hutchins, E. (2001). Distributed cognition. In N. J. Smelser & P. B. Baltes (Eds.), The international encyclopedia of the social and behavioral sciences (pp. 2068–2072). Oxford: Pergamon Press.
Johannsen, G. (1979). Workload and workload measurement. In Mental workload (pp. 3–11). Boston: Springer.
Johnson, A., & Widyanti, A. (2011). Cultural influences on the measurement of subjective mental workload. Ergonomics, 54(6), 509–518.
Kajiwara, S. (2014). Evaluation of driver’s mental workload by facial temperature and electrodermal activity under simulated driving conditions. International Journal of Automotive Technology, 15(1), 65–70.
Käthner, I., Wriessnegger, S. C., Müller-Putz, G. R., Kübler, A., & Halder, S. (2014). Effects of mental workload and fatigue on the P300, alpha and theta band power during operation of an ERP (P300) brain–computer interface. Biological Psychology, 102, 118–129.
Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice-Hall.
Kanitakis, J. (2002). Anatomy, histology and immunohistochemistry of normal human skin. European Journal of Dermatology, 12(4), 390–401.
Kantowitz, B. H. (2000). Attention and mental workload: Ergonomics for the new millennium. In Proceedings of the XIVth Triennial Congress of the International Ergonomics Association and the 44th Annual Meeting of the Human Factors and Ergonomics Society, San Diego, CA, July 29–August 4, 2000. Vol. 3: Complex Systems and Performance (pp. 456–459). Santa Monica, CA: HFES.
Keene, J. R., Clayton, R. B., Berke, C. K., Loof, T., & Bolls, P. D. (2017). On the use of beats-per-minute and interbeat interval in the analysis of cardiac responses to mediated messages. Communication Research Reports, 34(3), 265–274.
Kim, H-G., Cheon, E-J., Bai, D-S., Lee, Y. H., & Koo, B-H. (2018). Stress and heart rate variability: A meta-analysis and review of the literature. Psychiatry Investigation, 15(3), 235–245.
Kitamura, K., Murai, K., Furusho, M., Wang, Y. B., Wang, J., & Kunieda, Y. (2016).
Evaluation of mixed culture bridge teammates’ mental workload using heart rate variability: Simulator-based ship handling. IEEE International Conference on Systems, Man, and Cybernetics (pp. 875–879). New York: IEEE. Kolb, B., & Whishaw, I. Q. (2009). Fundamentals of human neuropsychology (6th ed.). New York: Worth Publishers. Lansdown, T. C. (2002). Individual differences during driver secondary task performance: verbal protocol and visual allocation findings. Accident Analysis and Prevention, 34, 655–662. Lansdown, T. C., Brook-Carter, N., & Kersloot, T. (2004). Distraction from multiple in-vehicle secondary tasks: Vehicle performance and mental workload implications. Ergonomics, 47 (1), 91–104. Lesch, M. F. & Hancock, P.A. (2004). Driving performance during concurrent cell-phone use: Are drivers aware of their performance decrements? Accident Analysis and Prevention, 36, 471–480. Lidderdale, I.G. (1987). Measurement of aircrew workload in low-level flight. AGARDograph No. 282—The practical assessment of pilot workload. In A. Roscoe (Ed.), Advisory Group for Aerospace Research and Development/ (pp. 67–77).Neuilly sur Seine, France Liu, Y. (1996). Quantitative assessment of effects of visual scanning on concurrent task performance. Ergonomics, 39 (3), 382–399. Liu, Y. (2003). Effects of Taiwan in-vehicle cellular audio phone system on driving performance. Safety Science, 41, 531–542. Liu, Y., & Wickens, C. D. (1994). Mental workload and cognitive task automaticity: an evaluation of subjective and time estimation metrics. Ergonomics, 37(11), 1843–1854. Longo, L. (2014). Formalising human mental workload as a defeasible computational concept Doctoral dissertation, Trinity College. Longo, L. (2015). A defeasible reasoning framework for human mental workload representation and assessment. Behaviour & Information Technology, 34(8), 758–786. Longo, L. (2016). Mental workload in medicine: Foundations, applications, open problems, challenges and future perspectives. IEEE
HUMAN FACTORS FUNDAMENTALS 29th International Symposium on Computer-Based Medical Systems (pp. 106–111). New York: IEEE. Longo, L. (2018). Experienced mental workload, perception of usability, their interaction and impact on task performance. PLoS ONE, 13(8), e0199661. Longo, L. & Barrett, S. (2010a). A computational analysis of cognitive effort. In Proceedings of Intelligent Information and Database Systems, Second International Conference, ACIIDS, Hue City, Vietnam, March 24–26, 2010. Part II (pp. 65–74). Longo, L. & Barrett, S. (2010b). Cognitive effort for multi-agent systems. In Proceedings of Brain Informatics, International Conference, BI 2010, Toronto, ON, Canada, August 28–30, 2010 (pp. 55–66). Longo, L., & Chiara Leva, M. (Eds.). (2017). Human mental workload: Models and applications. Cham: Springer. Longo, L., & Dondio, P. (2014). Defeasible reasoning and argument-based systems in medical fields: An informal overview. In IEEE 27th International Symposium on Computer-Based Medical Systems (pp. 376–381). IEEE. Longo, L., & Pandian, H. (in press). A systematic review on mental workload. Matthews, G., & Reinerman-Jones, L. (2017). Workload assessment: How to diagnose workload issues and enhance performance. Santa Monica, CA: Human Factors and Ergonomics Society. Matthews, G., Reinerman-Jones, L. E., Barber, D. J., & Abich, J. (2015). The psychometrics of mental workload: Multiple measures are sensitive but divergent. Human Factors, 57(1), 125–143. Mehler, B., Reimer, B., Coughlin, J. F., & Dusek, J. A. (2009). Impact of incremental increases in cognitive workload on physiological arousal and performance in young adult drivers. Transportation Research Record, 2138, 6–12. Mehta, R. K., & Parasuraman, R. (2013). Neuroergonomics: A review of applications to physical and cognitive work. Frontiers in Human Neuroscience, 7, 889. Menkes, D. L., & Pierce, R. (2019). Needle EMG muscle identification: A systematic approach to needle EMG examination. 
Clinical Neurophysiology Practice, 4, 199–211. Merletti, R., & Muceli, S. (2019). Tutorial. Surface EMG detection in space and time: Best practices. Journal of Electromyography and Kinesiology, 49, 102363. Mesin, L. (2020). Crosstalk in surface electromyogram: Literature review and some insights. Physical and Engineering Sciences in Medicine, 43, 481–492. Moray, N. (Ed.). (1979), Mental workload: Theory and measurement. New York: Plenum. Moray, N., Eisen, P., Monet, L., & Turksen, I.B. (1988). Fuzzy analysis of skill and rule-based mental workload. In P. A. Hancock & N. Meshkati, (Eds.), Human mental workload (pp. 289–304). Amsterdam: North-Holland. Morgan, J.F., & Hancock. P.A. (2011). The effect of prior task loading on mental workload: An example of hysteresis in driving. Human Factors, 53(1), 75–86. Mouloua, M., Gilson, R., Kring, J., & Hancock, P.A. (2001, October). Workload, situation awareness, and teaming issues for UAV/UCAV operations. Proceedings of the Human Factors and Ergonomics Society, 43, 162–165. Moustafa, K., & Longo, L. (2018). Analysing the impact of machine learning to model subjective mental workload: A case study in third-level education. In International Symposium on Human Mental Workload: Models and Applications (pp. 92–111). Cham: Springer. Moustafa, K., Luz, S., & Longo, L. (2017). Assessment of mental workload: a comparison of machine learning methods and subjective assessment techniques. In International Symposium on Human Mental Workload: Models and Applications (pp. 30–50). Cham: Springer. Mulder, L. J. M., & Mulder, G. (1987). Cardiovascular reactivity and mental workload. In R. I. Kitney & O. Rompelman (Eds.),
MENTAL WORKLOAD The beat-by-beat investigation of cardiovascular function (pp. 216–253). Oxford: Clarendon Press. Myrtek, M., Deutschmann-Janicke, E., Strohmaier, H., Zimmermann, W., Lawerenz, S., Brügner, G., & Müller, W. (1994). Physical, mental, emotional, and subjective workload components in train drivers. Ergonomics, 37, 1195–1203. Navon, D., & Gopher, D. (1979). On the economy of the human-processing system. Psychological Review, 86(3), 214–255. Ng, J.K., Kippers, V., & Richardson, C. A. (1998). Muscle fiber orientation of abdominal muscles and suggested surface EMG electrode positions. Electromyography and Clinical Neurophysiology, 38 (1), 51–58. Norman, D. A. & Bobrow, D. G. (1975). On data-limited and resources-limited processes. Cognitive Psychology, 7, 44-64. Noy, Y. I., Lemoine, T. L., Klachan, C. & Burns, P. C. (2004). Task interruptability and duration as measures of visual distraction. Applied Ergonomics, 35, 207–213. Nygren, T. E. (1991). Psychometric properties of subjective workload techniques: implications for their use in the assessment of perceived mental workload. Human Factors, 33, 17–33. O’Donnell, R., & Eggemeier, F. T. (1986). Workload assessment methodology. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance. Vol. II, Cognitive processes and performance. New York: Wiley. Parasuraman, R., Cosenzo, K. A., & De Visser, E. (2009). Adaptive automation for human supervision of multiple uninhabited vehicles: Effects on change detection, situation awareness, and mental workload. Military Psychology, 21(2), 270–297. Parasuraman, R., & Hancock, P. A. (2004). Neuroergonomics: Harnessing the power of brain science for human factors and ergonomics. Human Factors and Ergonomics Society Bulletin, 47(12), 4–5. Parasuraman, R., Mouloua, M., & Hilburn, B. (1999). Adaptive aiding and adaptive task allocation enhance human-machine interaction. In M. W. Scerbo & M. 
Mouloua (Eds.), Automation technology and human performance: Current research and trends (pp. 119–123). Mahwah, NJ: Erlbaum. Parasuraman, R., & Wickens, C. D. (2008). Humans: Still vital after all these years of automation. Human Factors, 50(3), 511–520. Parasuraman, R., & Wilson, G. (2008). Putting the brain to work: Neuroergonomics past, present, and future. Human Factors, 50, 468–474. Pauzié, A. (2008). A method to assess the driver mental workload: The driving activity load index (DALI). IET Intelligent Transport Systems, 2(4), 315–322. Pereda, A. E. (2014). Electrical synapses and their functional interactions with chemical synapses. Nature Reviews Neuroscience, 15(4), 250–263. Peterson, D. A., & Kozhokar, D. (2017). Peak-end effects for subjective mental workload ratings. Proceedings of the Human Factors and Ergonomics Society, 2052–2056. Petrusic, W. M., & Cloutier, P. (1992). Metacognition in psychophysical judgment: an unfolding view of comparative judgments of mental workload. Perception and Psychophysics, 51, 485–499. Prinzel III, L. J., Freeman, F. G., Scerbo, M. W., Mikulka, P. J., & Pope, A.T. (2003). Effects of a psychophysiological system for adaptive automation on performance, workload, and the event-related potential P300 component. Human Factors, 45(4), 601–614. Raufi, B. (2019, November). Hybrid Models of Performance Using mental workload and usability features via supervised machine learning. In International Symposium on Human Mental Workload: Models and Applications (pp. 136–155). Cham: Springer. Recarte, M. A., & Nunes, L. (2002). Mental load and loss of control over speed in real driving. Towards a theory of attentional speed control. Transportation Research Part F, 5, 111–122. Reid, G. B., & Nygren, T. E. (1988). The subjective workload assessment technique: A scaling procedure for measuring mental workload. In, P. A. Hancock & N. Meshkati (Eds.), Human mental workload (pp. 185–218). Amsterdam: Elsevier.
225 Rizzo, L .M., & Longo, L. (2017). Representing and inferring mental workload via defeasible reasoning: a comparison with the NASA task load index and the workload profile. In Proceedings of the 1st Workshop on Advances in Argumentation in Artificial Intelligence AI^3 co-located with the XVI International Conference of the Italian Association for Artificial Intelligence AI*IA (pp 126–140). Rizzo, L., & Longo, L. (2018, November). Inferential models of mental workload with defeasible argumentation and non-monotonic fuzzy reasoning: A comparative study. In AI3 @ AI* IA (pp. 11–26). Roscoe, A. H. (1984). Assessing pilot workload in flight. In Flight Test Techniques (AGARD-CP-373). Paris: AGARD. Roth, W. T. (1983). A comparison of P300 and the skin conductance response. In A.W.K. Gaillard & W. Ritter (Eds.), Tutorials in ERP research—endogenous components (pp. 177–199). Amsterdam: North-Holland. Rouse, W. B., Edwards, S. L., & Hammer, J. M. (1993). Modeling the dynamics of mental workload and human performance in complex systems. IEEE Transactions on Systems, Man, and Cybernetics, 23(6), 1662–1671. Schaap, T. W., Van der Horst, A. R. A., Van Arem, B. & Brookhuis, K. A. (2008). Drivers’ reactions to sudden braking by lead car under varying workload conditions; towards a driver support system. IET Intelligent Transport Systems, 2, 249–257. Schaap, T. W., Van der Horst, A. R. A., Van Arem, B. & Brookhuis, K. A. (2013). The relationship between driver distraction and mental workload. In M. A. Regan, J. D. Lee and T. W. Viktor (Eds.), Driver distraction and inattention: advances in research and countermeasures (Vol. 1, pp. 63–80). Farnham: Ashgate. Schlegel, R. E. (1993). Driver mental workload. In B. Peacock & W. Karwowski (Eds.), Automotive ergonomics (pp. 359–382). London: Taylor & Francis. Schomer, D. L., & Lopes da Silva, F. H. (Eds.). (2011). Niedermeyer’s electroencephalography: Basic principles, clinical applications, and related fields. 
Philadelphia, PA: Lippincott Williams & Wilkins. Shaffer, F., & Ginsburg, J. P. (2017). An overview of heart rate variability metrics and norms. Frontiers in Public Health, 5, 258. Shier, D., Butler, J., & Lewis, R. (2006). Hole’s essentials of human anatomy and physiology. Boston, MA: McGraw-Hill. Simon, H. A. (1996). The sciences of the artificial. (3rd ed.). Cambridge, MA: MIT Press. Solís-Marcos, I., & Kircher, K. (2019). Event-related potentials as indices of mental workload while using an in-vehicle information system. Cognition Technology & Work, 21(1), 55–67. Stauss, H. M. (2003). Heart rate variability. American Journal of Physiology—Regulatory Integrative and Comparative Physiology, 285, R927–R931. Sur, S., & Sinha, V. K. (2009). Event-related potential: An overview. Industrial Psychiatry Journal, 18(1), 70–73. Tattersall, A. J., & Foord, P. S. (1996). An experimental evaluation of instantaneous self-assessment as a measure of workload. Ergonomics, 39(5), 740748. Temprado, J. J., Zanone, P. G., Monno, A., & Laurent, M. (2001). A dynamical framework to understand performance trade-offs and interference in dual tasks. Journal of Experimental Psychology: Human Perception and Performance, 27 (6), 1303–1313. Thornton, C., Braun, C., Bowers, C., & Morgan, B. B. (1992). Automation effects in the cockpit: A low-fidelity investigation. Proceedings of the Human Factors Society, 36, 30–34. Tsang, P. S., & Velazquez, V.L. (1996). Diagnosticity and multidimensional subjective workload ratings. Ergonomics, 39(3), 358–381. Tsang, P. S., & Vidulich, M. A. (1989). Cognitive demands of automation in aviation. In R. S. Jensen (Ed.). Aviation psychology (pp. 66–95), Gower, Aldershot. Tsang, P. S., & Vidulich, M. A. (2006). Mental workload and situation awareness. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (3rd ed., pp. 243–268). Hoboken, NJ: Wiley.
226 Tsang, P. S., & Wilson, G. (1997). Mental workload. In G. Salvendy (Ed.). Handbook of human factors and ergonomics (2nd ed., pp. 417–449). New York: Wiley. Van Acker, B. B., Parmentier, D. D., Vlerick, P., & Saldien, J. (2018). Understanding mental workload: From a clarifying concept analysis toward an implementable framework. Cognition Technology & Work, 20(3), 351–365. Van Winsum, W., Martens, M., & Herland, L. (1999). The effects of speech versus tactile driver support messages on workload, driver behaviour and user acceptance. Report TM–01–D009. Soesterberg, The Netherlands: TNO Human Factors Research Institute. Verwey, W. B., & Veltman, H. A. (1996). Detecting short periods of elevated workload. a comparison of nine workload assessment techniques. Journal of Experimental Psychology: Applied, 2(3), 270–285. Vidulich, M. A., & Tsang, P. S. (1986). Techniques of subjective workload assessment: a comparison of SWAT and the NASA-Bipolar methods. Ergonomics, 29(11), 1385–1398. Vidulich, M.A. (1989). The use of judgment matrices in subjective workload assessment: The subjective workload dominance (SWORD) technique. Proceedings of the Human Factors Society, 33, 1406–1410. Vidulich, M.A. & Tsang, P.S. (1986). Techniques of subjective workload assessment: A comparison of SWAT and the NASA-bipolar methods. Ergonomics, 29(11),1385–1398. Vidulich, M. A., & Tsang, P. S. (2012). Mental workload and situation awareness. In G. Salvendy & W. Karwowski (Eds.), Handbook of human factors (4th ed.). Hoboken, NJ: Wiley. Vidulich, M. A., & Wickens, C. D. (1986). Causes of dissociation between subjective workload measures and performance: Caveats for the use of subjective assessments. Applied Ergonomics, 17, 291–296. Wang, W., & Siau, K. (2019). Artificial intelligence, machine learning, automation, robotics, future of work and future of humanity: A review and research agenda. Journal of Database Management (JDM), 30(1), 61–79. Wang, Z. T., Zhou, Q. X., & Wang, Y. (2020). 
Study on mental workload assessment of tank vehicle operators. In H. Ayaz (Ed.), Advances in neuroergonomics and cognitive engineering (Vol. 953, pp. 173–184). Cham: Springer International Publishing. Wei, Z., Zhuang, D., Wanyan, X., Liu, C., & Zhuang, H. (2014). A model for discrimination and prediction of mental workload of aircraft cockpit display interface. Chinese Journal of Aeronautics, 27(5), 170–177. Wickens, C. D. (1979). Measures of workload, stress and secondary tasks. In Mental workload (pp. 79–99). Boston, MA: Springer. Wickens, C. D. (1980). The structure of attentional resources. In R. Nickerson (ed.), Attention and performance, Vol. VIII (pp. 239–257). Hillsdale, NJ: Lawrence Erlbaum. Wickens, C. D. (1984). Processing resources in attention. In R. Parasuraman & D. R. Davies (Eds.), Varieties of attention (pp. 63–102). New York: Academic Press. Wickens, C. D. (2002). Multiple resources and performance prediction. Theoretical Issues in Ergonomics Science, 3(2), 159–177. Wickens, C. D. (2008). Multiple resources and mental workload. Human Factors, 50(2), 449–454.
HUMAN FACTORS FUNDAMENTALS Wickens, C. D., Gempler, K., & Morphew, M.E. (2000). Workload and reliability of predictor displays in aircraft traffic avoidance. Transportation Human Factors, 2(2), 99–126. Wierwille, W. W., & Casali, J. G. (1983). A validated rating scale for global mental workload measurement applications. Proceedings of the Human Factors Society, 27(2), 129–133. Wierwille, W. W., & Gutmann, J. C. (1978). Comparison of primary and secondary task measures as a function of simulated vehicle dynamics and driving conditions. Human Factors, 20, 233–244. Wierwille, W. W., Gutmann, J. C., Hicks, T. G., & Muto, W. H. (1977). Secondary task measurement of workload as a function of simulated vehicle dynamics and driving conditions. Human Factors, 19, 557–565. Williams, R. W., & Herrup, K. (1988). The control of neuron number. Annual Review of Neuroscience,11(1), 423–453. Wilson, M. R., Poolton, J. M., Malhotra, N., Ngo, K., Bright, E. & Masters, R. S. W. (2011). Development and validation of a surgical workload measure: The Surgery Task Load Index (SURG-TLX). World Journal of Surgery, 35, 1961–1969. Xie, B., & Salvendy, G. (2000a). Prediction of mental workload in single and multiple tasks environments. International Journal of Cognitive Ergonomics, 4, 213–242. Xie, B., & Salvendy, G. (2000b). Review and reappraisal of modelling and predicting mental workload in single and multi-task environments. Work and Stress, 14, 74–99. Yan, S., Tran, C. C., Chen, Y., Tan, K., & Habiyaremye, J. L. (2017). Effect of user interface layout on the operators’ mental workload in emergency operating procedures in nuclear power plants. Nuclear Engineering and Design, 322, 266–276. Yan, S., Tran, C. C., Wei, Y. & Habiyaremye, J. L. (2019). Driver’s mental workload prediction model based on physiological indices. International Journal of Occupational Safety and Ergonomics, 25(3), 476–484. Yeh, Y., & Wickens, C. D. (1988). 
Dissociation of performance and subjective measures of workload. Human Factors, 30, 111–120. Yin, Z., & Zhang, J. (2018). Task-generic mental fatigue recognition based on neurophysiological signals and dynamical deep extreme learning machine. Neurocomputing, 283, 266–281. Young, M. S., Brookhuis, K., Wickens, C. D., & Hancock, P. A. (2015). State of the science: Mental workload in ergonomics. Ergonomics, 58(1), 1–17. Young, M. S. & Stanton, N. A. (2002). Malleable attentional resources theory: A new explanation for the effects of mental underload on performance. Human Factors, 44(3), 365–375. Young, M. S. & Stanton, N. A. (2004). Taking the load off: investigations of how adaptive cruise control affects mental workload. Ergonomics, 47(9), 1014–1035. Young, M. S., & Stanton, N. A. (2007). Miles away. Determining the extent of secondary task interference on simulated driving. Theoretical Issues in Ergonomics Science, 8(3), 233–253. Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338–353. Zijlstra, F. R. H. (1993). Efficiency in work behavior. A design approach for modern tools. PhD thesis, Delft University of Technology. Delft, The Netherlands: Delft University Press.
CHAPTER 8
SOCIAL AND ORGANIZATIONAL FOUNDATION OF ERGONOMICS: MULTI-LEVEL SYSTEMS APPROACHES
Pascale Carayon
University of Wisconsin-Madison
Madison, Wisconsin

1 INTRODUCTION
2 SYSTEMS PERSPECTIVES IN HUMAN FACTORS AND ERGONOMICS
2.1 Systems Approach at the Core of Human Factors and Ergonomics
2.2 Wilson’s Human Factors and Ergonomics Systems Approach
2.3 Microergonomics, Mesoergonomics, and Macroergonomics
2.4 System of Systems
3 DESIGN OF WORK SYSTEMS
3.1 Social and Psychosocial Aspects of Work Systems: Healthy Workplace
3.2 Job and Work Design
3.3 Job Stress Models
4 MACROERGONOMICS AND SOCIOTECHNICAL SYSTEMS
4.1 Sociotechnical Systems Theory: Historical Perspective and Emerging Issues
4.2 Macroergonomics
5 HUMAN FACTORS AND ERGONOMICS IN THE LARGE SOCIAL CONTEXT
5.1 Community Ergonomics
5.2 Human Factors and Ergonomics and Sustainability
6 SOCIAL AND ORGANIZATIONAL HUMAN FACTORS AND ERGONOMICS METHODS
6.1 Participatory Ergonomics
6.2 Multiple Goals and Perspectives
7 CONCLUSION
REFERENCES

1 INTRODUCTION
The importance and influence of the social and organizational context in human factors and ergonomics (HFE) are widely recognized. Wilson (2000, p. 557) defines HFE “as the theoretical and fundamental understanding of human behavior and performance in purposeful interacting socio-technical systems, and the application of that understanding to the design of interactions in the context of real settings.” HFE focuses on interactive behavior, that is, what people do in work systems made up of interconnected parts such as tasks, tools and technologies, other people, the physical environment, and the organizational context. In HFE approaches, human behavior in complex interacting systems plays a central role. In this chapter, we argue that these behaviors are deeply immersed in, and cannot be separated from, their social and organizational context and its multiple, nested systems. We review various systems approaches developed and used in HFE, and describe participatory methods for designing complex, multi-level sociotechnical systems.
2 SYSTEMS PERSPECTIVES IN HUMAN FACTORS AND ERGONOMICS
2.1 Systems Approach at the Core of Human Factors and Ergonomics
According to the International Ergonomics Association (IEA), human factors (or ergonomics, HFE) is defined as “the scientific discipline concerned with the understanding of interactions among humans and other elements of a system, and the profession that applies theory, principles, data, and methods to design in order
to optimize human well-being and overall system performance” (www.iea.cc). This definition clearly puts the “system” at the core of HFE. However, the systems perspective embedded in the IEA definition of HFE has not always received sufficient attention, even from HFE specialists themselves. John Wilson (2000) proposed that HFE should be “the discipline to understand modern interacting systems,” with a focus on interactions between people and the various elements of the system. Wilson developed an interacting systems model for ergonomics that lists eight types of interactions: (1) interface interactions (e.g., people interacting with hardware, software, and other artefacts); (2) task interactions (e.g., evaluating the goals of tasks and their workload); (3) setting interactions, or interactions with the physical environment; (4) logistics interactions, or interactions with supplies and the overall supply chain; (5) organization interactions (e.g., characterizing the person’s role in the organization); (6) cooperation interactions, or interactions between people; (7) temporal and spatial interactions; and (8) contextual interactions, or interactions of people with the broad external environment (e.g., the socio-economic and political environment). The objective of HFE is therefore to use a holistic approach to understand, analyze, evaluate, and improve “complex interacting systems involving people” (Wilson, 2000). In an effort on the future of HFE initiated by the International Ergonomics Association, Dul and colleagues (2012) redefined the focus of HFE by highlighting (1) its dual objective of improving both performance and well-being; (2) its systems approach; and (3) its contribution to system design. The authors propose a renewed emphasis on the systems approach at the core of HFE,
and emphasize that this systems approach needs to consider the various interacting system elements and the interconnected levels of the system. These themes are further developed in the following sections.
2.2 Wilson’s Human Factors and Ergonomics Systems Approach
The landmark paper by Wilson (2014) defined the six characteristics of an HFE systems approach, which are described in Table 1.
2.3 Microergonomics, Mesoergonomics, and Macroergonomics
Hal Hendrick (1986, 1991) introduced the concept of macroergonomics, which is defined conceptually as “a top-down sociotechnical systems approach to organizational and, ultimately, work systems design and the design of related human-machine, user, and human-environment interfaces” (Hendrick, 1991, p. 747). Macroergonomics was proposed to broaden the scope of HFE and is often described in contrast to microergonomics. It is important to recognize, however, that microergonomics and macroergonomics are related to each other. For instance, one can map out the relationships between microergonomic issues and macroergonomic concepts (Zink, 2000). Physical stressors in the workplace, such as high biomechanical load (a microergonomic issue), are related to the
way work is organized and structured (a macroergonomic concept). Similarly, the cognitive characteristics of technologies (a microergonomic issue), such as information load, are defined during the technology design process and are therefore influenced by macroergonomic concepts such as the extent of user participation in design. Zink (2000) argues that we need to deepen our understanding of how microergonomic issues relate to macroergonomic and, more broadly, management concepts, and that doing so will lead to greater acceptance and use of HFE methods. Macroergonomics is described further in Section 4.2.

Karsh and colleagues (2014) have proposed the concept of mesoergonomics to enhance HFE systems approaches and further bridge the gap between microergonomics and macroergonomics. Mesoergonomics is a way of defining and identifying relationships between two levels of work system analysis, namely the micro and the macro levels. The mesoergonomic framework is used to analyze complex sociotechnical systems and involves four steps: (1) defining the purpose of the meso-level analysis; (2) defining the HFE independent and dependent variables to be used in the analysis; (3) defining the type of meso-level analysis to be used; and (4) interrogating system levels in line with the type of analysis (whole system or multi-level), with a focus on identifying relationships between system levels. Karsh and colleagues (2014) describe two examples of mesoergonomic analysis: one identifying the system levels that influence the safety of medication administration, and another identifying those that influence infection prevention practices in hospitals. Mesoergonomics helps to clarify how factors at different system levels interact and influence each other, in line with the HFE approach advocated by Wilson (2000) and its focus on system interactions.

Table 1 Human Factors and Ergonomics Systems Approach of Wilson (2014)
Characteristic of HFE systems approach | Description
Systems focus | HFE focuses on the system and its various interconnected elements and levels.
Context | Human behavior and performance take place in a context or setting.
Interactions | HFE focuses on interactions between the human, technical, information, social, political, economic, and organizational components of the system, and aims to optimize these interactions by integrating the components.
Holism | Holistic perspective of people: HFE considers the physical, cognitive, social, and emotional characteristics of people in order to enhance their interactions with other elements of the system. Holistic perspective of outcomes: HFE considers the impact of system design on both performance and the (physical, mental, emotional) well-being of people.
Emergence | Because systems have emergent properties, HFE needs to consider the temporal, dynamic evolution of how people interact with other elements of the system.
Embedding | HFE professionals are embedded in organizational structures and should collaborate with each other within and across organizations.
Source: Based on Wilson, 2014.
2.4 System of Systems
In an increasingly interconnected socio-economic and cultural environment, systems are becoming more and more linked; this has led to the concept of a system of systems. Systems are composed of sub-systems (see Table 1), which are often thought of as a set of nested systems. The concept of a system of systems assumes that independently functioning and independently managed systems need to collaborate in order to achieve a goal. The systems that make up a system of systems possess “inter-operational dependence” (i.e., they need to work together or collaborate) and “managerial independence” (i.e., they are managed and organized separately and differently) (Siemieniuch & Sinclair, 2014). For instance, the health care delivery system can be conceptualized as a system of systems. It comprises multiple systems, such as hospitals, primary care and specialty clinics, long-term care facilities, patient advocacy groups, and insurance companies. These component systems are likely to have different management, structures, technologies, and cultures (managerial independence), but they may need to interact when, for instance, a patient is transferred from one facility to another, such as on discharge from a hospital to a rehabilitation facility (inter-operational dependence). In a system of systems, it is the interfaces between the systems that matter. From an HFE perspective, this means that interfaces need to be designed to foster collaboration, coordination, and communication between the various systems. Trust also needs to be built between the different systems in order to foster collaboration.
3 DESIGN OF WORK SYSTEMS
Many HFE professionals and researchers today embrace a definition of work that includes both its physical and cognitive aspects (Marras & Hancock, 2014), but other aspects of work
SOCIAL AND ORGANIZATIONAL FOUNDATION OF ERGONOMICS: MULTI-LEVEL SYSTEMS APPROACHES
are important to consider, including social and psychosocial aspects of work. In this section, we describe various theories and models for psychosocial work design.
3.1 Social and Psychosocial Aspects of Work Systems: Healthy Workplace
In the past twenty years, there has been increasing attention to the role of the workplace in health and well-being. Concepts such as a healthy workplace and organizational health have gained attention as work organization has been increasingly linked to workers’ health, safety, and well-being (Cooper & Cartwright, 1994; Lim & Murphy, 1999; Murphy & Cooper, 2000; Sauter, Lim, & Murphy, 1996). Leaders, managers, and supervisors make decisions about workplace practices, such as work-life balance, recognition, and employee involvement; these decisions have a major impact on employee well-being as well as on organizational outcomes (Grawitch, Gottschalk, & Munz, 2006). Organizing work so that it benefits both workers and organizations has been increasingly emphasized. Healthy workplace practices can improve physical and mental health and motivation, as well as reduce absenteeism and turnover and improve customer satisfaction and overall organizational performance (Grawitch et al., 2006; Lim & Murphy, 1999). The World Health Organization has developed a model that defines four categories of factors contributing to a healthy workplace: (1) physical work environment; (2) psychosocial work environment; (3) personal health resources; and (4) enterprise community involvement (World Health Organization, 2010a, 2010b). Many aspects of the physical work environment can affect worker physical safety and health, such as exposure to chemical, physical, or biological hazards, and physical ergonomic stressors (e.g., excessive force, awkward postures, excessive lifting).
The factors of the psychosocial work environment are often called workplace stressors and relate to poor work organization, negative organizational culture, and other daily practices in the organization that affect the mental and physical well-being of workers. Factors at the individual level (e.g., physical inactivity, poor diet, smoking) also affect workers’ health and well-being; they are influenced by what individual workers do as well as by resources provided by companies, such as food choices in cafeterias and vending machines, medical services, and access to fitness facilities. Finally, the interfaces and linkages between companies and their communities (e.g., community health screening, organizational initiatives to control pollution emissions) affect the physical and social environment of the broader community and, therefore, the physical and mental health, safety, and well-being of workers. The WHO model of healthy workplaces emphasizes multiple system levels of action: the individual worker, units or departments within a company, the company itself, and the broad community. The model also emphasizes the physical and psychosocial aspects of the work environment as key determinants of worker health and safety. In the rest of this section, we describe various approaches to improve the psychosocial work environment: job design models and theories, and models of job stress.
3.2 Job and Work Design
Scientific Management by Frederick Taylor was an early organizational initiative aimed at job design with the objective of simplifying jobs and enhancing efficiency (Taylor, 1911). Scientific Management has had a major influence on how work is organized, even today. Job simplification and process standardization are two examples of the application of Scientific Management. Scientific Management was based on a series of
assumptions about workers, their needs, and their motivation, which correspond to Theory X (McGregor, 1960). Theory X includes the following assumptions:
1. The average human being has an inherent dislike of work and will avoid it if possible.
2. Most people must be coerced, controlled, directed, or threatened with punishment to get them to put forth adequate effort.
3. The average human being prefers to be directed, wishes to avoid responsibility, has little ambition, and wants security above all.
In contrast, Theory Y has the following assumptions, which are in line with modern approaches to job design:
1. Expenditure of physical and mental effort in work is as natural as play or rest.
2. External control is not the only method of obtaining effort.
3. The most significant reward that can be offered in order to obtain commitment is the satisfaction of the individual’s self-actualizing needs.
4. The average human being learns, under proper conditions, not only to accept but also to seek responsibility.
5. Many more people are able to contribute creatively to the solution of organizational problems than do so.
6. At present, the potentialities of the average person are not being fully used.
Theory Y assumptions underlie modern models of job and work design, such as the Job Characteristics Theory (Hackman & Lawler, 1971; Hackman, Oldham, Janson, & Purdy, 1975) and the job enrichment theory of Herzberg (1974, 2003). In an extensive review of the literature, Parker and colleagues (2017) propose a multi-level model of the factors that influence the design of jobs. External factors, such as national institutions and institutional regimes, can influence the decisions and choices that managers make about how to organize work, which in turn influence job characteristics such as job autonomy and task variety. At the organizational level, factors that influence job design include strategy, operational uncertainty, technology, and organizational design. For instance, when a company’s strategy focuses on gaining a competitive advantage in a mass market, managers are likely to adopt approaches in line with Scientific Management principles, leading to job simplification. By contrast, a company whose strategy is to develop products or services for “niche markets” may adopt high-involvement human resources practices and, therefore, provide employees with autonomy and encourage them to innovate and develop knowledge and skills. This multi-level model of influences on work design is in line with the HFE systems approaches reviewed earlier: it is important to understand work at the “sharp end” in light of the external environment of the company or organization, as well as the organizational context and the decisions made by managers.
3.3 Job Stress Models
Job stress is an umbrella term that describes working conditions (also called job stressors) that can produce multiple strain outcomes: behavioral (e.g., turnover, absenteeism), psychological (e.g., anxiety, depression, dissatisfaction), and physical (e.g., musculoskeletal discomfort and pain, hypertension).
Over the years, multiple job stress models have been developed to identify work-related factors in need of redesign. Many conceptual frameworks of job stress provide lists of job stressors. For instance, in a classic paper published in the 1970s, Cooper and Marshall (1976) list five categories of job stressors: (1) factors intrinsic to the job (e.g., overload); (2) role-related stressors (e.g., role ambiguity, role conflict); (3) career-related stressors (e.g., lack of promotion); (4) poor or difficult relationships at work; and (5) stressors related to organizational structure and climate (e.g., lack of participation in decision making). Models of job stress have often been complemented by survey tools that assess multiple job stressors. The NIOSH Job Stress Questionnaire is one example of an extensive survey that assesses multiple job stressors such as lack of job control, job demands, role stressors, and job insecurity (Hurrell & McLaney, 1988). The chapter by Kalimo et al. (1997) in a previous edition of the Handbook of Human Factors and Ergonomics provides additional information on job stressors and their measurement. Several models of job stress propose combinations of work factors as key determinants of stress and of subsequent psychological, behavioral, and physical strain outcomes. The Job Strain model (Karasek, 1979) is a well-known model that defines two dimensions, i.e., job demands and job decision latitude, which in combination can produce negative health and well-being outcomes. According to the Job Strain model, it is the combination of high demands (e.g., high workload) and low job decision latitude (e.g., little job control) that precipitates negative outcomes. The model was later extended to three dimensions with the addition of social support (Johnson & Hall, 1988; Karasek & Theorell, 1990).
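To make the combinatorial logic of the Job Strain model concrete, the Python sketch below sorts workers into Karasek's four job types (high strain, active, passive, low strain) and flags the "iso-strain" case when social support is also low. It is illustrative only: the `Worker` fields, the numeric scale scores, and the use of sample medians as cut-points are assumptions of this sketch rather than part of the model as described above; published studies dichotomize or model these dimensions in various ways.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Worker:
    # Hypothetical survey scale scores; higher means more of the attribute.
    demands: float
    latitude: float  # job decision latitude (control)
    support: float   # social support at work


def job_type(w: Worker, demand_cut: float, latitude_cut: float, support_cut: float) -> str:
    """Classify a worker into a Job Strain quadrant using illustrative cut-points."""
    high_demands = w.demands > demand_cut
    high_latitude = w.latitude > latitude_cut
    if high_demands and not high_latitude:
        # The combination the Job Strain model links to negative outcomes;
        # low social support on top of it is the three-dimension "iso-strain" case.
        return "iso-strain" if w.support <= support_cut else "high strain"
    if high_demands and high_latitude:
        return "active"
    if not high_demands and not high_latitude:
        return "passive"
    return "low strain"


def classify_sample(workers: list[Worker]) -> list[str]:
    """Use sample medians as cut-points (an assumption; other splits are possible)."""
    d_cut = median(w.demands for w in workers)
    l_cut = median(w.latitude for w in workers)
    s_cut = median(w.support for w in workers)
    return [job_type(w, d_cut, l_cut, s_cut) for w in workers]


if __name__ == "__main__":
    sample = [
        Worker(demands=4.0, latitude=1.0, support=1.0),
        Worker(demands=4.0, latitude=4.0, support=4.0),
        Worker(demands=1.0, latitude=1.0, support=4.0),
        Worker(demands=1.0, latitude=4.0, support=1.0),
    ]
    print(classify_sample(sample))
```

With this hypothetical sample, the script prints `['iso-strain', 'active', 'passive', 'low strain']`. Neither dimension alone determines the label; only the pairing of high demands with low latitude yields "high strain" or "iso-strain", which mirrors the model's emphasis on combinations of work factors.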
The Job Strain model has been used extensively in studies around the world; there is robust evidence for the combined impact of high demands, low decision latitude, and low social support on various psychological, behavioral, and physical stress outcomes, such as cardiovascular health (Sara et al., 2018) and depression (Madsen et al., 2017). According to the effort-reward imbalance model of Siegrist (1996), an imbalance between high effort and low reward, i.e., a lack of reciprocity between costs and gains, creates emotional distress, which can lead to various strain reactions. Effort involves two dimensions: (1) extrinsic (e.g., demands, work pressure, interruptions); and (2) intrinsic (e.g., individual need for control). Rewards include money, self-esteem, and status control. As in the Job Strain model, it is not the individual factors of effort and reward that are critical, but their combination. Siegrist and colleagues (Bosma, Peter, & Siegrist, 1998; Siegrist, 1996) have shown that effort-reward imbalance affects cardiovascular health. The job demands-resources model (Demerouti, Bakker, Nachreiner, & Schaufeli, 2001) categorizes work factors into (1) job demands and (2) job resources, and suggests that these two categories have different impacts on outcomes, in particular on the different components of burnout, i.e., emotional exhaustion, depersonalization, and feelings of reduced personal accomplishment (or professional efficacy). Job demands are “physical, social, or organizational aspects of the job that require sustained physical or mental effort” (Demerouti et al., 2001, p. 501); workload and time pressure are examples of job demands.
Job resources are “physical, psychological, social, or organizational aspects of the job that may do any of the following: (a) be functional in achieving work goals; (b) reduce job demands at the associated physiological and psychological costs; (c) stimulate personal growth and development” (Demerouti et al., 2001, p. 501). Examples of job resources include job control, participation in decision making, task variety, and social support. According to the job demands-resources model, high job demands primarily influence emotional exhaustion and job resources primarily influence engagement at work.
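The effort-reward imbalance model described earlier in this section is commonly operationalized in questionnaire studies as a ratio e / (r × c), where e and r are the summed effort and reward scores and c corrects for the two scales having different numbers of items; values above 1.0 are usually read as effort exceeding reward. The Python sketch below assumes that convention; the function names, item counts, and scores in the example are hypothetical and are not taken from Siegrist's instrument.

```python
def eri_ratio(effort_items: list[float], reward_items: list[float]) -> float:
    """Effort-reward ratio e / (r * c), with c = (#effort items) / (#reward items).

    The correction makes the ratio equal to mean effort divided by mean reward,
    so scales with different item counts can be compared.
    """
    if not effort_items or not reward_items:
        raise ValueError("need at least one effort item and one reward item")
    e = sum(effort_items)
    r = sum(reward_items)
    if r <= 0:
        raise ValueError("reward sum must be positive")
    c = len(effort_items) / len(reward_items)
    return e / (r * c)


def is_imbalanced(ratio: float, cutoff: float = 1.0) -> bool:
    """Flag effort-reward imbalance; the 1.0 cutoff is a common convention."""
    return ratio > cutoff


if __name__ == "__main__":
    # Hypothetical responses: 3 effort items, 7 reward items
    effort = [4.0, 4.0, 4.0]
    reward = [2.0] * 7
    ratio = eri_ratio(effort, reward)
    print(f"ERI ratio = {ratio:.2f}, imbalanced = {is_imbalanced(ratio)}")
```

Running the script prints `ERI ratio = 2.00, imbalanced = True`. Because c rescales the reward sum, the ratio reduces to mean effort over mean reward; keeping the e / (r × c) form simply makes the link to the usual questionnaire formula explicit, and it captures the model's point that the combination of effort and reward, not either alone, drives the result.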
HUMAN FACTORS FUNDAMENTALS
4 MACROERGONOMICS AND SOCIOTECHNICAL SYSTEMS
Increasingly, the HFE discipline is emphasizing the systems perspective. Dul and colleagues (2012) include the systems approach as one of the three pillars of HFE (the other two are being design-driven and focusing on the joint optimization of performance and well-being). Macroergonomics, or organizational ergonomics, is the domain of HFE that emphasizes the organizational context and other system elements that can influence human performance and other important outcomes, such as safety and quality of working life. Macroergonomics is anchored in the Sociotechnical Systems theory (Trist, 1981; Trist & Bamforth, 1951).
4.1 Sociotechnical Systems Theory: Historical Perspective and Emerging Issues
Trist and Bamforth (1951) documented the changes that occurred in a coal mine when the technology was upgraded to allow coal to be removed along long walls, in contrast to the short-wall mining used previously. This landmark paper set the foundation for the Sociotechnical Systems theory. Trist and Bamforth showed the profound impact of the long-wall method of coal mining (i.e., new technology) on coal miners’ performance, quality of working life, and health. The paper highlights the interdependencies between the social system and the technical system, therefore paving the way for the concept of sociotechnical systems and the idea of joint optimization of the social and technical systems. The original Sociotechnical Systems theory was developed by a group of researchers and practitioners at the Tavistock Institute of Human Relations, with early studies done in the British coal mining industry (Trist, 1981), later in the textile industry, and subsequently worldwide.
The Sociotechnical Systems theory relied heavily on the open systems concept of von Bertalanffy (1950) and, over time, emphasized the need to look at multiple system levels: (1) the primary work systems where activities are carried out in a sub-system of a whole organization, such as a unit or a department; (2) whole organization systems such as entire corporations; and (3) macrosocial systems such as systems in communities and industrial sectors that operate at the society level (Trist, 1981). At the primary work system level, the Sociotechnical Systems theory emphasized the concept of autonomous work groups (Cummings, 1978). Autonomous work groups have been implemented in many industries around the world with some initial success (Pasmore, Francis, Haldeman, & Shani, 1982). Recent research on autonomous work groups shows the critical role of the organizational context and the implementation process in their success or lack thereof (Parker, Morgeson, & Johns, 2017). Cherns codified the sociotechnical system design principles in 1976 and revised them in 1987 (Cherns, 1976, 1987). Clegg (2000) proposed a revision of the sociotechnical system design principles, grouping them into three categories: (1) meta-principles; (2) content; and (3) process or implementation. Meta-principles represent a “worldview of design” and include the idea that design itself is systemic, involves making choices, and is an extended social process. Content principles are about the design itself; for instance, systems should be simple in design, variances should be controlled at the source, and tasks and processes should be flexibly specified (i.e., minimal critical specification). Process principles are about the “how,” i.e., how the new system is designed. This includes the key aspects of participation and of adequate resources and support for design.
The sociotechnical system design principles rely on various methods or approaches, such as variance analysis (Pasmore, 1988) and organizational scenarios (Clegg et al., 1996). Sommerville and colleagues have developed various sociotechnical systems methods for designing computer systems (Baxter &
Sommerville, 2011; Sommerville et al., 2012; Sommerville & Dewsbury, 2007). The continually increasing rate of technological development is challenging our ability to develop safe, satisfactory, and effective sociotechnical systems. Pasmore and colleagues (2019) have recently proposed the “next generation” of sociotechnical systems design. This involves design at multiple levels (i.e., strategic design, operating system, work), and design of multiple interconnected sub-systems (ecosystem design, organization design, technical system design, social system design). The “next gen” sociotechnical systems design is not a one-shot process, but a continuous redesign process that requires “balanced optimization” in order to achieve positive outcomes for individuals, organizations and society at large. Continuous system design is also a key element of the macroergonomics approach for complex sociotechnical systems (Carayon, 2006).
4.2 Macroergonomics
Macroergonomics is “the design of work systems which focuses on organization-system interaction” (Kleiner, 2006, p. 81). Therefore, it is a domain of HFE that emphasizes the organizational context in which work occurs (Hendrick, 1991; Hendrick & Kleiner, 2001; Kleiner, 2004). Hal Hendrick, the “father of macroergonomics,” argued for the need for HFE to integrate organizational design and management (ODAM) factors in our research and project work (Hendrick, 1991, 2008). Incorporating ODAM factors in HFE has taken multiple forms, such as defining both micro- and macro-level aspects of the work system (Kleiner, 2008; Smith & Carayon-Sainfort, 1989), linking safety climate to the work system (Murphy, Robertson, & Carayon, 2014), and integrating training in HFE interventions (Robertson, 2005; Robertson, Ciriello, & Garabet, 2013).
Macroergonomics has developed methods that incorporate the organizational context in HFE work system analysis; these include the Macroergonomic Analysis of Structure (MAS) (Haro & Kleiner, 2008), Macroergonomic Analysis and Design (MEAD) (Hendrick & Kleiner, 2001; Kleiner, 2006), the Macroergonomic Organizational Questionnaire Survey (MOQS) (Carayon & Hoonakker, 2004; Hoonakker & Carayon, 2006), and participatory ergonomics (Noro & Imada, 1991). The major macroergonomic approach of participatory ergonomics is described in Section 6.1. Macroergonomics builds on the Sociotechnical Systems theory and has integrated aspects of it into its major elements: systems approach, joint optimization of performance and well-being, consideration of organizational and sociotechnical context, system interactions and levels, and implementation process (e.g., participatory ergonomics) (Carayon et al., 2013; Carayon, Kianfar, Li, & Wooldridge, 2015) (see Table 2). Macroergonomics elevates the systems approach in the HFE discipline; this is consistent with the recent emphasis on and recognition of the systems approach by HFE experts (Dul et al., 2012). See above for additional discussion about the systems approach and its various applications to HFE.
Table 2 Key Elements of Macroergonomics
Key element: description and examples
• Systems approach: Human work is embedded in a system with multiple elements; see Table 1 for the characteristics of the HFE systems approach. The entire sociotechnical system needs to be improved, not just separate elements.
• Joint optimization of performance and well-being: Both performance and well-being need to be optimized in the system design process.
• Consideration of organizational and sociotechnical context: Work at the sharp end is embedded in a larger organizational context that defines constraints and opportunities.
• System interactions and levels: Systems have sub-systems that are nested and interact with each other. A system of systems comprises multiple systems that are interconnected from an operational viewpoint.
• Implementation process: Participatory ergonomics; continuous improvement.
5 HUMAN FACTORS AND ERGONOMICS IN THE LARGE SOCIAL CONTEXT
In 2000, Neville Moray called for HFE to get involved in broad societal issues: “It is about the implication of the belief in ergonomics as a systems approach to the changes in society which are coming, changes such as the shortage of water, pollution, urbanization and overcrowding, an ageing population, and climate change” (Moray, 2000, p. 859). Ergonomists have applied their skills and approaches to tackle large societal problems, such as regional economic development (Kleiner & Drury, 1999) and environmental sustainability (Zink, 2008). This involvement of HFE in the large social context is in line with the integration of multiple system levels in HFE; see the above discussion of the systems approach. In this section, we describe selected HFE initiatives that go beyond the organizational system level and address the major societal problems of community health and well-being (Smith et al., 2002) and sustainability (Zink, 2014; Zink & Fischer, 2013).
5.1 Community Ergonomics
The Community Ergonomics approach was initiated by a group of HFE researchers and professionals at the University of Wisconsin-Madison (Smith et al., 2002; Smith & Smith, 1994). It focuses on distressed community settings characterized by poverty, social isolation, and dependency, such as inner cities (Smith et al., 2002). Building on the Sociotechnical Systems theory, Community Ergonomics proposes a community ergonomics system model that comprises the following elements: community social sub-system, community technical sub-system, community ergonomics process, and the environmental context (Smith et al., 2002). Therefore, in line with Wilson’s (2014) HFE systems approach (see Table 1), Community Ergonomics is about designing interfaces between people and systems and sub-systems in societal contexts. Several Community Ergonomics projects have aimed at revitalizing inner cities by strengthening local community banks (Smith & Smith, 1994) and improving access to credit for local actors and stakeholders. HFE methods have been developed and applied to assess the fit (or lack of fit) between the banking system and community members. For instance, a survey helped to identify characteristics of the community bank that created barriers for about two-thirds of the community members (Newman & Carayon, 1994); these data were used to improve the design of the community bank and to improve
access to credit for community members. Other Community Ergonomics projects have addressed HFE issues in companies engaged in international trade and development (Smith, Derjani-Bayeh, & Carayon, 2009) and in community-wide care coordination for patients with chronic diseases (Holden, McDougald Scott, Hoonakker, Hundt, & Carayon, 2014). In line with HFE and its renewed focus on systems thinking, the central principles of Community Ergonomics are (Smith et al., 2002; Smith, Carayon, Smith, Cohen, & Upton, 1994):
• Community renewal must be based on a systems approach.
• Goals need to be established for all system levels for effective performance.
• Performance feedback needs to be provided to all system levels.
• Mechanisms need to be established to support information and knowledge sharing across system elements and levels.
• Mechanisms need to be established for planning and for compensatory action as necessary.
• People must be actively involved in actions that affect their lives.
• Cooperation is necessary among system elements and system levels to achieve effective performance and balanced outcomes.
5.2 Human Factors and Ergonomics and Sustainability
Klaus Zink spearheaded a movement within the international HFE community to address sustainability, including sustainable development and corporate social responsibility (Zink, 2008). The three pillars of sustainable development are social, economic, and environmental (Zink, Steimle, & Fischer, 2008). At the company level, the concept of sustainable development translates into corporate social responsibility, which needs to strongly consider working conditions and worker well-being (Pfeffer, 2010). The HFE discipline can make important contributions to sustainability in areas such as improving the environmental and social compatibility of products and work systems with human needs (Zink et al., 2008).
How can we design work systems in the domains of transportation and energy in order to prevent both environmental and human harm? This is another area where HFE can make important contributions. Can we better align the needs of stakeholders at various system levels in order to achieve the social, economic, and environmental goals of sustainability? HFE has a range of participatory design tools, such as participatory ergonomics (see Section 6.1), which have a role to play in answering this question. With its emphasis on the design of complex sociotechnical systems, HFE can make important contributions to sustainability issues such as climate change. Using the concept of system of systems (see Section 2.4), Thatcher and Yeow (2018) proposed an HFE model that describes the multiple and nested systems that contribute to sustainability. The nested hierarchies include various levels: individual, teamwork, organization, inter-organization, and ecology. The model also considers the time dimension over which sustainability evolves; it is important to examine the lifespan of systems and technologies. For instance, how can we design a workplace that accommodates the needs of workers while considering potential changes in consumer demands over time? The model of Thatcher and Yeow (2018) also emphasizes the critical need to recognize the presence of multiple goals. Goals may be distributed across the levels of
the nested hierarchies, such as goals of a team versus goals of an organization versus environmental goals. As emphasized by Zink and colleagues (2008), we also need to recognize that goals have multiple dimensions, such as human, social, economic, ecological, and cultural. For instance, how can we design a green building (ecological goal) that also meets the task requirements of the workers and creates a positive social climate for performance (social and economic goals)? See the additional description of HFE in sustainability in Chapter 57 by Zink in this volume.
6 SOCIAL AND ORGANIZATIONAL HUMAN FACTORS AND ERGONOMICS METHODS
HFE is about system design; therefore, we need usable, useful, valid, and reliable HFE methods for analyzing, assessing, redesigning, and continuously improving systems in their social and organizational context (Salmon et al., 2017). A recent review by Waterson et al. (2015) describes a broad range of macroergonomics methods for safety. Multiple HFE books also provide information on HFE methods that can be applied to social and organizational ergonomics (Hendrick & Kleiner, 2002; Stanton, Hedge, Brookhuis, Salas, & Hendrick, 2004). In this section, we review one particular approach, participatory ergonomics, which is a foundational macroergonomics method for addressing social and organizational aspects of HFE. We then discuss the critical challenge of system trade-offs, which emerges as HFE researchers and professionals tackle larger and broader social and organizational issues.
6.1 Participatory Ergonomics
Participation of people in the design of systems that they are involved in, work in, and interact with is a major principle of HFE and its multiple systems approaches, such as macroergonomics and Community Ergonomics. Noro, Imada, Zink, and Wilson spearheaded the concept of participatory ergonomics (Noro & Imada, 1991; Wilson, 1991; Zink, 1996). Participatory ergonomics (PE) refers to “the involvement of people in planning and controlling a significant amount of their own work activities, with sufficient knowledge and power to influence both process and outcomes to achieve desirable goals” (Wilson, Haines, & Morris, 2005, p. 933). Most participatory ergonomics projects have tackled physical ergonomic issues such as high exertion and repetitiveness in manual tasks (Burgess-Limerick, 2018).
Recent participatory ergonomics projects have been conducted to improve health care processes and systems, such as bedside rounding in a pediatric hospital (Xie et al., 2015), the usability of health information technology (Carayon et al., 2020), and the physical layout of health care settings (Andersen & Broberg, 2017). It is important to recognize that participatory ergonomics projects involve many decisions about the what, how, who, and when. The participatory ergonomics framework (PEF) describes nine dimensions of participation (Haines, Wilson, Vink, & Koningsveld, 2002):
1. Permanence of PE within the organization.
2. Whether people participate directly or indirectly in the PE project.
3. The organizational level at which PE takes place: across the organization or only in one department or team.
4. Who makes decisions in the PE project.
5. Composition of the PE team.
6. Requirement for participation.
7. Focus of the PE project.
8. Remit of participation, e.g., in setting up the participatory process, involvement in problem identification, solution generation, or evaluation.
9. Role of the ergonomist.
6.2 Multiple Goals and Perspectives
HFE has the dual objective of improving both performance and well-being. The connected domain of human-systems integration (HSI) expands the domains that need to be considered in any system design project: manpower, personnel, training, human factors engineering, system safety, health hazards, and survivability (Booher, 2003). In this chapter we have reviewed HFE approaches that tackle complex societal issues such as community health and well-being, and environmental sustainability. These various extensions of the HFE discipline require that we consider a wider range of goals, the synergies and complementarity of the goals, as well as the potential conflicts and trade-offs between the goals. Booher (2003) describes various areas of trade-off for HFE, and more broadly HSI, such as:
• Worker versus automation: How do we allocate tasks to workers versus technology? How do we effectively and safely use different types and levels of automation (Parasuraman, Sheridan, & Wickens, 2000)?
• Design versus staffing: System design influences workload and, therefore, the number of workers and support personnel.
These are examples of trade-offs between the elements of the system itself, such as technology, tasks, and training. In dealing with increasingly complex sociotechnical systems, HFE researchers and professionals need to analyze, assess, and consider trade-offs in system design and system goals. Involvement and active participation of stakeholders and users can facilitate the consideration of various types of trade-offs. Participatory approaches have been developed to incorporate multiple perspectives in system design (Détienne, 2006). As described above, participatory ergonomics projects have multiple dimensions, including the composition of the project team.
The project teams in charge of system design are likely to include members with different backgrounds, expertise, and experience. During the system design process, these multiple perspectives may produce divergent opinions and, potentially, conflicts. Building common ground is, therefore, necessary to ensure optimal outcomes of participatory design of complex sociotechnical systems (Détienne, 2006). HFE methods need to be further developed in order to tackle the difficult issues of bringing together multiple perspectives and addressing the trade-offs between multiple system elements, interactions, levels, and outcomes.
7 CONCLUSION
Serious consideration of the social and organizational context in HFE requires that we adopt and adapt systems approaches. In this chapter, we have reviewed fundamental issues of systems thinking and called attention to models and theories in connected disciplines (e.g., job and work design, job stress, Sociotechnical Systems theory) in order to deepen our understanding of how HFE can improve the design of human/system interactions in the broad social and organizational context. This will inevitably lead us to address values in HFE (Dekker, Hancock, & Wilkin, 2012; Hancock & Drury, 2011; Lange-Morales, Thatcher, & Garcia-Acosta, 2014). What is our role in the sustainability of the ecological environment and the health of the world population? HFE aims at improving
performance and well-being of increasingly complex sociotechnical systems, but we need to further clarify the underlying values of our HFE methods, such as participation of users and other stakeholders, respect for diversity and human rights, compassion, and benevolence. In order to address major societal problems, including some described in this chapter, we need to develop HFE models, theories, and methods for humane sociotechnical system design.
REFERENCES
Andersen, S. N., & Broberg, O. (2017). A framework of knowledge creation processes in participatory simulation of hospital work systems. Ergonomics, 60(4), 487–503. doi:10.1080/00140139.2016.1212999
Baxter, G., & Sommerville, I. (2011). Socio-technical systems: From design methods to systems engineering. Interacting with Computers, 23(1), 4–17. Retrieved from http://www.sciencedirect.com/science/article/pii/S0953543810000652
Bertalanffy, L. V. (1950). The theory of open systems in physics and biology. Science, 111, 23–29.
Booher, H. R. (2003). Introduction: Human systems integration. In H. R. Booher (Ed.), Handbook of human systems integration (pp. 1–30). New York: John Wiley & Sons.
Bosma, H., Peter, R., & Siegrist, J. (1998). Two alternative job stress models and the risk of coronary heart disease. American Journal of Public Health, 88(1), 68–74.
Burgess-Limerick, R. (2018). Participatory ergonomics: Evidence and implementation lessons. Applied Ergonomics, 68, 289–293. doi:10.1016/j.apergo.2017.12.009
Carayon, P. (2006). Human factors of complex sociotechnical systems. Applied Ergonomics, 37(4), 525–535.
Carayon, P., & Hoonakker, P. L. T. (2004). Macroergonomics organizational questionnaire survey (MOQS). In N. A. Stanton, A. Hedge, K. Brookhuis, E. Salas, & H. Hendrick (Eds.), Handbook of human factors and ergonomics methods. Boca Raton, FL: CRC Press.
Carayon, P., Hoonakker, P., Hundt, A. S., Salwei, M., Wiegmann, D., Brown, R. L., … Patterson, B. (2020). Application of human factors to improve usability of clinical decision support for diagnostic decision-making: A scenario-based simulation study. BMJ Quality & Safety, 29, 329–340. doi:10.1136/bmjqs-2019-009857
Carayon, P., Karsh, B.-T., Gurses, A. P., Holden, R. J., Hoonakker, P., Hundt, A. S., … Wetterneck, T. B. (2013). Macroergonomics in health care quality and patient safety. Review of Human Factors and Ergonomics, 8, 4–54.
Carayon, P., Kianfar, S., Li, Y., & Wooldridge, A. (2015). Organizational design: Macroergonomics as a foundation for human systems integration. In D. A. Boehm-Davis, F. T. Durso, & J. D. Lee (Eds.), APA handbook on human systems integration (pp. 573–588). Washington, DC: American Psychological Association.
Cherns, A. (1976). The principles of sociotechnical design. Human Relations, 29(8), 783–792.
Cherns, A. (1987). Principles of sociotechnical design revisited. Human Relations, 40, 153–162.
Clegg, C. (2000). Sociotechnical principles for system design. Applied Ergonomics, 31, 463–477.
Clegg, C., Coleman, P., Hornby, P., Maclaren, R., Robson, J., Carey, N., & Symon, G. (1996). Tools to incorporate some psychological and organizational issues during the development of computer-based systems. Ergonomics, 39(3), 482–511.
Cooper, C. L., & Cartwright, S. (1994). Healthy mind; healthy organization: A proactive approach to occupational stress. Human Relations, 47(4), 455–469.
Cooper, C. L., & Marshall, J. (1976). Occupational sources of stress: A review of the literature relating to coronary heart disease and mental ill health. Journal of Occupational Psychology, 49(1), 11–25.
Cummings, T. G. (1978). Self-regulating work groups: A socio-technical synthesis. Academy of Management Review, 3(3), 625–634. doi:10.5465/amr.1978.4305900 Dekker, S. W. A., Hancock, P. A., & Wilkin, P. (2012). Ergonomics and sustainability: Towards an embrace of complexity and emergence. Ergonomics, 56(3), 357–364. doi:10.1080/00140139. 2012.718799 Demerouti, E., Bakker, A. B., Nachreiner, F., & Schaufeli, W. B. (2001). The Job Demands-Resources model of burnout. Journal of Applied Psychology, 86(3), 499–512. Détienne, F. (2006). Collaborative design: Managing task interdependencies and multiple perspectives. Interacting with Computers, 18(1), 1–20. Dul, J., Bruder, R., Buckle, P., Carayon, P., Falzon, P., Marras, W. S., … van der Doelen, B. (2012). A strategy for human factors/ergonomics: Developing the discipline and profession. Ergonomics, 55(4), 377–395. Grawitch, M. J., Gottschalk, M., & Munz, D. C. (2006). The path to a healthy workplace: A critical review linking healthy workplace practices, employee well-being, and organizational improvements. Consulting Psychology Journal: Practice and Research, 58(3), 129–147. Hackman, J. R., & Lawler, E. E. (1971). Employee reactions to job characteristics. Journal of Applied Psychology, 55, 259–286. Hackman, J. R., Oldham, G. R., Janson, R., & Purdy, K. (1975). A new strategy for job enrichment. California Management Review, 17(4), 57–71. Haines, H., Wilson, J. R., Vink, P., & Koningsveld, E. (2002). Validating a framework for participatory ergonomics (the PEF). Ergonomics, 45(4), 309–327. Hancock, P. A., & Drury, C. G. (2011). Does human factors/ergonomics contribute to the quality of life? Theoretical Issues in Ergonomics Science, 12(5), 416–426. Haro, E., & Kleiner, B. M. (2008). Macroergonomics as an organizing process for systems safety. Applied Ergonomics, 39(4), 450–458. Hendrick, H. W. (1986). Macroergonomics: A conceptual model for integrating human factors with organizational design. In O. J. Brown & H. W. 
Hendrick (Eds.), Human factors in organizational design and management, Vol. II (pp. 467–478). Amsterdam: North-Holland. Hendrick, H. W. (1991). Ergonomics in organizational design and management. Ergonomics, 34, 743–756. Hendrick, H. W. (2008). Applying ergonomics to systems: Some documented “lessons learned”. Applied Ergonomics, 39(4), 418–426. Retrieved from http://www.sciencedirect.com/science/article/ B6V1W-4S9R4PJ-1/2/dc70b49d66a78b9d615df27a8425d37e Hendrick, H. W., & Kleiner, B. M. (2001). Macroergonomics: An introduction to work system design. Santa Monica, CA: The Human Factors and Ergonomics Society. Hendrick, H. W., & Kleiner, B. M. (Eds.). (2002). Macroergonomics: Theories, methods, and applications. Mahwah, NJ: Lawrence Erlbaum Associates, Publishers. Herzberg, F. (1974). The wise old Turk. Harvard Business Review, 52(September/October), 70–80. Herzberg, F. (2003). One more time: How do you motivate employees? Harvard Business Review, 81(1), 87–96. Retrieved from http:// search.ebscohost.com/login.aspx?direct=true&AuthType=ip, uid&db=bth&AN=8796887&site=ehost-live&scope=site Holden, R. J., McDougald Scott, A., Hoonakker, P. L. T., Hundt, A. S., & Carayon, P. (2014). Data collection challenges in community settings: Insights from two studies of patients with chronic disease. Quality of Life Research, 24, 1043–1055. Hoonakker, P., & Carayon, P. (2006). Macroergonomic Organizational Questionnaire Survey (MOQS). In R. N. Pikaar,
E. A. P. Koningsveld, & P. J. M. Settels (Eds.), Proceedings of the IEA2006 Congress. Elsevier. Hurrell, J. J. J., & McLaney, M. A. (1988). Exposure to job stress: A new psychometric instrument. Scandinavian Journal of Work Environment and Health, 14(suppl. 1), 27–28. Johnson, J. V., & Hall, E. M. (1988). Job strain, work place social support and cardiovascular disease: A cross-sectional study of a random sample of the Swedish working population. American Journal of Public Health, 78, 1336–1342. Kalimo, R., Lindstrom, K., & Smith, M. J. (1997). Psychosocial approach in occupational health. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (pp. 1059–1084). New York: John Wiley & Sons. Karasek, R. A. (1979). Job demands, decision latitude, and mental strain: Implications for job design. In Job characteristics and mental strain. Ithaca, NY: Cornell University Press. Karasek, R. A., & Theorell, T. (1990). Healthy work: stress, productivity and the reconstruction of working life. New York: Basic Books. Karsh, B.-T., Waterson, P., & Holden, R. J. (2014). Crossing levels in systems ergonomics: A framework to support “mesoergonomic” inquiry. Applied Ergonomics, 45(1), 45–54. Kleiner, B. M. (2004). Macroergonomics as a large work-system transformation technology. Human Factors and Ergonomics in Manufacturing, 14(2), 99–115. Kleiner, B. M. (2006). Macroergonomics: Analysis and design of work systems. Applied Ergonomics, 37(1), 81–89. Kleiner, B. M. (2008). Macroegonomics: Work system analysis and design. Human Factors, 50(3), 461467. Kleiner, B. M., & Drury, C. G. (1999). Large-scale regional economic development: Macroergonomics in theory and practice. Human Factors and Ergonomics in Manufacturing & Service Industries, 9(2), 151–163. Lange-Morales, K., Thatcher, A., & Garcia-Acosta, G. (2014). Towards a sustainable world through human factors and ergonomics: It is all about values. Ergonomics, 57(11), 1603–1615. doi:10.1080/ 00140139.2014.945495 Lim, S. 
Y., & Murphy, L. R. (1999). The relationship of organizational factors to employee health and overall effectiveness. American Journal of Industrial Medicine, supplement 1, 64–65. Madsen, I. E. H., Nyberg, S. T., Magnusson Hanson, L. L., Ferrie, J. E., Ahola, K., Alfredsson, L., … Kivimäki, M. (2017). Job strain as a risk factor for clinical depression: Systematic review and meta-analysis with additional individual participant data. Psychological Medicine, 47(8), 1342–1356. doi:10.1017/ S003329171600355X Marras, W. S., & Hancock, P. A. (2014). Putting mind and body together: A human-systems approach to the integration of the physical and cognitive dimensions of task design and operations. Applied Ergonomics, 45(1), 55–60. McGregor, D. (1960). The human side of enterprise. New York: McGraw-Hill. Moray, N. (2000). Culture, politics and ergonomics. Ergonomics, 43(7), 858–868. Murphy, L. A., Robertson, M. M., & Carayon, P. (2014). The next generation of macroergonomics: Integrating safety climate. Accident Analysis and Prevention, 68, 16–24. doi:http://dx.doi.org/10.1016/ j.aap.2013.11.011 Murphy, L. R., & Cooper, C. L. (Eds.). (2000). Healthy and productive work: An international perspective. London: Routledge. Newman, L., & Carayon, P. (1994). Community ergonomics: Data collection methods and analysis of human characteristics. Paper presented at the Human Factors and Ergonomics Society 38th Annual Meeting. Noro, K., & Imada, A. (1991). Participatory ergonomics. London: Routledge. Parasuraman, R., Sheridan, T. B., & Wickens, C. D. (2000). A model for types and levels of human interaction with automation. IEEE Transactions on Systems Man and Cybernetics Part A-Systems and Humans, 30(3), 286–297.
SOCIAL AND ORGANIZATIONAL FOUNDATION OF ERGONOMICS: MULTI-LEVEL SYSTEMS APPROACHES
Parker, S. K., Morgeson, F. P., & Johns, G. (2017). One hundred years of work design research: Looking back and looking forward. Journal of Applied Psychology, 102(3), 403–420. doi:10.1037/apl0000106
Parker, S. K., Van Den Broeck, A., & Holman, D. (2017). Work design influences: A synthesis of multilevel factors that affect the design of jobs. Academy of Management Annals, 11(1), 267–308. doi:10.5465/annals.2014.0054
Pasmore, W. (1988). Designing effective organizations: The sociotechnical systems perspective. New York: Wiley.
Pasmore, W., Francis, C., Haldeman, J., & Shani, A. (1982). Sociotechnical Systems: A North American reflection on empirical studies of the seventies. Human Relations, 35(12), 1179–1204. doi:10.1177/001872678203501207
Pasmore, W., Winby, S., Mohrman, S. A., & Vanasse, R. (2019). Reflections: Sociotechnical systems design and organization change. Journal of Change Management, 19(2), 67–85. doi:10.1080/14697017.2018.1553761
Pfeffer, J. (2010). Building sustainable organizations: The human factor. Academy of Management Perspectives, 24(1), 34–45.
Robertson, M. M. (2005). Macroergonomics in training systems development. In H. Hendrick & B. M. Kleiner (Eds.), Macroergonomics: Theory, methods, and applications (pp. 249–272). Mahwah, NJ: Lawrence Erlbaum Associates.
Robertson, M. M., Ciriello, V. M., & Garabet, A. M. (2013). Office ergonomics training and a sit-stand workstation: Effects on musculoskeletal and visual symptoms and performance of office workers. Applied Ergonomics, 44(1), 73–85. doi:10.1016/j.apergo.2012.05.001
Salmon, P. M., Walker, G. H., Read, G. J. M., Goode, N., & Stanton, N. A. (2017). Fitting methods to paradigms: Are ergonomics methods fit for systems thinking? Ergonomics, 60(2), 194–205. doi:10.1080/00140139.2015.1103385
Sara, J. D., Prasad, M., Eleid, M. F., Zhang, M., Widmer, R. J., & Lerman, A. (2018). Association between work-related stress and coronary heart disease: A review of prospective studies through the Job Strain, Effort-Reward Balance, and Organizational Justice models. Journal of the American Heart Association, 7(9). doi:10.1161/JAHA.117.008073
Sauter, S., Lim, S. Y., & Murphy, L. R. (1996). Organizational health: A new paradigm for occupational stress at NIOSH. Japanese Journal of Occupational Mental Health, 4(4), 248–254.
Siegrist, J. (1996). Adverse health effects of high-effort/low-reward conditions. Journal of Occupational Health Psychology, 1(1), 27–41.
Siemieniuch, C. E., & Sinclair, M. A. (2014). Extending systems ergonomics thinking to accommodate the socio-technical issues of Systems of Systems. Applied Ergonomics, 45(1), 85–98.
Smith, J. H., Cohen, W. J., Conway, F. T., Carayon, P., Derjani Bayeh, A., & Smith, M. J. (2002). Community ergonomics. In H. Hendrick & B. Kleiner (Eds.), Macroergonomics: Theory, methods, and applications (pp. 289–309). Mahwah, NJ: Lawrence Erlbaum Associates.
Smith, J. H., & Smith, M. J. (1994). Community ergonomics: An emerging theory and engineering practice. In The Human Factors and Ergonomics Society (Ed.), Proceedings of the Human Factors and Ergonomics Society 38th Annual Meeting (pp. 729–733). Santa Monica, CA: The Human Factors and Ergonomics Society.
Smith, M. J., Carayon, P., Smith, J., Cohen, W., & Upton, J. (1994). Community ergonomics: A theoretical model for rebuilding the inner city. In The Human Factors and Ergonomics Society (Ed.), The Human Factors and Ergonomics Society 38th Annual Meeting (pp. 724–728). Santa Monica, CA: The Human Factors and Ergonomics Society.
Smith, M. J., & Carayon-Sainfort, P. (1989). A balance theory of job design for stress reduction. International Journal of Industrial Ergonomics, 4(1), 67–79.
Smith, M. J., Derjani-Bayeh, A., & Carayon, P. (2009). Community ergonomics and globalization: A conceptual model of social awareness. In C. M. Schlick (Ed.), Industrial Engineering and Ergonomics (pp. 57–66). Berlin: Springer.
Sommerville, I., Cliff, D., Calinescu, R., Keen, J., Kelly, T., Kwiatkowska, M., … Paige, R. (2012). Large-scale complex IT systems. Communications of the ACM, 55(7), 71–77.
Sommerville, I., & Dewsbury, G. (2007). Dependable domestic systems design: A socio-technical approach. Interacting with Computers, 19(4), 438–456.
Stanton, N., Hedge, A., Brookhuis, K., Salas, E., & Hendrick, H. W. (Eds.). (2004). Handbook of human factors and ergonomics methods. Boca Raton, FL: CRC Press.
Taylor, F. (1911). The principles of scientific management. New York: Norton and Company.
Thatcher, A., & Yeow, P. H. P. (2018). A sustainable system-of-systems approach: Identifying the important boundaries for a target system in human factors and ergonomics. In A. Thatcher & P. H. P. Yeow (Eds.), Ergonomics and human factors for a sustainable future (pp. 23–45). Boca Raton, FL: CRC Press.
Trist, E. (1981). The evolution of socio-technical systems. Toronto: Quality of Working Life Center.
Trist, E. L., & Bamforth, K. (1951). Some social and psychological consequences of the long-wall method of coal getting. Human Relations, 4, 3–39.
Waterson, P., Robertson, M. M., Cooke, N. J., Militello, L., Roth, E., & Stanton, N. A. (2015). Defining the methodological challenges and opportunities for an effective science of sociotechnical systems and safety. Ergonomics, 58(4), 565–599. doi:10.1080/00140139.2015.1015622
Wilson, J. R. (1991). Participation: A framework and a foundation for ergonomics? Journal of Occupational Psychology, 64, 67–80.
Wilson, J. R. (2000). Fundamentals of ergonomics in theory and practice. Applied Ergonomics, 31(6), 557–567.
Wilson, J. R. (2014). Fundamentals of systems ergonomics/human factors. Applied Ergonomics, 45(1), 5–13.
Wilson, J., Haines, H., & Morris, W. (2005). Participatory ergonomics. In Evaluation of human work (3rd ed.). Boca Raton, FL: CRC Press.
World Health Organization. (2010a). Healthy workplaces: A model for action. Geneva, Switzerland: WHO Press.
World Health Organization. (2010b). WHO healthy workplace framework and model: Background and supporting literature and practice. Geneva, Switzerland: WHO.
Xie, A., Carayon, P., Cox, E. D., Cartmill, R., Li, Y., Wetterneck, T. B., & Kelly, M. M. (2015). Application of participatory ergonomics to the redesign of the family-centred rounds process. Ergonomics, 1–19. doi:10.1080/00140139.2015.1029534
Zink, K. J. (1996). Continuous improvement through employee participation: Some experiences from a long-term study. In O. Brown Jr. & H. W. Hendrick (Eds.), Human factors in organizational design and management (Vol. V, pp. 155–160). Amsterdam: Elsevier.
Zink, K. (2000). Ergonomics in the past and the future: From a German perspective to an international one. Ergonomics, 43(7), 920–930.
Zink, K. (Ed.). (2008). Corporate sustainability as a challenge for comprehensive management. Heidelberg, Germany: Physica-Verlag.
Zink, K. J. (2014). Designing sustainable work systems: The need for a systems approach. Applied Ergonomics, 45, 126–132.
Zink, K. J., & Fischer, K. (2013). Do we need sustainability as a new approach in human factors and ergonomics? Ergonomics, 56(3), 348–356. doi:10.1080/00140139.2012.751456
Zink, K. J., Steimle, U., & Fischer, K. (2008). Human factors, business excellence and corporate sustainability: Differing perspectives, joint objectives. In K. J. Zink (Ed.), Corporate sustainability as a challenge for comprehensive management (pp. 3–18). Heidelberg, Germany: Physica-Verlag.
CHAPTER 9
EMOTIONAL DESIGN
Feng Zhou, University of Michigan-Dearborn, Dearborn, Michigan
Yangjian Ji, Zhejiang University, Hangzhou, China
Roger Jianxin Jiao, Georgia Institute of Technology, Atlanta, Georgia
1 INTRODUCTION
1.1 What Is Emotion?
1.2 Emotion in Human Factors and Ergonomics
2 CONNECTING EMOTION TO DESIGN
2.1 Emotional Associations
2.2 Factors Influencing Emotional Experience
2.3 Models and Methods Related to Emotional Design
3 A SYSTEMATIC PROCESS FOR EMOTIONAL DESIGN
3.1 Affective-Cognitive Needs Elicitation and Measurement
3.2 Affective-Cognitive Needs Analysis
3.3 Affective-Cognitive Needs Fulfillment
4 CHALLENGES AND FUTURE DIRECTIONS
4.1 Measuring Emotion and Cognition in Naturalistic Settings
4.2 Integration of Affect and Cognition
4.3 Emotional Design for Product Ecosystems
5 CONCLUSION
REFERENCES
1 INTRODUCTION
1.1 What Is Emotion?
The concept of emotion is closely related to affect, an encompassing term that covers emotions, feelings, moods, and evaluations (Simon, Clark, & Fiske, 1982). Of these, emotion is probably the most important concept. Nevertheless, psychologists have described emotion as a “very confused and confusing field of study” (Ortony, Clore, & Collins, 1988), and there is no consensus on a final definition. Various factors are associated with emotions, including subjective factors, environmental factors, and neural and hormonal processes. In this chapter, we follow the summary of emotion provided by Kleinginna and Kleinginna (1981), which incorporates the key elements of definitions in psychology as follows:
1. Emotions give rise to affective experience, such as pleasure or displeasure.
2. Emotions stimulate us to generate cognitive explanations, for example, attributing the cause to ourselves or to the environment.
3. Emotions trigger a variety of internal adjustments in the autonomic nervous system, such as an increased heart rate and a decreased skin conductance response.
4. Emotions elicit behaviors that are often, but not always, expressive (laughing or crying), goal-directed (approaching or avoiding), and adaptive (removal of a potential threat).
Feelings can describe the physical sensation of touch, through either experience or perception, and are subjective representations of emotions that can be consciously felt (Davidson, Scherer, & Goldsmith, 2009). Thus, they are often used as self-reported measures of emotions in the literature (e.g., Zhou, Qu, Helander, & Jiao, 2011). Moods are affective states of longer duration (Picard, 1997): they can last for hours, days, or even longer without an attributed object. Emotions are often short-lived, but when an emotion, thought, or action is repeatedly activated, it can result in a mood (Russell, 2003). For instance, a negative mood can be produced by repeated negative emotions, thoughts, or actions, or induced by drugs or medication (Picard, 1997). Subjective evaluation is often defined as a valenced affective response that assesses an object or a situation with positive or negative opinions, views, or reactions (Simon et al., 1982). Russell (2003) used the term core affect to describe all emotionally charged events, including emotion, mood, and evaluation. Core affect has two important dimensions: valence (pleasure–displeasure) and arousal (sleepy–activated). Discrete emotion models take a different view. Paul Ekman (1992), for example, argued that there are six basic emotions (anger, disgust, fear, happiness, sadness, and surprise) and that they can be recognized from facial expressions across different cultures. Russell, in contrast, argued that valence and arousal are the two important dimensions of a continuous emotion model, and that individual emotions can be specified with these two dimensions (Russell et al., 1989). For example, excitement is characterized by positive valence and high arousal, whereas sadness is characterized by negative valence and low arousal.
1.2 Emotion in Human Factors and Ergonomics
Traditional human factors and ergonomics (HFE) researchers mainly addressed the physical and cognitive aspects of the human to prevent frustration, pain, stress, fatigue, overload, injury, and death in the design, development, and deployment of products and systems (Hancock, Pepe, & Murphy, 2005; Wickens, Gordon, & Liu, 1998). Since the 1990s, HFE researchers have started to advocate positive human experience, including flow experience (Csikszentmihalyi, 1990) and hedonomics (Hancock et al., 2005; Helander & Tham, 2003). Whereas traditional HFE sought to prevent negative aspects of human experience, positive psychology advocates positive aspects, such as happiness, well-being, and positivity (Csikszentmihalyi & Seligman, 2000). This notion began to influence HFE, and one good example is the concept of flow experience, in which one is so intensely absorbed and immersed in the task during the human–product interaction that it results in positive emotions, exploratory behavior, and perceived behavioral control (Csikszentmihalyi, 1990). Flow can occur only when the task difficulty matches the user’s skill level and the task provides a clear set of goals and immediate feedback. The view of positive psychology further influenced pioneering researchers in HFE to pursue hedonomics (Hancock et al., 2005; Helander & Khalid, 2005; Helander & Tham, 2003), which promotes pleasurable experience and individuation in the process of human–product interaction.
Pleasurable experience goes beyond safety, reliability, and usability to include joy, fun, and positive experience resulting from users’ appraisal, perception, and interaction with the product, while individuation emphasizes the customization and personalization of tools so that individuals can optimize efficiency and pleasure (Hancock et al., 2005). More recently, it has been proposed that hedonomics can reach its fullest potential by extending to collective goals in organizational and social contexts, such as the workplace (Oron-Gilad & Hancock, 2017). In addition, organizations, conferences, and special issues related to emotion and design in HFE have been burgeoning. In 1999, the Design and Emotion Society was formed (Desmet, 1999), and the First International Conference on Design and Emotion was held in Delft, The Netherlands. Since then, the conference has been held every two years, bringing together researchers, industry practitioners, and leaders in the domain of design and emotion. To mark the tenth anniversary of the International Conference on Design and Emotion, the International Journal of Design published a special issue synthesizing design- and emotion-related studies (Desmet & Hekkert, 2009). In addition, the International Conference on Kansei Engineering and Emotion Research was established in 2007 and is also held every two years, allowing researchers, industrial practitioners, and leaders to exchange knowledge on emotion and Kansei research (Nagamachi, 1995) in product design and development. Both the International Conference on Kansei Engineering and the International Conference on Affective and Pleasurable Design are affiliated with the International Conference on Applied Human Factors and Ergonomics series.
Emotion-related design topics also appear frequently at the ACM CHI Conference on Human Factors in Computing Systems, the premier international conference on human–computer interaction; recent examples include Altarriba Bertran, Márquez Segura, and Isbister (2020) and Dmitrenko et al. (2020).
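Russell’s dimensional view of core affect in Section 1.1 can be made concrete with a small computational illustration. The following Python sketch is not taken from the chapter’s authors. It assumes that valence and arousal ratings have both been rescaled to the range [-1, 1], which is a hypothetical convention, since self-report instruments use their own scales. Under that assumption, the sketch reports which quadrant of the valence-arousal plane a rating falls into. It also reports the rating’s angle and distance from the neutral origin.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreAffect:
    """A rating in a two-dimensional valence-arousal plane.

    Assumption (not from the chapter): both axes are rescaled to [-1, 1].
    """

    valence: float  # -1 = displeasure, +1 = pleasure
    arousal: float  # -1 = sleepy, +1 = activated

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed [-1, 1] scale.
        for name, value in (("valence", self.valence), ("arousal", self.arousal)):
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [-1, 1], got {value}")

    def quadrant(self) -> str:
        """Name the quadrant; exactly 0 is grouped with the positive/high side (arbitrary tie-break)."""
        v = "positive valence" if self.valence >= 0 else "negative valence"
        a = "high arousal" if self.arousal >= 0 else "low arousal"
        return f"{v}, {a}"

    def polar(self) -> tuple[float, float]:
        """Return (angle in degrees from the pleasure axis, distance from the neutral origin)."""
        angle = math.degrees(math.atan2(self.arousal, self.valence)) % 360.0
        return angle, math.hypot(self.valence, self.arousal)


if __name__ == "__main__":
    # Made-up illustrative ratings for the two examples named in the text.
    excitement = CoreAffect(valence=0.7, arousal=0.7)
    sadness = CoreAffect(valence=-0.6, arousal=-0.5)
    for label, point in (("excitement", excitement), ("sadness", sadness)):
        angle, distance = point.polar()
        print(f"{label}: {point.quadrant()} (angle {angle:.0f} deg, distance {distance:.2f})")
```

Running the module prints the quadrant, angle, and distance for two illustrative ratings. The angle is reported alongside the quadrant because, in a continuous model, emotions occupy positions in a space rather than discrete categories. Mapping particular angles to particular emotion words would require an empirically fitted model, which this sketch does not attempt.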
2 CONNECTING EMOTION TO DESIGN
2.1 Emotional Associations
Core affect is object-free: it is not directed at anything and therefore carries no emotional associations. Affective quality, by contrast, is related to or belongs to the product and can change core affect during the human–product interaction process, so that the product is credited with creating emotional associations (Russell, 2003; Zhou, Xu, & Jiao, 2011). Note that core affect resides within the user, whereas affective quality lies in the product. Like core affect, affective quality can be described as a dimensional construct with valence and arousal. Valence, the intrinsic pleasure or displeasure of a product feature, often governs users’ fundamental responses or reactions in the interaction process: likes and attraction encourage approach, whereas dislikes or aversion lead to withdrawal and avoidance (Bradley et al., 2001; Zhou, Xu, & Jiao, 2011). Despite distinct personalities, emotional baggage, and unique dispositions, there are psychological principles common to all humans that designers can use to build emotional associations (Walter, 2011), such as the baby face bias, the golden ratio rule, and Gestalt principles. For example, designers can make use of the baby face bias to motivate users: infants with high baby schema were rated as cuter and elicited stronger motivation for caretaking than infants with low baby schema (Glocker et al., 2009). The golden ratio rule is widely applied in website design, such as Twitter (Walter, 2011), and Gestalt principles of perceptual organization can make a design coherent and orderly and, therefore, pleasant to look at (Desmet & Hekkert, 2007; Schifferstein & Hekkert, 2011). Arousal also influences the resultant emotional responses to human–product interaction.
It can be defined as a psychological and physiological level of alertness that influences a person’s sensory alertness, mobility, and readiness to respond (Kubovy, 1999). Studies have shown that there is an optimal level of arousal for individual task performance, following the inverted-U shape described by the Yerkes–Dodson law (Yerkes & Dodson, 1908). For example, in conditionally automated driving, the driver still needs a high state of vigilance to be ready for takeover transitions (Ayoub, Zhou, Bao, et al., 2019; Du et al., 2020; Zhou, Alsaid, et al., 2020). In other interactive applications and areas, including training, learning, and gaming, an optimal level of arousal is also important for maintaining the alertness needed for optimal performance and positive emotions (Zhou, Lei, et al., 2017; Zhou, Qu, et al., 2011; Zhou, Qu, Jiao, & Helander, 2014), which can resemble the flow experience in the human–product interaction process.
2.2 Factors Influencing Emotional Experience
We examine the factors that influence emotional experience using appraisal theory (Clore & Ortony, 2013; Ellsworth & Scherer, 2003; Ortony, Clore, & Collins, 1988). According to this theory, human users appraise stimuli by their perceived significance for the users’ goals and needs, by the users’ capability to cope with the consequences, and by the compatibility of the resulting actions with perceived social norms and self-ideals. Within this framework, we categorize the factors into human needs, product quality, and ambient factors, together with their dynamic relationships, which we term human–product–ambience interaction (Zhou, Ji, & Jiao, 2013; Zhou, Xu, & Jiao, 2011). According to the motivational theory in psychology (Maslow & Lewis, 1987), human needs form a five-tier hierarchy; from the bottom upward, they are physiological needs, safety, love and belonging, esteem, and self-actualization. Correspondingly, for product design, human needs can be divided into a similar hierarchy: functional, reliable, usable, pleasurable, and individuation (Hancock et al., 2005; Walter, 2011). In emotional design, the higher-level user needs that go beyond instrumental ones (Hassenzahl & Tractinsky, 2006), i.e., affective needs at the pleasurable and individuation levels, are defined from a broader perspective that focuses on emotional responses and aspirations (Jiao et al., 2007). They are deeply rooted in the lower-level basic needs of minimizing pain and maximizing pleasure, both psychologically and physically. The strength of such pain or pleasure depends on the user’s appraisal process and its results. During the interaction between the user and the product, the user evaluates whether the tasks involved facilitate the fulfillment of (affective) needs; if so, positive emotional responses can be elicited. As mentioned earlier, good affective quality related to or within the product can greatly satisfy affective needs, because users attribute their positive emotional responses to the product (features). For example, if an automated school bus can assure safety in transporting children, parents will feel trust and ease rather than anxiety or worry (Ayoub et al., 2020). Moreover, from the point of view of affective computing, smart products equipped with emotion-sensing capabilities may help frustrated users and prevent other negative emotions (e.g., road rage) through designed interventions (Picard & Klein, 2002). In education and learning, for example, many researchers use an emotional lens to prevent negative emotional responses, optimize learning performance, and promote positive emotional outcomes (Yadegaridehkordi et al., 2019).
Consistent with the flow experience (Csikszentmihalyi, 1990; Csikszentmihalyi & Seligman, 2000), appraisal theory also considers users’ ability to deal with the tasks in the human–product interaction process. Users can reach, modify, postpone, or give up goals or needs in order to modulate their emotional responses (Ellsworth & Scherer, 2003). When one’s coping capabilities (dynamically) match the task’s challenge level, one can sustain the flow experience continuously. HFE studies often compare novice with experienced users, older with younger users, and male with female users when evaluating product performance, usability, and affective quality. For instance, ordinary use cases were compared with extraordinary use cases to elicit latent customer needs that could delight customers unexpectedly (Zhou, Jiao, & Linsey, 2015). As another example, trust in automated vehicles depended on multiple interacting variables, including the drivers’ age, risk, and the reliability of the vehicle. Younger drivers reduced their trust significantly more than older drivers when automation failures occurred (Rovira, McLaughlin, Pak, & High, 2019). Other ambient factors that can influence one’s emotional responses include environmental settings and cultural differences (Zhou, Xu, & Jiao, 2011; Zhou et al., 2013). These factors can be considered moderator variables that either strengthen or weaken the relationship between user factors and product factors. Environmental settings concern where the product will be used and how it will be used in combination with other products. These factors affect users’ perception and assessment of product value.
For example, the interior setting of a plane, including the humidity level, noise level, lighting, interior colors, and interior design patterns, can significantly influence a passenger’s flying experience (Zhou, Ji, & Jiao, 2014b). As another example, a Kindle device is meant to be used in various environments and places, and the designer needs to consider whether it will be sensitive to different environmental settings (e.g., light conditions, parental control for kids) (Zhou et al., 2015). In addition, the sequence effect describes how a product that is evaluated positively in isolation may ultimately not be used or possessed. This happens when the product does not fit with other products that have been purchased previously, such as furniture, computer hardware and software, and appliances (Bloch, 1995). For instance, when Microsoft rolled out the Windows Vista operating system, its compatibility issues (e.g., the Aero interface) caused negative emotional responses (Livingston & Thurrott, 2007). Cultural factors can also influence users’ perception of products, because humans are a social species. Typical examples include aesthetic stereotypes, national shapes and colors, social rules and norms, historical beliefs, customs, practices, and so on (Qin, Song, & Tian, 2019). For example, participants from countries with individualistic cultures (e.g., the United States, Canada, Germany, and the United Kingdom) liked angular patterns, while those from countries with collectivist cultures (e.g., Japan, South Korea, and Hong Kong) preferred round patterns. Designers should take cultural differences into account in the design process, not only across cultures but also across generations within one culture. For example, how young people perceive traditional cultural design can significantly influence their emotional responses to and attitudes toward cultural product design (Chai, Bao, Sun, & Cao, 2015).
2.3 Models and Methods Related to Emotional Design
2.3.1 Norman’s Emotional Design
In Emotional Design: Why We Love (or Hate) Everyday Things, Donald Norman (2004) described three levels of cognitive processing that give rise to emotional associations between the human user and the product: visceral, behavioral, and reflective (see Figure 1).
The visceral level focuses on immediate sensory reactions to the product’s physical features (e.g., its look, feel, and sound), which are directly related to valenced reactions to the product (i.e., approach or avoidance). Users’ visceral reactions to a product are hard-wired, so design principles at this level tend to be universal. This is consistent with the baby face bias, the golden ratio rule, and the Gestalt principles mentioned above, and good visceral design requires skilled visual and industrial designers. For example, Park, Lee, and Kim (2011) explored a new interactive touch system on a mobile touch screen that made use of the Weight factor in Laban’s Effort system, and they found that it significantly improved the emotional feel of the interface at the visceral level. Visceral design is prevalent in industries such as automotive (e.g., Mini Cooper and Tesla cars), electronics (e.g., Apple Watch and MacBook Pro), and packaging design.
The visceral level informs the behavioral level, at which the user subconsciously evaluates whether the design helps complete goals with effectiveness, efficiency, and satisfaction. Behavioral design aims to improve human–product interaction, focusing on usability, performance, and function. Traditional HFE places heavy emphasis on usability and performance; in this sense, behavioral design is consistent with human-centered design in that it puts the user’s needs first (Norman, 2013). Many user research methods in human-centered design (Baxter, Courage, & Caine, 2015), such as observation, ethnography, contextual inquiry, and scenario-based design, are useful for discovering the user’s needs for good behavioral design. Thus, good behavioral design begins with understanding the user’s needs and proceeds by generating ideas, testing concepts, obtaining feedback, and iteratively refining the product. For example, an autonomous system for school buses was designed using human-centered design to meet the needs of parents (e.g., trust) and children (e.g., fun, safety) at the same time (Ayoub et al., 2020). From an affective computing point of view, systems that use behavioral measures (e.g., facial expressions) (Zhou, Kong, et al., 2020) and physiological measures (e.g., heart rate, galvanic skin response (GSR)) (Zhou, Qu, et al., 2011) to continuously monitor the human–product interaction process can potentially detect and respond to interaction issues, thereby improving behavioral design. For example, multiple physiological measures were used to monitor driver states in order to improve in-vehicle system usability (Zhou, Ji, & Jiao, 2014a).
EMOTIONAL DESIGN
Figure 1 The key features involved in Norman’s emotional design. Visceral level: subconscious, biologically determined; attractiveness, rapid judgment, first impression. Behavioral level: subconscious, influenced by training; usability, understandability, performance, function. Reflective level: conscious, influenced by experience, extending much longer; value, meaning, culture. Links labeled “inhibit or enhance” connect the levels.
As interaction between the user and the product accumulates, the user, at the reflective level, consciously assesses the benefits, values, culture, and meaning the product brings, which often forms emotional bonds between the user and the product. At this level, the real value of the product can go far beyond its value at the visceral and behavioral levels by meeting people’s affective needs and helping establish their self-image and identity in society. Helander, Khalid, Lim, Peng, and Yang (2013) described a good example in users’ emotional intent or desire for vehicles: for a long time, vehicle consumption has been more than a rational economic choice, connecting users through aesthetic, emotional, and sensory responses to driving and through symbolic relationships at both the social and cultural levels (Sheller, 2004). Unlike at the previous two levels, users evaluate the product consciously at the reflective level, and the value they perceive is influenced to a great extent by knowledge, experience, and culture.
For example, many special objects are tied to personal experiences and memories; what people value is often not the objects themselves but their relationships with and attachment to them, as described in the book The Meaning of Things (Csikszentmihalyi & Halton, 1981). In addition, at the reflective level, users can sometimes forgive negative experiences at the visceral or behavioral levels. For example, long-term customer experience and loyalty can often be sustained if good customer service is provided along the customer journey by fixing defects in initial interactions with the product, integrating multiple business functions, and creating and delivering positive customer experiences (Lemon & Verhoef, 2016).
2.3.2 Jordan’s Four Pleasures
Jordan (2000) considered products to be living objects that could elicit both positive and negative emotional responses, and he argued that products should be designed to be useful, usable, and pleasurable. He proposed four pleasures, namely physiological, social, psychological, and ideological, to support pleasurable design. Physiological pleasure refers to pleasure generated by sensory responses, including visual, auditory, tactile, and olfactory responses, and appears consistent with visceral design in Norman’s emotional design. One example from vehicle-related research is using odors inside the vehicle (e.g., rose compared to leather) to reduce visually induced motion sickness and thereby improve physiological pleasure (Keshavarz, Stelzmann, Paillard, & Hecht, 2015). Social pleasure is the enjoyment derived from social interaction with others, with the product as the medium. Popular social media apps are direct examples. Other products serve as talking points in social interactions, such as smart speakers, or signal users’ membership in specific social groups, such as Porsches for “yuppies” (Jordan, 1997). Psychological pleasure is derived from conducting and accomplishing tasks through human–product interaction and is similar to the behavioral level in Norman’s emotional design; it focuses on the enjoyment of achieving tasks with usable products. For example, an assistance system was developed to help the elderly with activities of daily living, and owing to its proactive and case-driven characteristics, it was both usable and pleasurable (Zhou, Jiao, Chen, & Zhang, 2010). Ideological pleasure relates to personal aspirations and values and is derived from artistic products, such as books, music, and movies, and from products that embody people’s values.
For example, some consumers were willing to buy sustainable and organic foods because of their social identity and attitudes toward environmental responsibility (Bartels & Onwezen, 2014). Thus, products that embed such values can be popular among these consumers.
2.3.3 Kansei Engineering
Kansei engineering originated in Japan in the 1970s. It maps users’ Kansei onto product attributes in the design process using engineering methods, where Kansei is defined as the state of mind in which knowledge, emotion, and passion are harmonized (Nagamachi, 1995). The key questions in Kansei engineering are: (1) how to understand Kansei accurately; (2) how to reflect and translate that understanding into design elements; and (3) how to create a system and organization for Kansei-oriented design (Nagamachi & Lokman, 2016). Although Kansei can be represented in different forms, adjectives are used most frequently (Zhou, Jiao, Schaefer, & Chen, 2010), together with semantic differential scales (e.g., simple–complex, spacious–narrow, boring–interesting) (Osgood, May, Miron, & Miron, 1975). There are three major types of Kansei engineering methods. The type I method uses a tree structure to decompose the 0-order Kansei concept into nth-order sub-concepts until these sub-concepts can be mapped onto physical design elements without difficulty. The success of this method depends not only on understanding users’ Kansei but also on decomposing the product into its design elements. For example, a speedometer was decomposed into meter layouts, meter types and numbers, panel colors, materials, and so forth, to match users’ Kansei, and the contribution of each design element to specific Kansei was identified by partial correlation coefficients based on subjective evaluation with semantic differential scales (Jindo & Hirasago, 1997). The type II method uses expert systems to map Kansei sub-concepts to physical design elements automatically by means of a Kansei database, which helps designers understand users’ Kansei better. The type III method uses hybrid mapping, i.e., forward mapping from Kansei to design elements and backward mapping from design elements to Kansei. The backward mapping starts with the designers, and the mapping relationships can then be revised and validated through user evaluation. For example, Zhou et al. (2010) used K-optimal rule discovery for forward Kansei mapping (from Kansei to design elements) and ordinal logistic regression for backward Kansei mapping (from design elements to Kansei) to support truck cab interior design.
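The type I decomposition is essentially a tree whose leaves carry physical design elements. The Python sketch below illustrates that data structure only, under stated assumptions: the class `KanseiNode`, the function `map_to_design_elements`, and the Kansei words in the example are hypothetical, and only the speedometer design elements echo the Jindo and Hirasago (1997) example. It is not an implementation of any published Kansei system.

```python
from dataclasses import dataclass, field


@dataclass
class KanseiNode:
    """A Kansei concept in a type I decomposition tree (hypothetical structure)."""
    name: str
    # Physical design elements this concept maps onto directly;
    # empty if the concept still has to be decomposed further.
    design_elements: list[str] = field(default_factory=list)
    # Next-order sub-concepts obtained by decomposing this concept.
    children: list["KanseiNode"] = field(default_factory=list)


def map_to_design_elements(node: KanseiNode, order: int = 0) -> list[tuple[int, str, str]]:
    """Depth-first walk returning (order, concept, design element) triples for
    every concept that has been mapped onto physical design elements."""
    mappings = [(order, node.name, element) for element in node.design_elements]
    for child in node.children:
        mappings.extend(map_to_design_elements(child, order + 1))
    return mappings


# Hypothetical 0-order Kansei for a speedometer, decomposed two levels deep.
speedometer = KanseiNode(
    "sporty",
    children=[
        KanseiNode("dynamic", design_elements=["meter layout", "panel color"]),
        KanseiNode(
            "precise",
            children=[KanseiNode("easy to read", design_elements=["meter type", "numbers"])],
        ),
    ],
)

for order, concept, element in map_to_design_elements(speedometer):
    print(f"order {order}: {concept!r} -> {element}")
```

Keeping the mapping on the nodes, rather than in a separate lookup table, mirrors the idea that decomposition stops exactly where a sub-concept becomes directly mappable.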
Other methods have also been proposed to deal with issues in the previous three types, such as the uncertainty of users’ Kansei, the presentation of product elements (e.g., virtual reality), and the effectiveness of expert systems (Marghani et al., 2013). For example, a deep learning method based on long short-term memory was used to extract users’ Kansei from online product reviews, which improved the efficiency and effectiveness of understanding users’ Kansei and reduced its uncertainty (Wang et al., 2019).
2.3.4 Affective Computing
Picard (1997) coined the term affective computing. Affective computing aims to design and develop systems that can recognize, interpret, respond to, and simulate human emotions. This is consistent with the view that emotional intelligence is one of the basic components of intelligence (Goleman, 1995). There are two major areas of research in affective computing: (1) recognizing and responding to user emotions, i.e., affect sensing; and (2) simulating emotions in machines, i.e., affect generation, in order to enrich and facilitate interaction between humans and machines. First, affect sensing refers to a system that recognizes emotions by collecting data through sensors and building algorithms to recognize emotion patterns (Picard, 1997), based on either Ekman’s discrete emotion model (Ekman, 1992) or Russell’s dimensional emotion model (Russell, 2003). Following the component process model of emotion (Scherer, 2005), many researchers have used psychophysiological signals (e.g., galvanic skin response (GSR), electroencephalogram (EEG), heart rate), facial and vocal expressions, and/or gestures to recognize emotions. For example, GSR, facial electromyography (EMG), and EEG were used with a machine-learning technique based on rough set theory to recognize seven discrete emotions (Zhou, Qu, et al., 2011, 2014). Recently, deep learning models have also been used to recognize emotions, such as bi-level convolutional neural networks for fine-grained emotion recognition based on Russell’s dimensional emotion model (Zhou, Kong, et al., 2020). By recognizing and monitoring users’ emotions, a system can respond to users to improve learning in education (Wu, Huang, & Hwang, 2016), communication for autistic children (Messinger et al., 2015), and video gaming (Guthier, Dorner, & Martinez, 2016), to name but a few. Second, many researchers simulate human emotions in social robots and virtual agents to optimize human–robot/agent interaction. The capabilities of recognizing and expressing emotions give social robots and virtual agents characteristics that shape the impressions people form during social interactions, especially when these non-human entities are human-like, i.e., anthropomorphism (Eyssel & Kuchenbrandt, 2012). Such social robots and agents can be widely applied in offices, hotels, education, personal assistance, avatars, entertainment, nursing care, therapy, and rehabilitation (Breazeal, 2011; Dautenhahn, 2002; Thalmann, Yumak, & Beck, 2014). For example, a previous study showed that social robots used as tutors or peer learners achieved cognitive and affective outcomes similar to those of human tutors (Belpaeme et al., 2018).
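To make the sense-and-respond loop concrete, here is a minimal Python sketch. It assumes an upstream recognizer (not shown) already emits per-frame valence and arousal estimates in [-1, 1], in the spirit of a dimensional emotion model. The exponential smoothing, the smoothing factor `alpha`, the intervention threshold, the quadrant labels, and the function names are all illustrative assumptions, not values or methods from the studies cited above.

```python
from typing import Iterable, Iterator


def quadrant(valence: float, arousal: float) -> str:
    """Name the quadrant of a 2-D valence-arousal space (illustrative labels)."""
    v = "positive" if valence >= 0 else "negative"
    a = "high" if arousal >= 0 else "low"
    return f"{v} valence / {a} arousal"


def monitor(
    estimates: Iterable[tuple[float, float]],
    alpha: float = 0.5,
    valence_threshold: float = -0.3,
) -> Iterator[tuple[int, str, bool]]:
    """Smooth per-frame (valence, arousal) estimates with an exponential moving
    average and yield (frame index, smoothed quadrant, respond?) for each frame.
    'respond?' is True while smoothed valence is below valence_threshold."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed_v = smoothed_a = None
    for t, (v, a) in enumerate(estimates):
        if smoothed_v is None:
            smoothed_v, smoothed_a = v, a  # the first frame initializes the average
        else:
            smoothed_v = alpha * v + (1 - alpha) * smoothed_v
            smoothed_a = alpha * a + (1 - alpha) * smoothed_a
        yield t, quadrant(smoothed_v, smoothed_a), smoothed_v < valence_threshold


frames = [(0.4, 0.2), (-0.6, 0.5), (-0.8, 0.6)]  # made-up per-frame estimates
for t, state, respond in monitor(frames):
    print(t, state, "-> respond" if respond else "")
```

Smoothing before acting is one simple way to keep a system from reacting to single noisy frames; a real system would tune or replace it.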
2.3.5 Emotional and Cognitive Design for Mass Personalization
Mass personalization is the strategy of producing goods and services to meet individual customers’ latent needs such that the surplus is positive for both customers and producers, considering both the values and the costs involved (Kumar, 2007; Zhou et al., 2013). Note that this differs from mass customization, which aims to customize products and services for individual customers at a mass-production price (Tseng & Jiao, 2001). The major differences are as follows: (1) mass personalization is fulfilled at the personal level, i.e., a market of one with customer co-creation (e.g., Netflix movie recommendations), whereas mass customization is fulfilled for a certain market segment, i.e., a market of few, with customer configuration (e.g., Apple computer configuration); (2) mass personalization emphasizes high-level non-instrumental needs, including cognitive and emotional needs, with values exceeding costs, whereas mass customization focuses on functional needs with near-mass-production efficiency; and (3) mass personalization is usually producer-initiated to delight customers with a surprise, whereas mass customization is mostly user-initiated within the configuration options defined by the producer. Furthermore, mass personalization is not personalization per se, but personalization with affordable fulfillment costs for both customers and producers (Kumar, 2007). Many personalization techniques are now based on Big Data analytics and artificial intelligence, and once the algorithms are developed, the cost of using them to provide personalized, satisfactory services to most users tends to be minimal (Alkurd, Abualhaol, & Yanikomeroglu, 2020).
What mass personalization emphasizes is latent customer needs that users might not be aware of (Zhou et al., 2015), mainly affective and cognitive needs derived from their profiles, behavioral patterns, affective and cognitive states, aesthetic preferences, and so on (Zhou et al., 2013). Affective needs were explained above. Cognitive needs are non-functional requirements concerning how products and systems are designed to accommodate human cognitive limitations (Zhou, Xu, & Jiao, 2011), similar to what behavioral design addresses in Norman’s emotional design (Norman, 2004). Under the framework of mass personalization, we aim to integrate both affective and cognitive needs to create a positive user experience throughout the product life cycle.
2.3.6 Summary
We summarize the models and methods related to emotional design in Table 1. Emotional design and the four-pleasure framework are deeply rooted in human-centered design, as are other methods such as sustainable design, participatory design, and even universal design, but these two frameworks go beyond it to include fun and pleasure. Thus, both can make full use of the many qualitative methods in human-centered design that are useful for incorporating affective needs into the design process. However, they have some limitations: (1) research quality can depend heavily on the researchers’ skills and is prone to subjectivity; (2) data analysis can be time-consuming; (3) the results can be difficult to verify; and (4) there is no straightforward mapping between design elements and affective needs (Anderson, 2010; Zhou et al., 2010). Kansei engineering has been widely and successfully applied in Japan in areas such as the automotive industry, cosmetics, and clothing (Nagamachi & Lokman, 2016). However, the subjectivity, uncertainty, and cultural barriers associated with Kansei have restricted its application in other countries. In addition, the designed product often reflects the averaged Kansei of the sampled participants’ perceptual preferences, although other presentation methods, such as virtual reality, have been proposed (Marghani et al., 2013). The data collection process is also often time-consuming and requires the active participation of customers and researchers (Wang et al., 2019). Affective computing mainly uses machine learning and artificial intelligence techniques to enable machines to recognize and simulate emotions. With the development of deep learning techniques, more sophisticated and successful models have been built (e.g., Zhou, Kong, et al., 2020). However, models trained on a specific dataset can be unfair to groups that are underrepresented in that dataset (e.g., Black people) (Lohr, 2018). The privacy and moral issues associated with giving machines the capability to monitor and intervene in users’ emotional states are still under debate (Daily et al., 2017). Mass personalization can rework many engineering design methods for all the steps involved, and many user research methods in human-centered design can also be adopted, especially for affective-cognitive needs elicitation. Mass personalization builds on mass customization and incorporates affective and cognitive needs from HFE; it is thus complementary to mass customization. However, mass personalization tends to apply to the “soft” characteristics of the product that can be changed and adapted for individual customers, such as the experience of drinking coffee in a particular store, even though it is built on the “hard” components of the product that can be configured (i.e., mass customization), such as the coffee cups, beans, and other ingredients (Zhou et al., 2013).
3 A SYSTEMATIC PROCESS FOR EMOTIONAL DESIGN
Having examined the advantages and disadvantages of different models and methods related to emotional design, we propose a three-step systematic process based on mass personalization and human-centered design to transform customers’ affective and cognitive needs from the customer domain into design elements in the designer domain: affective-cognitive needs elicitation, affective-cognitive needs analysis, and affective-cognitive needs fulfillment (Zhou et al., 2013), as shown in Figure 2. The first step aims to elicit customers’ affective and cognitive needs systematically, and many user research methods can be applied at this stage. One of the key issues is how to measure the affect and cognition constructs involved in affective and cognitive needs. This stage also identifies the stakeholders involved (e.g., customers and manufacturers), goals, use cases, and constraints. The second step aims to understand affective and cognitive needs and transform them into explicit requirements for engineers and marketers. Formal representations should be used to synthesize the needs from the first step, and concepts should be generated and selected based on the priorities of customer needs. The final step aims to identify the mapping relationships between the requirements and product specifications through an iterative process of prototype testing. The three-step process itself should also be iterative to refine the product. In addition, many of the machine-learning models and expert systems used in affective computing and Kansei engineering can also support emotional design. The related work is reviewed below.
Table 1 Summary of Different Models Related to Emotional Design
Emotional design. Focus: visceral, behavioral, reflective. Major methods: human-centered design methods, industrial design. Advantages: solid theoretical support in psychology. Limitations: no straightforward mapping from the three levels to specific product parameters.
Four pleasures. Focus: physiological, social, psychological, ideological pleasures. Major methods: human-centered design methods (user research), pleasure analysis. Advantages: a framework of four pleasures and applicable qualitative research methods. Limitations: similar to the limitations of emotional design.
Kansei engineering. Focus: translate Kansei into product elements. Major methods: subjective evaluation, expert systems. Advantages: widely applied in Japan with success. Limitations: uncertainty of Kansei; averaged Kansei for sampled participants; cultural barriers to application in other countries.
Affective computing. Focus: emotion recognition, emotion generation. Major methods: machine learning, artificial intelligence. Advantages: development of deep learning models. Limitations: heavily dependent on models trained on a specific dataset; privacy and moral issues.
Mass personalization. Focus: personalized products for individuals; affective and cognitive needs. Major methods: engineering design methods, machine learning models, human-centered design methods. Advantages: solid support in engineering design and human factors engineering. Limitations: only applicable to certain products and services; may need Big Data for personalization.
Figure 2 The proposed three-step process for emotional design: (1) affective-cognitive needs elicitation (user research methods; affect and cognition measurement; stakeholders, goals, and constraints identification); (2) affective-cognitive needs analysis (formal representation and synthesis of needs; idea generation and selection); (3) affective-cognitive needs fulfillment (expert systems and machine learning models; engineering design methods).
3.1 Affective-Cognitive Needs Elicitation and Measurement
3.1.1 User Research for Needs Elicitation
Many user research methods in human-centered design have been proposed for affective and cognitive needs elicitation (Baxter et al., 2015). For example, Johnson and Turley (2006) used a think-aloud protocol to understand the cognitive needs of healthcare workers when designing medical software. A two-week diary study, with introductory, mid-study, and final interviews with each participant spaced seven days apart, was conducted to understand people’s mobile information needs (Sohn et al., 2008). Observation was used in public transportation, such as trains, to understand user needs for supporting non-driving-related tasks in automated vehicles (Pfleging, Rang, & Broy, 2016). Contextual inquiry was used to gain a deeper understanding of how drivers interact with vehicles’ infotainment systems in order to create a positive driver–vehicle interaction experience (Gellatly et al., 2010). For more examples, see Baxter et al. (2015). For affective needs elicitation, Ng and Khong (2014) reviewed various methods for affective human-centered design of video games and proposed two types of methods: user-feedback methods (e.g., focus groups, surveys, interviews, usability testing methods) and non-intrusive methods (e.g., observation of facial and vocal expressions, physiological sensors).
Ng, Khong, and Nathan (2018) then applied interviews as a user-feedback method and observation as a non-intrusive method to affective video game design: the interviews measured subjective feelings, while observation captured participants’ emotional responses during game play. Many researchers in Kansei engineering have used surveys, questionnaires, and focus groups to collect Kansei from users (W. Wang et al., 2019). For example, Quan, Li, and Hu (2018) used questionnaires in which both designers and consumers reviewed clothing images in order to elicit Kansei.
Kwong, Jiang, and Luo (2016) used conjoint and lead user surveys to understand the customer Kansei of electric irons. Akay and Kurt (2009) interviewed users and surveyed magazines to understand the customer Kansei of mobile phones. The sample sizes in these Kansei studies were relatively small (90∘
Postural Classification Scheme for the Wrist
Relative discomfort score 1 3 5 2 7 2 7
Source: Kee & Karwowski, 2001. © 2001 Elsevier.

$$\text{Normalized discomfort}_{ijk} = \frac{\text{raw data}_{ijk} - \text{min discomfort}_k}{\text{max discomfort}_k - \text{min discomfort}_k} \times 100,$$

where
• i is the ith level of motion
• j is the jth joint motion
• k is the kth subject
• raw data is the discomfort at the ith level of the jth joint motion in the kth subject
• max discomfort is the maximum discomfort in the kth subject
• min discomfort is the minimum discomfort in the kth subject
• Normalized discomfort is the normalized discomfort at the ith level of the jth joint motion in the kth subject.

The following equation was used to calculate the postural load index for joint motions that deviate from their neutral positions in a given posture (Kee & Karwowski, 2001):

$$\text{Postural load index} = \sum_{j=1}^{n} \sum_{i=1}^{m_j} S_{ij},$$

where
• i is the ith joint motion
• j is the jth joint
• n is the number of joints involved
• $m_j$ is the number of joint motions studied in the jth joint
• $S_{ij}$ is the relative discomfort score of the ith joint motion in the jth joint
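The per-subject normalization can be sketched in Python as follows, assuming the usual min-max form implied by the definitions: each of subject k’s raw ratings is shifted by that subject’s minimum discomfort, divided by the subject’s range (maximum minus minimum), and multiplied by 100. The function name `normalize_subject`, the dictionary layout keyed by (joint motion, motion level), and the sample ratings are hypothetical.

```python
# Raw discomfort ratings of one subject k, keyed by (joint motion j, motion level i).
Ratings = dict[tuple[str, str], float]


def normalize_subject(raw: Ratings) -> Ratings:
    """Min-max normalize one subject's ratings onto a 0-100 scale:
    100 * (raw_ijk - min_k) / (max_k - min_k)."""
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        raise ValueError("all ratings are identical; the normalization is undefined")
    return {key: 100 * (value - lo) / (hi - lo) for key, value in raw.items()}


subject_k = {  # hypothetical ratings for one subject
    ("shoulder flexion", "0-45 deg"): 2.0,
    ("shoulder flexion", ">150 deg"): 10.0,
    ("neck extension", ">60 deg"): 6.0,
}
print(normalize_subject(subject_k))
```

Normalizing within each subject removes differences in how individuals use the rating scale before scores are pooled across subjects.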
BASIC BIOMECHANICS AND WORKPLACE DESIGN
Table 15 Postural Classification Scheme for the Shoulder
Joint motion: class (relative discomfort score), sitting posture | standing posture
Flexion: 0–45° (1), 45–90° (3), 90–150° (6), >150° (11) | 0–45° (1), 45–90° (3), 90–150° (6), >150° (11)
Extension: 0–20° (1), 20–45° (4), 45–60° (9), >60° (13) | 0–20° (1), 20–45° (3), 45–60° (6), >60° (10)
Adduction: 0–10° (1), 10–30° (2), >30° (8) | 0–10° (1), 10–30° (2), >30° (8)
Abduction: 0–30° (1), 30–90° (3), >90° (10) | 0–30° (1), 30–90° (3), >90° (7)
Medial rotation: 0–30° (1), 30–90° (2), >90° (7) | 0–30° (1), 30–90° (2), >90° (5)
Lateral rotation: 0–10° (1), 10–30° (3), >30° (7) | 0–10° (1), 10–30° (2), >30° (5)
Source: Kee & Karwowski, 2001. © 2001 Elsevier.
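Using the sitting-posture scores of Table 15, the postural load index for the shoulder reduces to a table lookup followed by a sum over the deviated joint motions. The Python sketch below assumes that an angle lying exactly on a class limit (for example, 45°) falls into the lower class, since the table’s class limits overlap; the names `SHOULDER_SITTING`, `relative_discomfort`, and `postural_load_index` are hypothetical, and a full assessment would add the other joints’ tables in the same way.

```python
import math

# Sitting posture, shoulder (Table 15): for each joint motion, a list of
# (upper class limit in degrees, relative discomfort score) in ascending order.
SHOULDER_SITTING = {
    "flexion": [(45, 1), (90, 3), (150, 6), (math.inf, 11)],
    "extension": [(20, 1), (45, 4), (60, 9), (math.inf, 13)],
    "adduction": [(10, 1), (30, 2), (math.inf, 8)],
    "abduction": [(30, 1), (90, 3), (math.inf, 10)],
    "medial rotation": [(30, 1), (90, 2), (math.inf, 7)],
    "lateral rotation": [(10, 1), (30, 3), (math.inf, 7)],
}


def relative_discomfort(classes: list[tuple[float, int]], angle_deg: float) -> int:
    """Return the score of the first class whose upper limit is >= the angle
    (boundary angles fall into the lower class, which is an assumption)."""
    if angle_deg < 0:
        raise ValueError("angles are non-negative deviations from neutral")
    for upper, score in classes:
        if angle_deg <= upper:
            return score
    raise ValueError("angle outside all classes")  # unreachable with math.inf


def postural_load_index(tables: dict[str, dict], posture: dict[str, dict[str, float]]) -> int:
    """Sum S_ij over every joint j and every deviated joint motion i in `posture`."""
    return sum(
        relative_discomfort(tables[joint][motion], angle)
        for joint, motions in posture.items()
        for motion, angle in motions.items()
    )


tables = {"shoulder": SHOULDER_SITTING}
posture = {"shoulder": {"flexion": 100, "abduction": 40}}  # hypothetical angles
print(postural_load_index(tables, posture))
```

Only joint motions that actually deviate from neutral are passed in, matching the index’s definition over deviated motions.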
Table 16 Postural Classification Scheme for the Neck
Joint motion: class (relative discomfort score), sitting posture | standing posture
Flexion: 0–20° (1), 20–45° (3), >45° (5) | 0–20° (1), 20–45° (3), >45° (5)
Extension: 0–30° (1), 30–60° (6), >60° (12) | 0–30° (1), 30–60° (4), >60° (9)
Lateral bending: 0–30° (1), 30–45° (3), >45° (10) | 0–30° (1), 30–45° (2), >45° (7)
Rotation: 0–30° (1), 30–60° (2), >60° (8) | 0–30° (1), 30–60° (2), >60° (8)
Source: Kee & Karwowski, 2001. © 2001 Elsevier.
Table 17 Postural Classification Scheme for the Back
Joint motion: class (relative discomfort score), sitting posture | standing posture
Flexion: 0–20° (1), 20–60° (3), >60° (10) | 0–30° (1), 30–60° (3), 60–90° (6), >90° (12)
Extension: a | 0–10° (1), 10–20° (4), 20–30° (8), >30° (15)
Lateral bending: 0–10° (1), 10–20° (3), 20–30° (9), >30° (13) | 0–10° (1), 10–20° (4), 20–30° (9), >30° (13)
Rotation: 0–20° (1), 20–30° (2), 30–45° (7), >45° (11) | 0–20° (1), 20–60° (3), >60° (10)
Source: Kee & Karwowski, 2001. © 2001 Elsevier.
The postural load index was correlated with the maximum holding times (MHT) proposed by Miedema et al. (1997), leading to the classification of working postures into three main categories: (1) comfortable; (2) moderate; and (3) uncomfortable postures (Table 18).
Table 18 Posture Category and MHT
Posture categories: comfortable postures (MHT > 10 min); moderate postures (5 min < MHT < 10 min); uncomfortable postures (MHT < 5 min)
Columns: posture categories, relative discomfort score, MHT (min)
MHT (min): 37.0, 18.0, 17.0, 14.0, 12.0, 12.0, 10.0, 9.0, 9.0, 8.0, 6.0, 6.0, 5.5, 5.5, 5.0, 4.0, 3.5, 3.3, 3.0
Source: Kee & Karwowski, 2001. © 2001 Elsevier.
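The three MHT bands of Table 18 can be applied as a simple threshold rule. In this Python sketch, the handling of values exactly at 10 min and 5 min is an assumption (both are counted as moderate), because the category limits do not state which side a boundary value belongs to; the function name `mht_category` is hypothetical.

```python
from collections import Counter


def mht_category(mht_min: float) -> str:
    """Classify a posture by its maximum holding time (MHT, minutes) into the
    Table 18 bands; boundary handling at 10 and 5 min is an assumption."""
    if mht_min <= 0:
        raise ValueError("MHT must be positive")
    if mht_min > 10:
        return "comfortable"
    if mht_min >= 5:
        return "moderate"
    return "uncomfortable"


# MHT values listed in Table 18.
mht_values = [37.0, 18.0, 17.0, 14.0, 12.0, 12.0, 10.0, 9.0, 9.0, 8.0,
              6.0, 6.0, 5.5, 5.5, 5.0, 4.0, 3.5, 3.3, 3.0]
print(Counter(mht_category(m) for m in mht_values))
```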
DESIGN OF EQUIPMENT, TASKS, JOBS, AND ENVIRONMENTS
Based on the postural load index and maximum holding times, the analyzed postures can be classified with an acceptability criterion into the following four corrective-action categories for job redesign:
• Category I: Postures with an MHT of more than 10 min and a postural load index of 5 or less. Postures in this category are acceptable, except in special situations, such as when they are repeated or sustained for long periods. No corrective action is needed.
• Category II: Postures with an MHT of 5–10 min and a postural load index of 5–10. Postures in this category require further investigation and corrective changes during the next regular check, but immediate intervention is not needed.
• Category III: Postures with an MHT of 5 min or less and a postural load index of 10–15. Postures in this category require corrective action through redesign of workplaces or working methods soon.
• Category IV: Postures with an MHT of less than 2 min and a postural load index of 15 or more. Postures in this category require immediate consideration and corrective action.
5.11.4 A Kinetic Model of the Upper Extremity
The Center for Ergonomics at the University of Michigan has developed a kinetic model of the upper extremity intended for assessing hand-intensive tasks (Armstrong et al., 2009). The model consists of a link system that represents the joints of the hand, with cone shapes representing the finger surfaces, and it estimates hand postures and finger movements. The model has been used to determine how workers grasp objects in the workplace and to assess how much space the hand requires and the tendon forces and hand strength needed to perform a task. More recently, it has been used to evaluate hose insertion tasks. Figure 41 illustrates the graphical nature of this model.
Figure 41 Upper extremity biomechanical model used for ergonomics assessments. (Source: Courtesy of T. Armstrong).
5.11.5 The Occupational Repetitive Actions (OCRA) Method
Occhipinti and Colombini (2001, 2007) developed the Occupational Repetitive Actions (OCRA) method to assess exposure to the risk factors for musculoskeletal disorders associated with repetitive motions of the upper limbs. The OCRA Checklist (see Figure 42) can be used to map the risk associated with repetitive work. The OCRA index offers an analytical risk assessment for the design or redesign of jobs, with due consideration of the effects of work organization, job rotation, the relocation of workers with disorders, and management plans to increase productivity. The original OCRA index was defined as the ratio between the total number of technical actions actually performed during the shift (ATA) and the corresponding number of recommended technical actions (RTA). The RTA is calculated as follows:

$$\mathrm{RTA} = \sum_{i=1}^{n} \left[ CF \times \left( Ff_i \times Fp_i \times Fa_i \right) \times D_i \right] \times Fr \times Fd$$
where:
• n = number of tasks featuring repetitive movements of the upper limbs performed during the shift
• CF = reference frequency (constant) of technical actions per minute (set at 30 actions per minute)
• $Ff_i$, $Fp_i$, $Fa_i$ = multiplier factors, with scores ranging between 0 and 1, selected according to the behavior of the “force” (Ff), “posture” (Fp), and “additional elements” (Fa) risk factors in each task i of the n tasks
• $D_i$ = duration of each repetitive task i in minutes
• Fr = multiplier factor, with scores ranging between 0 and 1, selected according to the behavior of the “lack of recovery period” risk factor during the entire shift
• Fd = multiplier factor, with scores ranging between 0.5 and 2, selected according to the daily duration of tasks with repetitive upper limb movements (Occhipinti & Colombini, 2001)
The action frequency constant (CF) has been set at 30 actions per minute, representing optimal working conditions (Occhipinti & Colombini, 2001). CF is then reduced by the multiplier factors for the relevant risk factors. Table 19 summarizes the multiplier factors for force, posture, and additional occupational factors, as well as those for lack of recovery periods and the daily duration of repetitive tasks. Colombini, Occhipinti, and Álvarez-Casado (2013) reported updated reference values and predictive models of the OCRA method for the risk assessment of work-related musculoskeletal disorders of the upper limbs (Table 20).
The OCRA Checklist has five parts, each devoted to the analysis of a different risk factor: four main risk factors (lack of recovery time; movement frequency; force; and awkward postures with stereotyped movements) and additional risk factors, which include vibration transmitted to the hand-arm system, ambient temperatures below 0 °C, precision work, kickback, and the use of inadequate gloves, among others. In addition to these factors, the final risk estimate also takes into account the net duration of exposure to repetitive work. For example, Table 21 shows the score values according to the percentage of exposure time for each awkward posture and/or movement. According to Occhipinti and Colombini (2007) and Colombini, Occhipinti, and Álvarez-Casado (2013), a forecasting model based on known OCRA index values can be used to estimate the possible occurrence of work-related upper limb disorders.
Figure 42 OCRA Checklist. The calculation procedure for the OCRA Checklist (elements: frequency, force, posture, additional factors, recovery multiplier, duration multiplier). (Source: Colombini et al., 2011. © 2013 Ergonomiesite.be.)
5.11.6 Timing Assessment Computerized Strategy (TACOs)
Recently, Colombini and Occhipinti (2018) proposed the Timing Assessment Computerized Strategy (TACOs) method, which assesses risk starting from a study of work organization that identifies the various tasks making up the job. TACOs offers criteria for calculating scores, including highly complex scenarios where individuals can perform multiple
Table 19 OCRA Elements for Determining the Multiplier Factors for Force (Ff), Posture (Fp), Additional Factors (Fa), Recovery Periods (Fr) and Overall Duration of Repetitive Tasks (Fd)

R.P.E. (CR-10 Borg scale) | Mean effort (% of MVC) | Multiplier factor (Ff)
0.5 | 5 | 1
1 | 10 | 0.85
1.5 | 15 | 0.75
2 | 20 | 0.65
2.5 | 25 | 0.55
3 | 30 | 0.45
3.5 | 35 | 0.35
4 | 40 | 0.2
4.5 | 45 | 0.1
5 | 50 | 0.01

Postural involvement score | Multiplier factor (Fp)
0–3 | 1
4–7 | 0.70
8–11 | 0.60
12–15 | 0.50
16 | 0.3

Additional factors score | Multiplier factor (Fa)
0 | 1
4 | 0.95
8 | 0.90
12 | 0.80

Number of hours without adequate recovery | Multiplier factor (Fr)
0 | 1
1 | 0.90
2 | 0.80
3 | 0.70
4 | 0.60
5 | 0.45
6 | 0.25
7 | 0.10
8 | 0

Overall duration (in minutes) of repetitive tasks during shift: 480
Multiplier factor (Fd): 2 | 1.5 | 1 | 0.5
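The RTA formula and the multipliers of Table 19 can be combined in a short calculation. The Python sketch below is illustrative only: it assumes the commonly published OCRA definition of the index as the ratio of actual technical actions (ATA) to RTA, computes ATA from a hypothetical per-task action frequency, looks up Ff, Fp, Fa, and Fr from the Table 19 values, and takes Fd as a direct input because only the 480-minute boundary of the duration bands survives in the table.

```python
from dataclasses import dataclass

CF = 30.0  # reference frequency of technical actions per minute (from the text)

# Multiplier lookups transcribed from Table 19.
FF_BY_BORG = {0.5: 1.0, 1.0: 0.85, 1.5: 0.75, 2.0: 0.65, 2.5: 0.55,
              3.0: 0.45, 3.5: 0.35, 4.0: 0.2, 4.5: 0.1, 5.0: 0.01}
FA_BY_SCORE = {0: 1.0, 4: 0.95, 8: 0.90, 12: 0.80}
FR_BY_HOURS = {0: 1.0, 1: 0.90, 2: 0.80, 3: 0.70, 4: 0.60,
               5: 0.45, 6: 0.25, 7: 0.10, 8: 0.0}


def fp_from_score(score: int) -> float:
    """Posture multiplier Fp from the postural involvement score (Table 19 bands)."""
    if score <= 3:
        return 1.0
    if score <= 7:
        return 0.70
    if score <= 11:
        return 0.60
    if score <= 15:
        return 0.50
    return 0.3


@dataclass
class Task:
    actions_per_min: float  # observed technical actions per minute (hypothetical input)
    duration_min: float     # D_i, duration of the repetitive task in minutes
    borg: float             # CR-10 rating; must be one of the Table 19 values
    posture_score: int      # postural involvement score
    additional_score: int   # additional factors score: 0, 4, 8 or 12


def reference_actions(tasks, hours_without_recovery: int, fd: float) -> float:
    """RTA = [sum_i CF * Ff_i * Fp_i * Fa_i * D_i] * Fr * Fd."""
    total = sum(
        CF * FF_BY_BORG[t.borg] * fp_from_score(t.posture_score)
        * FA_BY_SCORE[t.additional_score] * t.duration_min
        for t in tasks
    )
    return total * FR_BY_HOURS[hours_without_recovery] * fd


def actual_actions(tasks) -> float:
    """ATA: technical actions actually performed during the shift (assumed definition)."""
    return sum(t.actions_per_min * t.duration_min for t in tasks)


def ocra_index(tasks, hours_without_recovery: int, fd: float) -> float:
    """OCRA Index = ATA / RTA; returns infinity when RTA is zero (e.g., Fr = 0)."""
    rta = reference_actions(tasks, hours_without_recovery, fd)
    return float("inf") if rta == 0 else actual_actions(tasks) / rta
```

For example, a single 420-minute task at 40 actions per minute with a Borg rating of 2, a posture score of 5, no additional factors, 2 hours without adequate recovery, and Fd = 1 gives RTA ≈ 4586 and an OCRA Index of about 3.66.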
DESIGN OF EQUIPMENT, TASKS, JOBS, AND ENVIRONMENTS
Table 20 OCRA Assessment Scores for Awkward Shoulder, Elbow, Wrist, and Hand Postures

Shoulder: the arms are kept at about shoulder height, without support (or in other extreme postures)
Time in awkward posture | Score
10%–24% of the time | 2
25%–50% of the time | 6
51%–80% of the time | 12
More than 80% of the time | 24

Elbow: the elbow executes sudden movements (wide flexion-extension or prono-supination, jerking movements, striking movements)
Time in awkward posture | Score
25%–50% of the time | 2
51%–80% of the time | 4
More than 80% of the time | 8

Wrist: the wrist must bend to an extreme position or must keep awkward postures (such as wide flexion/extension or wide lateral deviation)
Time in awkward posture | Score
25%–50% of the time | 2
51%–80% of the time | 4
More than 80% of the time | 8

Hand: the hand takes objects or tools in a pinch, hook grip, or other kinds of grasp
Time in awkward posture | Score
25%–50% of the time | 2
51%–80% of the time | 4
More than 80% of the time | 8
Table 21 Classification Criteria (According to Exposure Level) of the Final OCRA Index and OCRA Checklist Scores and the Corresponding Expected Prevalence (%) of Workers with Upper-Limb WMSDs
Level | Risk
Green | Acceptable risk
Yellow | Very low risk
Light red | Medium-low risk
Dark red | Medium risk
Purple | High risk
OCRA Checklist: 22.6 | OCRA Index: 9.1 | Predicted worker population with WMSDs (%): 21.5

tasks over cycles lasting longer than one day. The method identifies the postures that need to be analyzed within each task and defines the scores for these postures depending on their duration, using time-integration principles to calculate the final risk scores. The recommended analytical process (Tasso, 2018) is as follows:
1. Perform an organizational study in order to identify the various tasks making up the job and to assess their duration in the work cycle.
2. Within each task identified, assess the intrinsic risk scores both for the biomechanical overload of the upper limbs (OCRA) or of the spine if manual lifting is involved (NIOSH), and for the postural risk of the spine (without manual lifting) and of the lower limbs.
3. Integrate the data for each job to calculate the final exposure index based on percentages assigned to each task, using specific mathematical models (Tasso, 2018).
It should be noted that the TACOs method considers body parts that are not covered by the OCRA method and the Revised NIOSH Lifting Equation, including the lower limbs; the low back in standing postures; the low back in sitting postures; sitting postures with pedal use; neck-head postures; and complex and integrated postures for specific tasks, e.g., working on a ladder or carrying loads on the head.
5.11.7 Revised Strain Index (RSI)
Garg et al. (2017a) developed a Revised Strain Index (RSI) for distal upper extremity (DUE) physical exposure assessment. The original Strain Index (SI) proposed by Moore and Garg (1995)
BASIC BIOMECHANICS AND WORKPLACE DESIGN
and the Revised Strain Index (RSI) (Garg et al., 2017b) have been designed to combine the work-related risk factors of repetition, force, posture, duration of exertion, and/or duration of exposure per day to evaluate the risk of distal upper extremity musculoskeletal disorders (DUE MSDs) for simple, mono-task jobs. The RSI aims to improve the 1995 Strain Index (SI) by using continuous rather than categorical multipliers and by replacing the duty cycle with duration per exertion (Garg et al., 2017b). The two main assumptions underlying the RSI/SI are: (1) the constituent variables of repetition, force, posture, duration of exertion, and/or duration of exposure per day do not change between different exertions during a task cycle; and (2) the worker does not rotate between tasks during a work shift (Moore & Garg, 1995). The RSI score is defined as the product of five multipliers: IM = intensity of exertion (force) multiplier, EM = exertions per minute (frequency) multiplier, DM = duration per exertion multiplier, PM = hand/wrist posture multiplier, and HM = duration of task per day multiplier. The RSI score is calculated as follows:
$$ \mathrm{RSI} = IM \cdot EM \cdot DM \cdot PM \cdot HM $$
where:
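$$ IM = \begin{cases} 30.00\, I^{3} - 15.60\, I^{2} + 13.00\, I + 0.40, & 0.0 < I \le 0.4 \\ 36.00\, I^{3} - 33.30\, I^{2} + 24.77\, I - 1.86, & 0.4 < I \le 1.0 \end{cases} $$
$$ EM = \begin{cases} 0.10 + 0.25\, E, & E \le 90/\mathrm{min} \\ 0.00334\, E^{1.96}, & E > 90/\mathrm{min} \end{cases} $$
$$ DM = \begin{cases} 0.45 + 0.31\, D, & D \le 60\ \mathrm{s} \\ 19.17 \ln(D) - 59.44, & D > 60\ \mathrm{s} \end{cases} $$
$$ PM = \begin{cases} 1.2\, e^{0.009 P} - 0.2, & P = \text{degrees of wrist flexion} \\ 1.0, & P \le 30 \text{ degrees of wrist extension} \\ 1.0 + 0.00028\,(P - 30)^{2}, & P > 30 \text{ degrees of wrist extension} \end{cases} $$
$$ HM = \begin{cases} 0.20, & H \le 0.05\ \mathrm{h} \\ 0.042\, H + 0.090 \ln(H) + 0.477, & H > 0.05\ \mathrm{h} \end{cases} $$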
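The five multipliers translate directly into code. The following Python sketch is a minimal, unofficial transcription of the RSI formulas; the function names and the `direction` argument (which selects the flexion or extension branch of PM) are assumptions made for illustration, and only the intensity input is range-checked.

```python
import math


def intensity_multiplier(i: float) -> float:
    """IM: I is %MVC expressed from 0 to 1.0 (or Borg CR-10 / 10)."""
    if not 0.0 < i <= 1.0:
        raise ValueError("I must be in (0, 1]")
    if i <= 0.4:
        return 30.00 * i**3 - 15.60 * i**2 + 13.00 * i + 0.40
    return 36.00 * i**3 - 33.30 * i**2 + 24.77 * i - 1.86


def efforts_multiplier(e: float) -> float:
    """EM: E is exertions per minute."""
    return 0.10 + 0.25 * e if e <= 90 else 0.00334 * e**1.96


def duration_multiplier(d: float) -> float:
    """DM: D is the average duration per exertion in seconds."""
    return 0.45 + 0.31 * d if d <= 60 else 19.17 * math.log(d) - 59.44


def posture_multiplier(p: float, direction: str) -> float:
    """PM: P is degrees of wrist flexion or extension from anatomical neutral."""
    if direction == "flexion":
        return 1.2 * math.exp(0.009 * p) - 0.2
    if direction == "extension":
        return 1.0 if p <= 30 else 1.0 + 0.00028 * (p - 30) ** 2
    raise ValueError("direction must be 'flexion' or 'extension'")


def hours_multiplier(h: float) -> float:
    """HM: H is the duration of the task per day in hours."""
    return 0.20 if h <= 0.05 else 0.042 * h + 0.090 * math.log(h) + 0.477


def rsi(i, e, d, p, direction, h) -> float:
    """RSI = IM * EM * DM * PM * HM."""
    return (intensity_multiplier(i) * efforts_multiplier(e)
            * duration_multiplier(d) * posture_multiplier(p, direction)
            * hours_multiplier(h))


def is_hazardous(score: float) -> bool:
    """Scores up to 10.0 are considered safe; scores above 10.0 hazardous."""
    return score > 10.0
```

For instance, I = 0.3, E = 10 exertions/min, D = 2 s, a neutral wrist (0° flexion), and H = 4 h give IM ≈ 3.706, EM = 2.6, DM = 1.07, PM = 1.0, HM ≈ 0.770, and an RSI of about 7.9, below the 10.0 threshold. A useful sanity check is that each piecewise multiplier is (approximately) continuous at its breakpoint, e.g., both IM branches give 5.024 at I = 0.4.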
In the above formulas, the intensity of exertion (I) denotes the magnitude of muscular effort required to perform the task and is defined as the percentage of maximum strength (%MVC) required to perform the task once (%MVC expressed numerically from 0 to 1.0, or the Borg CR-10 rating divided by 10.0). Efforts per minute (E) is the count of exertions divided by the total observation time in minutes, where an effort is a direct application of force through the hand, usually occurring with prehension. Duration per exertion (D) is the average time, in seconds, for which an exertion is applied. Hand/wrist posture (P) refers to the anatomical position of the hand/wrist relative to anatomical neutral, in degrees. Duration of task per day (H) is the total time, in hours, that a task is performed per day. The RSI was designed such that a score of up to 10.0 is considered “safe,” and a score of >10.0 is considered “hazardous” (Garg et al., 2017b).
5.11.8 Composite Strain Index (COSI) and the Cumulative Strain Index (CUSI)
The main objectives of the Composite Strain Index (COSI) and the Cumulative Strain Index (CUSI) are to quantify varying physical exposures in order to determine the risk of musculoskeletal symptoms and disorders, either from a multi-subtask task or from a multi-task job. While these methods are conceptually similar to the Revised NIOSH Lifting Equation (Garg & Kapellusch, 2016), the COSI and CUSI algorithms combine biomechanical stressors from multiple subtasks at the task level and from multiple tasks at the job level to determine the risk of distal upper extremity disorders (Garg et al., 2017b). Since
the categorical variables and multipliers could reduce the practical utility of the prior COSI and CUSI algorithms, a task-level COSI to integrate stressors from multiple subtasks was proposed (Garg et al., 2017b). Furthermore, the CUSI was introduced to allow the integration of biomechanical stressors from different tasks (COSIs) performed during a work shift into a single, job-level exposure index (Garg et al., 2017b). The main components and procedure for the application of both methods are illustrated in Table 22.
6 RISK OF MUSCULOSKELETAL DISORDERS ASSOCIATED WITH THE USE OF MOBILE DIGITAL TECHNOLOGY
6.1 Technostress
The recent proliferation of digital mobile communication and computation technology, including the widespread use of laptops, tablets, and smartphones, can increase the risk of musculoskeletal disorders through exposure to such technology at work and in everyday life (Fares et al., 2017; Kim et al., 2012; Shah & Sheth, 2018; Sharan et al., 2012). Boonjing and Chanvarasuth (2017) used the term “technostress” to describe the stress resulting from the use of mobile information technology at work (see Figure 43). The results of their empirical study revealed that overusing smartphones can lead to technostress and to other problems affecting personal health and work-related outcomes. Shan et al. (2013) investigated the relationships between low back pain (LBP) and neck/shoulder pain (NSP) and high school students’ physical activity, use of digital products, and psychological status. Self-reported data on the use of personal computers (PCs), mobile phones, and tablet computers, on LBP and NSP, and on the level of physical activity were used. The reported prevalence of NSP and LBP was affected to varying degrees by the use of digital products, students’ grades, and mental status.
Specifically, the results showed that soreness after exercise, gender, tablet use, grade, PC use habits, sitting time after school, and academic stress were related to NSP, while gender, grade, PC use habits, soreness after exercise, sitting time after school, mobile phone use, and academic stress were associated with LBP. Ospina-Mateus et al. (2017) postulated that prolonged and inappropriate use of electronic devices could cause musculoskeletal discomfort and health problems, especially in children. They used the RULA method to analyze the postures of a 5-year-old girl while she used a laptop computer, a tablet, and a smartphone, and concluded that while sitting at a desk and standing were the safest positions, the neck-trunk area was at the highest risk of musculoskeletal disorders. Based on an experimental study of the effects on muscle activity of using computer products under different task categories that mitigate psychosocial stress, Taib et al. (2016) reported the lowest EMG activity for all studied muscles (upper trapezius, extensor digitorum, extensor carpi ulnaris, and anterior deltoid) during the use of a smartphone or tablet, compared with desktop and laptop computers, when used in a comfortable environment.
6.2 Digital Neck Syndrome
The recent proliferation of smartphones and other mobile digital devices can contribute to musculoskeletal problems (Ning et al., 2015; Xie et al., 2017). For example, Howie et al. (2017) examined the head, trunk, and arm postures, upper trapezius muscle activity, and total body and upper limb physical activity of children playing with tablets, compared with watching TV and playing with non-screen toys, and concluded that tablet use could contribute to increased musculoskeletal risk. Performing mobile device tasks that require users to look downwards or to hold their arms out in front of them to read
Table 22 Summary of the Composite Strain Index (COSI) and Cumulative Strain Index (CUSI)

The Composite Strain Index (COSI)
Purpose: Quantifies biomechanical stressors for complex tasks consisting of exertions at different force levels and/or with different exertion times.
Application: For a task consisting of a single subtask: COSI = RSI
Procedure: Calculate the Revised Strain Index (RSI) for each subtask and arrange them in descending order:
$$ \mathrm{RSI}_1 \ge \mathrm{RSI}_2 \ge \mathrm{RSI}_3 \ge \dots \ge \mathrm{RSI}_n $$
where n is the total number of subtasks in the task.
Calculate the COSI score. The COSI score is $\mathrm{RSI}_1$ (peak exposure subtask) plus an incremental increase in physical exposure, ΔRSI, as each subsequent subtask is added to the peak subtask:
$$ \mathrm{COSI} = \mathrm{RSI}_1 + \sum_{i=2}^{n} \Delta\mathrm{RSI}_i $$
Calculate ΔRSI for subtasks 2 to n:
$$ \Delta\mathrm{RSI}_i = \mathrm{FIRSI}_i \times \Delta EM_i $$
$$ \mathrm{FIRSI}_i = \mathrm{RSI}_i \div EM_i $$
$$ \Delta EM_i = EM\!\left(\sum_{j=1}^{i} E_j\right) - EM\!\left(\sum_{j=1}^{i-1} E_j\right) $$
where $\mathrm{RSI}_i$ is the RSI for the ith subtask, $\mathrm{FIRSI}_i$ is the frequency-independent RSI for subtask i, $EM_i$ is the efforts per minute (i.e., frequency) multiplier for subtask i, and $E_j$ is the exertions per minute for subtask j.
Conclusion: The COSI can be used to integrate biomechanical stressors from the multiple subtasks and summarize them at the task level.

The Cumulative Strain Index (CUSI)
Purpose: Integrates biomechanical stressors from different tasks to quantify exposure for the entire work shift.
Application: For a job consisting of a single task: CUSI = COSI
Procedure: Calculate the COSI for each task and arrange them in descending order:
$$ \mathrm{COSI}_1 \ge \mathrm{COSI}_2 \ge \mathrm{COSI}_3 \ge \dots \ge \mathrm{COSI}_m $$
where m is the total number of tasks performed in a work shift.
The CUSI score is $\mathrm{COSI}_1$ (i.e., the COSI of the peak exposure task) plus an incremental increase in physical exposure as each subsequent task is added to the peak task:
$$ \mathrm{CUSI} = \mathrm{COSI}_1 + \sum_{k=2}^{m} \Delta\mathrm{COSI}_k $$
Calculate ΔCOSI for tasks 2 to m:
$$ \Delta\mathrm{COSI}_k = \mathrm{HICOSI}_k \times \Delta HM_k $$
$$ \mathrm{HICOSI}_k = \mathrm{COSI}_k \div HM_k $$
$$ \Delta HM_k = HM\!\left(\sum_{j=1}^{k} H_j\right) - HM\!\left(\sum_{j=1}^{k-1} H_j\right) $$
where $\mathrm{HICOSI}_k$ is the hours-independent COSI, $\mathrm{COSI}_k$ is the COSI for the kth task, $HM_k$ is the hours per day multiplier for task k, and $H_j$ is the hours per day for task j.
Conclusion: The CUSI can be used to integrate biomechanical stressors from the multiple tasks and summarize them at the job level.

Source: Based on Garg et al., 2017a.
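The COSI and CUSI procedures in Table 22 share one structure: sort the components by score, start from the peak, and add each remaining component's frequency- (or hours-) independent score multiplied by the incremental change in the EM (or HM) multiplier. The Python sketch below is a minimal, unofficial illustration; it restates the RSI's EM and HM formulas so that it is self-contained, and its input format (lists of (score, exertions per minute) or (score, hours per day) pairs) is an assumption.

```python
import math


def em(e: float) -> float:
    """RSI efforts-per-minute multiplier EM(E)."""
    return 0.10 + 0.25 * e if e <= 90 else 0.00334 * e**1.96


def hm(h: float) -> float:
    """RSI hours-per-day multiplier HM(H)."""
    return 0.20 if h <= 0.05 else 0.042 * h + 0.090 * math.log(h) + 0.477


def cosi(subtasks):
    """subtasks: list of (rsi_i, exertions_per_min_i) pairs for one task."""
    if not subtasks:
        raise ValueError("at least one subtask is required")
    ordered = sorted(subtasks, key=lambda s: s[0], reverse=True)  # RSI_1 >= RSI_2 >= ...
    total, cum_e = ordered[0][0], ordered[0][1]                   # start from peak subtask
    for rsi_i, e_i in ordered[1:]:
        firsi = rsi_i / em(e_i)                    # frequency-independent RSI
        delta_em = em(cum_e + e_i) - em(cum_e)     # EM(sum to i) - EM(sum to i-1)
        total += firsi * delta_em
        cum_e += e_i
    return total


def cusi(tasks):
    """tasks: list of (cosi_k, hours_per_day_k) pairs for one job."""
    if not tasks:
        raise ValueError("at least one task is required")
    ordered = sorted(tasks, key=lambda t: t[0], reverse=True)     # COSI_1 >= COSI_2 >= ...
    total, cum_h = ordered[0][0], ordered[0][1]
    for cosi_k, h_k in ordered[1:]:
        hicosi = cosi_k / hm(h_k)                  # hours-independent COSI
        delta_hm = hm(cum_h + h_k) - hm(cum_h)     # HM(sum to k) - HM(sum to k-1)
        total += hicosi * delta_hm
        cum_h += h_k
    return total
```

For a task with subtasks (RSI = 6.0, E = 10/min) and (RSI = 3.0, E = 4/min), ΔEM = EM(14) − EM(10) = 1.0 and FIRSI = 3.0 / 1.1, so COSI ≈ 8.73. With a single subtask or a single task, the loops do not run, so COSI = RSI and CUSI = COSI.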
Figure 43 Technostress: overusing mobile devices (stress) leads to technostress (strain), with outcomes in personal health issues (physical and psychological problems) and work-related issues (job satisfaction, job efficiency, and job effectiveness). (Source: Boonjing & Chanvarasuth, 2017. © 2017 Elsevier.)
the screen can lead to fatigue and pain in the neck and shoulders (Fares et al., 2017). Kuo et al. (2019) reported that smartphone use increases head and neck flexion in different postures, including sitting with or without back support and standing. Furthermore, sitting without back support while texting produced the greatest head and neck flexion. Berolo et al. (2011) used a cross-sectional design, with an internet-based questionnaire, to collect self-reported measures of daily mobile hand-held device use and self-reported symptoms of pain in the upper extremity, upper back, and neck among students, staff, and faculty. The distribution of the reported musculoskeletal symptoms is shown in Table 23. It should be
noted that 68% and 62% of the participants reported pain in the neck and the upper back, respectively. The results also revealed significant associations between mobile hand-held device use and pain in the shoulders and neck. The total time spent using a mobile device on a typical day was also associated with pain reported in the left shoulder, the right shoulder, and the neck (Berolo et al., 2011). Based on a systematic literature review, Xie et al. (2017) concluded that holding a phone near the ear during phone calls for a long time is associated with a prolonged static posture, whereas texting and gaming are more correlated with repetitive movements; both involve sustained contraction of neck
Table 23 Distribution of Musculoskeletal Pain Symptoms (%)
Body part | None (0) | Slight (1–3) | Moderate (4–6) | Severe (7–10)
Right hand, tip of thumb | 87.9 | 10.7 | 1.4 | 0.0
Right hand, middle of thumb | 82.9 | 14.2 | 2.8 | 0.0
Right hand, base of thumb | 72.1 | 17.1 | 8.6 | 2.1
Right hand, fingers | 83.6 | 11.4 | 3.6 | 1.4
Right hand, front | 84.3 | 10.0 | 3.6 | 2.1
Right hand, back | 78.6 | 14.3 | 5.0 | 2.1
Left hand, tip of thumb | 92.1 | 7.1 | 0.7 | 0.0
Left hand, middle of thumb | 90.7 | 7.9 | 1.4 | 0.0
Left hand, base of thumb | 80.0 | 15.0 | 4.3 | 0.7
Left hand, fingers | 89.3 | 8.6 | 1.4 | 0.7
Left hand, front | 87.9 | 10.0 | 2.1 | 0.0
Left hand, back | 87.9 | 8.6 | 2.8 | 0.7
Elbows, right | 67.9 | 24.3 | 5.0 | 2.8
Elbows, left | 72.9 | 22.9 | 4.3 | 0.0
Shoulders, right | 47.9 | 30.0 | 16.4 | 5.7
Shoulders, left | 54.3 | 27.1 | 14.3 | 4.3
Neck | 32.1 | 32.1 | 26.4 | 9.3
Upper back | 37.9 | 33.6 | 20.7 | 7.9
Source: Berolo et al., 2011. © 2011 Elsevier. Note: Pain scale: none (0 on pain scale); slight (1–3 on pain scale); moderate (4–6 on pain scale); severe (7–10 on pain scale).
and shoulder muscles to maintain the posture or control the movement. Fares et al. (2017) indicated that prolonged forward neck flexion could contribute to nearsightedness, eye strain, or dry eyes, as the eyes are forced to focus on an object placed nearby. They also linked hyperkyphosis, which can lead to pulmonary disease and cardiovascular problems, to the forward-leaning postures observed when people engage in texting, studying, surfing the web, emailing, and playing video games. When looking at a smartphone or a tablet, it is harder to take a full breath since the ribs cannot move properly, which can diminish the performance of the heart and lungs. Fares et al. (2017) also suggested that younger people, who are the most frequent users of smartphones and tablets, are at risk of disabilities and reduced life expectancy. Shah and Sheth (2018) assessed self-reported smartphone addiction and correlated smartphone use with musculoskeletal disorders of the neck and hand in young, healthy adults. For this purpose, they used the Neck Disability Index (NDI), a 10-item, 50-point questionnaire that assesses the effects of neck pain and symptoms during a range of functional activities. A higher NDI score indicates greater neck disability. According to Shah and Sheth (2018), the NDI is the most widely used and validated instrument for assessing self-rated disability in patients with neck pain. As discussed by Xie et al. (2018), in contrast to unilateral texting, bilateral texting is associated with a significantly larger cervical flexion angle. This finding is likely attributable to differences in upper limb posture between the two texting methods. There may be a natural tendency of the human body to reduce loads on the shoulder by keeping the arm closer to the trunk and the face during one-handed texting, leading to a more
erect neck posture. Furthermore, while unilateral texting was associated with less neck flexion, it also resulted in more cervical asymmetry, with a significantly larger rotation angle to the right compared with bilateral texting.
6.3 Myofascial and Optometry Aspects of Neck Strain
Frequent users of smartphones can suffer from myofascial pain and headache caused by spinal deviations when holding the phone (Koleva, Yoshinov, & Yoshinov, 2017). Myofascial pain syndrome (MPS) is defined as the muscle, sensory, motor, and autonomic nervous system symptoms caused by stimulation of myofascial trigger points (MTPs) (Stecco et al., 2013). Another problem reported by users of mobile digital devices is the so-called digital eye strain, or digital vision syndrome, which involves a group of ocular and non-ocular symptoms among smartphone users (Coles-Brennan et al., 2019). Ocular symptoms of digital eye strain include tearing, tired eyes, blurred vision, general fatigue, burning sensation, redness, and double vision. Non-ocular symptoms include stiff neck, general fatigue, headache, and backache.
6.4 Prevention of Neck Strain from Using Mobile Devices
Based on a systematic literature review, Xie et al. (2017) concluded that a sustained flexed neck posture adopted by frequent mobile device users is probably one of the critical factors explaining the high prevalence of neck complaints. Therefore, correcting awkward neck postures while using mobile devices is an important strategy to reduce or prevent neck pain among users of mobile devices. However, Xie et al. (2017) also pointed out that more well-designed studies
are needed to identify what degree of neck flexion is critical for increasing the compressive loading on the cervical spine, and what duration of holding such a flexed posture constitutes a “safe” or critical threshold for not overloading the neck structures when using mobile devices. Xie et al. (2018) also pointed out that, owing to the different configurations of hand-held and desktop devices, ergonomic guidelines for computer use cannot be directly applied to smartphone use. Therefore, smartphone users are encouraged to take advantage of the portability and mobility of hand-held smartphones, keeping the spine erect and upright by adjusting the height at which the phone is held and by altering postures periodically. Gustafsson et al. (2017) examined whether texting on a mobile phone is a risk factor for musculoskeletal disorders in the neck and upper extremities due to smartphone design features, such as the placement of the keys and the screen in the same plane. A longitudinal population-based cohort study of Swedish young adults, with data collected at baseline and after one and five years, revealed several musculoskeletal problems. For example, when using a smartphone for texting, it is difficult to reach a comfortable posture for the arms, since most people hold the phone quite low in front of the belly, which requires significant neck flexion to read the screen. To prevent the development of MSDs due to extensive texting on mobile phones, Gustafsson et al. (2017) recommended sharing information about the risks of texting and adopting proper technique when texting on a mobile phone. Hwang et al. (2018) demonstrated that using a mobile phone without adequate chair support (armrest and back support) may elevate the risk of musculoskeletal disorders in the neck and shoulder regions by increasing head/neck flexion, gravitational moment, and the muscular demand on the neck and shoulder.
Therefore, the authors suggested that placing the phone at eye level with adequate chair support could significantly reduce the physical demands on the neck and shoulder. In order to lower the risk of musculoskeletal disorders due to smartphone use, Xie et al. (2018) recommended adopting two-handed texting or varying between different text input methods, such as entering text with the right thumb, the left thumb, or other fingers, or texting by voice. Finally, Neupane et al. (2017) discussed the main problems of text neck syndrome, such as muscle spasm, neck pain, stiffness, and postural deformity (see Figure 44), and provided the following guidelines to prevent neck problems:
• Warming up the neck muscles every 30–40 min of smartphone or e-reader use with short exercises, e.g., head rotation or changes in direction or posture (repeat 10 times).
• Stretching different muscles and holding each stretch for 10–30 seconds (e.g., side neck stretch, levator scapulae stretch, front neck stretch).
• Retracting the chin and scapulae and holding for 20–30 seconds to strengthen the neck and head stabilizing muscles, reducing neck pain and postural instability.
• Talking more and texting less to make more genuine connections.
• Resting: With most neck strains and sprains, going easy for a few days is needed while the muscles and tendons heal on their own. Care should be taken to avoid strenuous activities or movements that cause more pain.
• Applying ice (as an anti-inflammatory) to reduce swelling and pain.
• Getting a massage after applying ice or heat to soothe muscle tension and spasms and to reduce pain.
• Assuming better postures to keep the body, head, and neck aligned in a natural position, and learning to sleep on the back with an ergonomically friendly pillow and mattress.
• Modifying lifestyle by limiting the activities that cause neck pain (reducing smartphone use for texting; holding the phone up closer to eye level to keep the neck more upright while texting).
7 APPLICATIONS OF ARTIFICIAL INTELLIGENCE (AI) TO ASSESS THE RISK OF WORK-RELATED MUSCULOSKELETAL DISORDERS
The past two decades have witnessed rapid progress in applications of artificial intelligence, notably machine learning (ML) and deep learning (DL) methods, in a wide range of fields, including engineering, science, business, healthcare, and medicine, with applications to decision support systems, diagnosis, classification, prediction, and monitoring (Bengio, Courville, & Vincent, 2013; LeCun, Bengio, & Hinton, 2015; Schmidhuber, 2015). Recently, there have also been some interesting applications of AI in the area of health and safety at work (Achunair & Patel, 2020; Moore, 2019). For example, Cheng et al. (2020) developed a work injury database and used AI and machine learning (1) to accurately predict the cost of work injuries; (2) to predict the return-to-work (RTW) trajectory; and (3) to provide advice on appropriate medical care. The proposed Smart Work Injury Management (SWIM) system uses a two-stage process: (1) identifying human factors; and (2) using AI in the form of text mining, extracting, and
Figure 44 Illustration of text neck syndrome: aligned posture compared with head forward posture, showing (A) tight suboccipital, sternocleidomastoid, pectoralis, and latissimus dorsi muscles, (B) overstretched anterior and posterior neck muscles, and common places to feel pain. (Source: Neupane et al., 2017. © 2017 IJIR.)
selecting features, and developing a machine learning model. Also, Serna et al. (2019) discussed using AI-based virtual machines to identify operational errors in the structure of the code in order to reduce physical task repetition during software testing. Some other recent applications of AI-based methods and techniques to the assessment of the risk of work-related musculoskeletal disorders are described below. More research in this area is needed to take advantage of the remarkable capabilities of AI-based systems, which could be applied for workplace predictive
modeling, surveillance, evaluation, as well as diagnosis and intervention planning.
7.1 Artificial Neural Networks
The role of artificial intelligence in treating musculoskeletal disorders was recently discussed by Achunair and Patel (2020). For example, AI has proven to be effective in treating disorders like arthritis and spinal deformities (Al Kafri et al., 2020). Reed et al. (2020) applied an explainable deep learning method to
Figure 45 The ergonomic domains and criticality indicators. The domain groups are upper limb (UL), lower limb (LL), and stereotypy, loads, and typical actions (TA); each ergonomic indicator (e.g., trunk bending angle, waist rotation angle, arm height, neck bending or rotation angle, forearm rotation angle, wrist rotation and bending angles, knee bending angle, leg position, arm position for material withdrawal, trunk rotation, number of walked steps, carried loads, and the fraction F of effective task duration with respect to the working time Tn, i.e., the shift time minus the break intervals) is assigned a low, medium, or high criticality category. (Source: Savino et al., 2017. © 2017 Elsevier.)
evaluate complementary and integrative health treatments in patients with musculoskeletal disorders. They pointed out that the advantage of the deep neural network (DNN) approach is that it does not depend on the a priori assumptions and limitations of traditional statistical models. Furthermore, a DNN allows more complex, non-linear relationships to be represented among the large number of relevant factors associated with the utilization of complementary and integrative health (CIH) approaches for pain treatment among patients with musculoskeletal disorders. Al Kafri et al. (2020) described a framework for the detection of chronic lower back pain using artificial intelligence and computer graphics technologies and concluded that machine-learning methods can support the diagnosis and treatment of various forms of musculoskeletal disorders and thereby improve the patient’s condition. Hopkins et al. (2020) applied machine-learning algorithms to predict the presence of symptoms of cervical spondylotic myelopathy (CSM), a degenerative disease of the upper spine. A deep neural network classification model was successfully trained to diagnose and predict CSM severity with high accuracy. Recently, Sachdeva, Gupta, and Anand (2020) reported developing feed-forward back-propagation neural networks with fuzzified neurons for the assessment and forecasting of the potential risk of musculoskeletal disorders at work. The proposed system considers the effects of manual handling task-related variables, including frequency of lift, angle of twist, lifting height, and ambient conditions.
7.2 Fuzzy-Based Inference Systems
An early application of the fuzzy-based methodology for evaluating the risk factors that can lead to the development of work-related musculoskeletal disorders was reported by Nunes (2009). FAST ERGO X, a fuzzy-based expert system, is an ergonomic analysis tool that supports ergonomic auditing activities.
In addition, the system is capable of generating recommendations for reducing the risk factors present in the work sites under evaluation. A visual management and artificial intelligence system, integrated with a fuzzy rule-based method for evaluating full-body postures at work, was discussed by Savino, Battini, and Riccio (2017). The system uses ergonomic indicators of body stress based on postural (RULA) assessment, task repetition, and other task-related factors to derive the postural criticality factors for the upper and lower limbs, and generates recommendations for relevant job redesign actions (see Figure 45). The system architecture is shown in Figure 46. Recently, Amiri and Khadivar (2020) developed a fuzzy expert system for diagnosing and treating musculoskeletal disorders that is capable of diagnosing seven
Figure 46 A structure of the fuzzy-based model for assessment of postural indices: ergonomic indicators (upper limb; lower limb; stereotypy, loads, and typical actions) feed a fuzzy inference engine that outputs the postural criticality indices (upper limb, lower limb, and typical actions criticality indices). (Source: Savino et al., 2017. © 2017 Elsevier.)
disorders of the wrist. Clinical validation of the system showed no significant difference in performance between the diagnoses made by the fuzzy expert system and those made by elite medical experts.
7.3 Other AI-Based Approaches
An application of the K-nearest neighbor (KNN) algorithm, a nonparametric method, to classifying workers according to their risk of developing job-related musculoskeletal disorders was reported by Sanches et al. (2016). The experimental data used for model development and validation were based on the Sixth National Survey on Working Conditions (2007), which aimed to identify which workplace factors affect workers’ health and the extent to which workers are exposed. The survey also aimed to identify existing occupational health and safety management structures, to assess their activities according to the practical measures undertaken, and to identify trends in working conditions in Spain. A total of 5917 male and 5137 female workers responded to a questionnaire consisting of 78 items. The self-reported musculoskeletal symptoms and/or disorders included back pain; slipped disc; upper limb pain, including pain of the shoulder, arm, elbow, and forearm (excluding wrist, hand, and finger pain); wrist, hand, and finger pain; and lower limb pain, including pain of the hip, thigh, knee, lower leg, ankle, and feet. The proposed method was able to identify workers who reported work-related musculoskeletal disorders over the last twelve months. The model also revealed that workplace design was the most relevant factor in predicting musculoskeletal disorders, including such factors as poor lighting, exposure to vibrations, use of uncomfortable chairs, working postures, work schedules, high mental demands, and psychosocial variables. The KNN technique was reported to perform better than traditional statistical learning techniques in predicting the onset of musculoskeletal disorders, with very high specificity.
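To illustrate the KNN idea only, the Python sketch below classifies a worker by majority vote among the k most similar workers in a labeled training set. The four binary features, the toy training data, and k = 3 are hypothetical and unrelated to the survey variables or to the model reported by Sanches et al. (2016).

```python
import math
from collections import Counter

# Hypothetical binary exposure indicators per worker (NOT the survey's items):
# (poor_lighting, vibration, uncomfortable_chair, awkward_posture)
TRAIN = [
    ((1, 1, 1, 1), "WMSD"),
    ((1, 1, 0, 1), "WMSD"),
    ((0, 1, 1, 1), "WMSD"),
    ((0, 0, 0, 0), "no WMSD"),
    ((0, 0, 1, 0), "no WMSD"),
    ((1, 0, 0, 0), "no WMSD"),
]


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training examples."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training examples")
    # KNN is nonparametric: no model is fitted; the whole training set is kept and
    # examples are ranked by Euclidean distance to the query at prediction time.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With binary indicators, Euclidean distance is the square root of the number of mismatching features; a real application would also need to encode categorical survey answers and choose k by validation.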
Ahn et al. (2018) used a Bayesian network (BN) model to examine the relationships between job characteristics and work-related musculoskeletal disorders. They also used it to estimate the probability that an employee will develop a disorder under specific working conditions. The conceptual BN model is shown in Figure 47. The model was validated by testing a broad set of hypotheses. The results indicated that it achieved better diagnostic performance than other modeling approaches, including support vector machine, neural network, and decision tree models.
8 CONCLUSION
This chapter has shown that biomechanics provides a means to quantitatively consider the implications of workplace design. Biomechanical design considerations are important when a particular job is suspected of imposing large or repetitive forces on the structures of the body. It is particularly important to recognize that the internal structures of the body, such as muscles, are the primary generators of force within the joint and tendon structures. To evaluate the risk of injury due to a particular task, one must consider the contributions of both external and internal loads on a structure and how they relate to the tolerance of that structure. Armed with an understanding of the general biomechanical concepts presented in this chapter, and of how they apply to the parts of the body affected by work, one can reason logically through design considerations and trade-offs. The goal is a design of the work that minimizes musculoskeletal disorders.
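The load-versus-tolerance reasoning in the conclusion can be illustrated with a deliberately simplified, static, single-equivalent-muscle model of the low back. The sketch rests on several assumptions:

- The trunk is upright, so both the extensor muscle force and the weight of the load act along the spine.
- Body-segment weights are ignored.
- The extensor moment arm is a single fixed value of 0.05 m, an illustrative figure.

The 3400 N comparison value is the NIOSH compression action limit. This is not one of the multi-muscle, EMG-assisted models cited in this chapter. It only shows why the internal (muscle) force, rather than the external load itself, dominates spinal compression.

```python
# Simplified static, sagittally symmetric low-back model (illustrative only).
# Assumptions: upright trunk, body-segment weights ignored, one equivalent
# extensor muscle acting parallel to the spine with a fixed moment arm.
NIOSH_COMPRESSION_LIMIT_N = 3400.0  # NIOSH action limit for L5/S1 compression


def spine_compression(load_n, load_lever_m, muscle_lever_m=0.05):
    """Return (muscle force, spinal compression) in newtons for a static hold."""
    external_moment = load_n * load_lever_m          # N*m about the low back
    muscle_force = external_moment / muscle_lever_m  # internal moment balances it
    compression = muscle_force + load_n              # both act along the spine axis
    return muscle_force, compression


for lever in (0.2, 0.4, 0.6):
    muscle, comp = spine_compression(200.0, lever)
    ratio = comp / NIOSH_COMPRESSION_LIMIT_N
    print(f"load at {lever:.1f} m: muscle {muscle:.0f} N, "
          f"compression {comp:.0f} N ({ratio:.2f} x action limit)")
```

Under these assumptions, moving the same 200 N load from 0.2 m to 0.6 m in front of the spine triples the required muscle force, from 800 N to 2400 N. The reason is that the short muscle moment arm multiplies every increase in the external moment.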
Figure 47 The BN-based conceptual model for assessing the risk of musculoskeletal disorders. [Diagram with Plates A–E containing the nodes OC, WH, WR, RT, RFM1–RFM5, and WMSD1–WMSD3.] (Source: Adapted from Ahn et al., 2018.)
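The network in Figure 47 is too large to reproduce here, but the kind of inference it supports can be shown on a much smaller BN. The following Python sketch is not Ahn et al.’s (2018) model. It uses a hypothetical three-node chain: job demand → exposure to an awkward posture → WMSD. All conditional probability values are invented for illustration. The sketch answers two types of query by exhaustive enumeration of the joint distribution:

- a predictive query: the risk of a disorder given the working conditions;
- a diagnostic query: the likely working conditions given a disorder.

```python
from itertools import product

# Hypothetical conditional probability tables (values are illustrative only).
P_DEMAND = {"high": 0.4, "low": 0.6}   # P(D)
P_POSTURE = {"high": 0.7, "low": 0.2}  # P(R = exposed | D)
P_WMSD = {True: 0.30, False: 0.05}     # P(W = disorder | R)


def joint(demand, exposed, disorder):
    """P(D, R, W), factorised along the chain D -> R -> W."""
    p_r = P_POSTURE[demand] if exposed else 1.0 - P_POSTURE[demand]
    p_w = P_WMSD[exposed] if disorder else 1.0 - P_WMSD[exposed]
    return P_DEMAND[demand] * p_r * p_w


def query(target, evidence):
    """Posterior P(target | evidence) by summing the joint over all states."""
    numerator = denominator = 0.0
    for demand, exposed, disorder in product(P_DEMAND, (True, False), (True, False)):
        state = {"demand": demand, "exposed": exposed, "disorder": disorder}
        if any(state[name] != value for name, value in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint(demand, exposed, disorder)
        denominator += p
        if all(state[name] == value for name, value in target.items()):
            numerator += p
    return numerator / denominator


# Predictive: risk of a disorder given job demand.
print(round(query({"disorder": True}, {"demand": "high"}), 3))  # 0.225
print(round(query({"disorder": True}, {"demand": "low"}), 3))   # 0.1
# Diagnostic: probability of a high-demand job given a disorder.
print(round(query({"demand": "high"}, {"disorder": True}), 3))  # 0.6
```

Enumeration is exact but grows exponentially with the number of nodes. For a network the size of Figure 47, a dedicated BN inference engine would normally be used instead.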
REFERENCES
Achunair, A., & Patel, V. (2020). The role of artificial intelligence in treating musculoskeletal disorders. Critical Reviews in Physical and Rehabilitation Medicine, 32(1).
Adams, M. A., Freeman, B. J., Morrison, H. P., Nelson, I. W., & Dolan, P. (2000). Mechanical initiation of intervertebral disc degeneration. Spine, 25, 1625.
Agrawal, S., Sisodia, D. S., & Nagwani, N. K. (2018). Neuro-fuzzy approach for reconstruction of 3-D spine model using 2-D spine images and human anatomy. In International Conference on Next Generation Computing Technologies (pp. 102–115).
Ahn, G., Hur, S., & Jung, M.-C. (2018). Bayesian network model to diagnose WMSDs with working characteristics. International Journal of Occupational Safety and Ergonomics.
Alizadeh, M., Knapik, G. G., Mageswaran, P., Mendel, E., Bourekas, E., & Marras, W. S. (2020). Biomechanical musculoskeletal models of the cervical spine: A systematic literature review. Clinical Biomechanics, 71, 115–124.
Al Kafri, A. S., Sudirman, S., Hussain, A. J., Fergus, P., Al-Jumeily, D., Al-Jumaily, M., & Al-Askar, H. (2016). A framework on a computer assisted and systematic methodology for detection of chronic lower back pain using artificial intelligence and computer graphics technologies. In D.-S. Huang, V. Bevilacqua, & P. Premaratne (Eds.), Intelligent computing theories and application (pp. 843–854). Cham: Springer International Publishing.
Amiri, F. M., & Khadivar, A. (2017). A fuzzy expert system for diagnosis and treatment of musculoskeletal disorders in wrist. Tehnicki Vjesnik/Technical Gazette, 24.
Anderson, C. K., Chaffin, D. B., Herrin, G. D., & Matthews, L. S. (1985). A biomechanical model of the lumbosacral joint during lifting activities. Journal of Biomechanics, 18, 571.
Andersson, B. J., Ortengren, R., Nachemson, A. L., Elfstrom, G., & Broman, H. (1975). The sitting posture: An electromyographic and discometric study. Orthopedic Clinics of North America, 6, 105.
Andersson, G. B. (1997). The epidemiology of spinal disorders. In J. W. Frymoyer (Ed.), The adult spine: Principles and practice (Vol. 1, pp. 93–141). Philadelphia, PA: Lippincott-Raven Publishers.
Arjmand, N., & Shirazi-Adl, A. (2005). Biomechanics of changes in lumbar posture in static lifting. Spine, 30, 2637.
Armstrong, T. J., Best, C., Bae, S., Choi, J., Grieshaber, D. C., Park, D., … & Zhou, W. (2009, July). Development of a kinematic hand model for study and design of hose installation. In International Conference on Digital Human Modeling (pp. 85–94). Berlin: Springer.
Astrand, P. O., & Rodahl, K. (1977). The textbook of work physiology. New York: McGraw-Hill.
Azadeh, A., Fam, I. M., Khoshnoud, M., & Nikafrouz, M. (2008). Design and implementation of a fuzzy expert system for performance assessment of an integrated health, safety, environment (HSE) and ergonomics system: The case of a gas refinery. Information Sciences, 178(22), 4280–4300.
Barbe, M. F., & Barr, A. E. (2006). Inflammation and the pathophysiology of work-related musculoskeletal disorders. Brain, Behavior, and Immunity, 20(5), 423–429.
Barr, A. E., & Barbe, M. F. (2004). Inflammation reduces physiological tissue tolerance in the development of work-related musculoskeletal disorders. Journal of Electromyography and Kinesiology, 14(1), 77–85.
Basmajian, J. V., & De Luca, C. J. (1985). Muscles alive: Their functions revealed by electromyography (5th ed.). Baltimore, MD: Williams and Wilkins.
Battini, D., Botti, L., Mora, C., & Sgarbossa, F. (2018). Ergonomics and human factors in waste collection: Analysis and suggestions for the door-to-door method. IFAC-PapersOnLine, 51(11), 838–843.
Bazrgari, B., & Xia, T. (2017). Application of advanced biomechanical methods in studying low back pain–recent development in estimation of lower back loads and large-array surface electromyography and findings. Journal of Pain Research, 10, 1677.
Bean, J. C., Chaffin, D. B., & Schultz, A. B. (1988). Biomechanical model calculation of muscle contraction forces: A double linear programming method. Journal of Biomechanics, 21, 59.
Beheshti, M. H. (2014). Evaluating the potential risk of musculoskeletal disorders among bakers according to LUBA and ACGIH-HAL indices. Journal of Occupational Health and Epidemiology, 3(2), 72–80.
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 1798–1828. https://doi.org/10.1109/TPAMI.2013.50
Berolo, S., Wells, R. P., & Amick III, B. C. (2011). Musculoskeletal symptoms among mobile hand-held device users and their relationship to device use: A preliminary study in a Canadian university population. Applied Ergonomics, 42(2), 371–378.
Bettinger, P. C., Smutz, W. P., Linscheid, R. L., Cooney III, W. P., & An, K. N. (2000). Material properties of the trapezial and trapeziometacarpal ligaments. Journal of Hand Surgery America, 25, 1085.
Beumer, A., van Hemert, W. L., Swierstra, B. A., Jasper, L. E., & Belkoff, S. M. (2003). A biomechanical evaluation of the tibiofibular and tibiotalar ligaments of the ankle. Foot and Ankle International, 24, 426.
Bono, C. M., et al. (2007). Residual sagittal motion after lumbar fusion: A finite element analysis with implications on radiographic flexion-extension criteria. Spine, 32, 417.
Boonjing, V., & Chanvarasuth, P. (2017). Risk of overusing mobile phones: Technostress effect. Procedia Computer Science, 111, 196–202.
Borghetti, B. J., Giametta, J. J., & Rusnock, C. F. (2017). Assessing continuous operator workload with a hybrid scaffolded neuroergonomic modeling approach. Human Factors, 59(1), 134–146.
Bowden, A. E., Guerin, H. L., Villarraga, M. L., Patwardhan, A. G., & Ochoa, J. A. (2008). Quality of motion considerations in numerical analysis of motion restoring implants of the spine. Clinical Biomechanics (Bristol, Avon), 23, 536.
Brinkmann, P., Biggermann, M., & Hilweg, D. (1988). Fatigue fracture of human lumbar vertebrae. Clinical Biomechanics (Bristol, Avon), 3, S1.
Brown, S. H., Howarth, S. J., & McGill, S. M. (2005). Spine stability and the role of many muscles. Archives of Physical Medicine and Rehabilitation, 86, 1890; author reply p. 1890.
Bruno, A. G., Mokhtarzadeh, H., Allaire, B. T., Velie, K. R., De Paolis Kaluza, M. C., Anderson, D. E., & Bouxsein, M. L. (2017). Incorporation of CT-based measurements of trunk anatomy into subject-specific musculoskeletal models of the spine influences vertebral loading predictions. Journal of Orthopaedic Research, 35(10), 2164–2173.
Busto Serrano, N. M., García Nieto, P. J., Suárez Sánchez, A., Sánchez Lasheras, F., & Riesgo Fernández, P. (2018). A hybrid algorithm for the assessment of the influence of risk factors in the development of upper limb musculoskeletal disorders. In F. J. de Cos Juez (Ed.), Hybrid artificial intelligent systems. HAIS 2018 (pp. 634–646). Lecture Notes in Computer Science, vol. 10870. Cham: Springer.
Butler, D. L., Kay, M. D., & Stouffer, D. C. (1986). Comparison of material properties in fascicle-bone units from human patellar tendon and knee ligaments. Journal of Biomechanics, 19, 425.
Callaghan, J. P., & McGill, S. M. (2001). Intervertebral disc herniation: Studies on a porcine model exposed to highly repetitive flexion/extension motion with compressive force. Clinical Biomechanics (Bristol, Avon), 16, 28.
Cambridge, E. D. J. (2020). Hip & spine mechanics: Understanding the linkage from several perspectives of injury mechanisms to rehabilitation using biomechanical modelling. Spine.
Canonico, L. B., Flathmann, C., & McNeese, N. (2019). Collectively intelligent teams: Integrating team cognition, collective intelligence, and AI for future teaming. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 63(1), 1466–1470.
Cavanaugh, J. M. (1995). Neural mechanisms of lumbar pain. Spine, 20, 1804.
Cavanaugh, J. M., et al. (1997). Mechanisms of low back pain: A neurophysiologic and neuroanatomic study. Clinical Orthopaedics, 335, 166.
Ceriani, N. M., Zanchettin, A. M., Rocco, P., Stolt, A., & Robertsson, A. (2015). Reactive task adaptation based on hierarchical constraints classification for safe industrial robots. IEEE/ASME Transactions on Mechatronics, 20(6), 2935–2949.
Chaffin, D. B. (1969). A computerized biomechanical model: Development of and use in studying gross body actions. Journal of Biomechanics, 2, 429.
Chaffin, D. B. (1973). Localized muscle fatigue—definition and measurement. Journal of Occupational Medicine, 15, 346.
Chaffin, D. B., & Andersson, G. B. (1991). Occupational biomechanics. New York: Wiley.
Chaffin, D. B., Andersson, G. B. J., & Martin, B. J. (2006). Occupational biomechanics (4th ed.). New York: Wiley.
Chaffin, D. B., & Muzaffer, E. (1991). Three-dimensional biomechanical static strength prediction model sensitivity to postural and anthropometric inaccuracies. IIE Transactions, 23, 215.
Cheng, A. S., Ng, P. H., Sin, Z. P., Lai, S. H., & Law, S. W. (2020). Smart Work Injury Management (SWIM) system: Artificial intelligence in work disability management. Journal of Occupational Rehabilitation, 1–8.
Cholewicki, J., & McGill, S. M. (1994). EMG assisted optimization: A hybrid approach for estimating muscle forces in an indeterminate biomechanical model. Journal of Biomechanics, 27, 1287.
Cholewicki, J., Simons, A. P., & Radebold, A. (2000). Effects of external trunk loads on lumbar spine stability. Journal of Biomechanics, 33, 1377.
Cholewicki, J., & VanVliet, I. J. (2002). Relative contribution of trunk muscles to the stability of the lumbar spine during isometric exertions. Clinical Biomechanics (Bristol, Avon), 17, 99.
Cholewicki, J., et al. (2005). Delayed trunk muscle reflex responses increase the risk of low back injuries. Spine, 30, 2614.
Chollet, M., Ochs, M., & Pelachaud, C. (2017). A methodology for the automatic extraction and generation of non-verbal signals sequences conveying interpersonal attitudes. IEEE Transactions on Affective Computing.
Chouraqui, E., & Doniat, C. (2003). The s-ethos system: A methodology for systematic flight analysis centered on human factors. Applied Artificial Intelligence, 17(7), 583–629. https://doi.org/10.1080/713827211
Clavert, P., Kempf, J. F., & Kahn, J. L. (2009). Biomechanics of open Bankart and coracoid abutment procedures in a human cadaveric shoulder model. Journal of Shoulder and Elbow Surgery, 18, 69.
Coles-Brennan, C., Sulley, A., & Young, G. (2019). Management of digital eye strain. Clinical and Experimental Optometry, 102(1), 18–29.
Colombini, D., & Occhipinti, E. (Eds.). (2018). Working posture assessment: The TACOS (Time-Based Assessment Computerized Strategy) method. Boca Raton, FL: CRC Press.
Colombini, D., Occhipinti, E., & Álvarez-Casado, E. (2013). The revised OCRA Checklist method. Barcelona, Spain: Editorial Factors Humans.
Dash, R., McMurtrey, M., Rebman, C., & Kar, U. K. (2019). Application of artificial intelligence in automation of supply chain management. Journal of Strategic Innovation and Sustainability, 14(3).
Davis, K. G., Hou, Y., Marras, W. S., Karwowski, W., Zurada, J. M., & Kotowski, S. E. (2009). Utilization of a hybrid neuro-fuzzy engine to predict trunk muscle activity for sagittal lifting. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 52(15), 1064–1067. Los Angeles, CA: SAGE.
De Melo, C. M., Carnevale, P., Read, S., Antos, D., & Gratch, J. (2012). Bayesian model of the social effects of emotion in decision-making in multiagent systems. Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (Vol. 1, pp. 55–62).
Di Nardo, F., Mengarelli, A., Maranesi, E., Burattini, L., & Fioretti, S. (2015). Gender differences in the myoelectric activity of lower limb muscles in young healthy subjects during walking. Biomedical Signal Processing and Control, 19, 14–22.
Dooris, A. P., Goel, V. K., Grosland, N. M., Gilbertson, L. G., & Wilder, D. G. (2001). Load-sharing between anterior and posterior elements in a lumbar motion segment implanted with an artificial disc. Spine, 26, E122.
Douglas, E. C., & Gallagher, K. M. (2017). The influence of a semi-reclined seated posture on head and neck kinematics and muscle activity while reading a tablet computer. Applied Ergonomics, 60, 342–347.
Dul, J., Bruder, R., Buckle, P., Carayon, P., Falzon, P., Marras, W. S., Wilson, J. R., & van der Doelen, B. (2012). A strategy for human factors/ergonomics: Developing the discipline and profession. Ergonomics, 55(4), 377–395.
Eitivipart, A. C., Viriyarojanakul, S., & Redhead, L. (2018). Musculoskeletal disorder and pain associated with smartphone use: A systematic review of biomechanical evidence. Hong Kong Physiotherapy Journal, 38(2), 77–90.
Endsley, M. R. (2017). From here to autonomy: Lessons learned from human–automation research. Human Factors, 59(1), 5–27.
Erlandson, R. F., & Fleming, D. G. (1974). Uncertainty sets associated with saccadic eye movements—basis of satisfaction control. Vision Research, 14, 481.
Eskandari, A. H., Arjmand, N., Shirazi-Adl, A., & Farahmand, F. (2017). Subject-specific 2D/3D image registration and kinematics-driven musculoskeletal model of the spine. Journal of Biomechanics, 57, 18–26.
Fares, J., Fares, M. Y., & Fares, Y. (2017). Musculoskeletal neck pain in children and adolescents: Risk factors and complications. Surgical Neurology International, 8.
Ferguson, S. A., Allread, W. G., Burr, D. L., Heaney, C., & Marras, W. S. (2012). Biomechanical, psychosocial and individual risk factors predicting low back functional impairment among furniture distribution employees. Clinical Biomechanics, 27(2), 117–123.
Ferguson, S. A., Marras, W. S., & Burr, D. (2005). Workplace design guidelines for asymptomatic vs. low back injured workers. Applied Ergonomics, 36(1), 85–95.
Ferguson, S. A., Merryweather, A., Thiese, M. S., Hegmann, K. T., Lu, M.-L., Kapellusch, J. M., & Marras, W. S. (2019). Prevalence of low back pain, seeking medical care, and lost time due to low back pain among manual material handling workers in the United States. BMC Musculoskeletal Disorders, 20, 243. https://doi.org/10.1186/s12891-019-2594-0
Fremerey, R., Bastian, L., & Siebert, W. E. (2000). The coracoacromial ligament: Anatomical and biomechanical properties with respect to age and rotator cuff disease. Knee Surgery, Sports Traumatology, Arthroscopy, 8, 309.
Gaffar, A., & Monjezi, S. (2016). Using artificial intelligence to automatically customize modern car infotainment systems. Proceedings on the International Conference on Artificial Intelligence (ICAI), 151.
Gallagher, S., & Marras, W. S. (2012). Tolerance of the lumbar spine to shear: A review and recommended exposure limits. Clinical Biomechanics, 27(10), 973–978.
Gallagher, S., Marras, W. S., Litsky, A. S., & Burr, D. (2005). Torso flexion loads and the fatigue failure of human lumbosacral motion segments. Spine, 30, 2265.
Garg, A., & Kapellusch, J. M. (2016). The cumulative lifting index (CULI) for the revised NIOSH lifting equation: Quantifying risk for workers with job rotation. Human Factors, 58(5), 683–694.
Garg, A., Moore, J. S., & Kapellusch, J. M. (2017a). The Composite Strain Index (COSI) and Cumulative Strain Index (CUSI): Methodologies for quantifying biomechanical stressors for complex tasks and job rotation using the Revised Strain Index. Ergonomics, 60(8), 1033–1041.
Garg, A., Moore, J. S., & Kapellusch, J. M. (2017b). The Revised Strain Index: An improved upper extremity exposure assessment model. Ergonomics, 60(7), 912–922.
Gedalia, U., et al. (1999). Biomechanics of increased exposure to lumbar injury caused by cyclic loading. Part 2. Recovery of reflexive muscular stability with rest. Spine, 24, 2461.
Ghezelbash, F., Shirazi-Adl, A., Arjmand, N., El-Ouaaid, Z., & Plamondon, A. (2016). Subject-specific biomechanics of trunk: Musculoskeletal scaling, internal loads and intradiscal pressure estimation. Biomechanics and Modeling in Mechanobiology, 15(6), 1699–1712.
Gnanayutham, P. (2003). Artificial intelligence to enhance a brain computer interface. In HCI International 2003 Proceedings. Human Factors and Ergonomics.
Goel, V. K., & Gilbertson, L. G. (1995). Applications of the finite element method to thoracolumbar spinal research—past, present, and future. Spine, 20, 1719.
Goel, V. K., & Pope, M. H. (1995). Biomechanics of fusion and stabilization. Spine, 20, 85S.
Goodall, N. J. (2014). Ethical decision making during automated vehicle crashes. Transportation Research Record, 2424(1), 58–65.
Granata, K. P., & England, S. A. (2006). Stability of dynamic trunk movement. Spine, 31, E271.
Granata, K. P., & Marras, W. S. (1993). An EMG-assisted model of loads on the lumbar spine during asymmetric trunk extensions. Journal of Biomechanics, 26, 1429.
Granata, K. P., & Marras, W. S. (1995a). The influence of trunk muscle coactivity on dynamic spinal loads. Spine, 20, 913.
Granata, K. P., & Marras, W. S. (1995b). An EMG-assisted model of trunk loading during free-dynamic lifting. Journal of Biomechanics, 28(1), 309.
Granata, K. P., & Marras, W. S. (2000). Cost-benefit of muscle cocontraction in protecting against spinal instability. Spine, 25, 1398.
Granata, K. P., & Orishimo, K. F. (2001). Response of trunk muscle coactivation to changes in spinal stability. Journal of Biomechanics, 34, 1117.
Grandjean, E. (1982). Fitting the task to the man: An ergonomic approach. London: Taylor & Francis.
Guo, H. R. (1993). NIOSH Report. In American Occupational Health Conference.
Guo, H. R., Tanaka, S., Halperin, W. E., & Cameron, L. L. (1999, July). Back pain prevalence in US industry and estimates of lost workdays. American Journal of Public Health, 89, 1029.
Gustafsson, E., Thomée, S., Grimby-Ekman, A., & Hagberg, M. (2017). Texting on mobile phones and musculoskeletal disorders in young adults: A five-year cohort study. Applied Ergonomics, 58, 208–214.
Hajihosseinali, M., Arjmand, N., & Shirazi-Adl, A. (2015). Effect of body weight on spinal loads in various activities: A personalized biomechanical modeling approach. Journal of Biomechanics, 48(2), 276–282.
Heer, J. (2019). Agency plus automation: Designing artificial intelligence into interactive systems. Proceedings of the National Academy of Sciences, 116(6), 1844–1850.
Hefron, R. G., Borghetti, B. J., Christensen, J. C., & Kabban, C. M. S. (2017). Deep long short-term memory structures model temporal dependencies improving cognitive workload estimation. Pattern Recognition Letters, 94, 96–104.
Hewitt, J. D., Glisson, R. R., Guilak, F., & Vail, T. P. (2002). The mechanical properties of the human hip capsule ligaments. Journal of Arthroplasty, 17, 82.
Hewitt, J. D., Guilak, F., Glisson, R., & Vail, T. P. (2001). Regional material properties of the human hip joint capsule ligaments. Journal of Orthopaedic Research, 19, 359.
Hignett, S., & McAtamney, L. (2000). Rapid entire body assessment (REBA). Applied Ergonomics, 31(2), 201–205.
Hoefnagels, E. M., Waites, M. D., Wing, I. D., Belkoff, S. M., & Swierstra, B. A. (2007). Biomechanical comparison of the interosseous tibiofibular ligament and the anterior tibiofibular ligament. Foot and Ankle International, 28, 602.
Hollbrook, T. L., Grazier, K., Kelsey, J. L., & Stauffer, R. N. (1984). The frequency of occurrence, impact and cost of selected musculoskeletal conditions in the United States (pp. 24–45). Chicago, IL: American Academy of Orthopaedic Surgeons.
Hopkins, B. S., Weber, K. A., Kesavabhotla, K., Paliwal, M., Cantrell, D. R., & Smith, Z. A. (2019). Machine learning for the prediction of cervical spondylotic myelopathy: A post hoc pilot study of 28 participants. World Neurosurgery, 127, e436–e442.
Hou, Y., Zurada, J. M., Karwowski, W., & Marras, W. S. (2005). A hybrid neuro-fuzzy approach for spinal force evaluation in manual materials handling tasks. In International Conference on Natural Computation (pp. 1216–1225). Berlin: Springer.
Hou, Y., Zurada, J. M., Karwowski, W., Marras, W. S., & Davis, K. (2007). Estimation of the dynamic spinal forces using a recurrent fuzzy neural network. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 37(1), 100–109.
Hoy, D., March, L., Brooks, P., Blyth, F., Woolf, A., Bain, C., … & Murray, C. (2014). The global burden of low back pain: Estimates from the Global Burden of Disease 2010 study. Annals of the Rheumatic Diseases, 73(6), 968–974.
Hoy, M. G., Zajac, F. E., & Gordon, M. E. (1990). A musculoskeletal model of the human lower extremity: The effect of muscle, tendon, and moment arm on the moment-angle relationship of musculotendon actuators at the hip, knee, and ankle. Journal of Biomechanics, 23, 157.
Howie, E. K., Coenen, P., Campbell, A. C., Ranelli, S., & Straker, L. M. (2017). Head, trunk and arm posture amplitude and variation, muscle activity, sedentariness and physical activity of 3 to 5 year-old children during tablet computer use compared to television watching and toy play. Applied Ergonomics, 65, 41–50.
Hughes, R. E., & Chaffin, D. B. (1995). The effect of strict muscle stress limits on abdominal muscle force predictions for combined torsion and extension loadings. Journal of Biomechanics, 28, 527.
Hwang, C.-L., & Yoon, K. (1981). Methods for multiple attribute decision making. In Multiple attribute decision making (pp. 58–191). Cham: Springer.
Hwang, J., Knapik, G. G., Dufour, J. S., & Marras, W. S. (2017). Curved muscles in biomechanical models of the spine: A systematic literature review. Ergonomics, 60(4), 577–588.
Hwang, J., Dufour, J. S., Knapik, G. G., Best, T. M., Khan, S. N., Mendel, E., & Marras, W. S. (2016a). Prediction of magnetic resonance imaging-derived trunk muscle geometry with application to spine biomechanical modeling. Clinical Biomechanics, 37, 60–64.
Hwang, J., Knapik, G. G., Dufour, J. S., Best, T. M., Khan, S. N., Mendel, E., & Marras, W. S. (2016b). A biologically-assisted curved muscle model of the lumbar spine: Model validation. Clinical Biomechanics, 37, 153–159.
Hwang, J., Syamala, K. R., Ailneni, R. C., & Kim, J. H. (2018). Effects of chair support on biomechanical exposures on the neck during mobile phone use. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 62(1), 948–951.
IEA. (2020). Human Factors/ergonomics (HF/E). https://iea.cc/what-isergonomics/
Israelsen, B. W., & Ahmed, N. R. (2019). “Dave … I can assure you … That it’s going to be all right … ” A definition, case for, and survey of algorithmic assurances in human-autonomy trust relationships. ACM Computing Surveys (CSUR), 51(6), 1–37.
Jacquier-Bret, J., Gorce, P., Motti Lilian, G., & Vigouroux, N. (2017). Biomechanical analysis of upper limb during the use of touch screen: Motion strategies identification. Ergonomics, 60(3), 358–365.
Jafari, Z., Edrisi, M., & Marateb, H. R. (2014). An electromyographic-driven musculoskeletal torque model using neuro-fuzzy system identification: A case study. Journal of Medical Signals and Sensors, 4(4), 237.
Jager, M., Luttmann, A., & Laurig, W. (1991). Lumbar load during one-hand bricklaying. International Journal of Industrial Ergonomics, 8, 261.
Jahncke, H., Hygge, S., Mathiassen, S. E., Hallman, D., Mixter, S., & Lyskov, E. (2017). Variation at work: Alternations between physically and mentally demanding tasks in blue-collar occupations. Ergonomics, 60(9), 1218–1227.
Jie, L., Jian, C., & Lei, W. (2017). Design of multi-mode UAV human-computer interaction system. 2017 IEEE International Conference on Unmanned Systems (ICUS), pp. 353–357.
Johnston, J. D., Small, C. F., Bouxsein, M. L., & Pichora, D. R. (2004). Mechanical properties of the scapholunate ligament correlate with bone mineral density measurements of the hand. Journal of Orthopaedic Research, 22, 867.
Jones, R. E. (2015). Artificial intelligence and human teams: Examining the role of fuzzy cognitive maps to support team decision-making in a crisis-management simulation. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 59(1), 190–194.
Jung, H. J., Fisher, M. B., & Woo, S. L. (2009). Role of biomechanics in the understanding of normal, injured, and healing ligaments and tendons. Sports Medicine, Arthroscopy, Rehabilitation, Therapy and Technology, 1, 9.
Kadir, B. A., Broberg, O., & da Conceição, C. S. (2019). Current research and future perspectives on human factors and ergonomics in Industry 4.0. Computers & Industrial Engineering, 137, 106004.
Kanazawa, O. T. S. O. Y., & Georgescu, O. S. (2017). Application of artificial intelligence technology in product design. FUJITSU Science Technical Journal, 53(4), 43–51.
Karwowski, W. (2005). Ergonomics and human factors: The paradigms for science, engineering, design, technology and management of human-compatible systems. Ergonomics, 48(5), 436–463.
Karwowski, W., Gaweda, A., Marras, W. S., Davis, K., Zurada, J. M., & Rodrick, D. (2006). A fuzzy relational rule network modeling of electromyographical activity of trunk muscles in manual lifting based on trunk angles, moments, pelvic tilt and rotation angles. International Journal of Industrial Ergonomics, 36(10), 847–859.
Kee, D., & Karwowski, W. (2001). LUBA: An assessment technique for postural loading on the upper body based on joint motion discomfort and maximum holding time. Applied Ergonomics, 32(4), 357–366.
Khalsa, P. S. (2004, February). Biomechanics of musculoskeletal pain: Dynamics of the neuromatrix. Journal of Electromyography and Kinesiology, 14, 109.
Khan, A. (2013). Cognitive connected vehicle information system design requirement for safety: Role of Bayesian artificial intelligence. Journal of Systemics, Cybernetics and Informatics, 11(2), 54–59.
Khan, A. (2017). Modelling human factors for advanced driving assistance system design. In N. A. Stanton, S. Landry, G. DiBucchianico, & A. Vallicelli (Eds.), Advances in human aspects of transportation (Vol. 484, pp. 3–14). Cham: Springer International. https://doi.org/10.1007/978-3-319-41682-3_1
Khan, A. (2018). Cognitive vehicle design guided by human factors and supported by Bayesian artificial intelligence. In N. A. Stanton (Ed.), Advances in human aspects of transportation (Vol. 597, pp. 362–372). Cham: Springer International. https://doi.org/10.1007/978-3-319-60441-1_36
Kim, G. Y., Ahn, C. S., Jeon, H. W., & Lee, C. R. (2012). Effects of the use of smartphones on pain and muscle fatigue in the upper extremity. Journal of Physical Therapy Science, 24(12), 1255–1258.
Kim, J., Stuart-Buttle, C., & Marras, W. S. (1994). The effects of mats on back and leg fatigue. Applied Ergonomics, 25, 29.
Kistan, T., Gardi, A., & Sabatini, R. (2018). Machine learning and cognitive ergonomics in air traffic management: Recent developments and considerations for certification. Aerospace, 5(4), 103.
Koleva, I. B., Yoshinov, B. R., & Yoshinov, R. D. (2017). Physical therapy and manual therapy for prevention and rehabilitation of cervical myofascial pain and headache, due to spine malposition in users (abusers) of smart phones. Journal of Yoga and Physical Therapy, 7(268), 2.
Knapik, G. G., & Marras, W. S. (2009). Spine loading at different lumbar levels during pushing and pulling. Ergonomics, 52, 60.
Knapik, G. G., Mendel, E., & Marras, W. S. (2012). Use of a personalized hybrid biomechanical model to assess change in lumbar spine function with a TDR compared to an intact spine. European Spine Journal, 21(Suppl 5), S641–S652.
Konz, S. A. (1983). Work design: Industrial ergonomics (2nd ed.). Columbus, OH: Grid Publishing.
Kovacs, K., Splittstoesser, R., Maronitis, A., & Marras, W. S. (2002). Grip force and muscle activity differences due to glove type. AIHA Journal (Fairfax, VA), 63, 269.
Kroemer, K. H. E. (1987). Biomechanics of the human body. In G. Salvendy (Ed.), Handbook of human factors. New York: Wiley.
Kuo, Y.-R., Fang, J.-J., Wu, C.-T., Lin, R.-M., Su, P.-F., & Lin, C.-L. (2019). Analysis of a customized cervical collar to improve neck posture during smartphone usage: A comparative study in healthy subjects. European Spine Journal, 28(8), 1793–1803.
Kura, H., Luo, Z. P., Kitaoka, H. B., Smutz, W. P., & An, K. N. (2001). Mechanical behavior of the Lisfranc and dorsal cuneometatarsal ligaments: In vitro biomechanical study. Journal of Orthopaedic Trauma, 15, 107.
Lang, J., Ochsmann, E., Kraus, T., & Lang, J. W. (2012). Psychosocial work stressors as antecedents of musculoskeletal problems: A systematic review and meta-analysis of stability-adjusted longitudinal studies. Social Science & Medicine, 75(7), 1163–1174.
Lau, N., Fridman, L., Borghetti, B. J., & Lee, J. D. (2018). Machine learning and human factors: Status, applications, and future directions. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 62(1), 135–138.
Le, P., Aurand, A., Walter, B. A., Best, T. M., Khan, S. N., Mendel, E., & Marras, W. S. (2018). Development of a lumbar EMG-based coactivation index for the assessment of complex dynamic tasks. Ergonomics, 61(3), 381–389.
Le, P., Dufour, J., Monat, H., Rose, J., Huber, Z., Alder, E., Umar, R., Hennessey, B., Dutt, M., & Marras, W. S. (2012). Association between spinal loads and the psychophysical determination of maximum acceptable force during pushing tasks. Ergonomics, 55(9), 1104–1114.
Le, P., & Marras, W. S. (2016). Evaluating the low back biomechanics of three different office workstations: Seated, standing, and perching. Applied Ergonomics, 56, 170–178.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521, 436–444. https://doi.org/10.1038/nature14539
Lee, W., Karwowski, W., & Marras, W. S. (2000). In XIVth Triennial Congress of the International Ergonomics Association, 5, Human Factors and Ergonomics Society, San Diego, CA (pp. 276–279).
Lee, W., Karwowski, W., Marras, W. S., & Rodrick, D. (2003). A neuro-fuzzy model for estimating electromyographical activity of trunk muscles due to manual lifting. Ergonomics, 46(1–3), 285–309. https://doi.org/10.1080/00140130303520
Lewis Jr, G. K., Langer, M. D., Henderson Jr, C. R., & Ortiz, R. (2013). Design and evaluation of a wearable self-applied therapeutic ultrasound device for chronic myofascial pain. Ultrasound in Medicine & Biology, 39(8), 1429–1439.
Li, G., Pierce, J. E., & Herndon, J. H. (2006). A global optimization method for prediction of muscle forces of human musculoskeletal system. Journal of Biomechanics, 39, 522.
Lugano, G. (2017). Virtual assistants and self-driving cars: To what extent is artificial intelligence needed in next-generation autonomous vehicles? In J. Rak, M. Berbineau, J. Marais, & A. Vinel (Eds.), 2017 15th International Conference on ITS Telecommunications (ITST). IEEE.
Luo, X., Pietrobon, R., Sun, S. X., Liu, G. G., & Hey, L. (2004). Estimates and patterns of direct health care expenditures among individuals with back pain in the United States. Spine, 29, 79.
Makki, I., Alhalabi, W., & Adham, R. S. (2019). Using emotion analysis to define human factors of virtual reality wearables. Procedia Computer Science, 163, 154–164.
Marras, W. S. (2008). The working back: A systems view. Hoboken, NJ: Wiley-Interscience.
Marras, W. S. (2012). The complex spine: The multidimensional system of causal pathways for low-back disorders. Human Factors, 54(6), 881–889.
Marras, W. S. (2015). The secret life of the spine. Mechanical Engineering, 137(10), 30–35.
353 Marras, W. S., Allread, W. G., Burr, D. L., & Fathallah, F. A. (2000). Prospective validation of a low-back disorder risk model and assessment of ergonomic interventions associated with manual materials handling tasks. Ergonomics, 43, 1866. Marras, W. S., Davis, K. G., Heaney, C. A., Maronitis, A. B., & Allread, W. G. (2000). The influence of psychosocial stress, gender, and personality on mechanical loading of the lumbar spine. Spine, 25(23), 3045–3054. Marras, W. S., Davis, K. G., & Splittstoesser, R. E. (2001). Spine loading during whole body free dynamic lifting, Final project report. Grant #:R01 OH03289. Columbus, OH: The Ohio State University. Marras, W. S., Fine, L. J., Ferguson, S. A., & Waters, T. R. (1999). The effectiveness of commonly used lifting assessment methods to identify industrial jobs associated with elevated risk of low-back disorders. Ergonomics, 42, 229. Marras, W. S., & Granata, K. P. (1995). A biomechanical assessment and model of axial twisting in the thoracolumbar spine. Spine, 20, 1440. Marras, W. S., & Granata, K. P. (1997a). Spine loading during trunk lateral bending motions. Journal of Biomechanics, 30, 697. Marras, W. S., & Granata, K. P. (1997b). the development of an EMG-assisted model to assess spine loading during whole-body free-dynamic lifting. Journal of Electromyography and Kinesiology, 7, 259. Marras, W.S. & Hancock, P. A. (2014) Putting mind and body back together: a human-systems approach to the integration of the physical and cognitive dimensions of task design and operations. Applied Ergonomics, 45(1), 55–60. Marras, W. S., Knapik, G. G., & Ferguson, S. (2009a). Loading along the lumbar spine as influenced by speed, control, load magnitude, and handle height during pushing, Clinical Biomechanics (Bristol, Avon) , 24, 155. Marras, W. S., Knapik, G. G., & Ferguson, S. (2009b). Lumbar spine forces during maneuvering of ceiling-based and floor-based patient transfer devices. Ergonomics, 52, 384. Marras, W. 
S., Knapik, G. G., & Gabriel, J. (2008). The development of a personalized hybrid emg-assisted/finite element biomechanical model to access surgical options. In J. J. Yue, R. Bertagnoli, P. C. McAfee, & H. S. An (Eds.), Motion preservation surgery of the spine: advanced techniques and controversies (pp. 687–694). Philadelphia, PA: Saunders. Marras, W. S., Lavender, S. A., Ferguson, S. A., Splittstoesser, R. E., & Yang, G. (2010). Quantitative dynamic measures of physical exposure predict low back functional impairment. Spine (Phila Pa 1976) , 35, 914. Marras, W. S., & Mirka, G. A. (1993). electromyographic studies of the lumbar trunk musculature during the generation of low-level trunk acceleration. Journal of Orthopaedic Research, 11, 811. Marras, W. S., & Reilly, C. H. (1988). Networks of internal trunk-loading activities under controlled trunk-motion conditions. Spine, 13, 661. Marras, W. S., & Schoenmarklin, R. W. (1993). Wrist motions in industry. Ergonomics, 36, 341. Marras, W. S., & Sommerich, C. M. (1991a). A three-dimensional motion model of loads on the lumbar spine: I. Model structure, Human Factors, 33, 123. Marras, W. S., & Sommerich, C. M. (1991b). A three-dimensional motion model of loads on the lumbar spine: II. Model validation. Human Factors, 33(2), 139–149. Marras, W.S., Walter, B. A., Purmessur, D., Mageswaran, P., & Wiet, M. G. (2016). The contribution of biomechanical-biological interactions of the spine to low back pain. Human Factors, 58(7), 965–975. Marras, W. S., et al. (1993). The role of dynamic three-dimensional trunk motion in occupationally- related low back disorders. the effects of workplace factors, trunk position, and trunk motion characteristics on risk of injury. Spine, 18, 617.
354 Marras, W. S., et al. (1995). Biomechanical risk factors for occupationally related low back disorders. Ergonomics, 38, 377. Matsas, E., & Vosniakos, G.-C. (2017). Design of a virtual reality training system for human–robot collaboration in manufacturing tasks. International Journal on Interactive Design and Manufacturing (IJIDeM) , 11(2), 139–153. McAtamney, L., & Corlett, E. N. (1993). RULA: A survey method for the investigation of work-related upper limb disorders. Applied Ergonomics, 24(2), 91–99. McDonald, A. D., Lee, J. D., Schwarz, C., & Brown, T. L. (2014). Steering in a random forest: Ensemble learning for detecting drowsiness-related lane departures. Human Factors, 56(5), 986–998. McGill, S. M. (1997). The biomechanics of low back injury: Implications on current practice in industry and the clinic. Journal of Biomechanics, 30, 465. McGill, S. M. (2002). Low back disorders: Evidence-based prevention and rehabilitation. Champaign, IL: Human Kinetics. McGill, S. M. (2015). Low back disorders: Evidence-based prevention and rehabilitation (2nd ed.). Champaign, IL: Human Kinetics. McGill, S. M., & Norman, R. W. (1985). Dynamically and statically determined low back moments during lifting. Journal of Biomechanics, 18, 877. McGill, S. M., & Norman, R. W. (1986). Partitioning of the L4-L5 dynamic moment into disc, ligamentous, and muscular components during lifting [see comments]. Spine, 11, 666. McGill, S. M., Hughson, R. L., & Parks, K. (2000). Changes in lumbar lordosis modify the role of the extensor muscles. Clinical Biomechanics (Bristol, Avon), 15, 777. Michalos, G., Spiliotopoulos, J., Makris, S., & Chryssolouris, G. (2018). A method for planning human robot shared tasks. CIRP Journal of Manufacturing Science and Technology, 22, 76–90. Miedema, M. C., Douwes, M., & Dul, J. (1997). Recommended maximum holding times for prevention of discomfort of static standing postures. International Journal of Industrial Ergonomics, 19(1), 9–18. 
Miranda, H., Viikari-Juntura, E., Martikainen, R., Takala, E. P., & Riihimäki, H. (2002). Individual factors, occupational loading, and physical exercise as predictors of sciatic pain. Spine, 27(10), 1102–1108. Moore, J., & Garg, A. (1995). The strain index: A proposed method to analyze jobs for risk of distal upper extremity disorders. American Industrial Hygiene Association Journal, 56(5), 443–458. Moore, P. V. (2019). OSH and the future of work: benefits and risks of artificial intelligence tools in workplaces. In V. G. Duffy (Ed.). Digital human modeling and applications in health, safety, ergonomics and risk management. human body and motion (pp. 292–315). Cham: Springer. Moore, S. M., Ellis, B., Weiss, J. A., McMahon, P. J., & Debski, R. E. (2010). The glenohumeral capsule should be evaluated as a sheet of fibrous tissue: a validated finite element model, Annals of Biomedical Engineering, 38, 66. Nachemson, A. (1975). Towards a better understanding of low-back pain: a review of the mechanics of the lumbar disc. Rheumatology and Rehabilitation, 14, 129. Nachemson, A. L., & Evans, J. H. (1968). Some mechanical properties of the third human lumbar interlaminar ligament (ligamentum flavum). Journal of Biomechanics, 1, 211. Najmaei, N., & Kermani, M. R. (2010). Applications of artificial intelligence in safe human–robot interactions. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) , 41(2), 448–459. Naserkhaki, S., Jaremko, J. L., & El-Rich, M. (2016). Effects of inter-individual lumbar spine geometry variation on load-sharing: Geometrically personalized finite element study. Journal of Biomechanics, 49(13), 2909–2917.
DESIGN OF EQUIPMENT, TASKS, JOBS, AND ENVIRONMENTS National Institute for Occupational Safety and Health (NIOSH) (1981). Work Practices Guide for Manual Lifting, 81–122, Department of Health and Human Services (DHHS). Cincinnati, OH: NIOSH. National Institute for Occupational Safety and Health (NIOSH) (1994). Applications Manual for the Revised NIOSH Lifting Equation. Cincinnati, OH: NIOSH. National Research Council (NRC) (1999). Work-related musculoskeletal disorders: report, Workshop Summary, and Workshop Papers. Washington, DC: National Academy Press. National Research Council (NRC) (2001). Musculoskeletal disorders and the workplace: low back and upper extremity. Washington, DC; National Academy Press. Neerincx, M. A., Lindenberg, J., & van Maanen, P.-P. (2005). Integrating Human Factors and Artificial Intelligence in the Development of Human-Machine Cooperation. IC-AI. Nelson, N. A., & Hughes, R. E. (2009). Quantifying relationships between selected work-related risk factors and back pain: a systematic review of objective biomechanical measures and cost-related health outcomes. International Journal of Industrial Ergonomics, 39(1), 202–210. Neupane, S., Ali, U. I., & Mathew, A. (2017). Text neck syndrome-systematic review. Imperial Journal of Interdisciplinary Research, 3(7), 141–148. Nimbarte, A. D., Al Hassan, M. J., Guffey, S. E., & Myers, W. R. (2012). Influence of psychosocial stress and personality type on the biomechanical loading of neck and shoulder muscles. International Journal of Industrial Ergonomics, 42(5), 397–405. Ning, X., Huang, Y., Hu, B., & Nimbarte, A. D. (2015). Neck kinematics and muscle activity during mobile device operations. International Journal of Industrial Ergonomics, 48, 10–15. Nordin, M., & Frankel, V. (1989). Basic biomechanics of the musculoskeletal system (2nd ed.). Philadelphia, PA: Lea and Febiger. Nunes, I. L. (2009). FAST ERGO_X–a tool for ergonomic auditing and work-related musculoskeletal disorders prevention. 
Work, 34(2), 133–148. Nussbaum, M. A., & Chaffin, D. B. (1997). Pattern classification reveals intersubject group differences in lumbar muscle recruitment during static loading. Clinical Biomechanics (Bristol, Avon) , 12, 97. Occhipinti, E., & Colombini, D. (2001). The OCRA method: Assessment of exposure to occupational repetitive actions of the upper limbs. In International encyclopedia of ergonomics and human factors (Vol. 1, pp. 1875–1879). New York: Taylor & Francis/ Occhipinti, E., & Colombini, D. (2007). Updating reference values and predictive models of the OCRA method in the risk assessment of work-related musculoskeletal disorders of the upper limbs. Ergonomics, 50(11), 1727–1739. O’Connor, A. M., Tsafnat, G., Thomas, J., Glasziou, P., Gilbert, S. B., & Hutton, B. (2019). A question of trust: Can we build an evidence base to gain trust in systematic review automation technologies? Systematic Reviews, 8(1), 1–8. Olaya, S. S. P., Lehmann, R., & Wollschlaeger, M. (2019). Comprehensive management function models applied to heterogeneous industrial networks. 2019 IEEE 17th International Conference on Industrial Informatics (INDIN) , 1, 965–970. Ospina-Mateus, H., Niño-Prada, B., Tilbe-Ayola, K., & Contreras-Ortiz, S. (2017). Ergonomic and biomechanical evaluation of the use of computers, tablets and smart phones by children. A pilot study. VII Latin American Congress on Biomedical Engineering CLAIB (2016). Bucaramanga, Santander, Colombia, October 26–28, 2016, 320–324. Ozkaya, N., & Nordin, M. (1991). Fundamentals of biomechanics, equilibrium, motion and deformation. New York: Van Nostrand Reinhold. Padula, R. S., Comper, M. L. C., Sparer, E. H., & Dennerlein, J. T. (2017). Job rotation designed to prevent musculoskeletal disorders and control risk in manufacturing industries: A systematic review. Applied Ergonomics, 58, 386–397.
BASIC BIOMECHANICS AND WORKPLACE DESIGN Panjabi, M. M. (1992a). The stabilizing system of the spine. Part II. Neutral zone and instability hypothesis. Journal of Spinal Disorders, 5, 390. Panjabi, M. M. (1992b). The stabilizing system of the spine. Part I. function, dysfunction, adaptation, and enhancement. Journal of Spinal Disorders, 5, 383. Panjabi, M. M. (2003). Clinical spinal instability and low back pain. Journal of Electromyography and Kinesiology, 13, 371. Park, K., & Chaffin, D. (1974). A biomechanical evaluation of two methods of manual load lifting. AIIE Transactions, 6, 105. Pavlovic-Veselinovic, S., Hedge, A., & Veselinovic, M. (2016). An ergonomic expert system for risk assessment of work-related musculo-skeletal disorders. International Journal of Industrial Ergonomics, 53, 130–139. Pfaeffle, H. J., et al. (1996). Tensile properties of the interosseous membrane of the human forearm. Journal of Orthopaedic Research, 14, 842. Picchiotti, M.T., Weston, E.B., Knapik, G.G. Dufour, J.S., & W.S. Marras (2019) Impact of Two postural assist exoskeletons on biomechanical loading of the lumbar spine. Applied Ergonomics, 75, 1–7. Pintar, F. A., Yoganandan, N., Myers, T., Elhagediab, A., & Sances, Jr., A. (1992). Biomechanical properties of human lumbar spine ligaments. Journal of Biomechanics, 25, 1351. Pope, M. H. (1993). Report. In International Society of Biomechanics XIVth Congress, Paris. Praemer, A., Furner, S., & Rice. D. P. (1992). Musculoskeletal Conditions in the United States (pp. 23–33). Park Ridge, IL: American Academy of Orthopaedic Surgeons. Quapp, K. M., & Weiss, J. A. (1998). Material characterization of human medial collateral ligament. Journal of Biomechanical Engineering, 120, 757. Ragaglia, M., Zanchettin, A. M., & Rocco, P. (2018). Trajectory generation algorithm for safe human-robot collaboration based on multiple depth sensor measurements. Mechatronics, 55, 267–281. Rajnathsing, H., & Li, C. (2018). 
A neural network based monitoring system for safety in shared work-space human-robot collaboration. Industrial Robot: An International Journal, 45(4). 481–491. https://doi.org/10.1108/IR-04-2018-0079 Redd, D., Goulet, J., & Zeng-Treitler, Q. (2020, January 7). Using explainable deep learning and logistic regression to evaluate complementary and integrative health treatments in patients with musculoskeletal disorders. Regan, W. D., Korinek, S. L., Morrey, B. F., & An, K. N. (1991). Biomechanical study of ligaments around the elbow joint. Clinical Orthopaedics and Related Research, 271, 170. Reid, C.R., McCauley-Bush, P., Karwowski, W., & Durani, S. K. (2010). Occupational postural activity and lower extremity discomfort: A review. International Journal of Industrial Ergonomics, 40(3). 247–256. Reilly, C. H., & Marras, W. S. (1989). Simulift: A simulation model of human trunk motion. Spine, 14, 5. Rengevic, A., Kumicakova, D., Kuric, I., & Tlach, V. (2017). Approaches to the computer vision system proposal on purposes of objects recognition within the human-robot shared workspace collaboration. Communications – Scientific Letters of the University of Zilina, 19(2A), 68–73. Rodrigues, M. S., Sonne, M., Andrews, D. M., Tomazini, L. F., de Oliveira Sato, T., & Chaves, T. C. (2019). Rapid office strain assessment (ROSA): Cross-cultural validity, reliability and structural validity of the Brazilian-Portuguese version. Applied Ergonomics, 75, 143–154. Roman-Liu, D. (2014). Comparison of concepts in easy-to-use methods for MSD risk assessment. Applied Ergonomics, 45(3), 420–427. Rose, J. D., Mendel, E., & Marras, W. S. (2013) Carrying and spine loading. Ergonomics, 56(11). 1722–1732. Rusthollkarhu, S., & Aarikka-Stenroos, L. (2019). The effects of AI-human-interaction to value creation in multi-actor systems:
355 How AI shapes digital B2B sales. In K. Smolander, P. Grunbacher, S. Hyrynsalmi, & S. Jansen (Eds.). Proceedings of the 2nd ACM Sigsoft International Workshop on Software-Intensive Business: Start-Ups, Platforms, and Ecosystems (IWSIB ’19) (pp. 37–41). Association of Computing Machinery. https://doi.org/10 .1145/3340481.3342736 Sachdeva, A., Gupta, B. D., & Anand, S. (2011). Minimizing musculoskeletal disorders through fuzzified neural network approach. International Journal of Artificial Intelligence Applications, 2(3), 72–85. Sanders, M. S. & McCormick, E. F. (1993). Human factors in engineering and design. New York: McGraw-Hill. Savary-Leblanc, M. (2019). Improving MBSE Tools UX with AI-empowered software assistants. In L. Burgueno, A. Pretschner, S. Voss, M. Chaudron, J. Kienzle, … G. Kappel (Eds.). 2019 ACM/IEEE 22nd International Conference on Model Driven Engineering Languages and Systems Companion (models-C 2019) (pp. 648–652). IEEE Computer Society. https://doi.org/10 .1109/MODELS-C.2019.00099 Savino, M. M., Battini, D., & Riccio, C. (2017). Visual management and artificial intelligence integrated in a new fuzzy-based full body postural assessment. Computers & Industrial Engineering, 111, 596–608. Schaefer, K. E., Oh, J., Aksaray, D., & Barber, D. (2019). Integrating context into artificial intelligence: Research from the robotics collaborative technology alliance. AI Magazine, 40(3). 28–40. Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117. https://doi.org/https://doi.org/10 .1016/j.neunet.2014.09.003 Schoenmarklin, R. W., Marras, W. S., & Leurgans, S. E. (1994). Industrial wrist motions and incidence of hand/wrist cumulative trauma disorders. Ergonomics, 37, 1449. Schultz, A. B., & Andersson, G. B. (1981). Analysis of loads on the lumbar spine. Spine, 6, p. 76. Schultz, I. Z., Crook, J. M., Berkowitz, J., Meloche, G. R., Milner, R., & Zuberbier, O. A. (2008). 
Biopsychosocial multivariate predictive model of occupational low back disability. In Handbook of complex occupational disability claims (pp. 191–202). Cham: Springer. Schultz, I. Z., Crook, J., Meloche, G. R., Berkowitz, J., Milner, R., Zuberbier, O. A., & Meloche, W. (2004). Psychosocial factors predictive of occupational low back disability: towards development of a return-to-work model. Pain, 107(1–2), 77–85. Schurr, N., Good, R., Alexander, A., Picciano, P., Ganberg, G., Therrien, M., Beard, B. L., & Holbrook, J. (2010). A testbed for investigating task allocation strategies between air traffic controllers and automated agents. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10) (pp. 1839–1845). Association for Advancement in Artificial Intelligence. Sedighi, A., Sadati, N., Nasseroleslami, B., Vakilzadeh, M. K., Narimani, R., & Parnianpour, M. (2011). Control of human spine in repetitive sagittal plane flexion and extension motion using a CPG based ANN approach. In 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 8146–8149). IEEE. Serna, M. E., Acevedo, M, E., & Serna, A. A. (2019). Integration of properties of virtual reality, artificial neural networks, and artificial intelligence in the automation of software tests: A review. Journal of Software: Evolution and Process, 31(7). e2159. Shah, P. P., & Sheth, M. S. (2018). Correlation of smartphone use addiction with text neck syndrome and SMS thumb in physiotherapy students. International Journal of Community Medicine and Public Health, 5(6), 2512. Shan, Z., Deng, G., Li, J., Li, Y., Zhang, Y., & Zhao, Q. (2013). Correlational analysis of neck/shoulder pain and low back pain with the use of digital products, physical activity and psychological status among adolescents in Shanghai. PloS ONE, 8(10). e78109. Sharan, D., Mohandoss, M., Ranganathan, R., Jose, J. A., & Rajkumar, J. S. (2012). 
Distal upper extremity disorders due to extensive usage
356 of hand-held mobile devices. Human Factors in Organisational Design and Management, 51, 1041–1045. Shekhar, S. S. (2019). Artificial intelligence in automation. Artificial Intelligence, 3085(06), 14–17. Shirazi-Adl , A., Ahmed, A. M., & Shrivastava, S. C. (1986). A finite element study of a lumbar motion segment subjected to pure sagittal plane moments. Journal of Biomechanics, 19, 331. Silverstein, B. A., Stetson, D. S., Keyserling, W. M., & Fine, L. J. (1997). Work-related musculoskeletal disorders: Comparison of data sources for surveillance. American Journal of Industrial Medicine, 31, 600. Silverstein, M. A., Silverstein, B. A., & Franklin, G. M. (1996). Evidence for work-related musculoskeletal disorders: A scientific counterargument. Journal of Occupational and Environmental Medicine, 38, 477. Solomonow, M. (2004). Ligaments: A source of work-related musculoskeletal disorders. Journal of Electromyography and Kinesiology, 14, 49. Solomonow, M., Zhou, B. H., Baratta, R. V., Lu, Y., & Harris, M. (1999). Biomechanics of Increased exposure to lumbar injury caused by cyclic loading: Part 1. Loss of reflexive muscular stabilization. Spine, 24, 2426. Solomonow, M., Zhou, B., Baratta, R. V., Zhu, M., & Lu, Y. (2002). Neuromuscular disorders associated with static lumbar flexion: A feline model. Journal of Electromyography and Kinesiology, 12, 81. Solomonow, M., Zhou, B. H., Harris, M., Lu, Y., & Baratta, R. V. (1998). The ligamento-muscular stabilizing system of the spine. Spine, 23, 2552. Solomonow, M., et al. (2000). Biexponential recovery model of lumbar viscoelastic laxity and reflexive muscular activity after prolonged cyclic loading, Clinical Biomechanics (Bristol, Avon), 15, 167. Sonne, M., Villalta, D. L., & Andrews, D. M. (2012). Development and evaluation of an office ergonomic risk checklist: ROSA–Rapid office strain assessment. Applied Ergonomics, 43(1). 98–108. Souza, K. E., Seruffo, M. C., De Mello, H. D., Souza, D. D. S., & Vellasco, M. M. (2019). 
User experience evaluation using mouse tracking and artificial intelligence. IEEE Access, 7, 96506–96515. Splittstoesser, R.E., Marras, W.S., & Best, T.M. (2012) Immune responses to low back pain risk factors. Work, 41, 6016–6023. Stecco, A., Gesi, M., Stecco, C., & Stern, R. (2013). Fascial components of the myofascial pain Syndrome. Current Pain and Headache Reports, 17(8), 352. Straker, L. M., Coleman, J., Skoss, R., Maslen, B. A., Burgess-Limerick, R., & Pollock, C. M. (2008). A comparison of posture and muscle activity during tablet computer, desktop computer and paper use by young children. Ergonomics, 51(4), 540–555. Stubbs, M., et al. (1998). Ligamento-muscular protective reflex in the lumbar spine of the feline. Journal of Electromyography and Kinesiology, 8, 197. Suárez Sánchez, A., Iglesias-Rodríguez, F. J., Riesgo Fernández, P., & de Cos Juez, F. J. (2016). Applying the K-nearest neighbor technique to the classification of workers according to their risk of suffering musculoskeletal disorders. International Journal of Industrial Ergonomics, 52, 92–99. Subasi, A. (2012). Classification of EMG signals using combined features and soft computing techniques. Applied Soft Computing, 12(8), 2188–2198. Suwito, W., et al. (1992, March). Geometric and material property study of the human lumbar spine using the finite element method. Journal of Spinal Disorders, 5, 50. Taib, M. F. M., Bahn, S., & Yun, M. H. (2016). The effect of psychosocial stress on muscle activity during computer work: Comparative study between desktop computer and mobile computing products. Work, 54(3), 543–555. https://doi.org/10.3233/WOR-162334 Tasso, M. (2018). Multitask analysis of whole body working postures by TACOs: Criteria and tools. Congress of the International Ergonomics Association, 103–111.
DESIGN OF EQUIPMENT, TASKS, JOBS, AND ENVIRONMENTS Tegtmeier, P. (2018). A scoping review on smart mobile devices and physical strain. Work, 59(2), 273–283. Theado, E. W., Knapik, G. G., & Marras, W. S. (2007). Modification of an EMG-assisted biomechanical model for pushing and pulling. International Journal of Industrial Ergonomics, 37(11–12), 825–831. Thelen, D. G., Schultz, A. B., & Ashton-Miller, J. A. (1995). Co-contraction of lumbar muscles during the development of time-varying triaxial moments. Journal of Orthopaedic Research, 13, p. 390. Ticker, J. B., et al. (2006). The inferior glenohumeral ligament: A correlative investigation. Journal of Shoulder and Elbow Surgery, 15, 665. van Der Windt, D. A., Thomas, E., Pope, D. P., De Winter, A. F., Macfarlane, G. J., Bouter, L. M., & Silman, A. J. (2000). Occupational risk factors for shoulder pain: a systematic review. Occupational and Environmental Medicine, 57(7), 433–442. van Dieen, J. H., & Kingma, I. (2005, March 15). Effects of antagonistic co-contraction on differences between electromyography based and optimization based estimates of spinal forces. Ergonomics, 48, 411. van Dieen, J. H., Hoozemans, M. J., and Toussaint, H. M. (1999). Stoop or squat: A review of biomechanical studies on lifting technique. Clinical Biomechanics (Bristol, Avon) , 14, 685. Vatankhah, R., Broushaki, M., & Alasty, A. (2016). Adaptive optimal multi-critic based neuro-fuzzy control of MIMO human musculoskeletal arm model. Neurocomputing, 173, 1529–1537. Videman T., Nurminen, M., & Troup, J. D. (1990). 1990 Volvo award in clinical sciences. lumbar spinal pathology in cadaveric material in relation to history of back pain, occupation, and physical loading. Spine, 15, 728. Viegas, S. F., Yamaguchi, S., Boyd, N. L., & Patterson, R. M. (1999). The dorsal ligaments of the wrist: anatomy, mechanical properties, and function. Journal of Hand Surgery America, 24, 456. Waddell, G. (1987). 
1987 Volvo award in clinical sciences: A new clinical model for the treatment of low-back pain. Spine, 12(7), 632–644. Wang, J. L., Parnianpour, M., Shirazi-Adl, A., & Engin, A. E. (2000). Viscoelastic finite-element analysis of a lumbar motion segment in combined compression and sagittal flexion. Effect of loading rate. Spine, 25, 310. Waters, R. L., & Morris, J. M. (1973). An in vitro study of normal and scoliotic interspinous ligaments. Journal of Biomechanics, 6, 343. Waters, T. R., Putz-Anderson, V., Garg, A., & Fine, L. J. (1993). Revised NIOSH equation for the design and evaluation of manual lifting tasks. Ergonomics, 36, 749. Weitz, K., Schiller, D., Schlagowski, R., Huber, T., & Andre, E. (2020). “Let me explain!”: Exploring the potential of virtual agents in explainable AI interaction design. Journal on Multimodal User Interfaces. https://doi.org/10.1007/s12193-020-00332-0 Weston, E. B., Alizadeh, M, Knapik, G. G., Wang, X, & Marras, W. S. (2018) Biomechanical evaluation of exoskeleton use on loading of the lumbar spine. Applied Ergonomics, 68, 101–108. Weston, E. B. Aurand, A., Dufour, J. S., Knapik, G. G., & Marras, W. S. (2020) One versus Two-Handed lifting and lowering: lumbar spine loads and recommended one-handed limits protecting the lower back. Ergonomics (in press). Weston, E. B., Dufour, J. S., Lu, M-L., & Marras, W. S. (2019) Spinal Loading and lift style in Confined vertical space. Applied Ergonomics, 84. https://doi.org/10.1016/j.apergo.2019.103021 (in press). Wilson, J. R. (2014). Fundamentals of systems ergonomics/human factors. Applied Ergonomics, 45(1), 5–13. Woo, S. L., Debski, R. E., Withrow, J. D., & Janaushek, M. A. (1999). Biomechanics of knee ligaments, American Journal of Sports Medicine, 27, 533. Xie, Y. F., Szeto, G., & Dai, J. (2017). Prevalence and risk factors associated with musculoskeletal complaints among users of mobile
BASIC BIOMECHANICS AND WORKPLACE DESIGN handheld devices: A systematic review. Applied Ergonomics, 59, 132–142. Xie, Y. F., Szeto, G., Madeleine, P., & Tsang, S. (2018). Spinal kinematics during smartphone texting–A comparison between young adults with and without chronic neck-shoulder pain. Applied Ergonomics, 68, 160–168. Yang, G., Marras, W. S., & Best, T. M. (2011) The biochemical response to biomechanical tissue loading on the low back during physical work exposure. Clinical Biomechanics, 26(5), 431–437. Yang, L., Grooten, W. J., & Forsman, M. (2017). An iPhone application for upper arm posture and movement measurements. Applied Ergonomics, 65, 492–500. Yassine, E., Abdelaziz, B., & Larbi, B. (2017, November). Implementation of adaptive neuro fuzzy inference system for study of EMG-force relationship. In 2017 International Conference on Electrical and Information Technologies (ICEIT) (pp. 1–6). IEEE.
357 Yoganandan, N., Kumaresan, S., & Pintar, F. A. (2000). Geometric and mechanical properties of human cervical spine ligaments. Journal of Biomechanical Engineering, 122, 623. Yoganandan, N., Kumaresan, S., & Pintar, F. A. (2001). Biomechanics of the cervical spine part 2. cervical spine soft tissue responses and biomechanical modeling, Clinical Biomechanics (Bristol, Avon), 16, 1. Zander, T., Rohlmann, A., & Bergmann, G. (2004). Analysis of simulated single ligament transection on the mechanical behaviour of a lumbar functional spinal unit. Biomedical Technology (Berlin) , 49, 27. Zhang, X., Kuo, A. D., & Chaffin, D. B. (1998). Optimization-based differential kinematic modeling exhibits a velocity-control strategy for dynamic posture determination in seated reaching movements. Journal of Biomechanics, 31, 1035. Zurada, J., Karwowski, W. & Marras, W. S., (2004). classification of jobs with risk of low back disorders by applying data mining techniques. Occupational Ergonomics, 4(4). 291–305.
CHAPTER 13
THE CHANGING NATURE OF TASK ANALYSIS
Erik Hollnagel
University of Jönköping
Jönköping, Sweden
1 A NEED TO KNOW 358
1.1 Task Analysis and Planning 358
2 WHY IS TASK ANALYSIS NEEDED? 359
2.1 The Changing Nature of Work 359
2.2 The First Push: Productivity 359
2.3 The Second Push: The Inconsistency of the Human Factor 360
2.4 What Should TA Bring About or Produce? 361
3 APPROACHES TO TASK ANALYSIS: FROM ANALYSIS TO SYNTHESIS 361
3.1 Sequential Task Analysis 362
3.2 Hierarchical Task Organization 362
4 THE NEW REALITY 363
5 FROM STRUCTURAL TO FUNCTIONAL ANALYSIS 364
5.1 Bottom-Up Task Analysis 364
5.2 Top-Down Task Analysis 364
5.3 Functional Analysis 365
6 THE ROLE OF TASK ANALYSIS IN THE FUTURE 366
6.1 The Diminishing Role of the Human Factor 366
REFERENCES 366

1 A NEED TO KNOW

In order to ensure that something goes well, it is necessary to understand what goes on. The “what goes on” is commonly referred to as a task, and the purpose of task analysis (TA) is consequently to find out how something is done or should be done. This is necessary whether the concern is the layout of the workplace, the design of the artifacts needed at work, how to organize work, or how to manage the resources (people, equipment, information, and materials) necessary to ensure that the intended outcomes are achieved. Task analysis has, therefore, been one of the primary tools of human factors from the very beginning. More formally, the purpose of task analysis, as it is commonly practiced, is to describe tasks and in particular to identify the fundamental characteristics of a specific activity or set of activities. For the purposes listed above, a task can be defined as “any piece of work that has to be done,” which is generally taken to mean the set of activities or functions that are needed to bring about a desired and intended outcome.

Throughout most of the history of mankind, the organization of work activities happened naturally, in the sense that people, over time and through trial and error, would find out what was necessary to get something done. This incremental adaptation was adequate because work changed only slowly and innovations were infrequent, at least until the Industrial Revolution about 250–300 years ago. There was, therefore, time to absorb changes either in how work could be done (new technologies and new procedures) or in the conditions under which work took place, including increasing demands on the quality and quantity of the outcomes of work.

After the beginning of the twentieth century, and in particular during its second half, three things happened. The first was that new technologies, and new requirements arising from changing demands, both became more frequent. The second was that tasks tended to become more complicated or intricate owing to developments in society (e.g., Wiener, 1994). The third was that there was less time to get used to changes in conditions and demands; specifically, the traditional approach of adapting naturally to them no longer sufficed. Task analysis therefore became necessary to determine what people, individually and collectively, should do in order to achieve a given goal or objective. Put simply, it became necessary to determine who should do what, why it was necessary, and, finally, how and when it should be done.

1.1 Task Analysis and Planning

To prevent behavior or performance from becoming random and ineffective, it is necessary to plan what is to be done in as much detail as possible. Plans provide the structure of behavior, as argued by Miller, Galanter, and Pribram (1960), and everything we do can be seen as guided by plans, either implicitly in the form of routines and habits or explicitly in the form of instructions and procedures. When there is no plan, when we have no idea about what we should do, we say that control has been lost, as described, for example, by the scrambled and opportunistic control modes (Hollnagel, 1993).

Task analysis may seem similar, if not identical, to planning. It is, however, not the same thing. Plans are the rough-sketchy, flexible anticipations that usually suffice to organize what we do so that we can accomplish what we intend. Plans are what we use to structure behavior, but they are not “elaborate blueprints for every moment of the day” (Miller, Galanter, and Pribram, 1960). Furthermore, plans as the structure of behavior are mostly for ourselves rather than for others. Task descriptions, on the other hand, are the plans that we make for others rather than for ourselves. They represent our best understanding (or hopes) of how something can or should be done. The descriptions are intended to help people, both those who do the work and those who must manage and monitor it. Tasks describe Work-as-Imagined in the best possible sense of the term.
2 WHY IS TASK ANALYSIS NEEDED?

Throughout most of their existence, humans have depended on tools or artifacts to achieve their purposes, from the Denisovans’ stone tools 200,000 years ago, to the farmer’s plow or the blacksmith’s hammer, to machines and factories, and to today’s robots and computers. As long as users were artisans rather than workers, and as long as work was an individual rather than a collective endeavor, there was no real need for prior planning. Anything resembling task analysis was therefore minimal or even non-existent. Local communities made the tools they needed whenever the need arose, and made them for themselves rather than for others. The demand for task analysis arose only when the use of technology became more pronounced or even indispensable, especially when tools changed from being simple to being complex. More generally, the need for a formal task analysis arises when one or more of the following three conditions are met:

• When the physical and/or mental resources needed to accomplish a task exceed what a single human can provide, or when they depend on a combination of skills that goes beyond what a single individual can be expected to master. In such cases, task analysis can be used to break down a composite activity into a number of simpler and more elementary activities. For example, building a ship, in contrast to building a dinghy or a simple raft, requires the collaboration of many individuals and the coordination of many different types of work. Here work must be organized, and people have to collaborate and must therefore adjust what they do to match the progress and demands of others. Task analysis is needed to identify what a single person can achieve or provide over a reasonable period of time, as well as to propose a way to combine and schedule the activities of the many to produce an overall whole.
• When tasks become so complex that a single person can no longer control or comprehend them. This may happen when a task becomes so large or takes so long that one person is unable to complete it (i.e., the transition from individual to collective tasks) and the task cannot be suspended, that is, interrupted and resumed later. It may also happen when the execution of the task depends on technological artifacts whose very use becomes a task in its own right. This is the case, for instance, when the artifacts begin to function in an independent or semi-autonomous manner (i.e., when they begin partially to regulate themselves rather than passively provide single, well-defined functions in response to the user’s directions). Task analysis can similarly be used to describe situations where use of the technology is no longer straightforward but requires mastery of other skills to such a degree that not everyone can use the equipment directly as designed and intended.
• A similar condition arises when the operation of the technology itself (the machines) becomes so complex that the situation changes from simply being one of using the technology to one of learning how to understand, master, or control the technology. In other words, being in control of the technology becomes a task in itself, as a means to carry out the original task. Examples are driving a car in contrast to riding an ordinary bicycle, using a food processor instead of a knife, using a computer (as in writing this chapter) rather than paper and pencil, and
so on. In these cases, and of course also in cases of far more complex work, the use of technology is not always intuitive and straightforward but may require preparation, training, and prior thought, either by the people who do the work or by those who prepare tasks or tools for others. In such cases, task analysis can be used to understand the impact of the technology on the overall ability to carry out the work.

2.1 The Changing Nature of Work

Altogether, task analysis became necessary when technology began to encroach on work, when machines changed from being reactive, merely responding, to actually doing things themselves, as automata. Although automata have been known for thousands of years, it was only after the middle of the twentieth century, due to the technological advances made in the 1940s and 1950s (information theory, computing machinery, cybernetics, etc.), that non-trivial activities could be carried out by machines. It became clear that machines could do specific parts of a task better than humans, and this became the motivation for the new science of human factors engineering (Wickens & Hollands, 2000). In the beginning, the problem was how to balance or trade off the comparative advantages and disadvantages of humans relative to technology, rather than a complete takeover or automation of tasks. That, however, soon happened, leading to a need for task analysis as a means to ensure that the human–machine system (where the human was seen as a component that had to be engineered) could be made to perform optimally from an engineering point of view.

The increasing dependence on technology also meant that work changed from something that could be done by an unaided individual to something requiring the collective efforts either of people working together or of people working with technology. Although collective work has existed since the beginning of history, its presence became more conspicuous after the Industrial Revolution about 250 years ago.
In this new situation the work of the individual became part of the work of a larger system, and individual control of work was consequently often lost. The role of the human relative to technology changed from tracking (closed-loop, short-term detection and correction), through regulating (operation of subsystems over longer time spans) and monitoring (following developments, selecting and generating plans, maintaining the performance envelope), to, finally, targeting (selecting goals to match conditions, process optimization). The worker became part of a larger context, a cog in complex social machinery that defined the demands and constraints of work (Hollnagel & Woods, 2005). An important consequence was that people could no longer work at a pace suitable for them and pause whenever needed, but instead had to comply with the pace set by others, which increasingly meant the pace set by machines.

2.2 The First Push: Productivity

The first push that helped establish task analysis came with the formulation of the Scientific Management discipline. In 1880, Frederick W. Taylor had advanced to foreman at the Midvale Steel Company, one of America’s great armor plate-making plants, after starting as a clerk in 1877. In this role, Taylor observed that most workers who were forced to perform repetitive tasks tended to work at the slowest rate that would go unpunished. Taylor was, indeed, “constantly impressed by the failure of his neighbors to produce more than about one-third of a good day’s work” (Drury, 1918, p. 23). He became convinced that it would be possible to improve efficiency, and thereby increase productivity, by studying how work was done in a
DESIGN OF EQUIPMENT, TASKS, JOBS, AND ENVIRONMENTS
systematic or “scientific” manner, that is, by analyzing and optimizing workflows. He started to do so by combining his own method of time studies with Frank and Lillian Gilbreth’s work on motion studies (Gilbreth, 1911) to describe a rational analysis that could be used to uncover the one best way of performing any particular task. (In modern terms this would be called Work-as-Imagined.) This work culminated in the publication of a report entitled The Principles of Scientific Management (Taylor, 1911).

A classic example is Taylor’s (1911) analysis of the handling of pig iron, where the work was done by men with no “tools” other than their hands. The individual worker was thus seen as the basic component, or production unit, for the analysis of work. A pig-iron handler would stoop down, pick up a pig weighing about 92 pounds, walk up an inclined plank, and drop it on the end of a railroad car. Taylor and his associates found that a gang of pig-iron handlers was loading on average about 12.5 long tons per man per day. The aim of the study was to raise this output to 47 tons a day, nearly four times as much, not by making the men work harder but by reducing the number of unnecessary movements. This was to be achieved both by careful motion-and-time studies and by a system of incentives that would benefit workers as well as management.

Scientific Management proposed four principles as the basis for studies of work:
• The development of the science of work with rigid rules for each motion of every person, and the perfection and standardization of all implements and working conditions.
• The careful selection and subsequent training of workers into first-class people, and the elimination of all people who refuse to or are unable to adopt the best methods.
• Bringing the first-class workers and the science of working together, through the constant help and watchfulness of the management, and through paying each person a large daily bonus for working fast and doing what he is told to do.
• An almost equal division of the work and responsibility between workers and management.

The first principle is clearly recognizable even today. Indeed, the improvements in productivity that Scientific Management strove to bring about later became the mantra of Lean Manufacturing (Womack, Jones, & Roos, 2007). The belief that work can be standardized is by now ubiquitous, not just in relation to productivity and quality but also, and sometimes even more strongly, in relation to safety. The other principles have lost much of their relevance because industrialized societies have changed how they view their citizens.

2.3 The Second Push: The Inconsistency of the Human Factor

Scientific Management had resolved the productivity issue when work was mostly manual. But after the IT revolution in the late 1940s, work changed from being work with the body to being work with the mind, in other words from being predominantly manual to being predominantly cognitive. New technologies that increased the speed and precision of industrial processes, whether in manufacturing, construction, or transportation, were assumed to lead to corresponding improvements in productivity. When this did not happen as expected, the accepted explanation was that humans were too imprecise, variable, and slow. Humans therefore had to be “engineered” in order to make full use of the increased capabilities of machines
and technology. The problem was clearly stated in a seminal report by the psychologist Paul Fitts from the Ohio State University Research Foundation for the Air Navigation Development Board:

We begin with a brief analysis of the essential functions of the air-traffic control problem. We then consider the basic question: Which of these functions should be performed by human operators and which by machine elements? … Human-engineering research … aims to provide principles governing the design of tasks and machines in relation to man’s capabilities, and to insure an efficient integration of man and machines for the accomplishment of an overall task. (Fitts, 1951, p. X)

Where Taylor had analyzed work in order to find the one best way it could be carried out by human workers, Fitts analyzed work in terms of the capabilities it required, in order to assign specific parts of the task to either humans or technology. Fitts proposed that this should be based on a comparison of humans and machines in terms of a set of specific abilities, which became known as the Fitts List (Table 1) and is now often referred to as the Men-Are-Better-At/Machines-Are-Better-At (MABA-MABA) list (Dekker & Woods, 2002). From this perspective, the operator was no longer seen primarily as a worker or as a self-contained “production unit.” The essential role of people was not as a source of physical power but as a controller of technology and as a provider of functionality that the machines needed but were as yet unable to provide themselves. This was clearly expressed in the following statement, taken from the abstract of the report that heralded the new field of man–machine task analysis:

The operator is treated as part of the system’s linkages from input to output functions.
Information displayed to the operator is analysed into essential discrimination requirements; control activations necessary to control the machine’s outputs are analysed into component ‘effector’ or response requirements. (Miller, 1953, p. iii)

Echoing Taylor’s second principle (see above), Miller went on to explain: “The general purpose behind the development of these task analysis procedures was to fulfill the need for specifying training requirements in relatively detailed and unambiguous psychological terms” (Miller, 1953, p. 1). The training was needed because, unlike today, no one in the 1950s had any experience in serving as a “transducer” or “information processor.” The need to provide that function was even greater at the time because computing machinery
was in its infancy and lacked the sophistication that software developments in the 1970s and 1980s would bring.

Table 1 A Comparison of Men and Machine Capabilities
What can men do better than machines? Detection, perception, judgment, induction, improvisation, long-term memory.
What can machines do better than men? Speed, power, computation, replication, simultaneous operations, short-term memory.
Source: Based on Fitts, 1951.

The imperfections and limitations of humans as machines, something that would later practically become a dogma of human factors, were also clearly recognized as an obstacle to overcome:

A thorough task analysis may be used as an aid in modifying job operations so as to make their performance more simple and less liable to error … Furthermore, a task analysis obviously provides a basis for setting up a training syllabus. (Miller, 1953, p. 2)

To capture the more complex task organization, Miller (1953) developed a method for human–machine task analysis in which main task functions could be decomposed into subtasks, thus heralding hierarchical task analysis. Each subtask could then be described in detail, for instance, by focusing on information display requirements and control actions. The recommended procedure for task analysis began by specifying the human–machine system criterion output, in other words, what the human–machine system should be able to accomplish. The context was military rather than civilian, and among the examples provided were to fly an airplane on a mission from origin to final destination, to command a crew to coordinate and accomplish a mission, or to communicate information to outside agencies. The definition of what the (hu)man–machine system should do was followed by several steps to determine which information should be displayed to the operator and which options he or she should be given so that the machine could be controlled. A new distinction was also made between discontinuous (procedural) and continuous (tracking) tasks, something that had not been needed in Taylor’s analysis of manual work. The final step in the task analysis was to ensure “that each stimulus is linked to a response and that each response is linked to a stimulus” (Miller, 1953, p. 13).
This was consistent with the mainstream psychological thinking of the time, which was based on stimulus–response couplings. The operator was, in other words, seen as a transducer or a machine that was coupled to the “real” machine. In order for the (hu)man–machine system to work, it was necessary that the operator interpret the machine’s output in the proper way and respond with the correct input. As a consequence, designers were forced willy-nilly to think of operators as “machines,” as simple but not quite reliable automata that would provide the required responses to given stimuli (Hollnagel & Woods, 2005). The purpose of task analysis became to determine what the operator had to do to enable the machine to function as efficiently as possible.

2.4 What Should TA Bring About or Produce?

Task analysis is needed to provide concrete and practical answers to the questions of how things are done or should be done. When dealing with work, and more generally with how people use socio-technical artifacts to do their work, it is necessary to know both what activities (functions) are required to accomplish a specified objective and how people habitually go about doing them, particularly since the latter usually differs, sometimes significantly, from the former. Such knowledge is necessary to design, implement, and manage socio-technical systems, and task analysis looks specifically at how work takes place and how it can be facilitated. The focus of task analysis has inevitably changed as the nature of work has changed. At the beginning of the twentieth
century, work was mostly physical and therefore observable, and the methods were the time-and-motion studies used by Taylor and his contemporaries. Around the middle of the century, when human factors became established as a discipline, task analysis still focused on the human, but now as an information transducer rather than as a source of power or a “machine” in itself. This view became more refined as computers became the dominant “tool,” as illustrated by GOMS (Card et al., 1983). The purpose of GOMS, an acronym for “goals, operators, methods, and selection rules,” was to provide a system for modeling and describing human task performance in the context of computer–human interaction. The term operators, one of the four components of GOMS, did not actually refer to people but denoted the set of atomic-level operations that a user could combine to achieve a goal, while the term methods referred to sequences of operators, or elementary operations, grouped together to accomplish a single goal, and selection rules determined which method to apply when more than one could accomplish the goal.

The focus of a task analysis method such as GOMS was completely different from what Scientific Management had in mind. For example, the manual operators of GOMS were: Keystroke key_name, Type_in string_of_characters, Click mouse_button, Double_click mouse_button, Hold_down mouse_button, Release mouse_button, Point_to target_object, and Home_to destination. It is interesting to compare these to the categories of basic tasks that Frank Gilbreth had offered. Based on observations of human movement by trained time-and-motion specialists, specifically of the fundamental motions of a worker’s hands, Gilbreth found that it was possible to distinguish among the following 17 types of motion: search, select, grasp, reach, move, hold, release, position, preposition, inspect, assemble, disassemble, use, unavoidable delay, wait (avoidable delay), plan, and rest (to overcome fatigue).
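To make the four GOMS components concrete, here is a minimal Python sketch; it is not the notation of Card et al. (1983) or of any GOMS tool. The goal (“delete word”), its two methods, and the selection rule are hypothetical illustrations; only the operator names Point_to, Double_click, and Keystroke come from the list of GOMS manual operators. A method is modeled as an ordered list of operators, and the selection rule is a function that picks one method given the current context.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Operator:
    """An atomic-level operation, such as Keystroke or Point_to."""
    name: str
    argument: str


@dataclass
class Method:
    """A sequence of operators grouped together to accomplish a single goal."""
    name: str
    operators: List[Operator]


@dataclass
class Goal:
    """A goal with alternative methods and a selection rule that picks one."""
    name: str
    methods: Dict[str, Method]
    select: Callable[[dict], str]  # selection rule: context -> method name

    def plan(self, context: dict) -> List[Operator]:
        """Return the operator sequence of the method chosen for this context."""
        return self.methods[self.select(context)].operators


# Hypothetical goal, methods, and selection rule (illustrative only).
delete_word = Goal(
    name="delete word",
    methods={
        "mouse": Method("mouse", [
            Operator("Point_to", "word"),
            Operator("Double_click", "left"),
            Operator("Keystroke", "DELETE"),
        ]),
        "keyboard": Method("keyboard", [
            Operator("Keystroke", "CTRL+BACKSPACE"),
        ]),
    },
    # Selection rule: use the mouse method only if the hand is already on the mouse.
    select=lambda context: "mouse" if context.get("hand_on_mouse") else "keyboard",
)

if __name__ == "__main__":
    for op in delete_word.plan({"hand_on_mouse": True}):
        print(op.name, op.argument)
```

With the hand on the mouse, the selection rule yields Point_to, Double_click, Keystroke; otherwise it yields the single keyboard operator. Keeping the selection rule as a separate function mirrors the GOMS distinction between what can be done (methods) and when each alternative is chosen (selection rules).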
Gilbreth’s basic motions have become known as therbligs, after an anagram of his name. Where the therbligs referred to a wide range of physical tasks, GOMS referred to a rather narrow range of mediating activities for mental or cognitive tasks. Work, however, continued to develop, and toward the end of the twentieth century it became necessary to expand the focus yet again, first from (hu)man–machine systems to joint cognitive systems (Hollnagel & Woods, 1983), and following that from individual and localized work to distributed work (Hutchins, 1995). Today task analysis goes well beyond the work studies and the interface and interaction design of yesteryear. It is used to address a host of issues, such as training, performance assessment, function allocation and automation, procedure writing, maintenance planning, staffing and job organization, personnel selection, work management, and probably others as well.

3 APPROACHES TO TASK ANALYSIS: FROM ANALYSIS TO SYNTHESIS

In human factors, task analysis is basically a collection of methods, each of which describes how a specific task analysis shall be performed. Each method should also include a stop rule or criterion, that is, define the principles needed to determine when the analysis has come to an end (for instance, that the level of elementary tasks has been reached). Overviews of task analysis methods can be found in many places, e.g., Kirwan and Ainsworth (1992), Annett and Stanton (2000), Schraagen, Chipman, and Shalin (2000), and Hollnagel (2003, 2006). Performing a task analysis is, of course, itself an activity or a task, and a good task analysis method should therefore be able to describe itself. That is, unfortunately, not always the case.
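Decomposition with an explicit stop rule can be sketched in a few lines of Python. This is an illustrative assumption, not a published task analysis method: the breakdown of pig-iron handling into subtasks is hypothetical (loosely based on the description of Taylor’s study), and the stop rule is passed in as a function, so the same task can be decomposed down to the most elementary recorded actions or only down to skills that a hypothetical experienced worker already masters.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical breakdown of a manual task into subtasks (illustrative only).
SUBTASKS: Dict[str, List[str]] = {
    "load pig iron": ["pick up pig", "carry pig to car", "drop pig on car"],
    "pick up pig": ["stoop", "grasp pig", "lift pig"],
    "carry pig to car": ["walk up inclined plank"],
}


def decompose(task: str, is_elementary: Callable[[str], bool],
              depth: int = 0) -> List[Tuple[int, str]]:
    """Break a task into its parts until the stop rule says a task is elementary.

    Returns (depth, task) pairs in depth-first order.
    """
    steps = [(depth, task)]
    if is_elementary(task):
        return steps  # stop rule reached: do not decompose further
    for subtask in SUBTASKS.get(task, []):
        steps.extend(decompose(subtask, is_elementary, depth + 1))
    return steps


def novice_stop(task: str) -> bool:
    # Stop only at tasks for which no further breakdown is recorded.
    return task not in SUBTASKS


# Hypothetical set of skills an experienced worker performs as single units.
EXPERIENCED_SKILLS = {"pick up pig", "carry pig to car", "drop pig on car"}


def experienced_stop(task: str) -> bool:
    # Stop at already-mastered skills, or when nothing further is recorded.
    return task in EXPERIENCED_SKILLS or task not in SUBTASKS


if __name__ == "__main__":
    for depth, task in decompose("load pig iron", experienced_stop):
        print("  " * depth + task)
```

Under the experienced stop rule the result ends at the three first-level subtasks, whereas the novice rule reaches individual motions such as “stoop” and “grasp pig.” Making the stop rule a parameter reflects the point that the level of an elementary task depends on who performs it.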
All task analysis methods make use of the principle of decomposition, of breaking something into its constituent parts, to reason about activities and how they are related and organized. A critical issue in this is the identification or determination of the parts, the elementary activities or task components, whether as therbligs or as the requirements placed on the human as an input-output transducer. The physical nature of work has, of course, changed throughout history, not least after the Industrial Revolution and then again after the introduction of computing machinery, but the ways to analyze how to do things have largely been independent of how things are actually done. At the time of Gilbreth and Taylor, the focus was on observable physical activities. Fifty years or so later, the focus had changed to the human as part of a human–machine system, and later still to the human in the context of computer–human interaction. This led to proposals such as the GOMS operators described above, which clearly refer to a type of work that is totally different from what Taylor studied.

Yet another contribution is a list suggested by Rouse (1981). Where GOMS focused on the elements of computer–human interaction, Rouse looked at typical process control tasks. The list comprised 11 functions, which in alphabetical order were: communicating, coordinating tasks, executing procedures, maintaining, planning, problem solving, recognizing, recording, regulating, scanning, and steering. In contrast to the therbligs, these functions can be organized in many ways, for instance, with reference to an input-output model of information processing, a control model, a decision-making model, and so on. The functions proposed by Rouse are on a higher level of abstraction than the therbligs because they refer to cognitive functions, or cognitive tasks, rather than to physical activities.
It would clearly be useful for many purposes if it were possible to find a set of basic tasks that could serve as building blocks across contexts and applications. This would be akin to finding a set of elementary processes or functions from which more complex behavior could be built. Such endeavors are widespread in the behavioral and cognitive sciences, and in science in general, although the success rate is usually quite limited. In relation to task analysis, the main reason is that the level of an elementary task depends on the competence and habits of people as well as on the domain. Even if a common denominator were found, it would probably be at a level of detail that had little practical value for training, design, work organization, etc.

3.1 Sequential Task Analysis

Task analysis has through its development embraced several different principles of task organization, of which the main ones are the sequential principle and the hierarchical principle. Scientific Management aimed to describe and design tasks in minute detail so that workers could be given precise instructions about how their tasks should be carried out. To do so, it was necessary that tasks could be analyzed unequivocally or “scientifically” as a basis for determining how each task step should be done in the most efficient way, and how the task components should be distributed among the people involved. In the case of manual work, this was quite feasible, since the task could be described as a single sequence of detailed actions or motions. The time-and-motion study method was, however, unable to cope with the growing complexity of tasks, an unintended consequence of the eager, and sometimes overenthusiastic, application of developments in electronics, control theory, and computing during and after the 1940s and 1950s. Due to the increasing capabilities of machines, people were asked (and tasked) to engage in
multiple activities at the same time, either because individual tasks became more complex or because simpler tasks were combined into larger units in order to reduce costs. An important consequence of this was that tasks changed from being a sequence of activities referring to a single goal to being an organized set of activities referring to a hierarchy of goals, as Miller had realized at the start of human factors. The use of machines and technology also became more widespread, so that simple manual work was taken over by machines and automation, which in turn had to be operated or controlled by workers.

3.2 Hierarchical Task Organization

Human factors engineering, or classical ergonomics, gradually came to acknowledge that traditional methods of breaking the task down into small pieces, each of which could be performed by a person, were no longer adequate. The nature of work had changed, and the human capacity for processing information became decisive for the capacity of the human–machine system as such. Since this capacity could not be extended beyond a “natural” upper limit, it soon became clear that the human capacity for learning and adaptation was insufficient to meet technological demands. The technological developments changed the nature of work from being predominantly manual to being more dependent on mental or cognitive capabilities (comprehension, monitoring, planning). One reason was that as machines became better able to control their own “primitive” functions, humans, who were a costly resource, were left with the task of controlling not just one but many machines, often organized in partly self-controlling systems or clusters. In this context it is more than a little ironic that the tasks operators became responsible for were the tasks that were least easy to describe. Bainbridge (1983) discussed this in her seminal paper about the so-called ironies of automation.
She started by pointing out that “the increased interest in human factors among engineers reflects the irony that the more advanced a control system is, so the more crucial may be the contribution of the human operator” (Bainbridge, 1983, p. 775). A consequence of this was that human operators were given tasks that were increasingly difficult, so difficult that it was impossible to design or program a machine to do them.

The second irony is that the designer who tries to eliminate the operator still leaves the operator to do the tasks which the designer cannot think how to automate. It … means that the operator can be left with an arbitrary collection of tasks, and little thought may have been given to providing support for them. (Bainbridge, 1983, p. 775)

This obviously presented a problem for task analysis, not least because little of what the operators did was observable or even imaginable. (Bainbridge also pointed to a third irony, namely that “it is the most successful automated systems, with rare need for manual intervention, which may need the greatest investment in human operator training” (Bainbridge, 1983, p. 777).) And to design and provide that training it is, of course, necessary to carry out a task analysis. An inevitable consequence of these developments was, therefore, the need to analyze and understand work that could not be described as a simple sequence of steps or actions. Miller had partly anticipated hierarchical task analysis when he included the need to decompose task functions into subtasks, but even a hierarchical task analysis is sequential when it is described as shown in Figure 1.
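The observation that even a hierarchical task analysis becomes sequential once it is written down can be illustrated with a short Python sketch that walks a task tree depth-first and emits an HTA-style numbered outline (0 for the overall task, then 1, 1.1, 1.2, 2, and so on). The subtask names are borrowed from Figure 1, but their order and the generic “Step n” entries are assumptions for illustration, not a transcription of the figure.

```python
from typing import List, Tuple

# A task node is (name, children); a leaf has an empty list of children.
Task = Tuple[str, List["Task"]]


def to_outline(node: Task, number: str = "0") -> List[str]:
    """Flatten a task hierarchy into numbered lines, depth-first."""
    name, children = node
    lines = [f"{number} {name}"]
    for index, child in enumerate(children, start=1):
        # Top-level subtasks are numbered 1, 2, ...; deeper ones 1.1, 1.2, ...
        child_number = str(index) if number == "0" else f"{number}.{index}"
        lines.extend(to_outline(child, child_number))
    return lines


# Hypothetical ordering and step breakdown (illustrative only).
transport_machine: Task = ("Transport machine", [
    ("Lower tilt-back assembly", [("Step 1", []), ("Step 2", [])]),
    ("Tilt back machine", [("Step 1", []), ("Step 2", [])]),
    ("Move machine to other area", []),
])

if __name__ == "__main__":
    print("\n".join(to_outline(transport_machine)))
```

The hierarchy survives only in the numbering: read top to bottom, the outline is a single sequence of lines, which is the sense in which a described hierarchical task analysis remains sequential.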
Figure 1 The structure of hierarchical task analysis. (Example: the task “Transport machine” is decomposed into subtasks including “Lower tilt-back assembly,” “Tilt back machine,” “Move machine to other area,” “Return machine to standing position,” and “Stow tilt-back assembly,” each broken down into numbered steps.)
4 THE NEW REALITY
Decomposition worked effectively because work, as well as the surroundings in which work took place, including the organizations, were for a long time so simple that it was easy to follow what happened and understand why and how things were done. For that reason, work was also (relatively) easy to manage. As long as work as such was visible, analyses could be based on observations of how work was done. But the increasing inclusion of and dependence on technological artifacts meant that work changed from being manual to being cognitive. In other words, a substantial part of work changed from being physical movements to being mental operations or cognitive functions: from therbligs such as select, grasp, reach, move, hold, and release to cognitive functions such as maintaining, planning, problem solving, recognizing, regulating, and steering. In consequence, work gradually became unobservable and therefore had to be inferred from whatever observations could still be made. Work also changed from being localized to being distributed (Hutchins, 1995). This included both a physical or horizontal extension to functions and activities that were not in the same physical location or position (something that has increased dramatically with new web-based technologies for remote meetings), and a temporal or vertical extension to include both upstream and downstream processes. The latter in particular meant that previously separate functions could no longer be treated as such because there might be crucial
dependencies on what went before (upstream) and what would come after (downstream). One way to characterize this development is to make a distinction between tractable and intractable systems (Hollnagel, 2010). Tractable systems can in practice be completely described or specified, while intractable systems cannot. The differences between the two types of systems are summarized in Table 2.

Table 2 Tractable and Intractable Systems
Number of parts and details
  Tractable system: Few parts and simple descriptions with limited details.
  Intractable system: Many parts and elaborate descriptions with many details.
Comprehensibility
  Tractable system: Principles of functioning are known. Structure (architecture) is also known.
  Intractable system: Principles of functioning are known for some parts but not for all.
Stability
  Tractable system: Changes are infrequent and usually small. The system is in practice stable and can therefore be fully described.
  Intractable system: Changes are frequent and can be large. The system changes before description is completed.
Dependencies among system parts and functions
  Tractable system: System parts and functions are relatively independent and only loosely coupled.
  Intractable system: System parts and functions are mutually dependent and tightly coupled.
Relationship to other systems
  Tractable system: Independence, low level of vertical and horizontal integration.
  Intractable system: Interdependence, high level of vertical and horizontal integration.

Today’s task analysis must address systems that are larger and more complex than the systems of yesteryear. Because there are many more details to consider, because some modes of operation may be incompletely known, because of tight couplings among functions, and because systems may change faster than they can be described, the net result is that many systems today are underspecified or intractable. There is therefore uncertainty in the work situation, which requires continuous compensating changes or adjustments. Yet in order to specify how something should be done, it is necessary to assume a certain level or degree of predictability and stability. A task can only be prescribed if the conditions under which it is carried out are reasonably stable and therefore predictable. The specification of the task, Work-as-Imagined, therefore presupposes a stable and predictable work environment, which we can refer to as “World-as-Imagined.” A common solution to achieve this is to combine a standardization of the working conditions, to ensure that there are no disturbances, with an insistence on compliance. This may well be feasible for many tasks across industries, but it is by no means so for all.

The fundamental problem of intractable systems is that it is impossible to produce a complete description, which means that there will always be some uncertainty. The challenge for task analysis is, therefore, how to describe something that is not stable but varies all the time. For such systems and conditions, it is clearly not possible to prescribe tasks and actions in minute detail. Because of the uncertainty, in demands and resources, for instance, performance must be variable or flexible rather than rigid. In fact, the less completely the system is described, the more performance adjustments will be needed. Task analysis in these cases therefore requires a more comprehensive understanding of the surroundings, of what lies beyond the arbitrarily set boundary of the local work environment. Going into that is, however, beyond the scope of the present chapter. Since the assumption that systems are tractable is no longer universally valid, it is necessary to develop methods to deal with intractable systems and irregular work environments. The time-honored way of doing that is to focus on which functions are required, as means, to achieve a goal and how they are organized relative to the current situation (e.g., existing resources and demands). This was a natural continuation of the development that had taken us from sequential task analysis to hierarchical task analysis, and from there to goals-means analysis (see below). This development has had a major impact not only on how work situations are studied and analyzed, but also on how the efficiency and safety of work can be ensured. The focus today is not so much on which functions or tasks are needed as components, and how they can be allocated to, e.g., humans or technology, but rather on how tasks or functions are organized.
In particular, the question is how this organization or dependency can be ensured when the work environment is no longer stable or completely predictable.
5 FROM STRUCTURAL TO FUNCTIONAL ANALYSIS
There are two fundamentally different approaches to task analysis and task description. One is structural and decomposes tasks into elements; the emphasis is on analysis. The other is functional and looks at what happens as a whole; the emphasis is on synthesis. Both sequential and hierarchical task analysis are structural in the sense that they describe the order in which the prescribed activities are to be carried out. A hierarchy is by definition the description of how something is ordered, and the very representation of a hierarchy (as in Figure 2) emphasizes the structure. As an alternative, it is possible to analyze and describe tasks from a functional point of view (i.e., in terms of how tasks relate to or depend on each other). This changes the emphasis from how tasks and activities are ordered to what the tasks and activities are supposed to achieve.
Figure 2 Example of a hierarchical task analysis. [Surviving figure labels: Define task under investigation; Define overall task goal; Break down overall goal into subgoals; Define plan to perform subgoal operations; Continue until all operations are identified.]
THE CHANGING NATURE OF TASK ANALYSIS
5.1 Bottom-Up Task Analysis
The classical task studies represented by both sequential and hierarchical task analysis are data-driven and bottom-up. They start from observations of how tasks are performed, as illustrated by time-and-motion studies or Operational Sequence Diagrams (OSD), and organize these either according to naturally occurring patterns or in relation to the purpose of the task (e.g., moving something from one place to another). This naturally leads to a sequential description in which each action follows the one before, since this is how work is done and how it can be observed and recorded. It is also what everyone knows from thinking about their own work, about how they did something.
5.2 Top-Down Task Analysis
Top-down or goals-means task analysis considers how work is organized in order to meet the overall objectives. It starts by identifying the overall goal of the task and then continues, as shown in Figure 3. The classic illustration of goals-means analysis (which, confusingly, is sometimes also called means-ends analysis) is the General Problem Solver or GPS, a computer program created in 1959 to work as a universal problem-solving machine (Newell & Simon, 1961). (In psychology, the same principle had been described by Duncker already in 1945.) The starting point of a goals-means analysis is a goal or an end, defined as a specified condition or state of the system. A description of the goal usually includes or implies the criteria of achievement or acceptability (i.e., the conditions that determine when the goal has been reached). To achieve the goal, certain means are required. These are typically one or more activities that need to be carried out (i.e., a task). Yet most tasks are possible only if specific conditions are fulfilled. For instance, you can work on your laptop only if you have access to an external power source or if the batteries are sufficiently charged. When these conditions are met, the task can be carried out. If not, bringing about these preconditions (e.g., making sure the batteries are charged) becomes a new goal, denoted a subgoal. In this way goals are decomposed recursively, thereby defining a set of goal–subgoal dependencies that also serves to structure or organize the associated tasks.
Figure 3 Goals-means analysis. [Diagram: repeated WHY / WHAT / HOW levels linking Goals (WHY: the purpose of the activity) through what is being done ("function", WHAT) to Means (HOW: how it is done, "mechanism").]
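The recursive goal–subgoal decomposition described above can be sketched in a few lines of Python. This is a minimal illustration under assumed data, not the GPS program and not a procedure prescribed in this chapter: the goals, the tasks, and the `MEANS` table are hypothetical; the laptop example's "external power or charged batteries" alternative is reduced to a single means; and every task is assumed to succeed once it has been started.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Means:
    """The task that achieves a goal, plus the conditions it needs before it can start."""
    task: str
    preconditions: tuple[str, ...] = ()


# Hypothetical goals-means knowledge for the laptop example (names are illustrative).
MEANS = {
    "report is written": Means("write report on laptop", ("laptop has power",)),
    "laptop has power": Means("charge batteries", ("charger is at hand",)),
    "charger is at hand": Means("fetch charger"),
}


def decompose(goal, state, seen=frozenset(), depth=0):
    """Recursively turn unmet preconditions into subgoals.

    Returns (depth, goal, task) tuples in an order in which the tasks could be done:
    subgoals first, then the task for the parent goal. `state` is the set of
    conditions that already hold; it is updated as goals are (assumed to be) achieved.
    """
    if goal in state:
        return []  # goal already satisfied: no task needed
    if goal in seen:
        raise ValueError(f"circular goal dependency at {goal!r}")
    means = MEANS.get(goal)
    if means is None:
        raise ValueError(f"no known means for goal {goal!r}")
    steps = []
    for pre in means.preconditions:
        # An unmet precondition becomes a new goal one level down: a subgoal.
        steps.extend(decompose(pre, state, seen | {goal}, depth + 1))
    steps.append((depth, goal, means.task))
    state.add(goal)  # assume the task succeeds, so the goal now holds
    return steps


if __name__ == "__main__":
    for depth, goal, task in decompose("report is written", {"charger is at hand"}):
        print("  " * depth + f"{task} (achieves: {goal})")
```

With the charger already at hand, only one subgoal ("laptop has power") is generated; starting from an empty state, the recursion goes one level deeper and adds "fetch charger" first. The indentation of the printed plan mirrors the goal–subgoal structure.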
5.3 Functional Analysis
A functional analysis starts neither from the top (the goal) nor from the bottom (the observable actions). Instead, it starts by describing the functions that are necessary to do something, but as functions rather than as tasks. One approach to developing a description of the functions that are necessary for a specific activity is the Functional Resonance Analysis Method (FRAM; Hollnagel, 2012). The FRAM is a method to produce a functional model of how something can be or has been done. The basic principle is that functions are described in terms of what they do rather than what they are: by what their outcome or Output is and by what is needed to produce that Output. The description of a function in the FRAM must include its Inputs and Outputs, but may also include four other aspects called Preconditions, Resources, Control, and Time. The Preconditions represent that which must be true or verified before a function can begin; the Resources represent that which is needed or consumed while a function is carried out; the Controls represent that which supervises or regulates a function while it is carried out; and Time represents the various ways in which time and temporal conditions can affect how a function is carried out.
One way to begin would be to ask what the Outputs are from the function , cf. Figure 1. In other words, what is the result of doing this? The answer is, of course, simple: the platform is in the tilt-back position, which is a necessary Precondition for it to be moved. A following question could be what the Inputs to the function are. In the FRAM, the Inputs represent inputs in the traditional sense, as that which is being processed and changed by a function, but also the conditions that trigger or start a function. Here the answer could be that the tilt-back is done when instructions to move the platform into the work area (or into another room) have been given. The same type of question could be asked for the aspects called Preconditions, Resources, Control, and Time. The result could be the description shown in Table 3.
Table 3 Description of a Function
Function name | Tilt back machine
Input | Instruction to move lift into work area
Output | Machine is in tilt-back position
Precondition | Area behind machine is clear of personnel and obstructions; Platform is fully lowered; Outriggers are in place
Resource | Competence in working with machine
Control | Operating manual
Time | N/A
By asking such questions it will gradually become clear how each function relates to other functions: not in a temporal sense, as coming before or after, and not in a hierarchical sense, as being a subfunction of another function, but in the sense of how they depend on or are coupled to other functions. For instance, the Precondition (the platform is fully lowered) must be the result or Output of another function, which might be called . A functional analysis starts by describing the functions that are necessary, and hopefully also sufficient, to carry out the activity so that the intended results obtain. The order in which the functions are described is not important, since the recursive nature of the analysis ensures that all needed functions will be identified. The outcome is a description of how functions are mutually coupled or dependent or, in other words, a functional analysis of the task. For the example used here, this may lead to a description that can be shown graphically as in Figure 4.
Figure 4 A functional analysis of a task. [FRAM diagram; surviving function labels include Provide logistic support, Prepare work, Prepare operation, Lower tilt-back assembly, Check working area, Tilt back machine, Work (activity) planning, Manage site activities, Manage human resources, Supply machine, Move machine to other area, and Resume work, coupled through aspects such as Machine has been delivered, Platform is fully lowered, Outriggers are in place, Machine in tilt-back position, and Machine positioned in work area.]
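The idea that functions are linked by how one function's Output becomes another function's Input, Precondition, Resource, Control, or Time can be made concrete with a small data structure. The sketch below is an assumption-laden illustration, not Hollnagel's FRAM tooling: it stores the six aspects per function, finds couplings by exact text match, and lists aspects that no described function produces yet, which is where the recursive analysis would continue. The "Tilt back machine" entry follows Table 3; the two neighbouring functions and their couplings are hypothetical, since Figure 4 is not recoverable here.

```python
from dataclasses import dataclass, field

# The six FRAM aspects; every aspect except "output" can receive another function's Output.
ASPECTS = ("input", "output", "precondition", "resource", "control", "time")


@dataclass
class FramFunction:
    """A function described by its six FRAM aspects (each a list of short phrases)."""
    name: str
    input: list[str] = field(default_factory=list)
    output: list[str] = field(default_factory=list)
    precondition: list[str] = field(default_factory=list)
    resource: list[str] = field(default_factory=list)
    control: list[str] = field(default_factory=list)
    time: list[str] = field(default_factory=list)


def couplings(functions):
    """List (upstream, output, downstream, aspect) for every Output of one function
    that appears verbatim as a non-Output aspect of another function."""
    found = []
    for up in functions:
        for out in up.output:
            for down in functions:
                if down is up:
                    continue
                for aspect in ASPECTS:
                    if aspect != "output" and out in getattr(down, aspect):
                        found.append((up.name, out, down.name, aspect))
    return found


def unmatched(functions):
    """Aspect phrases that no described function produces as an Output yet;
    each one prompts the analyst to describe a further function."""
    produced = {out for f in functions for out in f.output}
    return sorted(
        (f.name, aspect, phrase)
        for f in functions
        for aspect in ASPECTS
        if aspect != "output"
        for phrase in getattr(f, aspect)
        if phrase not in produced
    )


# The function from Table 3 (Time is N/A, so it is left empty).
tilt_back = FramFunction(
    name="Tilt back machine",
    input=["Instruction to move lift into work area"],
    output=["Machine is in tilt-back position"],
    precondition=[
        "Area behind machine is clear of personnel and obstructions",
        "Platform is fully lowered",
        "Outriggers are in place",
    ],
    resource=["Competence in working with machine"],
    control=["Operating manual"],
)
# Hypothetical neighbours: these couplings are assumptions for illustration only.
upstream = FramFunction(name="Hypothetical upstream function", output=["Platform is fully lowered"])
move = FramFunction(name="Move machine to other area", precondition=["Machine is in tilt-back position"])

functions = [tilt_back, upstream, move]

if __name__ == "__main__":
    for link in couplings(functions):
        print(link)
    for gap in unmatched(functions):
        print("not yet produced by any function:", gap)
```

Matching on exact phrases is a deliberate simplification; in practice an analyst judges whether two differently worded aspects refer to the same thing. The remaining entries from `unmatched` (for example, "Outriggers are in place") show which further functions still need to be described.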
6 THE ROLE OF TASK ANALYSIS IN THE FUTURE
The biggest challenge today is not to account for the (hierarchical) structure of tasks in either a bottom-up or top-down fashion, but to understand how activities or functions could be organized to ensure that work goes well even when conditions are incompletely known or partly unpredictable. A major consequence is the need for methods that can describe and analyze Work-as-Done, in particular the ways in which ubiquitous and necessary adjustments and adaptations (individual as well as collective) are the basis both for work that goes well and for work that does not go as planned (Hollnagel, 2009). Here the main development is the use of functional rather than structural analyses to account for the dynamic couplings among the functions (or tasks) necessary to accomplish an activity.
This chapter began by defining task analysis as the study of who should do what, why it is necessary, and finally how and when it should be done. The future of task analysis is bright in the sense that there will always be a practical need to know how things should be done. The question is whether task analysis, as currently practiced, is capable of meeting this need in the long run. There are several reasons why the answer need not be unequivocally positive:
• Task analysis has from the beginning been concerned mostly with individuals, whether as single workers or single users, despite the fact that most work by now involves multiple users (collaboration, distributed work) in complex systems (Hutchins, 1995). Although the importance of distributed cognition and collective work is generally acknowledged, only a few methods are capable of analyzing it.
• Many task analysis methods are adequate for describing single lines of activity. Unfortunately, most work involves multiple threads and timelines, which furthermore may be asynchronous. Most task analysis methods disregard time and temporal relations and instead blithely assume that everything is somehow synchronized. (The indifference to time is unfortunately characteristic of human factors in general, cf. Decortis & De Keyser, 1989.) Although HTA represents a hierarchy of tasks, each subtask or activity is described as carried out on its own (Figure 2). It is awkward to describe two or more simultaneous tasks, even though that is often what people have to cope with in reality. Another shortcoming is the difficulty of representing the many important temporal relations other than simple durations of activities (Allen, 1983).
• There is a significant difference between Work-as-Imagined (described tasks) and Work-as-Done (effective tasks), as described by researchers from Leplat and Hoc (1983) to Braithwaite, Wears, and Hollnagel (2017). Work in practice is characterized by ongoing adaptations and improvisations rather than by the straightforward execution of a procedure or an instruction. The reasons are that demands and resources rarely correspond to what was anticipated when the task was developed, and that the actual situation may differ considerably from what was assumed by the task description, thereby rendering the latter unworkable.
6.1 The Diminishing Role of the Human Factor
The focus on the human factor, as a factor in itself, can from a historical perspective be seen as an artifact of an engineering world-view. But decomposing systems into parts is no longer adequate, not least because the surroundings have changed in ways that make today's work practices incompatible with the assumptions on which classical human factors and ergonomics are based. Task analysis was first used to solve the practical problem of low productivity and later to solve the problem of the human factor as a limitation, or the human as a "fallible machine." It is clearly still essential to be able to analyze and describe how work has been done, is done, and could be done, but it is also necessary to acknowledge that the role of humans has changed completely. From a contemporary perspective, the human is an asset rather than a limitation. Humans are not "fallible machines" but rather essential resources for the adjustments and creative work-arounds that are necessary for work to go well under expected and unexpected conditions alike. This must be recognized by the methods and models that we use. The legacy from early human factors engineering was to make sure that the human, as a human factor, could meet the requirements of the system, which essentially meant filling in for the things that machines and technology were unable to accomplish. That was the case for MABA-MABA, because the motivation was to overcome perceived limitations in a stable and tractable environment. Today the situation is radically different. The goal is not to sub-optimize on a few distinct criteria, since these criteria cannot be considered independently of each other. The goal is rather to ensure the potentials or capacities that may be needed in situations that cannot be completely described. The old version of Fitts' principle was based on the juxtapositions described in Table 1. A contemporary principle would be to look for where in an activity there is a need for variability and adjustments, and where there is not. In the former case, humans should be in charge; in the latter case, we may reluctantly leave it to machines and technology.
Only in this way will systems be able to perform as we need them to, even when we do not completely understand how it is to be done.
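The second point in the list in Section 6 notes that task analysis rarely represents temporal relations other than simple durations (Allen, 1983). As a minimal sketch, and not a method proposed in this chapter, the Python function below classifies how two activity intervals relate under Allen's thirteen basic interval relations, which is one way to make simultaneous or overlapping tasks explicit. Representing an activity as numeric start and end times (for example, minutes into a shift) and the example activities themselves are assumptions for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """An activity occupying the time span from start to end (e.g., minutes into a shift)."""
    name: str
    start: float
    end: float


def allen_relation(a: Interval, b: Interval) -> str:
    """Return which of Allen's 13 basic relations holds from interval a to interval b."""
    if not (a.start < a.end and b.start < b.end):
        raise ValueError("each interval needs start < end")
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if b.end < a.start:
        return "after"
    if b.end == a.start:
        return "met-by"
    # From here on, the two intervals share some stretch of time.
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    # Remaining case: partial overlap with distinct start and end points.
    return "overlaps" if a.start < b.start else "overlapped-by"


# Hypothetical activities of one operator during a shift (minutes).
monitoring = Interval("monitor process alarms", 0, 60)
phone_call = Interval("answer phone call", 10, 25)
handover = Interval("shift handover", 55, 70)

if __name__ == "__main__":
    for x, y in [(phone_call, monitoring), (monitoring, handover), (monitoring, phone_call)]:
        print(f"{x.name} {allen_relation(x, y)} {y.name}")
```

Here the phone call lies "during" the monitoring activity, and monitoring "overlaps" the handover: both are relations that a single-thread, duration-only task description cannot express.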
REFERENCES
Allen, J. (1983). Maintaining knowledge about temporal intervals. Communications of the ACM, 26, 832–843.
Annett, J., & Stanton, N. A. (Eds.). (2000). Task analysis. Boca Raton, FL: CRC Press.
Bainbridge, L. (1983). Ironies of automation. Automatica, 19, 775–779.
Braithwaite, J., Wears, R. L., & Hollnagel, E. (Eds.). (2017). Resilient health care, vol. 3: Reconciling work-as-imagined and work-as-done. Boca Raton, FL: CRC Press.
Card, S., Moran, T., & Newell, A. (1983). The psychology of human-computer interaction. Hillsdale, NJ: Lawrence Erlbaum Associates.
Decortis, F., & De Keyser, V. (1989). Time: The Cinderella of man-machine interaction. In G. Mancini (Ed.), Analysis, design and evaluation of man–machine systems (pp. 65–70). Oxford: Pergamon.
Dekker, S. W., & Woods, D. D. (2002). MABA-MABA or abracadabra? Progress on human–automation co-ordination. Cognition, Technology & Work, 4(4), 240–244.
Drury, H. B. (1918). Scientific management: A history and criticism. Studies in History, Economics, and Public Law, 65(2), Whole Number 157 (p. 23).
Duncker, K. (1945). On problem-solving (L. S. Lees, Trans.). Psychological Monographs, 58(5), i–113.
Fitts, P. M. (1951). Human engineering for an effective air-navigation and traffic-control system. Washington, DC: National Research Council.
Gilbreth, F. B. (1911). Motion study. Princeton, NJ: Van Nostrand.
Hollnagel, E. (1993). Models of cognition: Procedural prototypes and contextual control. Le Travail humain, 56(1), 27–51.
Hollnagel, E. (Ed.). (2003). Handbook of cognitive task design. Mahwah, NJ: Lawrence Erlbaum Associates.
Hollnagel, E. (2006). Task analysis: Why, what, and how. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (3rd ed., pp. 373–383). New York: Wiley.
Hollnagel, E. (2009). The ETTO principle: Why things that go right sometimes go wrong. Farnham: Ashgate.
Hollnagel, E. (Ed.). (2010). Safer complex industrial environments. Boca Raton, FL: CRC Press.
Hollnagel, E. (2012). FRAM – The functional resonance analysis method: Modelling complex socio-technical systems. Farnham: Ashgate.
Hollnagel, E., & Woods, D. D. (1983). Cognitive systems engineering: New wine in new bottles. International Journal of Man–Machine Studies, 18, 583–600.
Hollnagel, E., & Woods, D. D. (2005). Joint cognitive systems: Foundations of cognitive systems engineering. Boca Raton, FL: CRC Press.
Hutchins, E. (1995). Cognition in the wild. Cambridge, MA: MIT Press.
Kirwan, B., & Ainsworth, L. K. (Eds.). (1992). A guide to task analysis. London: Taylor & Francis.
Leplat, J., & Hoc, J.-M. (1983). Tâche et activité dans l'analyse psychologique des situations. Cahiers de Psychologie Cognitive, 3(1), 49–63.
Miller, G. A., Galanter, E., & Pribram, K. H. (1960). Plans and the structure of behavior. New York: Holt, Rinehart and Winston.
Miller, R. B. (1953). A method for man–machine task analysis (Technical Report 53–137). Dayton, OH: Wright Air Force Development Center.
Newell, A., & Simon, H. A. (1961). GPS: A program that simulates human problem-solving. In Proceedings of a Conference on Learning Automata. Technische Hochschule, Karlsruhe, Germany, April 11–14.
Rouse, W. B. (1981). Human-computer interaction in the control of dynamic systems. ACM Computing Surveys, 13(1), 71–99.
Schraagen, J. M., Chipman, S. F., & Shalin, V. L. (Eds.). (2000). Cognitive task analysis. Hove: Psychology Press.
Taylor, F. W. (1911). The principles of scientific management. New York: Harper.
Wickens, C. D., & Hollands, J. G. (2000). Engineering psychology and human performance (3rd ed.). Upper Saddle River, NJ: Prentice Hall.
Wiener, N. (1994). Invention: The care and feeding of ideas. Cambridge, MA: MIT Press.
Womack, J. P., Jones, D. T., & Roos, D. (2007). The machine that changed the world: The story of lean production: Toyota's secret weapon in the global car wars that is now revolutionizing world industry. New York: Simon & Schuster.
CHAPTER 14 WORKPLACE DESIGN
Nicolas Marmaras and Dimitris Nathanael
National Technical University of Athens, Athens, Greece
1 INTRODUCTION
1.1 The Importance of Satisfying Task Demands
2 PROBLEMS OF WORKING POSTURES
2.1 Sitting Posture and Seats
2.2 Sitting Posture and Work Surface Height
2.3 Spatial Arrangement of Work Artifacts
3 DESIGNING INDIVIDUAL WORKSTATIONS
3.1 Phase 1: Decisions about Resources and High-Level Requirements
3.2 Phase 2: Identification of Work System Constraints and Requirements
3.3 Phase 3: Identification of Users' Needs
3.4 Phase 4: Setting Specific Design Goals
3.5 Phase 5: Design of a Prototype
3.6 Phase 6: Assessment of the Prototype
3.7 Phase 7: Improvements and Final Design
3.8 Final Remarks
4 THE LAYOUT OF WORKSTATIONS
4.1 Generic Types of Office Layouts
4.2 A Systematic Method for Office Layout
5 CONCLUSION
REFERENCES
1 INTRODUCTION
Workplace design deals with the shape, the dimensions, and the layout (i.e., the placement and orientation) of the different material elements that surround one or more working persons. Examples of such elements are the seat, working surfaces, desk, equipment, tools, controls, and displays used during the work, as well as the passages, windows, and heating/cooling equipment. Ergonomic workplace design aims at improving work performance (both in quantity and quality) as well as ensuring occupational safety, health, and well-being through:
• minimizing the physical workload and the associated strain on the working person;
• facilitating task execution, that is, ensuring effortless information exchange with the environment and colleagues, alleviation of various physical constraints, and so on;
• achieving ease of use of the various workplace elements.
Putting together a workplace that meets ergonomic requirements while at the same time satisfying task demands is not a trivial task. In fact, to achieve this, one should consider a considerable number of interacting and variable elements and try to meet diverse requirements, some of which may partly conflict. As shown in Figure 1, in any work setting there is a continuous mutual adjustment between the workplace components, the task demands, and the working person. This mutual adjustment is also subject to broader environmental conditions. Therefore, regardless of how well each individual component of the workplace is designed, the habitual body movements and postures in everyday work emerge from an exploration of the constraints and affordances of the workplace as a whole.
Figure 1 There is interdependence between working person, task demands, workplace elements, the environment, and body movements and postures. [Diagram elements: working person; task demands; workplace components; environment; body movements and postures.]
Consider, for example, a person working in a computerized office (task demand: work with a computer). If the desk (workplace component 1) is too low and the seat (workplace component 2) is too high for the anthropometric characteristics of the worker (characteristic of the working person), the worker will lean forward (awkward posture), with negative effects on his or her physical workload, health (particularly if he or she has to work for a long period in this workplace), and finally overall performance. Furthermore, if behind the worker there is a window causing glare on the computer's screen (characteristic of the environment), he or she will probably bend sideways (awkward posture) in order to view clearly what is presented on the screen (task demand), causing similar effects. Consequently, when designing a workplace, one has to adopt a systemic view, considering at least the characteristics of the working person, the task demands, and the environment in which the task will be performed.
Furthermore, the elements of the work system are variable. The task demands may be multiple and variable. For example, at a secretarial workstation, the task may require exclusive use of the computer for a period of time, then data entry from paper forms, and then face-to-face contact with visitors. At the same time, the secretary should be able to monitor both the entrance door and the director's door. Finally, the workplace environment may be noisy or quiet, warm or cold, with annoying air streams, illuminated by natural or artificial light, and all of the above may change during the course of a working day. In addition, as modern work, particularly in offices, is becoming increasingly collaborative and creative, workplace design should also focus on creating a sense of community that enables innovation, encourages new ways of working, and nurtures employee well-being. If one adds organizational, social, aesthetic, and also financial issues to the complexity of the work system and the multiplicity of ergonomic criteria, successful design of a workplace becomes extremely complex. Hence, many people maintain that designing a good workplace is more an "art" than a "discipline," as there is no standard theory or method that ensures a successful result, the output depending heavily on the designer's "inspiration." Although this is true to a certain extent, good knowledge of the characteristics of the working persons who will occupy the workplace, of the tasks' demands, and of the broader environment, combined with an attempt at rigor during the design process, contributes decisively to a successful design.
1.1 The Importance of Satisfying Task Demands
Despite the multiple external determinants, the workplace still leaves many degrees of freedom to the working person, who can exploit them in more than one way. As stated above, the habitual body movements and postures in everyday work emerge from an exploration of the constraints and affordances of the workplace. This exploration and adaptation process can be considered a control task: the working person tries to achieve a satisfactory, or at least acceptable, balance between multiple demands related to the task, his or her physical abilities, and perceived comfort.
This control task can be approximated by a cybernetic model such as the one depicted in Figure 2 (Marmaras, Nathanael, & Zarboutis, 2008). According to this model, the postures adopted by working persons are under the influence of two nested feedback loops: (1) a positive-feedback loop regarding the satisfaction of task demands (e.g., easy reading and writing for an office worker); and (2) a negative-feedback loop regarding the attenuation of the perceived physical strain and pain due to body postures and the disorders that may eventually be associated with them. These two loops work toward different objectives (i.e., meeting task demands vs. satisfying comfort), so satisfying both objectives simultaneously may prove unattainable. In such situations, the conflict is resolved through a trade-off that is moderated by the feedback power and the pace of incoming information of each loop. Moreover, the two loops operate on different time scales. The positive loop, regarding the satisfaction of task demands, is immediate and constantly perceived, easily linked to the particular arrangement of the various workplace components, and equally easily interpreted (e.g., if an office worker cannot reach the keyboard because it is placed too far away, he or she will move either the keyboard or the chair, or extend his or her upper limbs). The negative loop, regarding the attenuation of the perceived physical strain and pain, takes time to be perceived because it is cumulative and requires prolonged exposure (e.g., it takes months or years for musculoskeletal disorders to develop). Furthermore, such feedback is not easily interpreted and attributed to either postures or workplace settings by a non-expert (e.g., even if back pain is felt, it is difficult for an individual to attribute it to a specific posture or workplace setting).
Figure 2 A cybernetic model depicting the working person's control task related to body posture modifications. [Diagram: an initial body posture is modified into a new body posture under two feedback loops, satisfaction of task demands (+) and minimization of strain and pain (–).]
We can therefore argue that workers' postures are more readily affected by the positive-feedback loop, which, as a constant attractor, forces the system to self-organize in a way that favors the satisfaction of task demands. Such an argument has already been validated by Dainoff (1994) in research conducted in laboratory settings. This research indicated that participants performing a high-speed data entry task found it effective to sit in the forward-tilt posture, while participants who performed screen-based editing tasks (with very low keying requirements) found the backward-leaning posture more effective. The model presented above stresses both the existence of regulation mechanisms operating continuously as part of the working person's activities and the need to put the task demands and resulting work activities at the center of the design process. It is the latter that makes ergonomic design synonymous with user-centered design. The present chapter is mainly methodological; it presents and discusses a number of methods, techniques, guidelines, and typical design solutions that aim to support the decisions to be taken during the workplace design process. Section 2 discusses the problem of working postures and stresses the fact that there is no single best posture that can be maintained for long periods of time.
Consequently, the effort should be put into designing the components of the workplace so that they form a "malleable envelope" that permits the working persons to adopt various healthy postures. Sections 3 and 4 deal with the design of individual workstations and with the layout of groups of workstations in a given space.
2 PROBLEMS OF WORKING POSTURES
A central issue of ergonomic workplace design is the postures the working person will adopt. In fact, the decisions made during workplace design will largely determine which postures the working person will or will not be able to adopt. The two most common working postures are sitting and standing. Of the two, the sitting posture is of course more comfortable. However, there is ample research evidence that sitting for prolonged periods of time results in discomfort, aches, or even irreversible injuries and health deterioration. For example, Figure 3 shows the most common musculoskeletal disorders encountered at office workstations. Studying the effects of "postural fixity" while sitting, Grieco (1986) found that it causes, among other effects, (1) reduction of nutritional exchanges at the spinal disks, which in the long term may promote their degeneration; (2) static loading of the back and shoulder muscles, which can result in aches and cramping; and (3) restriction of blood flow to the legs, which can cause swelling (edema) and discomfort. Consequently, the following conclusion can be drawn: the workplace should permit alternation between various postures, because there is no "ideal" posture that can be adopted for a long period of time. Based on this conclusion, the standing–sitting workstation has been proposed, especially for cases where the task requires long periods of continuous work (e.g., bank tellers or assembly workstations). This workstation permits the worker to perform a job alternating between the standing and the sitting posture (see Figure 4 for an example). Over the last decade, height-adjustable workstations have been introduced in office spaces that allow
Figure 3 Common musculoskeletal disorders encountered at office workstations. [Body diagram labels: neck strain; shoulder tendinitis and bursitis; tennis and golfer's elbow (epicondylitis); hand and wrist tendinitis; carpal tunnel syndrome; low back pain; swelling (edema).]
the design of seats have attracted the interest of researchers, designers, and manufacturers due to the ever-increasing number of office workers and the importance of musculoskeletal problems encountered by them. This has resulted in the emergence of a proper research domain, and subsequently to a plethora of publications and design solutions (see, e.g., Corlett, 2009; Lueder & Noro, 1994; Mandal, 1985; Marras, 2005). As already stated, sitting posture is associated with a number of issues at the musculoskeletal level. One of the more prevalent is lumbar kyphosis. When one is sitting, the lumbar region of the back flattens out and may even assume an outward bend. This shape of the spine is called kyphotic and is somewhat the opposite to the lordotic shape of the spine when someone is standing erect (Figure 5). The smaller the angle between the thighs and the torso, the greater the kyphosis. This occurs because of the restrained rotation of the hip joint, which forces the pelvis to rotate backward. Kyphosis provokes increased pressure on the spine disks at the lumbar section. Nachemson and Elfstrom (1970) for example, found that unsupported sitting in upright posture resulted in a 40% increase in the disks’ pressure compared to the pressure when standing. There are three complementary ways to minimize lumbar kyphosis: (1) by using a thick lumbar support; (2) by reclining the backrest; and (3) by providing a forward-tilting seat. Andersson et al. (1979) found that the use of a 4-cm-thick lumbar support combined with a backrest recline of 110∘ resulted in a lumbar curve closely resembling the lumbar curve of a standing person. Another finding of Andersson et al. (1979) was that the exact location of the support within the lumbar region did not significantly influence any of the angles measured in the lumbar region. The studies of Bendix (1986) and Bridger (1988) support the proposition of Mandal (1985) for the forward-tilting seat. 
Considering the above, the following ergonomics requirements should be met: 1. The seats should have a backrest which can recline. 2. The backrest should provide a lumbar support. 3. The seat should provide a forward-tilting seat.
Figure 4
An example of a standing-sitting workstation.
workers to alternate between sitting and standing posture, by raising and lowering the work surface accordingly. Typically, most industry solutions are electrically driven for easy and fast adjustment. A large number of recent studies have shown the benefits of such a solution for reducing back-pain (Agarwal, Steinmaus, & Harris-Adamson, 2018) and alleviating worker discomfort with no negative effects on productivity (Karakolis & Callaghan, 2014). Workstation design efforts to increase bodily activity also include treadmill and cycling workstations, among others. There is evidence that such designs may reduce occupational sedentary time, without compromising work performance (Neuhaus et al., 2014). However, worker acceptance and long-term use of such workstations are still a matter of debate. Despite the absence of an ideal posture, there are, however, postures which are more comfortable and healthier than others. Ergonomic research aims at identifying these postures, and formulating requirements and principles which should be considered during the design of the components of a workplace. In this way the resulting design will promote healthy work postures and constrain the prolonged adoption of unhealthy ones. 2.1 Sitting Posture and Seats The problem of designing seats that are appropriate for work is far from solved. In recent decades the sitting posture and
WORKPLACE DESIGN
Figure 5 Lordotic (inward arch) and kyphotic (outward arch) postures of the spine. (Source: Grandjean, 1988. © 1987 Taylor & Francis.)
However, as Dainoff (1994) observes, people usually bend forward when tasks require close attention to the objects on the work surface or the computer screen. In that case, the backrest support becomes useless. One design solution that aims to minimize lumbar kyphosis is the kneeling or balance chair (Figure 6), whose seat pan is inclined more than 20° from the horizontal plane. Besides the somewhat unusual sitting position it imposes, this chair has two drawbacks. It loads the knees, which bear a large part of the body’s weight, and it constrains free movement of the legs. On the other hand, it enforces a lumbar lordosis very close to the one adopted while standing. It also does not prevent the torso from moving freely forward, backward, or sideways.
Figure 6 Example of a kneeling chair. (Source: Retrieved from www.comcare.gov.au/officewise.html. Licensed under CC BY 4.0.)
There are many detailed ergonomic requirements concerning the design of seats used at work. For example:
• The seat should be adjustable in order to fit the various anthropometric characteristics of its users as well as different working heights.
• The seat should offer adequate stability to the user.
• The seat should offer freedom of movement to the user.
• The seat should be equipped with armrests.
• The seat lining material should be water-absorbent to absorb body perspiration.
These detailed requirements are not presented extensively here, as the interested reader can easily find them in any specialized handbook. Furthermore, they have been adopted by regulatory documents such as health and safety or design standards and legislation. Examples include:
• for office work: European standard EN 1335, International Organization for Standardization (ISO) 9241, American National Standards Institute (ANSI)/HFES 100-2007, and the German standard DIN 4543;
• for chairs and tables for educational institutions: EN 1729;
• for the driver’s workplace in line-service buses: ISO/DIS 16121.
Most modern seats for office work meet the basic ergonomics requirements, but the design of their controls often violates usability principles. This, combined with users’ poor knowledge about healthy sitting, means that users often do not exploit the adjustment possibilities the seats offer (Vitalis, Marmaras, Legg, & Poulakakis, 2000).
Lueder (1986) provides the following guidelines to increase the usability of controls:
• Controls should be easy to find and interpret.
• Controls should be easily reached and adjusted from the standard seated work position.
• Controls should provide immediate feedback. For example, seats that adjust in height by rotating the seat pan delay feedback, because the user must get up and sit down repeatedly to find the correct position.
• The direction of operation of controls should be logical and consistent with their effect.
• Few motions should be required to use the controls.
• Adjustments should require the use of only one hand.
• Special tools should not be necessary for adjustment.
• Labels and instructions on the furniture should be easy to understand.
However, most modern office chairs are still far from meeting these guidelines (Groenesteijn, Vink, de Looze, & Krause, 2009).
2.2 Sitting Posture and Work Surface Height
Besides lumbar kyphosis, working in a sitting posture may also cause excessive muscle strain in the back and shoulders. For example, if the work surface is too low, the person will bend forward too far; if it is too high, he or she will be forced to raise the shoulders. Appropriate workstation design is required to alleviate these problems. More specifically, the work surface should be at a height that permits a person to work with the shoulders relaxed.
Note that the working height does not always equal the work surface height. The former depends on what one is working on (e.g., the keyboard of a computer). The latter is the height of the upper surface of the table, desk, bench, and so on. To define the appropriate work surface height, one should consider two angles: the angle between the upper arm and the forearm, and the angle between the forearm and the hand at the wrist. To increase comfort and minimize musculoskeletal risks, the first of these angles should be around 90° if no force is required, and wider if force must be applied. The wrists should be kept as straight as possible in order to avoid carpal tunnel syndrome.
Two other common problems encountered by people working in the sitting posture are neck aches and dry-eye syndrome. These problems are related to prolonged gazing at objects placed too high, for example, a computer monitor positioned too high (Ankrum, 1997). Most research findings agree on two points:
(1) Neck flexion is more comfortable than extension. The zero point dividing flexion from extension is described as the posture of the head/neck when standing erect and looking at a visual target 15° below eye level.
(2) The visual system prefers downward gaze angles.
Furthermore, there is evidence that people in an erect posture prefer to tilt their head so that the ear–eye line lies about 15° below the horizontal plane (Grey, Hanson, & Jones, 1966; Jampel & Shi, 1992). The ear–eye line is the line passing through the cartilaginous protrusion in front of the ear opening and the outer corner of the eye. Based on these findings, many authors propose the following rule of thumb for the placement of the monitor: the center of the monitor should be placed at least 15° below eye level, with its top and bottom at an equal distance from the eyes (i.e., the screen plane should face slightly upward). Sanders and McCormick (1992) additionally propose the following general ergonomics recommendations for work surfaces:
• If possible, the work surface height should be adjustable to fit individual physical dimensions and preferences.
• The work surface should be at a level that places the working height at elbow height, with the shoulders in a relaxed posture.
• The work surface should provide adequate clearance for a person’s thighs under the work surface.
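The elbow-height and monitor-placement rules above reduce to simple geometry. The Python sketch below is illustrative only and rests on a few assumptions:

• The forearm is held roughly horizontal, so the working height equals the seat height plus the sitting elbow height.
• The viewing distance is measured along the line of sight from the eyes to the screen center.
• The function names and all numbers (a 45 cm seat, 23 cm sitting elbow height, 3 cm keyboard, 70 cm viewing distance) are hypothetical and do not come from the text.

```python
import math


def work_surface_height_cm(seat_height_cm: float,
                           sitting_elbow_height_cm: float,
                           object_height_cm: float = 0.0) -> float:
    """Work surface height that puts the working height at elbow height.

    Assumes a roughly horizontal forearm (elbow angle near 90 degrees) and
    relaxed shoulders, so working height = seat height + sitting elbow height.
    """
    working_height_cm = seat_height_cm + sitting_elbow_height_cm
    # The surface lies below the working height by the height of the object
    # being worked on (e.g., a keyboard's home row above the desk top).
    return working_height_cm - object_height_cm


def monitor_center_position(viewing_distance_cm: float,
                            gaze_angle_deg: float = 15.0):
    """Place the screen center gaze_angle_deg below eye level.

    viewing_distance_cm is measured along the line of sight from the eyes to
    the screen center. Returns (drop below eye level in cm, horizontal
    distance from the eyes in cm, backward tilt from vertical in degrees).
    """
    if gaze_angle_deg < 15.0:
        raise ValueError("the rule of thumb calls for at least 15 degrees")
    theta = math.radians(gaze_angle_deg)
    drop_cm = viewing_distance_cm * math.sin(theta)
    horizontal_cm = viewing_distance_cm * math.cos(theta)
    # A screen perpendicular to the line of sight has its top and bottom edges
    # equidistant from the eyes; that requires tilting it back by the gaze angle.
    tilt_back_deg = gaze_angle_deg
    return drop_cm, horizontal_cm, tilt_back_deg


# Hypothetical example values, not taken from the text.
desk_cm = work_surface_height_cm(45.0, 23.0, object_height_cm=3.0)
drop_cm, horizontal_cm, tilt_deg = monitor_center_position(70.0)
```

With these placeholder figures, the desk top would sit at 65 cm. The screen center would sit about 18 cm below eye level, tilted back by 15° so that it faces the eyes. An actual design would use measured population data instead.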
2.3 Spatial Arrangement of Work Artifacts
While working, one uses a number of artifacts. Examples are the controls and displays on a control panel; the different parts of an assembled object at an assembly workstation; or the keyboard, mouse, visual display terminal, hard copy documents, and telephone at an office workstation. Applying the following ergonomic recommendations to the arrangement of these artifacts helps to decrease workload, facilitate the work flow, and improve overall performance (adapted from Sanders & McCormick, 1992):
• Frequency of use and criticality: Artifacts that are frequently used or are of special importance should be placed in prominent positions. Examples are the center of the work surface, or near the right hand for right-handed people (and near the left hand for left-handed people).
• Sequential consistency: When a particular procedure is always executed in a sequential order, the artifacts involved should be arranged according to this order.
• Topological consistency: Where the physical location of controlled elements is important for the work, the layout of the controlling artifacts should reflect the spatial arrangement of the controlled elements.
• Functional grouping: Artifacts (e.g., dials, controls, visual displays) that are related to a particular function should be grouped together.
Applying these recommendations requires detailed knowledge of the task demands. Task analysis provides the data needed to apply them appropriately. It also helps resolve possible conflicts between them by deciding which arrangement best fits the situation at hand.
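The Python sketch below illustrates only the first recommendation, frequency of use and criticality. It ranks artifacts by criticality and then by frequency of use, and assigns one artifact to each zone in decreasing order of prominence. Several elements are hypothetical assumptions: the zone names, the rule that critical artifacts always come first, and the usage rates. In practice these would come from the task analysis.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    name: str
    uses_per_hour: float    # frequency of use, e.g. estimated from task analysis
    critical: bool = False  # of special importance regardless of frequency


def assign_zones(artifacts, dominant_hand="right"):
    """Map each artifact name to a zone, most prominent zones first."""
    zones = [
        "center of work surface",
        f"near the {dominant_hand} hand",
        "within easy reach",
        "periphery",
    ]
    # Critical artifacts come first; within each group, more frequent first.
    ranked = sorted(artifacts, key=lambda a: (not a.critical, -a.uses_per_hour))
    # Once the prominent zones are used up, the rest share the last zone.
    return {a.name: zones[min(i, len(zones) - 1)] for i, a in enumerate(ranked)}


# Hypothetical office workstation with made-up usage rates.
office = [
    Artifact("telephone", 4),
    Artifact("keyboard", 600),
    Artifact("hard copy documents", 20),
    Artifact("mouse", 400),
]
layout = assign_zones(office)
```

This sketch does not model sequential consistency, topological consistency, or functional grouping. When those principles conflict with a frequency-based ranking, the choice still has to be made from the task analysis.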
3 DESIGNING INDIVIDUAL WORKSTATIONS
Figure 7 presents a generic process for the design of individual workstations. It shows the various phases, the data or sources of data to be considered at each phase, and the methods that could be applied. Certain phases of the process may be carried out concurrently or in a different order. This depends on the particularities of the workstation to be designed and on the preferences and experience of the designers.
Figure 7 A generic process for the ergonomic design of individual workstations.
Phases:
(1) decisions about the resources and the high-level requirements;
(2) identification of work system constraints and requirements;
(3) identification of the users’ needs;
(4) setting specific design goals;
(5) design of prototype(s);
(6) assessment of the prototype(s);
(7) improvements and final design.
Data sources and methods shown alongside the phases: users’ and stakeholders’ requirements analysis; design standards and legislation; work system analysis; task and users’ characteristics analysis; problems at similar workplaces; aggregation of requirements and constraints; users’ population anthropometric and biomechanical characteristics; existing design solutions.
3.1 Phase 1: Decisions about Resources and High-Level Requirements
The aim of the first phase of the design process is to decide how much time will be spent and who will participate in the design team. These decisions depend on the high-level requirements of the stakeholders (e.g., improvement of working conditions, increase in productivity, innovation, occupational safety, and health protection). They also depend on the financing and significance of the project (e.g., the number of identical workstations, the importance of the tasks carried out, special characteristics of the working persons).
An additional issue to be dealt with in this phase is ensuring that representatives of those who will occupy the future workstations participate in the design team. Worker participation is of the utmost importance. It ensures that no particular user needs will be neglected, and it provides a sense of ownership of the resulting solution, enhancing acceptance. Access to workstations where similar jobs are performed is also advisable. The rest of the design process will be significantly affected by the decisions made in this phase.
3.2 Phase 2: Identification of Work System Constraints and Requirements
The aim of this phase is to identify the constraints and requirements imposed by the work system or the work situation for which the workstation is intended. More specifically, during this phase the design team has to collect data about the following:
• Types of tasks to be carried out at the workstation.
• Work organization, for example, working hours, and interdependencies between the tasks carried out at the workstation and other tasks or organizational entities in the proximal environment.
• The technological equipment and tools that will be used: their functions and manipulation needs, their shape, dimensions, and user interfaces.
• Environmental conditions of the broader area in which the workstation will be installed (e.g., illumination and sources of light, noise levels and sources, thermal conditions, and sources of warm or cold drafts).
• Normal as well as exceptional situations in which the working persons may find themselves (e.g., electrical breakdowns, fire).
• Any other element or situation of the work system that may directly or indirectly interfere with the workstation.
These data can be collected by questioning the appropriate people and by observing and analyzing similar work situations. Specific design standards (e.g., ANSI, EC, DIN, or ISO) and legislation related to the type of workstation being designed should also be collected and studied in this phase.
3.3 Phase 3: Identification of Users’ Needs
The needs of the future workstation users are identified during this phase, considering their task demands as well as their specific characteristics. Consequently, task analysis (see Chapter 13 by Erik Hollnagel in this volume) and users’ characteristics analysis should be carried out in this phase. The task analysis mainly aims to identify:
• the work processes that will take place and the workstation elements involved in them;
• the physical actions that will be carried out, for example, fine manipulations, whole-body movements, and force exertion;
• the required information exchange (visual, auditory, kinesthetic, etc.) and the information sources providing it;
• the required privacy;
• the required proximity to other workstations, equipment, or elements of the working environment.
Observation and analysis of existing work situations with similar workstations may also provide valuable information about the users’ future work activity. As Leplat (2006) points out, work activity is a complex process. It comprises essential dynamic and temporal aspects and integrates the effect of multiple constraints and demands. It should be distinguished from behavior, which constitutes only its observable facet: activity includes behavior and its regulating mechanisms (Leplat, 2006). Work activity can operationally be described from many perspectives using diverse models. However, its most fundamental characteristic is that it should be studied intrinsically, as an original construction by the workers (Daniellou & Rabardel, 2005; Nathanael & Marmaras, 2009). Therefore, users’ work activity needs cannot be fully identified by a normative task analysis alone; motivational and other subjective criteria may also have to be considered.
The specific characteristics of the users’ population may include their gender, age, particular disabilities, previous experience and work practices, and cultural or religious obligations (e.g., in certain countries women are obliged to wear particular clothing). In this phase, data about performance and health problems of persons working in similar work situations should also be collected. The main source for such data may be the literature on ergonomics and on occupational safety and health. Relevant sources include the websites of the U.S. Occupational Safety and Health Administration (http://www.osha.gov) and the European Agency for Safety and Health at Work (http://osha.europa.eu).
Finally, as in phase 2, the users’ needs should be identified not only for normal but also for exceptional situations in which the workstation occupants may find themselves (e.g., working under stress, electrical blackout, fire).
3.4 Phase 4: Setting Specific Design Goals
Considering the outputs of the previous phases, the design team can now transform the generic ergonomics requirements of workstation design into a set of specific goals. These specific design goals will guide the choices and decisions to be made in the next phase. They will also be used as criteria for assessing the designed prototype and will guide its improvement. The specific goals typically take the form of a set of “should” statements and consist of:
• Requirements of the stakeholders (e.g., the workstation should be convenient for 95% of the user population, the cost per unit should not exceed X, the workstation should increase productivity by at least 10%).
• Constraints and requirements imposed by the work system in which the designed workstations will be installed. For example, the workstation dimensions should not exceed X centimeters in length and Y centimeters in width, and the workstation should offer environmental conditions not exceeding X decibels of noise and Y degrees of wet bulb globe temperature.
• Users’ needs (e.g., the workstation should accommodate elderly people, should be appropriate for prolonged computer work, should facilitate cooperation with the neighboring workstations, should permit alternation between sitting and standing postures).
• Requirements to avoid common health problems associated with similar workstations (e.g., the workstation should minimize upper limb musculoskeletal problems).
• Design standards and related legislation (e.g., the workstation should ensure absence of glare or cold drafts).
A systematic record of all the specific design goals is very helpful in the subsequent phases. Agreement on these specific goals among the design team, management, and user representatives is indispensable.
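A goal such as “convenient for 95% of the user population” is usually translated into percentile limits like those applied in Phase 5 below. Clearances are sized for the 97.5th percentile user and reach distances for the 2.5th percentile user. The Python sketch below assumes that each body dimension is normally distributed and uses the standard library’s statistics.NormalDist. The population means and standard deviations are hypothetical placeholders, not anthropometric data.

```python
from statistics import NormalDist


def percentile_cm(mean_cm: float, sd_cm: float, p: float) -> float:
    """Quantile p of a body dimension assumed to be normally distributed."""
    return NormalDist(mean_cm, sd_cm).inv_cdf(p)


def clearance_cm(mean_cm: float, sd_cm: float, allowance_cm: float = 2.0) -> float:
    # Clearances are sized for the largest user (97.5th percentile) plus allowance.
    return percentile_cm(mean_cm, sd_cm, 0.975) + allowance_cm


def reach_limit_cm(mean_cm: float, sd_cm: float) -> float:
    # Reach distances are sized for the smallest user (2.5th percentile).
    return percentile_cm(mean_cm, sd_cm, 0.025)


def accommodated_share(mean_cm: float, sd_cm: float,
                       low_cm: float, high_cm: float) -> float:
    """Fraction of users whose dimension lies between low_cm and high_cm."""
    dist = NormalDist(mean_cm, sd_cm)
    return dist.cdf(high_cm) - dist.cdf(low_cm)


# Hypothetical population parameters (mean, SD in cm), for illustration only.
lateral_clearance = clearance_cm(38.0, 3.0)       # e.g., seated hip breadth
max_control_distance = reach_limit_cm(72.0, 4.0)  # e.g., forward grip reach
```

Under the normality assumption, an adjustable element whose range spans the 2.5th to 97.5th percentile of a dimension covers about 95% of users for that dimension. This matches the stakeholder goal quoted above.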
3.5 Phase 5: Design of a Prototype
This phase is the most demanding of the design process. The design team has to generate design solutions that meet all the specific design goals identified in phase 4. There are many design goals, and some of them may conflict. The design team therefore has to make appropriate compromises, treating some goals as more important than others and possibly ignoring some of them. As already stated, good knowledge of the task demands, the users’ needs, and the specific users’ characteristics is the only way to set the right priorities and avoid serious mistakes. Furthermore, two kinds of data on the users’ population should be considered in this phase: (1) the size of the body parts (anthropometry, see Chapter 11) and (2) the abilities and limits of their movements (biomechanics, see Chapter 12).
The first decision to make concerns the working posture(s) that the users of the workstation will assume. Table 1 provides some recommendations for this decision, including the use of a support seat with a foot rest (Figure 8).
Table 1 Recommendations for Choosing Working Posture
Working posture | Task requirements
Working person’s choice | It is preferable to arrange for both sitting and standing (see Figure 4).
Sitting | Where a stable body is needed: for accurate control, fine manipulation; for light manipulation work (continuous); for close visual work, with prolonged attention; for limited headroom, low work heights. Where foot controls are necessary (unless of infrequent or short duration).
Standing | Where a large proportion of the working day requires standing. For heavy, bulky loads. Where there are frequent moves from the workplace. Where there is no knee room under the equipment. Where there is limited front–rear space. Where there is a large number of controls and displays.
Support seat (see Figure 8) | Where a large proportion of the working day requires sitting. Where there is no room for a normal seat but a support is desirable.
Source: Corlett & Clark (1995). © 1995 Taylor & Francis.
Figure 8 Where there is no room for a normal seat, a support is desirable. (Source: Helander, 1995. © 1995 Taylor & Francis.)
Figure 9 Drawing the common comfort zones of hands and legs for the large and small users of a driving workplace with a non-adjustable chair. (Labels: common comfort zone of upper limbs; comfort zones of lower limbs; seat height (non-adjustable); convention line.)
Once the working postures have been decided, the design may proceed to define the shape, the dimensions, and the arrangement of the various workstation elements. To do so, one has to consider the anthropometric and biomechanical characteristics of the users’ population, as well as the variety of working actions that will be performed. Besides the ergonomics recommendations presented in previous sections, some additional recommendations for the design of the workstation are as follows:
• To define the clearance, that is, the minimum free space required for placing the various body parts, consider the largest user (usually the anthropometric dimensions corresponding to the 97.5th percentile). By providing free space for these users, all smaller users will also have enough space for their bodies. For example, suppose the clearances below a work desk are designed from the 97.5th percentile of the users’ population (plus 1 to 3 cm of allowance): the height of the upper surface of the thighs of a seated person for the vertical clearance, the hip breadth for the lateral clearance, and the thigh length for the forward clearance. Then 97.5% of the users will fit freely under this desk while sitting.
• To position the workplace elements that must be reached by the users, consider the smallest user (usually the anthropometric dimensions corresponding to the 2.5th percentile). If the smallest users reach the various workstation elements easily, that is, without leaning forward or bending sideways, all larger users will also be able to reach them easily.
• Draw the common kinetospheres or comfort zones for the larger and smaller users, and position within them the various elements of the workstation that need to be manipulated (e.g., controls) (Figure 9).
• When feasible, provide the various elements of the workstation with appropriate adjustability in order to fit the anthropometric characteristics of the users’ population. In this case, it is important to ensure the usability of the corresponding controls.
• While envisioning design solutions, continuously check that the workstation elements do not obstruct the users’ courses of action (e.g., perception of necessary visual information, manipulation of controls).
It should be stressed that at least some iterations between phases 2 and 3 and the present phase are unavoidable. It is almost impossible to identify from the start all the constraints and requirements of the work system, the users’ characteristics, or the task requirements that interact with the elements of the anticipated workstation.
Another issue to deal with in this phase is protecting the working person from potentially annoying or hazardous environmental factors. If the workstation has to be installed in a harsh environment (noisy, cold or hot, with a hazardous atmosphere, etc.), appropriate protection has to be provided. Again, attention should be paid to the design of such protective elements. They should take into account the anthropometric characteristics of the users’ population and the task demands, so as not to obstruct the processes involved in both normal and degraded operation (e.g., maintenance, breakdowns). Other important issues to be resolved in this phase are the workstation’s maintainability, unrestricted evacuation, stability, and robustness, as well as other safety issues such as sharp corners.
The search for existing design ideas and solutions is quite useful. However, they should be carefully examined before being adopted.
Such design ideas, although valuable for anticipation, may not be readily applicable to the specific users’ population, the specific task demands, or the environment in which the workplace will be installed. Furthermore, existing design solutions may disregard important ergonomics issues. Finally, adopting existing design solutions exploits the design community’s experience and saves resources, but it deprives the design team of the opportunity to generate innovative solutions. Computer-aided design (CAD) software with anthropometrically valid human models, virtual reality simulation, or both are very helpful in this phase. If such software is not available, appropriate drawings and mock-ups should be developed for generating design solutions and for assessing them (see phase 6). Given the complexity of generating good design solutions, the search for alternatives is essential. The members of the design team should not become anchored to the first design solution that comes to mind. They should try to generate as many alternative ideas as possible, gradually converging on those that best satisfy the design goals.
3.6 Phase 6: Assessment of the Prototype
Assessment of the designed prototype is required for two reasons. It checks how well the specific design goals set in phase 4 have been met. It also uncovers possible omissions made during the identification of the work system constraints and requirements and during the users’ needs analysis (phases 2 and 3). The assessment can be performed analytically, experimentally, or both, depending on the importance of the project. In the analytical assessment, the design team exhaustively assesses the designed workplace against the specific design goals, using the drawings and mock-ups as support. Applying a multi-criteria method, the design team may rank the degree to which the design goals have been met. This ranking may be used as a basis for
the next phase of the design process (improvement of the prototype), as well as a means of choosing among alternative design solutions. The experimental assessment (or user testing) is performed with the participation of a sample of future users, simulating the work with a full-scale mock-up of the designed workstation prototype(s). The assessment should be made in conditions as close as possible to the real work. Developing use scenarios for both normal and exceptional work situations is useful for this purpose. Experimental assessment is indispensable for identifying problematic aspects that are difficult, if not impossible, to detect before there is a real workplace with real users. Furthermore, this type of assessment provides valuable insights into possible needs during implementation (e.g., the training needed, or a possible need for documentation).
3.7 Phase 7: Improvements and Final Design
In this phase, the design team makes the required modifications to the designed prototype, considering the outputs of the assessment. Input from other specialists should also be considered in this phase, if such specialists are not already part of the design team. These include architects and decorators, who are concerned mainly with aesthetics, and production engineers and industrial designers, who are concerned mainly with production, materials, and robustness. The final design should be complemented with:
• drawings for production and appropriate documentation, including the rationale behind the adopted solutions;
• a cost estimate for producing the designed workstation(s);
• implementation requirements, such as the training needed and the users’ manual, if required.
3.8 Final Remarks
The reason for conducting the users’ needs and requirements analysis is to anticipate the future work situation in order to design a workstation that fits its users, their tasks, and the surrounding environment.
However, it is impossible to anticipate a future work situation completely, in all its aspects, because work situations are complex, dynamic, and evolving. Furthermore, if the workstation is to become part of an existing work system, it might affect the overall work ecology, which is also very difficult to anticipate. Therefore, some modifications will eventually be needed some time after installation and use. It is thus strongly recommended that the designed workstation be assessed again once the users have become familiar with the new work situation.
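The text does not prescribe a particular multi-criteria method for the analytical assessment of Phase 6. One simple option, assumed here, is a weighted sum:

• each design goal receives an importance weight;
• each prototype is rated against each goal;
• prototypes are ranked by their weighted mean rating;
• goals rated below a threshold are listed as candidates for improvement in Phase 7.

The goals, weights, 1–5 rating scale, and threshold in the Python sketch below are hypothetical.

```python
def weighted_scores(weights, ratings):
    """Weighted mean rating of each alternative.

    weights: {goal: importance weight > 0}
    ratings: {alternative: {goal: rating, e.g. on a 1-5 scale}}
    """
    total = sum(weights.values())
    return {
        alt: sum(w * goal_ratings[goal] for goal, w in weights.items()) / total
        for alt, goal_ratings in ratings.items()
    }


def rank_alternatives(weights, ratings):
    """Alternatives ordered from best to worst weighted score."""
    scores = weighted_scores(weights, ratings)
    return sorted(scores, key=scores.get, reverse=True)


def unmet_goals(ratings, threshold=3):
    """Goals rated below threshold for each alternative (inputs to Phase 7)."""
    return {
        alt: sorted(goal for goal, r in goal_ratings.items() if r < threshold)
        for alt, goal_ratings in ratings.items()
    }


# Hypothetical goals, weights, and ratings.
weights = {"fits 95% of users": 3, "allows sit-stand alternation": 2,
           "cost within budget": 1}
ratings = {
    "prototype A": {"fits 95% of users": 5, "allows sit-stand alternation": 3,
                    "cost within budget": 4},
    "prototype B": {"fits 95% of users": 4, "allows sit-stand alternation": 5,
                    "cost within budget": 2},
}
ranking = rank_alternatives(weights, ratings)
```

A weighted sum lets a strong rating on one goal compensate for a weak rating on another. Some goals may be non-negotiable (for example, a legal requirement); these are better checked as pass/fail constraints before any ranking.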
4 THE LAYOUT OF WORKSTATIONS
Layout deals with the placement and orientation of individual workstations in a given space (a building). The main ergonomics requirements concern the tasks performed, the work organization, and the environmental factors. Specifically:
• The layout of the workstations should facilitate the work flow.
• The layout of the workstations should facilitate cooperation (both among personnel and with external persons, e.g., customers).
• The layout of the workstations should conform to the organizational structure.
• The layout should ensure the required privacy.
• There should be appropriate lighting, conforming to the task’s and the working person’s needs.
• The lighting should be uniform throughout the working person’s visual field.
• There should be no annoying reflections or glare in the working area.
• There should be no annoying hot or cold drafts in the workplace.
• Access to the workstations should be unobstructed and safe.
In this section we focus on the layout of office workplaces, for the following reasons:
• First, office layout is an exemplary case of arranging many individual workstations in a given space. It encompasses all the main ergonomics requirements found in most types of workplaces, with the exception of workplaces where the technology involved largely determines the layout (e.g., machinery workstations).
• Second, office workplaces concern a growing percentage of the working population worldwide. For example, during the twentieth century the percentage of office workers in the United States increased from 17% to over 50% of the workforce, the rest working in agriculture, sales, industrial production, and transportation (Czaja, 1987). With the spread of information and communication technologies, the proportion of office workers is expected to increase further. Indeed, Brounen and Eichholtz (2004) and Veitch et al. (2007) estimate that at least 50% of the world’s population currently works in some form of office.
• Third, a significant number of office workers suffer from musculoskeletal disorders or other work-related problems (Corlett, 2006; Griffiths, Mackey, & Adamson, 2007; Luttmann, Schmidt, & Jäger, 2010).
• Finally, the health problems currently encountered by office workers are to a great extent related to the inappropriate layout of their workplaces (Marmaras & Papadopoulos, 2002).
4.1 Generic Types of Office Layouts
There are a number of generic types of office layouts (Shoshkes, 1976; Zelinsky, 1998).
e.g., individual offices, reflecting hierarchy, cubicles office, and open plan. The two extremes are the “individual office”, where each worker has his or her personal closed space/room, and the “open plan”, where all the workstations are placed in a common space. In between are a multitude of combinations of individual offices with open plans. Workstation arrangements in open plans can be either orthogonal, with single, double, or four-fold desks forming parallel rows, or with the workstations arranged in groups, matching the organizational or functional structure of the work. A recent layout philosophy is the “flexible office”, where the furniture and the equipment are designed to be easily movable in order to be able to modify the workstation arrangement depending on the number of people present at the office, as well as the number of running projects or work schemes (Brunnberg, 2000). Finally, in order to respond to the current needs for flexibility in the organization and structure of enterprises, as well as reduce costs, a new trend in office management is the “free address office” or “non-territorial office”, where workers do not have their own workstation but use the workstation they find free whenever at the office. In addition, an increasing number of companies are adopting agile working policies, replacing rows of desktop personal computers with Wi-Fi shared desks, and a range of spaces for informal breakouts, formal meetings and quiet concentration (Gillen, 2019). Examples of this trend is the specification of collaboration areas to prompt better working across teams, concentration areas for quiet, focused tasks, learning zones and development facilities for small group working or self-study, with bookable spaces. Finally, there are amenity spaces for refreshment, reflection, relaxing, and engaging with colleagues.
Each type of layout has its strengths and weaknesses. Individual offices offer increased privacy and better control of environmental conditions, which can be fitted to the particular preferences and needs of their users. However, they are more expensive both to build and to operate, are not easily modified to match changing organizational needs, and make cooperation and supervision difficult. Open-plan offices adapt easily to changing organizational needs and facilitate cooperation between co-workers, but they tend to suffer from environmental annoyances, such as noise and suboptimal climatic conditions, as well as lack of privacy (see De Croon et al., 2005, for a review). To minimize the noise level and to create some sense of privacy in open plans, movable barriers may be used. To be effective, the barriers have to be at least 1.5 m high and 2.5 m wide. Furthermore, Wichman (1984) proposes the following specific design recommendations to enhance the working conditions in an open-plan office:
• Use sound-absorbing materials on all major surfaces wherever possible. Noise is often more of a problem than expected.
• Equip the workstations with low-noise technological devices (printers, photocopy machines, telephones, etc.). For example, provide telephones that flash a light for the first two “rings” before emitting an auditory signal.
• Leave some elements of design to the workstation user. People need to have control over their environments; leave some opportunities to change or rearrange things.
• Provide both vertical and horizontal surfaces for the display of personal belongings. People like to personalize their workstations.
• Provide several easily accessible islands of privacy. These would include small rooms with full walls and doors that can be used for conferences, teleconferences, and private telephone calls.
• Provide all private work areas with a way for the occupant to signal willingness to be disturbed.
• Have clearly marked flow paths for visitors. For example, hang signs from the ceiling showing where secretaries and department boundaries are located.
• Design workstations so that it is easy for drop-by visitors to sit down while speaking. This will tend to reduce disturbances to other workers.
• Plan for ventilation air flow. Most traditional offices have ventilation ducting. This is usually not the case with open-plan cubicles, so these often become dead-air cul-de-sacs that are extremely resistant to post hoc resolution.
• Overplan for storage space. Open-plan systems, with their emphasis on tidiness, seem to chronically underestimate people’s storage needs.
The decision on the generic type of layout should be made by the stakeholders. The role of the ergonomist here is to indicate the strengths and weaknesses of each alternative, in order to facilitate the adoption of the most appropriate type of layout for the specific situation. After this decision has been made, the design team should proceed to the detailed layout of the workstations. The next section describes a systematic method for this purpose.
4.2 A Systematic Method for Office Layout
This method proposes a systematic way to design workplaces for office work. It aims to simplify the design process for workstation arrangement by decomposing the whole problem into a number of stages during which only a limited number
WORKPLACE DESIGN
of ergonomic requirements are considered. Another characteristic of the method is that the ergonomics requirements to be considered have been converted into design guidelines (Margaritis & Marmaras, 2003). Figure 10 presents the main stages of the method.
Before starting the layout design, the design team should collect data concerning the activities that will be performed in the workplace and the needs of the workers. More specifically, the following information should be gathered:
• the number of people who will work there permanently or occasionally;
• the organizational structure and the organizational units it comprises;
• the activities carried out by each organizational unit. Of particular interest are the need for cooperation between the different units (and consequently the desired relative proximity between them), the need to receive external visitors (and consequently the need to provide easy access to them), and any other need related to the particularities of the unit (e.g., security requirements);
• the tasks carried out by each worker. Of particular interest are the need for cooperation with other workers, the privacy needs, the reception of external visitors, and the specific needs for lighting;
• the equipment required for each task (e.g., computer monitors, printers, storage).
At this stage, the design team should also obtain detailed ground plan drawings of the space concerned, including all fixed structural elements (e.g., structural walls, heating systems).
4.2.1 Stage 1: Determination of Available Space
The aim of this stage is to determine the space where no furniture should be placed, in order to ensure free passage through the doors and free access to other elements, such as windows and radiators, for their operation and maintenance.
Figure 10 The main stages of a method for office layout meeting the ergonomic requirements. Stages shown: determination of the available space; design of workstation modules; placement of the organizational units; placement of the workstation modules; orientation of the workstation modules. For each stage, the flowchart lists the input data and the ergonomics requirements addressed.
Figure 11 Determining the available space.
To determine the free-of-furniture spaces, the following suggestions can be used (Figure 11). Allow for:
• an area of 50 cm in front of any window;
• an area of 3 m in front and 1 m on both sides of the main entrance door;
• an area of 1.50 m in front and 50 cm on both sides of any other door;
• an area of 50 cm around any radiator.
4.2.2 Stage 2: Design of Workstation Modules
The aim of this stage is to design workstation modules that meet the needs of the working persons. Each module is composed of the appropriate elements for the working activities, that is, desk, seat, storage cabinets, visitors’ seats, and any other equipment required for the work. A free space should be provided around the furniture for passage between workstations, as well as for unobstructed sitting down and getting up from the seat. This free space may be delimited as follows (minimum areas). Allow for:
• an area of 55 cm along the front side of the desk or the outer edge of the visitor’s seat;
• an area of 50 cm along the entry side of the workstation;
• an area of 75 cm along the back side of the desk (seat side);
• an area of 100 cm along the back side of the desk if there are storage cabinets behind the desk.
A number of different modules will result from this stage, depending on the particular work requirements (e.g., secretarial module, head of unit module, client service module) (Figure 12). Laying out workstation modules instead of individual elements such as desks and seats permits the designer to focus on the requirements related to the overall layout of the workplace while ensuring compliance with the requirements related to the individual workstations.
4.2.3 Stage 3: Placement of Organizational Units
The aim of this stage is to decide the placement of the different organizational units (i.e., departments, working teams, etc.) within the various free spaces of the building.
There are five main issues to be considered here: (1) the shape of each space; (2) the exploitable area of each space, that is, the area where workstations can be placed; (3) the required area for each unit; (4) the desired proximity between the different units; and (5) any particular requirements of a unit that may determine its absolute placement within the building (e.g., the reception should be placed right next to the main entrance).
Figure 12 Examples of workstation modules.
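The Stage 2 free-space allowances can be turned into a module footprint with simple arithmetic. The following Python sketch is a minimal illustration. It assumes a rectangular module in which the entry-side allowance applies to one side only and the front and back allowances add to the furniture depth. The class `WorkstationModule`, its fields, and the example dimensions are hypothetical and do not come from the chapter.

```python
from dataclasses import dataclass

# Minimum free-space allowances (cm) from the Stage 2 guidelines.
FRONT_CM = 55           # along the front of the desk or the visitor's seat
ENTRY_SIDE_CM = 50      # along the entry side of the workstation
BACK_CM = 75            # along the back (seat) side of the desk
BACK_STORAGE_CM = 100   # back side when storage cabinets stand behind the desk


@dataclass
class WorkstationModule:
    """Hypothetical rectangular module; furniture sizes in cm."""
    name: str
    furniture_width: float   # side-to-side extent of desk and attached furniture
    furniture_depth: float   # front-to-back extent (including any visitor seat)
    storage_behind: bool = False

    def footprint_cm(self) -> tuple[float, float]:
        """(width, depth) of the module including its free space.

        Assumes the entry-side allowance is needed on one side only and that
        the front and back allowances simply add to the furniture depth.
        """
        back = BACK_STORAGE_CM if self.storage_behind else BACK_CM
        width = self.furniture_width + ENTRY_SIDE_CM
        depth = FRONT_CM + self.furniture_depth + back
        return width, depth

    def area_m2(self) -> float:
        width, depth = self.footprint_cm()
        return width * depth / 10_000  # cm^2 -> m^2


if __name__ == "__main__":
    secretarial = WorkstationModule("secretarial", 160, 80)
    head_of_unit = WorkstationModule("head of unit", 180, 80, storage_behind=True)
    for module in (secretarial, head_of_unit):
        print(module.name, module.footprint_cm(), module.area_m2())
```

For a 160 cm × 80 cm desk without storage behind it, the sketch gives a 210 cm × 210 cm module, that is, 4.41 m². Module areas of this kind are what the required area of each organizational unit is built from.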
The exploitable area of each space is estimated by excluding the “free-of-furniture spaces” defined in the first stage, together with narrow areas where modules cannot fit. Specifically, this area can be calculated as follows:
$$A_{\text{exploitable}} = A_{\text{total}} - A_{\text{where no modules can be placed}}$$
where $A_{\text{total}}$ is the total area of each space and $A_{\text{where no modules can be placed}}$ is the nonexploitable area, where workstation modules should not or cannot be placed.
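A minimal Python sketch of the calculation A_exploitable = A_total − A_where no modules can be placed follows. It assumes a rectangular room and rectangular keep-clear zones derived from the Stage 1 allowances. The helper names (`Rect`, `wall_zone`, `union_area`, `exploitable_area`) and the convention that every element stands against the wall y = 0 are hypothetical. Overlapping zones, such as a radiator under a window, are counted only once. Narrow leftover strips where no module fits are not detected, so they would still have to be added to the nonexploitable area by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in metres (x0 < x1, y0 < y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return (self.x1 - self.x0) * (self.y1 - self.y0)


def wall_zone(x_start: float, x_end: float, depth: float, side: float,
              element_depth: float = 0.0) -> Rect:
    """Keep-clear zone for an element against the wall y = 0 (hypothetical
    convention): `depth` in front of the element and `side` on both sides."""
    return Rect(x_start - side, 0.0, x_end + side, element_depth + depth)


def union_area(zones: list[Rect], room: Rect) -> float:
    """Area covered by the zones inside the room, counting overlaps once."""
    clipped = [Rect(max(z.x0, room.x0), max(z.y0, room.y0),
                    min(z.x1, room.x1), min(z.y1, room.y1)) for z in zones]
    clipped = [c for c in clipped if c.x0 < c.x1 and c.y0 < c.y1]
    xs = sorted({v for c in clipped for v in (c.x0, c.x1)})
    ys = sorted({v for c in clipped for v in (c.y0, c.y1)})
    covered = 0.0
    # Each grid cell lies entirely inside or outside every rectangle,
    # so testing the cell centre is enough.
    for xa, xb in zip(xs, xs[1:]):
        for ya, yb in zip(ys, ys[1:]):
            cx, cy = (xa + xb) / 2, (ya + yb) / 2
            if any(c.x0 < cx < c.x1 and c.y0 < cy < c.y1 for c in clipped):
                covered += (xb - xa) * (yb - ya)
    return covered


def exploitable_area(room: Rect, keep_clear: list[Rect]) -> float:
    """A_exploitable = A_total - A_where no modules can be placed."""
    return room.area() - union_area(keep_clear, room)


if __name__ == "__main__":
    room = Rect(0.0, 0.0, 8.0, 6.0)
    zones = [
        wall_zone(1.0, 2.0, depth=3.0, side=1.0),                     # main entrance door
        wall_zone(5.0, 7.0, depth=0.5, side=0.0),                     # window
        wall_zone(5.5, 6.5, depth=0.5, side=0.5, element_depth=0.1),  # radiator under the window
    ]
    print(round(exploitable_area(room, zones), 2))
```

In this hypothetical 8 m × 6 m room, the main door, window, and radiator keep 9.0 m² + 1.2 m² clear, which leaves an exploitable area of 37.8 m².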
The required area for each organizational unit, $A_{\text{required}}$, can be estimated from the number of workstation modules needed and the area required for each module. Specifically, $A_{\text{required}}$ is the sum of the areas of the unit’s workstation modules. Comparing the exploitable area of the different spaces with the required area for each unit defines the candidate spaces for placing the different units. The candidate spaces for a particular unit are the spaces where
$$A_{\text{exploitable}} \ge A_{\text{required}}$$
Once the candidate spaces for each unit have been defined, the final decisions about the placement of organizational units can be made. This is done in two steps. In the first step, the designer designates spaces for any units that have particular placement requirements (e.g., reception). In the second step, he or she positions the remaining units considering their desirable relative proximity plus additional criteria such as the need for natural lighting or the reception of external visitors.
To facilitate the placement of the organizational units according to their proximity requirements, a proximity table as well as proximity diagrams may be used. The proximity table gives the desired proximity of each unit to every other unit, rated on the following scale:
9: The two units cooperate closely and should be placed close together.
3: The two units cooperate from time to time, and it would be desirable to place them in proximity.
1: The two units do not cooperate frequently, and it does not matter whether they are placed in proximity.
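The area test used to define candidate spaces can be sketched in Python. A_required is the sum of a unit's module areas, and a space is a candidate when A_exploitable ≥ A_required. The function names and all areas in the example are hypothetical. The sketch only filters spaces by area. The proximity and special-requirement decisions remain with the designer.

```python
def required_area(module_areas_m2: list[float]) -> float:
    """A_required: sum of the areas of a unit's workstation modules."""
    return sum(module_areas_m2)


def candidate_spaces(units: dict[str, list[float]],
                     exploitable_m2: dict[str, float]) -> dict[str, list[str]]:
    """For each unit, the spaces whose exploitable area is at least A_required."""
    candidates = {}
    for unit, module_areas in units.items():
        a_required = required_area(module_areas)
        candidates[unit] = sorted(space for space, a_exploitable in exploitable_m2.items()
                                  if a_exploitable >= a_required)
    return candidates


if __name__ == "__main__":
    # Hypothetical figures (m^2), not taken from the chapter.
    spaces = {"Room A": 37.8, "Room B": 20.0, "Hall": 60.0}
    units = {
        "Sales": [4.41] * 5,
        "Accounting": [4.41, 4.41, 4.935],
        "R&D": [4.935] * 10,
    }
    for unit, rooms in candidate_spaces(units, spaces).items():
        print(unit, rooms)
```

An empty list means that no single space can host the unit as a whole.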
Figure 13 presents the proximity table of a hypothetical firm consisting of nine organizational units. At the bottom right of the proximity table, the total proximity rate (TPR) has been calculated for each unit as the sum of its individual proximity rates. The TPR indicates the cooperation needs of each unit with all the others. Consequently, the designer should try to place the units with high TPRs in a central position.
Proximity diagrams are a graphical method for the relative placement of organizational units. They facilitate the heuristic search for configurations that minimize the distance between units that cooperate closely. Proximity diagrams are typically drawn with equidistant points, like the one shown in Figure 14. The different units are tried at the different points, looking for arrangements where the units that cooperate closely are as close as possible to each other. The following rules may be applied to obtain a first configuration:
• Place the unit with the highest TPR at the central point.
• If more than one unit has the same TPR, place first the unit with the most close-proximity ratings (9s).
• Next, place the units with the highest proximity rates to the units already positioned.
• If more than one unit has the same proximity rate to the units already positioned, place the unit with the higher TPR first.
• Continue in the same manner until all the units have been positioned.
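The TPR calculation and the ordering rules above can be expressed compactly. The Python sketch below is one interpretation, not the chapter's own procedure. It reads the third rule as "the highest single rate with any unit already placed" and breaks any remaining ties alphabetically. It also adds a hypothetical greedy step that puts each unit on the free point of a square grid that minimizes the sum of rate × distance to the units already placed. The unit names and ratings in the example are invented and are not those of Figure 13.

```python
import math


def build_table(units, rates):
    """Symmetric proximity table {unit: {other: rate}}.

    `rates` maps unit pairs to 9, 3 or 1; pairs not listed default to 1.
    """
    table = {u: {v: 1 for v in units if v != u} for u in units}
    for (a, b), rate in rates.items():
        if rate not in (1, 3, 9):
            raise ValueError(f"proximity rate must be 1, 3 or 9, got {rate}")
        table[a][b] = table[b][a] = rate
    return table


def total_proximity_rates(table):
    """TPR of each unit: the sum of its individual proximity rates."""
    return {u: sum(row.values()) for u, row in table.items()}


def placement_order(table):
    """Order in which units are placed on the proximity diagram."""
    tpr = total_proximity_rates(table)
    nines = {u: sum(r == 9 for r in row.values()) for u, row in table.items()}
    # Central unit: highest TPR; ties broken by the number of 9 ratings.
    order = [max(table, key=lambda u: (tpr[u], nines[u]))]
    remaining = sorted(set(table) - set(order))  # sorted: deterministic ties
    while remaining:
        # Next unit: strongest rate with any placed unit; ties broken by higher TPR.
        nxt = max(remaining,
                  key=lambda u: (max(table[u][p] for p in order), tpr[u]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def place_on_grid(table, order):
    """Hypothetical greedy step: each unit takes the free square-grid point,
    adjacent to the units already placed, that minimises sum(rate * distance)."""
    positions = {order[0]: (0, 0)}
    for unit in order[1:]:
        taken = set(positions.values())
        free = sorted({(x + dx, y + dy) for x, y in taken
                       for dx in (-1, 0, 1) for dy in (-1, 0, 1)} - taken)
        positions[unit] = min(free, key=lambda pt: sum(
            table[unit][other] * math.dist(pt, pos)
            for other, pos in positions.items()))
    return positions


if __name__ == "__main__":
    units = ["Direction", "Secretariat", "Sales", "Marketing", "Accounting"]
    rates = {("Direction", "Secretariat"): 9, ("Sales", "Marketing"): 9,
             ("Direction", "Sales"): 3, ("Direction", "Marketing"): 3,
             ("Direction", "Accounting"): 3, ("Secretariat", "Accounting"): 3}
    table = build_table(units, rates)
    print(total_proximity_rates(table))
    order = placement_order(table)
    print(order)
    print(place_on_grid(table, order))
```

With these example ratings, Direction (TPR 18) goes to the centre. Secretariat, Marketing, Sales, and Accounting follow in that order, and Sales ends up adjacent to Marketing. As the chapter notes for hand-drawn diagrams, the result ignores areas and must be adjusted before it is transferred onto the ground plan.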
More than one alternative arrangement may be obtained in this way. It should be noted that the proximity diagrams are drawn without taking into account the required area for each unit and the exploitable area of the spaces where the units may be placed. Consequently, the arrangements drawn cannot be transposed directly onto the ground plan of the building without modifications. Drawing the proximity diagrams is a means of facilitating the decision concerning the relative positions of organizational units. This method becomes quite useful when the number of units is high.
Figure 13 The proximity table of a hypothetical firm. (Units: Direction, Secretariat, Personnel, Marketing, Sales, Engineers I, Engineers II, R&D, Accounting; pairwise proximity rates of 9, 3, or 1, with the total proximity rate of each unit.)
Figure 14 Example of a proximity diagram. (Legend: proximity rate = 9; proximity rate = 3.)
4.2.4 Stage 4: Placement of Workstation Modules
Considering the outputs of the previous stage, placement of the workstation modules of each unit can start. The following guidelines provide help in meeting ergonomic requirements:
1. Place the workstations in a way that facilitates cooperation between co-workers. In other words, workers who cooperate closely should be placed near each other.
2. Place the workstations receiving external visitors near the entrance doors.
3. Place as many workstations as possible near the windows. Windows may provide benefits besides variety in lighting and a view (Hall, 1966): they permit fine adjustment of light through curtains or venetian blinds and provide distant points of visual focus, which can relieve eye fatigue. Furthermore, research has found that people strongly prefer workstations placed near windows (Manning, 1965; Sanders & McCormick, 1992).
4. Avoid placing the working persons in airstreams created by air conditioners, open windows, and doors.
5. Place the workstation modules so that they form clear corridors leading to the doors. Corridor widths should be at least 60 cm for one-person passage and at least 120 cm for two-person passage (Alder, 1999).
6. Leave the required space in front of and to the sides of electric switches and wall plugs.
7. Leave the required space for waiting visitors. Where waiting queues are expected, provide a free space at least 120 cm wide and n × 45 cm long, where n is the maximum expected number of waiting people, plus another 50 cm in front of the queue.
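Guidelines 5 and 7 above, which set corridor widths and the waiting space for visitors, reduce to simple checks. In this Python sketch the constant and function names are hypothetical. The dimensions are the ones given in the guidelines.

```python
# Minimum dimensions (cm) from the Stage 4 guidelines.
ONE_PERSON_CORRIDOR_CM = 60
TWO_PERSON_CORRIDOR_CM = 120
QUEUE_WIDTH_CM = 120
LENGTH_PER_WAITING_PERSON_CM = 45
SPACE_IN_FRONT_OF_QUEUE_CM = 50


def corridor_is_wide_enough(width_cm: float, two_person: bool) -> bool:
    """Check a corridor against the one- or two-person minimum width."""
    minimum = TWO_PERSON_CORRIDOR_CM if two_person else ONE_PERSON_CORRIDOR_CM
    return width_cm >= minimum


def waiting_space_cm(max_waiting: int) -> tuple[int, int]:
    """(width, length) of the free space for a queue of up to max_waiting people."""
    if max_waiting < 0:
        raise ValueError("max_waiting must be non-negative")
    length = max_waiting * LENGTH_PER_WAITING_PERSON_CM + SPACE_IN_FRONT_OF_QUEUE_CM
    return QUEUE_WIDTH_CM, length


if __name__ == "__main__":
    width, length = waiting_space_cm(6)
    print(f"{width} cm x {length} cm = {width * length / 10_000:.2f} m^2")
    print(corridor_is_wide_enough(90, two_person=False),
          corridor_is_wide_enough(90, two_person=True))
```

For up to six waiting people, the sketch gives a free space of 120 cm × 320 cm (6 × 45 cm + 50 cm). A 90 cm corridor passes the one-person minimum but fails the two-person minimum.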
Figure 15 Workstations with computer monitors should ideally be placed at right angles to the windows.
4.2.5 Stage 5: Orientation of Workstation Modules
The aim of this stage is to define the direction of the workstation modules of each unit to meet the ergonomics requirements. This stage can be carried out either concurrently with or after Stage 4. The use of specialized CAD tools is very helpful here, greatly facilitating the generation and assessment of alternative design solutions. The following guidelines may be applied, making appropriate trade-offs if all of them cannot be satisfied:
1. Orient the workstations in such a way that there are no windows directly in front of or behind the working persons when they are looking at a computer monitor. In offices, windows play a role similar to lights: a window right in front of a working person disturbs through direct glare, while one directly behind produces reflected glare. For this reason, workstations ideally should be placed at right angles to the windows (Grandjean, 1987) (Figure 15).
2. Orient the workstations in such a way that there are no direct light sources within ±40° in the vertical and horizontal directions from the line of sight, in order to avoid direct glare (Kroemer, Kroemer, & Kroemer-Elbert, 1994).
3. Orient the workstations in a way that allows working persons to observe the entrance doors.
4. Orient the workstations so as to facilitate cooperation between members of work teams. Figure 16 shows alternative orientations of workstations, depending on the number of team members and the presence or absence of a leader (Cummings, Huber, & Arendt, 1974).
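The two glare-related guidelines can be checked geometrically once the eye position, gaze direction, and light-source positions are known (e.g., from a CAD model). The following Python sketch is illustrative only. The coordinate convention (x and y horizontal, z up, in metres, with azimuth measured from +x), the function names, and the 45°/135° bands used to classify a window as in front, to the side, or behind are assumptions and are not values from the chapter. Only the ±40° limit comes from the text.

```python
import math


def angular_offsets(eye, gaze_az_deg, gaze_el_deg, source):
    """Horizontal and vertical angles (degrees) between the line of sight and
    the direction from the eye to `source`. Points are (x, y, z) in metres."""
    dx, dy, dz = (s - e for s, e in zip(source, eye))
    azimuth = math.degrees(math.atan2(dy, dx))
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    d_az = (azimuth - gaze_az_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    d_el = elevation - gaze_el_deg
    return d_az, d_el


def in_direct_glare_zone(eye, gaze_az_deg, gaze_el_deg, source, limit_deg=40.0):
    """True if the light source lies within +/-limit_deg of the line of sight
    both horizontally and vertically."""
    d_az, d_el = angular_offsets(eye, gaze_az_deg, gaze_el_deg, source)
    return abs(d_az) <= limit_deg and abs(d_el) <= limit_deg


def window_relation(gaze_az_deg, window_az_deg):
    """Rough classification of a window's direction relative to the gaze;
    the 45/135 degree bands are assumptions."""
    diff = abs((window_az_deg - gaze_az_deg + 180.0) % 360.0 - 180.0)  # 0..180
    if diff <= 45.0:
        return "in front (risk of direct glare)"
    if diff >= 135.0:
        return "behind (risk of reflected glare)"
    return "to the side (preferred)"


if __name__ == "__main__":
    eye = (0.0, 0.0, 1.2)  # seated eye height, gaze toward +x
    for lamp in [(2.0, 0.0, 2.8), (1.0, 0.0, 2.8), (2.0, 2.0, 1.2)]:
        print(lamp, in_direct_glare_zone(eye, 0.0, 0.0, lamp))
    print(window_relation(0.0, 90.0))
```

Take a seated eye height of 1.2 m and a horizontal gaze. A ceiling luminaire 2 m ahead and 1.6 m above eye level then lies about 39° above the line of sight, so the sketch flags it. In practice the gaze toward a monitor is usually inclined downward, and the `gaze_el_deg` argument allows for this.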
Figure 16 Alternative orientations of workstations, depending on the number of team members and the presence or absence of a leader. Ia, Ib, and Ic: arrangements with a leader; IIa, IIb, and IIc: arrangements without a leader.
5 CONCLUSION
Given the number of requirements to be met in workplace layout design, the design team, trying to apply the various ergonomics guidelines in the different phases, will almost certainly encounter conflicts. To resolve them, the design team should be able to focus on the requirements considered most important for the case at hand, and pay less attention to, or even neglect, the others. Good knowledge of generic human abilities and limitations, of the specific characteristics of the people who will work in the designed workplace, and of the specificities of the work to be carried out is a prerequisite for good decisions. Furthermore, the members of the design team should have open and innovative minds and try as many solutions as possible. A systematic assessment of these alternatives is advisable to decide on the most satisfactory solution.
The participation of the different stakeholders in this process is strongly recommended. In fact, while senior management commitment is crucial to successful workplace design, the design team needs to actively involve the future occupants as well. Indeed, every new work configuration requires an effort from people to adapt their behavior and attitudes. Therefore, to be sustainable, workplace design should not be directed only from the top down; it also needs to be informed and genuinely influenced from the bottom up. Only when people feel they have played a real part in its shape and development, and have understood “the stakes for them”, will they be willing to own the design change and adapt their behaviors and attitudes (Gillen, 2019). Dialogue and conflict management at every level of the organization, through interviews, workshops, and other engagement activities, are thus crucial.
REFERENCES
Agarwal, S., Steinmaus, C., & Harris-Adamson, C. (2018). Sit-stand workstations and impact on low back discomfort: A systematic review and meta-analysis. Ergonomics, 61(4), 538–552.
Alder, D. (1999). Metric handbook planning and design data (2nd ed.). New York: Architectural Press.
Andersson, G., Murphy, R., Ortengren, R., & Nachemson, A. (1979). The influence of backrest inclination and lumbar support on the lumbar lordosis in sitting. Spine, 4, 52–58.
Ankrum, D. R. (1997). Integrating neck posture and vision at VDT workstations. In H. Miyamoto, S. Saito, M. Kajiyama, & N. Koizumi (Eds.), Proceedings of the Fifth International Scientific Conference on Work with Display Units, Waseda University, Tokyo (pp. 63–64).
Bendix, T. (1986). Seated trunk posture at various seat inclinations, seat heights, and table heights. Human Factors, 26, 695–703.
Bridger, R. (1988). Postural adaptations to a sloping chair and work-surface. Human Factors, 30, 237–247.
Brounen, D., & Eichholtz, P. (2004). Demographics and the global office market—consequences for property portfolios. Journal of Real Estate Portfolio Management, 10(3), 231–242.
Brunnberg, H. (2000). Evaluation of flexible offices. In Proceedings of the IEA 2000/HFES 2000 Congress, 1, HFES, San Diego (pp. 667–670).
Corlett, E. N. (2006). Background to sitting at work: Research-based requirements for the design of work seats. Ergonomics, 49(14), 1538–1546.
Corlett, E. N. (2009). Ergonomics and sitting at work. Work, 34(2), 235–238.
Corlett, E. N., & Clark, T. S. (1995). The ergonomics of workspaces and machines. London: Taylor & Francis.
Cummings, L., Huber, G. P., & Arendt, E. (1974). Effects of size and spatial arrangements on group decision-making. Academy of Management Journal, 17(3), 460–475.
Czaja, S. J. (1987). Human factors in office automation. In G. Salvendy (Ed.), Handbook of human factors. New York: Wiley.
Dainoff, M. (1994). Three myths of ergonomic seating. In R. Lueder & K. Noro (Eds.), Hard facts about soft machines: The ergonomics of seating. London: Taylor and Francis.
Daniellou, F., & Rabardel, P. (2005). Activity-oriented approaches to ergonomics: Some traditions and communities. Theoretical Issues in Ergonomics Science, 6(5), 353–357.
De Croon, E., Sluiter, J., Kuijer, P. P., & Frings-Dresen, M. (2005). The effect of office concepts on worker health and performance: A systematic review of the literature. Ergonomics, 48(2), 119–134.
Gillen, N. (2019). Future office: Next-generation workplace design. London: RIBA Publishing.
Grandjean, E. (1987). Ergonomics in computerized offices. London: Taylor and Francis.
Grey, F. E., Hanson, J. A., & Jones, F. P. (1966). Postural aspects of neck muscle tension. Ergonomics, 9(3), 245–256.
Grieco, A. (1986). Sitting posture: An old problem and a new one. Ergonomics, 29, 345–362.
Griffiths, K. L., Mackey, M. G., & Adamson, B. J. (2007). The impact of a computerized work environment on professional occupational groups and behavioural and physiological risk factors for musculoskeletal symptoms: A literature review. Journal of Occupational Rehabilitation, 17(4), 743–765.
Groenesteijn, L., Vink, P., de Looze, M., & Krause, F. (2009). Effects of differences in office chair controls, seat and backrest angle design in relation to tasks. Applied Ergonomics, 40(3), 362–370.
Hall, E. T. (1966). The hidden dimension. New York: Doubleday.
Helander, M. (1995). A guide to the ergonomics of manufacturing. London: Taylor and Francis.
Jampel, R. S., & Shi, D. X. (1992). The primary position of the eyes, the resetting saccade, and the transverse visual head plane. Investigative Ophthalmology and Visual Science, 33, 2501–2510.
Karakolis, T., & Callaghan, J. P. (2014). The impact of sit–stand office workstations on worker discomfort and productivity: A review. Applied Ergonomics, 45(3), 799–806.
Kroemer, K., & Grandjean, E. (1997). Fitting the task to the human: A textbook of occupational ergonomics (5th ed.). London: Taylor & Francis.
Kroemer, K., Kroemer, H., & Kroemer-Elbert, K. (1994). How to design for ease and efficiency. Englewood Cliffs, NJ: Prentice-Hall.
Leplat, J. (2006). Activity. In W. Karwowski (Ed.), International encyclopedia of ergonomics and human factors (pp. 567–571). London: Taylor & Francis.
Lueder, R. (1986). Work station design. In R. Lueder (Ed.), The ergonomics payoff: Designing the electronic office. Toronto: Holt, Rinehart & Winston.
Lueder, R., & Noro, K. (1994). Hard facts about soft machines: The ergonomics of seating. London: Taylor & Francis.
Luttmann, A., Schmidt, K. H., & Jäger, M. (2010). Working conditions, muscular activity and complaints of office workers. International Journal of Industrial Ergonomics, 40(5), 549–559.
Mandal, A. (1985). The seated man. Copenhagen, Denmark: Dafnia Publications.
Manning, P. (1965). Office design: A study of environment by the Pilkington Research Unit. Liverpool: University of Liverpool Press.
Margaritis, S., & Marmaras, N. (2003). Making the ergonomic requirements functional: The case of computerized office layout. In Proceedings of the XVth Triennial Congress of the International Ergonomics Association and the 7th Conference of the Ergonomics Society of Korea/Japan Ergonomics Society, “Ergonomics in the Digital Age,” August 24–29, The Ergonomics Society of Korea, Seoul.
Marmaras, N., Nathanael, D., & Zarboutis, N. (2008). The transition from CRT to LCD monitors: Effects on monitor placement and possible consequences in viewing distance and body postures. International Journal of Industrial Ergonomics, 38(7–8), 584–592.
Marmaras, N., & Papadopoulos, S. (2002). A study of computerized offices in Greece: Are ergonomic design requirements met? International Journal of Human-Computer Interaction, 16(2), 261–281.
Marras, W. S. (2005). The future of research in understanding and controlling work-related low back disorders. Ergonomics, 48, 464–477.
Nachemson, A., & Elfstrom, G. (1970). Intravital dynamic pressure measurements in lumbar disks. Scandinavian Journal of Rehabilitation Medicine, Suppl. 1, 1–40.
Nathanael, D., & Marmaras, N. (2009). The redesign of a tram drivers’ workstation: An activity approach. In S. Sauter, M. Dainoff, & M. Smith (Eds.), Proceedings of the XVIIth World Congress of the IEA, Peking, China.
Neuhaus, M., Eakin, E. G., Straker, L., Owen, N., Dunstan, D. W., Reid, N., & Healy, G. N. (2014). Reducing occupational sedentary time: A systematic review and meta-analysis of evidence on activity-permissive workstations. Obesity Reviews, 15(10), 822–838.
Sanders, M. S., & McCormick, E. J. (1992). Human factors in engineering and design (7th ed.). New York: McGraw-Hill.
Shoshkes, L. (1976). Space planning: Designing the office environment. New York: Architectural Record Books.
Veitch, J. A., Charles, K. E., Farley, K. M. J., & Newsham, G. R. (2007). A model of satisfaction with open-plan office conditions: COPE field findings. Journal of Environmental Psychology, 27(3), 177–189.
Vitalis, A., Marmaras, N., Legg, S., & Poulakakis, G. (2000). Please be seated. In Proceedings of the 14th Triennial Congress of the International Ergonomics Association, HFES, San Diego, pp. 43–45.
Wichman, H. (1984). Shifting from traditional to open offices: Problems and suggested design principles. In H. Hendrick & O. Brown, Jr. (Eds.), Human factors in organizational design and management. Amsterdam: Elsevier.
Zelinsky, M. (1998). New workplaces for new work styles. New York: McGraw-Hill.
CHAPTER 15
JOB AND TEAM DESIGN
Frederick P. Morgeson
Eli Broad College of Business, Michigan State University, East Lansing, Michigan
Michael A. Campion
Krannert School of Management, Purdue University, West Lafayette, Indiana
1 INTRODUCTION
1.1 Job Design
1.2 Team Design
2 JOB DESIGN APPROACHES
2.1 Mechanistic Job Design Approach
2.2 Motivational Job Design Approach
2.3 Perceptual/Motor Job Design Approach
2.4 Biological Job Design Approach
3 THE TEAM DESIGN APPROACH
3.1 Historical Development
3.2 Design Recommendations
3.3 Advantages and Disadvantages
4 IMPLEMENTATION ADVICE FOR JOB AND TEAM DESIGN
4.1 General Implementation Advice
4.2 Implementation Advice for Job Design and Redesign
4.3 Implementation Advice for Team Design
5 MEASUREMENT AND EVALUATION OF JOB AND TEAM DESIGN
5.1 Using Questionnaires to Measure Job and Team Design
5.2 Choosing Sources of Data
5.3 Long-Term Effects and Potential Biases
5.4 Job Analysis
5.5 Other Approaches
5.6 Example of an Evaluation of a Job Design
5.7 Example of an Evaluation of a Team Design
REFERENCES
1 INTRODUCTION
1.1 Job Design
Job design is an aspect of managing organizations that is so commonplace it often goes unnoticed. Most people realize the importance of job design when an organization or new plant is starting up, and some recognize its importance when organizations are restructuring or changing processes. But fewer people realize that job design may be affected when organizations change markets or strategies, when managers use their discretion in the daily assignment of tasks, when the people in the jobs or their managers change, when the work force or labor markets change, or when there are performance, safety, or satisfaction problems. Fewer yet realize that job design change can be used as an intervention to enhance organizational goals (Campion & Medsker, 1992).
It is clear that many different aspects of an organization influence job design, especially its structure, technology, processes, and environment. These influences are beyond the scope of this chapter, but they are dealt with in other work (e.g., Davis, 1982; Davis & Wacker, 1982; Parker, Van den Broeck, & Holman, 2017). These influences impose constraints on how jobs are designed and will play a major role in any practical application. However, this chapter assumes that considerable discretion exists in the design of jobs in most situations, and that the job (defined as a set of tasks performed by a worker) is a convenient unit of analysis both in developing new organizations and in changing existing ones (Campion & Medsker, 1992).
The importance of job design lies in its strong influence on a broad range of important efficiency and human resource outcomes. Job design has predictable consequences for outcomes, including the following (Campion & Medsker, 1992; Morgeson & Humphrey, 2008; Parker, 2014):
1. Productivity
2. Quality
3. Job satisfaction
4. Training times
5. Intrinsic work motivation
6. Staffing
7. Error rates
8. Accident rates
9. Mental fatigue
10. Physical fatigue
11. Stress
12. Mental ability requirements
13. Physical ability requirements
14. Job involvement
15. Absenteeism
16. Medical incidents
17. Turnover
18. Compensation rates
According to Louis Davis, one of the most influential writers on job design in the engineering literature, many of the personnel and productivity problems in industry may be the direct result of the design of jobs (Davis, 1957; Davis et al., 1955; Davis & Taylor, 1979; Davis & Valfer, 1965; Davis & Wacker, 1982, 1987). Unfortunately, people mistakenly view the design of jobs as technologically determined and unalterable. However, job designs are actually social inventions. They reflect the values of the era in which they were constructed. These values include the economic goal of minimizing immediate costs (Davis et al., 1955; Taylor, 1979) and theories of human motivation (Steers & Mowday, 1977; Warr & Wall, 1975). These values, and the designs they influence, are not immutable givens, but are subject to modification (Campion & Medsker, 1992; Campion & Thayer, 1985).
The question then becomes: What is the best way to design a job? In fact, there is no single best way. There are several major approaches to job design, each derived from a different discipline and reflecting different theoretical orientations and values (Parker, Morgeson, & Johns, 2017). This chapter describes these approaches, their costs and benefits, and tools and procedures for developing and assessing jobs in all types of organizations. It highlights the trade-offs that must be made when choosing among different approaches to job design. This chapter also compares the design of jobs for individuals working independently with the design of work for teams, which is an alternative to designing jobs at the level of individual workers. It presents the advantages and disadvantages of designing work around individuals compared to designing work for teams and provides advice on implementing and evaluating the different work design approaches.
1.2 Team Design
The major approaches to job design typically focus on designing jobs for individual workers.
However, work design at the level of the group or team, rather than at the level of individual workers, has gained substantially in popularity, and many US organizations now routinely use teams (Guzzo & Shea, 1992; Hoerr, 1989; Mathieu, Gallagher, Domingo, & Klock, 2019; Morgeson & Humphrey, 2008). New manufacturing systems (e.g., flexible, cellular, lean) and advances in our understanding of team processes not only allow designers to consider the use of work teams but often seem to encourage team approaches (Gallagher & Knight, 1986; Majchrzak, 1988). In designing jobs for teams, one assigns a task or set of tasks to a team of workers, rather than to an individual, and considers the team to be the primary unit of performance. Objectives and rewards focus on team, not individual, behavior. Depending on the nature of its tasks, a team’s workers may perform the same tasks simultaneously, or they may break tasks into subtasks to be performed by individuals within the team. Subtasks can be assigned on the basis of expertise or interest, or team members might rotate from one subtask to another to provide variety and to increase breadth of skills and flexibility in the workforce (Campion, Cheraskin, & Stevens, 1994; Campion & Medsker, 1992). Some tasks, because of their size, their complexity, or other features, seem to fit naturally into a team design, whereas others seem appropriate only at the individual job level. In many cases, though, there may be considerable choice about whether to organize work around teams or individuals. In such situations, the designer should weigh the advantages and disadvantages of the job and team design approaches with respect to the organization’s goals, policies, technologies, and constraints (Campion, Medsker, & Higgs, 1993).
2 JOB DESIGN APPROACHES

This chapter adopts an interdisciplinary perspective on job design. Interdisciplinary research has shown that several different approaches to job design exist. Each is oriented toward a particular subset of outcomes, each has disadvantages as well as advantages, and trade-offs among approaches are required in most job design situations (Campion, 1988, 1989; Campion & Berger, 1990; Campion & McClelland, 1991, 1993; Campion & Thayer, 1985; Campion et al., 2005; Edwards, Scully, & Brtek, 1999, 2000; Morgeson & Campion, 2002, 2003). Although the idea is not new, contemporary work design researchers and practitioners have begun to reintegrate the social and contextual aspects of employees’ work with the characteristics traditionally studied by job design. This reintegration has since produced new approaches to work design and has been incorporated into new assessment tools (Grant & Parker, 2009; Humphrey et al., 2007; Morgeson & Humphrey, 2006; Morgeson et al., 2010). Building on and integrating the suggestions of Campion’s (1988; Campion & Thayer, 1985) interdisciplinary model of job design, measured with the Multimethod Job Design Questionnaire (MJDQ), the Work Design Questionnaire (WDQ) is a newer tool for assessing work design (Morgeson & Humphrey, 2006). This measure broadens the scope, discussion, and measurement of job design through three broad categories of work characteristics (motivational, social, and work context). The WDQ assesses the job and its link to the worker’s social and physical context, and it allows job designers to assess important yet infrequently studied aspects of work design, such as knowledge/ability characteristics and social characteristics. The key difference between the MJDQ and the WDQ is the perspective from which the job design is assessed.
In the original MJDQ, each perspective (mechanistic, motivational, perceptual-motor, and biological) assesses a different set of design principles intended to produce different outcomes, and thus appeals to a different set of stakeholders (Campion & Thayer, 1985). The WDQ, on the other hand, aims to incorporate the concerns of each approach captured in the MJDQ (with more emphasis on the motivational, perceptual-motor, and biological approaches than on the mechanistic approach), along with social and contextual concerns, in a single approach to designing better work. Drawing on a framework developed by Morgeson and Campion (2003), Morgeson and Humphrey (2006) used three categories to integrate aspects of work design (motivational, social, and contextual). The four major approaches to job design are reviewed below, together with a discussion of the applicable WDQ characteristics. Table 1 summarizes the job design approaches, and Tables 2 and 3 provide specific recommendations according to the MJDQ and the WDQ, respectively. The team design approach is reviewed in Section 3.

2.1 Mechanistic Job Design Approach

2.1.1 Historical Development

The historical roots of job design can be traced back to the idea of the division of labor, which was very important to early thinking on the economies of manufacturing (Babbage, 1835; Smith, 1776). The division of labor led to job designs characterized by specialization and simplification. Jobs designed in this fashion had many advantages, including reduced learning time, time saved from not having to change tasks or tools, increased proficiency from repeating tasks, and the development of specialized tools and equipment.
JOB AND TEAM DESIGN

Table 1 Advantages and Disadvantages of Various Job Design Approaches

Mechanistic approach
Discipline base: classic industrial engineering (Gilbreth, 1911; Taylor, 1911; Niebel, 1988)
Recommendations: Increase in specialization, simplification, repetition, and automation; decrease in spare time.
Benefits: Decrease in training, staffing difficulty, making errors, mental overload and fatigue, mental skills and abilities, and compensation.
Costs: Increase in absenteeism and boredom; decrease in satisfaction and motivation.

Motivational approach
Discipline base: organizational psychology (Hackman & Oldham, 1980; Herzberg, 1966)
Recommendations: Increase in variety, autonomy, significance, skill usage, participation, feedback, recognition, growth, and achievement.
Benefits: Increase in satisfaction, motivation, involvement, performance, customer service, and catching errors; decrease in absenteeism and turnover.
Costs: Increase in training time/cost, staffing difficulty, making errors, mental overload, stress, mental skills and abilities, and compensation.

Perceptual-motor approach
Discipline base: experimental psychology, human factors (Salvendy, 1987; Sanders & McCormick, 1987)
Recommendations: Increase in lighting quality, display and control quality, and user-friendly equipment; decrease in information processing requirements.
Benefits: Decrease in making errors, accidents, mental overload, stress, training time/cost, staffing difficulty, compensation, and mental skills and abilities.
Costs: Increase in boredom; decrease in satisfaction.

Biological approach
Discipline base: physiology, biomechanics, ergonomics (Astrand & Rodahl, 1977; Tichauer, 1978; Grandjean, 1980)
Recommendations: Increase in seating comfort and postural comfort; decrease in strength requirements, endurance requirements, and environmental stressors.
Benefits: Decrease in physical abilities, physical fatigue, aches and pains, and medical incidents.
Costs: Increase in financial cost and inactivity.

Source: Campion & Medsker, 1992. © 1992 John Wiley & Sons. Note: Advantages and disadvantages based on findings in previous interdisciplinary research (Campion, 1988, 1989; Campion & Berger, 1990; Campion & McClelland, 1991, 1993; Campion & Thayer, 1985).
A highly influential figure in this perspective was Frederick Taylor (Hammond, 1971; Taylor, 1911). He set out the principles of scientific management, which encouraged the study of jobs to determine the “one best way” to perform each task. The movements of skilled workmen were studied using a stopwatch and simple analysis. The best and quickest methods and tools were selected, and all workers were trained to perform the job the same way. Standard performance levels were set, and incentive pay was tied to the standards. Gilbreth also contributed to this design approach (Gilbreth, 1911). Through time and motion study, he sought to eliminate wasted movements by appropriately designing equipment and placing tools and materials. Surveys of industrial job designers indicate that this “mechanistic” approach to job design was the prevailing practice throughout the twentieth century (Davis et al., 1955; Taylor, 1979). These characteristics are also the primary focus of many modern-day writers on job design (e.g., Mundel, 1985; Niebel, 1988) and are present in newer techniques such as lean production (Parker, 2003). The discipline base for this approach is early or “classic” industrial engineering.

2.1.2 Design Recommendations

Table 2 provides a brief list of statements that describe the essential recommendations of the mechanistic approach. In essence, jobs should be studied to determine the most efficient work methods and techniques. The total work in an area (e.g., a department) should be broken down into highly specialized jobs
Table 2 Multimethod Job Design Questionnaire (Specific Recommendations from Each Job Design Approach)

Instructions: Indicate the extent to which each statement is descriptive of the job using the scale below. Circle answers to the right of each statement. Please use the following scale:
(5) Strongly agree
(4) Agree
(3) Neither agree nor disagree
(2) Disagree
(1) Strongly disagree
( ) Leave blank if do not know or not applicable
Mechanistic Approach
1. Job specialization: The job is highly specialized in terms of purpose, tasks, or activities.
2. Specialization of tools and procedures: The tools, procedures, materials, etc., used on this job are highly specialized in terms of purpose.
3. Task simplification: The tasks are simple and uncomplicated.
4. Single activities: The job requires you to do only one task or activity at a time.
5. Skill simplification: The job requires relatively little skill and training time.
6. Repetition: The job requires performing the same activity(s) repeatedly.
7. Spare time: There is very little spare time between activities on this job.
8. Automation: Many of the activities of this job are automated or assisted by automation.

Motivational Approach
9. Autonomy: The job allows freedom, independence, or discretion in work scheduling, sequence, methods, procedures, quality control, or other decision making.
10. Intrinsic job feedback: The work activities themselves provide direct and clear information as to the effectiveness (e.g., quality and quantity) of job performance.
11. Extrinsic job feedback: Other people in the organization, such as managers and co-workers, provide information as to the effectiveness (e.g., quality and quantity) of job performance.
12. Social interaction: The job provides for positive social interaction such as team work or co-worker assistance.
13. Task/goal clarity: The job duties, requirements, and goals are clear and specific.
14. Task variety: The job has a variety of duties, tasks, and activities.
15. Task identity: The job requires completion of a whole and identifiable piece of work. It gives you a chance to do an entire piece of work from beginning to end.
16. Ability/skill level requirements: The job requires a high level of knowledge, skills, and abilities.
17. Ability/skill variety: The job requires a variety of knowledge, skills, and abilities.
18. Task significance: The job is significant and important compared with other jobs in the organization.
19. Growth/learning: The job allows opportunities for learning and growth in competence and proficiency.
20. Promotion: There are opportunities for advancement to higher level jobs.
21. Achievement: The job provides for feelings of achievement and task accomplishment.
22. Participation: The job allows participation in work-related decision making.
23. Communication: The job has access to relevant communication channels and information flows.
24. Pay adequacy: The pay on this job is adequate compared with the job requirements and with the pay in similar jobs.
25. Recognition: The job provides acknowledgement and recognition from others.
26. Job security: People on this job have high job security.

Perceptual/Motor Approach
27. Lighting: The lighting in the work place is adequate and free from glare.
28. Displays: The displays, gauges, meters, and computerized equipment on this job are easy to read and understand.
29. Programs: The programs in the computerized equipment on this job are easy to learn and use.
30. Other equipment: The other equipment (all types) used on this job is easy to learn and use.
31. Printed job materials: The printed materials used on this job are easy to read and interpret.
32. Work place layout: The work place is laid out such that you can see and hear well to perform the job.
33. Information input requirements: The amount of information you must attend to in order to perform this job is fairly minimal.
34. Information output requirements: The amount of information you must output on this job, in terms of both action and communication, is fairly minimal.
35. Information processing requirements: The amount of information you must process, in terms of thinking and problem solving, is fairly minimal.
36. Memory requirements: The amount of information you must remember on this job is fairly minimal.
37. Stress: There is relatively little stress on this job.

Biological Approach
38. Strength: The job requires fairly little muscular strength.
39. Lifting: The job requires fairly little lifting, and/or the lifting is of very light weights.
40. Endurance: The job requires fairly little muscular endurance.
41. Seating: The seating arrangements on the job are adequate (e.g., ample opportunities to sit, comfortable chairs, good postural support, etc.).
42. Size differences: The work place allows for all size differences between people in terms of clearance, reach, eye height, leg room, etc.
43. Wrist movement: The job allows the wrists to remain straight without excessive movement.
44. Noise: The work place is free from excessive noise.
45. Climate: The climate at the work place is comfortable in terms of temperature and humidity, and it is free of excessive dust and fumes.
46. Work breaks: There is adequate time for work breaks given the demands of the job.
47. Shift work: The job does not require shift work or excessive overtime.
For jobs with little physical activity due to single work station add:
48. Exercise opportunities: During the day, there are enough opportunities to get up from the work station and walk around.
49. Constraint: While at the work station, the worker is not constrained to a single position.
50. Furniture: At the work station, the worker can adjust or arrange the furniture to be comfortable (e.g., adequate legroom, foot rests if needed, proper keyboard or work surface height, etc.).
Source: Campion, 1988. © 1988 American Psychological Association. See supporting reference and related research (e.g., Campion & McClelland, 1991, 1993; Campion & Thayer, 1985) for reliability and validity information. Scores for each approach are calculated by averaging applicable items.
assigned to different employees. The tasks should be simplified so that skill requirements are minimized. There should also be repetition in order to gain improvement from practice. Idle time should be minimized. Finally, activities should be automated or assisted by automation to the extent that is possible and economically feasible.

2.1.3 Advantages and Disadvantages

The aim of this approach is to maximize efficiency, in terms of both productivity and the utilization of human resources. Table 1 summarizes some human resource advantages and disadvantages that have been observed in research. Jobs designed according to the mechanistic approach are easier and less expensive to staff. Training times are reduced. Compensation requirements may be lower because skill and responsibility are reduced. And because mental demands are lower, errors may be less common. On the disadvantage side, extreme use of the mechanistic approach may result in jobs so simple and routine that employees experience low job satisfaction and motivation. Overly mechanistic, repetitive work can also lead to health problems such as repetitive motion disorders.

2.2 Motivational Job Design Approach

2.2.1 Historical Development

Encouraged by the human relations movement of the 1930s (Hoppock, 1935; Mayo, 1933), people began to point out the
negative effects of the overuse of mechanistic design on worker attitudes and health (Argyris, 1964; Blauner, 1964). Overly specialized, simplified jobs were found to lead to dissatisfaction (Caplan et al., 1975) and adverse physiological consequences for workers (Weber et al., 1980). Jobs on assembly lines and other machine-paced work were especially troublesome in this regard (Salvendy & Smith, 1981; Walker & Guest, 1952). These trends led to an increasing awareness of employees’ psychological needs. The first attempts to enhance the meaningfulness of jobs involved the opposite of specialization. It was recommended that tasks be added to jobs, either at the same level of responsibility (i.e., job enlargement) or at a higher level (i.e., job enrichment) (Ford, 1969; Herzberg, 1966). This trend expanded into efforts to identify and validate the characteristics of jobs that make them motivating and satisfying (Griffin, 1982; Hackman & Oldham, 1980; Turner & Lawrence, 1965). Because this approach is grounded in psychological theories of work motivation (e.g., Steers & Mowday, 1977; Vroom, 1964), the “motivational” approach draws primarily on organizational psychology as its discipline base. A related trend, later in time but somewhat comparable in content, is the sociotechnical approach (Emery & Trist, 1960; Hay, Klonek, & Parker, 2020; Pasmore, 1988; Rousseau, 1977). It focuses not only on the work but also on the technology itself and on the relationship of the environment to work and organizational design. There is less interest in the job and more in roles and systems. Key to this approach are work
Table 3 Work Design Questionnaire (Specific Recommendations from Each Job Design Approach)

Instructions: Indicate the extent to which each statement is descriptive of the job using the scale below. Circle answers to the right of each statement. Please use the following scale:
(5) Strongly agree
(4) Agree
(3) Neither agree nor disagree
(2) Disagree
(1) Strongly disagree
( ) Leave blank if do not know or not applicable
Task Characteristics

Autonomy/Work Scheduling Autonomy
1. The job allows me to make my own decisions about how to schedule my work.
2. The job allows me to decide on the order in which things are done on the job.
3. The job allows me to plan how I do my work.

Autonomy/Decision Making Autonomy
4. The job gives me a chance to use my personal initiative or judgment in carrying out the work.
5. The job allows me to make a lot of decisions on my own.
6. The job provides me with significant autonomy in making decisions.

Autonomy/Work Methods Autonomy
7. The job allows me to make decisions about what methods I use to complete my work.
8. The job gives me considerable opportunity for independence and freedom in how I do the work.
9. The job allows me to decide on my own how to go about doing my work.

Task Variety
10. The job involves a great deal of task variety.
11. The job involves doing a number of different things.
12. The job requires the performance of a wide range of tasks.
13. The job involves performing a variety of tasks.

Task Significance
14. The results of my work are likely to significantly affect the lives of other people.
15. The job itself is very significant and important in the broader scheme of things.
16. The job has a large impact on people outside the organization.
17. The work performed on the job has a significant impact on people outside the organization.

Task Identity
18. The job involves completing a piece of work that has an obvious beginning and end.
19. The job is arranged so that I can do an entire piece of work from beginning to end.
20. The job provides me the chance to completely finish the pieces of work I begin.
21. The job allows me to complete work I start.
Feedback from Job
22. The work activities themselves provide direct and clear information about the effectiveness (e.g., quality and quantity) of my job performance.
23. The job itself provides feedback on my performance.
24. The job itself provides me with information about my performance.

Knowledge Characteristics

Job Complexity
25. The job requires that I only do one task or activity at a time (reverse scored).
26. The tasks on the job are simple and uncomplicated (reverse scored).
27. The job comprises relatively uncomplicated tasks (reverse scored).
28. The job involves performing relatively simple tasks (reverse scored).
Information Processing
29. The job requires me to monitor a great deal of information.
30. The job requires that I engage in a large amount of thinking.
31. The job requires me to keep track of more than one thing at a time.
32. The job requires me to analyze a lot of information.

Problem Solving
33. The job involves solving problems that have no obvious correct answer.
34. The job requires me to be creative.
35. The job often involves dealing with problems that I have not met before.
36. The job requires unique ideas or solutions to problems.

Skill Variety
37. The job requires a variety of skills.
38. The job requires me to utilize a variety of different skills in order to complete the work.
39. The job requires me to use a number of complex or high-level skills.
40. The job requires the use of a number of skills.

Specialization
41. The job is highly specialized in terms of purpose, tasks, or activities.
42. The tools, procedures, materials, and so forth used on this job are highly specialized in terms of purpose.
43. The job requires very specialized knowledge and skills.
44. The job requires a depth of knowledge and expertise.
Social Characteristics

Social Support
45. I have the opportunity to develop close friendships in my job.
46. I have the chance in my job to get to know other people.
47. I have the opportunity to meet with others in my work.
48. My supervisor is concerned about the welfare of the people that work for him/her.
49. People I work with take a personal interest in me.
50. People I work with are friendly.

Interdependence/Initiated Interdependence
51. The job requires me to accomplish my job before others complete their jobs.
52. Other jobs depend directly on my job.
53. Unless my job gets done, other jobs cannot be completed.

Interdependence/Received Interdependence
54. The job activities are greatly affected by the work of other people.
55. The job depends on the work of many different people for its completion.
56. My job cannot be done unless others do their work.

Interaction Outside Organization
57. The job requires spending a great deal of time with people outside my organization.
58. The job involves interaction with people who are not members of my organization.
59. On the job, I frequently communicate with people who do not work for the same organization as I do.
60. The job involves a great deal of interaction with people outside my organization.
Feedback from Others
61. I receive a great deal of information from my manager and coworkers about my job performance.
62. Other people in the organization, such as managers and coworkers, provide information about the effectiveness (e.g., quality and quantity) of my job performance.
63. I receive feedback on my performance from other people in my organization (such as my manager or coworkers).
Work Context

Ergonomics
64. The seating arrangements on the job are adequate (e.g., ample opportunities to sit, comfortable chairs, good postural support).
65. The work place allows for all size differences between people in terms of clearance, reach, eye height, leg room, etc.
66. The job involves excessive reaching (reverse scored).

Physical Demands
67. The job requires a great deal of muscular endurance.
68. The job requires a great deal of muscular strength.
69. The job requires a lot of physical effort.

Work Conditions
70. The work place is free from excessive noise.
71. The climate at the work place is comfortable in terms of temperature and humidity.
72. The job has a low risk of accident.
73. The job takes place in an environment free from health hazards (e.g., chemicals, fumes, etc.).
74. The job occurs in a clean environment.

Equipment Use
75. The job involves the use of a variety of different equipment.
76. The job involves the use of complex equipment or technology.
77. A lot of time was required to learn the equipment used on the job.
Source: Morgeson and Humphrey, 2006. © 2006 American Psychological Association. See supporting reference for reliability and validity information. Scores for each approach are calculated by averaging applicable items.
system and job designs that fit their external environment, and the joint optimization of the social and technical systems in the organization’s internal environment. Although this approach differs in that it also considers the technical system and the external environment, it is similar in that it draws on the same psychological job characteristics that affect satisfaction and motivation. It suggests that as organizations’ environments become increasingly turbulent and complex, organizational and job design should involve greater flexibility, employee involvement, employee training, and decentralization of decision making and control, along with fewer hierarchical levels and less formalization of procedures and relationships (Pasmore, 1988).

Surveys of industrial job designers have consistently indicated that the mechanistic approach is the dominant theme of job design (Davis et al., 1955; Taylor, 1979). Other approaches, such as the motivational approach, have received less explicit consideration. This is not surprising, because the surveys included only job designers trained in engineering-related disciplines, such as industrial engineering and systems analysis. Other specialists or line managers would not necessarily adopt the same philosophies, especially in recent times. Nevertheless, there is evidence that even fairly naïve job designers (i.e., college students in management classes) also adopt the mechanistic approach in job design simulations: their strategies for grouping tasks were based primarily on such factors as activities, skills, equipment, procedures, or location. Although the mechanistic approach may be the most natural and intuitive, this research also revealed that people can be trained to apply all four approaches to job design (Campion & Stevens, 1991).
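The source notes to Tables 2 and 3 state that scores are calculated by averaging applicable items. The Python sketch below shows one way to do that, for example to profile a job on all four MJDQ approaches at once. It assumes that blank answers (“do not know or not applicable”) are simply excluded from the average and that the WDQ items marked “reverse scored” are recoded as 6 − rating on the 1–5 scale, the usual convention for such items. The item ranges are read off Tables 2 and 3; the function name `score_scales` and the scale keys are illustrative, not part of either instrument.

```python
from statistics import mean

# Item numbers per MJDQ approach, read off Table 2. Items 48-50 apply only to
# jobs with little physical activity and are otherwise left blank.
MJDQ_SCALES = {
    "mechanistic": range(1, 9),
    "motivational": range(9, 27),
    "perceptual_motor": range(27, 38),
    "biological": range(38, 51),
}

# Item numbers per WDQ work characteristic, read off Table 3.
WDQ_SCALES = {
    "work_scheduling_autonomy": range(1, 4),
    "decision_making_autonomy": range(4, 7),
    "work_methods_autonomy": range(7, 10),
    "task_variety": range(10, 14),
    "task_significance": range(14, 18),
    "task_identity": range(18, 22),
    "feedback_from_job": range(22, 25),
    "job_complexity": range(25, 29),
    "information_processing": range(29, 33),
    "problem_solving": range(33, 37),
    "skill_variety": range(37, 41),
    "specialization": range(41, 45),
    "social_support": range(45, 51),
    "initiated_interdependence": range(51, 54),
    "received_interdependence": range(54, 57),
    "interaction_outside_organization": range(57, 61),
    "feedback_from_others": range(61, 64),
    "ergonomics": range(64, 67),
    "physical_demands": range(67, 70),
    "work_conditions": range(70, 75),
    "equipment_use": range(75, 78),
}

# WDQ items marked "(reverse scored)" in Table 3.
WDQ_REVERSED = frozenset({25, 26, 27, 28, 66})


def score_scales(responses, scales, reversed_items=frozenset(), low=1, high=5):
    """Return the mean of the answered items in each scale.

    responses maps item number -> rating on the low..high scale; a missing
    item or a value of None counts as left blank and is excluded.
    Reverse-scored items are recoded as (low + high) - rating before
    averaging. A scale with no answered items scores None.
    """
    scores = {}
    for name, items in scales.items():
        ratings = []
        for item in items:
            rating = responses.get(item)
            if rating is None:
                continue  # blank: do not know or not applicable
            if not low <= rating <= high:
                raise ValueError(f"item {item}: rating {rating} is outside {low}-{high}")
            if item in reversed_items:
                rating = low + high - rating
            ratings.append(rating)
        scores[name] = mean(ratings) if ratings else None
    return scores


if __name__ == "__main__":
    # Hypothetical answers for a highly simplified job (not data from the chapter).
    mjdq = {i: 5 for i in range(1, 9)}          # mechanistic items: strongly agree
    mjdq.update({i: 2 for i in range(9, 27)})   # motivational items: disagree
    mjdq.update({i: 4 for i in range(27, 38)})  # perceptual/motor items: agree
    mjdq.update({i: 3 for i in range(38, 48)})  # biological items; 48-50 left blank
    print(score_scales(mjdq, MJDQ_SCALES))

    # Agreeing (4) that the tasks are simple: each job-complexity item is
    # recoded to 6 - 4 = 2, so the job-complexity mean is 2.
    wdq = {25: 4, 26: 4, 27: 4, 28: 4}
    print(score_scales(wdq, WDQ_SCALES, WDQ_REVERSED)["job_complexity"])
```

Because each approach is averaged separately, the resulting profile keeps the trade-offs in Table 1 visible: the same job can score high on the mechanistic scale and low on the motivational scale at the same time.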
The motivational characteristics of the WDQ are an extension of this motivational approach to job design. This set of characteristics is based on the idea that high levels of these characteristics make work more motivating, satisfying, and enriching. Subcategories of these characteristics include task characteristics (autonomy, task variety, task significance, task identity, and feedback from the job) and knowledge characteristics (job complexity, information processing, problem solving, skill variety, and specialization). Building on the ideas presented in Morgeson and Humphrey’s (2006) WDQ, scholars are beginning to examine the social aspects of work design and how they interact with people’s on-the-job experience (Grant, 2007, 2008; Grant & Parker, 2009; Humphrey, Nahrgang, & Morgeson, 2007). Social characteristics treat the broader social environment in which the work is done as a component of workers’ job experience. They include social support (broadly, the support employees receive from others at work), interdependence (the interconnection of the tasks, sequencing, and impact of an employee’s job with the jobs of others), interaction outside the organization, and feedback from others. Some of these social characteristics (e.g., interdependence and feedback from others) were originally encompassed within the motivational approach to job design. By separating the social work characteristics from the task and knowledge characteristics, the WDQ allows job designers to focus specifically on the design of the interpersonal aspects of the work, which managers often have to address differently from the task and knowledge aspects. Subsequent meta-analytic evidence suggests that the social characteristics addressed in the WDQ explain incremental variance above and beyond that explained by the motivational characteristics (Humphrey et al., 2007).

2.2.2 Design Recommendations

Table 2 provides a list of statements that describe recommendations for the motivational approach. It suggests that a job should allow a worker the autonomy to make decisions about how and when
tasks are to be done. A worker should feel that his or her work is important to the overall mission of the organization or department. This is often achieved by allowing a worker to perform a larger unit of work, or an entire piece of work from beginning to end. Feedback on job performance should come from the task itself as well as from the supervisor and others. Workers should be able to use a variety of skills and to grow personally on the job. This approach also considers the social, or people-interaction, aspects of the job; jobs should offer opportunities for participation, communication, and recognition. Finally, other human resource systems, such as adequate pay, promotion, and job security systems, should contribute to the motivating atmosphere.

2.2.3 Advantages and Disadvantages

The goal of this approach is to enhance the psychological meaningfulness of jobs, thereby influencing a variety of attitudinal and behavioral outcomes. Table 1 summarizes some of the advantages and disadvantages found in research. Jobs designed according to the motivational approach have more satisfied, motivated, and involved employees, who tend to have higher performance and lower absenteeism. Customer service may be improved because employees take more pride in their work and, by performing a larger part of the work, can catch their own errors. Social impact, social worth, and mere social contact have been shown to have a positive influence on workers’ performance (Grant, 2008; Grant et al., 2007). In a field experiment with community recreation center lifeguards, Grant (2008) demonstrated that task significance operated through the lifeguards’ perceptions of social impact and social worth to influence job dedication and helping behavior. In response to the rapidly changing nature of work, researchers have begun to study work design characteristics that could stimulate employee proactivity.
While some of these characteristics are already embedded within current models of job design, this line of research has identified a few additional characteristics that could prove beneficial in a dynamic work environment. Specifically, both ambiguity and accountability have been suggested to influence employees' proactive behaviors (Grant & Parker, 2009; Staw & Boettger, 1990). In terms of disadvantages, jobs too high on the motivational approach require more training, have greater skill and ability requirements for staffing, and may require higher compensation. Overly motivating jobs may also be so stimulating that workers become predisposed to mental overload, fatigue, errors, and occupational stress.
2.3 Perceptual/Motor Job Design Approach
2.3.1 Historical Development
This approach draws on a scientific discipline that goes by many names, including human factors, human factors engineering, human engineering, man-machine systems engineering, and engineering psychology. It developed from a number of other disciplines, primarily experimental psychology, but also industrial engineering (Meister, 1971). Within experimental psychology, job design recommendations draw heavily on knowledge of human skilled performance (Welford, 1976) and the analysis of humans as information processors (see Chapters 3–6). The main concern of this approach is the efficient and safe utilization of humans in human-machine systems, with emphasis on the selection, design, and arrangement of system components to take account of both human abilities and limitations (Pearson, 1971). It is more concerned with equipment than psychology, and more concerned with human abilities than engineering. This approach received public attention after the Three Mile Island incident, where it was concluded that the control room operator job in the nuclear power plant may have placed too
many demands on the operator in an emergency situation, thus predisposing errors of judgment (Campion & Thayer, 1987). Government regulations issued since then require nuclear plants to consider "human factors" in their design (U.S. Nuclear Regulatory Commission, 1981). The primary emphasis of this approach is on the perceptual and motor abilities of people. The contextual characteristics of the WDQ reflect the physical and environmental contexts within which work is performed. This aspect was initially described in the MJDQ and subsequently elaborated upon in the WDQ. Many of the contextual characteristics of job design encompass the perceptual/motor and biological/physiological (described in Section 2.4) approaches to job design as addressed in the MJDQ. Contextual characteristics include ergonomics, physical demands, work conditions, and equipment use. By distinguishing contextual characteristics from the task, knowledge, and social characteristics, the WDQ allows managers to focus specifically on the aspects of work that can produce worker strain or hazardous working conditions, while still assessing the motivating aspects of the work. Meta-analytic evidence suggests that the contextual work characteristics addressed in the WDQ explain incremental variance above and beyond that explained by the motivational characteristics (Humphrey et al., 2007).
2.3.2 Design Recommendations
Table 2 provides a list of statements describing important recommendations of the perceptual/motor approach. They refer either to the equipment or environment or to information-processing requirements. Their thrust is to consider the mental abilities and limitations of humans, such that the attention and concentration requirements of the job do not exceed the abilities of the least capable potential worker.
The focus is on the limits of the least capable worker because this approach is concerned with the effectiveness of the total system, which is no better than its "weakest link." Jobs should be designed to limit the amount of information workers must pay attention to and remember. Lighting levels should be appropriate, displays and controls should be logical and clear, workplaces should be well laid out and safe, and equipment should be easy to use. (See Chapters 50–57 in this volume for more information on human factors applications.)
2.3.3 Advantages and Disadvantages
The aims of this approach are to enhance reliability, safety, and positive user reactions. Table 1 summarizes the advantages and disadvantages found in research. Jobs designed according to the perceptual/motor approach have fewer errors and accidents. Like the mechanistic approach, it reduces the mental ability requirements of the job, so employees may be less stressed and less mentally fatigued. It may also create some efficiencies, such as reduced training time and staffing requirements. On the other hand, the costs of excessive use of the perceptual/motor approach can include low satisfaction, low motivation, and boredom due to inadequate mental stimulation. This problem is exacerbated because designs based on the least capable worker essentially lower a job's mental requirements for all workers.
2.4 Biological Job Design Approach
2.4.1 Historical Development
This approach and the perceptual/motor approach share a joint concern for proper person–machine fit. The major difference is that this approach is more oriented toward biological considerations and stems from such disciplines as work physiology, biomechanics (i.e., the study of body movements), and
anthropometry (i.e., the study of body sizes; see Chapter 11). Although many specialists probably practice both approaches together, as reflected in many texts in the area (Konz, 1983), a split does exist between Americans, who are more psychologically oriented and use the title "human factors engineer," and Europeans, who are more physiologically oriented and use the title "ergonomist" (Chapanis, 1970). Like the perceptual/motor approach, the biological approach is concerned with the design of equipment and workplaces, as well as the design of tasks (Grandjean, 1980).
2.4.2 Design Recommendations
Table 2 lists important recommendations from the biological approach. This approach tries to design jobs so that their physical demands do not exceed people's physical capabilities and limitations. Jobs should not require excessive strength and lifting, and again, the abilities of the least physically able potential worker set the maximum level. Chairs should be designed for good postural support. Excessive wrist movement should be reduced by redesigning tasks and equipment. Noise, temperature, and atmosphere should be controlled within reasonable limits. Proper work/rest schedules should be provided so employees can recuperate from the physical demands.
2.4.3 Advantages and Disadvantages
The aims of this approach are to maintain employees' comfort and physical well-being. Table 1 summarizes some advantages and disadvantages observed in the research. Jobs designed according to this approach require less physical effort, result in less fatigue, and create fewer injuries and aches and pains than jobs low on this approach. There are fewer occupational illnesses, such as low back pain and carpal tunnel syndrome, in jobs designed with this approach. There may be lower absenteeism and higher job satisfaction in jobs that are not physically arduous.
However, a direct cost of this approach may be the expense of changes in equipment or job environments needed to implement the recommendations. At the extreme, costs may include jobs with so few physical demands that workers become drowsy or lethargic, thus reducing performance. Clearly, extremes of physical activity and inactivity should be avoided, and an optimal level of physical activity should be developed.
3 THE TEAM DESIGN APPROACH
3.1 Historical Development
An alternative to designing work around individual jobs is to design work for teams of workers. Teams can vary a great deal in how they are designed and can conceivably incorporate elements from any of the job design approaches discussed. However, the focus here is on the self-managing, autonomous type of team design approach, which has enjoyed considerable popularity in organizations and substantial research attention over the last 30 years (Campion, Papper, & Medsker, 1996; Guzzo & Shea, 1992; Hoerr, 1989; Ilgen, Hollenbeck, Johnson, & Jundt, 2005; LePine et al., 2008; Mathieu et al., 2019; Parker, 2003; Sundstrom, DeMeuse, & Futrell, 1990; Swezey & Salas, 1992). Autonomous work teams derive their conceptual basis from motivational job design and from sociotechnical systems theory, which in turn reflect social and organizational psychology and organizational behavior (Cummings, 1978; Davis, 1971; Davis & Valfer, 1965; Morgeson & Campion, 2003; Morgeson et al., 2012). The Hawthorne studies (Homans, 1950) and European experiments with autonomous work groups (Kelly, 1982; Pasmore, Francis, & Haldeman, 1982) called
DESIGN OF EQUIPMENT, TASKS, JOBS, AND ENVIRONMENTS
attention to the benefits of applying work teams in situations other than sports and military settings. Although enthusiasm for the use of teams waned in the 1960s and 1970s as research revealed some disadvantages of teams (Buys, 1978; Zander, 1979), the 1980s brought a resurgence of interest in work teams, and teams have become an extremely popular work design in organizations today (Hackman, 2002; Hoerr, 1989; Ilgen et al., 2005; Sundstrom et al., 1990). This renewed interest may be due to the cost advantages of having fewer supervisors with self-managed teams or to the apparent logic of the benefits of teamwork.
3.2 Design Recommendations
Teams can vary in the degree of authority and autonomy they have (Banker et al., 1996). For example, manager-led teams have responsibility only for the execution of their work. Management designs the work, designs the teams, and provides an organizational context for the teams. In autonomous work teams, or self-managing teams, however, team members design and monitor their own work and performance. They may also design their own team structure (e.g., delineating interrelationships among members) and composition (e.g., selecting members). In such self-designing teams, management is only responsible for the teams' organizational context (Hackman, 1987). Although team design could incorporate elements of either the mechanistic or the motivational approach, narrow and simplistic mechanistically designed jobs would be less consistent with other suggested aspects of the team approach than motivationally designed jobs, and would not allow an organization to gain as many of the advantages of placing workers in teams. Figure 1 and Table 4 provide important recommendations from the self-managing team design approach. Many of the advantages of work teams depend on how teams are designed and supported by their organization.
According to the theory behind self-managing team design, decision-making and responsibility should be pushed down to the team members (Hackman, 1987). If management is willing to follow this philosophy, teams can provide several additional advantages. By pushing decision-making down to the team and requiring consensus, the organization will find greater acceptance, understanding, and ownership of decisions (Porter, Lawler, & Hackman, 1987). The perceived autonomy resulting from making work decisions should be both satisfying and motivating. Thus, this approach tries to design teams so they have a high degree of self-management and all team members participate in decision-making. The team design approach also suggests that the set of tasks assigned to a team should provide a whole and meaningful piece of work (i.e., have task identity, as in the motivational approach to job design). This allows team members to see how their work contributes to a whole product or process, which might not be possible with individuals working alone. This can give workers a better idea of the significance of their work and create greater identification with the finished product or service. If team members rotate among a variety of subtasks and cross-train on different operations, they should also perceive greater variety in the work (Campion, Cheraskin, et al., 1994). Interdependent tasks, goals, feedback, and rewards should be provided to create feelings of interdependence among members and to focus on the team, rather than the individual, as the unit of performance. Team members should be heterogeneous in terms of areas of expertise and background so their varied knowledge, skills, and abilities (KSAs) complement one another. Teams also need adequate training, managerial support, and organizational resources to carry out their tasks. Managers should encourage positive group processes, including open communication and cooperation within and between work groups, supportiveness and sharing of the workload among team members, and development of positive team spirit and confidence in the team's ability to perform effectively.
Figure 1 Characteristics related to team effectiveness. Themes/characteristics: Job Design (self-management, participation, task variety, task significance, task identity); Interdependence (task interdependence, goal interdependence, interdependent feedback and rewards); Composition (heterogeneity, flexibility, relative size, preference for team work); Context (training, managerial support, communication/cooperation between teams); Process (potency, social support, workload sharing, communication/cooperation within the team). Effectiveness criteria: productivity, satisfaction, manager judgments.
3.3 Advantages and Disadvantages
Table 5 summarizes the advantages and disadvantages of team design relative to individual job design. To begin with, teams whose members have heterogeneous KSAs can help members learn by working with others who have different KSAs. Cross-training on different tasks can occur, and the work force can become more flexible (Goodman, Ravlin, & Argote, 1986). Teams with heterogeneous KSAs also allow for synergistic combinations of ideas and abilities not possible with individuals working alone, and such teams have generally shown higher performance, especially when task requirements are diverse (Goodman et al., 1986; Shaw, 1983). Social support can be especially important when teams face difficult decisions and deal with difficult psychological aspects of tasks, such as in military squads, medical teams, or police units (Campion & Medsker, 1992). In addition, the mere presence of others can be psychologically arousing. Research has shown that such arousal can have a positive effect on performance when the task is well learned (Zajonc, 1965) and when other team members are perceived as evaluating the performer (Harkins, 1987; Porter et al., 1987). With routine jobs,
this arousal effect may counteract boredom and performance decrements (Cartwright, 1968). Another advantage of teams is that they can increase information exchanged between members through proximity and shared tasks (McGrath, 1984). Increased cooperation and communication within teams can be particularly useful when workers’ jobs are highly interrelated, such as when workers whose tasks come later in the process must depend on the performance of workers whose tasks come earlier or when workers exchange work back and forth among themselves (Mintzberg, 1979; Thompson, 1967). In addition, if teams are rewarded for team effort, rather than individual effort, members will have an incentive to cooperate with one another (Leventhal, 1976). The desire to maintain power by controlling information may be reduced. More experienced workers may be more willing to train the less experienced when they are not in competition with them. Team design and rewards can also be helpful in situations where it is difficult to measure individual performance or where workers mistrust supervisors’ assessments of performance (Milkovich & Newman, 1993). Finally, teams can be beneficial if team members develop a feeling of commitment and loyalty to their team (Cartwright, 1968). For workers who do not develop high commitment to their organization or management and who do not become highly involved in their job, work teams can provide a source of commitment. That is, members may feel responsible to
attend work, cooperate with others, and perform well because of commitment to their work team, even though they are not strongly committed to the organization or the work itself. Thus, designing work around teams can provide several advantages to organizations and their workers.
Unfortunately, there are also disadvantages to using work teams, and there are situations in which individual-level design is preferable to team design. For example, some individuals may dislike teamwork and may not have the necessary interpersonal skills or desire to work in a team. When selecting team members, one has the additional requirement of selecting workers to fit the team as well as the job. (Section 4.3 provides more information on the selection of team members.) Individuals can experience less autonomy and less personal identification when working on a team. Designing work around teams does not guarantee workers greater variety, significance, and identity. If members within the team do not rotate among tasks or if some members are assigned exclusively to less desirable tasks, not all members will benefit from team design. Members can still have fractionated, demotivating jobs. Teamwork can also be incompatible with cultural norms. The United States has a very individualistic culture (Hofstede, 1980). Applying team methods that have been successful in collectivistic societies like Japan may be problematic in the United States. In addition, organizational norms and labor–management relations may be incompatible with team design, making its use more difficult.
Some advantages of team design can create disadvantages as well. First, though team rewards can increase communication and cooperation and reduce competition within a team, they may cause greater competition and reduced communication between teams. If members identify too strongly with a team, they may not realize when behaviors that benefit the team detract from organizational goals and create conflicts detrimental to productivity. Increased communication within teams may not always be task-relevant either; teams may spend work time socializing. Team decision-making can take longer than individual decision-making, and the need for coordination within teams can be time-consuming. Decision-making and creativity can also be inhibited by team processes. When teams become highly cohesive, they may become so alike in their views that they develop "groupthink" (Janis, 1972). When groupthink occurs, teams tend to underestimate their competition, fail to critique fellow team members' suggestions adequately, fail to appraise alternatives adequately, and fail to work out contingency plans. In addition, team pressures can distort judgments. Decisions may be based more on the persuasiveness of dominant individuals or the power of majorities than on the quality of decisions. Research has found a tendency for team judgments to be more extreme than the average of individual members' predecision judgments (Janis, 1972; McGrath, 1984; Morgeson & Campion, 1997). Although evidence shows highly cohesive teams are more satisfied with their teams, cohesiveness is not necessarily related to high productivity. Whether cohesiveness is related to performance depends on a team's norms and goals. If a team's norm is to be productive, cohesiveness will enhance productivity; however, if the norm is not one of commitment to productivity, cohesiveness can have a negative influence (Zajonc, 1965). The use of teams and team-level rewards can also decrease the motivating power of evaluation and reward systems. If team members are not evaluated for individual performance, do not believe their output can be distinguished from the team's, or do not perceive a link between their personal performance and outcomes, social loafing (Harkins, 1987) can occur. In such situations, teams do not perform up to the potential expected from combining individual efforts. Finally, teams may be less flexible in some respects because they are more difficult to move or transfer as a unit than individuals (Sundstrom et al., 1990). Turnover, replacements, and employee transfers may disrupt teams, and existing members may not readily accept new members.
Thus, whether work teams are advantageous depends to a great extent on the composition, structure, reward systems, environment, and task of the team. Table 6 presents questions that can help determine whether work should be designed around teams rather than individuals. The more questions answered in the affirmative, the more likely teams are to be beneficial. If one chooses to design work around teams, suggestions for designing effective teams are presented in Section 4.3.
Table 4 Team Design Measure
Instructions: This questionnaire consists of statements about your team and how your team functions as a group. Please indicate the extent to which each statement describes your team by circling a number to the right of each statement.
Please use the following scale: (5) Strongly agree; (4) Agree; (3) Neither agree nor disagree; (2) Disagree; (1) Strongly disagree; ( ) Leave blank if you do not know or the statement is not applicable. Each statement is rated by circling 1, 2, 3, 4, or 5.
Self-Management
1. The members of my team are responsible for determining the methods, procedures, and schedules with which the work gets done.
2. My team rather than my manager decides who does what tasks within the team.
3. Most work-related decisions are made by the members of my team rather than by my manager.
Participation
4. As a member of a team, I have a real say in how the team carries out its work.
5. Most members of my team get a chance to participate in decision making.
6. My team is designed to let everyone participate in decision making.
Task Variety
7. Most members of my team get a chance to learn the different tasks the team performs.
8. Most everyone on my team gets a chance to do the more interesting tasks.
9. Task assignments often change from day to day to meet the workload needs of the team.
Task Significance (Importance)
10. The work performed by my team is important to the customers in my area.
11. My team makes an important contribution to serving the company's customers.
12. My team helps me feel that my work is important to the company.
Task Identity (Mission)
13. The team concept allows all the work on a given product to be completed by the same set of people.
14. My team is responsible for all aspects of a product for its area.
15. My team is responsible for its own unique area or segment of the business.
Task Interdependence (Interdependence)
16. I cannot accomplish my tasks without information or materials from other members of my team.
17. Other members of my team depend on me for information or materials needed to perform their tasks.
18. Within my team, jobs performed by team members are related to one another.
Goal Interdependence (Goals)
19. My work goals come directly from the goals of my team.
20. My work activities on any given day are determined by my team's goals for that day.
21. I do very few activities on my job that are not related to the goals of my team.
Interdependent Feedback and Rewards (Feedback and Rewards)
22. Feedback about how well I am doing my job comes primarily from information about how well the entire team is doing.
23. My performance evaluation is strongly influenced by how well my team performs.
24. Many rewards from my job (pay, promotion, etc.) are determined in large part by my contributions as a team member.
Heterogeneity (Membership)
25. The members of my team vary widely in their areas of expertise.
26. The members of my team have a variety of different backgrounds and experiences.
27. The members of my team have skills and abilities that complement each other.
Flexibility (Member Flexibility)
28. Most members of my team know each other's jobs.
29. It is easy for the members of my team to fill in for one another.
30. My team is very flexible in terms of membership.
Relative Size (Size)
31. The number of people in my team is too small for the work to be accomplished. (Reverse scored)
Preference for Team Work (Team Work Preferences)
32. If given the choice, I would prefer to work as part of a team rather than work alone.
33. I find that working as a member of a team increases my ability to perform effectively.
34. I generally prefer to work as part of a team.
Training
35. The company provides adequate technical training for my team.
36. The company provides adequate quality and customer service training for my team.
37. The company provides adequate team skills training for my team (communication, organization, interpersonal, etc.).
Managerial Support
38. Higher management in the company supports the concept of teams.
39. My manager supports the concept of teams.
Communication/Cooperation between Work Groups
40. I frequently talk to other people in the company besides the people on my team.
41. There is little competition between my team and other teams in the company.
42. Teams in the company cooperate to get the work done.
Potency (Spirit)
43. Members of my team have great confidence that the team can perform effectively.
44. My team can take on nearly any task and complete it.
45. My team has a lot of team spirit.
Social Support
46. Being in my team gives me the opportunity to work in a team and provide support to other team members.
47. My team increases my opportunities for positive social interaction.
48. Members of my team help each other out at work when needed.
Workload Sharing (Sharing the Work)
49. Everyone on my team does their fair share of the work.
50. No one in my team depends on other team members to do the work for them.
51. Nearly all the members of my team contribute equally to the work.
Communication/Cooperation within the Work Group
52. Members of my team are very willing to share information with other team members about our work.
53. Teams enhance the communications among people working on the same product.
54. Members of my team cooperate to get the work done.
Source: Campion et al. (1993). © 1993 John Wiley & Sons. See reference and related research (Campion et al., 1995) for reliability and validity information. Scores for each team characteristic are calculated by averaging applicable items.
Table 5 Advantages and Disadvantages of Work Teams
Advantages
• Team members learn from one another
• Possibility of greater work force flexibility with cross-training
• Opportunity for synergistic combinations of ideas and abilities
• New approaches to tasks may be discovered
• Social facilitation and arousal
• Social support for difficult tasks and situations
• Increased communication and information exchange between team members
• Greater cooperation among team members
• Beneficial for interdependent work flows
• Greater acceptance and understanding of decisions when team makes decisions
• Greater autonomy, variety, identity, significance, and feedback possible for workers
• Commitment to the team may stimulate performance and attendance
Disadvantages
• Lack of compatibility of some individuals with team work
• Additional need to select workers to fit team as well as job
• Possibility some members will experience less motivating jobs
• Possible incompatibility with cultural, organizational, or labor–management norms
• Increased competition and conflict between teams
• More time consuming due to socializing, coordination losses, and need for consensus
• Inhibition of creativity and decision making processes; possibility of groupthink
• Less powerful evaluation and rewards; social loafing or free-riding may occur
• Less flexibility in cases of replacement, turnover, or transfer
Source: Campion & Medsker (1992). © 1992 John Wiley & Sons.
Table 6 When to Design Jobs Around Work Teams
1. Are workers' tasks highly interdependent, or could they be made to be so? Would this interdependence enhance efficiency or quality?
2. Do the tasks require a variety of knowledge, skills, and abilities such that combining individuals with different backgrounds would make a difference in performance?
3. Is cross-training desired? Would breadth of skills and work force flexibility be essential to the organization?
4. Could increased arousal, motivation, and effort to perform make a difference in effectiveness?
5. Can social support help workers deal with job stresses?
6. Could increased communication and information exchange improve performance rather than interfere?
7. Could increased cooperation aid performance?
8. Are individual evaluation and rewards difficult or impossible to make, or are they mistrusted by workers?
9. Could common measures of performance be developed and used?
10. Is it technically possible to group tasks in a meaningful, efficient way?
11. Would individuals be willing to work in teams?
12. Does the labor force have the interpersonal skills needed to work in teams?
13. Would team members have the capacity and willingness to be trained in interpersonal and technical skills required for team work?
14. Would team work be compatible with cultural norms, organizational policies, and leadership styles?
15. Would labor-management relations be favorable to team job design?
16. Would the amount of time taken to reach decisions, consensus, and coordination not be detrimental to performance?
17. Can turnover be kept to a minimum?
18. Can teams be defined as a meaningful unit of the organization with identifiable inputs, outputs, and buffer areas which give them a separate identity from other teams?
19. Would members share common resources, facilities, or equipment?
20. Would top management support team job design?
Source: Campion & Medsker (1992). © 1992 John Wiley & Sons. Affirmative answers support the use of team job design.
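Table 6 is applied by counting affirmative answers: the more "yes" answers, the stronger the case for designing work around teams. The sketch below is one hypothetical way to record such an assessment; the function name `tally_affirmative` and the shortened question labels are illustrative, not part of the source. Because the source gives no cut-off score, the sketch reports the count and lists the questions answered "no" as points to examine, rather than returning a go/no-go verdict.

```python
from typing import Dict, List, Tuple


def tally_affirmative(answers: Dict[str, bool]) -> Tuple[int, int, List[str]]:
    """Count 'yes' answers and collect the questions answered 'no'.

    No threshold is applied: Table 6 only says that more affirmative
    answers favor team design, so the caller interprets the count.
    """
    yes = sum(1 for affirmative in answers.values() if affirmative)
    unmet = [question for question, affirmative in answers.items() if not affirmative]
    return yes, len(answers), unmet


# Hypothetical, abbreviated labels for four of the 20 Table 6 questions;
# a real assessment would record all 20 questions verbatim.
answers = {
    "Tasks highly interdependent (or could be made so)?": True,
    "Cross-training desired?": True,
    "Individuals willing to work in teams?": False,
    "Top management supports team job design?": True,
}

yes, total, unmet = tally_affirmative(answers)
print(f"{yes}/{total} affirmative")  # 3/4 affirmative
for question in unmet:
    print("Needs attention:", question)
```

Listing the unmet conditions, rather than only the total, keeps the checklist diagnostic: a single "no" on a question such as top management support may matter more than the raw count suggests.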
4 IMPLEMENTATION ADVICE FOR JOB AND TEAM DESIGN
4.1 General Implementation Advice
4.1.1 Procedures to Follow
There are several general philosophies that are helpful when designing or redesigning jobs or teams:
1. As noted previously, designs are not inalterable or dictated by technology. There is some discretion in the design of all work situations, and considerable discretion in most.
2. There is no single best design; there are simply better and worse designs depending on one's design perspective.
3. Design is iterative and evolutionary and should continue to change and improve over time.
4. Participation of the workers affected generally improves the quality of the resulting design and acceptance of suggested changes.
5. The process of the project, or how it is conducted, is important in terms of involvement of all interested parties, consideration of alternative motivations, and awareness of territorial boundaries.
Procedures for the Initial Design of Jobs or Teams
In consideration of process aspects of design, Davis and Wacker (1982) suggest four steps:
1. Form a steering committee. This committee usually consists of a team of high-level executives who have a direct stake in the new jobs or teams. The purposes of the committee are (a) to bring into focus the project's objective; (b) to provide resources and support for the project; (c) to help gain the cooperation of all parties affected; and (d) to oversee and guide the project.
2. Form a design task force. The task force may include engineers, managers, job or team design experts, architects, specialists, and others with relevant knowledge or responsibility. The task force is to gather data, generate and evaluate design alternatives, and help implement recommended designs.
3. Develop a philosophy statement. The first goal of the task force is to develop a philosophy statement to guide decisions involved in the project. The philosophy statement is developed with input from the steering committee and may include the project's purposes, the organization's strategic goals, assumptions about workers and the nature of work, and process considerations.
4. Proceed in an evolutionary manner. Jobs should not be over-specified. With considerable input from eventual job holders or team members, the work design will continue to change and improve over time.
According to Davis and Wacker (1982), the process of redesigning existing jobs is much the same as that of designing original jobs, with two additions. First, existing job incumbents must be involved. Second, more attention needs to be given to implementation issues. Those involved in the implementation must feel ownership of and commitment to the change and believe the redesign represents their own interests. Potential Steps to Follow Along with the steps discussed above, a redesign project should also include the following five steps: 1. Measuring the design of the existing jobs or teams. The questionnaire methodology and other analysis tools described in Section 5.7 may be used to measure current jobs or teams. 2. Diagnosing potential design problems. Based on data collected in step 1, the current design is analyzed for potential problems. The task force and employee involvement are important. Focused team meetings are a useful vehicle for identifying and evaluating problems. 3. Determining job or team design changes. Changes will be guided by project goals, problems identified in step 2, and one or more of the approaches to work design. Often several potential changes are generated and evaluated. Evaluation of alternative changes may involve consideration of advantages and disadvantages identified in previous research (see Table 1) and opinions of engineers, managers, and employees. 4. Making design changes. Implementation plans should be developed in detail, along with back-up plans in case there are difficulties with the new design. Communication and training are keys to implementation. Changes might also be pilot tested before widespread implementation. 5. Conducting a follow-up evaluation. Evaluating the new design after implementation is probably the most neglected part of the process in most applications. The evaluation might include the collection of design measurements on the redesigned jobs/teams using the same instruments as in step 1. Evaluation may also be conducted on outcomes, such as employee satisfaction, error rates, and training time (Table 1). Scientifically valid evaluations require experimental research strategies with control groups. Such studies may not always be possible in organizations, but quasi-experimental and other field research designs often are (Cook & Campbell, 1979). Finally, the need for adjustments is identified through the follow-up evaluation. (For examples of evaluations, see Section 5.7 and Campion & McClelland, 1991, 1993.)
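Step 5 recommends re-measuring the redesigned jobs with the same instruments used in step 1 and, where possible, comparing against a control group. One simple analysis for such a pretest/posttest comparison is a difference-in-differences on group means, sketched below. The function name and the scores are hypothetical; the sketch assumes both groups would have changed by the same amount without the redesign, and it does not test statistical significance or check that the groups are comparable, both of which a real field evaluation would need.

```python
from statistics import mean
from typing import Sequence


def diff_in_diff(
    redesigned_pre: Sequence[float],
    redesigned_post: Sequence[float],
    control_pre: Sequence[float],
    control_post: Sequence[float],
) -> float:
    """Change in the redesigned group minus change in the control group.

    All four sequences hold scores on the same outcome measure (for example,
    a satisfaction scale also used in step 1), collected before (pre) and
    after (post) the redesign. Subtracting the control group's change removes
    shifts that would have occurred without the redesign.
    """
    redesigned_change = mean(redesigned_post) - mean(redesigned_pre)
    control_change = mean(control_post) - mean(control_pre)
    return redesigned_change - control_change


# Hypothetical scores: the redesigned group rises by 1.0, the control by 0.5.
effect = diff_in_diff(
    redesigned_pre=[3.0, 3.5, 2.5],
    redesigned_post=[4.0, 4.5, 3.5],
    control_pre=[3.0, 3.5, 2.5],
    control_post=[3.5, 3.5, 3.5],
)
print(effect)  # 0.5
```

Without the control group, the redesign would appear to have raised scores by the full 1.0; the comparison attributes only the extra 0.5 to the redesign.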
4.1.2 Individual Differences Among Workers It is a common observation that not all employees respond in the same way to the same job. Some people on a job have high
DESIGN OF EQUIPMENT, TASKS, JOBS, AND ENVIRONMENTS
satisfaction, whereas others on the same job have low satisfaction. Clearly, there are individual differences in how people respond to work. Considerable research has examined individual differences in reactions to the motivational design approach, and some people have been found to respond more positively than others to highly motivational work. These differences are generally viewed as differences in needs for personal growth and development (Hackman & Oldham, 1980). Using the broader notion of preferences/tolerances for types of work, the consideration of individual differences has been expanded to all four approaches to job design (Campion, 1988; Campion & McClelland, 1991) and to the team design approach (Campion et al., 1993; Campion et al., 1995). Table 7 provides scales that can be used to determine job incumbents’ preferences/tolerances. These scales can be administered in the same manner as the questionnaire measures of job and team design discussed in Section 5. Although consideration of individual differences is encouraged, there are often limits to the extent to which such differences can be accommodated. Jobs or teams may have to be designed for people who are not yet known or who differ in their preferences. Fortunately, although evidence indicates that individual differences moderate reactions to the motivational approach (Campion, Mumford, Morgeson, & Nahrgang, 2005; Fried & Ferris, 1987), the differences are of degree, not direction. That is, some people respond more positively than others to motivational work, but few respond negatively. It is likely that this also applies to the other design approaches.
JOB AND TEAM DESIGN
Table 7 Preferences/Tolerances for the Design Approaches
Instructions: Indicate the extent to which each statement is descriptive of your preferences and tolerances for types of work on the scale below. Circle answers to the right of each statement.
Please use the following scale:
(5) Strongly agree
(4) Agree
(3) Neither agree nor disagree
(2) Disagree
(1) Strongly disagree
( ) Leave blank if do not know or not applicable

Preferences/Tolerances for Mechanistic Design
1. I have a high tolerance for routine work. 1 2 3 4 5
2. I prefer to work on one task at a time. 1 2 3 4 5
3. I have a high tolerance for repetitive work. 1 2 3 4 5
4. I prefer work that is easy to learn. 1 2 3 4 5

Preferences/Tolerances for Motivational Design
5. I prefer highly challenging work that taxes my skills and abilities. 1 2 3 4 5
6. I have a high tolerance for mentally demanding work. 1 2 3 4 5
7. I prefer work that gives a great amount of feedback as to how I am doing. 1 2 3 4 5
8. I prefer work that regularly requires the learning of new skills. 1 2 3 4 5
9. I prefer work that requires me to develop my own methods, procedures, goals, and schedules. 1 2 3 4 5
10. I prefer work that has a great amount of variety in duties and responsibilities. 1 2 3 4 5

Preferences/Tolerances for Perceptual/Motor Design
11. I prefer work that is very fast paced and stimulating. 1 2 3 4 5
12. I have a high tolerance for stressful work. 1 2 3 4 5
13. I have a high tolerance for complicated work. 1 2 3 4 5
14. I have a high tolerance for work where there are frequently too many things to do at one time. 1 2 3 4 5

Preferences/Tolerances for Biological Design
15. I have a high tolerance for physically demanding work. 1 2 3 4 5
16. I have a fairly high tolerance for hot, noisy, or dirty work. 1 2 3 4 5
17. I prefer work that gives me some physical exercise. 1 2 3 4 5
18. I prefer work that gives me some opportunities to use my muscles. 1 2 3 4 5

Preferences/Tolerances for Team Work
19. If given the choice, I would prefer to work as part of a team rather than work alone. 1 2 3 4 5
20. I find that working as a member of a team increases my ability to perform effectively. 1 2 3 4 5
21. I generally prefer to work as part of a team. 1 2 3 4 5

Source: Campion (1988) and Campion et al. (1993). Note: See reference for reliability and validity information. Scores for each preference/tolerance are calculated by averaging applicable items. Interpretations differ slightly across the scales. For the Mechanistic and Motivational designs, higher scores suggest more favorable reactions from incumbents to well-designed jobs. For the Perceptual/Motor and Biological approaches, higher scores suggest less unfavorable reactions from incumbents to poorly designed jobs.
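The scoring rule in the note to Table 7 (average the applicable items of each scale, leaving blanks out) can be sketched as follows. The item-to-scale mapping is taken from the table; representing a blank "do not know or not applicable" answer as `None` or as a missing item is an assumption of this sketch. Remember that the direction of interpretation differs: higher Perceptual/Motor and Biological scores mean less unfavorable reactions to poorly designed jobs, not a preference for well-designed ones.

```python
from statistics import mean
from typing import Dict, Mapping, Optional

# Item numbers belonging to each Table 7 scale.
SCALES: Dict[str, range] = {
    "mechanistic": range(1, 5),         # items 1-4
    "motivational": range(5, 11),       # items 5-10
    "perceptual_motor": range(11, 15),  # items 11-14
    "biological": range(15, 19),        # items 15-18
    "teamwork": range(19, 22),          # items 19-21
}


def score_preferences(
    responses: Mapping[int, Optional[int]],
) -> Dict[str, Optional[float]]:
    """Average the applicable items of each Table 7 scale.

    responses maps item number -> rating on the 1-5 agreement scale.
    A blank answer (None, or an item left out) is not applicable and is
    excluded from the average; a scale with no applicable items gets None.
    """
    scores: Dict[str, Optional[float]] = {}
    for scale, items in SCALES.items():
        ratings = [responses[i] for i in items if responses.get(i) is not None]
        if any(not 1 <= r <= 5 for r in ratings):
            raise ValueError(f"{scale}: ratings must be on the 1-5 scale")
        scores[scale] = mean(ratings) if ratings else None
    return scores


# Example: one incumbent who answered the mechanistic and teamwork items,
# leaving item 3 blank.
example = score_preferences({1: 4, 2: 5, 3: None, 4: 3, 19: 2, 20: 3, 21: 4})
print(example)
```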
4.1.3 Some Basic Choices Hackman and Oldham (1980) have provided five strategic choices that relate to implementing job redesign. They note that little research exists indicating the exact consequences of each choice, and correct choices may differ by organization. The basic choices are:
1. Individual versus team designs for work. An initial decision is whether to enrich individual jobs or to create teams. This also includes consideration of whether any redesign should be undertaken and its likelihood of success. 2. Theory-based versus intuitive changes. This choice was essentially defined as the motivational (theory) approach versus no particular (atheoretical) approach. In the present chapter, this choice may be better framed as choosing among the four approaches to job design. However, as argued earlier, considering only one approach may cause some costs or additional benefits to be ignored. 3. Tailored versus broadside installation. This choice is between tailoring changes to individuals and making the changes for everyone in a given job. 4. Participative versus top-down change processes. The most common orientation is that participative is best. However, the costs of participation include the time involved and incumbents’ possible lack of broad knowledge of the business. 5. Consultation versus collaboration with stakeholders. The effects of job design changes often extend far beyond the individual incumbent and department. For example, a job’s output may be an input to a job elsewhere in the organization. The presence of a union also requires additional collaboration. Depending on these considerations, participation of stakeholders may range from no involvement, through consultation, to full collaboration. 4.1.4 Overcoming Resistance to Change in Redesign Projects Resistance to change can be a problem in any project involving major changes (Morgeson et al., 1997). The high failure rates of new technology implementations demonstrate a need to give more attention to the human aspects of change projects. This concern is also reflected in the area of Participatory Ergonomics, which encourages the use of participatory techniques when undertaking an ergonomic intervention (Wilson & Haines, 1997).
It has been estimated that between 50% and 75% of newly implemented manufacturing technologies in the United States have failed, with a disregard for human and organizational issues considered to be a bigger reason for the failure than technical problems (Majchrzak, 1988; Turnage, 1990). The number one obstacle to implementation was considered to be human resistance to change (Hyer, 1984). Based on the work of Majchrzak (1988), Gallagher and Knight (1986), and Turnage (1990), guidelines for reducing resistance to change include the following: 1. Involve workers in planning the change. Workers should be informed of changes in advance and involved in the process of diagnosing current problems and developing solutions. Resistance is decreased if participants feel the project is their own and not imposed from outside and if the project is adopted by consensus.
2. Top management should strongly support the change. If workers feel management is not strongly committed, they are less likely to take the project seriously. 3. Create change consistent with worker needs and existing values. Resistance is lower if the change is seen to reduce present burdens and offer interesting experience, and if it neither threatens worker autonomy or security nor conflicts with other goals and values in the organization. Workers need to see the advantages of the change for them. Resistance is also lower if proponents of change can empathize with opponents (recognize valid objections and relieve unnecessary fears). 4. Create an environment of open, supportive communication. Resistance will be lessened if participants experience support and trust each other. Resistance can also be reduced if misunderstandings and conflicts are expected as natural to the innovation process. Provision should be made for clarification. 5. Allow for flexibility. Resistance is reduced if the project is kept open to revision and reconsideration as experience accumulates.
4.2 Implementation Advice for Job Design and Redesign 4.2.1 Methods for Combining Tasks In many cases, designing jobs is largely a function of combining tasks. Some guidance can be gained by extrapolating from specific design recommendations in Table 2. For example, variety in the motivational approach can be increased by simply combining different tasks in the same job. Conversely, specialization from the mechanistic approach can be increased by including only similar tasks in the same job. It is also possible when designing jobs to first generate alternative task combinations and then evaluate them using the design approaches in Table 2. A small amount of research within the motivational approach has focused explicitly on predicting relationships between combinations of tasks and the design of the resulting jobs (Wong, 1989; Wong & Campion, 1991). This research suggests that a job’s motivational quality is a function of three task-level variables, as illustrated in Figure 2:
1. Task design. The higher the motivational quality of individual tasks, the higher the motivational quality of a job. Table 2 can be used to evaluate individual tasks, and the motivational scores for the individual tasks can then be summed. Summing is recommended rather than averaging because both the motivational quality of the tasks and the number of tasks are important in determining a job’s motivational quality (Globerson & Crossman, 1976). 2. Task interdependence. Interdependence among tasks has been shown to be positively related to motivational value up to a moderate point; beyond that point, increasing interdependence has been shown to lead to lower motivational value. Thus, for motivational jobs, the total amount of interdependence among tasks should be kept at a moderate level; both complete independence and excessively high interdependence should be avoided. Table 8 contains the dimensions of task interdependence and provides a questionnaire to measure it. Table 8 can be used to judge the interdependence of each pair of tasks being evaluated for inclusion in a job. 3. Task similarity. Similarity among tasks may be the oldest rule of job design, but beyond a moderate level, it tends to decrease a job’s motivational value. Thus, to design motivational jobs, high levels of similarity should be avoided. Similarity at the task pair level can be judged in much the same manner as interdependence by using the dimensions in Table 8 (see the note to Table 8).
Figure 2 Effects of task design, interdependence, and similarity on motivational job design. (Motivational job design, from low to high, plotted against task measures, from low through medium to high, for motivational task design, task interdependence, and task similarity.)
4.2.2 Trade-Offs Among Job Design Approaches Although one should strive to construct jobs that are well designed on all the approaches, it is clear that the design approaches conflict. As Table 1 illustrates, the benefits of some approaches are the costs of others, and no one approach satisfies all outcomes. The greatest potential conflicts are between the motivational approach on the one hand and the mechanistic and perceptual/motor approaches on the other; they produce nearly opposite outcomes. The mechanistic and perceptual/motor approaches recommend jobs that are simple, safe, and reliable, with minimal mental demands on workers. The motivational approach encourages more complicated and stimulating jobs, with greater mental demands. The team approach is consistent with the motivational approach and therefore may also conflict with the mechanistic and perceptual/motor approaches. Because of these conflicts, trade-offs may be necessary. The major trade-offs will involve the mental demands created by the alternative design strategies. Making jobs more mentally demanding increases the likelihood of achieving workers’ goals of satisfaction and motivation, but decreases the chances of reaching the organization’s goals of reduced training, staffing costs, and errors. Which trade-offs are made depends on which outcomes one prefers to maximize; generally, a compromise may be optimal. Trade-offs may not always be needed, however. Jobs can often be improved on one approach while still maintaining their quality on other approaches. For example, in one redesign study, the motivational approach was applied to clerical jobs to improve employee satisfaction and customer service (Campion & McClelland, 1991).
Expected benefits occurred along with some expected costs (e.g., increased training and compensation requirements), but not all potential costs occurred (e.g., quality and efficiency did not decrease). In another redesign study, Morgeson and Campion (2002) sought to increase both satisfaction and efficiency in jobs at a pharmaceutical company. They found that when jobs were designed to increase only satisfaction or only efficiency, the common trade-offs were present (e.g., increases or decreases in satisfaction and training requirements). When jobs were designed to increase both satisfaction and efficiency, however, these
trade-offs were reduced. They suggested that a work design process that explicitly considers both motivational and mechanistic aspects of work is key to avoiding the trade-offs. Another strategy for minimizing trade-offs is to avoid design decisions that influence the mental demands of jobs. An example is to enhance motivational design by focusing on social aspects (e.g., communication, participation, recognition, and feedback). These design features can be improved without incurring the costs of increased mental demands. Moreover, many of these features are under the direct control of managers. The independence of the biological approach provides another opportunity to improve design without incurring trade-offs with other approaches: one can reduce the physical demands of a job without affecting its mental demands, although the cost of equipment may need to be considered. Adverse effects of trade-offs can often be reduced by avoiding designs that are extremely high or low on any approach. Alternatively, one might require minimum acceptable levels on each approach. Knowing all the approaches and their corresponding outcomes will help one make more informed decisions and avoid unanticipated consequences. 4.2.3 Other Implementation Advice for Job Design and Redesign Griffin (1982) provides advice geared toward managers considering a job redesign intervention in their area. He notes that managers may also rely on consultants, task forces, or informal discussion groups. Griffin suggests nine steps: 1. Recognition of a need for change. 2. Selection of job redesign as a potential intervention. 3. Diagnosis of the work system and content on the following factors: a. Existing jobs. b. Existing work force. c. Technology. d. Organization design. e. Leader behaviors. f. Team and social processes. 4. Cost/benefit analysis of proposed changes. 5. Go/no-go decision. 6. Establishment of a strategy for redesign.
Table 8 Dimensions of Task Interdependence
Instructions: Indicate the extent to which each statement is descriptive of the pair of tasks using the scale below. Circle answers to the right of each statement. Scores are calculated by averaging applicable items.
Please use the following scale:
(5) Strongly agree
(4) Agree
(3) Neither agree nor disagree
(2) Disagree
(1) Strongly disagree
( ) Leave blank if do not know or not applicable

Inputs of the Tasks
1. Materials/supplies: One task obtains, stores, or prepares the materials or supplies necessary to perform the other task. 1 2 3 4 5
2. Information: One task obtains or generates information for the other task. 1 2 3 4 5
3. Product/service: One task stores, implements, or handles the products or services produced by the other task. 1 2 3 4 5

Processes of the Tasks
4. Input-output relationship: The products (or outputs) of one task are the supplies (or inputs) necessary to perform the other task. 1 2 3 4 5
5. Method and procedure: One task plans the procedures or work methods for the other task. 1 2 3 4 5
6. Scheduling: One task schedules the activities of the other task. 1 2 3 4 5
7. Supervision: One task reviews or checks the quality of products or services produced by the other task. 1 2 3 4 5
8. Sequencing: One task needs to be performed before the other task. 1 2 3 4 5
9. Time sharing: Some of the work activities of the two tasks must be performed at the same time. 1 2 3 4 5
10. Support service: The purpose of one task is to support or otherwise help the other task get performed. 1 2 3 4 5
11. Tools/equipment: One task produces or maintains the tools or equipment used by the other task. 1 2 3 4 5

Outputs of the Tasks
12. Goal: One task can only be accomplished when the other task is properly performed. 1 2 3 4 5
13. Performance: How well one task is performed has a great impact on how well the other task can be performed. 1 2 3 4 5
14. Quality: The quality of the product or service produced by one task depends on how well the other task is performed. 1 2 3 4 5

Source: Wong & Campion (1991). © 1991 American Psychological Association. See reference and Wong (1989) for reliability and validity information. Note: The task similarity measure contains 10 comparable items (excluding items 4, 6, 8, 9, and 14, and including an item on customer/client). Scores for each dimension are calculated by averaging applicable items.
7. Implementation of the job changes. 8. Implementation of any needed supplemental changes. 9. Evaluation of the redesigned jobs. 4.2.4 Individualized Work Design and Job Crafting The last twenty years have witnessed increased interest in how employees act as proactive agents in the job design process. Employees can either actively craft or change their jobs, or they can negotiate idiosyncratic deals that alter the design of their work (Bruning & Campion, 2018; Grant & Parker, 2009; Hornung, Rousseau, & Glaser, 2008; Hornung et al., 2014; Rousseau et al., 2006; Wrzesniewski & Dutton, 2001; Zhang & Parker, 2016). An idiosyncratic deal is a formal agreement between an employee and their manager or organization regarding the individual’s work, which makes the characteristics of that employee’s work differ from those of employees in similar positions. These types of deals represent formal individualized work design arrangements (Morgeson & Humphrey, 2008). Job crafting differs from traditional job design in that it describes the changes that employees themselves make to their jobs to improve their outcomes. While traditional job design is implemented by a
manager or an organization, job crafting refers to the informal changes to task or social characteristics that employees make to their work. Employees make changes not only to improve their effectiveness but also to decrease their strain. The changes are self-initiated and independent of manager approval. As a result, job crafting can either increase or detract from organizational productivity, so the job designer should consider its effects. Bruning and Campion (2018) developed a useful taxonomy (see Table 9) that defines the domains of job crafting (divided into role and resource crafting). Within each of these broad domains, there are approach and avoidance crafting elements. This provides a much more nuanced framework within which to understand the kinds of job crafting that can occur. Because job crafting is initiated by the worker for his or her personal benefit, its consequences are not uniformly positive; research shows it can have three categories of benefits and costs (Bruning & Campion, 2019): 1. Improve worker performance and positive worker behavior. 2. Increase motivation, reduce strain, and improve well-being. 3. Help reduce or increase work withdrawal, boredom, job performance, and turnover intentions.
Table 9 Domains of Job Crafting
Approach Role Crafting
Work Role Expansion: Involves the self-initiated enlargement of the incumbent’s work role to include elements of work and related activities not originally in the formal job description.
Social Expansion: Occurs within the social domain of work and involves the proactive use of social resources or contribution of resources to another organizational member or collective.
Avoidance Role Crafting
Work Role Reduction: Consciously, proactively, and systematically reducing the work role, work requirements, effort expenditures, or task accountability.
Approach Resource Crafting
Work Organization: The active design of systems and strategies to organize the tangible elements of work, which can involve managing behavior or physical surroundings.
Adoption: The active and goal-directed use of technology and other sources of knowledge to alter the job and enhance a work process.
Metacognition: The autonomous task-related cognitive activity involving organization, sensemaking, and the manipulation of one’s own psychological states.
Avoidance Resource Crafting
Withdrawal Crafting: The systematic removal of oneself, either mentally or physically, from a person, situation, or event through changes to one’s job.
Source: Adapted from Bruning and Campion (2018).
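Table 9 sorts job crafting into a two-by-two grid (role versus resource crafting, approach versus avoidance crafting), and the frequency measure in Table 10 uses the same seven dimensions. The sketch below shows one way to roll Table 10 responses up into that grid so that, for example, frequent avoidance crafting stands out for follow-up discussion. The dimension keys are hypothetical labels, and averaging items within a dimension and then dimensions within a cell is an assumption made for illustration: the chapter does not state a scoring rule for Table 10.

```python
from statistics import mean
from typing import Dict, Mapping, Sequence, Tuple

Cell = Tuple[str, str]  # (domain, orientation) from Table 9

# Placement of each Table 10 dimension in the Table 9 taxonomy.
DIMENSIONS: Dict[str, Cell] = {
    "work_role_expansion": ("role", "approach"),
    "social_expansion": ("role", "approach"),
    "work_role_reduction": ("role", "avoidance"),
    "work_organization": ("resource", "approach"),
    "adoption": ("resource", "approach"),
    "metacognition": ("resource", "approach"),
    "withdrawal": ("resource", "avoidance"),
}


def crafting_profile(ratings: Mapping[str, Sequence[int]]) -> Dict[Cell, float]:
    """Average frequency ratings (0 = never ... 5 = daily) per Table 9 cell.

    ratings maps a dimension name to the ratings of its items. Dimensions
    with no ratings are skipped; cells with no rated dimensions are absent.
    """
    by_cell: Dict[Cell, list] = {}
    for dim, items in ratings.items():
        if dim not in DIMENSIONS:
            raise KeyError(f"unknown dimension: {dim}")
        if not items:
            continue
        if any(not 0 <= r <= 5 for r in items):
            raise ValueError(f"{dim}: ratings must be on the 0-5 scale")
        # First average the items within the dimension (assumed rule).
        by_cell.setdefault(DIMENSIONS[dim], []).append(mean(items))
    # Then average the dimension scores within each Table 9 cell.
    return {cell: mean(scores) for cell, scores in by_cell.items()}


profile = crafting_profile({
    "work_role_reduction": [4, 3, 5, 4],
    "withdrawal": [1, 0, 2],
    "adoption": [3, 3, 3, 4, 2],
})
print(profile)
```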
The implication is that employees will informally change the design of their work and, because these changes are informal, they often go undetected and can be difficult for a manager to control. Managers should both recognize that these changes do occur and design employees’ work with the understanding that the design of the work can and probably will be altered to some degree by the employee. Bruning and Campion (2018) developed a measure of job crafting that can help diagnose and understand the kinds of changes that employees might make (Table 10). Managers can use this measure to better understand the changes that have been made and then make any adjustments as needed. Engineers and managers designing jobs should be aware of the existence of job crafting by employees and take proactive steps to ensure that it contributes positively to both organizational outcomes and employee well-being. Guidelines for managing employee job crafting include the following (Bruning & Campion, 2019): 1. Be aware of job crafting and how to measure and evaluate it. 2. Support instances of job crafting that are positive for both the employee and the organization. 3. Work with employees to provide alternatives to detrimental job crafting. 4. Monitor job crafting, provide feedback, and have ongoing discussions with employees. 5. Develop organizational support systems to manage job crafting, such as sharing improvements to jobs with other employees, revising job descriptions to reflect the changes to the jobs, training employees to be more effective at job crafting, including positive crafting in performance evaluations, and measuring job crafting when analyzing jobs.
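The task-combination rules of Section 4.2.1 can be turned into a rough screen for candidate jobs: sum the motivational scores of the tasks, check that the pairwise interdependence ratings from Table 8 are moderate, and check that the pairwise similarity ratings are not high. The chapter gives no numeric definition of "moderate" or "high," so the cutoffs below, and the choice to compare the mean pairwise rating rather than a total, are hypothetical assumptions to be replaced with local judgment; the function name, task names, and scores are made up for illustration.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, Mapping, Optional, Sequence, Tuple

Pair = frozenset  # an unordered pair of task names


def evaluate_job(
    tasks: Sequence[str],
    task_scores: Mapping[str, float],
    interdependence: Mapping[Pair, float],
    similarity: Mapping[Pair, float],
    moderate_band: Tuple[float, float] = (2.5, 3.5),  # hypothetical cutoffs
    similarity_ceiling: float = 3.5,  # hypothetical cutoff
) -> Dict[str, object]:
    """Screen one candidate task combination against the Section 4.2.1 rules.

    task_scores: motivational score of each task (e.g., judged with Table 2).
    interdependence / similarity: averaged Table 8 rating (1-5) per task pair.
    """
    pairs = [frozenset(p) for p in combinations(tasks, 2)]
    # Rule 1: sum, not average, so both task quality and task count matter.
    job_score = sum(task_scores[t] for t in tasks)
    # Rules 2 and 3 are checked on the mean pairwise rating (an assumption).
    mean_inter: Optional[float] = mean(interdependence[p] for p in pairs) if pairs else None
    mean_sim: Optional[float] = mean(similarity[p] for p in pairs) if pairs else None
    low, high = moderate_band
    return {
        "job_score": job_score,
        "mean_interdependence": mean_inter,
        # A single-task job has no interdependence at all, so it fails this check.
        "interdependence_ok": mean_inter is not None and low <= mean_inter <= high,
        "similarity_ok": mean_sim is None or mean_sim <= similarity_ceiling,
    }


scores = {"A": 3.0, "B": 4.0, "C": 2.5}
inter = {frozenset({"A", "B"}): 3.0, frozenset({"A", "C"}): 2.0, frozenset({"B", "C"}): 4.0}
sim = {frozenset({"A", "B"}): 2.0, frozenset({"A", "C"}): 4.0, frozenset({"B", "C"}): 3.0}
print(evaluate_job(["A", "B", "C"], scores, inter, sim))
```

Because the job score is a sum, adding tasks always raises it; the interdependence and similarity checks are what keep the summing rule from simply favoring the longest task list.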
4.3 Implementation Advice for Team Design 4.3.1 Deciding on Team Composition Research favors teams that are heterogeneous in skills, personality, and attitudes, because heterogeneity increases the range of competencies in teams (Gladstein, 1984) and is related to effectiveness (Campion et al., 1995). However, homogeneity is preferred if team morale is the main criterion, and heterogeneous attributes must be complementary if they are to contribute to effectiveness. Heterogeneity for its own sake is unlikely to enhance effectiveness (Campion et al., 1993). Another composition characteristic of effective teams is whether members have flexible job assignments (Campion et al., 1993; Sundstrom et al., 1990). If members can perform different jobs, effectiveness is enhanced because they can fill in for one another as needed. A third important aspect of composition is team size. Evidence suggests the importance of optimally matching team size to team tasks to achieve high performance and satisfaction. Teams need to be large enough to accomplish the work assigned to them but may become dysfunctional when too large, due to heightened coordination needs (O’Reilly & Roberts, 1977; Steiner, 1972) or increased social loafing (McGrath, 1984; Wicker, Kirmeyer, Hanson, & Alexander, 1976). Thus, groups should be staffed to the smallest number needed to do the work (Goodman et al., 1986; Hackman, 1987; Sundstrom et al., 1990). 4.3.2 Selecting Team Members With team design, interpersonal demands appear to be much greater than with traditional individual-based job design (Lawler, 1986; Morgeson et al., 2005). A team-based setting highlights the importance of employees being capable of interacting effectively with peers, because the number of interpersonal interactions required is higher in teams (Stevens & Campion, 1994a, 1994b, 1999). Team effectiveness can depend heavily on members’ “interpersonal competence,” or their ability to successfully maintain healthy working relationships and react to others with respect for their viewpoints (Perkins & Abramis, 1990). There is a greater need for team members to be capable of effective interpersonal communication, collaborative problem solving, and conflict management (Stevens & Campion, 1994a, 1994b, 1999). The process of employment selection for team members therefore places greater stress on adequately evaluating interpersonal competence than is normally required in the selection of workers for individual jobs. To create a selection instrument for evaluating potential team members’ ability to work successfully in teams, Stevens and Campion (1994a, 1994b) reviewed the literature in the areas of sociotechnical systems theory (e.g., Cummings, 1978; Wall, Kemp, Jackson, & Clegg, 1986), organizational behavior (e.g., Hackman, 1987; Shea & Guzzo, 1987; Sundstrom et al., 1990), industrial engineering (e.g., Davis & Wacker, 1987; Majchrzak, 1988), and social psychology (e.g., McGrath, 1984; Steiner, 1972) to identify relevant knowledge, skills, and abilities (KSAs). Table 11 shows the 14 KSAs identified as important for teamwork. These KSAs have been used to develop a 35-item, multiple-choice employment test, which was validated in two studies to determine how strongly it was related to team members’ job performance (Stevens & Campion, 1999). The job performance of team members in two different companies was rated by both supervisors and co-workers. Correlations between the test and job performance ratings were high and statistically significant, with some exceeding .50. The test also predicted job performance beyond the prediction provided by a large battery of traditional employment aptitude tests. Thus, these findings provide support for the value of the teamwork KSAs and a selection test based on them (Stevens &
Table 10 Role–Resource Approach–Avoidance Job Crafting Measure
Please use the following scale:
(5) Daily
(4) Weekly
(3) Monthly
(2) A Few Times a Year
(1) Yearly or Less
(0) Never

Work Role Expansion
Expand my role by providing opinions on important issues. 0 1 2 3 4 5
Expand my work activities to make sure I take care of myself. 0 1 2 3 4 5
Expand my work activities to acquire resources that will help me do my job. 0 1 2 3 4 5
Expand my work by adding activities to my job that ensure the quality of my deliverables. 0 1 2 3 4 5
Expand my work by adding activities to my job that enhance safety or security. 0 1 2 3 4 5

Social Expansion
Actively initiate positive interactions with others at work. 0 1 2 3 4 5
Actively work to improve my communication quality with others at work. 0 1 2 3 4 5
Actively develop my professional network at my job. 0 1 2 3 4 5
Actively work to improve the quality of group interactions. 0 1 2 3 4 5

Work Role Reduction
Find ways to get others to take my place in meetings. 0 1 2 3 4 5
Find ways to outsource my work to others outside my group. 0 1 2 3 4 5
Find ways to reduce the time I spend in meetings. 0 1 2 3 4 5
Find ways to bypass time-consuming tasks. 0 1 2 3 4 5

Work Organization
Create structure in my work processes. 0 1 2 3 4 5
Create organization in my work environment. 0 1 2 3 4 5
Create structure in my work schedule. 0 1 2 3 4 5
Create plans and prioritize my work in an organized manner. 0 1 2 3 4 5

Adoption
Use new knowledge or technology to enhance communication. 0 1 2 3 4 5
On my own, seek training on new technology. 0 1 2 3 4 5
On my own, seek training to improve my work. 0 1 2 3 4 5
Use new knowledge or technology to automate tasks. 0 1 2 3 4 5
Use new knowledge or technology to structure my work. 0 1 2 3 4 5

Metacognition
Use my thoughts to put myself into a good mood at work. 0 1 2 3 4 5
Use my thoughts to get me out of a bad mood at work. 0 1 2 3 4 5
Use my thoughts to help me focus and be engaged at work. 0 1 2 3 4 5
Use my thoughts to create a personal mental approach to work. 0 1 2 3 4 5
Use my thoughts to help me prepare for future work I will be doing. 0 1 2 3 4 5

Withdrawal
Work in a way that allows me to avoid others at work. 0 1 2 3 4 5
Work in a way that allows me to avoid interacting with people when working. 0 1 2 3 4 5
Work in a way that allows me to avoid bothersome tasks involved in my work. 0 1 2 3 4 5

Source: Bruning & Campion (2018). © 2018 Academy of Management.
Campion, 1994a). Table 12 shows some example items from the teamwork KSA test. Aside from written tests, there may be other ways teamwork KSAs could be measured for purposes of selection. For example, interviews may be especially suited to measuring interpersonal attributes (e.g., Posthuma, Morgeson, & Campion, 2002). There
is evidence that a structured interview specifically designed to measure social (i.e., nontechnical) KSAs can have validity with job performance and predict incrementally beyond traditional employment tests (Campion, Campion, & Hudson, 1994). Assessment center techniques might also lend themselves to measuring teamwork KSAs. Group exercises have been used
404
DESIGN OF EQUIPMENT, TASKS, JOBS, AND ENVIRONMENTS
Table 11 Knowledge, Skill, and Ability (KSA) Requirements for Teamwork
Table 12
Example Items from the Teamwork KSA Test
Suppose you find yourself in an argument with several co-workers who should do a very disagreeable, but routine task. Which of the following would likely be the most effective way to resolve this situation? A. Have your supervisor decide, because this would avoid any personal bias. *B. Arrange for a rotating schedule so everyone shares the chore. C. Let the workers who show up earliest choose on a first-come, first-served basis. D. Randomly assign a person to do the task and don’t change it. Your team wants to improve the quality and flow of the 2. conversations among its members. Your team should: *A. use comments that build upon and connect to what others have said. B. set up a specific order for everyone to speak and then follow it. C. let team members with more to say determine the direction and topic of conversation. 1.
I.
II.
Interpersonal KSAs A. Conflict Resolution KSAs 1. The KSA to recognize and encourage desirable, but discourage undesirable, team conflict. 2. The KSA to recognize the type and source of conflict confronting the team and to implement an appropriate conflict resolution strategy. 3. The KSA to employ an integrative (win-win) negotiation strategy rather than the traditional distributive (win-lose) strategy. B. Collaborative Problem Solving KSAs 4. The KSA to identify situations requiring participative group problem solving and to utilize the proper degree and type of participation. 5. The KSA to recognize the obstacles to collaborative group problem solving and implement appropriate corrective actions. C. Communication KSAs 6. The KSA to understand communication networks, and to utilize decentralized networks to enhance communication where possible. 7. The KSA to communicate openly and supportively, that is, to send messages which are (a) behavior- or event-oriented, (b) congruent, (c) validating, (d) conjunctive, and (e) owned. 8. The KSA to listen non-evaluatively and to appropriately use active listing techniques. 9. The KSA to maximize consonance between nonverbal and verbal messages, and to recognize and interpret the nonverbal messages of others. 10. The KSA to engage in ritual greetings and small talk, and a recognition of their importance. Self-management KSAs D. Goal Setting and Performance Management KSAs 11. The KSA to help establish specific, challenging, and accepted team goals. 12. The KSA to monitor, evaluate, and provide feedback on both overall team performance and individual team member performance. E. Planning and Task Coordination KSAs 13. The KSA to coordinate and synchronize activities, information, and task interdependencies between team members. 14. The KSA to help establish task and role expectations of individual team members, and to ensure proper balancing of workload in the team.
to measure leadership and other social skills with good success (Gaugler, Rosenthal, Thornton, & Benston, 1987). It is likely that existing team exercises, such as group problem-solving tasks, could also be modified to score teamwork KSAs. Selection techniques using biodata may be another way to measure teamwork KSAs. Many items in biodata instruments reflect previous life experiences of a social nature, and recruiters interpret biodata information on applications and resumes as reflecting attributes such as interpersonal skills (Brown & Campion, 1994). A biodata measure developed to focus on teamwork KSAs might include items on teamwork in previous jobs, team experiences in school (e.g., college clubs, class projects), and recreational activities of a team nature (e.g., sports teams and social groups).
D. do all of the above. 3. Suppose you are presented with the following types of goals. You are asked to pick one for your team to work on. Which would you choose? A. An easy goal to ensure the team reaches it, thus creating a feeling of success. B. A goal of average difficulty so the team will be somewhat challenged, but successful without too much effort. *C. A difficult and challenging goal that will stretch the team to perform at a high level, but attainable so that effort will not be seen as futile. D. A very difficult, or even impossible goal so that even if the team falls short, it will at least have a very high target to aim for. Note: *Correct answers.
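Table 12 marks the keyed option for each example item with an asterisk. As a rough illustration, the sketch below scores an applicant’s answers to these three items by counting matches against that key. It assumes simple number-correct scoring with no item weights and no guessing penalty; the source does not state how the full 35-item test is scored, and the names `EXAMPLE_KEY` and `number_correct` are hypothetical.

```python
from typing import Mapping

# Keyed (starred) answers for the three example items in Table 12.
EXAMPLE_KEY = {1: "B", 2: "A", 3: "C"}


def number_correct(responses: Mapping[int, str],
                   key: Mapping[int, str] = EXAMPLE_KEY) -> int:
    """Count items whose chosen option matches the key.

    Letters are compared case-insensitively; missing or blank answers score zero.
    """
    return sum(
        1
        for item, keyed in key.items()
        if responses.get(item, "").strip().upper() == keyed
    )


# Hypothetical applicant: correct on items 1 and 3, incorrect on item 2.
print(number_correct({1: "B", 2: "C", 3: "c"}))  # 2
```

Keeping the key as a mapping from item number to letter means the same function would work for a longer key without changes.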
4.3.3 Designing the Teams’ Jobs
This aspect of team design involves team characteristics derived from the motivational job design approach. The main distinction is in level of application rather than content (Campion & Medsker, 1992; Shea & Guzzo, 1987; Wall et al., 1986). All the job characteristics of the motivational approach to job design can be applied to team design. One such characteristic is self-management, which is the team-level analog of autonomy at the individual job level. It is central to many definitions of effective work teams (e.g., Cummings, 1978, 1981; Hackman, 1987). A related characteristic is participation. Regardless of management involvement in decision-making, teams can still be distinguished by the degree to which all members are allowed to participate in decisions (McGrath, 1984; Porter et al., 1987). Self-management and participation are presumed to enhance effectiveness by increasing members’ sense of responsibility and ownership of the work. These characteristics may also enhance decision quality by increasing relevant information and by putting decisions as near as possible to the point of operational problems and uncertainties. Other important characteristics are task variety, task significance, and task identity. Variety motivates by allowing members to use different skills (Hackman, 1987) and by allowing both interesting and dull tasks to be shared among members (Davis & Wacker, 1987). Task significance refers to the perceived significance of the consequences of the team’s work, either for others inside the organization or for its customers. Task identity (Hackman, 1987), or task differentiation (Cummings, 1978), refers to the degree to which the team completes a whole and meaningful piece of work. These characteristics of team design have been found to be positively related to team productivity, team member satisfaction, and managers’ and employees’ judgments of their teams’ performance (Campion et al., 1993; Campion et al., 1995).
4.3.4 Developing Interdependent Relations
Interdependence is often the reason teams are formed (Mintzberg, 1979) and is a defining characteristic of teams (Salas, Dickinson, Converse, & Tannenbaum, 1992; Wall et al., 1986). Interdependence has been found to be related to team members’ satisfaction and to team productivity and effectiveness (Campion et al., 1993; Campion et al., 1995). One form of interdependence is task interdependence: team members interact and depend on one another to accomplish their work. Task interdependence varies across teams, depending on whether the work flow in a team is pooled, sequential, or reciprocal (Thompson, 1967). Interdependence among tasks in the same job (Wong & Campion, 1991) or between jobs (Kiggundu, 1983) has been related to increased motivation. It can also increase team effectiveness because it enhances the sense of responsibility for others’ work (Kiggundu, 1983) or because it enhances the reward value of a team’s accomplishments (Shea & Guzzo, 1987). Another form of interdependence is goal interdependence. Goal setting is a well-documented, individual-level performance improvement technique (Locke & Latham, 1990). A clearly defined mission or purpose is considered to be critical to team effectiveness (Campion et al., 1993; Campion et al., 1995; Davis & Wacker, 1987; Hackman, 1987; Sundstrom et al., 1990).
Its importance has also been shown in empirical studies on teams (e.g., Buller & Bell, 1986; Woodman & Sherwood, 1980). Not only should goals exist for teams, but individual members’ goals must also be linked to team goals to be maximally effective. Finally, interdependent feedback and rewards have also been found to be important for team effectiveness and team member satisfaction (Campion et al., 1993; Campion et al., 1995). Individual feedback and rewards should be linked to a team’s performance in order to motivate team-oriented behavior. This characteristic is recognized in many theoretical treatments (e.g., Hackman, 1987; Leventhal, 1976; Steiner, 1972; Sundstrom et al., 1990) and research studies (e.g., Pasmore et al., 1982; Wall et al., 1986).
4.3.5 Creating the Organizational Context
Organizational context and resources are considered in all recent models of work team effectiveness (e.g., Guzzo & Shea, 1992; Hackman, 1987). One important aspect of context and resources for teams is adequate training. Training is an extensively researched determinant of team performance (for reviews, see Dyer, 1984; Salas et al., 1992), and training is included in most interventions (e.g., Pasmore et al., 1982; Wall et al., 1986). Training is related to team members’ satisfaction and to managers’ and employees’ judgments of their teams’ effectiveness (Campion et al., 1993; Campion et al., 1995). Training content often includes team philosophy, group decision-making, and interpersonal skills, as well as technical knowledge. Many team-building interventions focus on aspects of team functioning that are related to the teamwork KSAs shown in Table 11. A recent review of this literature divided such interventions into four approaches (goal setting, interpersonal, role, and problem solving; Tannenbaum, Beard, & Salas, 1992), which are similar to the teamwork KSA categories. Thus, these interventions could be viewed as training programs on teamwork KSAs. Reviews indicate that the evidence for the effectiveness of this training appears positive, despite the methodological limitations that plague this research (Buller & Bell, 1986; Tannenbaum et al., 1992; Woodman & Sherwood, 1980). It appears that workers can be trained in teamwork KSAs. Regarding how such training should be conducted, there is substantial guidance on training teams in the human factors and military literatures (Dyer, 1984; Salas et al., 1992; Swezey & Salas, 1992). Because these topics are thoroughly addressed in the cited sources, they will not be reviewed here. Managers of teams also need to be trained in teamwork KSAs, regardless of whether the teams are manager-led or self-managed. The KSAs are needed for interacting with employee teams and for participating on management teams. It has been noted that managers of teams, especially autonomous work teams, need to develop their employees (Cummings, 1978; Hackman & Oldham, 1980; Manz & Sims, 1987). Thus, training must ensure not only that managers possess teamwork KSAs, but also that they know how to train employees on these KSAs. Managerial support is another contextual characteristic (Morgeson, 2005; Morgeson et al., 2010). Management controls the resources (e.g., material and information) required to make team functioning possible (Shea & Guzzo, 1987), and an organization’s culture and top management must support the use of teams (Sundstrom et al., 1990). Teaching facilitative leadership to managers is often a feature of team interventions (Pasmore et al., 1982). Finally, communication and cooperation between teams are contextual characteristics because they are often the responsibility of managers.
Supervising team boundaries (Cummings, 1978) and externally integrating teams with the rest of the organization (Sundstrom et al., 1990) enhance effectiveness. Research indicates that managerial support, as well as communication and cooperation between work teams, is related to team productivity and effectiveness and to team members’ satisfaction with their work (Campion et al., 1993; Campion et al., 1995).
4.3.6 Developing Effective Team Process
Process describes those things that go on in the group that influence effectiveness. One process characteristic is potency, or the belief of a team that it can be effective (Guzzo & Shea, 1992; Shea & Guzzo, 1987). It is similar to the lay term “team spirit.” Hackman (1987) argues that groups with high potency are more committed and willing to work hard for the group, and evidence indicates that potency is highly related to team members’ satisfaction with work, team productivity, and members’ and managers’ judgments of their teams’ effectiveness (Campion et al., 1993; Campion et al., 1995). Another process characteristic found to be related to team satisfaction, productivity, and effectiveness is social support (Campion et al., 1993; Campion et al., 1995). Effectiveness can be enhanced when members help each other and have positive social interactions. Like social facilitation (Harkins, 1987; Zajonc, 1965), social support can be arousing and may enhance effectiveness by sustaining effort on mundane tasks. Another process characteristic related to satisfaction, productivity, and effectiveness is workload sharing (Campion et al., 1993; Campion et al., 1995). Workload sharing enhances effectiveness by preventing social loafing or free riding (Harkins, 1987). To enhance sharing, group members should believe that their individual performance can be distinguished from the group’s and that there is a link between their performance and outcomes.
Finally, communication and cooperation within the work group are also important to team effectiveness, productivity, and satisfaction (Campion et al., 1993; Campion et al., 1995). Management should help teams foster open communication, supportiveness, and discussions of strategy. Informal, rather than formal, communication channels and mechanisms of control should be promoted to ease coordination (Bass & Klubeck, 1952; Majchrzak, 1988). Managers should encourage self-evaluation, self-observation, self-reinforcement, self-management, and self-goal setting by teams. Self-criticism for purposes of recrimination should be discouraged (Manz & Sims, 1987). Meta-analytic evidence suggests that numerous team processes are related to both team performance and member satisfaction (LePine et al., 2008). Many of these processes can be grouped into three categories. Transition processes are team actions that occur after one team task has ended and before the next begins; they include mission analysis, goal specification, and strategy formulation/planning. Action processes are team activities that occur during the completion of a task. The four types of action processes are monitoring progress toward goals, systems monitoring (assessing resources and environmental factors that could influence goal accomplishment), team monitoring and backup behavior (team members assisting each other in their individual tasks), and coordination. Finally, team activities geared toward maintaining the team’s interpersonal relationships are called interpersonal processes; these include conflict management, motivating/confidence building, and affect management (e.g., emotional balance, togetherness, and coping with demands/frustrations). The results suggest that specific team processes occur at different stages of task completion, and that whether or not these processes occur affects both team performance and team members’ satisfaction (LePine et al., 2008).
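Because the taxonomy has a fixed two-level structure (category, then specific process), it maps naturally onto a lookup table, for example when an observer codes what a team does during a meeting. The sketch below is a hypothetical illustration only: the process labels come from the text, but the coding scheme, `TEAM_PROCESSES`, and `tally_by_category` are assumptions and not an instrument from LePine et al. (2008).

```python
from collections import Counter

# Categories and specific processes as summarized in the text (LePine et al., 2008).
TEAM_PROCESSES = {
    "transition": ["mission analysis", "goal specification",
                   "strategy formulation/planning"],
    "action": ["monitoring progress toward goals", "systems monitoring",
               "team monitoring and backup behavior", "coordination"],
    "interpersonal": ["conflict management", "motivating/confidence building",
                      "affect management"],
}

# Reverse lookup: specific process label -> category.
CATEGORY_OF = {
    label: category
    for category, labels in TEAM_PROCESSES.items()
    for label in labels
}


def tally_by_category(observed_labels):
    """Count coded observations per category.

    Labels not in the taxonomy are counted under 'unclassified'
    rather than silently dropped.
    """
    return Counter(CATEGORY_OF.get(label, "unclassified") for label in observed_labels)


# Hypothetical codes an observer might assign while watching one team meeting.
log = ["mission analysis", "coordination", "conflict management", "coordination"]
print(tally_by_category(log))
```

Counting unrecognized labels separately keeps coding errors visible instead of quietly shrinking the totals.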
5 MEASUREMENT AND EVALUATION OF JOB AND TEAM DESIGN
The purpose of an evaluation study for either a job or team design is to provide an objective evaluation of success and to create a tracking and feedback system for making adjustments during the course of the design project. An evaluation study can provide objective data to make informed decisions, help tailor the process to the organization, and give those affected by the design or redesign an opportunity to provide input (see Morgeson & Campion, 2002). An evaluation study should include measures that describe the characteristics of the jobs or teams so that it can be determined whether the jobs or teams ended up having the characteristics they were intended to have. An evaluation study should also include measures of the effectiveness outcomes an organization hoped to achieve with a design project. Measures of effectiveness could include such subjective outcomes as employee job satisfaction or employee, manager, or customer perceptions of effectiveness. Measures of effectiveness should also include objective outcomes such as cost, productivity, rework/scrap, turnover, accident rates, or absenteeism. Additional information on measurement and evaluation of such outcomes can be found in Part VI of this Handbook.
5.1 Using Questionnaires to Measure Job and Team Design
One way to measure job or team design is by using questionnaires or checklists. This method is highlighted because it has been used widely in research on job design, especially on the motivational approach. More importantly, questionnaires are an inexpensive, easy, and flexible way to measure work design characteristics. Moreover, they gather information from job experts, such as incumbents, supervisors, engineers, and other analysts. Several questionnaires exist for measuring the motivational approach to job design (Hackman & Oldham, 1980; Sims et al., 1976), but only one questionnaire, the Multimethod Job Design Questionnaire, measures characteristics for all four approaches to job design. This questionnaire (presented in Table 2) evaluates the quality of a job’s characteristics based on each of the four approaches. The Team Design Measure (presented in Table 4) evaluates the quality of work design based on the team approach. Questionnaires can be administered in a variety of ways. Employees can complete them individually at their convenience, at their workstation or some other designated area, or they can complete them in a group setting. Group administration allows greater standardization of instructions and provides the opportunity to answer questions and clarify ambiguities. Managers and engineers can also complete the questionnaires either individually or in a group session. Engineers and analysts usually find that observing the work site, examining the equipment and procedures, and discussing the work with incumbents or managers are important ways of gaining information before completing the questionnaires. Scoring for each job design approach or for each team characteristic is usually accomplished by simply averaging the applicable items. Scores from different incumbents, managers, or engineers describing the same job or team are then combined by averaging. Multiple items and multiple respondents are used to improve the reliability and accuracy of the results. The implicit assumption is that slight differences among respondents are to be expected because of legitimate differences in viewpoint.
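The scoring procedure in this section (average the items within each respondent, then average across respondents), together with the item-discrepancy check and the rule-of-thumb cutoff discussed next, can be sketched as follows. This is a minimal sketch assuming that every respondent rates every item on a 1 to 5 scale, which is the scale the cutoff of three implies; the function names, the zero-based item indices, and the choice to flag items whose mean is at or below two are illustrative assumptions, not part of the published questionnaires.

```python
from statistics import mean


def combined_scale_score(ratings):
    """Average items within each respondent, then average across respondents.

    `ratings` holds one list of item ratings per respondent, all describing
    the same job or team on the same scale (assumed 1-5).
    """
    return mean(mean(items) for items in ratings)


def discrepant_items(ratings, max_gap=1):
    """Zero-based indices of items whose ratings differ by more than `max_gap` points."""
    return [i for i, column in enumerate(zip(*ratings))
            if max(column) - min(column) > max_gap]


def review(ratings, cutoff=3, low=2):
    """Apply the rule of thumb: reconsider a scale scoring below `cutoff`, and
    flag items averaging at or below `low` even when the scale itself passes."""
    score = combined_scale_score(ratings)
    item_means = [mean(column) for column in zip(*ratings)]
    return {
        "scale_score": score,
        "reconsider": score < cutoff,
        "low_items": [i for i, m in enumerate(item_means) if m <= low],
        "discuss_items": discrepant_items(ratings),
    }


# Hypothetical ratings from three incumbents on a four-item scale.
ratings = [[4, 3, 2, 5], [4, 4, 1, 5], [3, 4, 3, 5]]
print(review(ratings))
```

With these hypothetical ratings the combined score is about 3.58, so the scale passes the rule of thumb, but the third item (index 2) averages 2.0 and its ratings span two points, so it would be both examined and discussed until a consensus is reached.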
However, absolute differences in scores should be examined on an item-by-item basis, and large discrepancies (e.g., more than one point) should be discussed to clarify possible differences in interpretation. It may be useful to discuss each item until a consensus rating is reached. The higher the score on a particular job design scale or work team characteristic scale, the better the quality of the design in terms of that approach or characteristic. Likewise, the higher the score on a particular item, the better the design is on that dimension. How high a score needs to be cannot be stated in isolation. Some jobs or teams are naturally higher or lower on the various approaches, and there may be limits to the potential of some jobs. The scores have the most value in comparing different jobs, teams, or design approaches, rather than in evaluating the absolute quality of a job or team design. However, a simple rule of thumb is that if the score for an approach is less than three, the job or team is poorly designed on that approach and should be reconsidered. Even if the average score on an approach is greater than three, examine any individual dimension scores of two or one.
5.1.1 Uses of Questionnaires in Different Contexts
1. Designing new jobs or teams. When jobs or teams do not yet exist, the questionnaire is used to evaluate proposed job or team descriptions, workstations, equipment, and so on. In this role, it often serves as a simple design checklist. Additional administrations of the questionnaire in later months or years can be used to assess the longer-term effects of the job or team design.
2. Redesigning existing jobs or teams or switching from job to team design. When jobs or teams already exist, there is a much greater wealth of information. Questionnaires can be completed by incumbents, managers, and engineers.
Questionnaires can be used to measure design both before and after changes are made in order to compare the redesign with the previous design approach. A premeasure taken before the redesign can serve as a baseline against which to compare a post-measure conducted right after the redesign is implemented. A follow-up measure can be used in later months or years to assess the long-term difference between the previous design approach and the new approach. If other sites or plants with the same types of jobs or teams are not immediately included in the redesign but are maintained with the older design approach, they can be used as a comparison or “control group” to enable analysts to draw even stronger conclusions about the effectiveness of the redesign. Such a control group allows one to rule out the possibility that changes in effectiveness were due not to the redesign but to other causes, such as increases in workers’ knowledge and skills with the passage of time, changes in workers’ economic environment (e.g., job security, wages), or workers trying to give socially desirable responses to questionnaire items.
3. Diagnosing problem job or team designs. When problems occur, regardless of their apparent source, the job or team design questionnaires can be used as a diagnostic device to determine whether any problems exist with the design of the jobs or teams.
5.2 Choosing Sources of Data
1. Incumbents. Incumbents are probably the best source of information for existing jobs or teams. Having input can enhance the likelihood that changes will be accepted, and involvement in such decisions can enhance feelings of participation, thus increasing motivational job design in itself (see item 22 of the motivational scale in Table 2). One should include a large number of incumbents for each job or team because there can be slight differences in perceptions of the same job or team due to individual differences (discussed in Section 4.1).
Evidence suggests that one should include at least five incumbents for each job or team, but more are preferable (Campion, 1988; Campion & McClelland, 1991; Campion et al., 1993; Campion et al., 1995).
2. Managers or supervisors. First-level managers or supervisors may be the next most knowledgeable persons about an existing work design. They may also provide information on jobs or teams under development. Some differences in perceptions of the same job or team will exist among managers, so multiple managers should be used.
3. Engineers or analysts. Engineers may be the only source of information if the jobs or teams are not yet developed. Even for existing jobs or teams, however, the outside perspective of an engineer, analyst, or consultant may provide a more objective viewpoint. Again, there can be differences among engineers, so several should evaluate each job or team.
It is desirable to get multiple inputs and perspectives from different sources in order to get the most reliable and accurate picture of the results of the job or team design.
5.3 Long-Term Effects and Potential Biases
It is important to recognize that some effects of job or team design may not be immediate, others may not be long-lasting, and still others may not be obvious. Initially, when jobs or teams are designed, or right after they are redesigned, there may be a short-term period of positive attitudes (often called a “Honeymoon Effect”). As the legendary Hawthorne studies indicated, changes in jobs or increased attention paid to workers tend to create novel stimulation and positive attitudes (Mayo, 1933). Such transitory elevations in affect should not be mistaken for long-term improvements in satisfaction, as they may wear off over time. In fact, with time, employees may realize that their work is now more complex and deserves higher compensation (Campion & Berger, 1990). Costs that are likely to lag in time also include stress and fatigue, which may take a while to build up if mental demands have been increased excessively. Boredom may take a while to set in if mental demands have been overly decreased. In terms of lagged benefits, productivity and quality are likely to improve with practice and learning on the new job or team. And some benefits, like reduced turnover, simply take time to estimate accurately. Benefits that may dissipate with time include satisfaction, especially if the elevated satisfaction is a function of novelty rather than basic changes to the motivating value of the work. Short-term increases in productivity due to heightened effort rather than better design may not last. Costs that may dissipate include training requirements and staffing difficulties. Once jobs are staffed and everyone is trained, these costs disappear until turnover occurs. So these costs will not go away completely, but they may be lower after the initial start-up. In one motivational job redesign study, heightened satisfaction dissipated while productivity increases persisted over the long term (Griffin, 1989). These are only examples to illustrate how dissipating and lagged effects might occur. A more detailed example of long-term effects is given in Section 5.6. A potential bias that may confuse the proper evaluation of benefits and costs is spillover. Laboratory research has shown that employees’ job satisfaction can bias their perceptions of the motivational value of their jobs (O’Reilly, Parlette, & Bloom, 1980).
Likewise, the level of morale in the organization can have a spillover effect on employees’ perceptions of job or team design. If morale is particularly high, it may elevate how employees or analysts view the jobs or teams; conversely, low morale may depress those views. The term morale refers to the general level of job satisfaction across employees, and it may be a function of many factors, including management, working conditions, and wages. Another factor that has an especially strong effect on employee reactions to work design changes is employment security. Obviously, employees will be unenthusiastic about work design changes if they view them as potentially decreasing their job security. Every effort should be made to eliminate these fears. The best method of addressing these effects is to be attentive to their potential existence and to conduct longitudinal evaluations of job and team design.
In addition to questionnaires, there are many other analytical tools that are useful for work design. The disciplines that contributed the different approaches to work design have also contributed different techniques for analyzing tasks, jobs, and processes for design and redesign purposes. These techniques include job analysis methods created by specialists in industrial psychology, variance analysis methods created by specialists in sociotechnical design, time and motion analysis methods created by specialists in industrial engineering, and linkage analysis methods created by specialists in human factors. This section briefly describes a few of these techniques to illustrate the range of options. The reader is referred to the citations for details on how to use the techniques.
5.4 Job Analysis
Job analysis can be broadly defined as a number of systematic techniques for collecting and making judgments about job information (Morgeson & Campion, 1997, 2000; Morgeson & Dierdorff, 2011). Information derived from job analysis can be used to aid in recruitment and selection decisions, determine training and development needs, develop performance appraisal systems, and evaluate jobs for compensation, as well as to analyze tasks and jobs for job design. Job analysis may also focus on tasks, worker characteristics, worker functions, work fields, working conditions, tools and methods, products and services, and so on. Job analysis data can come from job incumbents, supervisors, and analysts who specialize in the analysis of jobs. Data may also be provided by higher management levels or subordinates in some cases. A considerable body of literature has been published on the topic of job analysis (Ash, Levine, & Sistrunk, 1983; Dierdorff & Wilson, 2003; Gael, 1983; Harvey, 1991; Morgeson & Campion, 1997; Morgeson & Dierdorff, 2011; Morgeson et al., 2004; Morgeson et al., 2020; Peterson et al., 2001; U.S. Department of Labor, 1972). Some of the more typical methods of analysis are briefly described below:
1. Conferences and interviews. Conferences or interviews with job experts, such as incumbents and supervisors, are often the first step. The information collected in such meetings typically includes job duties and tasks, as well as the knowledge, skills, abilities (KSAs), and other worker characteristics the job requires.
2. Questionnaires. Questionnaires are used to collect information efficiently from a large number of people. Questionnaires require considerable prior knowledge of the job to form the basis of the items (e.g., primary tasks). Often this information is first collected through conferences and interviews, and then the questionnaire is constructed and used to collect judgments about the job (e.g., importance of and time spent on each task). Some standardized questionnaires have been developed that can be applied to all jobs to collect basic information on tasks and requirements.
Examples of standardized questionnaires are the Position Analysis Questionnaire (McCormick, Jeanneret, & Mecham, 1972) and the Occupational Information Network (O*NET; Peterson et al., 2001).
3. Inventories. Inventories are much like questionnaires, except they are simpler in format. They are usually simple checklists where the job expert checks whether a task is performed or an attribute is required.
4. Critical incidents. This form of job analysis focuses only on aspects of worker behavior that are especially effective or ineffective.
5. Work observation and activity sampling. Quite often job analysis includes the actual observation of work performed. More sophisticated techniques involve statistical sampling of work activities.
6. Diaries. Sometimes it is useful or necessary to collect data by having employees keep a diary of activities on their job.
7. Functional job analysis. Task statements can be written in a standardized fashion. Functional job analysis suggests how to write task statements (e.g., start with a verb, be as simple and discrete as possible). It also involves rating jobs on the degree of data, people, and things requirements. This form of job analysis was developed by the U.S. Department of Labor and has been used to describe over 12,000 jobs as documented in the Dictionary of Occupational Titles (Fine & Wiley, 1971; U.S. Department of Labor, 1977).
Very little research has been done to evaluate the practicality and quality of various job analysis methods for different purposes. But analysts seem to agree that combinations of methods are preferable to single methods (Levine, Ash, Hall, & Sistrunk, 1983; Morgeson & Campion, 1997; Morgeson et al., 2020).
Current approaches to job analysis do not give much attention to analyzing teams. For example, the Dictionary of Occupational Titles (U.S. Department of Labor, 1972) considers “people” requirements of jobs but does not address specific teamwork KSAs. Likewise, recent reviews of the literature mention some components of teamwork, such as communication and coordination (e.g., Harvey, 1991), but give little attention to other teamwork KSAs. Thus, job analysis systems may need to be revised. The Occupational Information Network (O*NET) is a major job analysis system that has replaced the DOT (Peterson et al., 2001). Although it does not explicitly address teamwork KSAs, it does contain a large number of worker attribute domains that may prove useful. Because of their unstructured nature, conventional approaches to job analysis such as interviews are more likely to surface teamwork KSAs, whereas structured approaches (e.g., questionnaires) will have to be modified to ask about teamwork KSAs.
5.5 Other Approaches
Variance analysis is a tool used to identify areas of technological uncertainty in a production process (Davis & Wacker, 1982). It aids the organization in designing jobs that allow job holders to control the variability in their work. Industrial engineers have also created many techniques to help job designers visualize operations in order to improve efficiency, which has led to a considerable literature on time and motion analysis (e.g., Mundel, 1985; Niebel, 1988).
Some of these techniques include process charts (which graphically represent the separate steps or events that occur during performance of a task or series of actions); flow diagrams (which use drawings of the area or building in which an activity takes place, marked with lines, symbols, and notations, to help designers visualize the physical layout of the work); possibility guides (tools for systematically listing all possible changes suggested for a particular activity or output and examining the consequences of each suggestion, to aid in selecting the most feasible changes); and network diagrams (which describe complex relationships, where a circle or square represents a “status,” that is, a partial or complete service or a substantive output, and heavy lines represent “critical paths,” which determine the minimum expected completion time for a project). Linkage analysis is another technique used by human factors specialists to represent relationships (i.e., “links”) between components (i.e., people or things) in a work system (Sanders & McCormick, 1987). Designers of physical work arrangements use tools such as link tables, adjacency layout diagrams, and spatial operational sequences to represent the relationships between components. These tools help them decide how to arrange the components so that the distance between frequent or important links is minimized.
5.6 Example of an Evaluation of a Job Design
Studies conducted by Campion and McClelland (1991, 1993) are described here to illustrate the evaluation of a job redesign project. They also illustrate the value of an interdisciplinary perspective. The setting was a large financial services company. The units under study processed the paperwork in support of other units that sold the company’s products. Jobs had been designed in a mechanistic manner, such that individual employees prepared, sorted, coded, and entered the paper flow into the computer. The organization viewed the jobs as too mechanistically designed. Guided by the motivational approach, the project
JOB AND TEAM DESIGN
intended to enlarge jobs by combining existing jobs in order to attain three objectives: (1) enhance the motivation and satisfaction of employees; (2) increase incumbents’ feelings of ownership of the work, thus improving customer service; and (3) maintain productivity despite the efficiencies potentially lost under the motivational approach. The consequences of all approaches to job design were considered. It was anticipated that the project would increase motivational consequences, decrease mechanistic and perceptual/motor consequences, and have no effect on biological consequences (Table 1). The evaluation consisted of collecting detailed data on job design and on a broad spectrum of potential benefits and costs of enlarged jobs. The research strategy involved comparing several varieties of enlarged jobs with each other and with unenlarged jobs. Questionnaire data were collected, and focused team meetings were conducted with incumbents, managers, and analysts. The study was repeated at five different geographic sites. Results indicated that enlarged jobs had the benefits of greater employee satisfaction, less boredom, better quality, and better customer service. They also had the costs of slightly higher training, skill, and compensation requirements. Another finding was that not all potential costs of enlarging jobs were observed, suggesting that redesign can yield benefits without incurring every cost in a one-to-one fashion. A two-year follow-up evaluation found that the costs and benefits of job enlargement had changed substantially over time, depending on the type of enlargement. Task enlargement, which was the focus of the original study, had mostly long-term costs (e.g., lower satisfaction, efficiency, and customer service, and more mental overload and errors). Conversely, knowledge enlargement, a form of job design that had emerged after the original study, had mostly benefits (e.g., higher satisfaction and customer service, and lower overload and errors).
The follow-up study has several important implications. First, it illustrates that the long-term effects of job design changes can differ from the short-term effects. Second, it demonstrates the classic distinction between enlargement and enrichment (Herzberg, 1966): simply adding more tasks did not improve the job, but adding more knowledge opportunities did. Third, it illustrates how the job design process is iterative. In this setting, the more favorable knowledge enlargement was discovered only after gaining experience with task enlargement. Fourth, as in the previous study, it shows that in some situations it is possible to gain the benefits of job design without incurring all the potential costs, thus minimizing the trade-offs between the motivational and mechanistic approaches to job design.
5.7 Example of an Evaluation of a Team Design
Studies conducted by the authors and their colleagues are described here as an illustration of an evaluation of a team design project (Campion et al., 1993; Campion et al., 1996). They illustrate the use of multiple sources of data and multiple types of team effectiveness outcomes. The setting was the same financial services company as in the job design evaluation described in Section 5.6. To measure the teams’ design characteristics, questionnaires based on Table 4 were administered to 391 clerical employees in 80 teams and 70 team managers in the first study (Campion et al., 1993). In the second study (Campion et al., 1996), they were administered to 357 professional workers (e.g., systems analysts, claims specialists, underwriters) in 60 teams and 93 managers. Thus, two sources of data, team members and team managers, were used to measure the team design characteristics.
In both studies, effectiveness outcomes included the organization’s employee satisfaction survey, which had been administered at a different time from the team design characteristics questionnaire, and managers’ judgments of teams’ effectiveness, measured at the same time as the team design characteristics. In the first study, several months of team productivity records were also used to measure effectiveness. Additional effectiveness measures in the second study were employees’ judgments of their team’s effectiveness, measured at the same time as the team design characteristics; managers’ judgments of teams’ effectiveness, measured a second time three months after the team design characteristics; and the average of team members’ most recent performance ratings. Results indicated that all of the team design characteristics had positive relationships with at least some of the outcomes. Relationships were strongest for process characteristics, followed by job design, context, interdependence, and composition characteristics (see Figure 1). Results also indicated that teams that were well designed according to the team design approach scored higher on both employee satisfaction and team effectiveness ratings than less well-designed teams. Results were stronger when the team design characteristics data came from team members rather than from team managers. This illustrates the importance of collecting data from different sources to gain different perspectives on the results of a team design project. Collecting data from only a single source may lead one to draw different conclusions about a design project than one would draw from the broader picture that multiple sources provide. Results were also stronger when outcome measures came from employees (employee satisfaction, team members’ judgments of their teams), from managers rating their own teams, or from productivity records than when they came from other managers or from performance appraisal ratings. This illustrates the use of different types of outcome measures to avoid drawing conclusions from overly limited data.
This example also illustrates the use of separate data collection methods and times for team design characteristics data versus team outcomes data. A single data collection method and time, in which team design characteristics and outcomes are collected from the same source (e.g., team members only) on the same day, can create an illusion of stronger relationships between design characteristics and outcomes than really exist. Although it is more costly to use multiple sources, methods, and administration times, the conclusions that can be drawn from the results are far stronger when one does so.
REFERENCES
Argyris, C. (1964). Integrating the individual and the organization. New York: Wiley.
Ash, R. A., Levine, E. L., & Sistrunk, F. (1983). The role of jobs and job-based methods in personnel and human resources management. In K. M. Rowland & G. R. Ferris (Eds.), Research in personnel and human resources management (Vol. 1). Greenwich, CT: JAI Press.
Astrand, P. O., & Rodahl, K. (1977). Textbook of work physiology: Physiological bases of exercise (2nd ed.). New York: McGraw-Hill.
Babbage, C. (1835). On the economy of machinery and manufactures. In L. E. Davis & J. C. Taylor (Eds.), Design of jobs (2nd ed.). Santa Monica, CA: Goodyear.
Banker, R. D., Field, J. M., Schroeder, R. G., & Sinha, K. K. (1996). Impact of work teams on manufacturing performance: A longitudinal study. Academy of Management Journal, 39, 867–890.
Bass, B. M., & Klubeck, S. (1952). Effects of seating arrangement on leaderless group discussions. Journal of Abnormal and Social Psychology, 47, 724–727.
Blauner, R. (1964). Alienation and freedom. Chicago: University of Chicago Press.
Brown, B. K., & Campion, M. A. (1994). Biodata phenomenology: Recruiters’ perceptions and use of biographical information in personnel selection. Journal of Applied Psychology, 79, 897–908.
Bruning, P. F., & Campion, M. A. (2018). A role-resource approach-avoidance model of job crafting: A multimethod integration and extension of job crafting theory. Academy of Management Journal, 61, 499–522.
Bruning, P. F., & Campion, M. A. (2019). Diagnosing and responding to the seven ways your employees and coworkers change their jobs. Business Horizons, 62, 625–635.
Buller, P. F., & Bell, C. H. (1986). Effects of team building and goal setting on productivity: A field experiment. Academy of Management Journal, 29, 305–328.
Buys, C. J. (1978). Humans would do better without groups. Personality and Social Psychology Bulletin, 4, 123–125.
Campion, M. A. (1988). Interdisciplinary approaches to job design: A constructive replication with extensions. Journal of Applied Psychology, 73, 467–481.
Campion, M. A. (1989). Ability requirement implications of job design: An interdisciplinary perspective. Personnel Psychology, 42, 1–24.
Campion, M. A., & Berger, C. J. (1990). Conceptual integration and empirical test of job design and compensation relationships. Personnel Psychology, 43, 525–554.
Campion, M. A., Campion, J. E., & Hudson, J. P. (1994a). Structured interviewing: A note on incremental validity and alternative question types. Journal of Applied Psychology, 79, 998–1002.
Campion, M. A., Cheraskin, L., & Stevens, M. J. (1994b). Job rotation and career development: Career-related antecedents and outcomes of job rotation. Academy of Management Journal, 37, 1518–1542.
Campion, M. A., & McClelland, C. L. (1991). Interdisciplinary examination of the costs and benefits of enlarged jobs: A job design quasi-experiment. Journal of Applied Psychology, 76, 186–198.
Campion, M. A., & McClelland, C. L. (1993). Follow-up and extension of the interdisciplinary costs and benefits of enlarged jobs. Journal of Applied Psychology, 78, 339–351.
Campion, M. A., & Medsker, G. J. (1992). Job design. In G. Salvendy (Ed.), Handbook of industrial engineering. New York: Wiley.
Campion, M. A., Medsker, G. J., & Higgs, A. C. (1993). Relations between work group characteristics and effectiveness: Implications for designing effective work groups. Personnel Psychology, 46, 823–850.
Campion, M. A., Mumford, T. V., Morgeson, F. P., & Nahrgang, J. D. (2005). Work redesign: Eight obstacles and opportunities. Human Resource Management, 44, 367–390.
Campion, M. A., Papper, E. M., & Medsker, G. J. (1996). Relations between work team characteristics and effectiveness: A replication and extension. Personnel Psychology, 49, 429–452.
Campion, M. A., & Stevens, M. J. (1991). Neglected questions in job design: How people design jobs, influence of training, and task-job predictability. Journal of Business and Psychology, 6, 169–191.
Campion, M. A., & Thayer, P. W. (1985). Development and field evaluation of an interdisciplinary measure of job design. Journal of Applied Psychology, 70, 29–43.
Campion, M. A., & Thayer, P. W. (1987). Job design: Approaches, outcomes, and trade-offs. Organizational Dynamics, 15(3), 66–79.
Caplan, R. D., Cobb, S., French, J. R. P., Van Harrison, R., & Pinneau, S. R. (1975). Job demands and worker health: Main effects and occupational differences. HEW Publication No. (NIOSH) 75-160. Washington, DC: U.S. Government Printing Office.
Cartwright, D. (1968). The nature of group cohesiveness. In D. Cartwright & A. Zander (Eds.), Group dynamics: Research and theory (3rd ed.). New York: Harper & Row.
Chapanis, A. (1970). Relevance of physiological and psychological criteria to man-machine systems: The present state of the art. Ergonomics, 13, 337–346.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues for field settings. Chicago: Rand-McNally.
DESIGN OF EQUIPMENT, TASKS, JOBS, AND ENVIRONMENTS
Cummings, T. G. (1978). Self-regulating work teams: A sociotechnical synthesis. Academy of Management Review, 3, 625–634.
Cummings, T. G. (1981). Designing effective work groups. In P. C. Nystrom & W. H. Starbuck (Eds.), Handbook of organization design (Vol. 2). New York: Oxford University Press.
Davis, L. E. (1957). Toward a theory of job design. Journal of Industrial Engineering, 8, 305–309.
Davis, L. E. (1971). The coming crisis for production management: Technology and organization. International Journal of Production Research, 9, 65–82.
Davis, L. E. (1982). Organization design. In G. Salvendy (Ed.), Handbook of industrial engineering. New York: Wiley.
Davis, L. E., Canter, R. R., & Hoffman, J. (1955). Current job design criteria. Journal of Industrial Engineering, 6(2), 5–8, 21–23.
Davis, L. E., & Taylor, J. C. (1979). Design of jobs (2nd ed.). Santa Monica, CA: Goodyear.
Davis, L. E., & Valfer, E. S. (1965). Intervening responses to changes in supervisor job design. Occupational Psychology, 39, 171–189.
Davis, L. E., & Wacker, G. L. (1982). Job design. In G. Salvendy (Ed.), Handbook of industrial engineering. New York: Wiley.
Davis, L. E., & Wacker, G. L. (1987). Job design. In G. Salvendy (Ed.), Handbook of human factors. New York: Wiley.
Dierdorff, E. C., & Wilson, M. A. (2003). A meta-analysis of job analysis reliability. Journal of Applied Psychology, 88, 635–646.
Dyer, J. (1984). Team research and team training: A state-of-the-art review. In F. A. Muckler (Ed.), Human factors review. Santa Monica, CA: Human Factors Society.
Edwards, J. R., Scully, J. A., & Brtek, M. D. (1999). The measurement of work: Hierarchical representation of the multimethod job design questionnaire. Personnel Psychology, 52, 305–334.
Edwards, J. R., Scully, J. A., & Brtek, M. D. (2000). The nature and outcomes of work: A replication and extension of interdisciplinary work-design research. Journal of Applied Psychology, 85, 860–868.
Emery, F. E., & Trist, E. L. (1960). Sociotechnical systems. In C. W. Churchman & M. Verhulst (Eds.), Management sciences, models, and techniques (Vol. 2). London: Pergamon Press.
Fine, S. A., & Wiley, W. W. (1971). An introduction to functional job analysis. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Ford, R. N. (1969). Motivation through the work itself. New York: American Management Association.
Fried, Y., & Ferris, G. R. (1987). The validity of the job characteristics model: A review and meta-analysis. Personnel Psychology, 40, 287–322.
Gael, S. (1983). Job analysis: A guide to assessing work activities. San Francisco, CA: Jossey-Bass.
Gallagher, C. C., & Knight, W. A. (1986). Group technology production methods in manufacture. Chichester: Ellis Horwood Limited.
Gaugler, B. B., Rosenthal, D. B., Thornton, G. C., & Bentson, C. (1987). Meta-analysis of assessment center validity (Monograph). Journal of Applied Psychology, 72, 493–511.
Gilbreth, F. B. (1911). Motion study: A method for increasing the efficiency of the workman. New York: Van Nostrand.
Gladstein, D. L. (1984). Groups in context: A model of task group effectiveness. Administrative Science Quarterly, 29, 499–517.
Globerson, S., & Crossman, E. R. (1976). Nonrepetitive time: An objective index of job variety. Organizational Behavior and Human Performance, 17, 231–240.
Goodman, P. S., Ravlin, E. C., & Argote, L. (1986). Current thinking about groups: Setting the stage for new ideas. In P. S. Goodman & Associates (Eds.), Designing effective work groups. San Francisco: Jossey-Bass.
Grandjean, E. (1980). Fitting the task to the man: An ergonomic approach. London: Taylor & Francis.
Grant, A. M. (2007). Relational job design and the motivation to make a prosocial difference. Academy of Management Review, 32, 393–417.
Grant, A. M. (2008). The significance of task significance: Job performance effects, relational mechanisms, and boundary conditions. Journal of Applied Psychology, 93, 108–124.
Grant, A. M., Campbell, E. M., Chen, G., Cottone, K., Lapedis, D., & Lee, K. (2007). Impact and the art of motivation maintenance: The effects of contact with beneficiaries on persistence behavior. Organizational Behavior and Human Decision Processes, 103, 53–67.
Grant, A. M., & Parker, S. K. (2009). Redesigning work design theories: The rise of relational and proactive perspectives. The Academy of Management Annals, 3, 317–375.
Griffin, R. W. (1982). Task design: An integrative approach. Glenview, IL: Scott-Foresman.
Griffin, R. W. (1989). Work redesign effects on employee attitudes and behavior: A long-term experiment. In Academy of Management Best Papers Proceedings (pp. 214–219). Washington, DC.
Guzzo, R. A., & Shea, G. P. (1992). Group performance and intergroup relations in organizations. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (Vol. 3). Palo Alto, CA: Consulting Psychologists Press.
Hackman, J. R. (1987). The design of work teams. In J. Lorsch (Ed.), Handbook of organizational behavior. Englewood Cliffs, NJ: Prentice-Hall.
Hackman, J. R. (2002). Leading teams: Setting the stage for great performances. Boston: Harvard Business School Press.
Hackman, J. R., & Oldham, G. R. (1980). Work redesign. Reading, MA: Addison-Wesley.
Hammond, R. W. (1971). The history and development of industrial engineering. In H. B. Maynard (Ed.), Industrial engineering handbook (3rd ed.). New York: McGraw-Hill.
Harkins, S. G. (1987). Social loafing and social facilitation. Journal of Experimental Social Psychology, 23, 1–18.
Harvey, R. J. (1991). Job analysis. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 2). Palo Alto, CA: Consulting Psychologists Press.
Hay, G. J., Klonek, F. E., & Parker, S. K. (2020). Diagnosing rare diseases: A sociotechnical approach to the design of complex work systems. Applied Ergonomics, 86.
Herzberg, F. (1966). Work and the nature of man. Cleveland, OH: World.
Hoerr, J. (1989). The payoff from teamwork. Business Week, July 10, 56–62.
Hofstede, G. (1980). Culture’s consequences. Beverly Hills, CA: Sage.
Homans, G. C. (1950). The human group. New York: Harcourt, Brace, and World.
Hoppock, R. (1935). Job satisfaction. New York: Harper and Row.
Hornung, S., Rousseau, D. M., & Glaser, J. (2008). Creating flexible work arrangements through idiosyncratic deals. Journal of Applied Psychology, 93, 655–664.
Hornung, S., Rousseau, D. M., Weigl, M., Muller, A., & Glaser, J. (2014). Redesigning work through idiosyncratic deals. European Journal of Work and Organizational Psychology, 23, 608–626.
Humphrey, S. E., Nahrgang, J. D., & Morgeson, F. P. (2007). Integrating motivational, social, and contextual work design features: A meta-analytic summary and theoretical extension of the work design literature. Journal of Applied Psychology, 92, 1332–1356.
Hyer, N. L. (1984). Management’s guide to group technology. In N. L. Hyer (Ed.), Group technology at work. Dearborn, MI: Society of Manufacturing Engineers.
Ilgen, D. R., Hollenbeck, J. R., Johnson, M. D., & Jundt, D. (2005). Teams in organizations: From input-process-output models to IMOI models. Annual Review of Psychology, 56, 517–543.
Janis, I. L. (1972). Victims of groupthink. Boston: Houghton-Mifflin.
Kelly, J. (1982). Scientific management, job redesign, and work performance. London: Academic Press.
Kiggundu, M. N. (1983). Task interdependence and job design: Test of a theory. Organizational Behavior and Human Performance, 31, 145–172.
Konz, S. (1983). Work design: Industrial ergonomics (2nd ed.). Columbus, OH: Grid.
Lawler, E. E. (1986). High-involvement management: Participative strategies for improving organizational performance. San Francisco: Jossey-Bass.
LePine, J. A., Piccolo, R. F., Jackson, C. L., Mathieu, J. E., & Saul, J. R. (2008). A meta-analysis of teamwork processes: Tests of a multidimensional model and relationships with team effectiveness criteria. Personnel Psychology, 61, 273–307.
Leventhal, G. S. (1976). The distribution of rewards and resources in groups and organizations. In L. Berkowitz & E. Walster (Eds.), Advances in experimental social psychology (Vol. 9). New York: Academic Press.
Levine, E. L., Ash, R. A., Hall, H., & Sistrunk, F. (1983). Evaluation of job analysis methods by experienced job analysts. Academy of Management Journal, 26, 339–348.
Locke, E. A., & Latham, G. P. (1990). A theory of goal setting and task performance. Englewood Cliffs, NJ: Prentice Hall.
Majchrzak, A. (1988). The human side of factory automation. San Francisco: Jossey-Bass.
Manz, C. C., & Sims, H. P. (1987). Leading workers to lead themselves: The external leadership of self-managing work teams. Administrative Science Quarterly, 32, 106–129.
Mathieu, J. E., Gallagher, P. T., Domingo, M. A., & Klock, E. A. (2019). Embracing complexity: Reviewing the past decade of team effectiveness research. Annual Review of Organizational Psychology and Organizational Behavior, 6, 17–46.
Mayo, E. (1933). The human problems of an industrial civilization. New York: Macmillan.
McCormick, E. J., Jeanneret, P. R., & Mecham, R. C. (1972). A study of job characteristics and job dimensions as based on the Position Analysis Questionnaire (PAQ). Journal of Applied Psychology, 56, 347–368.
McGrath, J. E. (1984). Groups: Interaction and performance. Englewood Cliffs, NJ: Prentice-Hall.
Meister, D. (1971). Human factors: Theory and practice. New York: Wiley.
Milkovich, G. T., & Newman, J. M. (1993). Compensation (4th ed.). Homewood, IL: Business Publications.
Mintzberg, H. (1979). The structuring of organizations: A synthesis of the research. Englewood Cliffs, NJ: Prentice-Hall.
Morgeson, F. P. (2005). The external leadership of self-managing teams: Intervening in the context of novel and disruptive events. Journal of Applied Psychology, 90, 497–508.
Morgeson, F. P., Aiman-Smith, L. D., & Campion, M. A. (1997). Implementing work teams: Recommendations from organizational behavior and development theories. In M. Beyerlein, D. Johnson, & S. Beyerlein (Eds.), Advances in interdisciplinary studies of work teams: Issues in the implementation of work teams (Vol. 4, pp. 1–44). Greenwich, CT: JAI Press.
Morgeson, F. P., Brannick, M. T., & Levine, E. L. (2020). Job and work analysis: Methods, research, and applications for human resource management (3rd ed.). Thousand Oaks, CA: Sage Publications.
Morgeson, F. P., & Campion, M. A. (1997). Social and cognitive sources of potential inaccuracy in job analysis. Journal of Applied Psychology, 82, 627–655.
Morgeson, F. P., & Campion, M. A. (2000). Accuracy in job analysis: Toward an inference-based model. Journal of Organizational Behavior, 21, 819–827.
Morgeson, F. P., & Campion, M. A. (2002). Minimizing tradeoffs when redesigning work: Evidence from a longitudinal quasi-experiment. Personnel Psychology, 55, 589–612.
Morgeson, F. P., & Campion, M. A. (2003). Work design. In W. C. Borman, D. R. Ilgen, & R. J. Klimoski (Eds.), Handbook of psychology: Industrial and organizational psychology (Vol. 12, pp. 423–452). Hoboken, NJ: Wiley.
Morgeson, F. P., Delaney-Klinger, K., Mayfield, M. S., Ferrara, P., & Campion, M. A. (2004). Self-presentation processes in job analysis: A field experiment investigating inflation in abilities, tasks, and competencies. Journal of Applied Psychology, 89, 674–686.
Morgeson, F. P., DeRue, D. S., & Karam, E. P. (2010). Leadership in teams: A functional approach to understanding leadership structures and processes. Journal of Management, 36, 5–39.
Morgeson, F. P., & Dierdorff, E. C. (2011). Work analysis: From technique to theory. In S. Zedeck (Ed.), APA handbook of industrial and organizational psychology (Vol. 2, pp. 3–41). Washington, DC: APA.
Morgeson, F. P., Dierdorff, E. C., & Hmurovic, J. L. (2010). Work design in situ: Understanding the role of occupational and organizational context. Journal of Organizational Behavior, 31, 351–360.
Morgeson, F. P., Garza, A. S., & Campion, M. A. (2012). Work design. In N. W. Schmitt & S. Highhouse (Eds.), Handbook of psychology: Industrial and organizational psychology (2nd ed., Vol. 12, pp. 525–559). Hoboken, NJ: John Wiley & Sons.
Morgeson, F. P., & Humphrey, S. E. (2006). The work design questionnaire (WDQ): Developing and validating a comprehensive measure for assessing job design and the nature of work. Journal of Applied Psychology, 91, 1321–1339.
Morgeson, F. P., & Humphrey, S. E. (2008). Job and team design: Toward a more integrative conceptualization of work design. In J. Martocchio (Ed.), Research in personnel and human resources management (Vol. 27, pp. 39–91). Bingley: Emerald Group Publishing Limited.
Morgeson, F. P., Reider, M. H., & Campion, M. A. (2005). Selecting individuals in team settings: The importance of social skills, personality characteristics, and teamwork knowledge. Personnel Psychology, 58, 583–611.
Mundel, M. E. (1985). Motion and time study: Improving productivity (6th ed.). Englewood Cliffs, NJ: Prentice-Hall.
Niebel, B. W. (1988). Motion and time study (8th ed.). Homewood, IL: Irwin.
O’Reilly, C., Parlette, G., & Bloom, J. (1980). Perceptual measures of task characteristics: The biasing effects of differing frames of reference and job attitudes. Academy of Management Journal, 23, 118–131.
O’Reilly, C., & Roberts, K. H. (1977). Task group structure, communication, and effectiveness. Journal of Applied Psychology, 62, 674–681.
Parker, S. K. (2003). Longitudinal effects of lean production on employee outcomes and the mediating role of work characteristics. Journal of Applied Psychology, 88, 620–634.
Parker, S. K. (2014). Beyond motivation: Job and work design for development, health, ambidexterity, and more. Annual Review of Psychology, 65, 661–691.
Parker, S. K., Morgeson, F. P., & Johns, G. (2017). One hundred years of work design research: Looking back and looking forward. Journal of Applied Psychology, 102, 403–420.
Parker, S. K., Van den Broeck, A., & Holman, D. (2017). Work design influences: A synthesis of multilevel factors that affect the design of jobs. Academy of Management Annals, 11, 267–308.
Pasmore, W. A. (1988). Designing effective organizations: The sociotechnical systems perspective. New York: Wiley.
Pasmore, W., Francis, C., & Haldeman, J. (1982). Sociotechnical systems: A North American reflection on empirical studies of the seventies. Human Relations, 35, 1179–1204.
Pearson, R. G. (1971). Human factors engineering. In H. B. Maynard (Ed.), Industrial engineering handbook (3rd ed.). New York: McGraw-Hill.
Perkins, A. L., & Abramis, D. J. (1990). Midwest federal correctional institution. In J. R. Hackman (Ed.), Groups that work (and those that don’t). San Francisco: Jossey-Bass.
Peterson, N. G., Mumford, M. D., Borman, W. C., Jeanneret, P. R., Fleishman, E. A., Campion, M. A., Levin, K. Y., Mayfield, M. S., Morgeson, F. P., Pearlman, K., Gowing, M. K., Lancaster, A., & Dye, D. (2001). Understanding work using the occupational information network (O*NET): Implications for practice and research. Personnel Psychology, 54, 451–492.
Porter, L. W., Lawler, E. E., & Hackman, J. R. (1987). Ways groups influence individual work effectiveness. In R. M. Steers & L. W. Porter (Eds.), Motivation and work behavior (4th ed.). New York: McGraw-Hill.
Posthuma, R. A., Morgeson, F. P., & Campion, M. A. (2002). Beyond employment interview validity: A comprehensive narrative review of recent research and trends over time. Personnel Psychology, 55, 1–81.
Rousseau, D. M. (1977). Technological differences in job characteristics, employee satisfaction, and motivation: A synthesis of job design research and sociotechnical systems theory. Organizational Behavior and Human Performance, 19, 18–42.
Rousseau, D. M., Ho, V. T., & Greenberg, J. (2006). I-deals: Idiosyncratic terms in employment relationships. Academy of Management Review, 31, 977–994.
Salas, E., Dickinson, T. L., Converse, S. A., & Tannenbaum, S. I. (1992). Toward an understanding of team performance and training. In R. W. Swezey & E. Salas (Eds.), Teams: Their training and performance. Norwood, NJ: Ablex.
Salvendy, G., & Smith, M. J. (Eds.) (1981). Machine pacing and occupational stress. London: Taylor & Francis.
Sanders, M. S., & McCormick, E. J. (1987). Human factors in engineering and design (6th ed.). New York: McGraw-Hill.
Shaw, M. E. (1983). Group composition. In H. H. Blumberg, A. P. Hare, V. Kent, & M. Davies (Eds.), Small groups and social interaction. New York: Wiley.
Shea, G. P., & Guzzo, R. A. (1987). Groups as human resources. In K. M. Rowland & G. R. Ferris (Eds.), Research in personnel and human resources management. Greenwich, CT: JAI Press.
Sims, H. P., Szilagyi, A. D., & Keller, R. T. (1976). The measurement of job characteristics. Academy of Management Journal, 19, 195–212.
Smith, A. (1776). An inquiry into the nature and causes of the wealth of nations. R. H. Campbell & A. S. Skinner (Eds.). Indianapolis: Liberty Classics.
Staw, B. M., & Boettger, R. D. (1990). Task revision: A neglected form of work performance. Academy of Management Journal, 33, 534–559.
Steers, R. M., & Mowday, R. T. (1977). The motivational properties of tasks. Academy of Management Review, 2, 645–658.
Steiner, I. D. (1972). Group process and productivity. New York: Academic Press.
Stevens, M. J., & Campion, M. A. (1994a). Staffing teams: Development and validation of the teamwork-KSA test. Paper presented at the annual meeting of the Society for Industrial and Organizational Psychology, Nashville, TN.
Stevens, M. J., & Campion, M. A. (1994b). The knowledge, skill, and ability requirements for teamwork: Implications for human resource management. Journal of Management, 20, 503–530.
Stevens, M. J., & Campion, M. A. (1999). Staffing work teams: Development and validation of a selection test for teamwork settings. Journal of Management, 25, 207–228.
Sundstrom, E., DeMeuse, K. P., & Futrell, D. (1990). Work teams: Applications and effectiveness. American Psychologist, 45, 120–133.
Swezey, R. W., & Salas, E. (1992). Teams: Their training and performance. Norwood, NJ: Ablex.
JOB AND TEAM DESIGN Tannenbaum, S. I., Beard, R. L., & Salas, E. (1992). Team building and its influence on team effectiveness: An examination of conceptual and empirical developments. In K. Kelley, (Ed.), Issues, theory, and research in industrial and organizational psychology. Amsterdam: Elsevier. Taylor, F. W. (1911). The principles of scientific management. New York: Norton. Taylor, J. C. (1979). Job design criteria twenty years later. In L. E. Davis & J. C. Taylor (Eds.), Design of jobs (2nd ed.). New York: Wiley. Thompson, J. D. (1967). Organizations in action. New York: McGraw-Hill. Tichauer, E. R. (1978). The biomechanical basis of ergonomics: Anatomy applied to the design of work situations. New York: Wiley. Turnage, J. J. (1990). The challenge of new workplace technology for psychology. American Psychologist, 45, 171–178. Turner, A. N., & Lawrence, P. R. (1965). Industrial jobs and the worker: An investigation of response to task attributes. Boston, MA: Harvard Graduate School of Business Administration. U.S. Department of Labor (1972). Handbook for analyzing jobs. Washington, DC: U.S. Government Printing Office. U.S. Department of Labor (1977). Dictionary of occupational titles (4rth Ed.). Washington, DC: U.S. Government Printing Office. U.S. Nuclear Regulatory Commission (1981). Guidelines for control room design reviews (NUREG 0700). Washington, DC: Nuclear Regulatory Commission. Vroom, V. H. (1964). Work and motivation. New York: Wiley. Walker, C. R., & Guest, R. H. (1952). The man on the assembly line. Cambridge, MA: Harvard University Press. Wall, T. B., Kemp, N. J., Jackson, P. R., & Clegg, C. W. (1986). Outcomes of autonomous workgroups: A long-term field experiment. Academy of Management Journal, 29, 280–304.
413 Warr, P., & Wall, T. (1975). Work and well-being. Harmondsworth: Penguin. Weber, A., Fussler, C., O’Hanlon, J. F., Gierer, R., & Grandjean, E. (1980). Psychophysiological effects of repetitive tasks. Ergonomics, 23, 1033–1046. Welford, A. T. (1976). Skilled performance: Perceptual and motor skills. Glenview, IL: Scott-Foresman. Wicker, A., Kirmeyer, S. L., Hanson, L., & Alexander, D. (1976). Effects of manning levels on subjective experiences, performance, and verbal interaction in groups. Organizational Behavior and Human Performance, 17, 251–274. Wilson, J. R., & Haines, H. M. (1997). Participatory ergonomics. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (2nd ed.). New York: Wiley. Wong, C. S. (1989). Task interdependence: The link between task design and job design. doctoral dissertation. West Lafayette, IN: Purdue University. Wong, C. S., & Campion, M. A. (1991). Development and test of a task level model of job design. Journal of Applied Psychology, 76, 825-837. Woodman, R. W., & Sherwood, J. J. (1980). The role of team development in organizational effectiveness. A critical review. Psychological Bulletin, 88, 166–186. Wrzesniewski, A., & Dutton, J. E. (2001). Crafting a job: Revisioning employees as active crafters of their work. Academy of Management Review, 26, 179–201. Zajonc, R. B. (1965). Social facilitation. Science, 149, 269–274. Zander, A. (1979). The study of group behavior over four decades. Journal of Applied Behavioral Science, 15, 272–282. Zhang, F., & Parker, S. K. (2016). Reorienting job crafting research: A hierarchical structure of job crafting concepts and integrative review. Journal of Organizational Behavior, 40, 126–146.
CHAPTER 16
DESIGN, DELIVERY, EVALUATION, AND TRANSFER OF EFFECTIVE TRAINING SYSTEMS
Tiffany M. Bisbey, Rice University, Houston, Texas
Rebecca Grossman and Kareem Panton, Hofstra University, Hempstead, New York
Chris W. Coultas, Leadership Worth Following, LLC, Irving, Texas
Eduardo Salas, Rice University, Houston, Texas
1 INTRODUCTION
2 WHAT IS TRAINING?
2.1 The Science of Training
2.2 Before Training
2.3 Training Summary
3 DURING TRAINING
3.1 Training Delivery
3.2 Facilitator
3.3 Practice Scenarios
3.4 Feedback
3.5 Training Implementation
4 AFTER TRAINING
4.1 Evaluation
4.2 Kirkpatrick’s Framework
4.3 Tools
4.4 Costs
4.5 Facilitating Transfer
4.6 Post-Training Climate
4.7 Job Aids
5 CONCLUSION
5.1 Future Directions
ACKNOWLEDGMENTS
REFERENCES

1 INTRODUCTION
Training is a key component in the life of the modern organization and in sustaining a competitive advantage. New and improved instructional strategies, learning options, and technologies have made the training industry larger and more diverse than ever. The Association for Talent Development, a group responsible for reporting industry trends in training and development, notes that organizations spent an average of $1,299 and 34 hours per employee on formal learning initiatives in 2018. Given how much organizations invest in training, it follows that training should be designed to maximize its effectiveness. Not only can proper training be a boon to organizational health, but improper training can have severely negative consequences that effective training would otherwise mitigate. At the extreme, improper (or absent) training can lead to injury or death in critical, high-risk industries. As early as 2001, reports indicated that $131.7 billion was spent annually as a result of employee injuries and deaths (National Safety Council, 2010); this figure increased to $170.8 billion in 2018 (National Safety Council, 2020). Training can, and does, differ greatly across contexts in its specific emphases (e.g., preventing accidents, improving service, creating better products), which generally vary with the organization and its mission. However, the universal goal is to improve the quality of the workforce in order to strengthen the organization’s bottom line. If this is to happen, those who design and deliver training must take advantage of the ever-growing body of research and evidence-based principles (Aguinis & Kraiger, 2009; Noe, Clarke, & Klein, 2014) and apply them to the analysis, design, delivery, evaluation, and transfer of training systems. This chapter outlines the relevant research on what constitutes an effective training system. To this end, we have updated our previous review (see Coultas, Grossman, & Salas, 2012) to incorporate recent work on workplace training systems, supplementing the foundational research that remains relevant in our science (e.g., Aguinis & Kraiger, 2009; Burke & Hutchins, 2007; Kozlowski & Salas, 2010; Sitzmann et al., 2008).

2 WHAT IS TRAINING?
At its most basic level, training can be defined as any systematic effort to impart knowledge, skills, and attitudes (KSAs) to employees with the end goal of improving work performance. To achieve this broad goal, training must change some (or all) of the following characteristics of the trainee: knowledge, patterns of cognition, attitudes, and motivation. However, this cannot occur if training is designed and delivered haphazardly, with disregard for the scientific principles developed over decades of research on workplace training. Training efforts must be grounded in the science of training and learning: they must provide opportunities not only to learn the necessary KSAs, but also to practice and apply this learning and to receive feedback on these attempts within the training (Aguinis & Kraiger, 2009). In today’s rapidly expanding technological landscape, there may be a tendency to immediately accept the newest training technology as the best or most appropriate. However, technological advances do not necessarily equate to psychological advances, and training strategies that do not depend on advanced technology have often proven to be as effective as (if not more effective than) their more advanced counterparts. We discuss this in greater detail later in the chapter; the point here is that the elements of training have a very real scientific basis. Technology should work with users to better serve their needs, rather than users needing to adapt to the needs of technology. Chen and Klimoski (2007) reviewed the literature on training and development and found that the training literature is based on work that is scientifically rigorous and has “generated a large body of knowledge pertaining to learning antecedents, processes, and outcomes” (p. 188).
This is a major improvement over earlier literature reviews that famously described then-current training research as “nonempirical, nontheoretical, poorly written, and dull” (Campbell, 1971, p. 565). While research avenues in the field of training are by no means exhausted, theories, models, and frameworks provide effective guidelines for developing and implementing training programs (Salas & Cannon-Bowers, 2001). These theories and frameworks are described and referenced throughout this chapter.

2.1 The Science of Training
Lewin (1951) stated that there is “nothing more practical than a good theory.” This timeless quote illustrates the fact that, while some emphasize a divide between basic and applied research or between academics and practice, it is empirically grounded theory that enables practitioners to design, develop, and deliver training programs with confidence. Accordingly, we do not overlook the theoretical underpinnings that drive the science and practice of training. In the previous review by Coultas and colleagues (2012), we identified the important role of transfer in theoretical models and empirical studies. Updated models of training effectiveness have also been proposed. In the following paragraphs, we briefly review the training theories described in our previous edition and discuss theoretical developments that have since occurred. If the ultimate goal of training is to see positive organizational change, trainees must be able to transfer what they have learned in the training environment and apply it to work within the organizational setting. Accordingly, it is vital for researchers and practitioners alike to understand under what contexts this transfer of training is likely to occur. Research continues to support Thayer and Teachout’s (1995) widely accepted model of training transfer outlined in previous reviews (Coultas et al., 2012; Salas, Wilson, et al., 2006). The factors they identified as maximizing transfer have been separately supported within the literature and include trainee: (1) reactions to previous training (Baldwin & Ford, 1988), (2) education (Mathieu, Tannenbaum, & Salas, 1992), (3) pretraining self-efficacy (Ford et al., 1992), (4) ability (Ghiselli, 1966), (5) locus of control (Williams et al., 1991), (6) job involvement (Noe & Schmitt, 1986), and (7) career/job attitudes (Williams, Thayer, & Pond, 1991). Additionally, trainee reactions to the training/task at hand regarding overall likability (Kirkpatrick, 1976), perceived instrumentality of training (Velada et al., 2007), and expectancy that transfer can/will occur (i.e., self-efficacy, Latham, 1989; Tannenbaum et al., 1991) have all been shown to lead to greater training transfer. Furthermore, organizational climate and transfer-enhancing activities can facilitate the transfer of training. Blume et al. (2010) conducted a meta-analysis and found that an organizational climate that supports training efforts is the most important aspect of the work environment in predicting training transfer. Organizational cues and consequences can encourage transfer. Transfer-enhancing activities such as goal setting, relapse prevention, self-management (Baldwin, Ford, & Blume, 2009), and job aids can assist the trainee in applying learning in the long term. The final factor in the Thayer and Teachout (1995) model is results; transfer of training should be evaluated in the context of whether learned knowledge translates into workplace behavior and whether behavioral change leads to organizationally desirable results. Burke and Hutchins (2008) proposed a model of transfer based on a major review of the training literature.
The model, though simple, effectively summarizes much of the research on training, as it identifies personal (i.e., trainer/trainee), training (e.g., design, content), and environmental (i.e., organizational) characteristics as all having significant effects on training transfer (see Figure 1). Furthermore, various aspects are seen as having greater impact at various times in the life cycle of training (e.g., trainer characteristics are more salient pre-training, organizational characteristics are more salient post-training). Research has also emphasized the multifaceted nature of training, which involves complex interactions among all of these characteristics. Bell and Kozlowski (2008) developed a model that reflects many of these interactions (see Figure 2). This model outlines the various interactions necessary for learning to occur and for new knowledge to transfer to on-the-job behavior. Although Bell and Kozlowski’s model admittedly excludes some important aspects of the training experience (most notably, organizational characteristics), it shows the impact that individual characteristics (e.g., cognitive ability, motivation, personality) can have on the effect of training design choices (e.g., error framing, exploratory learning) and how they interact to affect learning and training outcomes. A major advantage of this model is that it is backed by empirical validation conducted using structural equation modeling. Their theoretical model is replicated in Figure 2, but we refer the reader to their original publication for a more complex version, complete with correlation indices. A highly detailed examination of the factors and processes involved in training provided by Tannenbaum and colleagues (1993) remains relevant in current discussions of the training process (Figure 3). The framework considers training from a longitudinal, process-oriented perspective and details how complex the training process and underlying phenomena are.
Similarities between this model and the Burke and Hutchins (2008) model are apparent; however, this model includes additional details and identifies vital actions to ensure training success (e.g., training needs analysis). Although it is quite comprehensive, it is not very parsimonious. Burke and Hutchins’ (2008) model is an appropriate complement, as it presents fewer factors and relationships in a manner that is easier to understand.

Figure 1 Burke and Hutchins’ (2008) training transfer model. (Source: Burke & Hutchins, 2008. © 2008 John Wiley & Sons.)

Figure 2 Bell and Kozlowski’s (2008) model of core training design elements, self-regulatory processes, and learning outcomes. (Source: Bell & Kozlowski 2008. © 2008 American Psychological Association.)

Figure 3 Tannenbaum et al.’s (1993) training effectiveness model. (Source: Tannenbaum et al. 1993.)

Mathieu and Tesluk (2010) described a multilevel perspective of training effectiveness in which several levels of analysis are considered. In this view, training outcomes result from the combined influences of various organizational levels (e.g., individuals, teams, departments). Unlike other multilevel approaches, this framework incorporates both bottom-up (e.g., factors at the training level) and top-down (e.g., factors at the organizational level) influences on training effectiveness, while also considering both microlevel outcomes (e.g., specific KSAOs) and macrolevel outcomes (e.g., broader organizational outcomes). Finally, the multilevel perspective put forth by Mathieu and Tesluk (2010) considers the combined effects of training and other human resource interventions (i.e., a compilation approach) as well as the impact of organizational factors on individual-level training outcomes (i.e., a cross-level approach). Figure 4 depicts this framework. Specific issues relevant to training, such as motivation (Chiaburu & Lindsay, 2008; Colquitt, LePine, & Noe, 2000; Stanhope, Pond, & Surface, 2013), individual characteristics and work environment (Bell & Kozlowski, 2008; Tracey, Hinkin, Tannenbaum, & Mathieu, 2001), learner control (Sitzmann et al., 2008), training evaluation (Kraiger et al., 1993), and transfer of training (Burke & Hutchins, 2007; Quinones, 1997), have
been studied extensively. Recently, Blume et al. (2019) advanced a dynamic model of training transfer that details the complex interactions between the learner and the environment that affect transfer (see Figure 5). In a meta-analysis examining the role of the work environment in sustaining training transfer, Hughes et al. (2020) found evidence that organizational support, supervisor support, and peer support each positively influence training transfer through their effects on motivation to transfer. We anticipate that a major focus of future research will build on this work by addressing what makes the benefits of training last and how the organization can support this. The increasing reliance on teams in organizations has also led to a focus on team training. The science of team training, or training for teamwork competencies, has a long history of impact, backed by a strong theoretical foundation and by research conducted “in the wild” that makes it practically relevant (Bisbey, Reyes, Traylor, & Salas, 2019). In fact, a meta-analysis found that team training can improve patient outcomes such as mortality (Hughes et al., 2016). Moreover, teamwork is recognized as a key driver of safety in the workplace (Salas et al., 2020). Kozlowski and colleagues (2000) explored how training and individual processes lead to team and organizational outcomes. Other models developed over the past decade or more have examined the organizational, individual, and training factors that can affect team motivation and performance (Tannenbaum et al., 1993). Salas and colleagues (2007) reviewed and integrated over 50 models of team effectiveness; the result was a comprehensive yet parsimonious framework of team effectiveness that can be used to tailor training.

Figure 4 Mathieu and Tesluk’s (2010) multilevel view of training effectiveness. (Source: Mathieu & Tesluk 2010. © 2010 Taylor & Francis.)

Figure 5 Blume et al.’s (2019) dynamic model of training transfer. Note: KSA refers to knowledge, skills, and attitudes. (Source: Blume et al. 2019. © 2019 Elsevier.)

Recent years have brought a slew of new technological capabilities to the world of training. Learning opportunities are being offered on virtual and online platforms now more than ever, and researchers are working toward understanding the conditions under which e-learning formats might be most effective. Brown and colleagues (2016) focused on the impact of learner attitudes about the transfer environment on training effectiveness, positing that user control of the learning environment and situational strength shape transfer attitudes. This model suggests that learners generally prefer more discretion and more open learning environments, although these formats may result in fewer appropriate transfer behaviors. There has also been a shift from studying formal training events to understanding work experiences as learning opportunities, along with other informal ways of developing new KSAs (Tannenbaum et al., 2010). Researchers are beginning to investigate informal field-based learning (IFBL) behaviors and their impact on job performance (e.g., Wolfson, Tannenbaum, Mathieu, & Maynard, 2018; Wolfson, Mathieu, Tannenbaum, & Maynard, 2019). Meta-analytic evidence indicates that IFBL does, in fact, affect performance (Cerasoli et al., 2018). Researchers estimate that informal learning makes up anywhere from 70% to 90% of all learning activities in organizations, yet it is only beginning to make its way into the organizational sciences (Cerasoli et al., 2018); thus, we expect this area to grow substantially in the coming years.

2.1.1 Summary
Researchers and organizations have continued to benefit from developments in our knowledge of the training process and the factors that influence it. However, our knowledge is far from comprehensive. The literature has a heavy theoretical emphasis on training transfer.
This is not a misplaced emphasis, as learning offers little or no benefit if it does not transfer to on-the-job behavior. Even so, it would be useful to have theoretical (and empirically grounded) universal models of the individual elements of training that are accessible to a practitioner audience. We discuss these elements as guidelines for before, during, and after training.

2.2 Before Training

2.2.1 Needs Analysis
Before designing or implementing a training program, it is important to understand the needs of the organization and how training might address those needs. This process is referred to as conducting a training needs analysis or needs assessment (Goldstein & Ford, 2002). Specifically, this process involves considering where an organization needs training (i.e., an organizational analysis), what KSAs need to be trained (i.e., a job or task analysis), and who needs to learn them (i.e., a person analysis; Goldstein, 1993). The needs analysis process will lead to (1) specifying learning objectives, (2) guiding training design and delivery, and (3) developing criteria for training effectiveness. Needs assessment is vital in setting these training goals and determining the readiness of potential participants (Aguinis & Kraiger, 2009). A recent meta-analysis found that leadership training programs based on a needs analysis are associated with greater learning and greater transfer to on-the-job behavior (Lacerenza et al., 2017). The needs analysis is a key first step in ensuring that any investment in training will ultimately pay off. In this section, we discuss the three types of training needs analyses: organizational, job/task, and person analyses.
2.2.2 Organizational Analysis
An organizational analysis should be the first step in conducting a training needs analysis. In this process, various organizational aspects that have the potential to affect training delivery and/or transfer are considered, such as organizational climate, culture, norms, resources, constraints, and support (Festner & Gruber, 2008; Goldstein, 1993). Another key element of organizational analysis is determining which objectives training is meant to fulfill, or which deficiencies it is meant to remediate. For example, if organizational leaders feel there is a need to encourage an appreciation for diversity in their workforce, they must reflect this goal in day-to-day practices in order to foster a climate that supports the new KSAs learned in training. An organizational climate that supports transfer and application of trained KSAs is vital for effective training. Considering that organizational factors can have a great impact on the continuity and effectiveness of a training program, organizational analysis is a vital step in training needs analysis. It has been nearly three decades since researchers began recognizing the importance of organizational analysis; this recognition has led to several well-cited studies of the impact of the organization on training. Essentially, these studies have found that organizational climate (e.g., situational cues, consequences, support) is one of the strongest predictors of training transfer (Lim & Morris, 2006; Rouiller & Goldstein, 1993; Tracey et al., 1995). For example, an organization implementing a sexual harassment training program that does not foster a climate of safety and openness may see little change stemming from its training (Knapp & Kustis, 1996). Meta-analytic evidence shows that a supportive training environment is key for training transfer (Blume, Ford, Baldwin, & Huang, 2010).
Consider attending a training program and learning a brand-new set of KSAs, only to return to the same workplace: it would be very easy to fall back into old behavioral patterns that had previously worked well. The environment must support transfer, and the organization must be ready before training is ever deployed.

2.2.3 Job/Task Analysis
After identifying the organizational characteristics relevant to deploying a potential training program, a task (or job) analysis should be completed to identify characteristics of the actual task being trained. The task analysis reveals the exact KSAs that should be targeted in the training program so that it has specific, focused learning objectives (Goldstein, 1993). The first step in this process is specifying the essential work functions of the job, as well as the resources required for success. After these generalities are outlined, gaps are identified between the specific task behaviors that employees should be engaging in for effective job performance and how they are currently performing. Furthermore, the contexts under which these tasks will be performed must be specified. Task analysis can be difficult because less tangible requirements, such as knowledge and attitudes, are often hard to observe in practice and are therefore likely to be overlooked when designing training. More tacit or complex tasks may require the additional step of analyzing cognitive demands (e.g., decision making, problem solving). Some competencies are less observable and therefore require a specific cognitive task analysis to uncover them. A more targeted approach for identifying cognitive processes may be necessary, as awareness of the skills and steps in a process tends to fade as expertise advances and behavior relies more on automaticity (Villachica & Stone, 2010). Cognitive task analysis (CTA) has emerged as the primary tool for understanding these processes (Salas & Klein, 2000).
In CTA, various elicitation techniques can be used to specify the cognitive processes involved in learning (Cooke, 1999; Klein & Militello, 2001). Three criteria are essential to successful CTA (Klein & Militello, 2001). First, CTA must identify new information regarding trainees’ thought patterns and strategies for learning and task success, as well as other cognitive demands (e.g., cue recognition). This may be done through a variety of elicitation methods, such as structured and unstructured interviews, observation and coding of actual behaviors, probing users during task performance, or elicitation by critiquing. Elicitation by critiquing refers to having experts observe prerecorded scenarios of performance and comment on how the task should have been performed or on the thought processes behind it (Miller et al., 2006). Second, these findings must be conveyed to training designers, who can incorporate them into the training curriculum. Third, and finally, the CTA findings must be successfully incorporated into the training design; only then does CTA yield impactful training.

2.2.4 Person Analysis
The final stage of training needs analysis is the person analysis. Conducting a person analysis involves identifying who needs the training, what they are deficient in, and how effective training will be for individuals (Goldstein, 1993; Tannenbaum & Yukl, 1992). The premise of person analysis is that not everyone needs the same training, and not everyone responds similarly to all training. Different job domains within organizations require different training, as do individuals with differing levels of expertise (Feldman, 1988). Even employees within the same job domain who operate at different levels (e.g., management) tend to need different training and view training differently due to varying job responsibilities (Ford & Noe, 1987; Lim & Morris, 2006). Person analysis identifies whether individuals possess the requisite competencies outlined in the task analysis and considers the underlying knowledge and attitudes required to perform effectively.
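The gap-identification logic of person analysis can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method from the chapter: it assumes, hypothetically, that the task analysis yields a required proficiency level for each competency on a 1-5 scale and that each employee has been assessed on the same scale. The `Employee` class, the `person_analysis` function, the scale, and the competency names are all invented for illustration, and individual characteristics such as self-efficacy or goal orientation are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Employee:
    """Hypothetical record of one employee's assessed competencies."""
    name: str
    # competency name -> assessed level on an assumed 1-5 scale
    proficiency: dict[str, int] = field(default_factory=dict)


def person_analysis(required: dict[str, int],
                    staff: list[Employee]) -> dict[str, dict[str, int]]:
    """Return, for each employee who needs training, the competencies whose
    assessed level falls short of the level required by the task analysis,
    with the size of each shortfall. Competencies an employee was never
    assessed on are treated as level 0 (an assumption of this sketch)."""
    needs: dict[str, dict[str, int]] = {}
    for emp in staff:
        gaps = {
            comp: level - emp.proficiency.get(comp, 0)
            for comp, level in required.items()
            if emp.proficiency.get(comp, 0) < level
        }
        if gaps:  # employees with no gaps do not need this training
            needs[emp.name] = gaps
    return needs


# Illustrative usage with made-up requirements and assessments.
required = {"hazard recognition": 4, "lockout procedure": 5, "incident reporting": 3}
staff = [
    Employee("Avery", {"hazard recognition": 4, "lockout procedure": 5,
                       "incident reporting": 3}),
    Employee("Blake", {"hazard recognition": 2, "lockout procedure": 5}),
]
print(person_analysis(required, staff))
# {'Blake': {'hazard recognition': 2, 'incident reporting': 3}}
```

Avery already meets every requirement and is left out of the result, mirroring the premise that not everyone needs the same training; Blake is flagged only for the specific competencies in which he falls short.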
Individual characteristics such as personality, self-efficacy, goal orientation, and prior training experience have all been shown to impact training effectiveness and should therefore be considered in training needs analysis (Cheramie & Simmering, 2010; Chiaburu & Lindsay, 2008; Roberson et al., 2009; Rowold, 2007; Velada et al., 2007). Specifically, meta-analytic evidence points to positive relationships between training transfer and cognitive ability, conscientiousness, and motivation (Blume et al., 2010). One aspect of the person that often has bearing on what kind of training is used is individual learning styles. The concept of learning styles refers to the notion that individuals prefer specific instructional methods (e.g., visual, auditory, experiential) and that, when instructional strategies match learning styles, individuals learn significantly better. This has long been a conventional idea explaining differential learning outcomes across individuals, but research has not supported the learning styles hypothesis (Pashler, McDaniel, Rohrer, & Bjork, 2008). The overall suggestion stemming from Pashler and colleagues’ (2008) review is that training designers need not be overly concerned with tailoring instructional strategies to learning styles, but simply need to use the instructional strategy that best fits the KSAs being trained. Recent meta-analyses suggest that having a variety of instructional methods might actually lead to the most effective training (Lacerenza et al., 2017). 2.3 Training Summary In order for training to yield maximum effectiveness, a thorough training needs analysis is necessary. If trainees do not meet the requirements for training (i.e., if they already possess the required competencies and attitudes), if training is essentially irrelevant to the actual job, or if aspects of the organization (e.g., culture, support) are not conducive to learning and transfer, there
DESIGN OF EQUIPMENT, TASKS, JOBS, AND ENVIRONMENTS
is a high likelihood that the training will fail. Our suggestion: thoroughly analyze training needs before designing any training system.
2.3.1 Evidence-Based Design
Training design driven by needs analysis ensures that training is systematic and provides structure to the training program. Several things must be taken into account when designing training: developing training objectives, accounting for external factors relevant to the training program (i.e., individual/organizational characteristics and resources), selecting training strategies and methods, and developing specific program content.
2.3.2 Content/Training Objectives
Training and learning objectives are developed in accordance with the training needs analysis. Objectives should be specific, measurable, and task relevant so that learning and performance can be evaluated post-training. Training objectives have three components: performance, conditions, and standards (Buckley & Caple, 2008; Goldstein, 1993). Performance refers to both the end goal of training (i.e., the terminal objective) and the requisite process behaviors (i.e., enabling and session objectives) needed to reach that goal. Performance objectives can be laid out in a hierarchy such that session objectives (mundane behaviors) aggregate to enabling objectives, which are the major components of the terminal objective, or end goal of training. Conditions specify when and where these behaviors must be exhibited. Standards, the final component of training objectives, specify what will be considered acceptable performance for each objective. Clearly defined objectives can effectively guide the training designers’ choice of instructional strategy (or strategies), which should be selected according to their ability to promote objective-based behaviors and competencies. See Table 1 for guidance on the steps involved in developing training objectives.
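The objective hierarchy described above (session objectives aggregating to enabling objectives, which in turn make up the terminal objective, each carrying a performance, conditions, and standard) can be represented as a simple tree. The sketch below is illustrative only: the `Objective` class, its field names, and the customer-service example are hypothetical and are not drawn from the cited sources. It shows one way a designer might check that every objective states all three components before training is built.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Objective:
    """A training objective with the three components described in the text."""
    performance: str  # observable behavior, ideally stated with an action verb
    conditions: str   # when and where the behavior must be exhibited
    standard: str     # what counts as acceptable performance
    level: str        # "terminal", "enabling", or "session"
    children: List["Objective"] = field(default_factory=list)

    def add(self, child: "Objective") -> "Objective":
        """Attach a lower-level objective and return it for further nesting."""
        self.children.append(child)
        return child


def session_objectives(obj: Objective) -> List[Objective]:
    """Collect the session-level objectives that aggregate up to `obj`."""
    if obj.level == "session":
        return [obj]
    found: List[Objective] = []
    for child in obj.children:
        found.extend(session_objectives(child))
    return found


def missing_components(obj: Objective) -> List[Objective]:
    """List every objective in the tree lacking a performance, condition, or standard."""
    gaps: List[Objective] = []
    if not (obj.performance and obj.conditions and obj.standard):
        gaps.append(obj)
    for child in obj.children:
        gaps.extend(missing_components(child))
    return gaps


# Hypothetical example: a terminal objective for a customer-service course.
terminal = Objective(
    "Resolve a billing complaint", "On a live call with a customer",
    "Complaint closed in one call for 90% of calls", "terminal")
verify = terminal.add(Objective(
    "Verify the customer's account", "At the start of each call",
    "Correct account located on every call", "enabling"))
verify.add(Objective(
    "Locate the account by phone number", "Using the billing system", "", "session"))
verify.add(Objective(
    "Confirm identity with two security questions", "Before discussing charges",
    "No charges discussed before identity is confirmed", "session"))

for gap in missing_components(terminal):
    print("Missing a component:", gap.performance)
```

Here the check flags the session objective that has no standard, which corresponds to the Table 1 guideline that every objective should specify the standard to which trainees will be held.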
2.3.3 Instructional Strategies
A key element of training design is the instructional strategy selected. Characteristics of the trainee and the organization must be considered in order to select the most appropriate strategy. A wide range of instructional strategies may be effective for training both individuals and teams. We summarize these in Table 2 and refer the reader to Coultas et al. (2012) for more detailed descriptions and findings behind different instructional strategies. Training designers seeking to develop effective training should follow four basic guidelines: (1) information or concepts to be learned must be presented; (2) necessary KSAs to be learned/acquired must be demonstrated; (3) opportunities for practice should be provided; and (4) feedback must be given to trainees during and after training. As Table 2 shows, much research has investigated training strategies, but no single method is effective at meeting all training needs. Accordingly, research continues on how to present targeted information to trainees based on factors such as organizational resources and training needs. There is an abundance of research on the design of training systems. While it may be tempting to design training based on common sense or personal knowledge, it is vital for the training designer to leverage this extensive knowledge base. Training designers must consider not only the content and the instructional strategy but also external factors, such as organizational and individual characteristics, when designing a training program for maximal effectiveness.
DESIGN, DELIVERY, EVALUATION, AND TRANSFER OF EFFECTIVE TRAINING SYSTEMS Table 1
Steps in Developing Training Objectives
Step / Guidelines
1. Review existing documents to determine competencies required
Examine your sources:
• Performance standards for the organization
• Essential task lists
• Past training objectives
• Probe experts on previous experiences
2. Translate identified competencies into training objectives
Include objectives that:
• Specify targeted behaviors
• Use action verbs (e.g., “provide,” “prepare,” “locate,” “decide”)
• Outline specific behavior that demonstrates the appropriate skill or knowledge
• Can be easily understood
• Clearly outline the conditions under which skills and behaviors should be seen
• Specify standards to which they will be held when performed or demonstrated
• Make sure that standards are clear, realistic, accurate, complete, timely, and evaluated
3. Organize training objectives
Make sure to categorize:
• General objectives specifying desired end goal
• Specific objectives that identify tasks that will meet general objectives
4. Implement training objectives
Use training objectives to:
• Design training exercises
• Use events as opportunities to evaluate how well trainees exhibit training objectives
• Develop performance measurement tools
• Brief trainees on training event
3 DURING TRAINING
3.1 Training Delivery
Once all pre-training components are aligned to support training, implementation of the training design takes center stage. Several components of training delivery can affect the effectiveness of a training system. Here, we discuss research on training facilitators/instructors, the importance of practice scenarios, and findings on the content and delivery of feedback.
3.2 Facilitator
The training facilitator, or instructor, is the person who directs the event and delivers the training to learners. This person might be a member of the learners’ organization or could be external to the organization, such as a consultant or someone who works for a training company. Oftentimes, training is deployed haphazardly by a member of HR rather than by a trained facilitator, and research suggests that this may not be the best choice. Having an instructor who is internal to the organization can itself influence those participating in the training. From the education literature, we know that instructors should be engaging and knowledgeable and should elicit positive reactions from students in order to be considered effective (Roorda et al., 2017; Seidel & Shavelson, 2007). The organizational science literature offers surprisingly little insight into the impact of the facilitator on training effectiveness, but there is currently a movement toward making the trainer more central to training theory (Glerum, Joseph, McKenny, & Fritzsche, 2020). In a meta-analysis, Glerum et al. (2020) found evidence that, in general, trainee reactions to the training event are more likely to be influenced by the facilitator than by the training content itself, except in online courses. This raises an interesting point: with advances in technology and the increased use of e-learning methods, learners may not be afforded an instructor at all. Instead, training may be self-paced, leaving it to the learning content and platform features to engage users.
3.3 Practice Scenarios
A vital aspect of any training program is the practice opportunities offered to trainees during training. Research has shown that practice under conditions (either simulated or real) that differ from the end-goal task can improve on-the-job performance by developing meaningful contexts and knowledge structures (i.e., mental models; Satish & Streufert, 2002). Scripted practice scenarios ensure that the necessary competencies are practiced and allow for easier and better performance assessment. Beyond simply increasing performance, practice should also improve overall task-relevant learning.
In a 2006 meta-analysis of the training literature, Sitzmann and colleagues found that Web-based instruction led to more learning than classroom instruction, not because of the medium itself but because it gave learners more control and the opportunity to practice the material at their own pace. That more practice opportunities lead to more learning has been replicated in other studies as well (Festner & Gruber, 2008; Goettl et al., 1996). Research has also examined the scheduling of practice opportunities and the introduction of variations in practice difficulty (Ghodsian, Bjork, & Benjamin, 1997; Schmidt & Bjork, 1992). These studies have shown that varying the order of tasks during practice, providing less feedback (both in quality and quantity), and altering the specifics of the tasks being practiced all led to enhanced retention and generalization, despite initial decreases in task performance. This suggests that varying practice opportunities in this fashion actually leads to deeper learning. Shute and Gawlick (1995) confirmed these findings for computer-based training. One thing to note: when the training medium is unfamiliar to trainees (e.g., video games), introducing difficulty in practice is less beneficial than it is for experienced users (Orvis et al., 2008), because inexperienced users expend cognitive resources on learning the medium rather than the task. Practice enhances learning by refining knowledge structures, or mental models, within meaningful contexts (Murthy, Challagalla, Vincent, & Shervani, 2008). In addition, practice provides an opportunity to assess performance, enabling trainees to obtain feedback and adjust their behavior when weaknesses are identified. Furthermore, practice can promote the transfer of training by allowing trainees to gain experience applying learned competencies in various contexts. To reap these benefits, however, practice scenarios should be carefully designed prior to the training period.
Doing so allows trainers and researchers to maintain control over the practice period by standardizing the selection, presentation, and timing of the competencies to be practiced. Practice scenarios can vary greatly in their degree of
Table 2 Instructional Strategies.
Strategy: Definition. References

Technology-based strategies
Simulation-based training: Provides practice opportunities through technology. Has the potential to create realistic scenarios by simulating events, environments, and social interactions; has been applied to military, business, medical, cross-cultural, and research settings. Varies regarding fidelity, immersion, and cost.
Behavior modeling training: Presents the trainee with examples (positive and negative) of targeted behaviors. Can be done in a variety of settings but is often done electronically. A combination of simulation and role playing; varies regarding the degree of interactivity.
References (simulation-based training and behavior modeling training): Bell et al. (2008), Landers & Armstrong (2017), Marks (2000), Tannenbaum & Yukl (1992), Bausch et al. (2014), Taylor et al. (2005)
Distance learning/E-learning: Allows physically separated instructors and students to interact. Utilizes technologies such as the Internet, videoconferencing, virtual classrooms, online collaboration, and instant messaging. Time-pressured employees are typically ideal candidates for this training. References: Brown et al. (2016), Cheng et al. (2012), Moe & Blodget (2000), Orvis et al. (2009)
Learner control: Allows trainees discretion over certain aspects of the training (e.g., content, delivery structure, pacing). A key component of many distance, e-learning, and other technology-based instructional strategies. References: Brown et al. (2016), Brown & Ford (2002), Sitzmann et al. (2006), Wydra (1980)
Scenario-based training: Similar to simulation-based training in that it provides complex and realistic environments. It is more targeted, however, as it embeds task-relevant events with behavioral triggers into the program. The program then monitors performance and provides feedback on the task and processes within the scenarios. References: Fowlkes et al. (1998), Oser et al. (1999), Salas et al. (2006)
Collaborative learning: Utilizes technology to enable groups to train together. This may happen with collocated or separated groups. Emphasizes group interaction and deemphasizes tasks. This is becoming more popular in online education and is different from team training. References: Arthur et al. (1996), Capdeferro & Romero (2012), Lee et al. (2016), Marks et al. (2005)
Error management training: Allows, encourages, induces, or guides trainees to make errors within the program. Trainees are shown the consequences of failure and provided with feedback. Especially necessary in situations when post-training errors are likely or in highly dynamic and complex training areas. References: Dormann & Frese (1994), Dyre et al. (2017), Ivancic & Hesketh (1995), Keith & Frese (2008), Loh et al. (2013)
Stress exposure training: Informs trainees of the potential stressors inherent in the targeted task as well as the relationship between stressors, trainee affect, and performance. Trainees also receive information on coping strategies. Training may also incorporate graduated exposure to actual stressors to desensitize trainees to potential stressors. References: Driskell et al. (2008, 2014), Finseth et al. (2018), McClernon et al. (2011)
On-the-job training: Training occurs in the same environment in which actual task behavior will eventually be performed. Experts instruct novices, who practice tasks under their supervision. Includes both apprenticeship and mentoring models. References: Goldstein (1993), Ford et al. (1997), Munro (2009), Saks & Burke-Smalley (2014)

Team-based strategies
Team coordination training: Provides opportunities for teams to practice workload distribution, implicit and explicit communication, backup behaviors, and interpersonal relations, with the goal being maximal team coordination. This is becoming increasingly important as teams are on the rise even as face-to-face interactions are declining. References: Bowers et al. (1998), Entin & Serfaty (1999), Gorman et al. (2010), Serfaty et al. (1998)
Cross-training: Rotates team members through the tasks of other team members. The goal is to provide team members with a better understanding of role requirements and responsibilities so that they coordinate workload more effectively, and to improve shared mental models and understanding of technology usage across team members. Evidence suggests this method may not be effective. References: Volpe et al. (1996), Salas et al. (1997, 2007)
Team self-correction training: Fosters awareness of team processes and effectiveness so that team members can evaluate and correct their behaviors. This is not only personal awareness but also awareness of the behaviors of teammates. Instructs teams to provide constructive feedback; may help mitigate errors due to miscommunications that naturally occur when adopting new technologies. Self-correction may be operationalized through debriefing. References: Blickensderfer et al. (1997), Levett-Jones & Lapkin (2014), Smith-Jentsch et al. (1998), Salas et al. (2007)
Table 2 (continued)

Internationalization-based strategies
Cross-cultural training: Instructs trainees in various strategies that allow them to see the source of behavior (i.e., make attributions) as individuals from a given culture do. This also provides a better understanding and appreciation of diversity. References: Bhawuk (2001), Littrell & Salas (2005), MacNab (2012), Mor et al. (2013)
Cultural awareness training: Emphasizes understanding personal culture, along with the various biases, norms, and thought patterns involved. Assumes an awareness of one’s own culture leads to a greater appreciation for others. References: Bennett (1986), Collins & Pieterse (2007), Thomas & Cohn (2006)
Diversity training: Also increases awareness, but with an emphasis on reducing prejudice and developing positive attitudes and assumptions about diversity and others from different backgrounds. References: Bezrukova et al. (2016), Dobbin & Kalev (2016), Roberson et al. (2012)
Didactic training: Provides trainees with various culturally relevant facts such as living and working conditions, cultural dimensions, values, logistics of travel, shopping, attire, and even food. Geographic, political, economic, and social structure information is also conveyed. References: Morris & Robie (2001), Sanchez-Burks et al. (2007)
Experiential training: Emphasizes experiencing the aspects and consequences of cultural differences in various scenarios. Often occurs in the form of simulations but can even include face-to-face exposure to the targeted culture or role-playing exercises. Not only improves trainees’ intercultural competence but also can serve a purpose similar to attribution training. References: Kealey & Protheroe (1996), Morris & Robie (2001), Fowler & Pusch (2010)
Team leader training: Trains leaders on two domains of multicultural team leadership: (1) leadership and (2) culture. Often these are taught asynchronously, but some training programs have attempted to train leaders on both simultaneously. References: Kozlowski et al. (1996), Thomas (1999), Burke et al. (2005)
Team building: Incorporates team members into organizational- or team-level change development and implementation. Forces team members to interact strategically. Emphasizes one of four team-building goals: goal setting, interpersonal relationships, problem solving, and role clarification. References: Dyer (1977), Beer (1980), Buller (1986), Salas et al. (1999), Shay & Tracey (2009)
Role playing: Requires trainees to interact with other team members through the use of scripted scenarios. Progression through the roles can make team members aware of (1) their own culture or (2) other cultures (i.e., acculturation). Role playing can also support both of these goals when trainees experience multiple roles. References: Bennett (1986), Roosa et al. (2002)
Source: Adapted from Coultas et al., 2010.
realism, ranging from very high fidelity (e.g., full motion simulators) to very low fidelity (e.g., role-play activities). Because both types have proven effective (e.g., Seddon, 2008; Vecchi, Van Hasselt, & Romano, 2005), the level of realism should be matched to the content and goals of the training program. Practice scenarios should also incorporate a range of difficulty levels and should allow trainees to respond in different ways rather than requiring clear-cut answers. Additionally, learning and transfer can be facilitated by enabling trainees to practice their skills on multiple occasions and in various contexts (Prince, Oser, Salas, & Woodruff, 1993).
3.4 Feedback
Constructive and timely feedback is important to the success of any training program. Without feedback, trainees would have no metric by which to determine where improvements are needed or whether performance is sufficient (Cannon-Bowers & Salas, 1997). Research has consistently shown that feedback leads to increases in learning and performance (Burke & Hutchins, 2007). However, effective feedback must meet three criteria. First, feedback should be based on task/training performance and should not be critical of the person. Second, feedback should provide information and guidance on how to improve learning strategies so that performance expectations are met. Finally, feedback must be seen as meaningful at all applicable levels (i.e., individual and/or team). Feedback strategies may also vary depending on task complexity; “scaffolding” is a technique that involves gradually reducing the amount of feedback provided so as to encourage self-monitoring of errors (van Merriënboer et al., 2006). However, the effectiveness and generalizability of such techniques still require further research (Burke & Hutchins, 2007).
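Two of the practice manipulations discussed in this section, randomly ordering tasks rather than practicing each in a block, and giving less feedback as practice proceeds (as in scaffolding), can be sketched as a small practice-plan generator. This is a minimal illustration under stated assumptions, not a procedure from the cited studies: the task names are hypothetical, a seeded shuffle stands in for "varying the order of tasks," and the linear fade from feedback on every trial to feedback on roughly one trial in five is an arbitrary choice.

```python
import random
from typing import List, Sequence


def blocked_schedule(tasks: Sequence[str], reps: int) -> List[str]:
    """All repetitions of one task before moving on (A A A B B B C C C)."""
    return [task for task in tasks for _ in range(reps)]


def interleaved_schedule(tasks: Sequence[str], reps: int, seed: int = 0) -> List[str]:
    """The same trials as the blocked schedule, but in a shuffled (random) order."""
    trials = blocked_schedule(tasks, reps)
    random.Random(seed).shuffle(trials)
    return trials


def faded_feedback(n_trials: int, start: float = 1.0, end: float = 0.2) -> List[bool]:
    """Flag which trials receive feedback as the feedback rate fades linearly
    from `start` (early trials) to `end` (late trials)."""
    if n_trials < 2:
        return [True] * n_trials
    flags: List[bool] = []
    credit = 0.0
    for i in range(n_trials):
        # Accumulate this trial's feedback rate; give feedback once a full "unit" builds up.
        credit += start + (end - start) * i / (n_trials - 1)
        if credit >= 1.0 - 1e-9:  # small tolerance for floating-point rounding
            flags.append(True)
            credit -= 1.0
        else:
            flags.append(False)
    return flags


tasks = ["diagnose", "repair", "verify"]  # hypothetical task names
plan = interleaved_schedule(tasks, reps=4, seed=42)
feedback = faded_feedback(len(plan))
for trial, (task, gets_feedback) in enumerate(zip(plan, feedback), start=1):
    print(f"trial {trial:2d}: {task:<9} feedback={'yes' if gets_feedback else 'no'}")
```

Keeping the interleaved plan a permutation of the blocked one means both conditions give the same amount of practice on each task, so only the ordering differs, which is the contrast the scheduling research manipulates.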
3.5 Training Implementation
At this point, the training program should be fully developed and the organization should prepare to implement it. During this phase, a training location with the appropriate resources (e.g., computers for computer-based training) should be selected. Instructors should be trained, training should be pilot tested, and any final adjustments should be made. Once this has been completed, the training program will be fully prepared for delivery. During the implementation phase, steps should be taken to foster a learning climate and support transfer and maintenance (Salas & Stagl, 2009). For example, training objectives should be clearly communicated, and trainees should be prompted to set proximal and distal goals.
4 AFTER TRAINING
4.1 Evaluation
Just as important as developing and implementing training is the process of evaluating the effectiveness of the training program. It is critical not only to determine whether training was effective, but also to evaluate how or why it was effective or ineffective. Without this phase, the training program cannot be improved or re-created for use in different situations. Training evaluation is thus essential to carrying out the overall goals of a training program. Evaluation concerns and methodologies are discussed below. Training evaluation is essentially a system for measuring the intended outcomes or goals of the training program. The evaluation process includes the examination of factors such as measurement, design, learning objectives, and acquisition of the target knowledge, skills, and abilities (KSAs). Training evaluation plays a vital role in the overall training process because it allows organizations and researchers to determine whether training led to meaningful changes in performance and how it affected other organizational outcomes of value. Unfortunately, many organizations do not carry out the evaluation phase after implementing their training program, as this phase is often accompanied by high costs and requires intensive labor (Salas & Stagl, 2009). Evaluation can be a difficult process because it may require specialized expertise and a team of people to collect and analyze performance data, and it may need to be conducted in the field or on the job. Moreover, organizations sometimes avoid evaluating their training because of the politics involved or the possibility of uncovering bad news, which may necessitate further action.
4.2 Kirkpatrick’s Framework
The most widely used evaluation methods are those based on Kirkpatrick’s (1976) four-level model (Aguinis & Kraiger, 2009). In this approach, training is evaluated based on four criteria: (1) trainees’ reactions (i.e., what trainees think of the training); (2) learning (i.e., what trainees learned); (3) behavior (i.e., how trainees’ behavior changes); and (4) organizational results (i.e., impact on the organization). In using the framework, reactions can be assessed by asking trainees whether they liked the training and perceived it as useful; tests and exercises can be used to measure the degree to which trainees have acquired the trained competencies; behavior changes can be measured by observing trainees’ performance in the workplace; and organizational results can be evaluated by examining such things as turnover, costs, efficiency, and quality. Each of these levels is further described in Table 3. Although Kirkpatrick’s framework is practically useful and remains the most cited training evaluation framework in research, it has been criticized for a lack of theory (Kraiger, 2002, p. 333). This criticism has prompted recent research that expands the way we think about and assess training effectiveness. Arguably the most comprehensive framework advanced in recent literature comes from Sitzmann and Weinhardt (2019), who propose a multilevel framework for training effectiveness that accounts for both within- and between-person effects as well as macro-level considerations, and for how all of these factors might be interrelated (see Figure 6). This model is an excellent advance in the literature, as it spotlights the variety of areas in which training might be considered effective (i.e., utilization, affect, performance, finances). Moreover, it works toward understanding how bottom-up training processes affect macro-level organizational phenomena.
4.3 Tools
Performance measures also remain key components of successful training initiatives. Measuring performance provides an opportunity to assess or diagnose trainee competence and provide feedback, both of which are central to the learning process.
Without incorporating measures of performance, it is arguable whether training can be said to have effectively produced changes in knowledge and behavior. Strong performance measures are essential, as they can ultimately feed back
Table 3 Kirkpatrick’s Training Evaluation Framework and Example Criteria.

Level 1. Reactions
What is being measured/evaluated:
• Learner and/or instructor reactions after training
• Satisfaction with training
• Ratings of course materials
• Effectiveness of content delivery
Measurement:
• Self-report survey
• Evaluation or critique
Sample questions:
• Did you like the training?
• How would you rate the trainer’s effectiveness?
• Was the training useful?

Level 2. Learning
What is being measured/evaluated:
• Attainment of trained competencies (i.e., knowledge, skills, and attitudes)
• Mastery of learning objectives
Measurement:
• Final examination
• Performance exercise
• Knowledge pre- and posttests
Sample questions:
• True or false: Large training departments are essential for effective training.
• Supervisors are closer to employees than is upper management.

Level 3. Behavior
What is being measured/evaluated:
• Application of learned competencies on the job
• Transfer of training
• Improvement in individual and/or team performance
Measurement:
• Observation of job performance
Sample questions:
• Do the trainees perform learned behaviors?
• Are the trainees paying attention and being observant?
• Have the trainees shown patience?

Level 4. Results
What is being measured/evaluated:
• Operational outcomes
• Return on training investment
• Benefits to organization
Measurement:
• Longitudinal data
• Cost–benefits analysis
• Organizational outcomes
Sample questions:
• Have there been observable changes in employee attitudes, turnover, and safety since the training?
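For Level 2 (learning), knowledge pre- and posttests are often summarized as a mean gain plus a standardized effect size. The sketch below assumes paired scores for the same trainees and uses d_z (the mean of the individual gains divided by their standard deviation), which is one of several effect sizes used for within-person designs; the function name and the scores are made up for illustration.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def pre_post_gain(pre: Sequence[float], post: Sequence[float]) -> Tuple[float, float]:
    """Return (mean gain, d_z) for paired pre-/posttest scores of the same trainees."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two paired scores")
    gains = [after - before for before, after in zip(pre, post)]
    sd = stdev(gains)  # sample standard deviation of the individual gains
    if sd == 0:
        raise ValueError("gains have zero variance; d_z is undefined")
    return mean(gains), mean(gains) / sd


# Hypothetical knowledge-test scores (percent correct) for four trainees.
pre_scores = [60, 65, 70, 75]
post_scores = [70, 72, 78, 80]
gain, d_z = pre_post_gain(pre_scores, post_scores)
print(f"mean gain = {gain:.1f} points, d_z = {d_z:.2f}")
```

A pre-post gain alone cannot rule out alternative explanations such as familiarity with the test itself; comparison designs with an untrained control group address that.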
[Figure 6: training effectiveness criteria grouped as training utilization (e.g., enrollment and attrition rates), affective indicators (e.g., training reputation, satisfaction, self-efficacy, motivation), performance indicators (e.g., human capital, learning, training transfer, organizational and team performance), and financial impact of training (e.g., return on investment), each shown at the macro, between-person, and within-person levels of analysis.]
Figure 6 Sitzmann and Weinhardt’s (2019) multilevel training evaluation framework. (Source: Sitzmann & Weinhardt 2019. © 2019 Elsevier.)
to the success or failure of training and highlight any deficiencies to guide ongoing improvements (Tichon, 2007). In practice, performance measurement can be facilitated by following certain preparatory guidelines. First, steps can be taken to simplify the measurement process for those responsible for making assessments. To do this effectively, opportunities for measurement should be incorporated into carefully designed practice scenarios that can be built into the training program. This ensures that target competencies are practiced and measured appropriately. Instructors can thus easily identify trigger points at which performance should be observed and recorded. Second, an overall system for measuring performance and providing feedback should be established prior to training. This can be a complicated process, as any appropriate strategy would involve multiple factors. For example, objective measures of performance, such as timing or number of errors, can be obtained automatically through the use of high-technology simulations. While this is a convenient way to collect performance measures, it is suboptimal for practice scenarios that involve more complex processes (e.g., communication, decision making) as indicators of performance. Team performance in particular is difficult to capture because of the dynamic nature of teams and team processes. The limitations of automatic performance measures are especially apparent during periods of high workload and acute increases in stress (e.g., the circumstances experienced by trauma teams). Under such conditions, high performance requires that teams communicate and coordinate at the implicit level, and as a result, important processes become impossible for simulation-based systems to detect automatically.
In contrast, it is much easier for human observers to draw inferences by observing and assessing behaviors using pre-established criteria such as checklists or observation forms. However, using human observers can introduce errors and bias into performance measures. To reduce such issues, it is recommended that at least two observers be used and that steps be taken to establish reliability (i.e., consistency between evaluators’ ratings) and validity (i.e., accuracy of evaluators’ ratings) (e.g., Brannick, Prince, & Salas, 2002; Holt, Hansberger, & Boehm-Davis, 2002). Choosing between objective and subjective performance measures essentially involves weighing trade-offs, a choice that should be guided by the content and goals of the training program. Lastly, training and practice should be designed to incorporate multiple opportunities for performance measurement. Assessments should be taken on various occasions throughout
the simulation in order to gain an accurate representation of trainees’ performance, and these assessments can help indicate which segments of the training are most and least effective. Having assessed training effectiveness this way, trainers can iterate on the training procedures to improve later training deliveries. More broadly, randomized controlled trials (RCTs) are popular among practitioners for their relative simplicity (Bass & Avolio, 1990; Burke & Day, 1986; Smith & Smith, 2007) and their ability to yield useful information on training programs as a whole. However, RCTs often capture little nuance, because they compare complete training programs against other programs in their entirety, or against conditions with no training at all. Training effectiveness is assessed using this “all or nothing” method (Collins et al., 2009, p. 21) through post-test analyses in which the condition with the highest post-test score is deemed most effective. In addition, RCTs are often seen as suboptimal because the training is largely finalized before any testing takes place. To develop more effective training evaluation options, Collins et al. (2007) created the Multiphase Optimization Strategy (MOST). While this research design incorporates RCTs, it first seeks to identify the cheapest, most effective training elements. MOST proceeds in three phases (screening, refining, and confirming), with the confirming phase being identical to an RCT (Howard & Jacobs, 2016). In another recent development, the Sitzmann and Weinhardt (2019) framework described above offers a more theoretically driven approach than the Kirkpatrick framework, one that considers different types of outcomes, level-of-analysis concerns, and interrelations between evaluation criteria.
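The screening phase of MOST can be illustrated with a factorial experiment, one common way (though not the only one) to estimate each candidate training component's individual contribution before the final program is assembled. In this sketch the component names, per-component costs, effect threshold, and noise-free outcome model are all hypothetical. A real screening study would involve measurement noise, replication, and significance testing, and often uses a fractional rather than a full factorial design when there are many components.

```python
from itertools import product
from statistics import mean
from typing import Dict, List, Sequence


def full_factorial(components: Sequence[str]) -> List[Dict[str, int]]:
    """Every on/off combination of the candidate components (2**k conditions)."""
    return [dict(zip(components, levels))
            for levels in product((0, 1), repeat=len(components))]


def main_effects(conditions: List[Dict[str, int]],
                 outcomes: Sequence[float]) -> Dict[str, float]:
    """Main effect of each component: mean outcome with it on minus mean with it off."""
    effects: Dict[str, float] = {}
    for name in conditions[0]:
        on = [y for c, y in zip(conditions, outcomes) if c[name] == 1]
        off = [y for c, y in zip(conditions, outcomes) if c[name] == 0]
        effects[name] = mean(on) - mean(off)
    return effects


def screen(effects: Dict[str, float], costs: Dict[str, float],
           min_effect: float) -> List[str]:
    """Keep components whose effect reaches `min_effect`, cheapest per unit of effect first."""
    if min_effect <= 0:
        raise ValueError("min_effect must be positive")
    kept = [name for name, effect in effects.items() if effect >= min_effect]
    return sorted(kept, key=lambda name: costs[name] / effects[name])


components = ["worked_examples", "feedback", "video"]  # hypothetical components
conditions = full_factorial(components)
# Made-up, noise-free post-test scores: worked examples add 5 points, feedback 2, video 0.
outcomes = [50 + 5 * c["worked_examples"] + 2 * c["feedback"] for c in conditions]
effects = main_effects(conditions, outcomes)
costs = {"worked_examples": 1000, "feedback": 200, "video": 5000}  # hypothetical costs
print(effects)
print(screen(effects, costs, min_effect=1.0))
```

Components that survive screening would then be refined and, finally, combined into a package tested against a control in the confirming (RCT) phase.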
Using practical training evaluation methods such as RCTs or MOST in tandem with frameworks like Kirkpatrick’s or the multilevel approach proposed by Sitzmann and Weinhardt (2019) offers a stable base upon which training evaluation can lead to continued training improvements. These improvements, in turn, are also likely to yield more positive employee outcomes. However, while RCTs and MOST rest on fundamental research design practices, the value of incorporating more modern tools and technologies into training evaluation should not be underestimated. With 67% of people learning on mobile devices (LinkedIn Learning Solutions, 2017), the role that technology plays in training (whether formal or informal) cannot be ignored. Employees can now use their smartphones, laptops, or desktops to receive
training on most subjects from any location, including their desks, and at any time. Making full use of current technologies to evaluate training at multiple points in the delivery process, including after training, simplifies the process for both the trainee and the trainer. Deloitte et al. (2016) highlight that the employee is the most significant factor in the trend toward consumer-centric learning; as such, leveraging employees’ aptitude for, and frequent use of, technology when delivering and evaluating training can lead to better organizational results.

4.4 Costs

Various practical concerns are often major deterrents to the evaluation of training programs. Implementing the evaluation can strain both time and monetary resources. To mitigate these issues, researchers have explored ways to reduce the costs of training evaluation. For example, trainers can assign different numbers of participants to training and control groups (Yang, Sackett, & Arvey, 1996). When including a trainee costs more than including a control participant, building in unequal group sizes with a larger overall sample size may allow for the same level of statistical power at a lower cost. Training designers can also substitute a suitable, yet less expensive, proxy criterion measure for the target criterion when evaluating training effectiveness. This technique expends fewer resources, but it increases the sample size needed to achieve a given level of statistical power. Training designers may therefore need to negotiate a trade-off between reducing costs through proxy criterion measures and potentially increasing costs through a larger sample size (Arvey, Salas, & Gialluca, 1992). It is critical for organizations to evaluate the effectiveness of their training both during implementation and after a training program is completed. Although practical concerns and high costs prevent many organizations from conducting evaluations, doing so can inform future training programs and contribute to organizations’ future success.
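The cost argument for unequal group sizes can be checked with a short calculation. The Python sketch below is an illustration under stated assumptions, not the method of Yang, Sackett, and Arvey (1996).

It assumes:
- a normal approximation to the power of a two-sided, two-group comparison;
- a standardized training effect of 0.5;
- hypothetical per-person costs of 900 for a trainee and 100 for a control participant.

The unequal design uses the ratio n_control / n_trained = sqrt(cost_trained / cost_control). This is the textbook allocation that minimizes sampling variance per unit of cost. Under these assumptions it needs more people in total but costs less than an equal split.

```python
import math
from statistics import NormalDist


def group_sizes(d, ratio, alpha=0.05, power=0.80):
    """Return (n_trained, n_control), with n_control about ratio * n_trained,
    reaching the target power for a two-sided test of a standardized mean
    difference d. Normal approximation: the target is met when
    1/n_trained + 1/n_control <= (d / (z_{1-alpha/2} + z_{power}))**2."""
    z = NormalDist().inv_cdf
    allowed_inverse_sum = (d / (z(1 - alpha / 2) + z(power))) ** 2
    n_trained = math.ceil((1 + 1 / ratio) / allowed_inverse_sum)
    n_control = math.ceil(ratio * n_trained)  # rounding up keeps power >= target
    return n_trained, n_control


def evaluation_cost(n_trained, n_control, cost_trained, cost_control):
    """Total cost of an evaluation with the given group sizes and per-person costs."""
    return n_trained * cost_trained + n_control * cost_control


# Hypothetical assumptions: training a participant costs far more than
# measuring an untrained control participant.
COST_TRAINED, COST_CONTROL = 900, 100
EFFECT_SIZE = 0.5  # assumed standardized training effect

# Cost-optimal allocation ratio: n_control / n_trained = sqrt(c_trained / c_control).
OPTIMAL_RATIO = math.sqrt(COST_TRAINED / COST_CONTROL)

if __name__ == "__main__":
    for label, ratio in (("equal", 1.0), ("unequal", OPTIMAL_RATIO)):
        n_t, n_c = group_sizes(EFFECT_SIZE, ratio)
        cost = evaluation_cost(n_t, n_c, COST_TRAINED, COST_CONTROL)
        print(f"{label}: trained={n_t}, control={n_c}, total={n_t + n_c}, cost={cost}")
```

Run as written, the sketch reports two designs:

| Design | Trainees | Controls | Total people | Cost |
|---|---|---|---|---|
| Equal split | 63 | 63 | 126 | 63000 |
| Unequal split | 42 | 126 | 168 | 50400 |

An exact t-test power analysis would give slightly larger samples, but the direction of the trade-off would be the same.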
4.5 Facilitating Transfer

The transfer of training refers to the extent to which trained competencies are applied, generalized, and maintained in the work environment (Baldwin & Ford, 1988). Successful transfer leads to meaningful changes in job performance and is thus the primary goal of any training initiative. As such, the transfer of training remains a prominent area of interest in both research and applied work in organizations. Early research in this area indicated that transfer is affected by factors such as organizational context, the delay between training and use on the job, and situational cues and consequences. Since then, researchers have provided significant evidence that transfer is also influenced by the three main components of Baldwin and Ford’s (1988) model of transfer: trainee characteristics, training design, and the work environment. Starting with trainee characteristics, several have exhibited consistent, positive relationships with the transfer of training. For example, meta-analytic findings show a strong correlation between cognitive ability and positive transfer outcomes (Blume et al., 2010). Trainees higher in cognitive ability are more likely to successfully acquire, use, and maintain trained competencies in the appropriate contexts. Self-efficacy, or one’s belief in one’s ability to accomplish a task (Bandura, 1982), has also been linked to training transfer through meta-analysis (Blume et al., 2010). Research suggests that self-efficacy contributes to transfer partly through its influence on motivation (e.g., Chiaburu & Lindsay, 2008), another trainee characteristic that positively predicts the transfer of training (Baldwin et al., 2009). Specifically, pretraining motivation, motivation
DESIGN OF EQUIPMENT, TASKS, JOBS, AND ENVIRONMENTS
to learn, and motivation to transfer have all demonstrated significant relationships with the transfer of trained competencies. Another prominent predictor of training outcomes is perceived utility, or the value associated with participating in training (Burke & Hutchins, 2007). This work was extended by Burke and Saks (2009), who delineated the role that accountability plays in training transfer. Both concepts emphasize that transfer is enhanced when trainees perceive a clear link between trained competencies and valued job outcomes (Chiaburu & Lindsay, 2008) and when training instructions match job requirements (Velada et al., 2007). Traits such as zest and perceived ability to learn and solve problems have been identified as particularly important for establishing a sense of accountability and, in turn, for promoting transfer when the training content involves open skills and the transfer environment is characterized by low supervision (Grossman & Burke-Smalley, 2018). Recently, Ford, Baldwin, and Prasad (2018) conducted an updated review of the transfer literature and concluded that conscientiousness and mastery orientation are also primary trainee characteristics that affect the transfer of training. Transfer can also be facilitated more effectively through certain training design and delivery methods. Consistent with the previous version of this chapter (Coultas et al., 2012), behavioral modeling remains a particularly robust technique. Behavioral modeling is a learning approach that incorporates clearly defined explanations of the behaviors to be learned, models displaying the effective use of these behaviors, opportunities for trainees to practice learned skills, and the provision of feedback and social reinforcement (Taylor, Russ-Eft, & Chan, 2005).
The literature has also converged on various other techniques for promoting transfer, such as using both positive and negative examples in demonstrations, building in error management strategies, implementing goal-setting activities, and incorporating multiple learning principles and multiple opportunities for practice rather than relying on a single mechanism (Ford et al., 2018). In an analysis of 100 years of training research, Bell, Tannenbaum, Ford, Noe, and Kraiger (2017) observed a shift in research and practice from an emphasis on passive learning to active learning. Active learning approaches place learners at the center of the learning experience, giving them control over their own learning, and incorporate formal training design elements that support self-regulated learning by shaping the cognitive, motivational, and emotional processes that underlie it (Bell & Kozlowski, 2008). Through these approaches, the learner’s role is not simply to absorb and apply the knowledge provided, but to play an active role in the learning process by inferring the rules, principles, and strategies for effective performance (Frese et al., 1991; Smith et al., 1997). This inductive learning process differs from more common passive approaches in that it involves constructing knowledge rather than internalizing external knowledge, and it offers a more effective means of facilitating transfer. Active learning is particularly beneficial for promoting transfer because it stimulates transfer-appropriate processing: the same metacognitive activities required to apply new competencies effectively in the transfer environment are first required and experienced within the learning context (Bell & Kozlowski, 2008).

4.6 Post-Training Climate

In addition to the characteristics of trainees and training design, the characteristics of the post-training environment also play a significant role in the transfer of training.
By facilitating or hindering the use of trained competencies, environmental factors largely determine whether or not learned behaviors are exhibited once trainees return to the work setting. Regardless of
DESIGN, DELIVERY, EVALUATION, AND TRANSFER OF EFFECTIVE TRAINING SYSTEMS
how well designed or how flawlessly implemented a training program is, if the post-training environment does not encourage the transfer of target competencies, the program will fail to yield long-term behavioral change. Several components of the post-training environment have been shown to contribute to transfer outcomes. Broadly, a positive transfer climate encourages trainees to apply target knowledge and skills to the job (e.g., Colquitt et al., 2000). Transfer climate has been described as observable or perceived situations that inhibit or facilitate the use of learned skills (Rouiller & Goldstein, 1993). Such situations might include cues that prompt trainees to use new skills; consequences for correct use, incorrect use, or lack of use; and social support in the form of feedback and incentives. A wealth of research has demonstrated a significant relationship between transfer climate and training outcomes (e.g., Blume et al., 2010; Burke et al., 2008; Gilpin-Jackson & Busche, 2007). Supervisor and peer support have also exhibited strong relationships with the transfer of training (e.g., Blume et al., 2010). Supervisor support in particular can be manifested in various ways. Goal setting, for example, can be used to enhance transfer outcomes, so supervisors should facilitate training transfer by prompting employees to set goals for using new competencies in the workplace (Burke & Hutchins, 2007). In addition, supervisors can provide support by offering recognition and rewards (Salas & Stagl, 2009), communicating expectations (Burke & Hutchins, 2007), and maintaining a high level of involvement (Gilpin-Jackson & Busche, 2007; Saks & Belcourt, 2006). Trainees can support each other by observing one another using trained skills, coaching each other, and sharing ideas about course content (Gilpin-Jackson & Busche, 2007; Hawley & Barnard, 2005).
A recent review further reinforced supervisor and peer support as among the most important components of the work environment for facilitating training transfer (Ford et al., 2018). Not surprisingly, new competencies will not transfer to the workplace unless employees are given ample opportunities to apply them (Burke & Hutchins, 2007). Research shows that insufficient time, resources, and opportunities to perform can seriously hinder the use of trained knowledge and skills on the job (e.g., Clarke, 2002; Gilpin-Jackson & Busche, 2007). Conversely, transfer is enhanced when trainees are provided with ample opportunities and resources to apply their new skills. Additionally, significant delays between training and the opportunity to perform can hinder transfer, so this time frame should be minimized. Transfer can be further facilitated through follow-up activities (Salas & Stagl, 2009). For example, after-action reviews, a type of trainee debrief, provide further education and enable trainees to reflect on their experiences through practice and discussion. Post-training interventions such as relapse prevention, self-management, and goal setting can all serve to promote the transfer of training (Baldwin et al., 2009). Ford and colleagues’ (2018) review also provided updated support for the important role of opportunity to perform in the transfer of training. Grossman and Burke-Smalley (2018) recently delineated various strategies that can be used to foster trainees’ sense of accountability for transferring new knowledge and skills, which ultimately promotes transfer. Those most relevant to the post-training climate include communicating post-training transfer expectations, managing other job responsibilities, promoting favorable attitudes about the learned competencies and the transfer context, and providing post-training support, such as formalized goal setting.
Finally, trained competencies are transferred more readily when trainees perceive a continuous learning culture. A continuous learning culture encourages the acquisition of new knowledge, skills, and attitudes by
reinforcing achievement and encouraging innovation and competition (Tracey et al., 1995). Moreover, a positive perception of organizational learning culture is tied to willingness to transfer learning (Banerjee, Gupta, & Bates, 2017), so when this climate is ingrained in an organization, learning becomes part of the daily work environment, and employees are more likely to use new competencies on the job. For a practically useful tool for facilitating training transfer, we direct readers to a recently published checklist (Hughes et al., 2018).

4.7 Job Aids

The transfer of training can also be facilitated through the use of job aids. Job aids are tools used to assist in the performance of a job or task (Swezey, 1987). Job aids are valuable because they reduce the amount of time employees need to spend in training and, in turn, the time they spend away from their jobs. They can also improve performance by reducing the cognitive load required to memorize job information, which frees up cognitive resources for other aspects of performance. Furthermore, job aids can be particularly useful in stressful environments in which critical components of a task are more likely to be forgotten or unintentionally omitted. Several types of job aids exist, including informational, procedural, and decision-making and coaching aids. Informational job aids contain material similar to that of on-the-job manuals and reference books. These materials are essential when job information is impossible to memorize (e.g., an aircraft maintenance manual). They are also used to reduce the cognitive demands (e.g., recall of memorized information) associated with performing the job. Informational job aids typically include facts relating to names, locations, dates, and times, or other declarative knowledge that is relevant to the job (Rossett & Gautier-Downes, 1991).
They are typically available in paper or electronic formats and can enhance performance by making pertinent job information easily accessible. Procedural job aids provide step-by-step instructions explaining how to complete a task (Swezey, 1987). They not only illustrate which actions to take in sequential order, but also tend to include feedback regarding what the results of each step should look like. Checklists, for example, are a type of procedural aid used to help employees remember and complete each component of their task. Although checklists have traditionally been provided in paper formats, many companies have adopted online checklists or administer them through other electronic media. Decision-making and coaching job aids provide heuristics, or references that guide employees to think in a prescribed way in order to optimize their decision making or find a solution to a problem (Rossett & Gautier-Downes, 1991). Unlike procedural aids, they do not provide detailed, sequential information; instead, they provide cues, such as ideas or questions, that guide the user toward the path leading to the optimal solution. The specific steps used to reach the solution can vary widely. Job aids have traditionally been implemented when employees are unsure about job information or how to complete a task. Decision-making and coaching aids, however, retain their utility at all stages and can be used before and after the specific time they are needed. As such, these types of job aids can also be considered training aids because they provide learning opportunities that can benefit future task performance. To develop a job aid, a task analysis must first be conducted to identify the knowledge, skills, equipment, and technical data required to perform the task (Swezey, 1987). This allows for the identification of the specific steps used to perform the task as well as the appropriate sequence of those steps.
Once this information has been gathered through task analysis, the type of
job aid can be determined and the tool can be fully developed. Upon completion, the job aid should be tested and modified to ensure its effectiveness. Further, it is imperative that job aids be updated as information, procedures, or decision-making processes change (Rossett & Gautier-Downes, 1991). Similar aids can also be developed for use during training. Training aids differ from job aids in that they are not used to complete a specific task on the job, but rather to support skill and knowledge acquisition during training. Specifically, training aids are documents, manuals, or devices designed to assist trainees in learning the competencies associated with a task or job (Swezey, 1987). Like job aids, training aids are increasingly available in computer-based formats as well as more traditional paper formats. The completion of training evaluation does not mark the end of the training process. Rather, organizations should work to establish a positive transfer climate and provide critical resources such as support systems, opportunities to perform, and job aids. If efforts are not made to facilitate the transfer of training, learned competencies will not be applied on the job, and the goals of the training program will ultimately not be met.
5 CONCLUSION

5.1 Future Directions

Researchers across disciplines and decades have contributed to the science of training as we know it today. In fact, the science is so developed that at times it can seem as if there is nothing left to discover; however, that could not be further from the truth. As we move forward in the twenty-first century, the modern workplace is undergoing a great deal of change that has implications for the way training is designed, delivered, and implemented (Bisbey et al., 2020; Cascio, 2019). We have discussed several of these changes throughout this chapter (e.g., technological advancements), but they bear mentioning again alongside a few others that we expect future training research to expound upon. For instance, the traditional training paradigm will change as the labor force becomes increasingly diverse. New training content and delivery methods may be needed so that training systems can meet the needs of a wider audience, such as more culturally diverse work groups and older employees. We anticipate new technological solutions that allow learners to control the features of the training design that suit them best. For example, older learners may benefit more from self-paced training with greater structure, which can help them gain as much from a training event as younger learners do (Moseley & Dessinger, 2007). Moreover, the millennial generation is becoming a larger segment of the workforce, and researchers have found that millennials have different expectations than older workers, including quicker career advancement and a desire to develop the skills necessary to achieve it (Ng et al., 2010). Researchers should continue investigating flexible instructional designs and learner control so that all types of workers can benefit from training.

5.1.1 Summary

Not only are workers becoming more diverse, but there is also a host of new jobs in a new industry: the gig economy.
The gig economy refers to industries and organizations in which workers are not technically employees, but rather contractors doing business in the company’s name (e.g., Uber, Lyft, Instacart; McGovern, 2017). In the traditional workplace, the organization is responsible for providing employees with opportunities to learn, but this arrangement may not be the norm in the gig economy. In these unique, but increasingly popular,
work situations, it is unclear who is responsible for worker development. It may be that workers must take the initiative to develop their own KSAs. Researchers should investigate the individual differences that may affect learning effectiveness and subsequent job performance in the gig economy. A well-trained workforce is critical to the success of most organizations. To develop their employees’ competencies and maintain a competitive advantage, organizations should place a heavy emphasis on training. The aim of this chapter was to provide an updated review of the foundational science of training and to offer further guidance on the design, delivery, and evaluation of training programs. We continue to encourage training designers to take a systematic approach to the training process by carefully considering each component involved. Organizations have much to gain from using the science of training that has been developed and refined by scientists and professionals spanning multiple fields.
ACKNOWLEDGMENTS This material is based upon work supported in part by grants NNX16AP96G and NNX16AB08G from the National Aeronautics and Space Administration (NASA) to Rice University, as well as grant NNX17AB55G from NASA to Rice University via Johns Hopkins University (PI: Michael Rosen).
REFERENCES

Aguinis, H., & Kraiger, K. (2009). Benefits of training and development for individuals and teams, organizations, and society. Annual Review of Psychology, 60, 451–474. Arthur, W., Bennett, W., Edens, P. S., & Bell, S. T. (2003). Effectiveness of training in organizations: A meta-analysis of design and evaluation features. Journal of Applied Psychology, 88, 234–245. Arthur, W., Young, B., Jordan, J. A., & Shebilske, W. L. (1996). Effectiveness of individual and dyadic training protocols: The influence of trainee interaction anxiety. Human Factors, 38, 79–86. Arvey, R. D., Salas, E., & Gialluca, K. A. (1992). Using task inventories to forecast skills and abilities. Human Performance, 5, 171–190. Baldwin, T. T., & Ford, J. K. (1988). Transfer of training: A review and directions for future research. Personnel Psychology, 41, 63–105. Baldwin, T. T., Ford, J. K., & Blume, B. D. (2009). Transfer of training 1988–2008: An updated review and agenda for future research. International Review of Industrial and Organizational Psychology, 24, 41–70. Bandura, A. (1982). Self-efficacy mechanism in human agency. American Psychologist, 37, 122–147. Banerjee, P., Gupta, R., & Bates, R. (2017). Influence of organizational learning culture on knowledge worker’s motivation to transfer training: Testing moderating effects of learning transfer climate. Current Psychology: A Journal for Diverse Perspectives on Diverse Psychological Issues, 36(3), 606–617. Bass, B. M., & Avolio, B. J. (1990). Developing transformational leadership: 1992 and beyond. Journal of European Industrial Training, 14(5). Bausch, S., Michel, A., & Sonntag, K. (2014). How gender influences the effect of age on self-efficacy and training success. International Journal of Training and Development, 18, 171–187. Beer, M. (1980). Organization change and development: A systems view. Glenview, IL: Foreman. Bell, B. S., & Kozlowski, S. W. (2008).
Active learning: Effects of core training design elements on self-regulatory processes, learning, and adaptability. Journal of Applied Psychology, 93(2), 296.
Bell, B. S., Tannenbaum, S. I., Ford, J. K., Noe, R. A., & Kraiger, K. (2017). 100 years of training and development research: What we know and where we should go. Journal of Applied Psychology, 102(3), 305–323. Bennett, J. M. (1986). Modes of cross-cultural training: Conceptualizing cross-cultural training as education. International Journal of Intercultural Relations, 10, 117–134. Bezrukova, K., Spell, C. S., Perry, J. L., & Jehn, K. A. (2016). A meta-analytical integration of over 40 years of research on diversity training evaluation. Psychological Bulletin, 142, 1227–1274. Bhawuk, D. P. S. (2001). Evolution of culture assimilators: Toward theory-based assimilators. International Journal of Intercultural Relations, 25, 141–163. Bisbey, T. M., Reyes, D. L., Traylor, A. M., & Salas, E. (2019). Teams of psychologists helping teams: The evolution of the science of team training. American Psychologist, 74, 278–289. Bisbey, T. M., Traylor, A., & Salas, E. (2020). Implications of the changing nature of work for training. In B. Hoffman, M. Shoss, & L. Wegman (Eds.), Cambridge handbook of the changing nature of work. Cambridge: Cambridge University Press. Blickensderfer, E. L., Cannon-Bowers, J. A., & Salas, E. (1997). Theoretical bases for team self-correction: Fostering shared mental models. In M. Beyerlein, D. Johnson, & S. Beyerlein (Eds.), Advances in interdisciplinary studies in work teams series (Vol. 4, pp. 249–279). Greenwich, CT: JAI Press. Blume, B. D., Ford, J. K., Baldwin, T. T., & Huang, J. L. (2010). Transfer of training: A meta-analytic review. Journal of Management, 36(4), 1065–1105. Blume, B. D., Ford, J. K., Surface, E. A., & Olenick, J. (2019). A dynamic model of training transfer. Human Resource Management Review, 29, 270–283. Bowers, C. A., Blickensderfer, E. L., & Morgan, B. B. (1998). Air traffic control specialist team coordination. In M. W. Smolensky & E. S.
Stein (Eds.), Human factors in air traffic control (pp. 215–236). San Diego, CA: Academic Press. Brannick, M. T., Prince, C., & Salas, E. (2002). The reliability of instructor evaluations of crew performance: Good news and not so good news. International Journal of Aviation Psychology, 12, 241–261. Brown, K. G., & Ford, J. K. (2002). Using computer technology in training: Building an infrastructure for active learning. In K. Kraiger (Ed.), Creating, implementing, and managing effective training and development (pp. 192–233). San Francisco, CA: Jossey-Bass. Brown, K. G., Howardson, G., & Fisher, S. L. (2016). Learner control and e-learning: Taking stock and moving forward. Annual Review of Organizational Psychology and Organizational Behavior, 3, 267–291. Buckley, R., & Caple, J. (2008). Training objectives. In The theory & practice of training (5th ed., pp. 116–131). London: Kogan Page. Buller, P. F. (1986). The team building-task performance relation: Some conceptual and methodological refinements. Group and Organizational Studies, 11, 147–168. Burke, C. S., Hess, K. P., Priest, H. A., Rosen, M., Salas, E., Paley, M., & Riedel, S. (2005). Facilitating leadership in a global community: A training tool for multicultural team leaders. Paper No. 2294, presented at Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC), Orlando, FL (pp. 1–12). Burke, L. A., & Hutchins, H. M. (2007). Training transfer: An integrative literature review. Human Resource Development Review, 6, 263–296. Burke, L. A., & Hutchins, H. M. (2008). Study of best practices in training transfer and proposed model of transfer. Human Resource Development Quarterly, 19, 107–128. Burke, L. A., & Saks, A. M. (2009). Accountability in training transfer: Adapting Schlenker’s model of responsibility to a persistent but solvable problem. Human Resource Development Review, 8(3), 382–402. Burke, M. J., Chan-Serafin, S., Salvador, R., Smith, A., & Sarpy, S. A. (2008).
The role of national culture and organizational climate in
safety training effectiveness. European Journal of Work and Organizational Psychology, 17, 133–152. Burke, M. J., & Day, R. R. (1986). A cumulative study of the effectiveness of managerial training. Journal of Applied Psychology, 71, 232–245. Campbell, J. P. (1971). Personnel training and development. Annual Review of Psychology, 22, 565–602. Cannon-Bowers, J. A., & Salas, E. (1997). A framework for measuring team performance measures in training. In M. T. Brannick, E. Salas, & C. Prince (Eds.), Team performance assessment and measurement: Theory, methods, and applications (pp. 45–62). Mahwah, NJ: Lawrence Erlbaum Associates. Cannon-Bowers, J. A., Salas, E., Tannenbaum, S. I., & Mathieu, J. E. (1995). Toward theoretically-based principles of trainee effectiveness: A model and initial empirical investigation. Military Psychology, 7, 141–164. Capdeferro, N., & Romero, M. (2012). Are online learners frustrated with collaborative learning experiences? International Review of Research in Open and Distributed Learning, 13(2), 26–44. Cascio, W. F. (2019). Training trends: Macro, micro, and policy issues. Human Resource Management Review, 29, 284–297. Cerasoli, C. P., Alliger, G. M., Donsbach, J. S., Mathieu, J. E., Tannenbaum, S. I., & Orvis, K. A. (2018). Antecedents and outcomes of informal learning behaviors: A meta-analysis. Journal of Business and Psychology, 33, 203–230. Chen, G., & Klimoski, R. J. (2007). Training and development of human resources at work: Is the state of our science strong? Human Resource Management Review, 17, 180–190. Cheng, B., Wang, M., Moormann, J., Olaniran, B. A., & Chen, N. S. (2012). The effects of organizational learning environment factors on e-learning acceptance. Computers & Education, 58, 885–899. Cheramie, R. A., & Simmering, M. J. (2010). Improving individual learning for trainees with low conscientiousness. Journal of Managerial Psychology, 25, 44–57. Chiaburu, D. S., & Lindsay, D. R. (2008). Can do or will do?
The importance of self-efficacy and instrumentality for training transfer. Human Resource Development International, 11, 199–206. Clarke, N. (2002). Job/work environment factors influencing training transfer within a human service agency: Some indicative support for Baldwin and Ford’s transfer climate construct. International Journal of Training and Development, 6, 146–162. Collins, L. M., Dziak, J. J., & Li, R. (2009). Design of experiments with multiple independent variables: A resource management perspective on complete and reduced factorial designs. Psychological Methods, 14(3), 202–224. Collins, N. M., & Pieterse, A. L. (2007). Critical incident analysis based training: An approach for developing active racial/cultural awareness. Journal of Counseling & Development, 85, 14–23. Colquitt, J. A., LePine, J. A., & Noe, R. A. (2000). Toward an integrative theory of training motivation: A meta-analytic path analysis of 20 years of research. Journal of Applied Psychology, 85, 678–707. Cooke, N. J. (1999). Knowledge elicitation. In F. T. Durso, R. S. Nickerson, R. W. Schvaneveldt, S. T. Dumais, D. S. Lindsay, & M. T. H. Chi (Eds.), Handbook of applied cognition (pp. 479–509). New York: Wiley. Coultas, C. W., Grossman, R., & Salas, E. (2012). Design, delivery, evaluation, and transfer of training systems. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (4th ed., pp. 490–533). Hoboken, NJ: Wiley. Deloitte, Touche, Tohmatsu, Ltd. (2016). Global human capital trends 2016. Retrieved from https://www2.deloitte.com/content/dam/Deloitte/global/Documents/HumanCapital/gx-dup-globalhuman-capital-trends-2016.pdf (on June 20, 2020). Dobbin, F., & Kalev, A. (2016). Why diversity programs fail. Harvard Business Review, 94(7), 14–23. Dormann, T., & Frese, M. (1994). Error training: Replication and the function of exploratory behavior. International Journal of Human-Computer Interaction, 6(4), 365–372.
Driskell, J. E., Salas, E., Johnston, J. H., & Wollert, T. N. (2008). Stress exposure training: An event-based approach. In P. A. Hancock & J. L. Szalma (Eds.), Performance under stress (pp. 271–286). Farnham: Ashgate. Driskell, T., Sclafani, S., & Driskell, J. E. (2014). Reducing the effects of game day pressures through stress exposure training. Journal of Sport Psychology in Action, 5, 28–43. Dyer, W. G. (1977). Team building: Issues and alternatives. Reading, MA: Addison-Wesley. Dyre, L., Tabor, A., Ringsted, C., & Tolsgaard, M. G. (2017). Imperfect practice makes perfect: Error management training improves transfer of learning. Medical Education, 51, 196–206. Entin, E. E., & Serfaty, D. (1999). Adaptive team coordination. Human Factors, 41, 312–325. Feldman, D. (1988). Managing careers in organizations. Glenview, IL: Foreman. Festner, D., & Gruber, H. (2008). Conditions of work environments in fostering transfer of training. In S. Billett, C. Harteis, & A. Eteläpelto (Eds.), Emerging perspectives of workplace learning (pp. 215–228). Rotterdam, The Netherlands: Sense Publishers. Finseth, T. T., Keren, N., Dorneich, M. C., Franke, W. D., Anderson, C. C., & Shelley, M. C. (2018). Evaluating the effectiveness of graduated stress exposure in virtual spaceflight hazard training. Journal of Cognitive Engineering and Decision Making, 12(4), 248–268. Ford, J. K., Baldwin, T. T., & Prasad, J. (2018). Transfer of training: The known and the unknown. Annual Review of Organizational Psychology and Organizational Behavior, 5, 201–225. Ford, J. K., Kozlowski, S., Kraiger, K., Salas, E., & Teachout, M. (Eds.). (1997). Improving training effectiveness in work organizations. Mahwah, NJ: Lawrence Erlbaum Associates. Ford, J. K., & Noe, R. A. (1987). Self-assessed training needs: The effects of attitudes toward training, managerial level, and function. Personnel Psychology, 40, 39–53. Ford, J. K., Quinones, M. A., Sego, D. J., & Sorra, J. S. (1992).
Factors affecting the opportunity to perform trained tasks on the job. Personnel Psychology, 45, 511–527. Ford, J. K., Smith, E. M., Weissbein, D. A., Gully, S. M., & Salas, E. (1998). Relationships of goal orientation, metacognitive activity, and practice strategies with learning outcomes and transfer. Journal of Applied Psychology, 83, 218–233. Fowler, S. M., & Pusch, M. D. (2010). Intercultural simulation games: A review (of the United States and beyond. Simulation Gaming, 41(1), 94–115. Fowlkes, J., Dwyer, D. J., Oser, R. L., & Salas, E. (1998). Event-based approach to training (EBAT). International Journal of Aviation Psychology, 8(3), 209–221. Frese, M., Brodbeck, F., Heinbokel, T., Mooser, C., Schleiffenbaum, E., & Thiemann, P. (1991). Errors in training computer skills: On the positive function of errors. Human–Computer Interaction, 6, 77–93. Ghiselli, E. E. (1966). The validity of occupational aptitude tests. New York: Wiley. Ghodsian, D., Bjork, R., & Benjamin, A. (1997). Evaluating training during training: Obstacles and opportunities. In M. A. Quinones & A. Ehrenstein (Eds.), Training for a rapidly changing workplace: Applications of psychological research (pp. 63–88). Washington, DC: American Psychological Association. Gilpin-Jackson, Y., & Busche, G. R. (2007). Leadership development training transfer: A case study of post-training determinants. Journal of Management Development, 26, 980–1004. Glerum, D. R., Joseph, D. L., McKenny, A. F., & Fritzsche, B. A. (2020). The trainer matters: Cross-classified models of trainee reactions. Journal of Applied Psychology. Advance online publication. Goettl, B. P., Yadrick, R. M., Connolly-Gomez, C., Regian, W. J., & Shebilske, W. L. (1996). Alternating task modules in isochronal distributed training of complex tasks. Human Factors, 38, 330–346.
Goldstein, I. L. (1993). Training in organizations (3rd ed.). Pacific Grove, CA.
Goldstein, I. L., & Ford, J. K. (2002). Training in organizations: Needs assessment, development, and evaluation (4th ed.). Belmont, CA: Wadsworth.
Gorman, J. C., Cooke, N. J., & Amazeen, P. G. (2010). Training adaptive teams. Human Factors, 52, 295–307.
Grossman, R., & Burke-Smalley, L. A. (2018). Context-dependent accountability strategies to improve the transfer of training: A proposed theoretical model and research propositions. Human Resource Management Review, 28(2), 234–247.
Hawley, J. D., & Barnard, J. K. (2005). Work environment characteristics and implications for training transfer: A case study of the nuclear power industry. Human Resource Development International, 8, 65–80.
Holt, R. W., Hansberger, J. T., & Boehm-Davis, D. A. (2002). Improving rater calibration in aviation: A case study. International Journal of Aviation Psychology, 12, 305–330.
Howard, M. C., & Jacobs, R. R. (2016). The multiphase optimization strategy (MOST) and the sequential multiple assignment randomized trial (SMART): Two novel evaluation methods for developing optimal training programs. Journal of Organizational Behavior, 37(8), 1246–1270.
Hughes, A. M., Gregory, M. E., Joseph, D. L., Sonesh, S. C., Marlow, S. L., Lacerenza, C. N., Benishek, L. E., King, H. B., & Salas, E. (2016). Saving lives: A meta-analysis of team training in healthcare. Journal of Applied Psychology, 101, 1266–1304.
Hughes, A. M., Zajac, S., Spencer, J. M., & Salas, E. (2018). A checklist for facilitating training transfer in organizations. International Journal of Training and Development, 22, 334–345.
Hughes, A. M., Zajac, S., Woods, A. L., & Salas, E. (2020). The role of work environment in training sustainment: A meta-analysis. Human Factors, 62, 166–183.
Ivancic, K., & Hesketh, B. (1995). Making the best of errors during training. Training Research Journal, 1, 103–125.
Kealey, D. J., & Protheroe, D. R. (1996). The effectiveness of cross-cultural training for expatriates: An assessment of the literature on the issue. International Journal of Intercultural Relations, 20, 141–165.
Keith, N., & Frese, M. (2008). Effectiveness of error management training: A meta-analysis. Journal of Applied Psychology, 93, 59–69.
Kirkpatrick, D. L. (1976). Evaluation of training. In R. L. Craig (Ed.), Training and development handbook: A guide to human resource development (2nd ed., pp. 1–26). New York: McGraw-Hill.
Klein, G., & Militello, L. (2001). Some guidelines for conducting a cognitive task analysis. In E. Salas (Ed.), Advances in human performance and cognitive engineering research (pp. 161–199). Amsterdam: Elsevier Science.
Knapp, D. E., & Kustis, G. A. (1996). The real ‘disclosure’: Sexual harassment and the bottom line. In M. S. Stockdale (Ed.), Sexual harassment in the workplace: Perspectives, frontiers, and response strategies (pp. 199–213). Thousand Oaks, CA: Sage Publications.
Kozlowski, S. W. J., Gully, S. M., Salas, E., & Cannon-Bowers, J. A. (1996). Team leadership and development: Theory, principles, and guidelines for training leaders and teams. In M. Beyerlein, S. Beyerlein, & D. Johnson (Eds.), Advances in interdisciplinary studies of work teams: Team leadership (Vol. 3, pp. 253–292). Greenwich, CT: JAI Press.
Kozlowski, S. W. J., & Salas, E. (Eds.). (2010). Learning, training, and development in organizations. New York: Taylor & Francis Group.
Kraiger, K. (2002). Decision-based evaluation. In K. Kraiger (Ed.), Creating, implementing, and maintaining effective training and development: State-of-the-art lessons for practice (pp. 331–375). San Francisco, CA: Jossey-Bass.
Kraiger, K., Ford, J. K., & Salas, E. (1993). Application of cognitive, skill-based, and affective theories of learning outcomes to new methods of training evaluation. Journal of Applied Psychology, 78, 311–328.
Lacerenza, C. N., Reyes, D. L., Marlow, S. L., Joseph, D. L., & Salas, E. (2017). Leadership training design, delivery, and implementation: A meta-analysis. Journal of Applied Psychology, 102, 1686–1718.
Landers, R. N., & Armstrong, M. B. (2017). Enhancing instructional outcomes with gamification: An empirical test of the Technology-Enhanced Training Effectiveness Model. Computers in Human Behavior, 71, 499–507.
Latham, G. P. (1989). Behavioral approaches to the training and learning process. In I. L. Goldstein (Ed.), Training and development in organizations (pp. 256–295). San Francisco, CA: Jossey-Bass.
Lee, H., Parsons, D., Kwon, G., Kim, J., Petrova, K., Jeong, E., & Ryu, H. (2016). Cooperation begins: Encouraging critical thinking skills through cooperative reciprocity using a mobile learning game. Computers & Education, 97, 97–115.
Levett-Jones, T., & Lapkin, S. (2014). A systematic review of the effectiveness of simulation debriefing in health professional education. Nurse Education Today, 34(6), e58–e63.
Lewin, K. (1951). Field theory in social science: Selected theoretical papers. New York: Harper & Row.
Lim, D. H., & Morris, M. L. (2006). Influence of trainee characteristics, instructional satisfaction, and organizational climate on perceived learning and training transfer. Human Resource Development Quarterly, 17, 85–115.
LinkedIn Learning Solutions. (2017). 2017 workplace learning report. Retrieved from https://learning.linkedin.com/content/dam/me/learning/en-us/pdfs/lil-workplacelearning-report.pdf (on June 30, 2017).
Littrell, L. N., & Salas, E. (2005). A review of cross-cultural training: Best practices, guidelines, and research needs. Human Resource Development Review, 4, 305–334.
Loh, V., Andrews, S., Hesketh, B., & Griffin, B. (2013). The moderating effect of individual differences in error-management training: Who learns from mistakes? Human Factors, 55, 435–448.
MacNab, B. R. (2012). An experiential approach to cultural intelligence education. Journal of Management Education, 36(1), 66–94.
Marks, M. A. (2000). A critical analysis of computer simulations for conducting team research. Small Group Research, 31, 653–675.
Marks, R. B., Sibley, S. D., & Arbaugh, J. B. (2005). A structural equation model of predictors for effective online learning. Journal of Management Education, 29, 531–563.
Mathieu, J. E., Tannenbaum, S. I., & Salas, E. (1992). Influences of individual and situational characteristics on measures of training effectiveness. Academy of Management Journal, 35, 828–847.
Mathieu, J. E., & Tesluk, P. E. (2010). A multilevel perspective on training and development effectiveness. In S. W. J. Kozlowski & E. Salas (Eds.), Learning, training, and development in organizations (pp. 405–440). New York: Taylor and Francis Group.
McClernon, C. K., McCauley, M. E., O’Connor, P. E., & Warm, J. S. (2011). Stress training improves performance during a stressful flight. Human Factors, 53, 207–218.
McGovern, M. (2017). Thriving in the gig economy: How to capitalize and compete in the new world of work. Wayne, NJ: Career Press.
Miller, J. E., Patterson, E. S., & Woods, D. D. (2006). Elicitation by critiquing as a cognitive task analysis methodology. Cognition, Technology & Work, 8, 90–102.
Moe, M. T., & Blodget, H. (2000). The knowledge web, Part 3, Higher web universities online. New York: Merrill Lynch & Co.
Mor, S., Morris, M. W., & Joh, J. (2013). Identifying and training adaptive cross-cultural management skills: The crucial role of cultural metacognition. Academy of Management Learning & Education, 12, 453–475.
Morris, M. A., & Robie, C. (2001). A meta-analysis of the effects of cross-cultural training on expatriate performance and adjustment. International Journal of Training and Development, 5, 112–125.
Moseley, J. L., & Dessinger, J. C. (2007). Training older workers and learners: Maximizing the performance of an aging workforce. San Francisco, CA: Wiley.
Munro, C. R. (2009). Mentoring needs and expectations of generation-Y human resources practitioners: Preparing the next wave of strategic business partners. Journal of Management Research, 1, 1–25.
Murthy, N. N., Challagalla, G. N., Vincent, L. H., & Shervani, T. A. (2008). The impact of simulation training on call center agent performance: A field-based investigation. Management Science, 54, 384–399.
National Safety Council. (2010). Summary from Injury Facts, 2010 Edition. http://www.nsc.org/news_resources/injury_and_death_statistics/Documents/Summary_2010_Ed.pdf
National Safety Council. (2020). Work injury costs. https://injuryfacts.nsc.org/work/costs/work-injury-costs/
Ng, E. S. W., Schweitzer, L., & Lyons, S. T. (2010). New generation, great expectations: A field study of the millennial generation. Journal of Business & Psychology, 25, 281–292.
Noe, R. A., Clarke, A. D. M., & Klein, H. J. (2014). Learning in the twenty-first-century workplace. Annual Review of Organizational Psychology and Organizational Behavior, 1(1), 245–275.
Noe, R. A., & Schmitt, N. (1986). The influence of trainee attitudes on training effectiveness: Test of a model. Personnel Psychology, 39, 497–523.
Orvis, K. A., Fisher, S. L., & Wasserman, M. E. (2009). Power to the people: Using learner control to improve trainee reactions and learning in web-based instructional environments. Journal of Applied Psychology, 94, 960–971.
Orvis, K. A., Horn, D. B., & Belanich, J. (2008). The roles of task difficulty and prior videogame experience on performance and motivation in instructional videogames. Computers in Human Behavior, 24, 2415–2433.
Oser, R. L., Cannon-Bowers, J. A., Salas, E., & Dwyer, D. J. (1999). Enhancing human performance in technology-rich environments: Guidelines for scenario-based training. In E. Salas (Ed.), Human/technology interaction in complex systems (pp. 175–202). Greenwich, CT: JAI Press.
Pashler, H., McDaniel, M., Rohrer, D., & Bjork, R. (2008). Learning styles: Concepts and evidence. Psychological Science in the Public Interest, 9, 104–119.
Prince, C., Oser, R., Salas, E., & Woodruff, W. (1993). Increasing hits and reducing misses in CRM/LOS scenarios: Guidelines for simulator scenario development. International Journal of Aviation Psychology, 3, 69–82.
Quinones, M. A. (1997). Contextual influences on training effectiveness. In M. A. Quinones & A. Ehrenstein (Eds.), Training for a rapidly changing workplace: Applications of psychological research (pp. 177–200). Washington, DC: American Psychological Association.
Roberson, L., Kulik, C. T., & Pepper, M. B. (2009). Individual and environmental factors influencing the use of transfer strategies after diversity training. Group & Organization Management, 34, 67–89.
Roberson, L., Kulik, C. T., & Tan, R. Y. (2012). Effective diversity training. In Q. M. Roberson (Ed.), The Oxford handbook of diversity and work (pp. 341–365). New York: Oxford University Press.
Roorda, D. L., Jak, S., Zee, M., Oort, F. J., & Koomen, H. M. Y. (2017). Affective teacher–student relationships and students’ engagement and achievement: A meta-analytic update and test of the mediating role of engagement. School Psychology Review, 46, 239–261.
Roosa, M. W., Dumka, L. E., Gonzales, N. A., & Knight, G. P. (2002). Cultural/ethnic issues and the prevention scientist in the 21st century. Prevention and Treatment, 5(1), 5a.
Rossett, A., & Gautier-Downes, J. (1991). A handbook of job aids. San Francisco, CA: Jossey-Bass.
Rouiller, J. Z., & Goldstein, I. L. (1993). The relationship between organizational transfer climate and positive transfer of training. Human Resource Development Quarterly, 4, 377–390.
Rowold, J. (2007). The impact of personality on training-related aspects of motivation: Test of a longitudinal model. Human Resource Development Quarterly, 18, 9–31.
Saks, A. M., & Belcourt, M. (2006). An investigation of training activities and transfer of training in organizations. Human Resource Management, 45, 629–648.
Saks, A. M., & Burke-Smalley, L. A. (2014). Is transfer of training related to firm performance? International Journal of Training and Development, 18, 104–115.
Salas, E., Bisbey, T. M., Traylor, A. M., & Rosen, M. A. (2020). Can teamwork promote safety in organizations? Annual Review of Organizational Psychology & Organizational Behavior, 7, 283–313.
Salas, E., & Cannon-Bowers, J. A. (2000). Designing training systematically. In E. A. Locke (Ed.), Handbook of principles of organizational behavior (pp. 43–59). Malden, MA: Blackwell.
Salas, E., & Cannon-Bowers, J. A. (2001). The science of training: A decade of progress. Annual Review of Psychology, 52, 471–499.
Salas, E., Cannon-Bowers, J. A., & Johnston, J. H. (1997). How can you turn a team of experts into an expert team? Emerging training strategies. In C. E. Zsambok & G. Klein (Eds.), Naturalistic decision making (pp. 359–370). Mahwah, NJ: Lawrence Erlbaum Associates.
Salas, E., & Klein, G. (Eds.). (2000). Linking expertise and naturalistic decision making. Mahwah, NJ: Lawrence Erlbaum Associates.
Salas, E., Priest, H., Wilson, K., & Burke, C. S. (2006). Scenario-based training: Improving military mission performance and adaptability. In A. B. Adler, C. A. Castro, & T. W. Britt (Eds.), Military life: The psychology of serving in peace and combat (Vol. 2, pp. 32–53). Westport, CT: Praeger Security International.
Salas, E., Rozell, D., Mullen, B., & Driskell, J. E. (1999). The effect of team building on performance: An integration. Small Group Research, 30, 309–329.
Salas, E., & Stagl, K. C. (2009). Design training systematically and follow the science of training. In E. A. Locke (Ed.), Handbook of principles of organizational behavior: Indispensable knowledge for evidence-based management (2nd ed., pp. 59–84). Hoboken, NJ: Wiley.
Salas, E., Stagl, K. C., Burke, C. S., & Goodwin, G. F. (2007). Fostering team effectiveness in organizations: Toward an integrative theoretical framework of team performance. In J. W. Shuart, W. Spaulding, & J. Poland (Eds.), Modeling complex systems: Motivation, cognition and social processes, Nebraska Symposium on Motivation, Vol. 52. Lincoln, NE: University of Nebraska.
Salas, E., Wilson, K. A., Priest, H. A., & Guthrie, J. (2006). Training in organizations: The design, delivery, and evaluation of training systems. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (3rd ed., pp. 472–512). Hoboken, NJ: Wiley.
Sanchez-Burks, J., Lee, F., Nisbett, R., & Ybarra, O. (2007). Cultural training based on a theory of relational ideology. Basic and Applied Social Psychology, 29, 257–268.
Satish, U., & Streufert, S. (2002). Value of a cognitive simulation in medicine: Towards optimizing decision making performance of healthcare personnel. Quality and Safety in Health Care, 11, 163–167.
Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of practice: Common principles in three paradigms suggest new concepts for training. Psychological Science, 3, 207–217.
Seddon, J. (2008). Vets and videos: Student learning from context-based assessment in a pre-clinical science course. Assessment & Evaluation in Higher Education, 33(5), 559–566.
Seidel, T., & Shavelson, R. J. (2007). Teaching effectiveness research in the past decade: The role of theory and research design in disentangling meta-analysis results. Review of Educational Research, 77, 454–499.
Serfaty, D., Entin, E. E., Johnston, J. H., & Cannon-Bowers, J. A. (1998). Team coordination training. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 221–245). Washington, DC: American Psychological Association.
Shay, J. P., & Tracey, J. B. (2009). Expatriate adjustment and effectiveness: The mediating role of managerial practices. Journal of International Management, 15, 401–412.
Shute, V. J., & Gawlick, L. A. (1995). Practice effects on skill acquisition, learning outcome, retention, and sensitivity to relearning. Human Factors, 37, 781–803.
Sitzmann, T., Brown, K. G., Casper, W. J., Ely, K., & Zimmerman, R. D. (2008). A review and meta-analysis of the nomological network of trainee reactions. Journal of Applied Psychology, 93, 280–295.
Sitzmann, T., Kraiger, K., Stewart, D., & Wisher, R. (2006). The comparative effectiveness of web-based and classroom instruction: A meta-analysis. Personnel Psychology, 59, 623–664.
Sitzmann, T., & Weinhardt, J. M. (2019). Approaching evaluation from a multilevel perspective: A comprehensive analysis of the indicators of training effectiveness. Human Resource Management Review, 29(2), 253–269.
Smith, A., & Smith, E. (2007). The role of training in the development of human resource management in Australian organizations. Human Resource Development International, 10, 263–279.
Smith, E. M., Ford, J. K., & Kozlowski, S. W. J. (1997). Building adaptive expertise: Implications for training design strategies. In M. A. Quinones & A. Ehrenstein (Eds.), Training for a rapidly changing workplace: Applications of psychological research (pp. 89–118). Washington, DC: American Psychological Association.
Smith-Jentsch, K. A., Zeisig, R. L., Acton, B., & McPherson, J. A. (1998). Team dimensional training: A strategy for guided team self-correction. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 271–297). Washington, DC: American Psychological Association.
Stanhope, D. S., Pond, S. B., III, & Surface, E. A. (2013). Core self-evaluations and training effectiveness: Prediction through motivational intervening mechanisms. Journal of Applied Psychology, 98(5), 820–831.
Swezey, R. W. (1987). Design of job aids and procedure writing. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (pp. 1039–1057). New York: Wiley.
Tannenbaum, S. I., Beard, R. L., McNall, L. A., & Salas, E. (2010). Informal learning and development in organizations. In S. W. J. Kozlowski & E. Salas (Eds.), Learning, training, and development in organizations (pp. 303–332). New York: Taylor and Francis Group.
Tannenbaum, S. I., Cannon-Bowers, J. A., & Mathieu, J. E. (1993). Factors that influence training effectiveness: A conceptual model and longitudinal analysis. Report 93-011, Naval Training Systems Center, Orlando, FL.
Tannenbaum, S. I., Mathieu, J. E., Salas, E., & Cannon-Bowers, J. A. (1991). Meeting trainees’ expectations: The influence of training fulfillment on the development of commitment. Journal of Applied Psychology, 76, 759–769.
Tannenbaum, S. I., & Yukl, G. (1992). Training and development in work organizations. Annual Review of Psychology, 43, 399–441.
Taylor, P. J., Russ-Eft, D. F., & Chan, D. W. L. (2005). A meta-analytic review of behavior modeling training. Journal of Applied Psychology, 90, 692–709.
Thayer, P. W., & Teachout, M. S. (1995). A climate for transfer model. Report AL/HR-TP-1995-0035, Air Force Materiel Command, Brooks Air Force Base, TX.
Thomas, D. C. (1999). Cultural diversity and work group effectiveness. Journal of Cross-Cultural Psychology, 30, 242–263.
Thomas, V. J., & Cohn, T. (2006). Communication skills and cultural awareness courses for healthcare professionals who care for patients with sickle cell disease. Issues and Innovations in Nursing Education, 1, 480–488.
Tichon, J. (2007). Training cognitive skills in virtual reality: Measuring performance. CyberPsychology & Behavior, 10, 286–289.
Tracey, J. B., Hinkin, T. R., Tannenbaum, S. I., & Mathieu, J. E. (2001). The influence of individual characteristics and the work environment on varying levels of training outcomes. Human Resource Development Quarterly, 12, 5–24.
Tracey, J. B., Tannenbaum, S. I., & Kavanagh, M. J. (1995). Applying trained skills on the job: The importance of the work environment. Journal of Applied Psychology, 80, 239–252.
van Merriënboer, J., Kester, L., & Paas, F. (2006). Teaching complex rather than simple tasks: Balancing intrinsic and germane load to enhance transfer of learning. Applied Cognitive Psychology, 20, 343–352.
Vecchi, G. M., Van Hasselt, V. B., & Romano, S. J. (2005). Crisis (hostage) negotiation: Current strategies and issues in high-risk conflict resolution. Aggression and Violent Behavior, 10(5), 533–551.
Velada, R., Caetano, A., Michel, J. W., Lyons, B. D., & Kavanagh, M. J. (2007). The effects of training design, individual characteristics and work environment on transfer of training. International Journal of Training and Development, 11, 282–294.
Villachica, S. W., & Stone, D. L. (2010). Cognitive task analysis: Research and experience. In K. H. Silber & W. R. Foshay (Eds.), Handbook of improving performance in the workplace: Instructional design and training delivery (pp. 227–258). San Francisco, CA: Pfeiffer.
Volpe, C. E., Cannon-Bowers, J. A., Salas, E., & Spector, P. E. (1996). The impact of cross-training on team functioning: An empirical investigation. Human Factors, 38, 87–100.
Williams, T. C., Thayer, P. W., & Pond, S. B. (1991). Test of a model of motivational influences on reactions to training and learning. Paper presented at the 6th Annual Conference of the Society for Industrial and Organizational Psychology, St. Louis, MO.
Wolfson, M. A., Mathieu, J. E., Tannenbaum, S. I., & Maynard, M. T. (2019). Informal field-based learning and work design. Journal of Applied Psychology, 104, 1283–1295.
Wolfson, M. A., Tannenbaum, S. I., Mathieu, J. E., & Maynard, M. T. (2018). A cross-level investigation of informal field-based learning and performance improvements. Journal of Applied Psychology, 103, 14–36.
Wydra, F. T. (1980). Learner controlled instruction. Englewood Cliffs, NJ: Educational Technology Publications.
Yang, H., Sackett, P. R., & Arvey, R. D. (1996). Statistical power and cost in training evaluation: Some new considerations. Personnel Psychology, 49, 651–668.
CHAPTER 17
SITUATION AWARENESS
Mica R. Endsley
SA Technologies, LLC
Gold Canyon, Arizona
1 INTRODUCTION
2 SITUATION AWARENESS DEFINED
2.1 Level 1: Perception of the Elements in the Environment
2.2 Level 2: Comprehension of the Current Situation
2.3 Level 3: Projection of the Future
2.4 SA Levels Are Not Linear
2.5 Elements of Situation Awareness
2.6 Sources of SA
2.7 Role of SA in Decision Making
3 SITUATION AWARENESS MODEL
3.1 Pre-Attentive Processes
3.2 Working Memory and Attention
3.3 Long-term Memory
3.4 Alternating Data-Driven and Goal-Driven Processing
3.5 Expertise in SA
3.6 Summary
4 SITUATION AWARENESS CHALLENGES
4.1 Attentional Tunneling
4.2 Requisite Memory Trap
4.3 Workload, Anxiety, Fatigue, and Other Stressors (WAFOS)
4.4 Data Overload
4.5 Misplaced Salience
4.6 Complexity Creep
4.7 Errant Mental Models
4.8 Out-of-the-Loop Syndrome
5 SITUATION AWARENESS IN TEAMS
5.1 Team SA Requirements
5.2 Team SA Devices
5.3 Team SA Mechanisms
5.4 Team SA Processes
6 TRAINING TO SUPPORT SITUATION AWARENESS
6.1 Interactive Situation Awareness Trainer (ISAT)
6.2 Virtual Environment Situation Awareness Review System (VESARS)
6.3 Situation Awareness Virtual Instructor (SAVI)
7 SYSTEM DESIGN TO SUPPORT SITUATION AWARENESS
7.1 SA Requirements Analysis
7.2 SA-Oriented Design Principles
7.3 Measurement of SA
8 CONCLUSION
REFERENCES

1 INTRODUCTION
Situation awareness (SA), a person’s understanding of what is happening in the current situation, is critical for performance in a wide variety of domains, including aviation, air traffic control, driving, military operations, emergency management, health care, and power grid operations (Endsley, 2015b; Parasuraman, Sheridan, & Wickens, 2008; Wickens, 2008). In these complex settings, people must quickly understand the state of an often rapidly changing system and environment in order to make good decisions, formulate effective plans, and carry out appropriate actions.

Obtaining accurate SA is often quite challenging, however. Problems with SA were found to underlie human error in 88% of accidents involving commercial airlines (Endsley, 1995b). Reviews of errors and accidents in other domains, such as air traffic control (Rodgers, Mogford, & Strauch, 2000), nuclear power (Hogg, Torralba, & Volden, 1993; Mumaw, Roth, & Schoenfeld, 1993), health care (Schulz et al., 2016), and driving (Endsley, 2020b; Gugerty, 1997), show that problems in maintaining accurate SA are not limited to aviation but are also responsible for the majority of human error in many complex systems. In these settings, people’s primary struggle is most often not in determining the correct thing to do, nor in physically performing their tasks, but in fundamentally understanding what is going on in the situation. Without accurate SA, even the most experienced professionals may fail to perform well. Maintaining a high level of SA has been reported to be the most difficult part of many jobs.

In this chapter I first present a detailed definition of SA, a term that has often been misunderstood. This is followed by a description of the cognitive processes involved in achieving high levels of SA, along with some of the common challenges to SA encountered in the many domains where it is important. I then discuss SA in teams and the factors that underlie the development of team SA, which significantly affects performance in many settings. Building on this foundation, I present major approaches for improving SA through training and system design. I conclude with a discussion of validated techniques for measuring situation awareness in individuals and teams that can be used to evaluate new system designs and training programs, and to conduct further research on this important construct.

2 SITUATION AWARENESS DEFINED
SA can be thought of as an internalized mental model of the current state of a person’s environment. All of the incoming data from various sensors and systems, the outside environment, fellow team members, and others must be brought together into an integrated whole. This integrated picture forms the central organizing feature from which all decision making and action take place. Although the term SA originated in the aviation domain, SA is widely studied and serves as a basis of performance across many different domains, including air traffic control, military operations, health care, driving, power plants and power grids, maritime operations, maintenance, and weather forecasting (Endsley, 2019).

The earliest and most widely used formal definition of SA describes it as “the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning and the projection of their status in the near future” (Endsley, 1988). SA therefore involves perceiving critical factors in the environment (level 1 SA), understanding what those factors mean, particularly when integrated together in relation to the person’s goals (level 2 SA), and, at the highest level, understanding what will happen with the system in the near future (level 3 SA). These higher levels of SA allow people to function in a timely and effective manner, even with very complex and challenging tasks. Each of these levels is discussed in more detail below.
2.3 Level 3: Projection of the Future
2.1 Level 1: Perception of the Elements in the Environment
2.4 SA Levels Are Not Linear
The first step in achieving SA is to perceive the status, attributes, and dynamics of relevant elements in the environment. A pilot needs to perceive important elements such as other aircraft, terrain, system status, and warning lights along with their relevant characteristics. In the cockpit, just keeping up with all of the relevant system and flight data, other aircraft, and navigational information can be quite taxing. Similarly, an army officer needs to detect enemy, civilian, and friendly positions and actions, terrain features, obstacles, and weather. A physician needs to gather data on the patient’s symptoms, vital signs, medical history, current medications, and test results. The air traffic controller or automobile driver are each concerned with a different set of information that is needed for SA in their tasks. Information may be perceived visually, auditorily, tactilely or through smell. It may come from direct observation (e.g., looking at the patient or out the window of a vehicle), written, verbal, or non-verbal communications, or be assisted by sensors and displays of various types.
The ability to project the future status of the environment, at least in the very near term, forms the third and highest level of SA. Projection is generally achieved through knowledge of the status and dynamics of the elements and a comprehension of the situation (both levels 1 and 2 SA). A consideration of not only what is currently happening, but also what is likely or possible to happen, is critical for supporting proactive decision making and planning. The ability to project future events has been shown to be a hallmark of expert SA in both aviation (Endsley & Garland, 2000a; Prince & Salas, 1998) and driving (McKenna & Crick, 1991, 1994). Amalberti and Deblon (1992), for example, found that a significant portion of experienced pilots’ time was spent in anticipating possible future occurrences. This gives them the knowledge (and time) necessary to decide on the most favorable course of action to meet their objectives. A physician, similarly, needs to project the prognosis of a disease, and the likely effect of different courses of action in order to decide on an effective disease treatment. Drivers must project the actions of pedestrians and other motorists as well as impending collisions in order to proactively avoid accidents. This ability to project is similarly critical in many other domains for effective decision making.
The three levels of SA are not necessarily linear, and do not describe three distinct stages; rather they are ascending levels of SA (Endsley, 1995c, 2004, 2015a). Being able to project what is likely to happen (level 3 SA) is better than only comprehending the present situation (level 2 SA), and both are better than only perceiving information, but not being able to understand its significance (level 1 SA). It is also not the case that people necessarily gather all their level 1 data and then form understanding and projection in a linear order. In many cases, people may use their higher-level SA (comprehension and projections) to generate assumptions regarding Level 1 SA elements for which they have no direct knowledge, or to direct the further search for data. This process occurs due to the use of mental models that provide default values for missing information (which will be discussed in more detail later). “In this way people can have level 2 and 3 SA, even when they do not have complete or accurate level 1 SA, and can use the higher levels of SA to drive the search for and acquisition of level 1 SA” (Endsley, 2004, p. 318). This is illustrated in Figure 1.
2.2 Level 2: Comprehension of the Current Situation
Comprehension or understanding of the situation is based on a synthesis of disjointed level 1 elements. Level 2 SA goes beyond simply being aware of the data and cues that are present to include a holistic understanding of the significance of those elements in light of one’s goals. For example, upon seeing warning lights indicating a problem during take-off, the pilot must quickly determine the seriousness of the problem in terms of the immediate airworthiness of the aircraft, and combine this with knowledge of the amount of runway remaining, in order to know whether or not to abort the take-off. A soldier needs to comprehend that trampled grass indicates that enemy soldiers have recently camped in an area. A physician needs to comprehend that a rash on a patient indicates the presence of shingles. While novices may be theoretically capable of achieving the same level 1 SA as more experienced professionals in these domains, they frequently fall far short of accurately comprehending the meaning or significance of the information they perceive.
2.5 Elements of Situation Awareness
SA does not include everything a person needs to know, such as all the rules or background information that experts in a field must learn. Rather, it consists of the dynamically changing situational information that dictates when and how this background knowledge is applied to inform real-time decisions. The “elements” of SA in the definition are very domain-specific. Examples for air traffic control are shown in Table 1. Information such as aircraft type, altitude, heading, and flight plan, restrictions in effect at an airport, and conformance to a clearance are each meaningful elements of the situation that an air traffic controller needs for complete and accurate SA. The elements that are relevant for SA in other domains can be delineated similarly. Cognitive task analyses have been conducted to determine SA requirements in commercial aviation (Farley, Hansman, Amonlirdviman, & Endsley, 2000), fighter aircraft (Endsley, 1993), infantry operations (Matthews, Strater, & Endsley, 2004), and driving (Endsley, 2020b), among others.
DESIGN OF EQUIPMENT, TASKS, JOBS, AND ENVIRONMENTS
Figure 1 Higher-level SA can be used to drive the search for data and to provide default values when lower-level information is not available. (Figure elements: perception, comprehension, and projection; levels 2 and 3 SA drive the search for level 1 information (decide: need more info; act: directed search); default values from the mental model can provide reasonable level 1 SA values, even when no information has been directly perceived on an element.) (Source: Endsley, 2015a. © 2015 SAGE Publications.)
Table 1 Elements of SA for Air Traffic Control
Level 1 SA
• Aircraft: aircraft identification (ID), combat identification (CID), beacon code; current route (position, heading, aircraft turn rate, altitude, climb/descent rate, ground speed); current flight plan (destination, filed plan); aircraft capabilities (turn rate, climb/descent rate, cruising speed, max/min speed); equipment on board; aircraft type; fuel/loading
• Aircraft status: activity (en route, arriving, departing, handed off, pointed out); level of control (instrument flight rules [IFR], visual flight rules [VFR], flight following, VFR on top, uncontrolled object); aircraft contact established; aircraft descent established; communications (present/frequency); responsible controller; aircraft priority
• Special conditions: equipment malfunctions; emergencies; pilot capability/state/intentions
• Altimeter setting
• Emergencies: type of emergency; time on fuel remaining; souls on board
• Requests: pilot/controller requests; reason for request
• Clearances: assignment given; received by correct aircraft; readback correct/complete; pilot acceptance of clearance; flight progress strip current
• Sector: special airspace status; equipment functioning; restrictions in effect; changes to standard procedures
• Special operations: type of special operation; time begin/terminate operations; projected duration; area and altitude affected
• ATC equipment malfunctions: equipment affected; alternate equipment available; equipment position/range; aircraft in outage area
• Airports: operational status; restrictions in effect; direction of departures; current aircraft arrival rate; arrival requirements; active runways/approach; sector saturation; aircraft in holding (time, number, direction, leg length)
• Weather: area affected; altitudes affected; conditions (snow, icing, fog, hail, rain, turbulence, overhangs); temperatures; intensity; visibility; turbulence; winds; IFR/VFR conditions; airport conditions
Level 2 SA
• Conformance: amount of deviation (altitude, airspeed, route); time until aircraft reaches assigned altitude, speed, route/heading
• Current separation: amount of separation between aircraft/objects/airspace/ground along route; deviation between actual separation and prescribed limits; number/timing of aircraft on routes; altitudes available
• Timing: projected time in airspace; projected time until clear of airspace; time until aircraft landing expected; time/distance of aircraft to airport; time/distance until visual contact; order/sequencing of aircraft
• Deviations: deviation aircraft/landing request; deviation aircraft/flight plan; deviation aircraft/pilot requests
• Other sector/airspace: radio frequency; aircraft duration/reason for use
• Significance: impact of requests/clearances on aircraft separation/safety and own/other sector workload; impact of weather on aircraft safety/flight comfort, own/other sector workload, aircraft flow/routing (airport arrival rates, flow rates, holding requirements, aircraft routes, separation procedures), altitudes available, and traffic advisories; impact of special operations on sector operations/procedures; location of nearest capable airport for aircraft type/emergency; impact of malfunction on routing, communications, flow control, aircraft, coordination procedures, other sectors, and own workload; impact on workload of number of aircraft; sector demand vs. own capabilities
• Confidence level/accuracy of information: aircraft ID, position, altitude, airspeed, heading; weather; altimeter setting
Level 3 SA
• Projected aircraft route (current): position, flight plan, destination, heading, route, altitude, climb/descent rate, airspeed, winds, ground speed, intentions, assignments
• Projected aircraft route (potential): projected position x at time t; potential assignments
• Projected separation: amount of separation along route (aircraft/objects/airspace/ground); deviation between separation and prescribed limits; relative projected aircraft routes; relative timing along route
• Predicted changes in weather: direction/speed of movement; increasing/decreasing in intensity
• Impact of potential route changes: type of change required; time and distance until aircraft turn; amount of turn/new heading, altitude, route change required; aircraft ability to make change; projected number of changes necessary; increase/decrease in length of route; cost/benefit of new clearance; impact of proposed change on aircraft separation
Source: Endsley & Rogers (1994a).
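Several level 3 entries in Table 1, such as projected position x at time t and projected separation compared against prescribed limits, can be made concrete with a small worked calculation. The Python sketch below is only an illustration under simplifying assumptions: straight-line, constant-velocity motion on a flat local grid measured in nautical miles, and an illustrative 5 NM separation limit. Real air traffic control projection is far richer, and the function names here are hypothetical.

```python
import math
from typing import Optional, Tuple

Vec = Tuple[float, float]  # (x, y) on a flat local grid (simplifying assumption)


def project_position(pos_nm: Vec, vel_kt: Vec, t_min: float) -> Vec:
    """Projected position t_min minutes ahead, assuming constant velocity.

    pos_nm is in nautical miles (NM); vel_kt is ground velocity in knots (NM/h).
    """
    hours = t_min / 60.0
    return (pos_nm[0] + vel_kt[0] * hours, pos_nm[1] + vel_kt[1] * hours)


def projected_separation(p1: Vec, v1: Vec, p2: Vec, v2: Vec, t_min: float) -> float:
    """Horizontal distance (NM) between two aircraft projected t_min ahead."""
    a = project_position(p1, v1, t_min)
    b = project_position(p2, v2, t_min)
    return math.hypot(a[0] - b[0], a[1] - b[1])


def separation_margin(
    p1: Vec, v1: Vec, p2: Vec, v2: Vec, t_min: float, limit_nm: float = 5.0
) -> float:
    """Projected separation minus the prescribed limit.

    A negative result means a predicted loss of separation. The 5 NM default
    is an illustrative assumption, not a statement of any particular rule.
    """
    return projected_separation(p1, v1, p2, v2, t_min) - limit_nm


def first_predicted_conflict(
    p1: Vec,
    v1: Vec,
    p2: Vec,
    v2: Vec,
    horizon_min: float = 10.0,
    step_min: float = 0.5,
    limit_nm: float = 5.0,
) -> Optional[float]:
    """Earliest sampled time (minutes) with a negative margin, or None."""
    t = 0.0
    while t <= horizon_min:
        if separation_margin(p1, v1, p2, v2, t, limit_nm) < 0:
            return t
        t += step_min
    return None
```

For two aircraft 40 NM apart closing head-on at 480 kt each, the sketch projects about 8 NM of separation (a 3 NM margin) two minutes ahead and a negative margin at 2.5 minutes, so `first_predicted_conflict` flags 2.5 minutes with the default half-minute sampling step. Projections like these are what let a controller act before a conflict develops rather than react to it.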
2.6 Sources of SA
SA is typically derived from a wide variety of sources in a given domain (Figure 2). The sources of SA can vary considerably between individuals, and even from time to time for the same individual. Information may come from a computerized display in one instance, or from a written report or verbal communication in another. The presence of these sources of SA should not be confused with actually having SA. While many technologies or displays purport to be “SA systems,” there is no awareness of situational information until the individual who needs the information to support their decisions has acquired it. Thus, the fact that certain information is on a report or display somewhere or possessed by a teammate is not sufficient; it must be acquired and understood by the decision maker before actual SA exists.
2.7 Role of SA in Decision Making
SA is considered to be a separate stage from decision making and performance, but it is a key input to these processes. It is one thing to know what is going on, but a separate matter to make the best decision, which may be based on different strategies, tactics, or processes. It is very difficult to make accurate decisions without good SA, however. Even in domains with well-established procedures, a key challenge is correctly understanding what is happening in the situation in order to carry out the right procedures.
Figure 2 Sources of SA. (Figure elements: real world, direct observation, system, system knowledge, interface knowledge, team members and others, SA; environmental elements labeled e1 to e5.) (Source: Endsley, 1995c, 1997.)
3 SITUATION AWARENESS MODEL
Reviews of theoretical models of SA are provided in Pew (1995), Durso and Gronlund (1999), Wickens (2008), and Endsley (2000b, 2015a). The most comprehensive model of SA, Endsley (1988, 1990b, 1995c, 2015a), depicts a framework of relevant cognitive processes, based on information processing theory, that is summarized in Figure 3. In this model, SA is shown to be the product of a number of environmental and system factors, as well as individual cognitive and perceptual capabilities and mechanisms. In combination, pre-attentive processes, focalized attention, working memory, and long-term memory, coupled with alternating goal-driven and data-driven processing, form the basic cognitive structures on which SA is based. These mechanisms are discussed in more detail below. (See also Chapter 5 on Information Processing by Wickens and Carswell, in this volume.)
3.1 Pre-Attentive Processes
According to this model of SA, elements in the environment may initially be processed in parallel through pre-attentive sensory stores, where certain properties are detected, such as spatial proximity, color, simple properties of shapes, and movement, providing cues for further focalized attention (Neisser, 1967; Treisman & Paterson, 1984). The most salient information is processed further using focalized attention to achieve perception. When people operate in a purely data-driven fashion, information salience is the main determinant of which portions of the environment pass into awareness for SA.
3.2 Working Memory and Attention
Attention can be directed by the contents of both working and long-term memory (Braune & Trollip, 1982; Wickens, 1992).
For instance, advance knowledge regarding the location of information, the form of the information, the spatial frequency, the color, or the overall familiarity and appropriateness of the information can all significantly facilitate perception (Barber & Folkard, 1972; Biederman, Mezzanotte, Rabinowitz, Francolini, & Plude, 1981; Davis, Kramer, & Graham, 1983; Posner, Nissen, & Ogden, 1978). Limited attention creates a major constraint on a person’s ability to perceive multiple items accurately in parallel and, as such, is a major limitation on SA in many complex environments. While attention can be shared across information from different modalities to some degree (Wickens, 1992), it poses a significant limitation for SA in domains that require the gathering of large amounts of frequently changing information. The perceived importance of information for the task dictates how this limited attention is applied (Fracker, 1989). In addition, preconceptions and expectations significantly affect not only which information is attended to, but often the perception of that information itself (Jones, 1977). For example, people often see what they expect to see or hear what they expect to hear. For individuals who have not developed other cognitive mechanisms in a domain (i.e., novices and those in novel situations), the perception of the elements in the environment (the first level of SA) is also significantly limited by working memory. In the absence of other mechanisms, most of a person’s active processing of information must occur in working memory. New information must be combined with existing knowledge and a composite picture of the situation developed. Projections of future status and subsequent decisions as to appropriate courses of action also occur in working memory, which can be quite demanding. Working memory will be significantly taxed by the processes of achieving the higher levels of SA, formulating and selecting responses, and carrying out subsequent actions. Working memory has been shown to be significantly related to SA in novices in several studies; however, this relationship is not found in experts, who are able to work around this limitation via long-term memory structures (Endsley & Bolstad, 1994; Gonzalez & Wimisberg, 2007; Gutzwiller & Clegg, 2012; Sohn & Doane, 2004; Sulistyawati, Wickens, & Chui, 2011).
Because SA is accessible for a much longer period than typical working memory stores (Endsley, 1990a, 2000b), this model of SA supports the idea that working memory functions as an activated subset of long-term memory, in accordance with Cowan (1988).
Figure 3 Model of SA in dynamic decision making. (Source: Endsley, 1995c. © 1995 SAGE Publications.)
3.3 Long-term Memory
In actual practice, goal-directed processing and long-term memory can be used to circumvent the limitations of working memory and direct attention more effectively. Long-term memory serves to shape the perception of objects in terms of known categories or mental representations (Ashby & Gott, 1988). Categorization tends to occur almost instantly (Hinsley, Hayes, & Simon, 1977). Long-term memory stores that support SA are believed to often exist in the form of mental models and schema. First, much relevant knowledge about a system is hypothesized to be stored in mental models. Rouse and Morris (1985, p. 7) define mental models as “mechanisms whereby humans are able to generate descriptions of system purpose and form, explanations of system functioning and observed system states, and predictions of future states.” Mental models are cognitive mechanisms that embody information about system form and function; often, they are relevant to physical systems (e.g., a car, computer, human bodies, or power plant) or organizational systems (e.g., how a university, company, or military unit works). They typically contain information about not only the components of a particular system but also how those components interact to produce various system states and events. Mental models can significantly aid SA when people recognize key features in the environment that map to key features in the model. The model provides a mechanism for determining associations between observed states of components (comprehension) and predictions of the behavior and status of these elements over time. Thus, mental models can provide much of the higher levels of SA without loading working memory. Mental models also significantly aid SA by providing knowledge of the relevant information to help guide effective deployment of attention and the search for information. Also associated with mental models are schema: prototypical classes of states of the system (e.g., an engine failure, an enemy attack formation, or a dangerous weather formation) (Bartlett, 1932; Mayer, 1983). Schema are even more useful to the formation of SA, since these recognized classes of situations provide an immediate one-step retrieval of the higher levels of SA, based on pattern matching between situation cues and known schema in memory (Dreyfus, 1981; Klein, 1989). Very often scripts, set sequences of actions, have also been developed for schema (Schank & Abelson, 1977), so that much of the load on working memory required for both generating alternative behaviors and selecting among them is also reduced. These mechanisms allow the individual to simply execute a predetermined action for a given recognized class of situations based on their SA. When schema and mental models are present, the current situation does not need to be exactly like the one encountered previously, due to the use of categorization mapping. As long as a close-enough mapping can be made into relevant categories, a situation can be recognized and comprehended in terms of the model, predictions made, and appropriate actions selected. Since people are generally good at pattern matching, this process can be almost instantaneous and produces a much lower load on working memory, which makes high levels of SA possible, even in very demanding situations. Mental models also provide default values via Q-morphisms that allow people to operate effectively, even with very limited information from the situation (Holland, Holyoak, Nisbett, & Thagard, 1986). Together, mental models and schema form a hallmark of expert performance. (See also Endsley, 2018.)
3.4 Alternating Data-Driven and Goal-Driven Processing
The individual’s goals also play an important part in the SA process. These goals can be thought of as ideal states of the system model that the person wishes to achieve. Goals determine which information is important. Not all data in the environment matters, just that needed for making decisions related to relevant active goals. Goals are also very important for interpreting information. The significance of information ultimately depends on the individual’s goals. In what Casson (1983) has termed a top-down decision-making process, a person’s goals and plans direct which environmental aspects are attended to in the development of SA. Goal-driven or top-down processing is very important for effective information processing and the development of SA. Conversely, in a bottom-up or data-driven process, patterns in the environment may be recognized that indicate that different plans will be necessary to meet goals or that different goals should be activated. Alternating between “goal-driven” and “data-driven” processing is characteristic of human information processing and underpins much of SA development in complex worlds. People who are purely data driven are very inefficient at processing complex information sets; there is too much information, so they are simply reactive to the cues that are most salient.
People who have clearly developed goals, however, will search for information that is relevant to those goals (on the basis of the associated mental model, which contains information on which aspects of the system are relevant to goal attainment), allowing the information search to be more efficient and providing a mechanism for determining the relevance of the information that is perceived. If people are only goal driven, however, they are likely to miss key information that would indicate that a change in goals is needed (e.g., from the goal “land the airplane” to the goal “execute a go-around”). Thus, effective information processing is characterized by alternating between these two modes: using goal-driven processing to efficiently find and process the information needed for achieving goals, and data-driven processing to regulate the selection of which goals should be most important at any given time. (See also Corbetta & Shulman, 2002.)
3.5 Expertise in SA
Expertise has a significant effect on people’s ability to develop good SA (Endsley, 2018). For novices or those dealing with novel situations, performance in complex and dynamic systems can be very demanding or impossible to accomplish successfully, because it requires detailed mental calculations based on rules or heuristics, placing a heavy burden on working memory. As novices lack relevant and detailed mental models and schema in a domain, they tend to have more scattered information search patterns (Mann, Williams, Ward, & Janelle, 2007; Stein, 1992; Yu, Wang, Li, & Braithwaite, 2014). They lack knowledge of which information is most important and how to combine that information. Without understanding the relationships between cues and system components, they have trouble prioritizing information and understanding which information is most relevant (Endsley, 2018).
When experience has allowed the development of mental models and schema, pattern matching between the perceived elements in the environment and existing schema/mental models can occur on the basis of pertinent cues that have been learned. Thus, the comprehension and future projection required for the higher levels of SA can be developed with far less effort and within the constraints of working memory. When scripts tied to the schema have been developed, the entire decision-making process is greatly simplified. There is some evidence that some people are significantly better than others at developing SA. In one study of experienced military fighter pilots, Endsley and Bolstad (1994) found a 10-fold difference in SA between the pilot with the lowest SA and the one with the highest SA. They also found this ability to be highly stable, with test–retest reliability exceeding 0.94 for those evaluated. Others have similarly noted consistent individual differences, with some pilots routinely having better SA than their compatriots (Bell & Waag, 1995; Secrist & Hartman, 1993). These individual differences appear even when people operate with the same system capabilities and displays and in the same environment subject to the same demands. A number of studies have sought to find the locus of these individual differences in SA abilities. Are they due simply to the effects of expertise and experience, or are they indicative of better cognitive mechanisms and capabilities that some people have? Endsley and Bolstad (1994) found that military pilots with better SA were better at attention sharing, pattern matching, spatial abilities, and perceptual speed. O’Hare (1997) also found evidence that elite pilots (defined as consistently superior in gliding competitions) performed better on a divided-attention task purported to measure SA.
Gugerty and Tirre (1997) found evidence that people with better SA performed better on measures of working memory, visual processing, temporal processing, and time-sharing ability. Although studies have examined individual differences in only a few domains (e.g., piloting and driving), some of these attributes may also be relevant to SA differences in other arenas. If reliable markers can be found that identify those who will eventually be most successful at SA, more valid selection batteries can be developed for critical jobs such as air traffic controller, pilot, or military commander. Research has also examined which potentially trainable skills differentiate those with high SA from those with low SA. For instance, SA differences between those at different levels of expertise have been examined in groups of pilots (Endsley & Garland, 2000a; Prince & Salas, 1998), military officers (Strater, Jones, & Endsley, 2003), aircraft mechanics (Endsley & Robertson, 2000), power plant operators (Collier & Folleso, 1995), and drivers (Borowsky & Wall, 1983; Horswill & McKenna, 2004; Parmet, Borowsky, Yona, & Oron-Gilad, 2015; Underwood, Chapman, Bowden, & Crundall, 2002). These studies have found many systematic differences, some of which may relate to underlying abilities, but many of which also point to learned skills or knowledge that may be trainable.
3.6 Summary
SA is an active process. In most cases, people are not passive receivers of SA but active participants in the process. They control how they direct their attention, how they communicate with others to gather information, how they deploy tests or sensors to gather information, and how they manipulate their displays to show desired information, for example. The SA model also shows that this process is both cyclical and dynamic. In most cases, it does not end with simply carrying out a single action. Rather, the current contents of a person’s SA (their integrated understanding of the situation) direct the ongoing search for further information in a cyclical manner.
SITUATION AWARENESS
The feedback loop in the model depicts this ongoing and dynamic process. SA processes affect SA as a state of knowledge (product), and that state of knowledge further drives SA processes in an ongoing cycle. Although developing SA can be very challenging in many environments, when key cognitive mechanisms are developed through experience (schema and mental models), people are able to circumvent certain cognitive limitations (working memory and attention) to develop sufficient levels of SA to function very effectively. Nevertheless, developing accurate SA remains very challenging in many complex settings and demands a significant portion of people’s time and resources. Thus, developing selection batteries, training programs, and system designs to enhance SA are major goals in many domains. The model of SA in Figure 3 also shows that the capabilities of the system for gathering and presenting needed information, as well as the user interface, will have a significant effect on a person’s ability to develop SA. Environmental factors such as high or low workload, complexity, stress, and automation also can significantly affect SA. These factors are discussed next in more detail. System designs that do a superior job of organizing and presenting needed information and overcoming these challenges can significantly improve SA; they are discussed in more detail at the end of this chapter.
4 SITUATION AWARENESS CHALLENGES
Building and maintaining SA can be a difficult process for people in many different jobs and environments. Pilots report that the majority of their time is spent trying to ensure that their mental picture of what is happening is current and correct. The same can be said for people in many other domains where systems are complex and there is a great deal of information to understand, where information changes rapidly, and where information is difficult to obtain. Common reasons for these difficulties have been captured in terms of eight SA demons: factors that work to undermine SA in many systems and environments (Endsley, Bolte, & Jones, 2003; Endsley & Jones, 2012). These eight factors are discussed next.
4.1 Attentional Tunneling
Successful SA is highly dependent on constantly juggling attention between different aspects of the environment. Unfortunately, there are significant limits on people’s ability to divide their attention across multiple aspects of the environment, particularly within a single modality, such as vision or sound, and thus attention sharing can occur only to a limited extent (Wickens, 1992). People can often become trapped in a phenomenon called attentional narrowing or tunneling (Baddeley, 1972; Bartlett, 1943; Broadbent, 1954). When succumbing to attentional tunneling, people lock in on certain aspects or features of the environment that they are trying to process and either intentionally or inadvertently drop their scanning behavior. In this case, SA may be very good on the part of the environment they are focused on, but will quickly become outdated on other aspects they are not attending to. Attentional narrowing has been found to undermine SA in tasks such as flying and driving and poses one of the most significant challenges to SA in many domains.
4.2 Requisite Memory Trap
The limitations of working memory also create a significant problem. Many features of the situation may need to be held in working memory. As a person scans different information from the environment, information accessed previously must be remembered and combined with new information. Auditory information must also be remembered, as it cannot be revisited in the same way that visual displays can. Given the complexity and sheer volume of information required for SA in many systems, these memory limits create a significant problem for SA. System designs that require people to remember information (either auditory information or information from different screens or computers, for example) increase the likelihood of SA errors.
4.3 Workload, Anxiety, Fatigue, and Other Stressors (WAFOS)
Stressors, such as anxiety, time pressure, mental workload, uncertainty, noise or vibration, excessive heat or cold, poor lighting, physical fatigue, and working against one’s circadian rhythms, are unfortunately an unavoidable part of many work environments. These stressors can significantly reduce SA by further reducing people’s already limited working memory capacity and by reducing the efficiency of information gathering. It has been found that people may pay less attention to peripheral information, become more disorganized in scanning information, and are more likely to succumb to attentional tunneling when affected by these stressors. People are also more likely to arrive at a decision without taking into account all available information (premature closure).
4.4 Data Overload
Data overload is a significant problem in many systems. The volume of data and the rapid rate of change of that data create a need for information intake that quickly outpaces people’s ability to gather and assimilate the data. As people can take in and process only a limited amount of information at a time, significant lapses in SA can occur. While it is easy to think of this problem as simply a human limitation, in reality it often occurs because data are processed, stored, and presented ineffectively in many systems. The data overload problem is not just due to large data volumes, but also due to ineffective use of the bandwidth provided by a person’s sensory and information-processing pipeline. The rate at which data can flow through the input pipeline can be increased significantly by the form of information presentation employed in the user interface.
4.5 Misplaced Salience
The human perceptual system is more sensitive to certain features than others, including the color red, movement, and flashing lights. Similarly, loud noises, larger shapes, and things that are physically nearer have the advantage of catching a person’s attention. These natural salient properties can be used to promote SA or to hinder it. When used carefully, properties such as movement or color can draw attention to critical information and are thus important tools for designing to enhance SA. Unfortunately, these features are often overused or used inappropriately. In many systems there is a proliferation of lights, buzzers, alarms, and other signals that actively work to draw people’s attention, frequently either misleading them toward irrelevant or less important information or overwhelming them completely. The unnecessary distraction of misplaced salience can degrade SA of more important information the person is attempting to assimilate.
4.6 Complexity Creep
Over time, systems have become more and more complex, often through a misguided attempt to add more features or capabilities. Unfortunately, this complexity makes it difficult for people to form sufficient internal representations of how these systems work. The more features, and the more complicated and branching the rules that govern a system’s behavior, the greater the complexity. Although system complexity can slow down a person’s ability to take in information, it works primarily to undermine the person’s ability to correctly interpret the information presented and to project what is likely to happen (levels 2 and 3 SA). Information can be completely misinterpreted, as the internal mental model is likely inadequate to encompass the full characteristics of the system.
4.7 Errant Mental Models
Mental models are important mechanisms for building and maintaining SA, providing key interpretation mechanisms for information collected. They tell a person how to combine disparate pieces of information, how to interpret the significance of that information, and how to develop reasonable projections of what will happen in the future. If an incomplete mental model is used, however, or if the wrong mental model is relied on for the situation, poor comprehension and projection (levels 2 and 3 SA) can result. Anchoring and confirmation bias can contribute to this problem. With this type of error, also called a representational error, it can be very difficult for people to realize that they are working on the basis of an errant mental model and to break out of it (Jones & Endsley, 2000). Mode errors, in which people misunderstand information because they believe that the system is in one mode when it is really in another, are a special case of this problem (Sarter & Woods, 1995).
4.8 Out-of-the-Loop Syndrome
Automation is the final SA demon. While in some cases automation can help SA by eliminating excessive workload, it can also lower SA by putting people out of the loop (Endsley & Kiris, 1995). In this state, they develop poor SA of both how the automation is performing and the state of the elements the automation is supposed to be controlling.
When the automation is performing well, being out of the loop may not be a problem, but when the automation fails or, more frequently, reaches situational conditions that it is not equipped to handle, the person is out of the loop and often unable to detect the problem, properly interpret the information presented, and intervene in a timely manner (Endsley, 2017; Onnasch, Wickens, Li, & Manzey, 2014).
DESIGN OF EQUIPMENT, TASKS, JOBS, AND ENVIRONMENTS
5 SITUATION AWARENESS IN TEAMS
People work in teams to perform their tasks in many domains. A surgeon, an anesthesiologist, and several nurses may work closely together to perform surgery, for example. Each person has a well-defined set of information they need to be aware of, based on their role on the team and the decisions they must make. Team SA is defined as “the degree to which every team member possesses the SA needed for his or her job” (Endsley, 1995c, p. 39). Good team SA means that every person is aware of the information needed for his or her individual job. It is not sufficient for one team member to have the needed information if it is not successfully transmitted to another team member who needs it; significant errors can still result, negatively affecting team performance. Team SA has been found to be predictive of overall team performance in a number of studies (Cooke, Kiekel, & Helm, 2001; Crozier et al., 2015; Gardner, Kosemund, & Martinez, 2017; Parush et al., 2017; Prince, Ellis, Brannick, & Salas, 2007).
In addition, within teams there is a need for shared SA, which is defined as “the degree to which team members have the same SA on shared SA requirements” (Endsley & Jones, 2001, p. 48). While team members each have different roles and responsibilities, and thus different SA needs, it is also important that they are all “on the same page” (i.e., have shared SA) with respect to certain common situational information, so that their actions are coordinated and appropriate. Shared SA has also been shown to be predictive of team performance in a number of studies (Bonney, Davis-Sramek, & Cadotte, 2016; Cooke et al., 2001; Coolen, Draaisma, & Loeffen, 2019; Rosenman et al., 2018). Team SA and shared SA are illustrated in Figure 4 (Endsley & Jones, 2001). By examining the shared SA needs on a team, it is possible to determine the degree to which system interfaces and training programs support the development of shared SA. Endsley and Jones (2001) provide a model of team SA that identifies four factors important for supporting good shared SA and team SA: team SA requirements, team SA devices, team SA mechanisms, and team SA processes.
5.1 Team SA Requirements
In addition to sharing relevant information about the system and environment across team members to support their individual SA needs, several other SA requirements exist in teams to support team coordination and functioning. These include:
• the status of other team members’ tasks and its effect on oneself;
• the status of one’s own tasks and its effect on others;
• the impact of one’s actions on others and vice versa;
• projections of the actions of other team members.
Beyond sharing basic information, in many cases it is extremely important for team members to share SA regarding higher-level assessments of the situation. Because different team members have different expertise, they may arrive at different understandings of the significance of information, even if they share the same level 1 SA. Shared comprehension needs are largely a function of the interdependencies between team members’ jobs.
Team members also need to know the status of other team members’ activities that may affect their own tasks and goals. For instance, a maintenance technician may need to alert other team members that he is opening a valve that can affect the operations or safety of other technicians in the area. Similarly, team members need to know how their own tasks and actions affect other team members so that they can coordinate appropriately (Endsley & Robertson, 1996). In a high-functioning team, team members are able to project not only what will occur with their system and external events, but also what fellow team members will do. For example, Xiao, Mackenzie, and Patey (1998) found that operations in a medical trauma team broke down in cases where team members were unable to anticipate what help other team members would need.
In many teams, team members may not be aware of the information that others need, or may incorrectly assume that teammates already know what is going on. For example, a significant mishap occurred aboard the Mir Space Station when a cosmonaut accidentally disconnected the wrong cable during a routine maintenance task. Disconnecting the cable interrupted power to a central computer and set the station drifting (having lost attitude control). The crew was faced with never-before-seen messages on the computer monitors and needed the help of the ground controllers.
The ground controller on duty, however, did not really understand the Mir’s problem (as transmissions were garbled by static) and let the station continue to drift, treating the problem as routine. Significant time was lost while waiting for the next communications pass. Energy-saving procedures were not put into place, and the Mir’s batteries became drained, cutting off all power to the station (Burrough, 1998). Effective coordination between ground-based and space-based crew members was needed to solve this unique problem, and it was sorely lacking in this example because a shared understanding of the situation was never developed. This example also illustrates the significant challenges that teams who are distributed (either spatially or temporally) often face in effectively developing team SA.
SITUATION AWARENESS
Figure 4 Team SA involves all team members knowing the relevant information for their individual role, while shared SA focuses on the subset of information needs that are in common. [Diagram: Team SA and Shared SA regions across subgoals A, B, and C.] (Source: Endsley & Jones, 2001. © 2001 Human Factors and Ergonomics Society.)
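To make the distinction between team SA and shared SA concrete, the following minimal Python sketch treats each role’s SA requirements as a set of named items. The roles, item names, and the yes/no notion of “being aware” of an item are hypothetical simplifications for illustration, not part of the Endsley and Jones (2001) model. Team SA is scored per member against that member’s own requirements; shared SA is the part of the overlap between two roles’ requirements that both members actually hold.

```python
"""Team SA vs. shared SA modelled as set operations (illustrative sketch).

Assumptions: roles, requirement names, and awareness states are hypothetical;
real SA is graded and multilevel, not a simple yes/no per item.
"""
from itertools import combinations

# Hypothetical SA requirements for each role on a surgical team.
requirements = {
    "surgeon": {"blood_pressure", "bleeding_site", "procedure_step"},
    "anesthesiologist": {"blood_pressure", "anesthesia_depth", "procedure_step"},
    "nurse": {"procedure_step", "instrument_count"},
}

# Items each member currently holds correctly (hypothetical snapshot).
aware_of = {
    "surgeon": {"blood_pressure", "bleeding_site", "procedure_step"},
    "anesthesiologist": {"blood_pressure", "anesthesia_depth"},
    "nurse": {"procedure_step", "instrument_count"},
}


def team_sa(reqs, aware):
    """Per-member fraction of that member's OWN requirements that are held."""
    return {role: len(aware[role] & needed) / len(needed)
            for role, needed in reqs.items()}


def shared_requirements(reqs, a, b):
    """Requirements two roles have in common (the overlap region)."""
    return reqs[a] & reqs[b]


def shared_sa(reqs, aware, a, b):
    """Overlapping requirements that BOTH members actually hold."""
    return shared_requirements(reqs, a, b) & aware[a] & aware[b]


if __name__ == "__main__":
    for role, score in team_sa(requirements, aware_of).items():
        print(f"{role}: {score:.2f} of own SA requirements held")
    for a, b in combinations(requirements, 2):
        common = shared_requirements(requirements, a, b)
        gaps = common - shared_sa(requirements, aware_of, a, b)
        print(f"{a} / {b}: shared {sorted(common)}, not shared {sorted(gaps)}")
```

In this snapshot the surgeon and the nurse both know the current procedure step, but the anesthesiologist does not, so team SA is incomplete even though the information exists somewhere on the team.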
5.2 Team SA Devices
Teams may use a number of different devices for sharing SA, including communications, shared displays, and information from a shared environment. Communications, both verbal and nonverbal, form a common method for sharing information across team members. When teammates are co-located, nonverbal cues such as facial expressions can be very helpful in communicating information such as task load and stress. Xiao, Mackenzie, and Patey (1998) found that nonverbal communication was very important in the medical trauma teams they studied, as did Segal (1994) in studying pilot teams.
Shared displays, whether visual, auditory, tactile, or other displays (such as reports), are a major contributor to team SA. Considerable effort is being directed toward the development of displays that support a common understanding of information across distributed team members, sometimes termed a common operating picture (COP). Bolstad and Endsley (1999) investigated the utility of shared displays in augmenting verbal communications channels between team members. They found that certain types of shared displays (those tailored to meet the explicit SA requirements of each team member) could significantly enhance team performance, particularly under high workload conditions. Shared displays that were not tailored to individual SA needs, and instead repeated all the information of the other team member, had no effect or depressed performance. The design of shared displays needs to be guided by an understanding both of each individual’s SA needs and of the select information needed to support shared SA and team coordination (Bolstad & Endsley, 2005).
Finally, team SA can be supported by information perceived directly from within a shared environment. For example, a nurse, an anesthesiologist, and a surgeon may all glean relevant information about a patient’s status because they are all co-located with the patient. A pilot and co-pilot both observe the aircraft lifting off from the runway and may not need to communicate this information explicitly.
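Bolstad and Endsley’s finding suggests a simple design rule: filter what a shared display passes along by the recipient’s SA requirements rather than mirroring a teammate’s whole screen. The Python sketch below illustrates only that filtering step. The flat dictionary representation, the item names and values, and the choice to also report needed items the teammate cannot supply are assumptions made for illustration; they are not the displays used in that study.

```python
"""Tailored vs. mirrored shared display content (illustrative sketch).

Assumption: a teammate's display is a flat dict of named items, and the
recipient's SA requirements are a set of item names. Names and values are
hypothetical.
"""
from typing import Dict, Set, Tuple


def mirrored_view(teammate_data: Dict[str, object]) -> Dict[str, object]:
    """Repeat everything the teammate sees (the untailored approach)."""
    return dict(teammate_data)


def tailored_view(teammate_data: Dict[str, object],
                  recipient_needs: Set[str]) -> Tuple[Dict[str, object], Set[str]]:
    """Keep only items the recipient's SA requirements call for.

    Also returns the needed items the teammate cannot supply, so the gap
    is visible rather than silently absent.
    """
    shown = {k: v for k, v in teammate_data.items() if k in recipient_needs}
    missing = recipient_needs - set(teammate_data)
    return shown, missing


# Hypothetical anesthesiologist display shared with a surgeon.
anesthesia_display = {
    "heart_rate": 72,
    "blood_pressure": "118/76",
    "anesthesia_depth": "BIS 45",
    "ventilator_mode": "volume control",
    "drug_log": ["induction agent given"],
}
surgeon_needs = {"heart_rate", "blood_pressure", "estimated_blood_loss"}

if __name__ == "__main__":
    shown, missing = tailored_view(anesthesia_display, surgeon_needs)
    print("mirrored:", len(mirrored_view(anesthesia_display)), "items")
    print("tailored:", shown)
    print("needed but unavailable:", missing)
```

Here the tailored view passes two of the five items and flags that estimated blood loss is needed but not available from this teammate.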
In many situations, only some of these devices may be available, in which case all information sharing must be funneled through a narrower bandwidth (e.g., verbal communications when there are no shared displays and people are physically distributed). Further, the relative contribution of these different sources of information may vary considerably across environments and over time (Bolstad & Endsley, 1999). Team members may freely trade off between different SA devices and sources.
5.3 Team SA Mechanisms
Teams may also be able to take advantage of shared mental models to significantly aid the development of shared SA. When teammates have the advantage of shared training and experiences, their mental models may be very similar, increasing the likelihood that they will form similar comprehension and projections from the information they receive. Shared mental models can significantly facilitate the development of shared SA and reduce dependence on communications and other shared SA devices. They allow team members to arrive at the same comprehension and projections much more rapidly than teams who must verbally discuss their higher levels of SA. Mosier and Chidester (1991), for example, found that better performing aircrews actually communicated less than poorer performing ones, most likely through the use of such shared mental models. It has also been found that aircrews who are new to working together are much more likely to have an accident; 44% of aviation accidents occur on the first leg of a new crew pairing, and 73% occur on the first day (National Transportation Safety Board, 1994). Most shared SA is developed through common training programs, as well as through experience in working together with other teammates. Jones (1997) also showed that shared mental models can be improved for aviation teams through a system that provides information on crew members’ aircraft flight experience and recency.
Cross-training has also been found to improve shared mental models and team SA (Bolstad, Cuevas, Costello, & Rousey, 2005; Gorman, Cooke, & Amazeen, 2010; Stout, Cannon-Bowers, & Salas, 2017).
5.4 Team SA Processes
Considerable research has focused on effective processes used by teams to maintain SA, even in challenging situations. The processes teams use to interact and share information are critical to the development of good individual SA and shared SA within teams. Good team processes have been shown to be important for good team SA and shared SA (Berggren, Prytz, Johansson, & Nählinder, 2011; Gross & Kluge, 2014; Hallbert, 1997; Stout, Cannon-Bowers, Salas, & Milanovich, 1999; Sulistyawati, Chui, & Wickens, 2008). Orasanu and Salas (1993) summarize a number of studies involving aircrew and military teams to show that effective teams do the following:
1. Engage in contingency planning that helps set up shared mental models for emergencies.
2. Have leaders who establish a democratic environment that supports better sharing of information, and who explicitly state more plans, strategies, and intentions, consider more options, provide more explanations, and give more warnings or predictions (Chidester, Kanki, Foushee, Dickinson, & Bowles, 1990; Orasanu, 1990).
3. Develop a shared understanding of the problem before looking for solutions, thus avoiding getting bogged down (Hirokawa, 1983); this is particularly important the more diverse the team members’ backgrounds are (Citera et al., 1995).
Klein, Zsambok, and Thordsen (1993) additionally point to the importance of (1) clear delineation and understanding of tasks, goals, and roles; (2) avoidance of micromanaging but willingness to help other teammates; (3) avoidance of fixation and willingness to examine various factors of the situation; (4) encouragement of different opinions and a process to come to convergence; and (5) the ability to manage time and make changes as needed.
Taylor, Endsley, and Henderson (1996) found that teams that are effective at SA establish a group norm of information sharing and self-checking to make sure everyone is on the same page at each step. They coordinate as a group, delegate tasks, and gather information from each other. They imagine possible future events and come up with contingency plans for addressing them. They also actively prioritize goals, so that overall performance is not sacrificed due to distractions or unexpected problems.
In contrast, teams with poor SA have a group norm in which pertinent information is not shared; team members go along with the group without contributing important but conflicting information. They are easily distracted by unexpected problems and are unable to prioritize tasks effectively. They tend to rely more on expectations, which may be incorrect, and have no team processes in place for detecting this. In some cases, a strong personality leads the others astray based on a strong but erroneous picture of the situation.
Taken together, knowledge of team SA requirements and the quality of team SA devices, team SA mechanisms, and team SA processes all contribute to the ability of teams to form good shared SA and team SA in support of team performance goals.
6 TRAINING TO SUPPORT SITUATION AWARENESS
A number of programs have been developed that seek to train knowledge and skills related to SA at the individual or team level, via classroom-based instruction, simulated scenarios or case studies, or computer-based training. These include training programs for commercial aviation pilots (Hormann, Blokzijl, & Polo, 2004; Robinson, 2000), general aviation pilots (Bolstad, Endsley, Costello, & Howell, 2010; Bolstad, Endsley, Howell, & Costello, 2002; Bolstad, Endsley, Howell, & Costello, 2003; Endsley & Garland, 2000a; Prince, 1998), drivers (Kass, VanWormer, Mikulas, Legan, & Bumgarner, 2011; McKenna & Crick, 1994; Soliman & Mathna, 2009), health care workers (Bogossian et al., 2014; Chang et al., 2015; Hänsel et al., 2012), aircraft mechanics (Endsley & Robertson, 2000), and army officers (Bolstad & Endsley, 2005; Strater et al., 2004). Some of these approaches rely on classroom-based instruction that seeks to create more awareness of the concept of SA and the many challenges that affect it. Others involve more detailed programs that seek to build up the critical knowledge and skills that underlie SA. The preliminary findings reported by these efforts show initial successes in improving SA and performance in their respective settings. In general, more longitudinal studies are needed to ascertain the degree to which SA training programs can succeed in improving people’s SA across the wide variety of challenging situations common in these domains. Three distinct approaches to SA training are discussed here: ISAT, VESARS, and SAVI.
6.1 Interactive Situation Awareness Trainer (ISAT)
The Interactive SA Trainer (ISAT) employs rapid experiential learning to support mental model and schema development. In normal operations, over the course of many months and years, individuals gradually build up the experience base needed to develop the good mental models and schemata for pattern matching on which good SA most often relies. ISAT attempts to bootstrap this natural process by exposing the trainee to a large number of situations in a very short period of time using computer-based training tools (Strater et al., 2004). ISAT employs realistic scenarios with opportunities for complex operational decisions. It provides increased exposure to a variety of situations, which (1) supports the development of situation-based knowledge stores; (2) trains the recognition of critical cues that signal prototypical situations; (3) supports needed information integration and decision making; and (4) promotes an understanding of the importance of consequences, timing, risk, and capabilities associated with different events, behaviors, and decision options.
Trainees learn what it means to develop SA in the environment, learn to build higher-level SA from the data, and receive training on projecting future events in prototypical situations. In addition to realistic simulations of relevant events, ISAT incorporates an avatar virtual instructor that guides trainees through the process of searching for relevant information and through the skills needed to form comprehension and projection.
6.2 Virtual Environment Situation Awareness Review System (VESARS)
Feedback is critical to the learning process. To improve SA, individuals need to receive feedback on the quality of their SA; however, such feedback is often lacking in the real world. For example, inexperienced pilots may fail to appreciate the severity of threatening conditions because they have come through similar conditions in the past just by luck. Unfortunately, this also reinforces poor assessments. It is difficult for individuals to develop a good gauge of their own SA in normal operations. Training through SA feedback allows trainees to fine-tune critical behaviors and mental models based on knowledge of their own SA and relevant behaviors (Endsley, 1989). The Virtual Environment SA Review System (VESARS) uses SA measures to assess trainee SA (Kaber et al., 2013; Kaber, Riley, Lampton, & Endsley, 2005). It includes three major components:
1. a behavioral rating tool that assesses individual and team actions;
2. a communications rating tool that evaluates team communications;
3. an SA query tool that allows direct and objective assessment of the quality of individual and team SA.
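As an illustration of how responses from an SA query tool could be turned into the per-requirement feedback described for VESARS, the Python sketch below scores query responses against simulation ground truth and summarizes accuracy by SA level. It is a hypothetical scoring scheme, not the VESARS implementation: the `Query` fields, the numeric tolerance rule, and the example queries are all assumptions.

```python
"""Scoring SA queries by SA level for post-trial feedback (hypothetical sketch).

Not the VESARS implementation: field names, the tolerance rule, and the
example queries are assumptions for illustration.
"""
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Query:
    requirement: str        # SA requirement the query probes
    level: int              # 1 = perception, 2 = comprehension, 3 = projection
    truth: object           # ground truth recorded by the simulation
    answer: object          # trainee's response
    tolerance: float = 0.0  # allowed absolute error for numeric answers

    def correct(self) -> bool:
        numeric = (int, float)
        if isinstance(self.truth, numeric) and isinstance(self.answer, numeric):
            return abs(self.truth - self.answer) <= self.tolerance
        return self.truth == self.answer


def score_by_level(queries):
    """Return {level: proportion correct} for each SA level queried."""
    hits, totals = defaultdict(int), defaultdict(int)
    for q in queries:
        totals[q.level] += 1
        hits[q.level] += int(q.correct())
    return {level: hits[level] / totals[level] for level in sorted(totals)}


def missed_requirements(queries):
    """SA requirements answered incorrectly, for targeted feedback."""
    return [q.requirement for q in queries if not q.correct()]


# Hypothetical queries from one simulated trial.
trial = [
    Query("location of nearest hazard", 1, "sector B", "sector B"),
    Query("speed of approaching vehicle (km/h)", 1, 40.0, 33.0, tolerance=5.0),
    Query("is the vehicle a threat?", 2, True, True),
    Query("time until vehicle reaches team (min)", 3, 6.0, 5.0, tolerance=1.0),
]

if __name__ == "__main__":
    print(score_by_level(trial))
    print("review:", missed_requirements(trial))
```

For this trial the trainee is fully correct at levels 2 and 3 but misjudges the approaching vehicle’s speed, so the feedback would point to that level 1 requirement.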
VESARS was specifically designed to work well within virtual and simulated training environments, but it can also be employed in field exercises. SA training is provided after each simulated trial in which VESARS data are collected. Providing knowledge of results immediately after each trial (covering the SA level achieved across the various SA requirements, together with the relevant communications and behaviors) allows trainees to understand the degree to which they were able to acquire SA and the ways in which they need to modify their processes to improve it.
6.3 Situation Awareness Virtual Instructor (SAVI)
The Situation Awareness Virtual Instructor (SAVI) trains people on the behaviors that are consistent with and important to good SA (Endsley, Riley, & Strater, 2009). Trainees play the role of the trainer as they rate the actions of others in vignettes presented on a computer and provide a rationale for each rating. The SAVI approach leverages the accelerated learning that occurs during peer instruction and in the transition to becoming a trainer. Trainees quickly learn which SA behaviors are appropriate for various operational situations because they observe these aspects of performance and assess the quality of the performance observed. By comparing their assessments with those provided by domain experts, trainees are able to refine their mental models of good SA behaviors and communications. This allows trainees to fine-tune their understanding of the critical cues and behaviors associated with good SA.
7 SYSTEM DESIGN TO SUPPORT SITUATION AWARENESS
Successful system designs must deal with the challenge of combining and presenting the vast amounts of data now available from many technological systems in order to provide good SA (whether to a pilot, a physician, a business manager, or an automobile driver).
An important key to the development of complex technologies is understanding that SA exists only in the mind of the human decision maker. Therefore, presenting a large volume of data will do no good unless the data are transmitted, absorbed, and assimilated successfully and in a timely manner by the individual in order to form SA. Unfortunately, many systems fail in this regard, leaving significant SA problems in their wake (Figure 5). Attempts to improve SA through better sensors, information processing, and display approaches have received significant attention over much of the past 30 years. Unfortunately, a significant portion of these efforts has stopped short of really addressing SA, instead adding sensors or displays that purport to improve SA without addressing the cognitive and perceptual needs of the decision maker.
Figure 5 Information gap. [Diagram: the data produced must be found, sorted, integrated, and processed to yield the information needed; More Data ≠ More Information.] (Source: Endsley, 2000b. © 2000 Taylor & Francis.)
While ensuring that people have the data needed to meet their level 1 SA requirements is undoubtedly important, a rampant increase in data may inadvertently hurt SA as much as it helps. Simply increasing the amount of data available to decision makers widens the information gap, overloading people without necessarily improving the level of SA they can develop and maintain. People are still required to search through all the data, sort it, and process and integrate it to form the needed SA, resulting in much of the overload observed in many domains. For example, the high workload associated with electronic health record systems has been linked to increases in physician burnout (Downing, Bates, & Longhurst, 2018; Melnick et al., 2019) and acts as a deterrent to their use.
SA provides a key mechanism for overcoming this data overload. SA specifies how all the data in an environment need to be combined and understood. Therefore, instead of loading down decision makers with hundreds of pieces of miscellaneous data provided in a haphazard fashion, SA requirements provide guidance as to what the real comprehension and projection needs of the individual are. They thus give the system designer key guidance on how to bring the various pieces of data together to form meaningful integrations and groupings of data that can be absorbed and assimilated easily in time-critical situations. An integrated SA-oriented design approach provides unique combinations and portrayals of information that go far beyond the technology-oriented design approaches of the past. In the past it was up to individuals to figure out what information was important and to find it in a sea of data. This task left them overloaded and susceptible to missing critical factors.
If system designers work to develop systems that support the SA process, however, they can significantly alleviate this bottleneck. So how should systems be designed to meet the challenge of providing high levels of SA? Based on the SA model presented in Figure 3, an SA-oriented design process has been created (Figure 6) to guide the development of systems that support SA (Endsley et al., 2003; Endsley & Jones, 2012). This structured approach incorporates SA considerations into the design process, including a determination of SA requirements, design principles for SA enhancement, and the measurement of SA in design evaluation.
7.1 SA Requirements Analysis
The problem of determining which aspects of the situation are important for a particular individual’s SA can be addressed using a form of cognitive task analysis called goal-directed task analysis (GDTA) (Endsley, 1993), illustrated in Figure 7. In a GDTA, the major goals of a particular job class are identified, along with the major subgoals necessary for meeting each goal. The major decisions that need to be made for each subgoal are then identified, and the SA needed for making these decisions and carrying out each subgoal is delineated. These SA requirements focus not only on what data individuals need in their roles, but also on how that information is integrated or combined to address each decision. In this analysis process, SA requirements are defined as the dynamic information needs associated with the major goals or subgoals of each role (as opposed to more static knowledge, such as rules, procedures, and general system knowledge). The GDTA is based on goals or objectives, not physical tasks (as a traditional task analysis might be), because goals form the basis for decision making in many complex environments. GDTAs are typically conducted for each relevant role in the system.
For example, a different GDTA would be appropriate for a physician and a nurse, or for different roles in a military command and control battalion. A GDTA is usually conducted using a combination of cognitive engineering procedures: expert elicitation, observation of people performing their tasks, verbal protocols, analysis of written materials and documentation, and formal questionnaires have all formed the basis for GDTAs. In general, a GDTA is conducted with a number of different experts for each role, who are interviewed, observed, and recorded individually; the resulting analyses are pooled and then validated overall by a larger number of experienced individuals.
Figure 6 SA-oriented design process, comprising SA requirements analysis, SA-oriented design principles, and SA measurement. (Source: Endsley et al., 2003; Endsley & Jones, 2012. © 2012 Taylor & Francis.)
Figure 7 Goal-directed task analysis for determining SA requirements. [Diagram: a major goal (1.0) is divided into subgoals (1.1, 1.2, 1.3); each subgoal has associated decisions, and each set of decisions has SA requirements at Level 3 (projection), Level 2 (comprehension), and Level 1 (perception).] (Source: Based on Endsley & Rodgers, 1994b.)
An example of the output of a GDTA is shown in Figure 8. This example shows the SA requirements analysis for a driver for the subgoal “avoid objects in the roadway” under the major goal “avoid collisions.” In this example, several major decisions are made, including “Will the vehicle collide with the object?,” “Does the object need to be avoided?,” and “Do you have the ability to avoid the object?” Any or all of these decisions may need to be made with respect to objects in or near the roadway. The relevant SA is listed below each decision, including comprehension and projection needs as well as low-level data inputs.
The GDTA systematically defines the SA requirements (at all three levels of SA) needed to effectively make the decisions required for each goal. In this manner, it determines how pieces of data are used together and combined to form what decision makers really want to know. It includes a broad assessment of SA needs for both normal and infrequent or emergency conditions. Because the GDTA is used to determine information needs for system design, this systematic approach is useful for making sure that all SA needs are considered. It should be noted that at any given time, only certain subgoals may be “active” for a decision maker.
For example, avoiding a collision with a fixed object becomes of primary concern only on the rare occasion that an object appears; similarly, the goal of responding to emergencies becomes active only infrequently. Other goals, such as navigating to a destination and avoiding collisions with other vehicles and pedestrians, may be active most of the time. At any given time more than one goal or subgoal may be operational, although they will not always have the same priority. The GDTA does not indicate any prioritization among the goals (which can vary over time), nor does it imply that each subgoal within a goal will always be active.
The GDTA does not follow a fixed sequence of actions. In the complex domains where SA is often studied, decision making and tasks rarely fall into a fixed sequence but are dynamically dictated by the flow of events. When particular events occur (e.g., an object appearing in the roadway, which activates the subgoal of avoiding objects in this example), the relevant goals or subgoals become active.
The analysis strives to be as technology-free as possible. How the information is acquired is not addressed, as this can vary considerably from person to person, from system to system, and from time to time. In some cases, it may come through system displays, verbal communications, or other operators, or it may be generated internally by the individual; many of the higher-level SA requirements fall into this last category. The analysis seeks to determine what people would ideally like to know to meet each goal. It is recognized that they often must operate on the basis of incomplete information and that some desired information may not be available at all with today’s systems. However, for the purposes of design and evaluation of systems, it is important to consider what people ideally need to know, so that artificial ceiling effects based on today’s technology are not induced in the process.
Finally, it should be noted that static knowledge, such as procedures or rules for performing tasks, is outside the bounds of an SA requirements analysis. The analysis focuses on the dynamic situational information that affects what people do.
Subgoal 3.3.1 Avoid objects in the roadway
Decision: Will vehicle collide with the object?
SA requirements:
• Projected point of collision/miss distance
• Predicted trajectory of the object
• Position of object
• Speed of object
• Direction of movement of object
• Projected changes in speed/direction of the object
• Predicted trajectory of the vehicle
• Position of vehicle
• Speed of vehicle
• Direction of vehicle
• Projected changes in speed/direction of the vehicle
Decision: Does the object need to be avoided?
SA requirements:
• Predicted damage to vehicle during collision
• Type of object
• Mass of object
• Speed of vehicle
Decision: Ability to avoid the object?
SA requirements:
• Predicted collision with other vehicles/objects/pedestrians
• Distance to other vehicles/objects/pedestrians
• Vehicle/object/pedestrian locations
• Vehicle/object/pedestrian trajectories
• Braking time available
• Distance to object
• Maximum braking rate
• Roadside conditions/clearance
• Ability to execute avoidance maneuver
• Projected point of collision/miss distance
Figure 8 Goal-directed task analysis example for driving. (Source: Endsley, 2020. © 2020 Taylor and Francis.)
To date, GDTAs have been completed for many domains of common concern, including air traffic control (Endsley & Jones, 1995; Endsley & Rodgers, 1994b), fighter pilots (Endsley, 1993), commercial pilots (Endsley, Farley, Jones, Midkiff, & Hansman, 1998), aircraft mechanics (Endsley & Robertson, 1996), military officers (Bolstad, Riley, Jones, & Endsley, 2002; Strater, Endsley, Pleban, & Matthews, 2001), emergency response (Humphrey & Adams, 2011), primary health care (Farrell et al., 2017), paramedics (Hamid & Waterson, 2010), and nuclear power plant operators (Hogg et al., 1993).
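The goal, subgoal, decision, and SA requirement hierarchy of a GDTA maps naturally onto nested records. The Python sketch below encodes part of the driving example from Figure 8 and collects the unique requirements at a given SA level, one way such an analysis could be queried when deriving display content. The class names are hypothetical, only a subset of the Figure 8 items is included, and the SA level attached to each item is this sketch’s own reading (predicted or projected items as level 3, directly sensed values as level 1), not a classification given in the source.

```python
"""Encoding part of a GDTA (Figure 8, driving) as nested records (sketch).

Assumptions: class names are hypothetical; only a subset of Figure 8 is
included; the SA level of each item is assigned here for illustration.
"""
from dataclasses import dataclass, field
from typing import List

PERCEPTION, COMPREHENSION, PROJECTION = 1, 2, 3


@dataclass
class Requirement:
    text: str
    level: int  # 1 = perception, 2 = comprehension, 3 = projection


@dataclass
class Decision:
    question: str
    requirements: List[Requirement] = field(default_factory=list)


@dataclass
class Subgoal:
    number: str
    name: str
    decisions: List[Decision] = field(default_factory=list)


@dataclass
class Goal:
    name: str
    # List order is storage order only; a GDTA implies no sequence or priority.
    subgoals: List[Subgoal] = field(default_factory=list)


def requirements_at_level(goal: Goal, level: int) -> List[str]:
    """Unique requirement texts at one SA level, in first-seen order."""
    found: List[str] = []
    for subgoal in goal.subgoals:
        for decision in subgoal.decisions:
            for req in decision.requirements:
                if req.level == level and req.text not in found:
                    found.append(req.text)
    return found


R = Requirement
avoid_collisions = Goal("Avoid collisions", [
    Subgoal("3.3.1", "Avoid objects in the roadway", [
        Decision("Will vehicle collide with the object?", [
            R("Projected point of collision/miss distance", PROJECTION),
            R("Predicted trajectory of the object", PROJECTION),
            R("Position of object", PERCEPTION),
            R("Speed of object", PERCEPTION),
            R("Predicted trajectory of the vehicle", PROJECTION),
            R("Speed of vehicle", PERCEPTION),
        ]),
        Decision("Does the object need to be avoided?", [
            R("Predicted damage to vehicle during collision", PROJECTION),
            R("Type of object", PERCEPTION),
            R("Mass of object", PERCEPTION),
            R("Speed of vehicle", PERCEPTION),  # also used by the first decision
        ]),
    ]),
])

if __name__ == "__main__":
    for level in (PERCEPTION, PROJECTION):
        print(level, requirements_at_level(avoid_collisions, level))
```

Because “Speed of vehicle” feeds two decisions, it appears only once in the level 1 list.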
7.2 SA-Oriented Design Principles
Designing systems to successfully provide the multitude of SA requirements that exist in complex systems is a significant challenge. To address this need, design principles have been developed to better support the cognitive processes involved in acquiring and maintaining SA in dynamic complex systems, based on the model of SA in Figure 3 (Endsley et al., 2003; Endsley & Jones, 2012). The 50 SA-oriented design principles, summarized in Table 2, include (1) general guidelines for supporting SA; (2) guidelines for coping with automation and complexity; (3) guidelines for the design of alarm systems; (4) guidelines for the presentation of information uncertainty; and (5) guidelines for supporting SA in team operations. Some of the general principles include the following:
1. Direct presentation of higher-level SA needs (comprehension and projection) is recommended, rather than supplying only low-level data that people must integrate and interpret manually.
2. Goal-oriented information displays should be provided, organized so that the information needed for a particular goal is co-located and directly answers the major decisions associated with the goal.
3. Support for global SA is critical, providing an overview of the situation across the person’s goals at all times (with detailed information for goals of current interest) and enabling efficient and timely goal switching and projection.
4. Critical cues related to key features of schema need to be determined and made salient in the interface design (in particular, cues that indicate the presence of prototypical situations are of prime importance and facilitate goal switching in critical conditions).
5. Extraneous information not related to SA needs should be removed (while carefully ensuring that such information is not needed for broader SA needs).
6. Support for parallel processing, such as multimodal displays, should be provided in data-rich environments.
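Principle 1 (direct presentation of higher-level SA needs) can be illustrated with the driving GDTA in Figure 8: rather than showing the driver the positions and speeds of the vehicle and an object (level 1 data), a system could compute and display the projected miss distance (a level 3 need). The Python sketch below does this with a standard closest-point-of-approach calculation. It assumes flat 2D geometry and that both vehicle and object hold constant velocity, which a real system could not assume; the `Track` type, the 1 m clearance threshold, and the message format are hypothetical.

```python
"""Turning level 1 data into a level 3 projection: closest point of approach.

Assumptions (hypothetical): 2D positions in metres, velocities in m/s,
constant velocity for both vehicle and object, 1 m clearance threshold.
"""
import math
from dataclasses import dataclass


@dataclass
class Track:
    x: float   # position east, m
    y: float   # position north, m
    vx: float  # velocity east, m/s
    vy: float  # velocity north, m/s


def closest_approach(own: Track, obj: Track):
    """Return (seconds until closest approach, miss distance in m)."""
    rx, ry = obj.x - own.x, obj.y - own.y      # relative position
    vx, vy = obj.vx - own.vx, obj.vy - own.vy  # relative velocity
    speed_sq = vx * vx + vy * vy
    # Minimize |r + v*t| over t >= 0; with no relative motion, "now" is closest.
    t = 0.0 if speed_sq == 0 else max(0.0, -(rx * vx + ry * vy) / speed_sq)
    return t, math.hypot(rx + vx * t, ry + vy * t)


def situation_summary(own: Track, obj: Track, clearance_m: float = 1.0) -> str:
    """A display message stating the projection instead of the raw data."""
    t, miss = closest_approach(own, obj)
    status = "COLLISION COURSE" if miss < clearance_m else "clear"
    return f"{status}: miss distance {miss:.1f} m in {t:.1f} s"


if __name__ == "__main__":
    car = Track(0.0, 0.0, 20.0, 0.0)          # 20 m/s heading east
    pedestrian = Track(80.0, -6.0, 0.0, 1.5)  # walking north toward the lane
    parked_van = Track(60.0, 2.5, 0.0, 0.0)   # stationary, beside the lane
    print(situation_summary(car, pedestrian))  # COLLISION COURSE: miss distance 0.0 m in 4.0 s
    print(situation_summary(car, parked_van))  # clear: miss distance 2.5 m in 3.0 s
```

The driver sees one statement of what matters (whether and when the paths meet) instead of four raw quantities per object to integrate mentally.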
DESIGN OF EQUIPMENT, TASKS, JOBS, AND ENVIRONMENTS
Table 2 SA-Oriented Design Principles
No.  SA design principles
General principles
1  Organize information around goals
2  Present Level 2 information directly—support comprehension
3  Provide assistance for Level 3 SA projections
4  Support global SA
5  Support trade-offs between goal-driven and data-driven processing
6  Make critical cues for schema activation salient
7  Take advantage of parallel processing capabilities
8  Use information filtering carefully
Principles for supporting information uncertainty and confidence
9  Explicitly identify missing information
10  Support sensor reliability assessment
11  Use data salience in support of certainty
12  Represent information timeliness
13  Support assessment of confidence in composite data
14  Support uncertainty management activities
Principles for handling complexity
15  Just say no to feature creep—buck the trend
16  Manage rampant featurism through prioritization and flexibility
17  Ensure logical consistency across modes and features
18  Minimize logic branches
19  Map system functions to the goals and mental models of users
20  Provide system transparency and observability
21  Group information based on Level 2/3 SA requirements and goals
22  Reduce display density, but don't sacrifice coherence
23  Provide consistency and standardization on controls across different displays and systems
24  Minimize task complexity
Principles for supporting alarm management
25  Don't make people rely on alarms—provide projection support
26  Support alarm confirmation activities
27  Make alarms unambiguous
28  Reduce false alarms, reduce false alarms, reduce false alarms
29  Set missed alarm and false alarm trade-offs appropriately
30  Use multiple modalities to alarm, but ensure they are consistent
31  Minimize alarm disruptions to ongoing activities
32  Support the assessment and diagnosis of multiple alarms
33  Support the rapid development of global SA of systems in an alarm state
Principles for supporting automation
34  Automate only if necessary
35  Use automation for assistance in carrying out routine actions rather than higher-level cognitive tasks
36  Provide SA support rather than decisions
37  Keep the operator in control and in the loop
38  Avoid the proliferation of automation modes
39  Make modes and system states salient
40  Enforce automation consistency
41  Avoid advanced queuing of tasks
42  Avoid the use of information cueing
43  Use methods of decision support that create human/system symbiosis
44  Provide automation transparency
Principles for supporting team operations
45  Build a common picture to support team operations
46  Avoid display overload in shared displays
47  Provide flexibility to support shared SA across functions
48  Support transmission of different comprehensions and projections across teams
49  Limit non-standardization of display coding techniques
50  Support transmission of SA within positions by making status of elements and states overt
Source: Endsley, et al., 2003; Endsley & Jones, 2012.
These guidelines provide principles and approaches for supporting key cognitive processes and for avoiding the SA demons that create significant challenges for SA in many domains. In addition to supporting SA in individuals, principles are provided for supporting shared SA across collocated and distributed teams. Principles are also provided for supporting calibration of confidence in information, an aspect of SA that has been found to be important in every domain where it has been studied. Further approaches address the challenges posed by system complexity, overuse of alarms, and automation that negatively affect SA. SA-oriented design is applicable to a wide variety of system designs. It has been used successfully as a design philosophy for systems involving unmanned vehicle control, remote maintenance operations, health care systems, power grid operations, and command and control for distributed military teams, among many others.
7.3 Measurement of SA
Many concepts and technologies are currently being developed and touted as enhancing SA. Prototyping and simulation of new technologies, new displays, and new automation concepts are extremely important for evaluating the actual effects of proposed concepts within the context of the task domain, when operated by domain-knowledgeable subjects. If SA is to be a design objective, it is critical that it be evaluated specifically during the design process. Without this, it will be impossible to tell whether a proposed concept actually helps SA, does not affect it, or inadvertently compromises it in some way. A primary benefit of examining system design from the perspective of SA is that the impact of design decisions on SA can be assessed objectively, as a measure of the quality of the integrated system design when used within the actual challenges of the operational environment. Thus, SA can be separated from other factors, such as decision strategies or experience, that can indicate different types of remedial actions.
SITUATION AWARENESS
In general, direct measurement of SA can be very advantageous in providing more sensitivity and diagnosticity in the test and evaluation process. SA measurement augments the use of performance and workload measures in determining the utility of new design concepts. While workload measures provide insight into how hard an operator must work to perform tasks with a new design, measuring SA provides insight into the level of understanding gained from that work. SA measurement has been approached in a number of ways; see Endsley and Garland (2000b) for details on these methods. A summary of the advantages and disadvantages of these approaches is provided in Table 3 (Endsley, 2019).
Table 3 Comparison of SA Measurement Approaches
Process measures
Metrics: Eye tracking, communications, verbal protocols, physiological
Advantages:
• Objective and continuous
• Information on order and duration of attention to visual information
• Communications and verbal protocols can provide information on processes, strategies, and types of assessments made
Disadvantages:
• Eye tracking does not capture attention to auditory cues, or whether information is correctly understood or integrated for higher levels of SA
• Communications provide only partial information on what is attended to and how it is processed. Some people verbalize more than others.
• Little research to date supports the validity of physiological measures for SA
Performance measures (objective or subjective)
Metrics: Response time, errors
Advantages:
• Can be gathered without operator input
• Often already collected
Disadvantages:
• Assumes what behavior will occur given a particular state of SA. System or training changes may affect performance in unexpected ways.
• SA for normal and emergency events may be different, so inferences are constrained by the scenarios tested and performance measures collected
• Confuses SA with performance, which can be affected by other factors. Often insufficient sensitivity and diagnosticity.
Direct SA measures: subjective
Metrics: Likert scales, SART
Advantages:
• Easy to collect
• Can be used across many domains
• Provides indication of confidence in SA
Disadvantages:
• People may not be aware of what they do not know; meta-awareness is poor
• May be overly influenced by self-assessments of performance
• Some scales (e.g., SART) include measures of workload
Direct SA measures: objective (SAGAT)
Metrics: SAGAT
Advantages:
• Queries people on relevant SA knowledge of perceptions, comprehension, and projection
• Objectively scored based on simulation data
• Unbiased sampling across the scenario; avoids end-of-trial memory dependence
Disadvantages:
• Requires freezing of the simulation scenario
• Requires people to answer queries based on memory for 2–3 minutes during a freeze
• Requires development of domain-specific queries
Direct SA measures: objective (SPAM)
Metrics: SPAM
Advantages:
• Queries people on relevant SA knowledge of past, present, and future
• Objectively scored and timed based on simulation data
• Simulation freeze not required
Disadvantages:
• Requires dual-tasking to answer queries while performing the task, potentially interfering with performance and creating a secondary-task workload measure
• Allows people to look up answers to queries, which may not assess SA
• Requires development of domain-specific queries
Source: Endsley, 2019. © 2019 SAGE Publications.
Direct measurement of SA has generally been approached either through subjective ratings or through objective techniques. Although subjective ratings are simple and easy to administer, research has shown that they correlate poorly with objective SA measures, indicating that they capture a person's confidence in his or her SA more closely than the actual level or accuracy of that SA (Endsley, Selcon, Hardiman, & Croft, 1998). A meta-analysis confirmed this weak correspondence between subjective and objective SA across a number of studies, indicating that people have poor awareness of the quality or completeness of their own SA (Endsley, 2020a). One of the most widely used objective measures of SA is the SA Global Assessment Technique (SAGAT) (Endsley, 1988, 1995a, 2000a). SAGAT has been used successfully to measure SA when evaluating system technologies, display designs, automation concepts, and training programs (Endsley, 2019). Using SAGAT, a simulated test scenario employing the design of interest is frozen at randomly selected times, the system displays are blanked, and the simulation is suspended while people quickly answer questions about their current perceptions of the situation. The questions correspond to their SA requirements as determined from a GDTA for that domain. People's perceptions are then compared with the real situation, based on the simulation computer databases, to provide an objective measure of SA. Multiple "snapshots" of people's SA can be acquired in this way, providing an index of the quality of SA provided by a particular design. Collecting SAGAT data during the scenario provides an objective, unbiased assessment of SA that overcomes the problems incurred when such data are collected after the fact and must rely on memory (Nisbett & Wilson, 1977).
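The freeze-and-compare procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published SAGAT protocol: the warm-up period before the first freeze, the minimum spacing between freezes, the `Query` record, and the numeric tolerance used for scoring are all hypothetical choices made for the example (Endsley, 2020c, gives actual administration guidance).

```python
import random
from dataclasses import dataclass

def schedule_freezes(duration_s, n_freezes, warmup_s=180.0, min_gap_s=60.0, seed=None):
    """Return sorted random freeze times (seconds) within one scenario run.

    warmup_s (no freezes before the operator has had time to build SA) and
    min_gap_s (minimum spacing between freezes) are illustrative assumptions.
    """
    rng = random.Random(seed)
    for _ in range(10_000):  # simple rejection sampling until spacing holds
        times = sorted(rng.uniform(warmup_s, duration_s) for _ in range(n_freezes))
        if all(later - earlier >= min_gap_s for earlier, later in zip(times, times[1:])):
            return times
    raise ValueError("spacing constraint could not be satisfied")

@dataclass
class Query:
    query_id: str
    level: int              # 1 = perception, 2 = comprehension, 3 = projection
    tolerance: float = 0.0  # hypothetical acceptance band for numeric answers

def score_answer(query, answer, truth):
    """Compare an operator's answer with the simulation's logged state at the freeze."""
    numeric = (int, float)
    if isinstance(answer, numeric) and isinstance(truth, numeric):
        return abs(answer - truth) <= query.tolerance
    return answer == truth

# Example: three freezes in a 20-minute scenario, then scoring two queries.
print(schedule_freezes(1200, 3, seed=42))
altitude = Query("altitude_AC1", level=1, tolerance=100)
print(score_answer(altitude, 10050, 10000))  # True
print(score_answer(Query("next_sector_AC1", level=3), "ZNY", "ZBW"))  # False
```

Scoring each answer against the logged simulation state, rather than against an observer's judgment, is what makes the measure objective; the `tolerance` field merely stands in for whatever domain-specific acceptance criteria a study team would define for each query.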
It also avoids the biasing of SA due to secondary task loading or artificial cuing of attention that has been found to be problematic when SA probes are provided in real time while a person is performing the task (Endsley, 2019). SAGAT is a global tool, developed to assess SA across all of its elements based on a comprehensive assessment of operator SA. As a global measure, SAGAT includes queries across the range of a person's SA requirements, including level 1, level 2, and level 3 SA components. It considers system functioning and status as well as relevant features of the external environment. By including queries across the full spectrum of an individual's SA requirements, this approach minimizes possible biasing of attention: people cannot prepare for the queries in advance, since they could be queried on almost every aspect of the situation to which they would normally attend. Details on administering SAGAT can be found in Endsley (2020c). A recent meta-analysis of studies that have employed SAGAT found that it has a high level of sensitivity (94% across 68 studies) and is highly predictive of performance (r = .46 across 35 studies) over a wide range of domains and study manipulations (Endsley, 2019). It is also sensitive to changes in task load and to factors that affect operator attention, demonstrating construct validity (Endsley & Rodgers, 1998; Endsley & Smith, 1996; Fracker, 1990; Gronlund, Ohrt, Dougherty, Perry, & Manning, 1998; Gugerty, 1997). SAGAT has been found to have a high level of reliability in a number of studies (Endsley & Bolstad, 1994; Gugerty & Tirre, 1997). It was found not to be overly reliant on working memory, and a review of 11 studies that examined the potential intrusiveness of the freezes used to collect SAGAT data found no effect on performance, negating this concern (Endsley, 2019).
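Because SAGAT queries span level 1, level 2, and level 3 SA requirements, results are often examined separately by level, as in the free-flight example discussed next. The sketch below assumes a hypothetical query pool of `(query_id, level)` pairs derived from a GDTA. At each freeze it draws a random, shuffled subset from every level (one possible way of keeping people from anticipating the queries), and it then computes percent correct per SA level from scored answers. The sampling scheme, function names, and data formats are illustrative assumptions rather than a prescribed SAGAT procedure.

```python
import random
from collections import defaultdict

def sample_queries(pool, per_level, seed=None):
    """Draw up to `per_level` queries from each SA level, then shuffle them.

    pool: list of (query_id, level) pairs, a hypothetical format in which
    level 1 = perception, 2 = comprehension, 3 = projection.
    """
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for query_id, level in pool:
        by_level[level].append((query_id, level))
    chosen = []
    for level in sorted(by_level):
        items = by_level[level]
        chosen.extend(rng.sample(items, min(per_level, len(items))))
    rng.shuffle(chosen)  # random presentation order within the freeze
    return chosen

def percent_correct_by_level(records):
    """records: iterable of (level, correct) pairs, one per scored query.

    Returns {level: percent of queries answered correctly}.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for level, correct in records:
        totals[level] += 1
        hits[level] += int(bool(correct))
    return {level: 100.0 * hits[level] / totals[level] for level in sorted(totals)}

# Example: one freeze's queries, then aggregation of scored answers.
pool = [("alt_AC1", 1), ("hdg_AC2", 1), ("wx_affected", 2), ("next_sector", 3)]
print(sample_queries(pool, per_level=1, seed=7))
print(percent_correct_by_level([(1, True), (1, False), (2, True), (3, False)]))
# {1: 50.0, 2: 100.0, 3: 0.0}
```

Reporting scores by level, rather than as one pooled percentage, preserves the diagnostic value the text emphasizes: a design can leave level 1 SA intact while degrading comprehension or projection.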
An example of the use of SAGAT for evaluating the impact of new system concepts is provided in Endsley, Mogford, and Stein (1997). A new form of distributing roles and responsibilities between pilots and air traffic controllers was examined. Results showed a trend toward poorer controller performance in detecting and intervening in aircraft separation errors under a "free flight" operational concept, as well as poorer subjective ratings of performance. Finding statistically significant changes in separation errors during ATC simulation testing is quite rare, however. A more detailed analysis of the SAGAT results supported this finding and provided further diagnostic detail. As shown in Figure 9, controllers were aware of significantly fewer aircraft in the simulation under free-flight conditions. Attending to fewer aircraft under higher workload has also been found in other studies (Endsley & Rodgers, 1998). In addition to reduced level 1 SA, however, controllers had a significantly reduced understanding (level 2 SA) of what was happening in the traffic situation, as evidenced by lower SA regarding which aircraft weather would affect and reduced awareness of aircraft that were in a transitionary state. They were less aware of which aircraft had not yet completed a clearance, whether the instruction was received correctly, and whether the aircraft were conforming to the clearance. Controllers also demonstrated lower level 3 SA with free flight: their knowledge of where aircraft were going (i.e., the next sector) was significantly lower under free-flight conditions. The SAGAT results showed not only that the new concept induced problems for controller SA that would prevent controllers from performing effectively as monitors backing up pilots with separation assistance, but also how these problems were manifested. This information was very useful diagnostically, providing a cue to the type of aid controllers needed to overcome these deficiencies.
Figure 9 SAGAT results. (Source: Endsley et al., 1997. © 1997 SAGE Publications.)
In a follow-up study, a new display providing enhanced flight-path information for aircraft in transitionary states was designed and evaluated as a way of compensating for the lower SA observed; it produced a threefold improvement in SA regarding changes in aircraft state (Endsley, Sollenberger, Nakata, Hough, & Stein, 1999). This rich source of data is very useful for developing iterative design modifications to improve SA and for making design trade-off decisions. SAGAT has similarly been used to evaluate new training programs.
8 CONCLUSION
A firm theoretical foundation has been laid for understanding the factors that affect SA in complex environments. This foundation can be used to guide the development of training programs and of system designs that go beyond data presentation to provide higher levels of SA. In either case, validating the effectiveness of the proposed solutions through detailed, objective testing is paramount to ensure that the approach actually succeeds in improving SA. The need to process and understand large volumes of data is critical for many endeavors, from the cockpit to military missions, from power plants to automobiles, and from space stations to health care operations. It is likely that the potential benefits of the information age will not be realized until system designs address the significant challenges of managing this dynamic information base to provide people with the SA they need in real time. Doing so is the primary challenge of technology development for the foreseeable future.
REFERENCES
Amalberti, R., & Deblon, F. (1992). Cognitive modeling of fighter aircraft process control: A step towards an intelligent on-board assistance system. International Journal of Man-Machine Studies, 36, 639–671. Ashby, F. G., & Gott, R. E. (1988). Decision rules in the perception and categorization of multidimensional stimuli. Journal of Experimental Psychology: Learning, Memory and Cognition, 14(1), 33–53. Baddeley, A. D. (1972). Selective attention and performance in dangerous environments. British Journal of Psychology, 63, 537–546. Barber, P. J., & Folkard, S. (1972). Reaction time under stimulus uncertainty with response certainty. Journal of Experimental Psychology, 93, 138–142. Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. London: Cambridge University Press. Bartlett, F. C. (1943). Fatigue following highly skilled work. Proceedings of the Royal Society (B), 131, 147–257. Bell, H. H., & Waag, W. L. (1995). Using observer ratings to assess situational awareness in tactical air environments. In D. J. Garland & M. R. Endsley (Eds.), Experimental analysis and measurement of situation awareness (pp. 93–99). Daytona Beach, FL: Embry-Riddle Aeronautical University Press. Berggren, P., Prytz, E., Johansson, B., & Nählinder, S. (2011). The relationship between workload, teamwork, situation awareness, and performance in teams: a microworld study. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (pp. 851–855). Los Angeles, CA: Sage. Biederman, I., Mezzanotte, R. J., Rabinowitz, J. C., Francolin, C. M., & Plude, D. (1981). Detecting the unexpected in photo interpretation. Human Factors, 23, 153–163. Bogossian, F., Cooper, S., Cant, R., Beauchamp, A., Porter, J., Kain, V., … Team, F. A. R. (2014).
Undergraduate nursing students’ performance in recognising and responding to sudden patient deterioration in high psychological fidelity simulated environments: an Australian multi-centre study. Nurse Education Today, 34(5), 691–696. Bolstad, C. A., Cuevas, H. M., Costello, A. M., & Rousey, J. (2005). Improving situation awareness through cross-training. In Proceedings of the Human Factors and Ergonomics Society 49th Annual Meeting (pp. 2159–2163). Santa Monica, CA: Human Factors and Ergonomics Society. Bolstad, C. A., & Endsley, M. R. (1999). Shared mental models and shared displays: An empirical evaluation of team performance. In Proceedings of the 43rd Annual Meeting of the Human Factors and Ergonomics Society (pp. 213–217). Santa Monica, CA: Human Factors and Ergonomics Society. Bolstad, C. A., & Endsley, M. R. (2005). Choosing team collaboration tools: Lessons learned from disaster recovery efforts. Ergonomics in Design, Fall, 7–13.
Bolstad, C. A., Endsley, M. R., Costello, A. M., & Howell, C. D. (2010). Evaluation of computer based situation awareness training for general aviation pilots. International Journal of Aviation Psychology, 20(3), 269–294. Bolstad, C. A., Endsley, M. R., Howell, C., & Costello, A. (2002). General aviation pilot training for situation awareness: An evaluation. In Proceedings of the 46th Annual Meeting of the Human Factors and Ergonomics Society (pp. 21–25). Santa Monica, CA: Human Factors and Ergonomics Society. Bolstad, C. A., Endsley, M. R., Howell, C. D., & Costello, A. M. (2003). The effect of time-sharing training on pilot situation awareness. In Proceedings of the 12th International Symposium on Aviation Psychology, Dayton, OH. Bolstad, C. A., Riley, J. M., Jones, D. G., & Endsley, M. R. (2002). Using goal directed task analysis with Army brigade officer teams. In Proceedings of the 46th Annual Meeting of the Human Factors and Ergonomics Society (pp. 472–476). Santa Monica, CA: Human Factors and Ergonomics Society. Bonney, L., Davis-Sramek, B., & Cadotte, E. R. (2016). "Thinking" about business markets: A cognitive assessment of market awareness. Journal of Business Research, 69(8), 2641–2648. Borowsky, M. S., & Wall, R. (1983). Flight experience and naval aviation mishaps. Aviation, Space and Environmental Medicine, 54, 440–446. Braune, R. J., & Trollip, S. R. (1982). Towards an internal model in pilot training. Aviation, Space and Environmental Medicine, 53(October), 996–999. Broadbent, D. E. (1954). Some effects of noise on visual performance. Quarterly Journal of Experimental Psychology, 6, 1–5. Burrough, B. (1998). Dragonfly: NASA and the crisis aboard Mir. New York: HarperCollins. Casson, R. W. (1983). Schemata in cognitive anthropology. Annual Review of Anthropology, 12, 429–462. Chang, A. L., Dym, A., Venegas-Borsellino, C., Bangar, M., Kazzi, M., Lisenenkov, D., … Keene, A. (2015).
A comparison of simulation training versus classroom-based education in teaching situation awareness: randomized control study. Chest, 148(4), 461A. Chidester, T. R., Kanki, B. G., Foushee, H. C., Dickinson, C. L., & Bowles, S. V. (1990). Personality factors in flight operations: Vol. I. Leadership characteristics and crew performance in a full-mission air transport simulation (NASA Tech Memorandum No. 102259). Moffett Field, CA: NASA Ames Research Center. Citera, M., McNeese, M. D., Brown, C. E., Selvaraj, J. A., Zaff, B. S., & Whitaker, R. D. (1995). Fitting information systems to collaborating design teams. Journal of the American Society for Information Science, 46(7), 551–559. Collier, S. G., & Folleso, K. (1995). SACRI: A measure of situation awareness for nuclear power plant control rooms. In D. J. Garland & M. R. Endsley (Eds.), Experimental analysis and measurement of situation awareness (pp. 115–122). Daytona Beach, FL: Embry-Riddle Aeronautical University Press. Cooke, N. J., Kiekel, P. A., & Helm, E. E. (2001). Measuring team knowledge during skill acquisition of a complex task. International Journal of Cognitive Ergonomics, 5(3), 297–315. Coolen, E., Draaisma, J., & Loeffen, J. (2019). Measuring situation awareness and team effectiveness in pediatric acute care by using the situation global assessment technique. European Journal of Pediatrics, 1–14. Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3, 201–215. Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information processing system. Psychological Bulletin, 104(2), 163–191. Crozier, M. S., Ting, H. Y., Boone, D. C., O'Regan, N. B., Bandrauk, N., Furey, A., … Hogan, M. P. (2015). Use of human patient simulation and validation of the Team Situation Awareness Global Assessment Technique (TSAGAT): A multidisciplinary
team assessment tool in trauma education. Journal of Surgical Education, 72(1), 156–163. Davis, E. T., Kramer, P., & Graham, N. (1983). Uncertainty about spatial frequency, spatial position, or contrast of visual patterns. Perception and Psychophysics, 5, 341–346. Downing, N., Bates, D. W., & Longhurst, C. A. (2018). Physician burnout in the electronic health record era: Are we ignoring the real cause? Annals of Internal Medicine, 169, 50–51. Dreyfus, S. E. (1981). Formal models vs. human situational understanding: Inherent limitations on the modeling of business expertise (ORC 81-3). Berkeley: Operations Research Center, University of California. Durso, F. T., & Gronlund, S. D. (1999). Situation awareness. In F. T. Durso, R. Nickerson, R. Schvaneveldt, S. Dumais, S. Lindsay, & M. Chi (Eds.), Handbook of applied cognition (pp. 284–314). New York: Wiley. Endsley, M. R. (1988). Design and evaluation for situation awareness enhancement. In Proceedings of the Human Factors Society 32nd Annual Meeting (pp. 97–101). Santa Monica, CA: Human Factors Society. Endsley, M. R. (1989). Pilot situation awareness: The challenge for the training community. In Proceedings of the Interservice/Industry Training Systems Conference (I/ITSC) (pp. 111–117). Ft Worth, TX: American Defense Preparedness Association. Endsley, M. R. (1990a). A methodology for the objective measurement of situation awareness. In Situational Awareness in Aerospace Operations (AGARD-CP-478) (pp. 1/1–1/9). Neuilly Sur Seine, France: NATO - AGARD. Endsley, M. R. (1990b). Situation awareness in dynamic human decision making: Theory and measurement. Los Angeles, CA: University of Southern California. Endsley, M. R. (1993). A survey of situation awareness requirements in air-to-air combat fighters. International Journal of Aviation Psychology, 3(2), 157–168. Endsley, M. R. (1995a). Measurement of situation awareness in dynamic systems. Human Factors, 37(1), 65–84. Endsley, M. R. (1995b).
A taxonomy of situation awareness errors. In R. Fuller, N. Johnston, & N. McDonald (Eds.), Human factors in aviation operations (pp. 287–292). Aldershot: Avebury Aviation, Ashgate. Endsley, M. R. (1995c). Toward a theory of situation awareness in dynamic systems. Human Factors, 37(1), 32–64. Endsley, M. R. (1997). Communication and situation awareness in the aviation system. Paper presented at the Conference on Aviation Communication: A Multi-Cultural Forum, Prescott, AZ. Endsley, M. R. (2000a). Direct measurement of situation awareness: Validity and use of SAGAT. In M. R. Endsley & D. J. Garland (Eds.), Situation awareness analysis and measurement (pp. 147–174). Mahwah, NJ: LEA. Endsley, M. R. (2000b). Theoretical underpinnings of situation awareness: A critical review. In M. R. Endsley & D. J. Garland (Eds.), Situation awareness analysis and measurement (pp. 3–32). Mahwah, NJ: LEA. Endsley, M. R. (2004). Situation awareness: Progress and directions. In S. Banbury & S. Tremblay (Eds.), A cognitive approach to situation awareness: Theory, measurement and application (pp. 317–341). Aldershot: Ashgate Publishing. Endsley, M. R. (2015a). Situation awareness misconceptions and misunderstandings. Journal of Cognitive Engineering and Decision Making, 9(1), 4–32. Endsley, M. R. (2015b). Situation awareness: Operationally necessary and scientifically grounded. Cognition, Technology and Work, 17(2), 163–167. Endsley, M. R. (2017). From here to autonomy: Lessons learned from human-automation research. Human Factors, 59(1), 5–27. Endsley, M. R. (2018). Expertise and situation awareness. In K. A. Ericsson, R. R. Hoffman, A. Kozbelt, & A. M. Williams (Eds.),
Cambridge handbook of expertise and expert performance (2nd ed., pp. 714–744). Cambridge: Cambridge University Press. Endsley, M. R. (2019). A systematic review and meta-analysis of direct, objective measures of SA: A comparison of SAGAT and SPAM. Human Factors. Endsley, M. R. (2020a). The divergence of objective and subjective situation awareness: A meta-analysis. Journal of Cognitive Engineering and Decision Making, 14(1), 34–53. Endsley, M. R. (2020b). Situation awareness in driving. In D. Fisher, W. J. Horrey, J. D. Lee, & M. Regan (Eds.), Handbook of human factors for automated, connected and intelligent vehicles. London: Taylor and Francis. Endsley, M. R. (2020c). Situation awareness measurement: A guide for assessing situation awareness in the evaluation of systems designs, training programs and construct research. Washington, DC: Human Factors and Ergonomics Society. Endsley, M. R., & Bolstad, C. A. (1994). Individual differences in pilot situation awareness. International Journal of Aviation Psychology, 4(3), 241–264. Endsley, M. R., Bolte, B., & Jones, D. G. (2003). Designing for situation awareness: An approach to human-centered design. London: Taylor & Francis. Endsley, M. R., Farley, T. C., Jones, W. M., Midkiff, A. H., & Hansman, R. J. (1998). Situation awareness information requirements for commercial airline pilots (ICAT-98-1). Cambridge, MA: Massachusetts Institute of Technology International Center for Air Transportation. Endsley, M. R., & Garland, D. J. (2000a). Pilot situation awareness training in general aviation. In Proceedings of the 14th Triennial Congress of the International Ergonomics Association and the 44th Annual Meeting of the Human Factors and Ergonomics Society (pp. 357–360). Santa Monica, CA: Human Factors and Ergonomics Society. Endsley, M. R., & Garland, D. J. (Eds.). (2000b). Situation awareness analysis and measurement. Mahwah, NJ: Lawrence Erlbaum. Endsley, M. R., & Jones, D. G. (1995).
Situation awareness requirements analysis for TRACON air traffic control (TTU-IE-95-01). Lubbock, TX: Texas Tech University. Endsley, M. R., & Jones, D. G. (2012). Designing for situation awareness: An approach to human-centered design (2nd ed.). London: Taylor & Francis. Endsley, M. R., & Jones, W. M. (2001). A model of inter- and intrateam situation awareness: Implications for design, training and measurement. In M. McNeese, E. Salas, & M. Endsley (Eds.), New trends in cooperative activities: Understanding system dynamics in complex environments (pp. 46–67). Santa Monica, CA: Human Factors and Ergonomics Society. Endsley, M. R., & Kiris, E. O. (1995). The out-of-the-loop performance problem and level of control in automation. Human Factors, 37(2), 381–394. Endsley, M. R., Mogford, R. H., & Stein, E. S. (1997). Controller situation awareness in free flight. In Proceedings of the Human Factors and Ergonomics Society 41st Annual Meeting (pp. 4–8). Santa Monica, CA: Human Factors and Ergonomics Society. Endsley, M. R., Riley, J. M., & Strater, L. D. (2009). Leveraging embedded training systems to build higher level cognitive skills in warfighters. In Proceedings of the NATO Conference on Human Dimensions in Embedded Virtual Simulation. Orlando, FL: NATO. Endsley, M. R., & Robertson, M. M. (1996). Team situation awareness in aviation maintenance. In Proceedings of the 40th Annual Meeting of the Human Factors and Ergonomics Society (pp. 1077–1081). Santa Monica, CA: Human Factors and Ergonomics Society. Endsley, M. R., & Robertson, M. M. (2000). Situation awareness in aircraft maintenance teams. International Journal of Industrial Ergonomics, 26, 301–325. Endsley, M. R., & Rodgers, M. D. (1994a). Situation awareness information requirements for en route air traffic control. In Proceedings of the Human Factors and Ergonomics Society 38th Annual Meeting
(pp. 71–75). Santa Monica, CA: Human Factors and Ergonomics Society. Endsley, M. R., & Rodgers, M. D. (1994b). Situation awareness information requirements for en route air traffic control (DOT/FAA/AM-94/27). Washington, DC: Federal Aviation Administration Office of Aviation Medicine. Endsley, M. R., & Rodgers, M. D. (1998). Distribution of attention, situation awareness, and workload in a passive air traffic control task: Implications for operational errors and automation. Air Traffic Control Quarterly, 6(1), 21–44. Endsley, M. R., Selcon, S. J., Hardiman, T. D., & Croft, D. G. (1998). A comparative evaluation of SAGAT and SART for evaluations of situation awareness. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (pp. 82–86). Santa Monica, CA: Human Factors and Ergonomics Society. Endsley, M. R., & Smith, R. P. (1996). Attention distribution and decision making in tactical air combat. Human Factors, 38(2), 232–249. Endsley, M. R., Sollenberger, R., Nakata, A., Hough, D., & Stein, E. (1999). Situation awareness in air traffic control: Enhanced displays for advanced operations. Atlantic City, NJ: Federal Aviation Administration William J. Hughes Technical Center. Farley, T. C., Hansman, R. J., Amonlirdviman, K., & Endsley, M. R. (2000). Shared information between pilots and controllers in tactical air traffic control. Journal of Guidance, Control and Dynamics, 23(5), 826–836. Farrell, L. J., Du, S., Steege, L. M., Cartmill, R. S., Wiegmann, D. A., Wetterneck, T. B., … Endsley, M. R. (2017). Understanding cognitive requirements for EHR design for primary care teams. In Proceedings of the International Symposium on Human Factors and Ergonomics in Health Care (pp. 15–16). Los Angeles: Sage. Fracker, M. L. (1989). Attention allocation in situation awareness. In Proceedings of the Human Factors Society 33rd Annual Meeting (pp. 1396–1400). Santa Monica, CA: Human Factors Society. Fracker, M. L. (1990).
Attention gradients in situation awareness. Situational Awareness in Aerospace Operations (AGARD-CP-478) (Conference Proceedings #478, pp. 6/1–6/10). Neuilly Sur Seine, France: NATO - AGARD. Gardner, A. K., Kosemund, M., & Martinez, J. (2017). Examining the feasibility and predictive validity of the SAGAT tool to assess situation awareness among medical trainees. Simulation in Healthcare, 12(1), 17–21. Gonzalez, C., & Wimisberg, J. (2007). Situation awareness in dynamic decision making: Effects of practice and working memory. Journal of Cognitive Engineering and Decision Making, 1(1), 56–74. Gorman, J. C., Cooke, N. J., & Amazeen, P. G. (2010). Training adaptive teams. Human Factors, 52(2), 295–307. Gronlund, S. D., Ohrt, D. D., Dougherty, M. R. P., Perry, J. L., & Manning, C. A. (1998). Role of memory in air traffic control. Journal of Experimental Psychology: Applied, 4, 263–280. Gross, N., & Kluge, A. (2014). Predictors of knowledge-sharing behavior for teams in extreme environments: An example from the steel industry. Journal of Cognitive Engineering and Decision Making, 8(4), 352–373. Gugerty, L. J. (1997). Situation awareness during driving: Explicit and implicit knowledge in dynamic spatial memory. Journal of Experimental Psychology: Applied, 3, 42–66. Gugerty, L. J., & Tirre, W. (1997). Situation awareness: A validation study and investigation of individual differences. In Proceedings of the Human Factors and Ergonomics Society 40th Annual Meeting (pp. 564–568). Santa Monica, CA: Human Factors and Ergonomics Society. Gutzwiller, R. S., & Clegg, B. A. (2012). The role of working memory in levels of situation awareness. Journal of Cognitive Engineering and Decision Making, 46. Hallbert, B. P. (1997). Situation awareness and operator performance: results from simulator-based studies. In Proceedings of the IEEE
453 Sixth Conference on Human Factors and Power Plants (pp. 18/11–18/16). New York: IEEE. Hamid, H., & Waterson, P. (2010). Using goal-directed task analysis to identify situation awareness requirements of advanced paramedics. In Proceedings of Advances in Human Factors and Healthcare, 1–4. Hänsel, M., Winkelmann, A., Hardt, F., Gijselaers, W., Hacker, W., Stiehl, M., … Müller, M. (2012). Impact of simulator training and crew resource management training on final-year medical students’ performance in sepsis resuscitation: a randomized trial. Minerva Anestesiologica, 78(8), 901. Hinsley, D., Hayes, J. R., & Simon, H. A. (1977). From words to equations, meaning and representation in algebra word problems. In P. Carpenter & M. Just (Eds.), Cognitive processes in comprehension (pp. 89–106). Hillsdale, NJ: Erlbaum. Hirokawa, R. Y. (1983). Group communication and problem solving effectiveness: An investigation of group phases. Human Communication Research, 9, 291–305. Hogg, D. N., Torralba, B., & Volden, F. S. (1993). A situation awareness methodology for the evaluation of process control systems: Studies of feasibility and the implication of use (1993-03-05). Storefjell, Norway: OECD Halden Reactor Project. Holland, J. H., Holyoak, K. F., Nisbett, R. E., & Thagard, P. R. (1986). Induction: processes of inference, learning and discovery. Cambridge, MA: MIT Press. Hormann, H. J., Blokzijl, C., & Polo, L. (2004). ESSAI - A European training solution for enhancing situation awareness and threat management on modern aircraft flight decks. Paper presented at the 16th Annual European Aviation Safety Seminar of the Flight Safety Foundation and European Regions Airline Association, Barcelona, Spain. Horswill, M. S., & McKenna, F. P. (2004). Drivers hazard perception ability: Situation awareness on the road. In S. Banbury & S. Tremblay (Eds.), A cognitive approach to situation awareness: Theory, measurement and application (pp. 155–174). Aldershot: Ashgate Publishing. 
Humphrey, C. M., & Adams, J. A. (2011). Analysis of complex team-based systems: augmentations to goal-directed task analysis and cognitive work analysis. Theoretical Issues in Ergonomics Science, 12(2), 149–175. Jones, D. G., & Endsley, M. R. (2000). Overcoming representational errors in complex environments. Human Factors, 42(3), 367–378. Jones, R. A. (1977). Self-fulfilling prophecies: Social, psychological and physiological effects of expectancies. Hillsdale, NJ: Lawrence Erlbaum. Jones, W. M. (1997). Enhancing team situation awareness: Aiding pilots in forming initial mental models of team members. In Proceedings of the Ninth International Symposium on Aviation Psychology (pp. 1436–1441). Columbus, OH: The Ohio State University. Kaber, D., Riley, J. M., Endsley, M. R., Sheik-Nainar, M. A., Zhang, T., & Lampton, D. R. (2013). Measuring situation awareness in virtual environment based training. Military Psychology, 25(4), 330–344. Kaber, D. B., Riley, J. M., Lampton, D., & Endsley, M. R. (2005). Measuring situation awareness in a virtual urban environment for dismounted infantry training. In Proceedings of the 11th International Conference on HCI. Mahwah, NJ: Lawrence Erlbaum. Kass, S. J., VanWormer, L. A., Mikulas, W. L., Legan, S., & Bumgarner, D. (2011). Effects of mindfulness training on simulated driving: Preliminary results. Mindfulness, 2(4), 236–241. Klein, G. A. (1989). Recognition-primed decisions. In W. B. Rouse (Ed.), Advances in man-machine systems research (Vol. 5, pp. 47–92). Greenwich, CT: JAI Press, Inc. Klein, G. A., Zsambok, C. E., & Thordsen, M. L. (1993). Team decision training: Five myths and a model. Military Review, April, 36–42. Mann, D. T., Williams, A. M., Ward, P., & Janelle, C. M. (2007). Perceptual-cognitive expertise in sport: A meta-analysis. Journal of Sport and Expertise Psychology, 29(4), 457.
454 Matthews, M. D., Strater, L. D., & Endsley, M. R. (2004). Situation awareness requirements for infantry platoon leaders. Military Psychology, 16(3), 149–161. Mayer, R. E. (1983). Thinking, problem solving, cognition. New York: W. H. Freeman and Co. McKenna, F., & Crick, J. L. (1991). Hazard perception in drivers: A methodology for testing and training. Reading: University of Reading, Transport and Road Research Laboratory. McKenna, F., & Crick, J. L. (1994). Developments in hazard perception. London: Department of Transport. Melnick, E. R., Dyrbye, L. N., Sinsky, C. A., Trocke, M., West, C. P., Nedelec, L., … Shanafelt, T. (2019). The association between perceived electronic health record usability and professional burnout among US physicians. Mayo Clinic Proceedings. https://doi.org/ 10.1016/j.mayocp.2019.09.024, 1-12. Mosier, K. L., & Chidester, T. R. (1991). Situation assessment and situation awareness in a team setting. In Y. Queinnec & F. Daniellou (Eds.), Designing for everyone (pp. 798–800). London: Taylor & Francis. Mumaw, R. J., Roth, E. M., & Schoenfeld, I. (1993). Analysis of complexity in nuclear power severe accidents management. In Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting (pp. 377–381). Santa Monica, CA: Human Factors and Ergonomics Society. National Transportation Safety Board. (1994). A review of flight crews involved in major accidents of U.S. air carriers 1978–1990. Washington, DC: NTSB. Neisser, U. (1967). Cognitive psychology. New York: Appleton-Century, Crofts. Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84(3), 231–259. O’Hare, D. (1997). Cognitive ability determinants of elite pilot performance. Human Factors, 39(4), 540–552. Onnasch, L., Wickens, C. D., Li, H., & Manzey, D. (2014). Human performance consequences of stages and levels of automation: An integrated meta-analysis. Human Factors, 56(3), 476–488. Orasanu, J. 
(1990, July). Shared mental models and crew decision making. In Proceedings of the 12th Annual Conference of the Cognitive Science Society. Cambridge, MA. Orasanu, J., & Salas, E. (1993). Team decision making in complex environments. In G. A. Klein, J. Orasanu, R. Calderwood, & C. E. Zsambok (Eds.), Decision making in action: Models and methods (pp. 327–345). Norwood, NJ: Ablex. Parasuraman, R., Sheridan, T. B., & Wickens, C. D. (2008). Situation awareness, mental workload and trust in automation: viable empirically supported cognitive engineering constructs. Journal of Cognitive Engineering and Decision Making, 2(2), 140–160. Parmet, Y., Borowsky, A., Yona, O., & Oron-Gilad, T. (2015). Driving speed of young novice and experienced drivers in simulated hazard anticipation scenes. Human Factors, 57(2), 311–328. Parush, A., G. Mastoras, A. Bhandari, Kathryn Momtahan, Kathy Day, B. Weitzman, Benjamin Sohmer, A. Cwinn, S. J. Hamstra, and L. Calder. (2017). Can teamwork and situational awareness (SA) in ED resuscitations be improved with a technological cognitive aid? Design and a pilot study of a team situation display. Journal of Biomedical Informatics, 76, 154–161. Pew, R. W. (1995). The state of situation awareness measurement: Circa 1995. In M. R. Endsley & D. J. Garland (Eds.), Experimental analysis and measurement of situation awareness (pp. 7–16). Daytona Beach, FL: Embry-Riddle Aeronautical University. Posner, M. I., Nissen, J. M., & Ogden, W. C. (1978). Attended and unattended processing modes: The role of set for spatial location. In H. L. Pick & E. J. Saltzman (Eds.), Modes of perceiving and processing (pp. 137–157). Hillsdale, NJ: Erlbaum Associates. Prince, C. (1998). Guidelines for situation awareness training. Orlando, FL: Naval Air Warfare Center Training Systems Division.
DESIGN OF EQUIPMENT, TASKS, JOBS, AND ENVIRONMENTS Prince, C., Ellis, E., Brannick, M. T., & Salas, E. (2007). Measurement of team situation awareness in low experience level aviators. The International Journal of Aviation Psychology, 17(1), 41–57. Prince, C., & Salas, E. (1998). Situation assessment for routine flight and decision making. International Journal of Cognitive Ergonomics, 1(4), 315–324. Robinson, D. (2000). The development of flight crew situation awareness in commercial transport aircraft. In Proceedings of the Human Performance, Situation Awareness and Automation: User-Centered Design for a New Millennium Conference (pp. 88–93). Marietta, GA: SA Technologies, Inc. Rodgers, M. D., Mogford, R. H., & Strauch, B. (2000). Post-hoc assessment of situation awareness in air traffic control incidents and major aircraft accidents. In M. R. Endsley & D. J. Garland (Eds.), Situation awareness analysis and measurement. Mahwah, NJ: Lawrence Erlbaum. Rosenman, E. D., Dixon, A. J., Webb, J. M., Brolliar, S., Golden, S. J., Jones, K. A., … Chao, G. T. (2018). A simulation-based approach to measuring team situational awareness in emergency medicine: a multicenter, observational study. Academic Emergency Medicine, 25(2), 196–204. Rouse, W. B., & Morris, N. M. (1985). On looking into the black box: Prospects and limits in the search for mental models (DTIC #AD-A159080). Atlanta, GA: Center for Man-Machine Systems Research, Georgia Institute of Technology. Sarter, N. B., & Woods, D. D. (1995). “How in the world did I ever get into that mode”: Mode error and awareness in supervisory control. Human Factors, 37(1), 5–19. Schank, R. C., & Abelson, R. P. (1977). Scripts, plans, goals and understanding. Hillsdale, NJ: Erlbaum. Schulz, C. M., Krautheim, V., Hackemann, A., Kreuzer, M., Kochs, E. F., & Wagner, K. J. (2016). Situation awareness errors in anesthesia and critical care in 200 cases of a critical incident reporting system. BMC Anesthesiology, 16(4), 1–10. Secrist, G. 
E., & Hartman, B. O. (1993). Situational awareness: The trainability of near-threshold information acquisition dimension. Aviation, Space and Environmental Medicine, 64, 885–892. Segal, L. D. (1994). Actions speak louder than words. In Proceedings of the Human Factors and Ergonomics Society 38th Annual Meeting (pp. 21–25). Santa Monica, CA: Human Factors and Ergonomics Society. Sohn, Y. W., & Doane, S. M. (2004). Memory processes of flight situation awareness: Interactive roles of working memory capacity, long-term working memory and expertise. Human Factors, 46(3), 461–475. Soliman, A. M., & Mathna, E. K. (2009). Metacognitive strategy training improves driving situation awareness. Social Behavior and Personality, 37(9), 1161–1170. Stein, E. S. (1992). Air traffic control visual scanning (DOT/FAA/CT-TN92/16). Atlantic City International Airport, NJ: Federal Aviation Administration William J. Hughes Technical Center. Stout, R. J., Cannon-Bowers, J. A., & Salas, E. (2017). The role of shared mental models in developing team situational awareness: Implications for training. In Situational awareness (pp. 287–318). New York: Routledge. Stout, R. J., Cannon-Bowers, J. A., Salas, E., & Milanovich, D. M. (1999). Planning, shared mental models, and coordinated performance: An empirical link is established. Human Factors, 41(1), 61–71. Strater, L. D., Endsley, M. R., Pleban, R. J., & Matthews, M. D. (2001). Measures of platoon leader situation awareness in virtual decision making exercises (Research Report 1770). Alexandria, VA: Army Research Institute. Strater, L. D., Jones, D., & Endsley, M. R. (2003). Improving SA: Training challenges for infantry platoon leaders. In Proceedings of the 47th Annual Meeting of the Human Factors and Ergonomics
SITUATION AWARENESS Society (pp. 2045–2049). Santa Monica, CA: Human Factors and Ergonomics Society. Strater, L. D., Reynolds, J. P., Faulkner, L. A., Birch, K., Hyatt, J., Swetnam, S., & Endsley, M. R. (2004). PC-based tools to improve infantry situation awareness. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (pp. 668–672). Santa Monica, CA: Human Factors and Ergonomics Society. Sulistyawati, K., Chui, Y. P., & Wickens, C. D. (2008). Multi-method approach to team situation awareness. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (pp. 463–467). Los Angeles, CA: Sage. Sulistayawati, K., Wickens, C. D., & Chui, Y. P. (2011). Prediction in situation awareness: Confidence bias and underlying cognitive abilities. International Journal of Aviation Psychology, 21(2), 153–174. Taylor, R. M., Endsley, M. R., & Henderson, S. (1996). Situational awareness workshop report. In B. J. Hayward & A. R. Lowe (Eds.), Applied aviation psychology: Achievement, change and challenge (pp. 447–454). Aldershot: Ashgate Publishing Ltd.
455 Treisman, A., & Paterson, R. (1984). Emergent features, attention and object perception. Journal of Experimental Psychology: Human Perception and Performance, 10(1), 12–31. Underwood, G., Chapman, P., Bowden, K., & Crundall, D. (2002). Visual search while driving: Skill and awareness during inspection of the scene. Transportation Research, Part F, (5), 87–97. Wickens, C. D. (1992). Engineering psychology and human performance (2nd ed.). New York: HarperCollins. Wickens, C. D. (2008). Situation awareness: Review of Mica Endsley’s 1995 articles on situation awareness theory and measurement. Human Factors, 50(3), 397–403. Xiao, Y., Mackenzie, C. F., & Patey, R. (1998). Team coordination and breakdowns in a real-life stressful environment. In Proceedings of the Human Factors and Ergonomics Society 42nd Annual Meeting (pp. 186–190). Santa Monica, CA: Human Factors and Ergonomics Society. Yu, C. S., Wang, E. M., Li, W. C., & Braithwaite, G. (2014). Pilots’ visual scan patterns and situation awareness in flight operations. Aviation, Space and Environmental Medicine, 85(7), 798–714.
PART 4
DESIGN FOR HEALTH, SAFETY, AND COMFORT
CHAPTER 18
SOUND AND NOISE: MEASUREMENT AND DESIGN GUIDANCE
John G. Casali
Virginia Polytechnic Institute and State University & Hearing, Ergonomics & Acoustics (H.E.A.R.) LLC
Blacksburg, Virginia
1 INTRODUCTION
2 SOUND AND NOISE
2.1 Fundamental Parameters
2.2 Physical Quantification: Sound Levels and the Decibel Scale
2.3 Basic Computations with Decibels
3 MEASUREMENT AND QUANTIFICATION OF SOUND AND NOISE EXPOSURES
3.1 Basic Instrumentation
3.2 Sound and Noise Metrics
4 INDUSTRIAL NOISE REGULATION AND ABATEMENT
4.1 Need for Attention to Noise
4.2 OSHA (and Other) Noise Exposure Regulated Limits
4.3 Hearing Conservation Programs
4.4 Engineering Noise Control
5 AUDITORY EFFECTS OF NOISE
5.1 Hearing Loss in the United States
5.2 Types and Etiologies of Noise-Induced Hearing Loss
5.3 Concomitant Auditory Injuries
6 PERFORMANCE, NONAUDITORY, AND PERCEPTUAL EFFECTS OF NOISE
6.1 Performance and Nonauditory Health Effects of Noise
6.2 Annoyance Effects of Noise
6.3 Loudness and Related Scales of Measurement
7 SIGNAL AUDIBILITY AND SPEECH COMMUNICATIONS IN NOISE
7.1 General Concepts in Signal and Speech Audibility
7.2 Analysis of and Design for Signal Audibility in Noise
7.3 Analysis of Speech Intelligibility in Noise
7.4 Other Considerations for Signal Audibility and Speech Intelligibility
7.5 Summary for Improving Audibility and Reducing Effects of Noise on Signals and Speech
REFERENCES
1 INTRODUCTION
Sound, along with its subset noise (often defined as unwanted sound), confronts human factors professionals in many applications, including product design, human interfaces of many types, and environmental situations. A few examples are: (1) an auditory warning signal, for which the proper sound parameters must be selected to maximize detection, identification, and localization; (2) a situation in which critical speech communication between operators is compromised in intelligibility by environmental noise, so that the communications system and/or the acoustic environment must be redesigned; (3) a residential community intruded upon by noise from vehicular traffic or a nearby industrial plant, causing annoyance and sleep arousal and necessitating abatement; (4) an in-vehicle auditory display that warns of impending collision and must convey urgency and positional cues; (5) a worker exposed to hazardous noise on the job, for whom an appropriate hearing protection device (HPD) must be selected to prevent hearing loss; and (6) a soldier whose ears must be protected from gunfire with an HPD while he or she remains able to detect, identify, and localize enemy threat-related sounds. To deal effectively with examples of these types, the human factors engineer must understand the basics of sound; the instrumentation and techniques for its measurement and quantification; the analysis of acoustic measurements to ascertain the audibility of signals and speech as well as the risks to hearing; and the countermeasures that combat the deleterious effects of noise. In this chapter these and related matters are addressed from a human factors engineering perspective, and several important noise-related standards and regulations are also covered.
At the outset it should be noted that the science of acoustics, and the study of sound and noise within it, is very broad and comprises a vast body of research and standards literature. As the subject of a single chapter, this topic cannot be covered in great depth herein. This chapter therefore introduces several major topics concerning sound and noise, particularly as they affect humans, and points the reader to other publications for detail on specific topics. For the science of sound and noise as a whole, the following texts are recommended: Meinke, Berger, Driscoll, Neitzel & Bright (2020), Kryter (1994), Crocker (1998), and Harris (1991).
2 SOUND AND NOISE
Most aspects of acoustics rely on accurate quantification and evaluation of the sound itself; therefore, a basic understanding of sound parameters and sound measurement is needed before delving into application-oriented issues.
2.1 Fundamental Parameters
Sound is a disturbance in a medium that has mass and elasticity (most commonly air, or a conductive structure such as a floor or wall in industrial, home, or recreational settings). For example, an exhaust fan on the roof of an industrial plant has blades that rotate in the air, creating noise that may propagate into the surrounding community. Because the blades are coupled to the air medium, they produce pressure waves consisting of alternating compressions (above ambient air pressure) and rarefactions (below ambient pressure) of air molecules. The frequency (f) of these waves is the number of above/below-ambient pressure cycles per second, expressed in hertz (Hz), and its reciprocal, 1/f, is the period of the waveform. The waveform propagates outward from the fan as long as the fan continues to rotate, and the disturbance in air pressure relative to ambient pressure is heard as sound, in this case as “fan roar.” The linear distance traversed by the sound wave in one complete cycle of vibration is the wavelength:
$$\lambda = \frac{c}{f} \tag{1}$$
Thus, in Eq. (1), wavelength (λ, in meters [m] or feet [ft]) depends on the sound frequency (f, in Hz) and velocity (c, in meters per second [m/s] or feet per second [ft/s]). Velocity depends principally upon the medium itself, being much faster in water than in air, for instance. This chapter focuses primarily on sound in air, where the air temperature, as well as the barometric pressure and other parameters, affects the velocity. The standard velocity in air at 68°F and a pressure of 1 atmosphere (atm) is about 343 m/s or 1126 ft/s, corresponding to 767 mph, or Mach 1.0. The speed of sound in air increases by about 1.1 ft/s for each 1°F increase in temperature (Ostergaard, 2003). Because the speed of light (186,282 miles per second [mps]) vastly exceeds the speed of sound (about 0.213 mps), events that release both light and sound energy, such as fireworks displays, are seen well before they are heard.
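To make Eq. (1) and the temperature dependence concrete, the short sketch below computes wavelength from velocity and frequency, and adjusts the speed of sound in air using the linear rule of thumb quoted above (about 1126 ft/s at 68°F, plus about 1.1 ft/s per °F). The function names are illustrative only, and the linear temperature model is an approximation assumed to hold over ordinary ambient temperatures.

```python
"""Wavelength (Eq. 1) and an approximate temperature-adjusted speed of sound in air."""

# Values taken from the text: ~1126 ft/s at 68 F and 1 atm, rising ~1.1 ft/s per degree F.
# The linear model is an approximation, not an exact thermodynamic relation.
C_REF_FT_S = 1126.0
T_REF_F = 68.0
SLOPE_FT_S_PER_F = 1.1


def speed_of_sound_ft_s(temp_f: float) -> float:
    """Approximate speed of sound in air (ft/s) at temperature temp_f (degrees F)."""
    return C_REF_FT_S + SLOPE_FT_S_PER_F * (temp_f - T_REF_F)


def wavelength(c: float, f_hz: float) -> float:
    """Eq. (1): wavelength = c / f, returned in the length unit used for c (m or ft)."""
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    return c / f_hz


def sound_delay_s(distance_ft: float, temp_f: float = T_REF_F) -> float:
    """Travel time of sound over distance_ft; light's travel time is negligible by comparison."""
    return distance_ft / speed_of_sound_ft_s(temp_f)


if __name__ == "__main__":
    # A 1000 Hz tone in 68 F air: 343 m/s / 1000 Hz.
    print(f"{wavelength(343.0, 1000.0):.3f} m")  # 0.343 m
    # The same tone on a 95 F day, computed in feet.
    c_hot = speed_of_sound_ft_s(95.0)
    print(f"c = {c_hot:.1f} ft/s, wavelength = {wavelength(c_hot, 1000.0):.3f} ft")
    # Fireworks one mile (5280 ft) away are heard several seconds after they are seen.
    print(f"delay = {sound_delay_s(5280.0):.2f} s")
```

Low frequencies thus have long wavelengths (a 100 Hz tone spans several meters), which is one reason they are harder to block or absorb than high frequencies.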
Vibrations are oscillations in solid media and are often associated with the production of sound waves that can also couple to and propagate via air. Noise can be loosely defined as a subset of sound; that is, noise is sound that is undesirable or offensive in some respect. However, the distinction is largely situation- and listener-specific, as perhaps best stated in the old adage “one person’s music is another’s noise.” Unlike some common ergonomics-related stressors, such as repetitive motions or awkward lifting maneuvers, noise is a physical stimulus that impinges upon the body, and it is more readily measured and quantified using commercially available transducers (microphones) and instrumentation (sound level meters [SLMs] and their variants). Aural exposure to noise, and the damage potential therefrom, is a function of the total energy transmitted to the ear, that is, of the product of the noise intensity and the duration of the exposure. Several metrics that relate to the energy of the noise exposure have been developed, most with an eye toward accurately reflecting the exposures that occur in industrial or community settings. These metrics are covered in Section 3.2, but first the most basic unit of measurement, the decibel, must be understood.
2.2 Physical Quantification: Sound Levels and the Decibel Scale
The decibel (dB), one-tenth of a bel, is the most common unit applied to the quantification of noise amplitude. The decibel is a measure of level, defined as the logarithm of the ratio of a quantity to a reference quantity of the same type. In acoustics, it is applied to sound level, of which there are three types. Sound power level, the most fundamental quantity, is typically expressed in decibels and is defined as:
$$\text{Sound power level (dB)} = 10\log_{10}\frac{P_{w1}}{P_{wr}} \tag{2}$$
where $P_{w1}$ is the acoustic power of the sound in watts (W) and $P_{wr}$ is the acoustic power of a reference sound in W, usually taken to be the acoustic power at hearing threshold for a young, healthy ear at the frequency of maximum sensitivity, the quantity $10^{-12}$ W. Sound intensity level, following from power level, is typically expressed in decibels and is defined as:
$$\text{Sound intensity level (dB)} = 10\log_{10}\frac{I_1}{I_r} \tag{3}$$
where $I_1$ is the acoustic intensity of the sound in watts per square meter (W/m²) and $I_r$ is the acoustic intensity of a reference sound in W/m², usually taken to be the acoustic intensity at hearing threshold, $10^{-12}$ W/m². Within the last decade, instruments that measure sound intensity level have become commonplace. Sound power level, by contrast, which comprises the total acoustic power radiated by a source, is not directly measured but can be computed from empirical measures of sound intensity level or sound pressure level (Ostergaard, 2003). The calculation of sound power level from sound pressure level is covered in various consensus standards that depend upon the noise source type and sound field characteristics, e.g., ISO 3747-2010 for a reverberant field (International Standards Organization [ISO], 2010). Sound power level comprises the acoustic power emitted by a source in all directions and depends only on the source, independent of the environment. Sound pressure level, on the other hand, reflects the sound level at a given measurement position and depends upon all factors acting at that position, such as distance from the source and reflection or absorption in the environment. Sound pressure level is directly measurable with relatively straightforward instruments and is by far the most common metric used in practice. Sound pressure level (SPL), abbreviated in formulas as $L_P$, is also typically expressed in decibels. Because power is directly proportional to the square of the pressure, SPL is defined as:
$$\text{Sound pressure level (SPL or } L_P\text{; dB)} = 10\log_{10}\frac{P_1^2}{P_r^2} = 20\log_{10}\frac{P_1}{P_r} \tag{4}$$
where $P_1$ is the root-mean-square pressure of the sound in micropascals (μPa) (or other specified pressure units) and $P_r$ is the pressure of a reference sound in the same units, usually taken to be the pressure at hearing threshold, 20 μPa, or 0.00002 Pa.
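Equations (2) through (4) translate directly into code. The sketch below is a minimal illustration, assuming the reference values given in the text ($10^{-12}$ W, $10^{-12}$ W/m², and 20 μPa); the function names are hypothetical. Note the factor of 20 in the pressure form, which follows from power being proportional to pressure squared.

```python
import math

# Reference quantities as given in the text.
W_REF = 1e-12   # reference acoustic power, W
I_REF = 1e-12   # reference acoustic intensity, W/m^2
P_REF = 20e-6   # reference sound pressure, Pa (20 micropascals)


def sound_power_level(power_w: float) -> float:
    """Eq. (2): sound power level in dB re 1e-12 W."""
    return 10.0 * math.log10(power_w / W_REF)


def sound_intensity_level(intensity_w_m2: float) -> float:
    """Eq. (3): sound intensity level in dB re 1e-12 W/m^2."""
    return 10.0 * math.log10(intensity_w_m2 / I_REF)


def sound_pressure_level(p_rms_pa: float) -> float:
    """Eq. (4): SPL in dB re 20 uPa. Uses 20*log10 because power ~ pressure squared."""
    return 20.0 * math.log10(p_rms_pa / P_REF)


if __name__ == "__main__":
    print(round(sound_pressure_level(20e-6), 1))  # 0.0 (the reference pressure itself)
    print(round(sound_pressure_level(20.0), 1))   # 120.0
    # Doubling pressure adds about 6 dB of SPL; doubling power adds about 3 dB.
    print(round(sound_pressure_level(2.0) - sound_pressure_level(1.0), 2))  # 6.02
    print(round(sound_power_level(2.0) - sound_power_level(1.0), 2))        # 3.01
```

The last two lines reproduce the 6 dB and 3 dB doubling relationships that underlie the rules of thumb used throughout this section.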
Other equivalent reference quantities are 0.0002 dyne/cm² and 0.0002 μbar. Applying the decibel scale to acoustic measurements provides a convenient means of compressing the vast range of sound pressures that can be encountered into a more manageable, compact range. As shown in Figure 1, with the logarithmic compression of the decibel scale, the range of typical sounds from the human hearing threshold to the threshold of tactile “feeling” is about 120 dB, whereas on the linear pressure scale the same range of sounds spans a factor of 1,000,000 (from 0.00002 Pa to 20 Pa). Of course, some sounds exceed 120 dB, such as most gunfire impulses (Flamme & Murphy, 2020), and others fall below 0 dB, for example, below normal threshold on an audiometer. Figure 1 also directly compares the decibel values of example sounds to their pressure values (in Pa).
In considering changes in sound level measured in decibels, a few numerical relationships that follow from the decibel formulas above are often helpful in practice. An increase (decrease) in SPL of 6 dB is equivalent to a doubling (halving) of the sound pressure. Similarly, on the power or intensity scales, an increase (decrease) of 3 dB is equivalent to a doubling (halving) of the sound power or intensity. This latter relationship gives rise to what is known as the equal-energy rule or trading relationship. Because sound energy is the product of intensity and duration, a sound that increases (decreases) by 3 dB is equivalent in total energy to the same sound at an unchanged decibel value whose duration is doubled (halved).
Figure 1 Sound pressure level in decibels and sound pressure in Pascals for typical sounds. [Figure: left scale, sound pressure level in dB re 20 μPa (logarithmic), spanning 120 dB from the threshold of hearing for a healthy ear (0 dB) to a jackhammer at the operator’s position (120 dB); right scale, sound pressure in Pa (linear), spanning a factor of 1,000,000 from 0.00002 Pa to 20 Pa; zones labeled Annoyance, Risk, and Danger.]
2.3 Basic Computations with Decibels
There are many practical instances in which it is helpful to predict the combined result of several individual sound sources that have been measured separately in decibels. For random, uncorrelated sound sources, this can be done using Eq. (5):
$$L_{P\,\text{combo}}\ (\text{dB}) = 10\log_{10}\left(10^{L_{P1}/10} + 10^{L_{P2}/10} + \cdots + 10^{L_{Pn}/10}\right) \tag{5}$$
Eq. (5) applies for any decibel weighting (dBA, dBC, etc., as explained later) and for any bandwidth (such as one-third octave, full octave, etc.). For example, suppose that an industrial plant currently exposes workers in a work area to a time-weighted average (TWA) of 83.0 dBA, which is below the Occupational Safety and Health Administration (OSHA, 1983) “action level” (85.0 dBA) at which a hearing conservation program would be required by law (as discussed later). Two new pieces of equipment are proposed for purchase and installation in this area: a single-speed conveyor with a constant noise output of 78.0 dBA and a compressor with a constant output of 82.5 dBA. The combined sound level will be approximately:
$$L_{P\,\text{combo}} = 10\log_{10}\left(10^{83.0/10} + 10^{78.0/10} + 10^{82.5/10}\right) = 86.4\ \text{dBA}$$
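The energy summation in Eq. (5), and the equal-energy trading relationship described just before Section 2.3, can both be checked numerically. The sketch below assumes, as the text does, random and uncorrelated sources; `add_levels` and `equal_energy_level` are illustrative names, not functions from any standard or library.

```python
import math


def add_levels(*levels_db: float) -> float:
    """Eq. (5): combine random, uncorrelated sources given in dB.

    Each level is converted back to a relative energy (10**(L/10)), the energies
    are summed, and the total is converted back to decibels. Works for any
    weighting or bandwidth, provided all inputs use the same one.
    """
    total = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total)


def equal_energy_level(level_db: float, duration_h: float, new_duration_h: float) -> float:
    """Level that delivers the same total energy over new_duration_h.

    Energy is proportional to intensity x duration, so halving the duration
    allows about +3 dB and doubling it requires about -3 dB.
    """
    return level_db + 10.0 * math.log10(duration_h / new_duration_h)


if __name__ == "__main__":
    # Worked example from the text: existing 83.0 dBA TWA plus conveyor and compressor.
    combo = add_levels(83.0, 78.0, 82.5)
    print(f"{combo:.1f} dBA")  # 86.4 dBA, above the 85.0 dBA action level
    # Equal energy: 83 dBA for 8 h carries the same energy as about 86 dBA for 4 h.
    print(f"{equal_energy_level(83.0, 8.0, 4.0):.1f} dBA")  # 86.0 dBA
```

Summing in the energy domain, rather than averaging or adding the decibel values themselves, is what makes two equal sources combine to only about 3 dB above either one.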
Thus, by purchasing this equipment, the plant would move from a noise exposure level (83.0 dBA) that is in compliance with (i.e., below) the OSHA (1983) action level to one that is not (86.4 dBA). This illustrates why industries should adhere to a “buy quiet” policy, so that noise exposure problems are not created unknowingly when equipment is purchased. Subtraction of decibels works in the same manner as addition:
$$L_{P\,\text{difference}}\ (\text{dB}) = 10\log_{10}\left(10^{L_{P1}/10} - 10^{L_{P2}/10}\right) \tag{6}$$
Using the example above, if the compressor were eliminated, the overall noise level would be the combination of all three sources computed above (86.4 dBA), less the contribution of the compressor (82.5 dBA):
$$L_{P\,\text{difference}} = 10\log_{10}\left(10^{86.4/10} - 10^{82.5/10}\right) = 84.1\ \text{dBA}$$
With this result, the plant area noise level moves back into OSHA compliance under the action level of 85.0 dBA, but only by 0.9 dBA. To err on the safe side, especially to accommodate any upward fluctuations in noise level, the plant’s management should still seek to reduce the noise further or consider instituting the hearing conservation program that would be legally required at the 85 dBA TWA action level. A few rules of thumb arise from the computations shown above. One is that when two sound sources are approximately equal in SPL, their combination will be about 3 dB higher than the level of the higher source. Another is that once the difference between two sounds exceeds about 13 dB, the contribution of the lower-level sound to the combined level is negligible (about 0.2 dB). Relatedly, when a sound of interest must be measured in isolation but cannot be physically separated from a background noise, the question becomes: To what extent is the background noise influencing the accuracy of the measurement? In many cases, such as in some manufacturing plants, the background noise cannot be turned off but the sound of interest can. If so, the sound of interest is first measured in the presence of the background noise, and then the background noise is measured alone. If the background noise measurement is more than 13 dB below the combined measurement, it has not influenced the measurement of the sound of interest significantly. If the difference is smaller, Eq. (6) can be applied to correct the measurement by removing the background noise’s contribution. Some standards simply use a difference of 10 dB as the guideline for when to apply the background noise correction, while others require more precision. For instance, MIL-STD 1474E, Section 4.11.1 (U.S. Department of Defense, 2015), specifies dB correction values that must be applied when the differences between a measured level and a background level range from as small as 5 dB to as large as 15 dB. Finally, because of the limits on the precision and reliability of decibel measurements, for the applications discussed in this chapter (and most others in acoustics), it is unnecessary to record decibel values calculated from the formulas herein to more than one decimal place, and it is usually sufficient to round final results to the nearest 0.5 dB or even to integer values. However, to avoid interim rounding error, it is important to carry the significant figures through each step of a calculation until the end result is obtained (Ostergaard, 2003).
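Decibel subtraction, Eq. (6), and the background-noise check described above can be sketched as follows. This is an illustration under stated assumptions: the 13 dB "negligible" threshold is the text's rule of thumb (some standards use 10 dB, and MIL-STD 1474E tabulates its own corrections, which are not reproduced here), and the function names are hypothetical.

```python
import math


def subtract_levels(total_db: float, removed_db: float) -> float:
    """Eq. (6): level remaining after removing one uncorrelated source from a total."""
    if removed_db >= total_db:
        raise ValueError("removed source must be lower than the total level")
    return 10.0 * math.log10(10.0 ** (total_db / 10.0) - 10.0 ** (removed_db / 10.0))


def background_corrected(combined_db: float, background_db: float,
                         negligible_diff_db: float = 13.0) -> float:
    """Estimate the level of a source measured together with background noise.

    If the background is more than negligible_diff_db below the combined reading,
    its influence is treated as insignificant and the reading is returned as-is;
    otherwise Eq. (6) removes the background's contribution.
    """
    if combined_db - background_db > negligible_diff_db:
        return combined_db
    return subtract_levels(combined_db, background_db)


if __name__ == "__main__":
    # Text example: remove the 82.5 dBA compressor from the rounded 86.4 dBA total.
    print(f"{subtract_levels(86.4, 82.5):.1f} dBA")  # 84.1 dBA
    # Carrying full precision instead of the rounded interim 86.4 dBA value:
    full = 10.0 * math.log10(sum(10.0 ** (lv / 10.0) for lv in (83.0, 78.0, 82.5)))
    print(f"{subtract_levels(full, 82.5):.1f} dBA")  # 84.2 dBA
```

The two printed results differ by 0.1 dB only because of the interim rounding of 86.4 dBA, which illustrates the text's advice to carry significant figures through to the end of a calculation.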
3 MEASUREMENT AND QUANTIFICATION OF SOUND AND NOISE EXPOSURES
3.1 Basic Instrumentation
Measurement and quantification of sound levels and noise exposure levels provide the fundamental data for assessing hearing exposure risk, speech and signal-masking effects, hearing conservation program needs, and engineering noise control strategies. A vast array of instrumentation is available; however, for most of the aforementioned applications, a basic understanding of three primary instruments (SLMs, dosimeters, and real-time spectrum analyzers) and their data output will suffice. Where the noise is highly impulsive, such as gunfire, more specialized instruments are necessary, including blast microphones and acoustic “manikins” or other test fixtures (Flamme & Murphy, 2020). Because sound propagates as pressure waves that vary over space and time, a complete acoustic record of a noise exposure or of a sound event with a prolonged duration would require simultaneous measurements at all points of interest in the sound field, taken over a representative, continuous time period, to document the noise level exhaustively in the space. This is typically cost- and time-prohibitive, so one must resort to sampling strategies to establish the observation points and intervals. The analyst must also decide whether detailed, discrete-time histories with averaging over time and/or space are needed (as with a noise-logging dosimeter), whether discrete samples taken with a short-duration moving time average will suffice (as with a basic sound level meter), or whether frequency-band-specific SPLs are needed for selecting noise abatement materials (as with a spectrum analyzer). A discussion of these three primary types of sound measurement instruments, and the noise descriptors that can be obtained from them, follows.
DESIGN FOR HEALTH, SAFETY, AND COMFORT
3.1.1 Sound Level Meter
Sound Level Meter Types and Standardization
Most sound measurement instruments derive from the basic SLM, a device for which there are four grades, described in American National Standards Institute (ANSI) S1.4-2014 (ANSI, 2014), with performance tolerances that become more stringent as the grade number decreases. Type 0 instruments have the most stringent tolerances and are for laboratory use only. The other grades are type 1, intended for precision measurement in the field or laboratory; type 2, intended for general field use, especially where frequencies above 10,000 Hz are not prevalent; and type S, a special-purpose meter that may perform at grade 1, 2, or 3 but may not include all of the operational functions of that grade. A grade of type 2 or better is needed for measuring occupational exposures under OSHA regulations, for community noise measurements for ordinance enforcement, and for obtaining data for most court proceedings. For specialized SLMs that function as dosimeters, providing various metrics of sound exposure level for OSHA or other noise-monitoring requirements, ANSI S1.25-1991(R2020) should be consulted (ANSI, 2020a). This standard specifies SLM characteristics that are essential to the accurate measurement of steady, intermittent, fluctuating, and impulsive sounds, particularly when the measurement is obtained over a time interval rather than instantaneously. A block diagram of the functional components of a generic SLM appears in Figure 2. At the top, a microphone/preamplifier transduces the pressure changes caused by an airborne sound wave into a voltage signal. Because the pressure fluctuations of a sound wave are small in magnitude, the corresponding voltage signal must be preamplified and then input to an amplifier, which boosts the signal before it is processed further.
Figure 2 Functional components of a sound level meter: microphone and preamplifier, amplifier, frequency-weighting networks, exponential averager (fast, slow, impulse), attenuator, and indicator readout.
SOUND AND NOISE: MEASUREMENT AND DESIGN GUIDANCE
The passband of a high-quality SLM, that is, the range of frequencies that are passed through and processed, extends from about 10 to 20,000 Hz. Unless the SLM is set to a linear (unweighted) mode, however, not all frequencies are treated in the same way; the treatment depends on the frequency weighting used. A selectable frequency-weighting network, or filter, is applied to the signal. These networks most commonly include the A-, B-, and C-weighting functions shown in Figure 3(b). For OSHA noise-monitoring measurements and for many community noise applications, the A scale is used; it deemphasizes the low frequencies and, to a smaller extent, the high frequencies. In addition to the common A scale (which approximates the 40-phon level of hearing) and C scale (100-phon level), other selections may be available. If no weighting function is selected on the meter or if it is set to a linear mode, the notation dBZ or dB(linear) is used, and all frequencies are processed without weighting factors. The weighting functions for the three suffix notations A, B, and C are superimposed on the phon contours of Figure 3(a) and are also depicted in Figure 3(b) as frequency-weighting functions. Next (not shown in Figure 2), the signal is effectively squared to reflect the fact that SPL in decibels is a function of the square of the sound pressure.
Figure 3 (a) Equal-loudness contours based on the psychophysical phon scale (SPL in dB re 20 μPa vs. frequency in Hz, from the threshold of audibility to the threshold of feeling), with sound level meter frequency-weighting curves superimposed; (b) decibels vs. frequency values of A, B, and C sound level meter weighting curves (relative response in dB vs. frequency, 20 Hz to 20 kHz). (Source: Earshen, 1986. © 1986 AIHA.)
The signal is then applied to an exponential averaging network, which defines the meter’s dynamic response characteristics. In effect, this response creates a moving-window, short-time average display of the sound waveform. The two most common settings are FAST, which has a time constant of 0.125 s, and SLOW, which has a time constant of 1.0 s. These time constants were established decades ago to give analog needle indicators a rather sluggish response (particularly on the SLOW setting) so that they could be read by the human eye even when highly fluctuating sound pressures were measured. Under the FAST or SLOW dynamics, the meter indicator rises exponentially toward the decibel value of an applied constant SPL. For OSHA measurements, the SLOW setting is used; this setting is also best when the average value (as it changes over time) is desired. The FAST setting is more appropriate when the variability or range of fluctuations of a time-varying sound is desired. Some SLMs also include a third time constant, IMPULSE, for measuring sounds that have sharp transient (rise and decay) characteristics over time, exemplified by gunshots or impact machinery such as drop forges. The IMPULSE setting has an exponential rise time constant of 35 ms and a decay time constant of 1.5 s. This setting gives the observer time to view the maximum value of a burst of sound before it decays, and it is more commonly applied in community and business machine noise measurements than in industrial settings.
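The FAST and SLOW behavior described above amounts to squaring the pressure signal and passing it through a first-order (RC-style) exponential smoother with time constant τ. The sketch below assumes a sampled pressure signal in pascals and a 48-kHz sample rate chosen for illustration; it is not a standards-conformant meter (it omits frequency weighting and the separate rise and decay constants of the IMPULSE setting).

```python
import math

P_REF = 20e-6  # reference sound pressure, 20 µPa
TIME_CONSTANTS_S = {"FAST": 0.125, "SLOW": 1.0}

def time_weighted_levels(pressure_pa, fs_hz, tau_s):
    """Running exponentially time-weighted SPL (dB re 20 µPa), one value per sample."""
    alpha = 1.0 - math.exp(-1.0 / (fs_hz * tau_s))  # per-sample weight of a first-order smoother
    mean_square = 0.0  # the meter starts from rest
    levels = []
    for p in pressure_pa:
        # Square the pressure, then low-pass it: the exponential "moving window" average
        mean_square += alpha * (p * p - mean_square)
        if mean_square > 0.0:
            levels.append(10.0 * math.log10(mean_square / P_REF ** 2))
        else:
            levels.append(float("-inf"))
    return levels

# Hypothetical test signal: 1-kHz tone, 1 Pa rms (about 94 dB), switched on at t = 0
fs = 48_000
tone = [math.sqrt(2) * math.sin(2 * math.pi * 1000 * n / fs) for n in range(2 * fs)]

fast = time_weighted_levels(tone, fs, TIME_CONSTANTS_S["FAST"])
slow = time_weighted_levels(tone, fs, TIME_CONSTANTS_S["SLOW"])
print(f"FAST after 1 s: {fast[fs - 1]:.1f} dB")  # essentially settled near 94 dB
print(f"SLOW after 1 s: {slow[fs - 1]:.1f} dB")  # still about 2 dB low (63% of final mean square)
```

The comparison shows why the SLOW setting suits reading an average while FAST tracks fluctuations: after 1 s the SLOW indicator has reached only 1 − e^−1 of the final mean-square pressure.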
Continuous vs. Impulse Noise Measurement
It is important at this point to distinguish between two major categories of sound that are defined by their behavior as a function of time: “continuous” and “impulse” noise. Unfortunately, their definitions vary somewhat across references. In U.S. federal law, however, OSHA (1971a, 1971b) states that a noise is to be considered continuous if the “variations in noise level involve maxima at intervals of 1 second or less.” By this definition, a noise with a very fast staccato character, whose level peaks occur more often than once per second, is continuous. By implication, if the peaks in level are separated by more than 1 second, the noise is considered impulsive. NIOSH (1988) states that “impulsive noise is characterized by a sharp rise and rapid decay in sound levels and is less than 1 second in duration” (p. xiii), while continuous noise is “noise with negligibly small fluctuations of level within the period of observation” (p. xii). Furthermore, although impulse noise is often taken to encompass both impulse and impact sources, the two terms differ: impulse is the more generic term, referring to a sound whose origin is the product of an applied force and the time over which it is applied, whereas impact refers to a sound that results from the collision of a moving mass with another mass, the second mass being either stationary or in motion.
Because sound often consists of symmetrical pressure fluctuations above and below ambient air pressure, whose arithmetic average is zero, a root mean square (rms) averaging procedure is applied when FAST, SLOW, or IMPULSE measurements are taken, and the result is displayed in decibels. In effect, each pressure (or converted voltage) value is squared, the arithmetic mean of all squared values is then obtained, and finally the square root of the mean is computed to provide the rms value.
Some SLMs include an unweighted PEAK setting that does not use the rms computation but instead indicates the actual peak SPL reached during a pressure impulse, typically with a unit of dBP. This measurement mode is necessary for certain applications, for instance, to determine whether the OSHA limit of 140 dB for impulsive exposure is exceeded. Note that the rms-based IMPULSE dynamics setting described earlier is unsuitable for measuring PEAK SPLs.
With regard to the final component of the SLM shown in Figure 2, the indicator display or readout, much debate has existed over whether an analog (needle pointer or bar “thermometer-type” linear display) or digital (numeric) display is best. Ergonomic design guidance suggests that although a digital readout presents more precise information in a smaller space, its least significant digit becomes impossible to read when the sound level fluctuates rapidly. It is also more difficult with a digital readout for the observer to capture the maximum and minimum values of a sound, as is often desirable with the FAST or IMPULSE response. On the other hand, if very precise measurements down to a fraction of a decibel are needed, the digital indicator is preferable, provided the meter incorporates an appropriate time-integrating/averaging feature or “hold” setting so that the data values can be captured. Because each type of display has advantages and disadvantages, some contemporary SLMs include both analog and digital readouts. In addition, some SLMs can make an audio recording of a complete noise emission period, synchronized with the decibel measurements that correspond to various time segments within that period. The recording can later be played back to demonstrate aurally the decibel level of certain events within the total period.
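The difference between the rms computation and a PEAK reading can be seen on a simple sine wave. In the sketch below the waveform is hypothetical (1 s of a 1-kHz tone with 1-Pa amplitude), and the "peak" is simply the largest sampled magnitude; a real PEAK detector works on the unweighted analog or densely sampled waveform, so this is only an illustration of the principle.

```python
import math

P_REF = 20e-6  # reference sound pressure, 20 µPa

def rms(values):
    """Square each sample, average the squares, then take the square root."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def spl_db(pressure_pa):
    """Sound pressure level in dB re 20 µPa for a pressure magnitude in Pa."""
    return 20.0 * math.log10(pressure_pa / P_REF)

fs = 48_000
# Hypothetical waveform: exactly 1 s of a 1-kHz tone with 1-Pa amplitude
wave = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]

arithmetic_mean = sum(wave) / len(wave)          # near 0: symmetric fluctuations cancel
rms_level = spl_db(rms(wave))                    # rms-based level, about 91.0 dB
peak_level = spl_db(max(abs(p) for p in wave))   # PEAK-style level, about 94.0 dBP
print(f"mean {arithmetic_mean:.2e} Pa, rms {rms_level:.1f} dB, peak {peak_level:.1f} dBP")
```

For a pure tone the peak level exceeds the rms level by 20 log10(√2), about 3 dB; for impulsive sounds the gap is much larger, which is why an rms-based setting cannot substitute for a PEAK measurement.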
Microphone Considerations
Most SLMs have interchangeable microphones that offer varying frequency response, sensitivity, and directivity characteristics (Peterson, 1979). The response of a microphone is the ratio of its electrical output (in volts) to the sound pressure at its diaphragm. Sound pressure is commonly expressed in pascals for free-field conditions, and the free-field voltage response of the microphone is given in millivolts per pascal. (Sound fields are discussed in detail in Section 4.4.2, but briefly, the free field is a region where there are no barriers to sound propagation and no surfaces causing sound reflections that result in reverberation, and the sound level decreases by 6 dB for each doubling of distance from its source (Driscoll, 2020). This is contrasted with a reverberant field, wherein reflective surfaces strongly affect the sound level as it moves outward from its source, resulting in a smaller decrease in level.) When microphone specifications for sensitivity or output level are given, the response is usually based on a pure-tone sound wave input. Typically, the output level is provided in decibels re 1 V at the microphone electrical terminals, and the reference sensitivity is 1 V/Pa. Most microphones intended for general sound measurements are essentially omnidirectional (i.e., nondirectional) in their response at frequencies below about 1000 Hz. The 360° response pattern of a microphone is called its polar response, and this pattern is generally measured in the plane perpendicular to the diaphragm and aligned with its center. Some microphones are designed to be highly directional; one example is the cardioid design, which has a heart-shaped polar response in which maximum sensitivity is for sounds that enter the microphone at 0° (the perpendicular incidence response) and minimum sensitivity is for sounds entering at 180°, from behind the microphone.
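Two of the quantities above lend themselves to quick arithmetic: the free-field drop of about 6 dB per doubling of distance (20 log10 2 ≈ 6.02 dB, assuming a point source and no reflections) and the conversion of a sensitivity in mV/Pa to decibels re 1 V/Pa. The source level and the 50-mV/Pa sensitivity in this sketch are hypothetical values chosen for illustration.

```python
import math

def free_field_level(level_ref_db, r_ref_m, r_m):
    """Level at distance r_m in a free field (point source, inverse-square spreading)."""
    return level_ref_db - 20 * math.log10(r_m / r_ref_m)

def sensitivity_db_re_1v_per_pa(mv_per_pa):
    """Express a free-field sensitivity given in mV/Pa in dB re 1 V/Pa."""
    volts_per_pa = mv_per_pa / 1000.0
    return 20 * math.log10(volts_per_pa / 1.0)

# Hypothetical values: 94 dB measured 1 m from a source; a 50-mV/Pa microphone
for r in (1, 2, 4, 8):
    print(f"{r} m: {free_field_level(94.0, 1.0, r):.1f} dB")  # about 6 dB lower per doubling
print(f"{sensitivity_db_re_1v_per_pa(50.0):.1f} dB re 1 V/Pa")
```

In a reverberant field the measured decrease would be smaller than this sketch predicts, because reflected sound adds to the direct sound.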
Microphone response characteristics can also be defined as a function of the direction of the incoming sound waves with respect to the plane of the microphone diaphragm. The response at 90°, where sound waves travel and enter parallel to the diaphragm, is known as the grazing incidence response. Another response pattern, the random-incidence response, represents the mean response of the microphone for sound waves that strike the diaphragm from all angles with equal probability. This response characteristic is the most versatile, and it is therefore the response pattern used most often in the United States. Frequency responses for various microphone incidence patterns are depicted in Figure 4. Because most U.S. SLM microphones are omnidirectional and use the random-incidence response, it is best for the observer to aim the microphone toward the primary noise source while holding it so that the sound arrives at an angle of incidence of approximately 70°. This produces a measurement that corresponds most closely to the random-incidence response. On the other hand, free-field microphones have their flattest (i.e., most accurate) response at normal incidence (0°), while pressure microphones have their flattest response at grazing incidence (90°), and both should be oriented accordingly with respect to the noise source. In any case, care must be taken to avoid shielding the microphone with the body or other objects.
The response of microphones can also vary with temperature, atmospheric pressure, and humidity, with temperature usually being the most critical factor. Most microphone manufacturers supply correction factors for variations in decibel readout due to temperature effects. Atmospheric effects are generally significant only when measurements are made in aircraft or at very high altitudes, and humidity has a negligible effect except at very high levels. In any case, microphones must not be exposed to moisture or to large magnetic fields, such as those produced by transformers. In windy conditions, a foam windscreen should be placed over the microphone. This reduces the contaminating effects of wind noise while only slightly influencing the microphone’s frequency response at high frequencies. The windscreen also protects the microphone from damage due to being struck and from airborne foreign matter.
Figure 4 Frequency response of a hypothetical microphone for three angles of incidence (perpendicular, random, and grazing; relative response in dB vs. frequency, 20 Hz to 20 kHz). (Source: Peterson, 1979. © 1979 McGraw-Hill.)
Sound Level Meter Applications
It is important to note that the basic SLM is intended to measure sound levels at a given moment in time, although certain specialized devices can integrate or average sound levels over an extended period. When a nonintegrating/nonaveraging SLM is used for long-term noise measurements, such as over a workday, multiple samples must be taken and manually recorded to characterize the exposure. Because both reading the meter and recording the data are difficult, this technique is usually best limited to area measurements rather than applied to sampling an individual’s exposure. Furthermore, sampling becomes more difficult as the fluctuations in a noise become more rapid and/or random. SLMs are useful for determining the levels of human speech in both rms and peak values, for calibrating laboratory experiments, for calibrating audiometers (with special attachments), and for community noise event-related measurements.
3.1.2 Dosimeter
The audio-dosimeter, or simply “dosimeter,” is a portable, battery-powered device that is derived directly from the SLM but also features the ability to obtain special measures of noise exposure (discussed later) that relate to regulatory compliance and hearing hazard risk. Some versions are weather-resistant
and can be used outdoors to log a record of noise in a community setting, including both event-related, short-term measures and long-term averages and other statistical data. Dosimeters for industrial use are very compact and are generally worn on the belt or in the pocket of an employee, with the microphone clipped to the lapel or shoulder of a shirt or blouse. The intent is to obtain a noise exposure history over the course of a full or partial work shift and to obtain, at a minimum, a readout of the TWA exposure and noise dose for the period measured. Depending on its features, the dosimeter may provide a running histogram of noise levels on a short-time-interval (such as 1-min) basis, compute statistical distributions of the noise exposures for the period, flag and record exposures that exceed the OSHA maxima of 115 dBA (for continuous noise) or 140 dBP (for impulse or impact noise), and compute average metrics using 3-dB, 5-dB, or other time-versus-level exchange rates. The dosimeter eliminates the need for the observer to set up a discrete sampling scheme, follow a noise-exposed worker, or continuously monitor an instrument staged outdoors, all of which are necessary with a conventional SLM. Dosimeters are covered in standards, most notably ANSI S1.25-1991(R2020) (ANSI, 2020a).
3.1.3 Spectrum Analyzer
A spectrum analyzer is an advanced SLM that incorporates selective frequency-filtering capabilities to provide an analysis of the noise level as a function of frequency. In other words, the noise is broken down into its frequency components, and the distribution of the noise energy across all measured frequency bands is available. Bands are delineated by upper and lower edge (cutoff) frequencies and a center frequency.
Different widths and types of filters are available. The most common width is the octave filter, in which the center frequencies of successive filters are related by a factor of 2 (i.e., 31.5, 63, 125, 250, 500, 1000, 2000, 4000, 8000, and 16,000 Hz), and the most common type is the center-frequency proportional filter, in which the width of the filter depends on the center frequency (as in an octave filter set, in which the passband width equals the center frequency divided by √2). The octave band, commonly called the 1/1-octave filter, has a center frequency (CF) that is equal to the geometric mean of the upper (f_u) and lower (f_l) cutoff frequencies. To compute the center frequency for the octave filter, as well as the band-edge frequencies, the calculations of Eq (7) are used:

$$\mathrm{CF} = (f_u f_l)^{1/2}, \qquad f_u = \mathrm{CF} \cdot 2^{1/2}, \qquad f_l = \mathrm{CF}/2^{1/2} \tag{7}$$
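Eq (7) can be applied directly to the nominal octave center frequencies. The sketch below also includes a 1/3-octave variant, which assumes the common base-2 convention of band edges a factor of 2^(1/6) either side of the center; the listed center frequencies are nominal, rounded values, so the computed edges are approximate.

```python
import math

# Nominal 1/1-octave center frequencies listed in the text (Hz)
OCTAVE_CENTERS_HZ = [31.5, 63, 125, 250, 500, 1000, 2000, 4000, 8000, 16000]

def octave_band_edges(cf_hz):
    """Eq (7): lower and upper cutoff frequencies of a 1/1-octave band."""
    return cf_hz / math.sqrt(2), cf_hz * math.sqrt(2)

def third_octave_band_edges(cf_hz):
    """1/3-octave band (base-2 assumption): cutoffs a factor 2**(1/6) either side of the center."""
    factor = 2 ** (1 / 6)
    return cf_hz / factor, cf_hz * factor

for cf in OCTAVE_CENTERS_HZ:
    f_l, f_u = octave_band_edges(cf)
    # The bandwidth f_u - f_l equals cf / sqrt(2), and sqrt(f_u * f_l) recovers cf
    print(f"{cf:>7} Hz: {f_l:8.1f} to {f_u:8.1f} Hz, width {f_u - f_l:8.1f} Hz")
```

The printed widths grow in proportion to the center frequency, which is the defining property of the center-frequency proportional filters described in the text.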
More precise spectral resolution can be obtained with other center-frequency proportional filter sets with narrower bandwidths, the most common being the 1/3 octave, and with constant-percentage bandwidth filter sets, such as 1% or 2% filters. Note that in both types, the filter bandwidth increases as the center frequency increases. Still other analyzers have constant-bandwidth filters, such as 20-Hz-wide filters, whose width does not change with center frequency. Whereas in the past most spectrum analyzer filters were analog devices with “skirts” or overshoots extending slightly beyond the cutoff frequencies, digital computer-based analyzers are now very common. These “computational” filters use the fast Fourier transform (FFT) or other algorithms to compute the sound level in a band of fixed resolution that is selectable by the user. FFT devices can obtain very high resolution of noise spectral characteristics, using bandwidths as narrow as 1 Hz. However, in most measurement applications, a 1/1- or 1/3-octave analyzer will suffice unless the noise has considerable power in near-tonal components that must be isolated. One caution is in order: If a noise fluctuates in time and/or frequency, an integrating/averaging analyzer should be used to achieve good measurement accuracy, and the averaging period should be long in comparison to the variability of the noise being sampled. Real-time analyzers incorporate parallel banks of filters (not FFT-derived) that can process all frequency bands simultaneously, and the signal output may be controlled by a SLOW, FAST, or other time constant setting, or it may be integrated
or averaged over a fixed time period to provide L_OSHA, L_eq, or other average-type measurements discussed in Section 3.2.
Spectrum Analyzer Applications
While occupational noise is monitored with a dosimeter or SLM for the purpose of OSHA noise exposure compliance (using A-weighted broadband measurement) or the assessment of hearing protection adequacy (using C-weighted broadband measurement), both of these applications can also be addressed, in some cases more accurately, with spectral measurements of the noise level. For instance, the OSHA Occupational Noise Exposure Standard (OSHA, 1983) allows the use of octave-band measurements reduced to broadband dBA values to determine whether noise exposures exceed the dBA limits defined in Table G-9 of the standard. Furthermore, Appendix B of the standard concerns hearing protector adequacy and allows an octave-band method to be used to determine, on a spectral rather than a broadband basis, whether a hearing protector is adequate for a particular noise spectrum. With this octave-band method, the noise exposure’s spectrum can be matched more precisely to the hearing protector’s spectral attenuation. Spectral analysis can also help the hearing conservationist discriminate between noises by their hazard potential even when they have similar A-weighted SPLs. This is illustrated in Figure 5: both noises would be considered equally hazardous by the OSHA-required dBA measurements (since both are 90 dBA), but the 1/3-octave analysis demonstrates that the lower noise is more hazardous, as evidenced by its heavy concentration of energy in the midrange and high frequencies. One of the most important applications of the spectrum analyzer is to obtain data that will provide the basis for engineering noise control solutions.
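The reduction of octave-band levels to a single broadband value can be sketched generically: add the weighting correction to each band, then sum the bands on an energy basis. The weighting values below are the commonly tabulated nominal A and C corrections at octave centers (check them against the governing standard before use), the spectrum is hypothetical, and this is not a reproduction of the specific graphical procedure in the OSHA standard. Computing both weightings from one spectrum also shows how a low-frequency-heavy noise produces a large dBC − dBA difference.

```python
import math

# Commonly tabulated nominal A- and C-weighting corrections (dB) at octave centers (Hz)
A_WEIGHT = {31.5: -39.4, 63: -26.2, 125: -16.1, 250: -8.6, 500: -3.2,
            1000: 0.0, 2000: 1.2, 4000: 1.0, 8000: -1.1, 16000: -6.6}
C_WEIGHT = {31.5: -3.0, 63: -0.8, 125: -0.2, 250: 0.0, 500: 0.0,
            1000: 0.0, 2000: -0.2, 4000: -0.8, 8000: -3.0, 16000: -8.5}

def weighted_level(band_levels, weights):
    """Apply a weighting to each octave-band SPL, then sum the bands on an energy basis."""
    energy = sum(10 ** ((level + weights[cf]) / 10) for cf, level in band_levels.items())
    return 10 * math.log10(energy)

# Hypothetical spectrum dominated by low frequencies (octave-band SPLs in dB)
spectrum = {63: 95, 125: 90, 250: 85, 500: 80, 1000: 75, 2000: 70, 4000: 65}
dba = weighted_level(spectrum, A_WEIGHT)
dbc = weighted_level(spectrum, C_WEIGHT)
print(f"{dba:.1f} dBA, {dbc:.1f} dBC, dBC - dBA = {dbc - dba:.1f} dB")
```

Two spectra can give the same dBA value from very different band energies, which is exactly the limitation of broadband A-weighted measurement that a spectral analysis exposes.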
For instance, to select an absorption material for lining interior surfaces of a workplace (discussed later herein), the spectral content of the noise must be known so that the appropriate density, porosity, thickness, and even shape of the material may be selected. Spectrum analyzers are also necessary for performing the frequency-specific measurements needed to predict either signal audibility or speech intelligibility in noisy situations, according to the techniques discussed in Section 7. Furthermore, they can be applied for calibration of signals for laboratory experiments and audiometers, for determining the frequency response and other quality-related metrics of systems designed for music and speech rendition, and for determining certain acoustic parameters of indoor spaces, such as reverberation time.
Figure 5 Spectral differences for two different noises that have the same dBA value (two 1/3-octave spectra, each measuring 90 dBA; 1/3-octave-band center frequencies from 125 to 8000 Hz).
dBC–dBA
Lacking a spectrum analyzer, one can obtain a rough indication of the dominant spectral content of a noise by using a SLM to measure the same noise in both dBA and dBC. If the dBC–dBA value is large, that is, about 5 dB or more, it can be concluded that the noise has considerable low-frequency content. If, on the other hand, the dBC–dBA value is negative, the noise clearly has strong midrange components, since the A-weighting curve exhibits slight amplification in the range 2000–4000 Hz. Such rules of thumb rely on the differences between the C- and A-weighting curves shown in Figure 3(b). However, they should not be relied upon in lieu of a spectral analysis if the noise is believed to have high-frequency or narrowband components that need engineering noise control attention.
3.1.4 Acoustical Calibrators
Each of the SLM and dosimeter instruments described above contains a microphone that transduces the changes in pressure and inputs this signal into the electronics. Although modern sound measurement equipment is generally stable and reliable, calibration is necessary to match the microphone to the instrument so that the accuracy of the measurement is assured. Because of its susceptibility to varying environmental conditions and to damage from rough handling, moisture, and magnetic fields, the microphone is generally the weakest link in the measurement equipment chain. Therefore, an acoustic calibrator should be applied before and after each measurement. The pretest calibration ensures that the instrument indicates the correct SPL for a standard reference calibrator output at a specified SPL and frequency (e.g., 94 dB at 1000 Hz).
The posttest calibration determines whether the instrumentation, including the microphone, has drifted during the measurement and, if so, whether the drift is large enough to invalidate the data obtained. Calibrators may be electronic transducer-type devices with loudspeaker outputs driven by an internal oscillator, or “pistonphones,” which use a reciprocating piston in a closed cavity to produce sinusoidal pressure variations as the cavity volume changes. Both types include adapters that allow the device to be mated to microphones of different diameters. Calibrators should be sent to the manufacturer periodically, at the intervals it recommends, for bench calibration and certification. There are many other issues that bear on the proper application of sound level measurement equipment, such as microphone selection and placement, averaging time and sampling schemes, and statistical data reduction techniques, all of which are beyond the scope of this chapter. For further coverage of these topics, the reader is referred to Harris (1991) and Meinke et al. (2020).
3.2 Sound and Noise Metrics
3.2.1 Exchange or Trading Rates
Because both sound amplitude and sound duration determine the energy of an exposure, average-type measures are based on simple algorithms or exchange rates, which trade amplitude for time and vice versa. For example, most noise regulations, including OSHA (1983), stipulate that a worker’s exposure may not exceed a maximum daily accumulation of noise energy. In other words, in OSHA terms the product of duration and intensity must remain at or under the regulatory cap or criterion level of 90 dBA TWA for an 8-h work period, which is equivalent to
a 100% noise dose, and a hearing conservation program must be instituted for any workers exposed at or above an action level of 85 dBA TWA, which is equivalent to a 50% noise dose. Much debate has occurred over the past several decades about which exchange rate is most appropriate for predicting hearing damage risk; most countries currently use either a 3- or 5-dB relationship, with the 3-dB exchange rate in the majority. For the United States, however, the OSHA exchange rate is 5 dB, which means that an increase (decrease) in level of 5 dB is equivalent (in exposure) to a doubling (halving) of exposure time, so the allowable exposure time is halved (doubled). For instance, using the OSHA PEL of 90 dBA for 8 h, if a noise is at 95 dBA, the allowable exposure per workday is half of 8 h, or 4 h. If a noise is at 85 dBA, the allowable exposure time is twice 8 h, or 16 h. These allowable reference exposure durations (T values) are provided in Table A-1 of the OSHA (1983) regulation, or they may be computed using the formula for T given below as Eq. (14). The 5-dB exchange rate is predicated in part on the theory that intermittent noise is less damaging than continuous noise because some recovery from temporary hearing loss occurs during quiet periods. Arguments against it include the fact that exchanging 5 dB for a factor of 2 in duration has no real physical basis in terms of energy equivalence. Furthermore, there is some evidence that the quiet periods of intermittent noise exposures are too short to allow recovery to occur. The 5-dB exchange rate is used for all measures associated with OSHA regulations, including the most general average measure, L_OSHA; the TWA referenced to an 8-h duration; and noise dose in percent. Most European countries use a 3-dB exchange rate, also known as the aforementioned equal-energy rule.
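The trading of level for time can be written as one small function. The sketch below uses the general form T = T_c / 10^((L − L_c)/q) with q = Q/log10(2); with the default q it reproduces the OSHA 5-dB values (4 h at 95 dBA, 16 h at 85 dBA), and with q = 10 it applies the equal-energy rule. The function and parameter names are illustrative, not taken from the regulation.

```python
import math

OSHA_Q = 5 / math.log10(2)  # q for a 5-dB exchange rate (about 16.61)

def allowable_hours(level_dba, criterion_dba=90.0, criterion_hours=8.0, q=OSHA_Q):
    """Reference allowable exposure T (h) at a given level.

    With the default q this equals T = 8 / 2**((L - 90)/5).
    With q = 10 it applies the equal-energy (3-dB) rule.
    """
    return criterion_hours / 10 ** ((level_dba - criterion_dba) / q)

for level in (85, 90, 95, 100):
    print(level, "dBA ->", round(allowable_hours(level), 2), "h")

# Equal-energy rule: 120 dBA carries 1000 times the intensity of 90 dBA
print(round(allowable_hours(120, q=10.0) * 60, 2), "min")
```

Using q = 10 (rather than 3/log10 2) matches the intensity-ratio reasoning of the equal-energy rule, in which a 30-dB increase divides the allowable time by exactly 1000.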
Under the 3-dB rule, a doubling (halving) of sound intensity, which corresponds to a 3-dB increase (decrease), equates (in energy) to a doubling (halving) of exposure duration. The equal-energy concept stems from the fact that if sound intensity is doubled or halved, the corresponding change in sound intensity level is 3 dB. An exposure to 90 dBA for 8 h under a 3-dB exchange rate is equivalent to a 120-dBA exposure of only 0.48 min. Because each 10-dB increase corresponds to a 10-fold increase in intensity, the 30-dB increase from 90 to 120 dBA represents a 1000-fold (10³) increase in sound intensity, from 0.001 to 1 W/m². The 90-dBA exposure period of 8 h, or 480 min, must be reduced by the same factor, so 480/1000 = 0.48 min, or about 29 s. The 3-dB exchange rate is used for all measures associated with the equivalent continuous sound level, L_eq.
3.2.2 Average and Integrated SPLs
As discussed earlier, conventional SLMs provide “momentary” decibel measurements based on very short moving-window exponential averages using the FAST, SLOW, or IMPULSE time constants. However, since most noises fluctuate over time, one of several types of average measurement, discussed below, is usually the most appropriate descriptor of the central tendency of the noise. Averages may be obtained in one of two ways: (1) by observing and recording conventional SLM readouts using a short-time-interval sampling scheme and then manually computing the average from the discrete values; or (2), the much-preferred method, by using a SLM or dosimeter that automatically calculates a running average with microprocessor circuitry, which either performs a true continuous integration of the area under the sound pressure curve or obtains discrete samples of the sound at a very fast rate and computes the average (ANSI, 2014, 2020a).
Generally, average measures obtained by method (2) yield more representative values because they are based on continuous or near-continuous sampling of the waveform, which the human observer cannot perform well even with continuous vigilance.
The average metrics discussed below are generally considered the most useful for evaluating noise hazards in industry, annoyance potential in the community, and other sounds in the laboratory or in the field that fluctuate over time. In most cases, for industrial hearing conservation as well as community noise annoyance purposes, the metrics use the A-weighting scale. For precise spectral measurements with no frequency weighting, the unweighted (linear) decibel scale may be applied. The equations are all written for discrete sound levels, so they can be applied to data from conventional SLMs or dosimeters. For continuous sound levels (or when the equations are used to describe the functioning of a true integrating meter), the summation sign in the equations would be replaced by the integral sign from 0 to T, and t_i by dt. The variables used in the equations are as follows:
L_i = decibel level in measurement interval i
N = number of intervals
T = total measurement time period
t_i = length of measurement interval i
Q = exchange rate (dB)
The general form equation for average SPL, or L_average (L_av), is:

$$L_{\mathrm{av}}(Q) = q \log_{10}\left[\frac{1}{T}\sum_{i=1}^{N} \left(10^{L_i/q}\, t_i\right)\right] \tag{8}$$

$$q = \frac{Q}{\log_{10} 2} = \begin{cases} 10.0 & \text{for a 3-dB exchange} \\ 13.3 & \text{for a 4-dB exchange} \\ 16.6 & \text{for a 5-dB exchange} \end{cases}$$

The equivalent continuous sound level, L_eq, is the steady sound level that, when integrated or averaged over a specific time, would yield the same energy as a variable sound level over the same time period. The equation for L_eq, which uses a 3-dB exchange rate and thus q = 10, is:

$$L_{\mathrm{eq}} = L_{\mathrm{av}}(3) = 10 \log_{10}\left[\frac{1}{T}\sum_{i=1}^{N} \left(10^{L_i/10}\, t_i\right)\right] \tag{9}$$

In applying L_eq, the individual L_i values are usually in dBA but not always, as any weighting curve can be applied to the measurements. Equation (9) may also be used to compute the overall equivalent continuous sound level (for a single site or worker) from individual L_eq values obtained over contiguous time intervals, by substituting those L_eq values for the L_i variable. L_eq values are often expressed with the time period over which the average is obtained; for instance, L_eq(24) is an equivalent continuous level measured over a 24-h period. Another average measure that is derived from L_eq and often used for community noise quantification is L_dn, which is simply a 24-h L_eq measurement with a 10-dB penalty added to all nighttime noise levels from 10 P.M. to 7 A.M. The rationale for the penalty is that humans are more disturbed by noise, especially through sleep arousal, during nighttime periods. The equation for the OSHA average noise level, L_OSHA, which uses a 5-dB exchange rate and thus q = 16.61, is:

$$L_{\mathrm{OSHA}} = L_{\mathrm{av}}(5) = 16.61 \log_{10}\left[\frac{1}{T}\sum_{i=1}^{N} \left(10^{L_{iA}/16.61}\, t_i\right)\right] \tag{10}$$

where L_iA is in dBA, slow response.
OSHA’s TWA is a special case of L_OSHA that requires that the total time period T always be 8 h, that time be expressed in hours, and that sound levels below 80 dBA, termed the threshold level, not be included in the measurement:

$$\mathrm{TWA} = 16.61 \log_{10}\left[\frac{1}{8}\sum_{i=1}^{N} \left(10^{L_{iA}/16.61}\, t_i\right)\right] \tag{11}$$

where L_iA is in dBA, slow response, and T is always 8 h. Only L_iA ≥ 80 dBA is included. OSHA’s noise dose is a dimensionless representation, in percent, of the noise exposure, where 100% is the maximum allowable dose, corresponding to a 90-dBA TWA referenced to 8 h. Dose uses a criterion sound level, which is presently 90 dBA, and a criterion exposure period, which is presently 8 h. A noise dose of 50% corresponds to a TWA of 85 dBA, and this is known as the OSHA action level at which a hearing conservation program is required. Dose, D, is calculated as follows:

$$D = \frac{100}{T_c}\sum_{i=1}^{N} \left(10^{(L_{iA}-L_c)/q}\, t_i\right) \tag{12}$$

where L_iA is in dBA, slow response, L_c is the criterion sound level, and T_c is the criterion exposure duration. Only L_iA ≥ 80 dBA is included. Noise dose D can also be expressed as follows for different sound levels L_i over the workday:

$$D = 100\left(\frac{C_1}{T_1} + \frac{C_2}{T_2} + \cdots + \frac{C_n}{T_n}\right) \tag{13}$$

where C_i is the total time (h) of actual exposure at L_i, T_i is the total time (h) of reference-allowable exposure at L_i, from Table G-16a of OSHA (1983), and C_i/T_i represents a partial dose at sound level L_i. The reference allowable exposure T for a given sound level can also, in lieu of consulting Table G-16a in OSHA (1983), be computed as:

$$T = \frac{8}{2^{(L-90)/5}} \tag{14}$$

where L is the measured dBA level. Two other useful equations to compute dose D from TWA and vice versa are:

$$D = 100 \times 10^{(\mathrm{TWA}-90)/16.61} \tag{15}$$

$$\mathrm{TWA} = 16.61 \log_{10}(D/100) + 90 \tag{16}$$
where D is the dose in percent. TWA and dose D conversions can also be found in Table A-1 of OSHA (1983). A final measure that is particularly useful for quantifying the exposure due to single or multiple occurrences of an acoustic event (such as one complete operating cycle of a machine, a vehicle drive-by, or an aircraft flyover) is the sound exposure level (SEL). The SEL represents a sound 1 s in length that imparts the same acoustic energy as a varying or constant sound that is integrated over a specified time interval ti in seconds. Over ti , an Leq is obtained which indicates that SEL is used only with a 3-dB exchange rate. A reference duration of 1 s is applied for t0 in the following equation for SEL: SEL = Leq + 10 log10 (ti ∕t0 )
(17)
where Leq is the equivalent SPL measured over time period ti . Detailed example problems and solutions using many of the above formulas above may be found in Casali and Robinson (1999) and Meinke et al. (2020).
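As an illustration only, the averaging metrics above reduce to a few lines of arithmetic. The Python sketch below implements Equations (8) through (12) and (14) through (17), plus Ldn. It assumes levels and interval durations are passed as parallel lists, computes q = Q/log10(2) exactly (16.61 in the text is this value rounded), and assumes that index 0 of the Ldn input is the hour from midnight to 1 A.M. All function names are hypothetical and are not taken from any standard or library.

```python
import math


def exchange_q(exchange_rate_db):
    """q = Q / log10(2); about 10.0, 13.3, and 16.61 for 3-, 4-, and 5-dB exchange rates."""
    return exchange_rate_db / math.log10(2)


def average_level(levels_db, durations, exchange_rate_db=3.0, total_time=None):
    """Equation (8). With the default 3-dB exchange rate this is Leq, Equation (9).

    T defaults to the sum of the interval durations.
    """
    q = exchange_q(exchange_rate_db)
    T = sum(durations) if total_time is None else total_time
    energy = sum(10 ** (L / q) * t for L, t in zip(levels_db, durations))
    return q * math.log10(energy / T)


def osha_twa(levels_dba, hours, threshold_dba=80.0):
    """Equation (11): 5-dB exchange rate, T fixed at 8 h, levels below 80 dBA dropped."""
    kept = [(L, t) for L, t in zip(levels_dba, hours) if L >= threshold_dba]
    if not kept:
        return float("-inf")  # no interval at or above the threshold level
    levels, times = zip(*kept)
    return average_level(levels, times, exchange_rate_db=5.0, total_time=8.0)


def osha_dose(levels_dba, hours, criterion_dba=90.0, criterion_hours=8.0,
              threshold_dba=80.0):
    """Equation (12): noise dose in percent (100% = 90-dBA TWA over 8 h)."""
    q = exchange_q(5.0)
    return 100.0 / criterion_hours * sum(
        10 ** ((L - criterion_dba) / q) * t
        for L, t in zip(levels_dba, hours)
        if L >= threshold_dba
    )


def allowable_hours(level_dba):
    """Equation (14): reference allowable exposure time T (h) at a given dBA level."""
    return 8.0 / 2 ** ((level_dba - 90.0) / 5.0)


def dose_from_twa(twa_dba):
    """Equation (15): dose (%) from a TWA (dBA)."""
    return 100.0 * 10 ** ((twa_dba - 90.0) / exchange_q(5.0))


def twa_from_dose(dose_pct):
    """Equation (16): TWA (dBA) from a dose (%)."""
    return exchange_q(5.0) * math.log10(dose_pct / 100.0) + 90.0


def ldn(hourly_leq_db):
    """Ldn from 24 hourly Leq values; index 0 is assumed to be midnight to 1 A.M."""
    if len(hourly_leq_db) != 24:
        raise ValueError("expected 24 hourly Leq values")
    night_hours = set(range(22, 24)) | set(range(0, 7))  # 10 P.M. to 7 A.M.
    penalized = [L + 10.0 if h in night_hours else L
                 for h, L in enumerate(hourly_leq_db)]
    return average_level(penalized, [1.0] * 24)


def sel(leq_db, duration_s, reference_s=1.0):
    """Equation (17): SEL from an Leq measured over duration_s seconds."""
    return leq_db + 10.0 * math.log10(duration_s / reference_s)
```

For example, 4 h at 95 dBA followed by 4 h below the 80-dBA threshold gives osha_twa([95.0, 75.0], [4.0, 4.0]) of 90 dBA and a dose of 100%, matching allowable_hours(95.0) = 4 h from Equation (14).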
SOUND AND NOISE: MEASUREMENT AND DESIGN GUIDANCE
4 INDUSTRIAL NOISE REGULATION AND ABATEMENT

4.1 Need for Attention to Noise

This section concentrates on the management of noise in industry, because industry is the major source of noise exposure for most people and, as such, constitutes a very common threat of noise-induced hearing loss (NIHL). Many of the techniques for measurement, engineering control, and hearing protection also apply to other exposures, such as those encountered in recreational or military settings. The need for attention to noise that pervades any human-occupied environment is indicated when (1) noise creates sufficient intrusion and operator distraction that job performance and even job satisfaction are compromised; (2) noise interferes with important communications and signals, such as inter-operator communications, machine- or process-related aural cues, alerting/emergency signals, or military tactics and missions; and/or (3) noise exposures pose a potential hazard of NIHL for workers or military service members.

4.2 OSHA (and Other) Noise Exposure Regulated Limits

In U.S. workplaces, although a few industries voluntarily implemented hearing conservation programs in the 1940s and 1950s, legal limits on general industrial noise exposure and the application of hearing protection were not promulgated into law until 1971, with the Occupational Noise Exposure Standard of the Occupational Safety and Health Act. The OSHA noise regulation was the first exposure-level-based requirement for noise abatement and hearing protection devices (HPDs) in general industry (OSHA, 1971a), and a similar law was promulgated for construction (OSHA, 1971b); these are the settings in which the great majority of U.S. citizens were, and continue to be, at risk for hearing loss due to noise exposure.
In 1983, some 12 years after the original OSHA legislation on noise in industry, the OSHA Hearing Conservation Amendment for General Industry (OSHA, 1983) took legal effect and immediately led to the proliferation of HPDs and other programmatic conservation measures in U.S. industrial workplaces. This was because the amendment required that an effective hearing conservation program (HCP) be administered whenever any noise exposure equaled or exceeded an 85-dBA TWA, or 50% noise dose, for an 8-h workday, measured on the "slow" scale and using a 5-dB exchange rate between exposure level (dBA) and exposure time. Other industries, including airline, truck and bus carriers, railroads, and oil and gas well drilling, developed separate, and generally less comprehensive, noise and hearing conservation regulations than those of OSHA (1971a, 1983) for general industry. Unfortunately, to date, no analogue to the OSHA Hearing Conservation Amendment of 1983 has ever been adopted into law for the construction industry, where noise levels are often high and auditory warning signals (e.g., vehicle backup alarms) are prevalent (Casali & Alali, 2009). Finally, in the mining industry, noise exposure limits and hearing protection were addressed first under the Federal Coal Mine Health and Safety Act of 1969 and later under the Federal Mine Safety and Health Amendments Act of 1977. In 1999, the Mine Safety and Health Administration (MSHA) issued a more comprehensive noise regulation that governed all forms of mining (MSHA, 1999). In OSHA terms, if the noise dose (in general industry) equals or exceeds the OSHA action level of 50%, which corresponds to an 85-dBA TWA, the employer must institute a multifaceted hearing conservation program (OSHA, 1983). If the criterion level of 100%
dose is exceeded (which corresponds to the permissible exposure limit (PEL) of a 90-dBA TWA for an 8-h day), the original OSHA (1971a) regulations specifically state that steps must be taken to reduce the employee's exposure to the PEL or below via administrative work scheduling and/or engineering controls. The regulations further specify that HPDs must be provided if administrative and/or engineering controls fail to reduce the noise to the PEL. Therefore, by the letter of the law, HPDs are intended to be relied on only when administrative or engineering controls are infeasible or ineffective. The final OSHA noise-level requirement pertains to impulsive or impact noise, which is not to exceed a PEAK SPL limit of 140 dB, often expressed as dBP. In addition to the federal OSHA and MSHA HCP regulations for U.S. industry and mining, respectively, the U.S. military has been active in hearing conservation because of high-level exposures from ordnance, military vehicles, aircraft, and other sources, with early regulations dating back to 1948 (U.S. Air Force, 1948). Comprehensive HCP requirements that apply to all military branches are detailed in U.S. Department of Defense (2010). In addition, guidance for individual branches is published in various reports, such as the Army Hearing Program pamphlet 40-501 (U.S. Army, 2015).

4.3 Hearing Conservation Programs

4.3.1 Shared Responsibility: Management, Workers, and Government

A successful HCP depends on the shared commitment of management and labor as well as on the quality of services and products provided by external noise control consultants, the audiology or medical personnel who conduct the hearing measurement program, and vendors (e.g., hearing protection suppliers). Involvement and interaction of corporate positions such as the plant safety engineer, ergonomist, occupational nurse, noise control engineer, purchasing director, and manufacturing supervisor are important.
Furthermore, government agencies, such as OSHA and MSHA, have a responsibility to maintain and disseminate up-to-date noise exposure regulations and HCP guidance, to conduct regular in-plant compliance checks of noise exposure and HCP quality, and to provide enforcement where noise control and/or hearing protection is inadequate. Finally, the "end user" of the HCP, that is, the worker, must be an informed and motivated participant. For instance, if personal use of HPDs is a fundamental component of an industrial HCP, the program's effectiveness in preventing NIHL will depend most heavily on the worker's commitment to wear the HPD properly and consistently. Failure by any of these groups to carry out their responsibilities can result in HCP failure and worker hearing loss. Side benefits of a successful HCP may include a marked reduction in noise-induced distraction and interference on the job, improved auditory situation awareness of hazards, and improved worker comfort and morale.

4.3.2 Hearing Conservation Program Components

Hearing conservation in industry should be thought of as a strategic, programmatic effort that is initiated, organized, implemented, and maintained by the employer, with cooperation from the other parties indicated above. A well-accepted approach is to address the noise exposure problem from a systems perspective, in which empirical noise measurements provide the data input that drives the implementation of countermeasures against the noise (including engineering controls, administrative strategies, and personal hearing protection). Subsequently, noise and audiometric data, which reflect the effectiveness of those countermeasures, serve as feedback for program adjustments and improvements. A brief discussion of
the major elements of an HCP, as dictated by OSHA (1983), follows; additional detail can be found in various chapters of Meinke et al. (2020).

Monitoring
Noise exposure monitoring is intended to identify employees for inclusion in the HCP and to provide data for the selection of HPDs. The data are also useful for identifying areas where engineering noise control solutions and/or administrative work scheduling may be necessary. All OSHA-related measurements, with the exception of the PEAK SPL limit, are to be made using an SLM or dosimeter (of at least ANSI type 2, as noted above) set on the dBA scale, SLOW response, with a 5-dB exchange rate, and incorporating all sound levels from 80 to 130 dBA. Although the regulation does not specify it, continuous noise above 130 dBA must be assumed to require monitoring as well. (Such noise levels represent OSHA noncompliance in any case, since the maximum allowable continuous noise level is 115 dBA.) Appendix G of the OSHA regulation suggests that monitoring be conducted at least once every one or two years. Related to the noise-monitoring requirement is that of notification: employees must be given the opportunity to observe the noise-monitoring process, and they must be notified when their exposures exceed the 50% dose (85-dBA TWA) level.

Audiometric Testing Program
All employees whose noise exposures are at or above the 50% dose level (85-dBA TWA) must be included in a pure-tone audiometric testing program in which a baseline audiogram is completed within six months of the first exposure and subsequent tests are done annually. Annual audiograms are compared against the baseline to determine whether the worker has experienced a standard threshold shift (STS), which OSHA (1983) defines as a change in hearing threshold, relative to the baseline audiogram, averaging 10 dB or more at 2000, 3000, and 4000 Hz in one or both ears.
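The STS criterion is a simple average-and-compare rule. The following Python sketch is illustrative only: it assumes each audiogram is a dictionary mapping frequency (Hz) to threshold (dB HL), that a positive difference means worse hearing, and it applies no age (presbycusis) correction. The function and constant names are hypothetical.

```python
STS_FREQUENCIES_HZ = (2000, 3000, 4000)


def average_shift_db(baseline, annual):
    """Average change in hearing threshold (dB) at 2000, 3000, and 4000 Hz for one ear.

    `baseline` and `annual` map frequency (Hz) to threshold (dB HL); a positive
    result means thresholds have worsened relative to the baseline.
    """
    return sum(annual[f] - baseline[f] for f in STS_FREQUENCIES_HZ) / len(STS_FREQUENCIES_HZ)


def has_sts(baseline_by_ear, annual_by_ear, criterion_db=10.0):
    """True if the average shift is 10 dB or more in one or both ears.

    No age (presbycusis) correction is applied in this sketch.
    """
    return any(
        average_shift_db(baseline_by_ear[ear], annual_by_ear[ear]) >= criterion_db
        for ear in baseline_by_ear
    )
```

With this convention, a left ear whose thresholds each rise by 10 dB at 2, 3, and 4 kHz is flagged even if the right ear is unchanged, reflecting the "one or both ears" wording.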
The annual audiogram may be adjusted for age-induced hearing loss (presbycusis) using the gender-specific correction data in Appendix F of the regulation (OSHA, 1983). All OSHA-related audiograms must include 500, 1000, 2000, 3000, 4000, and 6000 Hz, whereas most clinical audiograms typically span at least 125 to 8000 Hz, and sometimes higher for specialized diagnoses. If an STS is revealed, a licensed physician or audiologist must review the audiogram and determine the need for further audiological or otological evaluation, the employee must be notified of the STS, and the selection and proper use of HPDs must be revisited.

Training Program and Record Keeping
An essential component of any HCP is a training program for all noise-exposed workers. Training elements to be covered include the effects of noise on hearing; the purpose, selection, and use of HPDs; and the purpose and procedures of audiometric testing. Also, accurate records must be kept of all noise exposure measurements for at least the last two years, as well as of audiometric test results for the duration of the worker's employment. It is important, though not specifically required by OSHA, that noise and audiometric data be used as feedback for improving the program. For example, noise exposure records may be used to identify machines that need maintenance attention, to assist in the relocation of noisy equipment during plant layout efforts, to inform future equipment procurement decisions, and to target plant areas in need of noise control intervention. Some employers plot noise levels on a "contour boundary map," delineating floor areas by their dB levels. When monitoring indicates that the noise level within a particular boundary has changed, this is taken as a sign that the machinery and/or work process in the area has changed and that further evaluation may be needed.
DESIGN FOR HEALTH, SAFETY, AND COMFORT
Hearing Protection Devices
The OSHA Hearing Conservation Amendment (OSHA, 1983) requires that a selection of HPDs suitable for the noise and work situation be made available to all employees whose TWA exposures meet or exceed 85 dBA. Such HPDs are also useful outside the workplace, for protecting hearing against noise produced by power tools, lawn care equipment, recreational vehicles, target shooting and hunting, and spectator events. In addition, HPDs of various types, some very specialized in design, are used for military noise exposures from sources such as tracked vehicles, fixed-wing and rotary-wing aircraft, ordnance, and various weapons. Comprehensive overviews of conventional HPDs that provide noise attenuation via passive (nonelectronic) means may be found in Gerges and Casali (2007) and Berger and Voix (2020). The following brief overview of the basic types of devices is adapted in part from Gerges and Casali (2007).

Earplugs consist of vinyl, silicone, spun fiberglass, cotton/wax combinations, and open-cell or closed-cell foam products that are inserted into the ear canal to form a noise-blocking seal. Proper fit to the user's ears and training in insertion procedures are critical to the success of earplugs. One style is the premolded earplug, which comes in many forms, often incorporating flanges for sealing against the ear canal walls; these are simply pushed into the ear canal. Another style comprises user-molded earplugs, typically made of slow-recovery foam, which the user must roll down and insert into the ear canal before they re-expand. Still other earplugs are custom-molded to the user's ear canal by injecting a viscous resin into the canal, filling it up to a soft "ear dam" that has previously been inserted to protect the eardrum. The resulting positive casting of the ear canal is then used to make a negative mold from which the custom earplug is manufactured.
Proper insertion of all three styles of earplugs is facilitated when the user uses the hand opposite the insertion ear to pull the outer ear (pinna) upward and outward to straighten the ear canal, and then inserts the earplug. Earplug pairs are sometimes connected with a flexible lanyard, which helps prevent loss and allows the earplugs to be stored around the user's neck. A related device is the semi-insert, or ear canal cap, which consists of earplug-like pods that are positioned at the rim of the ear canal as a shallow-insertion device and held in place by a lightweight headband. The headband is useful for storing the device around the neck when the user moves out of the noise. Compared to earplugs, canal caps are generally shallower-insertion products, typically intended to seal the ear canal near its rim rather than by deep insertion into the canal (Gerges & Casali, 2007). This has some attendant disadvantages, including lower noise attenuation and a greater occlusion effect, in which users perceive their own voice as more bassy, resonant, and unnatural (Lee & Casali, 2011). Earmuffs consist of earcups, usually made of a rigid plastic material with an absorptive liner, that enclose the outer ear completely and seal around it with foam- or fluid-filled cushions. A headband connects the earcups; on some models this band is adjustable so that it can be worn over the head, behind the neck, or under the chin, depending on the presence of other headgear, such as a welder's mask. In general terms, as a group, earplugs provide better attenuation than earmuffs below about 500 Hz and equivalent or greater protection above 2000 Hz (Gerges & Casali, 2007).
Earmuffs are generally easier for the user to fit than either earplugs or canal caps, and, depending on the temperature and humidity of the environment, the earmuff can be uncomfortable (in hot or humid environments) or a welcome ear insulator (in a cold environment). Conventional passive earplugs and earmuffs generally exhibit a spectral attenuation profile that is nonlinear with respect to sound frequency; that is, the attenuation generally
increases with increasing frequency. At any given frequency, the maximum attenuation that any HPD can provide, also termed its "theoretical limit," is determined by the bone conduction threshold of the wearer. This is because at sound levels above the bone conduction threshold, the HPD is "flanked" by the bone conduction pathway to the sensory portion of the ear, and sound enters the ear through structure-borne vibrations of the skull (Berger & Voix, 2020). Beginning in the mid-1980s, conventional passive HPDs were augmented with new features, such as (1) attenuation spectra that are nearly uniform, or "flat," as a function of frequency; (2) passive attenuation that increases with incident sound level above a certain transition level, owing to orifice-based filters in a thin occluder within a sound channel through the product (termed "level-dependent," "nonlinear," or "amplitude-sensitive" devices); (3) adjustable attenuation achieved with continuously adjustable valves or discrete filter-dampers inside a vent running through the HPD; and (4) passive noise attenuators using one-end-closed tube structures intended to provide quarter-wave resonance to cancel offending noise in a narrow frequency band. All of these passive HPD augmentations, along with research results on their performance, are covered in detail in Casali (2010a). Furthermore, battery-powered electronic (also termed "active") HPD features
began to appear in the 1980s. These include (1) active noise reduction (ANR), or active noise cancellation (ANC), products that introduce "anti-noise" of inverted phase relative to the noise to be cancelled (primarily effective below about 1000 Hz); (2) electronically modulated sound transmission devices, which use microphones to transduce sounds external to the HPD and provide output-limited, amplified pass-through of signals and speech within a certain passband (hear-through or sound transmission); and (3) military/law enforcement-oriented Tactical Communications and Protective Systems (TCAPS), which provide covert two-way communications, signal pass-through capabilities, and gunfire-responsive protection. All versions of these active electronic HPDs are reviewed in Casali (2010b), along with research data on their performance in various applications. Regardless of type, HPD effectiveness depends heavily on the proper fitting and use of the devices, as demonstrated by research comparing laboratory testing results with actual in-field data (e.g., Park & Casali, 1991). Therefore, the employer is required to provide training in the fitting, care, and use of HPDs to all affected employees (OSHA, 1983). Hearing protector use becomes mandatory when the worker has not undergone the baseline audiogram, has experienced an STS, or has a TWA exposure that meets or exceeds 90 dBA. For a worker with an STS, the HPD must attenuate the noise to an 85-dBA TWA or below; otherwise, the HPD must reduce the noise to a 90-dBA TWA or below. The protective effectiveness, or adequacy, of an HPD for a given noise exposure must be determined by applying the attenuation data that the U.S. Environmental Protection Agency currently requires on protector packaging (U.S. EPA, 1979, per 40 CFR §211), obtained per the ANSI S3.19-1974 protocol (ANSI, 1974).
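The mandatory-use rules just described can be read as a small decision procedure. The Python sketch below is a hedged illustration, not a compliance tool. It assumes, as an interpretation, that the mandatory-use conditions are evaluated only for workers at or above the 85-dBA TWA level at which HPDs must be made available. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HpdRequirement:
    must_be_available: bool             # TWA >= 85 dBA
    mandatory: bool                     # use required, per the conditions in the text
    max_protected_twa: Optional[float]  # protected TWA (dBA) the HPD must achieve


def hpd_requirement(twa_dba, has_baseline_audiogram, has_sts):
    """Sketch of the HPD rules summarized in the text (OSHA, 1983).

    Assumption: mandatory-use conditions are only evaluated for workers
    at or above the 85-dBA TWA level at which HPDs must be made available.
    """
    available = twa_dba >= 85.0
    mandatory = available and (
        not has_baseline_audiogram or has_sts or twa_dba >= 90.0
    )
    if not mandatory:
        return HpdRequirement(available, False, None)
    # Workers with an STS must be protected to 85 dBA TWA; others to 90 dBA TWA.
    return HpdRequirement(available, True, 85.0 if has_sts else 90.0)
```

For example, hpd_requirement(88.0, True, True) marks use as mandatory with a protected-TWA target of 85 dBA, whereas hpd_requirement(88.0, True, False) only requires that HPDs be made available.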
These data are obtained from psychophysical threshold tests performed on human subjects at nine 1/3-octave bands centered from 125 to 8000 Hz, with the HPDs fitted by the experimenter for optimal attenuation (the "experimenter-fit" technique of ANSI, 1974). The difference between the thresholds with and without the HPD constitutes the attenuation at a given frequency. Spectral attenuation statistics (means and standard deviations) and the single-number noise reduction rating (NRR) computed from them are provided. These ratings are the primary means by which end users compare different HPDs on a common basis and determine whether adequate
protection and OSHA compliance will be attained for a given noise environment. The most accurate method of determining HPD adequacy is to use octave-band measurements of the noise, together with the spectral mean and standard deviation attenuation data, to determine the protected exposure level under the HPD. This is called the National Institute for Occupational Safety and Health (NIOSH) long method, or NIOSH method #1, and it is cited in Appendix B of OSHA (1983). Computational procedures for this method appear in NIOSH (1994). Because this method requires octave-band measurements of the noise, preferably with each band's data in TWA form, its data requirements are somewhat extensive and it is not widely applied in industry. However, because the noise spectrum is compared against the attenuation spectrum of the HPD, a "matching" of exposure to protector can be obtained; the method is therefore considered the most accurate available. The NRR collapses the spectral attenuation data into one broadband attenuation estimate that can easily be applied to broadband dBC or dBA TWA noise exposure measurements. In calculating the NRR, the mean attenuation is reduced by two standard deviations; this yields an estimate of protection theoretically achievable by 98% of the population (EPA, 1979). The NRR is intended primarily to be subtracted from the dBC exposure TWA to estimate the protected exposure level in dBA:

$$\text{Workplace TWA (dBC)} - \text{NRR} = \text{Protected TWA (dBA)} \tag{18}$$

Unfortunately, because OSHA regulations require that noise exposure monitoring be performed in dBA, dBC values may not be readily available to the hearing conservationist. When the TWA values are in dBA, the NRR can still be applied, albeit with some loss of accuracy.
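Applying the NRR is a single subtraction. The Python sketch below covers Equation (18) for dBC data and its dBA counterpart, Equation (19), given in the next paragraph, which first reduces the NRR by 7 dB. It uses the labeled NRR as-is, with no field derating. The function name and the "C"/"A" weighting flag are hypothetical conventions for this illustration.

```python
def protected_twa(workplace_twa, nrr, weighting):
    """Estimate the protected TWA (dBA) from a labeled NRR.

    weighting="C": Equation (18), workplace TWA measured in dBC.
    weighting="A": Equation (19), workplace TWA in dBA; the NRR is first
                   reduced by the 7-dB C-minus-A correction.
    The labeled NRR is used as-is, with no field derating.
    """
    if weighting == "C":
        return workplace_twa - nrr
    if weighting == "A":
        return workplace_twa - (nrr - 7.0)
    raise ValueError("weighting must be 'C' or 'A'")
```

For a device labeled NRR 29 in a 100-dBA TWA environment, protected_twa(100.0, 29.0, "A") gives 78 dBA, 7 dB less protection than the 71 dBA obtained if the same 100 were a dBC value.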
With dBA data, a 7-dB "safety" correction is applied to the NRR to account for the largest typical differences between C- and A-weighted measurements of industrial noise, and the equation is:

$$\text{Workplace TWA (dBA)} - (\text{NRR} - 7) = \text{Protected TWA (dBA)} \tag{19}$$

Although the methods above are promulgated by OSHA (1983) for determining HPD adequacy for a given noise situation, a word of caution is needed. The data appearing on HPD packaging are obtained under optimal laboratory conditions with properly fitted protectors and trained human subjects. In no way do the "experimenter-fit" protocol and other aspects of the test procedure currently required by the EPA, ANSI S3.19-1974 (ANSI, 1974), represent the conditions under which HPDs are selected, fit, and used in the workplace (Park & Casali, 1991). Therefore, the attenuation data used in the NIOSH long method or the NRR formulae above are, in general, inflated and cannot be assumed to represent the protection that will be achieved in the field. Figure 6 shows the results of a review of research studies in which manufacturers' on-package NRRs were compared against NRRs computed from tests of actual subjects, taken with their HPDs from field settings (Berger, 2003). Clearly, the differences between laboratory and field estimates of HPD attenuation are typically large, and the hearing conservationist must take this into account when selecting protectors.

Efforts by ANSI Working Group S12/WG11 since the early 1980s have focused on developing an improved testing standard, ANSI S12.6, the first version of which was approved in 1984 and the current version of which (as of this chapter's publication) is ANSI S12.6-2016 (ANSI, 2016). This standard has an important human factors provision in its "Method B," in which the HPD is fitted by the subject, not the experimenter: relatively naïve subjects fit the product after only reading the manufacturer's instructions, with no experimenter assistance, in contrast to the EPA-required experimenter-fit protocol of ANSI S3.19-1974 (ANSI, 1974).

[Figure 6: Bar chart of noise reduction rating (dB), 0 to 30, comparing manufacturers' laboratory NRRs with field-derived NRRs for a range of earplug and earmuff products.]
Figure 6 Comparison of hearing protection device NRRs by device type: manufacturers' laboratory data vs. real-world "field" data. (Source: Berger, 2003. © 2003 AIHA.)

The Method B subject-fit protocol has been demonstrated to yield attenuation data that are more representative of those achievable under realistic workplace conditions in which a high-quality HCP is operated (Casali, 2006). However, although the EPA considered it around 2009, ANSI S12.6 with its Method B naïve-subject fitting has never been promulgated into federal law. HPD attenuation, and the product labeling derived from it, must therefore still be tested per the experimenter-fit protocol of ANSI S3.19-1974 under U.S. EPA rules (40 CFR §211), even though its optimal results are not valid predictors of the protection actually realized in the workplace. Consequently, HPD labeled attenuation, as required by the U.S. EPA, is typically higher than that actually achieved in practice (EPA, 2009). A different issue with the U.S. EPA HPD product labeling rule is that, as of this chapter's publication, federal law does not require labeling of augmented HPD capabilities such as active noise cancellation, active sound transmission, impulsive (e.g., gunfire) attenuation, nonlinearity of attenuation with incident sound level, and other special features. This warrants a change, especially since consensus standards (e.g., ANSI S12.42-2010 [ANSI, 2010]) do exist for testing the performance afforded by certain of these features (Casali, 2010a, 2010b).

4.4 Engineering Noise Control

4.4.1 Brief Overview of General Strategies

As discussed above, hearing protection and/or administrative controls are not a panacea for combating the risks posed by noise. They should not supplant noise control engineering; in fact, the best solution, in part because it does not rely on employee behavior, is to reduce the noise itself, preferably at the emission source.
The physical reduction of the noise energy, whether at its source, in its path, or at the worker, should be a major focus of noise management programs. However, where noise control is ineffective, infeasible (as in an airport taxi area), or prohibitively expensive, HPDs become the primary countermeasure.

Source-Path-Receiver Controls
There are many techniques used in noise control, and the specific approach must be tailored to the noise problem at hand. Spectrum analyzer (frequency-specific) measurements are typically used by noise control engineers to identify the source or root cause of noise and to assist in selecting control strategies. Example noise control strategies are (1) reduction at the source, which includes vibration isolation with spring or elastomer machinery mounts to inhibit energy transfer, vibration damping with elastic material to absorb and dissipate energy, mufflers or silencers on exhausts, reduced cutting, fan, or impact speeds, dynamic balancing of rotating components, reduced fluid-flow speeds and turbulence, and lining or wrapping of pipes and ducts; (2) isolation of the source, or treatment of the path, via relocation, acoustical enclosures, or shields (barriers) that reflect and redirect noise (especially high frequencies); (3) replacement or alteration of machinery, including belt drives instead of noisier gears, electric rather than pneumatic tools, and shifts in frequency output, such as using centrifugal fans (low frequencies) rather than propeller or axial fans (high frequencies), which often results in a lower A-weighted broadband level; (4) application of quieter materials, such as rubber liners in parts bins, conveyors, and vibrators, resilient hammer faces, bumpers on material handling equipment, nylon slides or rubber tires rather than metal rollers, and fiber rather than metal gears; and (5) treatment at the receiver via absorptive foam or fiberglass on reflective surfaces to reduce reverberation, enclosure of the receiver (e.g., a control room), or simply relocation of the worker farther from the primary source(s).
Further discussion of these and other techniques may be found in Driscoll (2020) and in Harris (1991), and an illustration of implementation possibilities in an industrial plant appears in Figure 7. Active Noise Reduction Another approach that has become available to industry in the last three decades is active noise
SOUND AND NOISE: MEASUREMENT AND DESIGN GUIDANCE
Sound-absorbing material beneath ceiling
473
Air intake muffler
Sound shield, absorbing Flexible pipe
Control room Door with sealing strips Vibration isolation Double glass with large interval between, with stripping
Noisy equipment in basement Figure 7
Sound insulating joints
Placement of heavy, vibrating equipment on separate plates with pillars
Noise control implementation in an industrial plant. (Source: OSHA, 1980.)
reduction (ANR), in which an electronic system is used to transduce an offensive noise in a sound field and then process and feed back the noise into the same sound field such that it is exactly 180∘ out of phase with, but of equal amplitude to, the original noise. The superposition of the out-of-phase anti-noise with the original noise causes physical cancellation of the noise in a target zone of the workplace. This requires two-dimensional plane wave sound propagation, which is characteristic of a limited number of noise situations, such as in air ducts, pipe stacks, and relatively small, enclosed spaces. Furthermore, ANR can sometimes be effective outside an industrial plant, for example, to partially cancel the noise from large centrifugal fans mounted on the plant’s roof or at ground level, reducing noise propagation into the community. In any case, at this stage of technology, ANR does have industrial noise control applications, but they are limited to specific situations. As to the ANR system design, there are several strategies that are employed. For highly repetitive, predictable noise sources, such as fans and blowers, synthesis of the anti-noise, as opposed to transducing it and reintroducing it in the sound field in a
feedback sense, may also be used in a feedforward fashion. ANR is most effective at frequencies below about 1000 Hz, which is fortuitous because passive noise control systems for combating low-frequency noise, such as custom-built reactive silencers, often pose physical constraints of size, shape, and weight and can be prohibitively expensive. At higher frequencies, with their correspondingly shorter wavelengths, the processing and phase relationships become more complicated and cancellation is less successful, although the technology is improving and the effective bandwidth of ANR systems is increasing, both in noise control applications and in ANR-based HPDs and earphones (Casali, 2010b; Casali, Robinson, Dabney, & Gauger, 2004).
DESIGN FOR HEALTH, SAFETY, AND COMFORT
Ergonomics Considerations
In designing and implementing noise control hardware, it is important that ergonomics be taken into account. For instance, in a sound-treated control booth that houses an operator, the ventilation system, lighting, visibility outward to the surrounding work area, and other
considerations relating to operator comfort and performance must be addressed. With regard to noise-isolating machine enclosures, access provisions should be designed so as not to compromise the operator–machine interface. In this regard, it is important that both operational functionality and maintenance access needs be met. If noise control hardware creates difficulties for operators in carrying out their jobs, they may tend to modify or remove it, rendering it ineffective. When noise control systems are retrofitted to existing equipment, it is important to explain their purpose and functionality to operators so that they can appreciate the necessity of the additions made.
4.4.2 Sound Propagation and Metrics
Various Sound Fields
Human factors professionals need a basic understanding of the design and implementation of strategies for “in the path” controls, which depends heavily on the concepts of sound fields and their metrics. The sound field, or region of airborne propagation around a sound source, can be divided into the free-field region and the reverberant-field region, as illustrated in Figure 8. A free field is a region with no barriers to sound propagation and no surfaces causing sound reflections that result in reverberation; in it, the sound level decreases by 6 dB for each doubling of distance from the source (i.e., the inverse distance law, discussed in Section 7.4.1). This contrasts with the reverberant field, wherein reflective surfaces reinforce the sound level as it moves outward from its source, resulting in less of a drop-off in sound level. Free fields and reverberant fields exist both indoors and outdoors, but hypothetically “pure” free fields exist only outdoors in regions with no obstructions or reflective surfaces whatsoever and no ground effects from the Earth’s surface, the latter condition occurring only at altitude or perhaps over a highly absorptive surface covering, such as deep, soft snow.
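The inverse distance law can be made concrete with a short calculation. The following is a minimal Python sketch, assuming an idealized point source radiating into a free, far field (no reflections, ground effects, or air absorption); the function name `free_field_level` and the 94 dB example level are hypothetical. It applies Lp(r) = Lp(r_ref) − 20 log10(r/r_ref), which yields the drop of about 6 dB (more precisely 20 log10 2 ≈ 6.02 dB) per doubling of distance.

```python
import math


def free_field_level(lp_ref_db: float, r_ref: float, r: float) -> float:
    """Predict the sound pressure level (dB) at distance r from a level measured at r_ref.

    Assumes an ideal point source in a free, far field, so the level falls by
    20*log10(r / r_ref) dB: about 6 dB for each doubling of distance.
    """
    if r_ref <= 0 or r <= 0:
        raise ValueError("distances must be positive")
    return lp_ref_db - 20.0 * math.log10(r / r_ref)


# Hypothetical source measured at 94 dB at 1 m; each doubling removes about 6 dB.
for distance in (1.0, 2.0, 4.0, 8.0):
    print(f"{distance:>4} m: {free_field_level(94.0, 1.0, distance):.1f} dB")
```

In a reverberant field, or within the near field of the source, measured levels will not follow this idealized curve, so the sketch applies only where the free-field, far-field assumptions hold.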
An “anechoic” chamber, meaning “without echo,” is an indoor room with highly absorptive materials (e.g., nonreflective foam or fiberglass) covering all room surfaces, so that the room approximates a free field. In contrast, a reverberant room has highly reflective boundary surfaces. A “hemi-anechoic” region is a free field over a reflecting plane, such as a paved parking lot.
In taking sound measurements for noise control applications, it is important to recognize the effects of the particular sound field in which propagation occurs. If measurements are taken with the microphone very close to a sound source, they may be erratic because the nonuniform shapes of the sound waves from the source and nearby surfaces produce variations in sound level. This region is known as the near field, and it typically extends from the source to a distance of one to two times the longest dimension of the source (Driscoll, 2020). At greater distances from the source, in what is known as the far field, the sound waves tend to become more uniform and stable in their propagation. Indeed, in the far-field region, if the environment is also a true free field, the measurements will show a 6 dB drop for each doubling of distance from the sound source, i.e., adhering to the inverse distance law discussed in Section 7.4.1. For more detail on the implications of taking measurements in the near field versus the far field, or in the transition zone between them, see Driscoll (2020).
Absorption and its Metrics
Various metrics are used to describe the properties of materials used for noise control, and a brief introduction follows. One of the most important properties is how well a surface absorbs sound wave energy (as opposed to reflecting it), which is defined by the absorption coefficient, α. This dimensionless quantity ranges from 0 (reflection of all incident sound energy) to 1 (absorption of all incident sound energy), and it is frequency-specific, typically reported in octave bands with center frequencies from 125 to 4000 Hz. A derivative metric, the Noise Reduction Coefficient (NRC), is computed as the arithmetic mean of the absorption coefficients in the octave bands centered at 250, 500, 1000, and 2000 Hz.
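As a sketch of the NRC definition, the following Python function averages the absorption coefficients in the four specified octave bands. The function name and the example coefficients are hypothetical, chosen only for illustration and not taken from published material data.

```python
from statistics import mean
from typing import Mapping

# Octave-band center frequencies (Hz) that enter the NRC average.
NRC_BANDS_HZ = (250, 500, 1000, 2000)


def noise_reduction_coefficient(alpha_by_band: Mapping[int, float]) -> float:
    """Return the arithmetic mean of the absorption coefficients at 250-2000 Hz."""
    for band in NRC_BANDS_HZ:
        if band not in alpha_by_band:
            raise KeyError(f"missing absorption coefficient for {band} Hz")
        if not 0.0 <= alpha_by_band[band] <= 1.0:
            raise ValueError(f"absorption coefficient at {band} Hz must be in [0, 1]")
    return mean(alpha_by_band[band] for band in NRC_BANDS_HZ)


# Hypothetical octave-band coefficients for a porous panel; the 125 Hz and
# 4000 Hz values are present but are not part of the NRC average.
panel = {125: 0.10, 250: 0.30, 500: 0.70, 1000: 0.90, 2000: 0.85, 4000: 0.80}
print(f"NRC = {noise_reduction_coefficient(panel):.4f}")
```

Because the NRC ignores the 125 Hz and 4000 Hz bands, two materials with the same NRC can still differ markedly at low or high frequencies, which is why band-by-band coefficients remain necessary for matching a material to a noise spectrum.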
In general terms, materials characterized by high porosity, softness, and relatively light weight per unit area (e.g., open-cell foams, fiberglass, thick fabrics, cellulose) provide absorption and have higher NRC values, while hard, nonporous materials (e.g., glass, painted drywall panels, glazed brick, ceramic tile) are reflective and have lower NRC values. It is important to note that a material’s ability to absorb sound waves is strongly frequency-specific, so a thorough spectrum analysis is necessary when determining how a noise’s spectral emission can be matched with a material’s absorptive capabilities. In this regard, typical octave band-specific sound absorption coefficients for a variety of common materials are published, as in NIOSH (1980).
Figure 8 Illustration of sound fields and variation in sound pressure level with propagation of sound away from a source. [Plot of sound pressure level, Lp, versus distance, r, marking the near field, far field, free field (with a 6 dB drop per doubling of distance), and reverberant field.] (Source: Courtesy of Brüel & Kjær.)
In many indoor rooms, the acoustic environment depends on multiple surfaces (walls, floors, ceilings, etc.) that typically have different absorption coefficients. As expressed in Eq (20), the average absorption coefficient of the room, ᾱ, is obtained by multiplying the absorption coefficient of each surface, α_1 … α_n, by its surface area in square feet, S_1 … S_n, respectively, summing these products, and dividing the sum by the total area of the surfaces. The higher the value of ᾱ, the more absorptive the room. Computational examples for ᾱ may be found in Driscoll (2020). Finally, multiplying ᾱ by the total surface area of the room, S, produces the total absorption for the space, A, in units of Sabins, as per Eq (21).
$$ \bar{\alpha} = \frac{S_1\alpha_1 + S_2\alpha_2 + \cdots + S_n\alpha_n}{S_1 + S_2 + \cdots + S_n} = \frac{\sum_{i=1}^{n} S_i\alpha_i}{\sum_{i=1}^{n} S_i} \tag{20} $$
$$ A\ (\text{in Sabins}) = S\bar{\alpha} \tag{21} $$
A quantity that builds on the total absorption, known as the Room Constant (R), is a measure of the effective absorption of a physical space in terms of the area of absorptive surfaces. The Room Constant, in units of ft² or m² and calculated using Eq (22), has A in the numerator, thus accounting for the absorption achieved by all of the room surfaces in each frequency band, and the complement of the average absorption coefficient, 1 − ᾱ, in the denominator. As Eq (22) indicates, as the average absorption coefficient increases, R increases and the reverberance of the room is lowered. Essentially, R can be thought of as the number of square feet or square meters of absorption. However, caution is advised: in actual application to real indoor spaces, calculating R from only the wall, floor, and ceiling absorption coefficients will likely underestimate the true Room Constant, because other surfaces in the room, such as tables, chairs, and window shades, provide additional absorption. For this reason, Driscoll (2020) recommends increasing the calculated R for typical industrial spaces by at least 25%.
$$ R\ (\text{in ft}^2 \text{ or m}^2) = \frac{S\bar{\alpha}}{1 - \bar{\alpha}} \tag{22} $$
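Equations (20) through (22) chain together naturally, as in the minimal Python sketch below. It assumes one octave band at a time, consistent area units throughout, and a rectangular room whose dimensions and absorption coefficients are hypothetical illustrative values (not data from NIOSH, 1980, or Driscoll, 2020). The `furnishing_allowance` parameter, a name introduced only for this sketch, applies a fractional increase to R, such as the increase of at least 25% recommended above for typical industrial spaces.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Surface:
    name: str
    area: float   # S_i, in ft^2 (or m^2, used consistently)
    alpha: float  # absorption coefficient in the octave band of interest


def average_alpha(surfaces: Sequence[Surface]) -> float:
    """Eq (20): area-weighted mean absorption coefficient."""
    total_area = sum(s.area for s in surfaces)
    return sum(s.area * s.alpha for s in surfaces) / total_area


def total_absorption(surfaces: Sequence[Surface]) -> float:
    """Eq (21): A = S * alpha_bar, in Sabins when areas are in ft^2."""
    return sum(s.area for s in surfaces) * average_alpha(surfaces)


def room_constant(surfaces: Sequence[Surface], furnishing_allowance: float = 0.0) -> float:
    """Eq (22): R = S * alpha_bar / (1 - alpha_bar), optionally scaled up by an allowance."""
    a_bar = average_alpha(surfaces)
    if a_bar >= 1.0:
        raise ValueError("average absorption coefficient must be below 1")
    return total_absorption(surfaces) / (1.0 - a_bar) * (1.0 + furnishing_allowance)


# Hypothetical 40 ft x 30 ft x 12 ft room, evaluated for a single octave band.
room = [
    Surface("floor", 40 * 30, 0.02),
    Surface("ceiling", 40 * 30, 0.70),
    Surface("walls", 2 * (40 * 12) + 2 * (30 * 12), 0.05),
]
print(f"alpha_bar = {average_alpha(room):.3f}")
print(f"A = {total_absorption(room):.0f} Sabins")
print(f"R = {room_constant(room):.0f} ft^2 (room surfaces only)")
print(f"R = {room_constant(room, 0.25):.0f} ft^2 (with a 25% allowance)")
```

Because α is frequency-specific, the calculation would be repeated for each octave band of interest, giving a separate ᾱ, A, and R per band.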
Moving to human factors implications, as R is increased, the reverberation time of the room decreases, and speech intelligibility can be expected to improve as a result. Furthermore, noise emissions will diminish more rapidly over distance as R increases, thus reducing the noise exposure of operators located farther from the source. Rooms with high R values are often referred to as “acoustically dead” or “soft” spaces, while rooms with low R values are termed “acoustically live” or “hard” spaces.
Sound Transmission and the Mass Law
Absorption is but one measurable property associated with the control of noise in spaces occupied by humans. Another property is the ability of a partition or other barrier to reduce noise transmission from the space in which the noise originates to adjoining spaces. Whereas materials that are effective at absorption are porous, soft, and lightweight, materials that are effective barriers to the passage of sound waves are typically dense and heavy. The primary measure of a material’s ability to transmit sound is the transmission coefficient, τ, which is the ratio of the sound energy transmitted through a partition to the sound energy incident upon the partition. τ values are frequency-specific and are provided in 1/3-octave bands with center frequencies from 125 to 8000 Hz. In general, excepting certain composite and reactive-dynamic materials, most materials transmit low-frequency sound more readily than high-frequency sound; thus, τ values tend to be lower as frequency increases. This is why, for example, occupants of a car driven with its windows up can more readily hear sounds with significant low-frequency content, such as train horns and the bass from other cars’ stereos, than higher-frequency sounds such as backup alarms or the treble from those stereos. A more useful metric derived from τ is the sound transmission loss, TL, imposed by the material making up a partition. TL is expressed in dB and generally ranges from 0 dB (no attenuation of sound) to more than 60 dB for effective partitions; TLs for specific materials used in partitions appear in Driscoll (2020) and NIOSH (1980). TL is given by Eq (23). To simplify matters and enable easy comparisons of different partitions, a single-number rating system has been developed that summarizes the TL values across frequency as a sound transmission class (STC) value, in dB. STC values for typical building materials can be found in HUD (2009). For example, a typical installation of 5/8-in.-thick drywall between rooms, with standard upright support (stud) construction in which the drywall on both sides of the wall is physically connected via each stud, has an STC of about 38 dB. By comparison, if the studs are alternately staggered so that each stud touches only one side of the wall, the STC improves significantly, to about 47 dB. This illustrates an important point: in many instances, noise transmission can be significantly reduced with foresight and planning in construction, at minimal added expense.
$$ \mathrm{TL}\ (\text{in dB}) = 10\log_{10}(1/\tau) \tag{23} $$
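The relation between τ and TL in Eq (23) is easy to evaluate in both directions. The Python sketch below, with hypothetical function names, shows how steeply τ shrinks as TL grows: each additional 10 dB of TL cuts the transmitted energy by a further factor of 10.

```python
import math


def transmission_loss_db(tau: float) -> float:
    """Eq (23): TL = 10 * log10(1 / tau), with tau the transmitted/incident energy ratio."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must be in the interval (0, 1]")
    return 10.0 * math.log10(1.0 / tau)


def transmission_coefficient(tl_db: float) -> float:
    """Inverse of Eq (23): tau = 10 ** (-TL / 10)."""
    return 10.0 ** (-tl_db / 10.0)


# A hypothetical partition passing 1/1000 of the incident energy has a TL of 30 dB,
# while a 60 dB partition passes only one-millionth of it.
print(f"{transmission_loss_db(0.001):.1f} dB")
print(f"{transmission_coefficient(60.0):.0e}")
```

Note that an STC rating is derived from the whole set of band-by-band TL values, so it cannot be obtained by applying Eq (23) to a single τ.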
Sound transmission loss of a barrier can be predicted with a formula known as the field-incidence mass law, given in Eq (24). Equation (24) is valid only for predicting TLs for materials of relatively high surface density and/or at high frequencies (Elbit & Hansen, 2007). Various references provide slight adjustments to the constants of this equation to make the predicted TL more conservative, accounting for the irregularities and heterogeneity of real materials. The commonly cited mass law holds that the acoustical insulation performance of a simple homogeneous barrier is governed by its surface mass, m, in kg/m², at a specified octave-band center frequency, f, in Hz. The practical result of the mass law, Eq (24), is that, with all other variables held constant, a doubling (halving) of the mass of the barrier, or a doubling (halving) of the sound frequency, produces a 6 dB increase (decrease) in TL.
$$ \mathrm{TL}\ (\text{in dB}) = 20\log_{10}(m) + 20\log_{10}(f) - 42 \tag{24} $$
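Equation (24) can be evaluated directly. The Python sketch below uses a hypothetical 10 kg/m² barrier at 500 Hz and confirms the 6 dB change (more precisely, 20 log10 2 ≈ 6.02 dB) per doubling of surface mass or frequency. Since the field-incidence mass law is only an approximation for real partitions, as noted above, the numbers are illustrative.

```python
import math


def mass_law_tl_db(surface_mass_kg_m2: float, freq_hz: float) -> float:
    """Eq (24): TL = 20*log10(m) + 20*log10(f) - 42, with m in kg/m^2 and f in Hz."""
    if surface_mass_kg_m2 <= 0 or freq_hz <= 0:
        raise ValueError("surface mass and frequency must be positive")
    return 20.0 * math.log10(surface_mass_kg_m2) + 20.0 * math.log10(freq_hz) - 42.0


base = mass_law_tl_db(10.0, 500.0)
print(f"10 kg/m^2 at 500 Hz: {base:.1f} dB")                                        # 32.0 dB
print(f"doubling the mass adds {mass_law_tl_db(20.0, 500.0) - base:.2f} dB")       # 6.02 dB
print(f"doubling the frequency adds {mass_law_tl_db(10.0, 1000.0) - base:.2f} dB")  # 6.02 dB
```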
5 AUDITORY EFFECTS OF NOISE
5.1 Hearing Loss in the United States
5.1.1 Noise-Induced Hearing Loss Prevalence in Industry and the Military
Noise-induced hearing loss (NIHL) is one of the most widespread occupational maladies in the United States, if not the world. In the early 1980s, it was estimated that over 9 million workers were exposed to noise levels averaging over 85 dBA for an 8-h workday (EPA, 1981). Today, this number is likely to be higher because the control of noise sources, in both type and number, has not kept pace with the proliferation of industrial and service sector development. Due in part to the absence of U.S. federal regulations governing noise exposure in general industry before the first OSHA noise exposure regulation of 1971, many workers over 60 years of age now exhibit hearing loss that results from the effects of occupational noise. Of course, the total noise exposure from both occupational and nonoccupational sources influences the NIHL that a victim experiences. Of the Americans who exhibit significant hearing loss from a variety of etiologies, such as pathology of the ear, head trauma, and hereditary causes, it has been estimated that over 10 million (at least one-third) have losses that are directly attributable to noise exposure (National Institutes of Health [NIH] Consensus Development Panel, 1990). These noise-related losses are preventable in nearly all cases. Noise-induced hearing loss in the civilian sector from 2003 to 2012 had a prevalence rate of 12.9% (Matterson, Bushnell, Themann & Morata, 2016). The majority of these losses are due to on-the-job exposures, but leisure noise sources do contribute a significant amount of energy to the total noise exposure for some people. Although the effects of noise exposure are serious and must be reckoned with by the safety professional, one fact is encouraging: process/machine-produced noise, as well as most sources of leisure noise, are physical stimuli that can usually be avoided, reduced, or eliminated; therefore, NIHL is preventable with effective abatement and protection strategies. Although perhaps optimistic, total elimination of NIHL should thus be the only acceptable goal. In the military, NIHL is also a staggering problem, especially during periods of war.
Since the Afghanistan War began in 2001 and the Iraq War in 2003, approximately 52% of combat soldiers have experienced moderately severe hearing loss or worse, primarily attributable to combat-related exposures (Defense Occupational and Environmental Health Readiness Data Repository [DOEHRS-DR], 2007). Furthermore, noise-induced hearing loss is the most common military disability. This is evidenced by the U.S. Veterans Administration (VA) having spent over $1.2 billion on personnel hearing-related injuries in fiscal year 2006 alone; in fiscal year 2007, the VA dispensed 348,920 hearing aids to veterans at a cost of approximately $141 million (Saunders & Griest, 2009). Since about 2000, combat operations and heightened training requirements have increased military service members’ exposure to hazardous noise. This has resulted in concomitant increases in noise-induced hearing loss and tinnitus, which were the two most prevalent military service-connected disabilities from 2005 to 2016 (McIlwain, Gates, & Ciliax, 2008; U.S. Department of Veterans Affairs, 2016). Furthermore, 2.6 million military veterans received some form of compensation for hearing loss or related symptoms in fiscal year 2016 (U.S. Department of Veterans Affairs, 2016). In the largest branch of the U.S. military, the Army, the prevalence of significant hearing loss among soldiers was recently 24% (DOEHRS-DR, 2016). In contrast, the prevalence of noise-induced hearing loss in the civilian sector was 12.9% from 2003 to 2012, roughly half the rate reported in soldiers (Matterson et al., 2016). Military personnel have shown a 30% higher propensity for severe hearing impairment than their nonveteran counterparts over similar timeframes (Groenwold, Tak, & Matterson, 2011). By comparison, in U.S. industry, the annual cost of disability payments for hearing-related injuries among about 30 million workers was about $242.4 million in 2001 (NIOSH, 2001).
5.1.2 Auditory Situation Awareness and Its Interaction with Hearing Loss and Hearing Protection
Part of the reason for military-related hearing loss is that combat soldiers may be inhibited from using hearing protection devices
(HPDs) for fear that the devices will reduce their ability to maintain stealth, operate tactically, hear threats, and communicate with fellow soldiers; in other words, that HPDs will compromise their auditory situation awareness. This behavior may also manifest in noise-exposed civilian workers who need hearing protection but must also be aurally attentive to forklifts or other moving equipment that emits alarms when in motion. If individuals fail to use their HPDs, noise-induced hearing loss will likely occur, especially from military gunfire or high-level civilian exposures. Their auditory situation awareness will then be diminished by the hearing loss itself, and the effect of not using HPDs will have come full circle. As such, HPD technology must be advanced so that it enables natural or even slightly enhanced hearing when worn, while simultaneously affording adequate attenuation against hazardous noises, including those that occur unexpectedly, such as from military ordnance. Once soldiers or workers are outfitted with these products, they must be given confidence that the assigned device truly provides auditory situation awareness along with the protection it affords, in an effort to motivate consistent use. The complex issue of HPD effects on auditory situation awareness is further covered in Section 7.4.3 herein, and also in Casali and Lee (2018).
5.2 Types and Etiologies of Noise-Induced Hearing Loss
Although the major concern of the industrial hearing conservationist is to prevent employee hearing loss that stems from occupational noise exposure, it is important to recognize that hearing loss may also emanate from a number of sources other than noise, including infections and diseases specific to the ear, most frequently originating in the middle or conductive portion; other bodily diseases, such as multiple sclerosis, which injures the neural part of the ear; ototoxic drugs, of which the mycin family (in sufficient dosages) is a prominent member; exposure to certain chemicals and industrial solvents; hereditary factors; head trauma; sudden hyperbaric- or altitude-induced pressure changes; and aging of the ear (presbycusis). Furthermore, not all noise exposure occurs on the job. Many workers are exposed to hazardous levels during leisure activities, from such sources as automobile/motorcycle racing, personal stereo headsets and car stereos, firearms, and power tools. The effects of noise on hearing are generally subdivided into acoustic trauma and temporary or permanent threshold shifts (Melnick, 1991).
5.2.1 Acoustic Trauma
Immediate organic damage to the ear from an extremely intense acoustic event such as an explosion is known as acoustic trauma. The victim will notice the loss immediately, and it often constitutes a permanent injury. The damage may be to the conductive chain of the ear, including rupture of the tympanum (eardrum) or dislodging of the ossicular chain (small bones and muscles) of the middle ear. Conductive losses can, in many cases, be compensated for with a hearing aid and/or surgically corrected. Neural damage may also occur, involving a dislodging of the hair cells and/or breakdown of the neural organ (Organ of Corti) itself. Unfortunately, neural loss is irrecoverable and not typically compensable with a hearing aid.
Acoustic trauma represents a severe injury, but fortunately, its occurrence is relatively uncommon, even in industrial settings. However, it can and does occur where extremely powerful impulsive exposures are present, such as sudden explosions with a resultant blast wave in the military setting. More information on acoustic trauma may be found in Rabinowitz, Davies, and Meinke (2020).
5.2.2 Noise-Induced Threshold Shift
A threshold shift is defined as an elevation of hearing level from a person’s baseline hearing level, and it constitutes a loss of hearing sensitivity. Noise-induced temporary threshold shift (NITTS), sometimes referred to as “auditory fatigue,” is by definition recoverable with time away from the noise. The elevation of threshold is thus temporary and can usually be traced to an overstimulation of the neural hair cells (actually, the stereocilia) in the Organ of Corti. Although the person may not notice the temporary loss of sensitivity, NITTS is a cardinal sign of overexposure to noise. It may occur over the course of a full workday in noise or even after a few minutes of exposure to very intense noise. Although the relationships are somewhat complex and individual differences are rather large, NITTS depends on the level, duration, and spectrum of the noise as well as on the audiometric test frequency in question; further information can be found in Melnick (1991) and Royster, Royster, and Dobie (2020). With noise-induced permanent threshold shift (NIPTS), there is no possibility of recovery. NIPTS can manifest suddenly as a result of acoustic trauma; however, noises that cause NIPTS most typically constitute exposures that are repeated over a period of time and have a cumulative effect on hearing sensitivity. In fact, the losses are often quite insidious in that they occur in small steps over a number of years of overexposure, and the person may not become aware of them until it is too late. This type of exposure produces permanent neural damage, and although there are some individual differences as to magnitude of loss and audiometric frequencies affected, the typical pattern for NIPTS is a prominent elevation of threshold at the 4000-Hz audiometric frequency (sometimes called the “4 kHz notch”), followed by a spreading of loss to the adjacent frequencies of 3000 and 6000 Hz.
From a classic study of workers in the jute-weaving industry, Figure 9 depicts the temporal profile of NIPTS as a family of audiometric threshold shift curves, with each curve representing a different number of years of exposure. As noise exposure continues over time, the hearing loss spreads over a wider frequency bandwidth inclusive of midrange and high frequencies, which happens to encompass the range of most auditory warning signals. In some cases, the hearing loss renders it unsafe or unproductive for the victim to work in occupational settings where hearing certain signals is requisite to the job. Unfortunately, the power of the consonant sounds of speech, which heavily influence the intelligibility of human speech, also lies in the frequency range that is typically affected by NIPTS, compromising the victim’s ability to understand speech. This is the tragedy of NIPTS: the worker’s ability to communicate is hampered, often severely and always irrecoverably. Hearing loss is a particularly troubling disability because its presence is not overt; therefore, the victim is often unintentionally excluded from conversations and may miss important auditory signals because others either are unaware of the loss or simply forget about the need to compensate for it.
5.3 Concomitant Auditory Injuries
Following exposure to high-intensity noise, some people will notice that ordinary sounds are perceived as “muffled,” and in some cases, they may experience a ringing or whistling sound in the ears, known as tinnitus. These manifestations should be taken as serious indications that overexposure has occurred and that protective action should be taken if similar exposures are encountered in the future. Tinnitus may also occur by itself or in conjunction with NIPTS. Some people report that tinnitus is always present, pervading their lives. It thus has the potential to be quite disruptive and in severe cases debilitating. Various clinical procedures exist for the treatment of tinnitus, some of which involve habituation to the heard tinnitus signal (e.g., Gold, 2003). Rarer than tinnitus, but typically quite debilitating, is the malady known as hyperacusis, which refers to hearing that is extremely sensitive to sound, usually to the extent that even moderate noise cannot be tolerated (Gold, 2003). Hyperacusis can manifest in many ways, but a number of victims report that their hearing became painfully sensitive to sounds of even normal levels after exposure to a particular, singular high-level noise event.
Therefore, at least for some, hyperacusis can be traced directly to noise exposure. Sufferers often must use HPDs when performing normal activities, such as walking on city streets, visiting movie theaters, or washing dishes in a sink, because such activities produce sounds that are painfully loud to them. It should be noted that hyperacusis sufferers often exhibit normal audiograms, even though their reaction to sound is one of hypersensitivity.
Figure 9 Cumulative auditory effects of years of noise exposure in a jute-weaving industry. [Plot of median presumed noise-induced threshold shift (dB, 0 to 70) versus frequency (125 to 6000 Hz), with separate curves for 5–9, 15–19, 25–29, 35–39, and 40–52 years of exposure.] (Source: Taylor et al., 1964. © 1964 Acoustical Society of America.)
6 PERFORMANCE, NONAUDITORY, AND PERCEPTUAL EFFECTS OF NOISE
6.1 Performance and Nonauditory Health Effects of Noise
6.1.1 Task Performance Effects
It is important to recognize that, among other deleterious effects, noise can degrade operator task performance. Research studies concerning the effects of noise on performance are primarily laboratory-based and task- and noise-specific; therefore, extrapolating the results to actual industrial settings is somewhat risky (Sanders & McCormick, 1993). Nonetheless, on the negative side, noise is known to mask task-related acoustic cues and to cause distraction and disruption of “inner speech”; on the positive side, noise may at least initially heighten operator arousal and thereby improve performance on tasks that do not require substantial cognitive processing (Poulton, 1978). To obtain reliable effects of noise on performance, except on tasks that rely heavily on short-term memory, the noise level must be fairly high, usually 95 dBA or greater. Tasks that are simple and repetitive often show no deleterious performance effects (and sometimes show improvements) in the presence of noise, whereas difficult tasks that rely on perception and information processing on the part of the operator often exhibit performance degradation (Sanders & McCormick, 1993). It is generally accepted that unexpected or aperiodic noise causes greater degradation than predictable, periodic, or continuous noise, and the startle response created by sudden noise can be disruptive.
6.1.2 Nonauditory Health Effects
Noise has been linked to physiological problems other than those of the hearing sense, including hypertension, heart irregularities, extreme fatigue, and digestive disorders. Most physiological responses of this nature are symptomatic of stress-related disorders.
Because high noise levels often induce other stressful experiences (such as sleep disturbance, interference with conversation in the home, and fear of missing oncoming vehicles or warning signals on the job), noise has second-order effects on physiological functioning that are difficult to predict. The reader is referred to Kryter (1994) for a detailed discussion of nonauditory health effects of noise.
6.2 Annoyance Effects of Noise
Noise has frequently given rise to vigorous complaints in many settings, ranging from office environments to aircraft cabins to homes. Such complaints are manifestations of what is known as noise-induced annoyance, which has given rise to a host of products, such as white/pink noise generators for masking undesirable noise sources, noise-canceling headsets, and noise barriers for reducing sound propagation over distances and through walls. In the populated community, noise is a common source of disturbance, and for this reason many communities, both urban and rural, have noise ordinances and/or zoning restrictions that regulate the maximum noise levels that can result from certain sources and/or in certain land areas. In communities that have no such regulations, residents who are disturbed by noise sources such as industrial plants or spectator events often have no recourse other than civil lawsuits (Casali, 1999). The principal rationale for limiting noise in communities is to reduce sleep and speech interference and to avoid annoyance. Some of the measurement units and instrumentation discussed in this chapter are useful for community and other noise annoyance applications, while more detailed information on the subject may be found in Driscoll, Stewart, Anderson, and Leasure (2020), Fidell and Pearsons (1997), and Casali (1999).
6.3 Loudness and Related Scales of Measurement
One of the most readily identified aspects of a sound or noise, and one that relates to a majority of complaints, be it a theater actor’s voice that is too quiet to be heard or a background noise that is too intense and thus annoying, is loudness. As discussed above, the decibel is useful for quantifying the amplitude of a sound on a physical scale; however, it does not yield an absolute or relative basis for quantifying the human perception of sound amplitude, commonly called loudness. Several psychophysical scales are useful for measuring loudness, the two most prominent being phons and sones.
6.3.1 Phons
The phon level of a sound is the decibel level of a 1000-Hz tone that human listeners judge to be equally loud as the sound in question. The phon levels of sounds of different intensities are shown in Figure 3(a); this family of curves is referred to as the equal-loudness contours. Along any given curve, the combinations of sound level and frequency produce sound experiences of equal loudness to the normal-hearing listener. Note that at 1000 Hz on each curve the phon level is equal to the decibel level. The threshold of hearing for a young, healthy ear is represented by the 0-phon-level curve. The young, healthy ear is sensitive to sounds between about 20 and 20,000 Hz, although, as shown by the curves, it is not equally sensitive to all frequencies. At low and midlevel sound intensities, low-frequency and, to a lesser extent, high-frequency sounds are perceived as less intense than sounds in the range 1000–4000 Hz, where the undamaged ear is most sensitive. But as phon levels move to higher values, the ear becomes more linear in its loudness perception for sounds of different frequencies. It is because the ear exhibits this nonlinear behavior that the frequency-weighting responses for dBA, dBC, and so on were developed, as discussed in Section 3.1.1.
6.3.2 Sones
Although the phon scale makes it possible to equate the loudness of sounds of various frequencies, it cannot describe how much louder one sound is than another. For this, the sone scale is needed (Stevens, 1936). One sone is defined as the loudness of a 1000-Hz tone of 40-dB SPL. Relative to 1 sone, 2 sones are twice as loud, 3 sones are three times as loud, one-half sone is half as loud, and so on. Phon level ($L_P$) and sones are related by Eq (25) for sounds at or above a 40-phon level:
$$\text{Loudness (sones)} = 2^{(L_P - 40)/10}$$
(25)
According to Eq (25), 1 sone equals 40 phons, and the number of sones doubles with each 10-phon increase above 40. This makes it straightforward to estimate the relative loudness of sounds with different decibel levels. The rule of thumb is that each 10-dB increase in a sound (i.e., one that is already above 40 dB) results in a doubling of its loudness. For instance, a home theater room that is currently at 50 dBA may be comfortable for listening to movies and classical music. If a new air-conditioning system raises the noise level in the room by 10 dBA, however, the occupants will perceive a doubling of background loudness and will probably complain about interference with speech and music in the room. Once again, the compression of the decibel scale produces a measure that does not reflect the much larger effect that an increase in sound level has on the human perception of loudness.
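Eq (25) and its inverse can be sketched in a few lines of Python. The function names are illustrative and are not taken from any library. The home-theater check follows the rule of thumb above and assumes that a change in dBA can be treated as the same change in phons.

```python
import math


def phons_to_sones(phons: float) -> float:
    """Loudness in sones for a loudness level in phons, per Eq (25).

    Eq (25) is stated only for levels at or above 40 phons, so lower
    inputs are rejected rather than extrapolated.
    """
    if phons < 40.0:
        raise ValueError("Eq (25) applies only at or above 40 phons")
    return 2.0 ** ((phons - 40.0) / 10.0)


def sones_to_phons(sones: float) -> float:
    """Inverse of Eq (25): loudness level in phons for a loudness of >= 1 sone."""
    if sones < 1.0:
        raise ValueError("Eq (25) applies only at or above 1 sone (40 phons)")
    return 40.0 + 10.0 * math.log2(sones)


def loudness_ratio(level_a: float, level_b: float) -> float:
    """How many times louder sound A is than sound B (both in phons, >= 40)."""
    return phons_to_sones(level_a) / phons_to_sones(level_b)


if __name__ == "__main__":
    # Home theater example: background noise rises from 50 to 60 dBA,
    # treated here as 50 to 60 phons (rule-of-thumb assumption).
    print(loudness_ratio(60.0, 50.0))  # 2.0
```

Because the relation is exponential, a 20-phon increase quadruples loudness: 60 phons is 4 sones against 1 sone at 40 phons.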
SOUND AND NOISE: MEASUREMENT AND DESIGN GUIDANCE
Precise Calculation of Sone Levels by the Stevens Method
Eq (25) shows that sone levels can be calculated directly from psychological measurements in phons. They cannot be calculated from physical measurements of SPL in decibels without special conversions, because the relationship between phon-based loudness and SPL changes with the frequency of the sound, and the size of this change depends on the intensity of the sound. The Stevens method (Stevens, 1936), now also known as the ISO spectral method, is fully described in Rossing (1990). Briefly, the method requires measuring the dB (linear) level in 10 standard octave or 1/3-octave bands, with centers at 31, 63, 125, 250, 500, 1000, 2000, 4000, 8000, and 16000 Hz. For each band measurement, the loudness index, $S_i$, is then read from Figure 10 and applied in Eq (26):
$$\text{Loudness Level (sones)} = S_{\max} + 0.3\left(\sum_{i=1}^{n} S_i - S_{\max}\right)$$
(26)
where $S_i$ is the loudness index for each band from Figure 10, $S_{\max}$ is the largest of the loudness indices, and $\sum_{i=1}^{n} S_i$ is the sum of the loudness indices for all n bands. Subtracting $S_{\max}$ from that sum therefore leaves the sum for all bands except the loudest. With this “precise” method, the loudest band of noise counts at 100% of its loudness value, and the other bands together count at 30%. Because the noise must be measured in octave or 1/3-octave bands, the method is measurement-intensive and requires special instrumentation (i.e., a real-time spectrum analyzer).
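Once the loudness indices have been read from Figure 10, the band-combination step of Eq (26) is a one-line calculation. The Python sketch below assumes the indices are already available. The function name is illustrative, and the example indices are hypothetical values rather than readings from Figure 10.

```python
from typing import Sequence


def stevens_total_loudness(indices: Sequence[float], k: float = 0.3) -> float:
    """Total loudness in sones from per-band loudness indices S_i, per Eq (26).

    The loudest band (S_max) counts in full; the remaining bands together
    count at the fraction k, which is 0.3 in Eq (26).
    """
    if not indices:
        raise ValueError("at least one band loudness index is required")
    s_max = max(indices)
    return s_max + k * (sum(indices) - s_max)


if __name__ == "__main__":
    # Hypothetical loudness indices for four bands (illustration only).
    example = [2.0, 4.0, 10.0, 6.0]
    print(round(stevens_total_loudness(example), 2))  # 13.6
```

The sketch sums all bands and then subtracts $S_{\max}$ instead of deleting one element from the list. This handles ties correctly: if two bands share the maximum, one counts in full and the other at 30%.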
Figure 10 Chart for calculating loudness indices from decibel levels in various frequency bands for use in computing sone levels. (Axes: band sound pressure level, dB re 2 × 10^−5 N/m², and frequency, Hz; curves labeled by loudness index.) (Source: Rossing, 1990. © 1990 Pearson.)
Approximation of Sone Levels from dBA
In contrast to the Stevens method, the loudness of a sound in sones can be estimated from dBA values, albeit with substantially less spectral precision. This method needs only an SLM (rather than a spectrum analyzer), and measurements are captured in dBA. A level of 30 dBA is equated to 1.5 sones, and the number of sones doubles for each 10-dBA increase over 30 dBA. For example, 40 dBA = 3 sones, 50 dBA = 6 sones, 55 dBA = 8 sones, 60 dBA = 12 sones, 65 dBA = 16 sones, 70 dBA = 24 sones, 75 dBA = 32 sones, 80 dBA = 48 sones, 85 dBA = 64 sones, and 90 dBA = 96 sones (Rossing, 1990). The method is most accurate at low to moderate sound levels, because at these levels the ear’s sensitivity is similar to the A-weighting curve.
Practical Applications of the Sone
Despite its practicality, the sone scale is not widely used; one exception is that household ventilation fans typically carry voluntary sone ratings. It is, however, the most useful scale for comparing the loudness of different sounds as perceived by humans. Because it is a ratio scale, the sone is more useful than decibel measurements for comparing the loudness of different products’ emissions. For example, a vacuum cleaner that emits 60 sones is twice as loud as one that emits 30 sones. The sone is also useful for conveying loudness experiences to lay groups. Casali (1999) gives an example of such use, in which sones illustrated the perceptual impact of a community noise disturbance (an automobile racetrack) for a civil court jury.
6.3.3 Modifications of the Sone
Stevens (1972) proposed a modification of the sone scale (Mark VI and subsequently Mark VII sones) to account for the fact that most real sounds are more complex than pure tones. Using the general form of Eq (27), this method incorporates octave-band, 1/2-octave-band, or 1/3-octave-band noise measurements and adds to the sone value of the most intense frequency band a fractional portion of the sum of the sone values of the other bands:
$$\text{Loudness Level (Mark VII sones)} = S_m + k\left(\sum_{i=1}^{n} S_i - S_m\right)$$
(27)
where $S_m$ is the maximum sone value in any band, $k$ is a fractional multiplier that varies with bandwidth (octave, $k$ = 0.3; 1/2-octave, $k$ = 0.2; 1/3-octave, $k$ = 0.15), and $\sum_{i=1}^{n} S_i$ is the sum of the sone values of all n bands. Subtracting $S_m$ from that sum therefore leaves the sum for the other bands.
6.3.4 Zwicker’s Method of Loudness
The concept of the critical band for loudness formed the basis for Zwicker’s (1960) method of loudness quantification. The critical band is the frequency band within which the loudness of a band of continuously distributed sound of equal SPL is independent of the width of the band. The critical bands widen as frequency increases. Zwicker’s method is graphical and is based on the critical band results he obtained and graphed. The noise spectrum is plotted, and lines are drawn to depict the spread of the masking effect. The resulting bounded area on the graph is proportional to total loudness. The method is relatively complex, and Zwicker (1960) should be consulted for computational detail.
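Eq (27) and the dBA approximation given earlier in this section can each be sketched in a few lines of Python. The $k$ values come from the text. The function names and the hypothetical band values in the example are illustrative only.

```python
# Fractional multiplier k in Eq (27), by analysis bandwidth (values from the text).
K_BY_BANDWIDTH = {"octave": 0.3, "half-octave": 0.2, "third-octave": 0.15}


def mark_vii_loudness(band_sones, bandwidth="octave"):
    """Eq (27): S_m + k * (sum of all band sone values - S_m)."""
    if not band_sones:
        raise ValueError("at least one band value is required")
    k = K_BY_BANDWIDTH[bandwidth]
    s_m = max(band_sones)
    return s_m + k * (sum(band_sones) - s_m)


def sones_from_dba(level_dba):
    """Approximate loudness from an A-weighted level: 1.5 sones at 30 dBA,
    doubling with each 10-dBA increase. Intended for low to moderate levels."""
    return 1.5 * 2.0 ** ((level_dba - 30.0) / 10.0)


if __name__ == "__main__":
    # Hypothetical 1/3-octave band sone values (illustration only).
    print(round(mark_vii_loudness([2.0, 4.0, 10.0, 6.0], "third-octave"), 2))  # 11.8
    for level in (40, 50, 60, 90):
        print(level, sones_from_dba(level))  # 3.0, 6.0, 12.0, 96.0 sones
```

At the 10-dB steps, the formula reproduces the listed values exactly. Between those steps it runs slightly high: at 55 dBA it gives about 8.5 sones, where the list quotes 8.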
6.3.5 Noisiness Units
As descriptive terms, noisiness and loudness are related but not synonymous. Noisiness can be defined as the “subjective unwantedness” of a sound. Perceived noisiness may be influenced by a sound’s loudness, tonality, duration, impulsiveness, and variability (Kryter, 1994). A sound of low loudness may be perceived as enjoyable or pleasing, but noisiness (i.e., unwantedness) is by definition undesirable, even at a low level. Equal-noisiness contours, analogous to equal-loudness contours, have been developed using a unit analogous to the phon, called the perceived noise level (PNdB). The PNdB is the SPL in decibels of a 1/3-octave band of random noise centered at 1000 Hz that sounds equally noisy to the sound in question. An N (later D) SLM weighting curve was also developed for measuring the perceived noise level of a sound. The noy, a subjective noisiness unit analogous to the sone, is used for comparing the relative noisiness of sounds. One noy is equal to 40 PNdB; 2 noys are twice as noisy as 1 noy, 5 noys are five times as noisy, and so on. As with sones for loudness, an increase of about 10 PNdB is equivalent to a doubling of the perceived noisiness of a sound.
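The two anchor points stated above are 1 noy at 40 PNdB and a doubling of noisiness for roughly each 10-PNdB increase. Taken together, they let the noy scale be written in the same form as Eq (25). This is an approximation implied by the text, not a formula the text states:

$$N_{\text{noys}} \approx 2^{(L_{PN} - 40)/10}$$

where $L_{PN}$ is the perceived noise level in PNdB. For example, a sound of 60 PNdB would be about $2^{2} = 4$ noys, or four times as noisy as a 40-PNdB sound.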
7 SIGNAL AUDIBILITY AND SPEECH COMMUNICATIONS IN NOISE
Portions of this section pertaining to noise masking and its computation are based in part on Robinson and Casali (2003).
7.1 General Concepts in Signal and Speech Audibility
7.1.1 Signal-to-Noise Ratio Influence
One of the most noticeable effects of noise is its interference with speech communications and with the hearing of nonverbal signals. Operators often complain that they must shout to be heard and that they cannot hear others trying to communicate with them. Similarly, noise interferes with the detection of signals such as general-area evacuation alarms and warnings in buildings, annunciators, on-equipment alarms, and the machine-related sounds that industrial workers rely on for feedback. In a car or truck, the ambient noise may compromise the hearing of external signals, such as emergency vehicle sirens or train horns, and of in-vehicle warning alarms or messages. The ratio of the speech or signal level to the noise level, termed the signal (or speech)-to-noise ratio (hereinafter, S/N for signal-to-noise and SNR for speech-to-noise), is a critical parameter in determining whether speech or signals will be heard in noise. Expressed in decibels, this ratio equals the signed algebraic difference between the two levels. An S/N value of +5 dB means that the signal is 5 dB greater than the noise; an S/N value of −5 dB means that the signal is 5 dB lower than the noise.
7.1.2 Masking and Masked Threshold
Technically, masking is defined as the increase (in decibels) in the threshold of a desired signal or speech (the masked sound) that is caused by the presence of an interfering sound (the masking sound or masker). For example, in the presence of heavy traffic alongside a busy street, an auditory pedestrian crossing signal must be sufficiently louder than the traffic noise for a pedestrian to hear it. When no traffic is present, a lower volume may be audible (and possibly less annoying).
One signal can also mask another signal if both are active at the same time. In psychophysical terms, the masked threshold is often defined as the SPL required for 75% correct detection of a signal presented in a two-interval task. In each trial of this task, one of the two intervals, chosen at random, contains the signal and the noise, and the other contains only noise. In a controlled laboratory test, a signal about 6 dB above the masked threshold yields nearly perfect detection performance for normal-hearing individuals (Sorkin, 1987). The remainder of this chapter discusses various aspects of the masking phenomenon and presents methods for calculating a masked signal threshold for nonverbal signals or, in the case of speech, for estimating intelligibility. Throughout, it is important to remember that the masked threshold is, in fact, a threshold; it is not the level at which the signal is clearly audible. For the ensuing discussion, a functional definition of an auditory threshold is the SPL at which the stimulus is just audible to a person listening intently for it in the specified conditions. A threshold determined in “silence,” as during an audiometric examination, is referred to as an absolute threshold. A threshold determined in the presence of noise is referred to as a masked threshold.
7.2 Analysis of and Design for Signal Audibility in Noise
Fundamentally, detecting an auditory signal is a prerequisite to any other function performed on or about that signal. Such functions include discriminating it from other signals, identifying its source, recognizing its intended meaning or urgency, localizing it in azimuth, elevation, and distance, and judging its speed on approach or retreat. The term “audibility” is sometimes used broadly to cover any of these hearing tasks. Although the S/N is one of the most critical parameters that determine a signal’s detectability in noise, there are many other factors as well.
These include the spectral content of the signal and noise (especially in relation to the critical bandwidth), the temporal characteristics of the signal and noise (especially the contrast between them), the duration of the signal’s presentation, the listener’s hearing ability, the demands on the listener’s attention, the criticality of the situation at hand, and the attenuation of hearing protectors, if used. These factors are discussed in detail in Casali and Tufts (2021). Depending on the application of a particular auditory signal, be it warning, advisory, communications, or otherwise, consensus standards may specify the signal’s acoustical parameters and their values, as discussed later. The ensuing discussion concentrates on the most important issue: the spectral content of the signal and noise and how that content affects masking and, in turn, audibility.
7.2.1 Spectral Considerations and Masking
Generally speaking, the greater the decibel level of the background noise relative to the signal (including speech), the more difficult it will be to hear the signal. Conversely, reducing the background noise and/or increasing the signal level makes the masked signal more readily audible. In some cases, ambient noise can be reduced through engineering controls, and in the same or other cases it may be possible to increase the intensity of the signals. Although most off-the-shelf auditory warning devices have a preset output level, their effective level can be increased by distributing multiple alarms or warning devices throughout a coverage area instead of relying on one centrally located device. The same approach suits variable-output systems such as public address loudspeakers, because simply raising the output of such systems often distorts the amplified speech signal and thereby reduces intelligibility.
Simply increasing the signal level without adding more sound sources can have an undesirable side effect if the signal is sounded too often: it increases the noise exposure of people in the area of the signal. If the signal levels are extremely high (e.g., over 105 dBA), people in the vicinity of the device while it is sounding could experience temporary threshold shifts or tinnitus, even after brief exposures. As to a working decibel range for auditory warning signals, the International Organization for Standardization (ISO, 2003) standard “Danger Signals for Public and Work Areas—Auditory Danger Signals” recommends that the signal shall be not less than 65 dBA nor more than 118 dBA in the signal reception area. One problem directly related to the level of the background noise is distortion within the inner ear. At very high noise levels, the cochlea becomes overloaded and cannot accurately filter and discriminate the different forms of acoustic energy (e.g., signal and noise) reaching it, a phenomenon known as cochlear distortion. At very high noise levels, therefore, a signal (including speech) must be presented at a higher level relative to the background noise than would be necessary at lower noise levels. This is one reason why reducing the background noise should be a high priority in occupational and other environments, so that signals and communications need not be presented at objectionably high SPLs. Besides adjusting the levels of the auditory displays, alarms, warnings, and background noise, designers can also make an auditory display or alarm more likely to be detected by shaping its spectrum to contrast with the background noise and other common workplace sounds. In a series of experiments, Wilkins and Martin (1982, 1985) found that the contrast of a signal with both the background noise and irrelevant signals was an important parameter in determining its detectability.
For example, in an environment with high-frequency noise, such as the sawing and/or planing operations in a wood mill, it might be best to select a warning device with strong low-frequency components, perhaps in the range of 500–800 Hz. Conversely, for low-frequency noise such as might be encountered near large-capacity, slow-rotation ventilation fans, an alarm with strong mid-frequency components in the range of 1000–1500 Hz might be a better choice.
Upward Spread of Masking
When a tonal signal is masked by a tonal noise or a narrow band of noise, masking is greatest in the immediate vicinity of the masking tone or, for a band-limited noise, of the center frequency of the band. (This is one reason why increasing the frequency contrast between the signal and the noise can increase the audibility of a signal.) The masking effect does, however, spread out above and below this frequency, and the spread is greater above the frequency of the masking noise than below it (Egan & Hake, 1950; Wegel & Lane, 1924). This phenomenon, referred to as the upward spread of masking, becomes more pronounced as the level of the masking noise increases, probably because of cochlear distortion. In practical situations, masking by pure tones is seldom a problem. Exceptions arise when the noise contains strong tonal components or when two warning signals with similar frequencies are activated simultaneously. Although less pronounced, upward spread of masking also occurs when band-limited noises are used as maskers. This phenomenon is illustrated in Figure 11.
Figure 11 Upward spread of masking of a pure tone by three levels (40, 60, and 80 dB) of a 90-Hz-wide band of noise centered at 410 Hz. The ordinate (y axis) is the amount (in decibels) by which the absolute threshold of the masked tone is raised by the masking noise, and the abscissa is the frequency of the masked tone. (Source: Egan and Hake, 1950. © 1950 Acoustical Society of America.)
Masking with Broadband Noise
A very common form of masking occurs when a signal or speech is masked by broadband noise. This form is characteristic of typical industrial workplaces and of building spaces such as conference rooms or auditoria. Passenger vehicle interior noises, as well as many aircraft cockpit noises, also have relatively broadband spectra that effectively mask speech communications. Laboratory experiments on broadband noise masking commonly use white or pink noise, whereas application-specific experiments use renditions of the actual vehicle, cockpit, or workstation noise. White noise sounds very much like static on a radio tuned between broadcast channels, and it contains equal energy per hertz. Pink noise sounds like the roar of a waterfall, and its energy decreases by 3 dB per octave as frequency increases. In examining the masking of pure-tone stimuli by white noise, Hawkins and Stevens (1950) found that masking was directly proportional to the level of the masking noise, irrespective of the frequency of the masked tone. In other words, if a given background white noise raised the threshold of a 2500-Hz tone by about 35 dB, it would also raise the threshold of a 1000-Hz tone by about 35 dB. They also found that, for the noise levels investigated, masking increased linearly with the level of the white noise: raising the masking noise by 10 dB also raised the masked thresholds of the tones by about 10 dB. The bottom line is that broadband noise such as white or pink noise, because it includes all frequencies, is a very effective masker of tonal signals and speech, so its abatement often needs to be a high priority. On the other hand, white or pink noise may serve as an intentional masker of the distractions created by conversations and phone ringers in open-plan offices, although adding one noise to combat another is certainly debatable. Furthermore, the relatively broadband spectra of certain aircraft cockpit noises, such as those in small turboprop airplanes and helicopters, can degrade the intelligibility of communications for cockpit crew members.
Such cockpits call for careful selection of headsets that provide sufficient passive and/or active attenuation of the noise. Good headsets improve the speech-to-noise ratios (SNRs) for the user, which in turn enhances communications and reduces workload (Casto & Casali, 2013; Valimont, Casali & Lancaster, 2006). In aircraft cockpits with broadband noise spectra that have a low-frequency bias, active noise reduction (ANR) headsets can offer distinct advantages for both hearing protection and speech intelligibility (Casali, 2010b).
7.2.2 Signal Audibility Analysis Based on Critical Band Masking
Fletcher (1940) developed what would become critical band theory, which has formed the fundamental basis for explaining how signals are masked by narrowband noise. According to this theory, the ear behaves as if it contains a series of overlapping auditory filters, with the bandwidth of each filter proportional to its center frequency. When pure tones are masked by broadband noise, only a narrow “critical band” of the noise centered at the frequency of the tone acts as a masker, and the width of this band depends only on the frequency of the tone being masked. The masked threshold of a pure tone can therefore be predicted from just the frequency of the tone and the spectrum level (decibels per hertz) of the masking noise, provided that the noise spectrum is reasonably flat around the tone. Thus, the masked threshold of a tone in white noise is given by Eq (28):
$$L_{mt} = L_{ps} + 10\log_{10}(BW)$$
(28)
where $L_{mt}$ is the masked threshold, $L_{ps}$ is the spectrum level of the masking noise, and $BW$ is the width of the auditory filter centered on the tone. Strictly speaking, this relationship applies only when the masking noise is flat (equal energy per hertz) and the masked signal lasts longer than 0.1 s. An acceptable approximation may nevertheless be obtained for other noise conditions, as long as the spectrum level within the critical band does not vary by more than 6 dB (Sorkin, 1987). In many environments, the background noise is likely to be sufficiently constant and can often be presumed flat within the critical band for a given signal. The exception is noise that has prominent tonal components and/or fluctuates a great deal. Note that the spectrum level of the noise in each 1/3-octave band containing signal components is not the same as the band level measured with an octave-band or 1/3-octave-band analyzer. Spectrum level is the level in decibels per hertz, that is, the level that would be measured with a filter 1 Hz wide. If the noise is assumed to be flat within the bandwidth of the 1/3-octave-band filter, the spectrum level, $L_{ps}$, can be estimated using Eq (29):
$$L_{ps} = 10\log_{10}\left(\frac{10^{L_{pb}/10}}{BW_{1/3}}\right)$$
(29)
where $L_{ps}$ is the spectrum level in dB of the noise within the 1/3-octave band, $L_{pb}$ is the SPL in dB measured in the 1/3-octave band in question, and $BW_{1/3}$ is the bandwidth of the 1/3-octave band, calculated by multiplying the center frequency ($f_c$) of the band by 0.232. Finally, the bandwidth of the auditory filter ($BW$ in Eq (28)) can be approximated by multiplying the frequency of the masked signal/tone by 0.15 (Patterson, 1982; Sorkin, 1987). If the signal level measured in one or more of the 1/3-octave bands considered exceeds the corresponding masked threshold, the signal should be audible. A computational example of the critical band method appears in Robinson and Casali (2003).
7.2.3 Signal Audibility Analysis and Display Design Based on Consensus Standards and Regulations
Before designing any auditory signal associated with a safety or product-related issue, the designer should first determine whether any consensus standards, legal regulations, or best-practice guidelines apply and can provide scientific guidance. In this area of acoustics, consensus standards and laws are relatively broad and deep in their coverage. Various standards organizations and agencies, both in the United States and internationally, cover auditory display design of many types, including nonverbal designs (i.e., tonal or other displays not including speech) and verbal designs (i.e., displays with speech output). Because of the large number of standards and regulations, listing and discussing all of them is beyond the scope of this chapter; Casali and Tufts (2021) provide detailed coverage. Briefly, examples of major agencies that promulgate standards and/or regulations for auditory displays and alarms, with the applications their publications typically cover, include: U.S. Department of Defense (2012) MIL-STD-1472G (warning signals and speech displays); U.S. Department of Defense (2015) MIL-STD-1474E (speech communications evaluation and noise evaluation); National Fire Protection Association (fire and smoke alarms, firefighter-worn alarms, public notifications); Society of Automotive Engineers (emergency vehicle sirens, backup alarms, and horns); Federal Aviation Administration (aircraft cockpit signals); U.S. Navy (onboard ship auditory signals); International Electrotechnical Commission (medical device alarms); ANSI (public outdoor notifications, emergency evacuation signals, and many others); and ISO (verbal displays, annunciators, evacuation signals, in-vehicle displays, backup alarms and horns, consumer product displays, handicapped-accessibility displays). U.S. federal law also covers specific signals such as locomotive horns (Code of Federal Regulations, 2006).
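To return briefly to the critical band procedure of Section 7.2.2, Eqs (28) and (29) combine into a short calculation. The Python sketch below assumes that the tone sits at the center of the 1/3-octave band whose noise level is supplied and that the noise is flat within that band. The function names and the 80-dB example level are hypothetical.

```python
import math


def third_octave_bandwidth(fc_hz: float) -> float:
    """Bandwidth (Hz) of a 1/3-octave band: 0.232 x center frequency."""
    return 0.232 * fc_hz


def spectrum_level(band_level_db: float, fc_hz: float) -> float:
    """Eq (29): spectrum level (dB per Hz) from a 1/3-octave band level,
    assuming the noise is flat within the band."""
    return 10.0 * math.log10(10.0 ** (band_level_db / 10.0) / third_octave_bandwidth(fc_hz))


def critical_band_masked_threshold(band_level_db: float, tone_hz: float) -> float:
    """Eq (28): L_mt = L_ps + 10 log10(BW), with the auditory filter
    bandwidth BW approximated as 0.15 x the tone frequency."""
    auditory_bw = 0.15 * tone_hz
    return spectrum_level(band_level_db, tone_hz) + 10.0 * math.log10(auditory_bw)


def is_audible(signal_levels_db, masked_thresholds_db) -> bool:
    """Predicted audible if the signal exceeds the masked threshold in any band."""
    return any(s > t for s, t in zip(signal_levels_db, masked_thresholds_db))


if __name__ == "__main__":
    # Hypothetical: 80 dB of noise in the 1/3-octave band centered on a 1000-Hz tone.
    threshold = critical_band_masked_threshold(80.0, 1000.0)
    print(round(threshold, 1))  # 78.1
    print(is_audible([79.0], [threshold]))  # True
```

Both bandwidths are proportional to frequency, so under these two approximations the threshold reduces to the band level plus 10 log10(0.15/0.232). That places it about 1.9 dB below the 1/3-octave band level at any frequency.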
7.2.4 Danger Signal Audibility Analysis Based on ISO Standard 7731:2003
Fortunately, a comprehensive, international, and generally well-accepted standard exists for evaluating and designing almost any acoustic alarm with respect to its predicted audibility in a specific noise. This standard is ISO 7731:2003, “Ergonomics—Danger Signals for Public and Work Areas—Auditory Danger Signals” (ISO, 2003). It provides guidelines for calculating the masked threshold of audibility. It also specifies the spectral content and minimum signal-to-noise ratios (S/N) of the signals and requires special consideration for people with hearing loss or those wearing HPDs. Application of ISO 7731 (2003) is best illustrated by an example. A common warning signal is the standard backup alarm typically found on commercial trucks and construction/industrial equipment. It has strong tonal components in the range 1000–2000 Hz and significant harmonic components at higher frequencies (Casali & Alali, 2009). The alarm has a 1-s period and a 50% duty cycle (i.e., it is “on” for 50% of its period). The levels in all other 1/3-octave bands are far enough below those in the bands mentioned to be inconsequential. The levels needed for audibility of this signal will be determined for a hypothetical masking noise spectrum whose 1/3-octave and octave band levels are shown in columns 2 and 4, respectively, of Table 1.
1. Starting at the lowest octave-band or 1/3-octave-band level available, set the masked threshold ($L_{mt,1}$) for a signal in that band to be:
$$L_{mt,1} = L_{pb,1}$$
(30)
where $L_{pb,1}$ is the SPL measured in the octave band or 1/3-octave band in question.
2. For each successive octave-band or 1/3-octave-band filter n, set the masked threshold ($L_{mt,n}$) to whichever is greater: the noise level in that band, or the masked threshold in the preceding band less a constant C:
$$L_{mt,n} = \max\left(L_{pb,n},\; L_{mt,n-1} - C\right)$$
(31)
where C equals 7.5 dB for octave-band data or 2.5 dB for 1/3-octave-band data.
Table 1 Masked Threshold Calculations According to ISO 7731:2003 for 1/3-Octave-Band and Octave-Band Methods

(1) Center        (2) 1/3-octave-    (3) Masked threshold in    (4) Octave-band    (5) Masked threshold in
frequency (Hz)a   band level (dB)    1/3-octave band (dB)b      level (dB)         octave band (dB)b
25                52.0               52.0
31.5              50.7               50.7                       54.7               54.7
40                42.9               48.2
50                56.4               56.4
63                86.8               86.8                       88.5               88.5
80                83.7               84.3
100               79.7               81.8
125               83.7               83.7                       87.1               87.1
160               82.8               82.8
200               76.5               80.3
250               81.4               81.4                       85.1               85.1
315               81.6               81.6
400               76.3               79.1
500               77.3               77.3                       80.7               80.7
630               73.1               74.8
800               74.4               74.4
1,000             79.6               79.6*                      81.5               81.5*
1,250             73.4               77.1*
1,600             82.6               82.6
2,000             80.1               80.1*                      87.9               87.9*
2,500             85.3               85.3*
3,150             83.7               83.7
4,000             85.7               85.7                       90.9               90.9
5,000             88.0               88.0
6,300             74.2               85.5
8,000             77.3               83.0                       79.1               83.4
10,000            58.7               80.5
12,500            67.4               78.0
16,000            48.7               75.5                       67.6               75.9
20,000            53.3               73.0

Source: Data from ISO, 2003. © 2003 ISO.
a Octave-band center frequencies are the rows with entries in columns 4 and 5.
b Thresholds marked * are the masked thresholds for the signal components of the backup alarm described in the text.
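Eqs (30) and (31) amount to a single pass upward through the bands that carries the preceding band’s masked threshold forward, less the constant C. The Python sketch below applies that pass to the noise levels in columns 2 and 4 of Table 1 and reproduces columns 3 and 5. The function and variable names are illustrative.

```python
# Noise levels from Table 1 (dB), in ascending frequency order.
THIRD_OCTAVE_HZ = [25, 31.5, 40, 50, 63, 80, 100, 125, 160, 200, 250, 315, 400,
                   500, 630, 800, 1000, 1250, 1600, 2000, 2500, 3150, 4000,
                   5000, 6300, 8000, 10000, 12500, 16000, 20000]
THIRD_OCTAVE_DB = [52.0, 50.7, 42.9, 56.4, 86.8, 83.7, 79.7, 83.7, 82.8, 76.5,
                   81.4, 81.6, 76.3, 77.3, 73.1, 74.4, 79.6, 73.4, 82.6, 80.1,
                   85.3, 83.7, 85.7, 88.0, 74.2, 77.3, 58.7, 67.4, 48.7, 53.3]
OCTAVE_HZ = [31.5, 63, 125, 250, 500, 1000, 2000, 4000, 8000, 16000]
OCTAVE_DB = [54.7, 88.5, 87.1, 85.1, 80.7, 81.5, 87.9, 90.9, 79.1, 67.6]

C_OCTAVE = 7.5        # dB, for octave-band data
C_THIRD_OCTAVE = 2.5  # dB, for 1/3-octave-band data


def masked_thresholds(band_levels, c):
    """Masked threshold in each band, lowest band first.

    Eq (30): the first band's threshold equals its noise level.
    Eq (31): each later band takes the greater of its own noise level and
    the preceding band's threshold less C (upward spread of masking).
    """
    thresholds = []
    for level in band_levels:
        if thresholds:
            level = max(level, thresholds[-1] - c)
        thresholds.append(level)
    # Round to 0.1 dB to match the table and hide floating-point residue.
    return [round(t, 1) for t in thresholds]


if __name__ == "__main__":
    third = dict(zip(THIRD_OCTAVE_HZ, masked_thresholds(THIRD_OCTAVE_DB, C_THIRD_OCTAVE)))
    octave = dict(zip(OCTAVE_HZ, masked_thresholds(OCTAVE_DB, C_OCTAVE)))
    # Bands relevant to the backup alarm example.
    print([third[f] for f in (1000, 1250, 2000, 2500)])  # [79.6, 77.1, 80.1, 85.3]
    print([octave[f] for f in (1000, 2000)])  # [81.5, 87.9]
```

The sketch rounds only at the end. Rounding inside the loop would let rounding errors accumulate along long carried-forward runs, such as the one from 6300 Hz to 20,000 Hz.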
For an auditory signal to be “clearly audible,” ISO 7731 requires that at least one of the following be met: (1) the dBA level of the signal must exceed the dBA level of the ambient noise by more than 15 dB; (2) the signal level must exceed the masked threshold by at least 10 dB in at least one octave band; or (3) the signal level must exceed the masked threshold by at least 13 dB in at least one 1/3-octave band. In addition, the spectral content of the signal must include frequency components in the range of 500–2500 Hz, and two dominant components in the subset range of 500–1500 Hz are recommended. To accommodate persons with hearing loss or those using hearing protection, “sufficient” signal energy below 1500 Hz is also recommended. Although the broadband dBA measurement is sufficient per ISO 7731, the 1/3-octave-band and full-octave-band procedures are preferred because of their higher spectral precision. These procedures are computed by Eqs (30) and (31) and exemplified by the data in Table 1. Unlike the critical band procedure described earlier, they presume that the auditory filter width equals the 1/3-octave band or the octave band. They account for upward spread of masking by comparing the noise level in the band in question with the masked threshold of the preceding band, less C. In the 1250-Hz row of column 3, for example, the threshold of the preceding 1/3-octave band (1000 Hz, 79.6 dB) less 2.5 dB gives 77.1 dB. This exceeds the 1250-Hz noise level of 73.4 dB, so Eq (31) sets the 1250-Hz masked threshold at 77.1 dB through upward masking. The masked thresholds for each 1/3-octave band and octave band of noise in the example are shown in columns 3 and 5, respectively, of Table 1.
For the purposes of the example signal (a backup alarm), only the thresholds for the 1/3-octave bands centered at 1000, 1250, 2000, and 2500 Hz and the threshold for the octave bands centered at 1000 and 2000 Hz are relevant, because these are the signal’s dominant component bands, and they overlap the standard’s spectral requirements noted above. (But if the signal had possessed significant energy below 1000 Hz, then the 1/3-octave bands centered at 500, 630, and 800 Hz would require attention, as would the octave band centered at 500 Hz.) The conclusion is that if the signal levels measured in one or more of these bands slightly exceed the calculated masked threshold levels (as indicated by boldface type), then the backup alarm is predicted to be barely audible. More importantly, to next determine the necessary sound level output of the alarm to render it “clearly audible” per ISO:, to simplify we will assume that the backup alarm’s dominant frequency bands (1000, 1250, 2000, and 2500 Hz) themselves cannot change but their decibel output can be raised. Thus, based on the 1/3-octave analysis, in order for the alarm to be reliably audible, the signal level would have to be at least the following dB values in at least one of these four 1/3-octave bands: at 1000 Hz: 79.6 + 13 = 92.6 dB; at 1250 Hz: 77.1 + 13 = 90.1 dB; at 2000 Hz: 80.1 + 13 = 93.1 dB; at 2500 Hz: 85.3 + 13 = 98.3 dB. Or, based on the octave analysis, in order for the alarm to be reliably audible, the signal level would have to be at least the following dB values in at least one of these two octave bands: at 1000 Hz: 81.5 + 10 = 91.5 dB; at 2000 Hz: 87.9 + 10 = 97.9 dB. Of course, these results are based on the criteria from ISO 7731 for clear audibility and are well above the levels required for threshold audibility with normal hearing. The ISO 7731:2003 standard provides a procedure which may be used to calculate masked thresholds with and without
DESIGN FOR HEALTH, SAFETY, AND COMFORT
HPDs. Calculating a protected masked threshold for a particular signal requires (1) subtracting the attenuation of the HPD from the noise spectrum to obtain the noise spectrum effective when the HPD is worn; (2) calculating a masked threshold for each signal component using the procedures outlined in the preceding discussion, which yields the signal component levels that would be just audible to the listener when the HPD is worn; and (3) adding the attenuation of the HPD to the signal component thresholds to estimate the environmental (exterior to the HPD) signal component levels that would be required to produce the under-HPD threshold levels calculated in step 2. Although not difficult, this procedure does require a reasonably reliable estimate of the actual attenuation provided by the HPD. The manufacturer’s data supplied with the HPD are unsuitable for this purpose because they overestimate the real-world performance of the HPD, as explained in Section 4.3.2 herein. Furthermore, if a 1/3-octave band masking computation is desired, the manufacturer’s attenuation data, which are available for only nine selected 1/3-octave bands, are insufficient for the computation. Finally, the standard does not take the listener’s hearing level into account. It simply assumes that if the calculated masked thresholds are above the listener’s absolute thresholds, the signals should be audible, and that if hearing impairment is at issue, signals should include sufficient energy below 1500 Hz. As noted previously, use of the ISO 7731 standard for predicting the masked thresholds of auditory signals is not limited to the octave and 1/3-octave calculations discussed herein, although the 1/3-octave calculation is the most precise of its methods.
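The three-step procedure can be written as a small function. The sketch below is hypothetical: `masked_threshold_fn` stands in for whichever band procedure is used (Eqs (30) and (31) in practice), and the placeholder `upward_spread_thresholds` uses an illustrative rule in which each band’s threshold is the larger of its own noise level and the next-lower band’s threshold minus a fixed slope. The 7.5-dB slope is an assumed value, not one taken from the standard. The noise and attenuation spectra are invented, with attenuation rising with frequency as is typical of conventional HPDs; realistic (not manufacturer-labeled) attenuation data would be needed in practice.

```python
# Hypothetical sketch of the three-step protected masked-threshold procedure.
# Spectra are dicts mapping octave-band centre frequency (Hz) to level (dB).


def upward_spread_thresholds(noise_db, slope_db=7.5):
    """Illustrative stand-in for a band masked-threshold procedure.

    Each band's threshold is the larger of its own noise level and the
    next-lower band's threshold minus `slope_db` (an assumed value).
    """
    thresholds = {}
    previous = None
    for f in sorted(noise_db):
        own = noise_db[f]
        thresholds[f] = own if previous is None else max(own, previous - slope_db)
        previous = thresholds[f]
    return thresholds


def protected_masked_thresholds(noise_db, hpd_attenuation_db, masked_threshold_fn):
    # Step 1: noise spectrum effective under the HPD.
    noise_under_hpd = {f: noise_db[f] - hpd_attenuation_db[f] for f in noise_db}
    # Step 2: just-audible signal levels at the ear, with the HPD worn.
    thresholds_under_hpd = masked_threshold_fn(noise_under_hpd)
    # Step 3: add the attenuation back to get the required levels outside the HPD.
    return {f: t + hpd_attenuation_db[f] for f, t in thresholds_under_hpd.items()}


noise = {500: 95.0, 1000: 90.0, 2000: 80.0}        # invented noise spectrum
attenuation = {500: 15.0, 1000: 20.0, 2000: 30.0}  # invented real-world attenuation
print(upward_spread_thresholds(noise))
# {500: 95.0, 1000: 90.0, 2000: 82.5}
print(protected_masked_thresholds(noise, attenuation, upward_spread_thresholds))
# {500: 95.0, 1000: 92.5, 2000: 95.0}
```

In this invented case the exterior threshold at 2000 Hz rises from 82.5 to 95.0 dB: the HPD removes more high-frequency than low-frequency energy, so spread of masking from the lower bands dominates under the protector. Under this placeholder rule, a frequency-independent attenuation would make steps 1 and 3 cancel exactly.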
As a less precise method (which this author advocates only as a last resort), ISO 7731 also offers a broadband analysis: the dBA level of the ambient noise is obtained, and if the signal exceeds this level by more than 15 dB, it is said to be audible in most circumstances. However, this does not take into account upward masking or other spectrally specific effects, and it may result in (unnecessarily) higher signal levels than those computed by either spectral technique. ISO 7731 also includes recommendations for signal temporal characteristics, with repetition rates from 0.5 to 4 Hz: pulsating signals are preferred to constant ones, with maximum pulse rates (in Hz) keyed to the reverberation time of the reception area (in seconds [s]) as follows: 0.5 Hz at 8 s, 1 Hz at 4 s, 2 Hz at 2 s, and 4 Hz at 1 s. In addition, the standard calls for a signal to have an unambiguous meaning and to be discriminable, and for redundant visual signals to be added if the ambient noise exceeds 100 dBA. The broadband S/N recommendation of 15 dB in ISO 7731 is generally in keeping with those of auditory researchers. For example, Sorkin (1987) suggested that signal levels 6–10 dB above masked threshold are adequate to ensure 100% detectability, whereas signals approximately 15 dB above their masked threshold will elicit rapid operator response. He also suggested that signals more than 30 dB above the masked threshold could produce an unwanted startle response and that no signal should exceed 115 dB. This recommended upper limit on signal level is consistent with OSHA hearing conservation requirements (OSHA, 1983), which prohibit exposure to continuous noise levels greater than 115 dBA. These recommendations are also in line with those of other authors (Deatherage, 1972; Wilkins & Martin, 1982). Masked thresholds estimated via ISO 7731 are not necessarily exact, nor are they intended to be.
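Sorkin’s figures can be turned into a quick screening check. The sketch below is illustrative only: the category labels, the function name, and the treatment of margins between the quoted values (for example, 10–15 dB is lumped with 6–10 dB as “detectable”) are assumptions, and the 115-dB ceiling is applied to the signal level itself.

```python
# Illustrative screening of a signal against the guidance summarized above
# (Sorkin, 1987). Category labels and in-between boundaries are assumptions.

STARTLE_MARGIN_DB = 30.0   # more than this above masked threshold: startle risk
RAPID_MARGIN_DB = 15.0     # about this much: rapid operator response
DETECT_MARGIN_DB = 6.0     # 6-10 dB: adequate for reliable detection
CEILING_DB = 115.0         # no signal should exceed this level


def assess_signal(signal_db, masked_threshold_db):
    if signal_db > CEILING_DB:
        return "exceeds 115 dB ceiling"
    margin = signal_db - masked_threshold_db
    if margin > STARTLE_MARGIN_DB:
        return "startle risk"
    if margin >= RAPID_MARGIN_DB:
        return "rapid response"
    if margin >= DETECT_MARGIN_DB:
        return "detectable"
    return "below recommended margin"


# Example: a band whose masked threshold is 80 dB.
for level in (83.0, 88.0, 95.0, 112.0, 118.0):
    print(level, assess_signal(level, masked_threshold_db=80.0))
```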
The “clearly audible” design estimates are conservative for a large segment of the population, spanning a wide range of hearing levels, and for nonspecific noise environments and signals.

7.3 Analysis of Speech Intelligibility in Noise
Many of the concepts presented above that relate to the masking of nonspeech signals by noise apply equally well to the masking
SOUND AND NOISE: MEASUREMENT AND DESIGN GUIDANCE
of speech, so they will not be repeated in this section. However, for the spoken message, the concern is not simply detection but intelligibility: the listener must understand what was said, not simply know that something was said. Furthermore, speech is a very complex broadband signal whose components are not only differentially susceptible to noise but also highly dependent on vocal effort, the gender of the speaker, and the content and context of the message. Other factors must also be considered, such as HPD use by the speaker and/or listener, hearing loss of the listener, and speech signal degradation occurring in a communications system.

7.3.1 Speech-to-Noise Ratio Influence
As with nonverbal signals, the signed difference between the speech level and the background noise level is referred to as the speech-to-noise ratio, here abbreviated SNR to distinguish it from the signal-to-noise ratio, S/N. The speech level referred to is usually the long-term rms level measured in decibels. When background noise levels are between 35 and 110 dB, an SNR of 12 dB is usually adequate to reach a normal-hearing person’s threshold of intelligibility (Sanders & McCormick, 1993); however, it is practically impossible for anyone to sustain the vocal effort required at the higher noise levels without electronic amplification (e.g., a public address system). The threshold of intelligibility is defined as “the level at which the listener is just able to obtain without perceptible effort the meaning of almost every sentence and phrase of continuous speech” (Hawkins & Stevens, 1950, p. 11); essentially, this is 100% intelligibility. Intelligibility decreases as SNR decreases, reaching 70–75% (as measured using phonetically balanced words) at an SNR of 5 dB, 45–50% at an SNR of 0 dB, and 25–30% at an SNR of −5 dB (Acton, 1970).
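Because intelligibility falls steadily as SNR falls, the figures above can serve as conservative lower bounds for a measured SNR. The sketch below does only that lookup and does not interpolate. The 12-dB entry is the threshold of intelligibility (treated as essentially 100%) and applies only for noise between 35 and 110 dB; the other entries are Acton’s phonetically balanced word scores. The function name and return format are hypothetical.

```python
# Lower-bound lookup of intelligibility from SNR, using the figures quoted above.
# Each entry: (minimum SNR in dB, (low %, high %)), ordered from highest SNR down.
SNR_INTELLIGIBILITY = [
    (12.0, (100, 100)),  # threshold of intelligibility (noise between 35 and 110 dB)
    (5.0, (70, 75)),     # phonetically balanced words, Acton (1970)
    (0.0, (45, 50)),
    (-5.0, (25, 30)),
]


def intelligibility_floor(speech_db, noise_db):
    """Return (SNR, tabulated range) for the highest tabulated SNR that the
    measured SNR reaches, or (SNR, None) if the SNR is below the table."""
    snr = speech_db - noise_db
    for min_snr, pct_range in SNR_INTELLIGIBILITY:
        if snr >= min_snr:
            return snr, pct_range
    return snr, None


print(intelligibility_floor(78.0, 70.0))  # (8.0, (70, 75))
```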
At least in low to moderate noise levels, people tend to adjust their vocal effort automatically, via the Lombard reflex, to maintain the SNR in increasing background noise so that they can be understood by other people. However, there are two limiting issues. First, the natural increase in voice output is only about 0.5 dB for each 1-dB increase in incident noise level; thus the Lombard reflex alone does not truly maintain the SNR as noise levels increase. Second, there is an upper limit to one’s vocal output capability, since uttered speech levels cannot be maintained at more than 90 dB for long periods without amplification (Kryter, 1994). Since a relatively high SNR (12 dB or so) is necessary for reliable speech communications in noise, it should be obvious that in high noise levels (i.e., greater than about 75–80 dB), unamplified speech cannot be relied upon except for short durations over short distances. Furthermore, since speech levels for females tend to be about 2–7 dB lower than those for males, depending on vocal effort, the female voice is at a disadvantage in high levels of background noise (Kryter, 1994). An additional factor that affects the amplitude modulation of one’s own voice is the occlusion effect (Lee & Casali, 2011; Stenfelt & Reinfeldt, 2005), which occurs when the ear canal is occluded, as with an earplug for hearing protection or with a custom-molded, in-the-ear hearing aid. The occlusion effect results from an enhancement of internal bodily conduction of sound caused by occluding the ear canal, which attenuates the air conduction pathway, as compared with the open ear, where both air conduction and bone conduction feedback are present.
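The first Lombard limitation can be made concrete with simple arithmetic. The sketch assumes, hypothetically, a talker who produces 72-dB speech (a 12-dB SNR) in 60-dB noise and who then follows the 0.5-dB-per-dB Lombard slope, capped at the 90-dB sustainable level. The reference condition and the linear model are illustrative assumptions, not data from the cited studies.

```python
# Illustrative model of how SNR erodes as noise rises, despite the Lombard reflex.
LOMBARD_SLOPE = 0.5            # dB of voice increase per dB of noise increase
MAX_SUSTAINED_VOICE_DB = 90.0  # sustained unamplified limit cited in the text


def lombard_snr(ref_speech_db, ref_noise_db, noise_db):
    """SNR after the talker adjusts from a reference condition to `noise_db`.

    Assumes the reflex acts linearly on noise increases above the reference.
    """
    increase = max(0.0, noise_db - ref_noise_db)
    speech = min(ref_speech_db + LOMBARD_SLOPE * increase, MAX_SUSTAINED_VOICE_DB)
    return speech - noise_db


for noise in (60.0, 70.0, 80.0, 90.0, 100.0):
    print(noise, lombard_snr(72.0, 60.0, noise))
```

In this example each 10-dB rise in noise costs 5 dB of SNR, so by 80 dB of noise the SNR is only 2 dB, far below the roughly 12 dB needed for reliable communication; above about 96 dB of noise the voice cap makes matters worse still.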
Compared with one’s own voice levels in the canal of an open ear, measurably higher SPLs result within an occluded canal, typically with maximal effects at about 500 Hz; the auditory feedback to the occluded person is therefore that his or her own voice sounds louder than with a normally open canal, and this affects amplitude modulation during vocal utterances (Lee & Casali, 2011). Consequently, speech output is generally lower in level when the ear canal is occluded, the opposite of the desired effect when the speaker is in noise, as is usually the case when wearing hearing protection. The effect is greatest when the entrapped volume of the ear canal is largest, as with a shallowly inserted earplug; in addition to making one’s own voice sound louder, it often makes the voice sound as if it has more bass and resonance. Also, sounds of bodily origin, such as breathing and footfalls, are heard as unnaturally loud (Casali, 2010a). This can cause particular problems for soldiers who may be wearing shallow-insertion passive earplugs and who are trying to whisper, walk quietly, and otherwise maintain covert operations.

7.3.2 Speech Bandwidth Influence
The speech bandwidth extends from approximately 200 to 8000 Hz, with male voices generally having more energy than female voices at the low frequencies (Kryter, 1974); however, the region between 600 and 4000 Hz is most critical to intelligibility (Sanders & McCormick, 1993). This is also the frequency range in which most auditory alarms lie, creating an opportunity for direct masking of speech by an alarm or warning; therefore, speech communications in the vicinity of an activated alarm can be difficult. Consonant sounds, which are generally higher in frequency than vowel sounds within the 600–4000 Hz critical speech bandwidth, are also more important than vowels to intelligibility. This fact renders speech differentially susceptible to masking by band-limited noise, depending on the level of the noise. At low levels, bands of noise in the mid- to high-frequency ranges mask consonant sounds directly, thus impairing speech intelligibility more than low-frequency noise presented at similar levels would.
However, at high levels, low-frequency bands of noise can also adversely affect intelligibility because of upward spread of masking into the critical speech bandwidth. When electronic transmission/amplification systems are used to overcome problems associated with speech intelligibility, it is important to understand that the systems themselves may exacerbate the problem if they are not designed properly. Most civilian telecommunications systems (e.g., intercoms, telephones, cell phone systems) do not transmit the full speech bandwidth, nor do they reproduce the entire dynamic range of the human voice. To reduce costs and simplify the electronics, such systems often filter the signal and pass (transmit) only a portion of the speech bandwidth (e.g., the landline telephone passband is generally about 300–3600 Hz). If frequencies above 4000 Hz or below 600 Hz are filtered out of the voice spectral content (i.e., not transmitted), there is little negative impact on speech intelligibility, even though the voice may not sound as natural or as pleasing as when its full bandwidth is available. However, if the frequencies between 1000 and 3000 Hz are filtered out of the signal, intelligibility is severely impaired (French & Steinberg, 1947). In addition to filtering the speech signal, a system may clip the speech peaks so that the full dynamic range of a speaker’s voice is not transmitted to a listener. This clipping may be intentional on the part of the designer, to reduce the cost of the system, or it may be an artifact of the amplitude distortion caused by an overloaded amplifier. Either way, the effects on intelligibility are the same. Since the speech peaks contain primarily vowel sounds and intelligibility relies predominantly on the recognition of consonants, there is little loss in intelligibility due strictly to peak clipping.
However, if the clipping is caused by distortion within the amplifier, there may be ancillary distortion of the speech signal in other ways that could affect intelligibility adversely. One technique for improving intelligibility without requiring additional amplification is to electronically clip the
peaks and then re-amplify the resultant signal; this raises the sound level of the lower-power consonants relative to the higher-power vowels in the peaks (Sanders & McCormick, 1993).

7.3.3 Acoustic Environment Influence
The acoustic environment (i.e., room volume, distances, barriers, absorption, reverberation time, etc.) can also have a dramatic effect on speech intelligibility. This is a complex subject involving sound propagation and architectural acoustics, and a detailed treatment is beyond the scope of this chapter, but more information may be found in Kryter (1974, 1994) and Harris (1991). One fairly obvious point is that as the distance between the listener and the speech source (person or loudspeaker) increases, the ability to understand the speech can be affected adversely if the SNR decreases sufficiently. In the same vein, barriers in the source–receiver path can create shadow zones behind the barrier, within which the SNR is insufficient for reliable intelligibility. Finally, speech intelligibility decreases linearly as reverberation time increases. Reverberation time (RT60) is defined for a given space as the time (in seconds) required for a steady sound to decay by 60 dB from its original level after the source is shut off. Each 1-s increase in reverberation time results in a loss of approximately 5% in intelligibility, assuming other factors are held constant (Fletcher, 1953). Thus, rooms with long reverberation times, which produce an echo effect, do not provide good conditions for speech understanding.

7.3.4 Speech Intelligibility Analysis Based on the Preferred Speech Interference Level
There are a number of techniques to analyze or, in some cases, to predict accurately the intelligibility of a speech communications system based on empirical measurements of the incident noise and, in some cases, additional measurements of the system’s speech output, be it amplified or live unamplified voice. A variety of techniques are covered in Sanders and McCormick (1993) and Kryter (1994). However, one of the better known techniques, the preferred speech interference level (PSIL), which involves only measurements of the noise and is straightforward to administer (although limited in its predictive ability), warrants discussion here. The PSIL is the arithmetic average of the noise levels measured in three octave bands centered at 500, 1000, and 2000 Hz (Peterson & Gross, 1978). It is most useful when the spectrum of the background noise is relatively flat, and it is intended only as an indication of whether or not there is likely to be a communications problem, not as a predictor of intelligibility. If the background noise is not flat, is dominated by or contains strong tonal components, or fluctuates a great deal, the utility of the PSIL is lessened. As an example of PSIL application, the hypothetical octave-band noise spectrum presented earlier in column 4 of Table 1 can be used. The PSIL for this spectrum, using the SPL values of the octave bands centered at 500, 1000, and 2000 Hz, is (80.7 + 81.5 + 87.9)/3 ≈ 83.4, or about 83 dB. With this information, Figure 12 can be consulted to determine how difficult verbal communication is likely to be in this noise. At a PSIL of 83, verbal communications will be “difficult” at any speaker–listener distance greater than about 1.5 ft. Even at closer distances, a “raised” or “very loud” voice must be used. If octave-band levels are not available, the A-weighted sound level, also shown in Figure 12, may provide rough guidance concerning the speech-interfering effects of background noise. In summary, the PSIL is a useful, simple tool for estimating the degree of difficulty that can be expected when verbal communications are attempted in a relatively steady, spectrally uniform background noise.

Figure 12 Relationship among PSIL, speech difficulty, vocal effort, and speaker–listener separation. (Source: Sanders and McCormick, 1993.) [Figure: distance from speaker to listener (0.5–16 ft) plotted against noise level, with parallel scales for PSIL (40–120 dB), SIL (37–117 dB), and LA (47–127 dB); regions indicate communication possible with normal voice, difficult, and impossible, with curves for expected voice level and for vocal effort ranging from normal voice to maximum vocal effort.]

7.3.5 Speech Intelligibility Analysis Based on the Speech Intelligibility Index (SII) and Extended SII
In contrast to the PSIL, a more precise analytical prediction of the interfering effects of noise on speech communications may be obtained using the speech intelligibility index (SII) technique defined in ANSI S3.5-1997(R2017) (ANSI, 2017). Essentially, this well-known standardized technique uses a weighted sum of the SNRs in specified frequency bands to compute an SII score ranging between 0.0 and 1.0, with higher scores indicating greater predicted speech intelligibility. While the end result is a score on a simple 0.0–1.0 scale, the process of measurement and calculation is complex compared with the PSIL. However, the SII is more accurate than the PSIL, broader in its coverage, and able to account for many additional factors, such as speaker vocal effort, room reverberation, monaural versus binaural listening, hearing loss, varying message content, hearing protector effects, communications system gain, and the existence of external masking noise. Four calculation methods are available with the SII: the critical band method (most accurate), the 1/3-octave band method, the equally contributing critical band method, and the octave-band method (least accurate) (ANSI, 2017). At a minimum, the calculations require knowledge of the spectrum levels of the speech and noise as well as the listener’s hearing thresholds. Where speech spectrum levels are unavailable or unknown, the standard offers guidance on estimating them, with exemplary values provided. Although the standard is quite flexible in the number and types of conditions to which it can be applied, its application is limited to natural speech, otologically normal listeners with no linguistic or cognitive deficiencies, and situations that do not include sharply filtered bands of speech or noise. Software programs for calculating the SII may be obtained at http://www.sii.to/html/programs.html. The SII score actually represents the proportion of the speech cues that would be available to the listener for “average speech” under the noise/speech conditions for which the calculations were performed. Hence, intelligibility is predicted to be greatest when SII = 1.0, indicating that all of the speech cues reach the listener, and poorest when SII = 0.0, indicating that none of the speech cues reach the listener. The general steps for calculating the SII and estimating intelligibility are beyond the scope of this chapter, but they may be found in the standard itself, ANSI S3.5-1997(R2017) (ANSI, 2017), or in paraphrased terms with examples in Robinson and Casali (2003). A limitation of the SII model is that it employs long-term speech and noise spectra and thus was designed and validated for masking noises that are stationary, that is, invariant over time. In fluctuating noises, however, speech intelligibility for the normal hearer may differ from, and is often better than, that in stationary noises, because the listener can benefit from the relatively quiet periods in the noise.
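The contrast between the two analytical approaches can be sketched in a few lines. `psil` reproduces the arithmetic of the PSIL worked example (80.7, 81.5, and 87.9 dB). The SII functions are a deliberately simplified sketch, not the ANSI S3.5 procedure: each band’s audibility is taken as (SNR + 15)/30 clipped to the range 0–1, which is our reading of the standard’s core band-audibility term, and the result is an importance-weighted sum; level distortion, hearing thresholds, and spread of masking are omitted. The four equal band-importance weights and all speech and noise levels are hypothetical placeholders, not the standard’s band-importance function. `frame_averaged_sii` illustrates the frame-averaging idea behind the Extended SII.

```python
import math


def psil(octave_noise_db):
    """Arithmetic mean of octave-band noise levels at 500, 1000, and 2000 Hz."""
    return sum(octave_noise_db[f] for f in (500, 1000, 2000)) / 3.0


def band_audibility(snr_db):
    """Simplified band audibility: (SNR + 15)/30, clipped to [0, 1]."""
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))


def simplified_sii(speech_db, noise_db, importance):
    """Importance-weighted sum of band audibilities (weights sum to 1)."""
    return sum(w * band_audibility(speech_db[f] - noise_db[f])
               for f, w in importance.items())


def frame_averaged_sii(frames, importance):
    """Extended-SII idea: average the SII of each (speech, noise) time frame."""
    values = [simplified_sii(s, n, importance) for s, n in frames]
    return sum(values) / len(values)


def long_term_level_db(levels_db):
    """Energy (not arithmetic) average of levels, as a long-term spectrum uses."""
    mean_energy = sum(10 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_energy)


print(round(psil({500: 80.7, 1000: 81.5, 2000: 87.9}), 1))  # 83.4

# Hypothetical equal band importances and levels, for illustration only.
importance = {500: 0.25, 1000: 0.25, 2000: 0.25, 4000: 0.25}
speech = {f: 70.0 for f in importance}
loud = {f: 70.0 for f in importance}    # noise frame at 70 dB
quiet = {f: 40.0 for f in importance}   # noise frame at 40 dB

# Stationary SII computed on the long-term (energy-averaged) noise spectrum...
steady = {f: long_term_level_db([loud[f], quiet[f]]) for f in importance}
print(round(simplified_sii(speech, steady, importance), 3))  # 0.6
# ...versus averaging the SII frame by frame.
print(frame_averaged_sii([(speech, loud), (speech, quiet)], importance))  # 0.75
```

The frame-averaged value (0.75) exceeds the stationary value (0.6) because it credits the quiet frames, which is the fluctuating-noise benefit that the long-term SII cannot capture.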
In an effort to extend the SII model to accommodate nonstationary noises, Rhebergen and colleagues (e.g., Rhebergen, Versfeld, & Dreschler, 2006) developed and validated the Extended SII. The basic principle is that the system’s speech output and its noise environment are partitioned into segments, or time frames, over the time course of the speech presentation. Within each frame, the SII is calculated to provide an instantaneous value; the instantaneous SII values are then averaged to produce an SII for that particular speech-in-noise condition (Rhebergen et al., 2006). Available data indicate that the Extended SII provides accurate predictions for a variety of time-variant noise conditions. The method improves prediction of speech reception thresholds in fluctuating noise, a common masking situation, thus adding practical value to the standardized SII (ANSI, 2017).

7.3.6 Speech Intelligibility Experimental Test Methods
In lieu of analytical techniques such as the PSIL and the SII, both of which require spectral measurements, an alternative (or complementary) approach is to conduct an experiment that measures intelligibility for a given set of conditions with a group of human listeners. For this purpose, there exists a standard, ANSI S3.2-2009(R2020), which provides not only guidance for conducting such tests but also three alternative sets of standard speech stimuli (ANSI, 2020b). The standard is intended for designers and manufacturers of communications systems and provides valuable insight into the subject of speech intelligibility and into how various factors associated with the speaker, the transmission path/environment, and the listener can affect it.
The standard accommodates empirical testing of intelligibility in the following situations: indoors or outdoors, speaking face-to-face or in the vicinity, telephonic systems, public address systems, radio systems, and complex systems that include air, wire, wireless, fiber optic, and water transmission paths that are applied in certain military, remote, or emergency systems. Thus, ANSI S3.2-2009(R2020) can benefit human factors designers and evaluators of many types of communications systems when empirical measurement of speech intelligibility performance is necessary for evaluation, acceptance, or proof-of-performance efforts (ANSI, 2020b). Although space does not permit a detailed description of the procedures, the strategy of the ANSI S3.2 standard involves presenting speech stimuli to a listener in an environment that replicates the conditions of concern and measuring how much of the speech message is understood. The speech stimuli may be produced by a trained talker speaking directly to the listener in the same environment or via an intercom system. Alternatively, the materials may be recorded and presented electronically through an audio system. Use of recorded stimuli and/or electronic presentation of the stimuli offers the greatest control over the speech levels presented to the listener. Further guidance on empirical speech system testing with human subjects can be found in MIL-STD-1474E (U.S. Department of Defense, 2015).

7.4 Other Considerations for Signal Audibility and Speech Intelligibility

7.4.1 Distance Effects
It cannot be overemphasized that the noise and signal levels referred to in the analysis techniques above are the levels measured at the listener’s location. Measurements made at some central location, or the specified output levels of the alarm or warning devices, are not representative of the levels present at a given workstation and cannot be used for masked threshold calculations. In a free-field, isotropic environment, the sound pressure of an alarm or warning decreases in inverse proportion to the distance from the source, in accordance with Eq (32):

$$\frac{p_1}{p_2} = \frac{d_2}{d_1} \qquad (32)$$

where $p_1$ and $p_2$ are the sound pressures of the signal at distances $d_1$ and $d_2$, respectively, in micropascals or dynes per square centimeter, and $d_1$ and $d_2$ are, respectively, distance 1 (near point) and distance 2 (far point) at which the signal is measured, in linear distance units. Alternatively, where the drop in the signal’s SPL between distance 1 and distance 2 is desired in decibels, it is given by Eq (33):

$$\mathrm{SPL}_{\text{drop}} = 20 \log_{10}\!\left(\frac{d_2}{d_1}\right) \qquad (33)$$
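Eq (33) is straightforward to apply in code. The sketch below assumes free-field conditions, so it will overestimate the drop indoors, as discussed next; the function names, alarm level, and distances in the usage lines are invented for illustration.

```python
import math


def spl_drop_db(d1, d2):
    """Free-field drop in SPL from near distance d1 to far distance d2, Eq (33).

    Distances may be in any unit, as long as both use the same one.
    """
    if d1 <= 0 or d2 <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(d2 / d1)


def level_at_distance(level_at_d1_db, d1, d2):
    """Predicted SPL at d2 given a measured SPL at d1 (free field only)."""
    return level_at_d1_db - spl_drop_db(d1, d2)


print(round(spl_drop_db(1.0, 2.0), 2))                 # 6.02 dB per doubling of distance
print(round(level_at_distance(105.0, 1.0, 16.0), 1))   # hypothetical alarm, 105 dB at 1 m: 80.9
```

Four doublings of distance (1 to 16 m) cost about 24 dB in the free field, which is why levels must be known at the listener’s location rather than at the device.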
As noted before, the free field is a region where there are neither barriers to sound propagation nor surfaces causing sound reflections that result in reverberation, and there the sound level decreases by 6 dB for each doubling of distance from its source (Driscoll, 2020). The above formulas provide accurate results in outdoor environments where there are no barriers, such as trees, or reflecting planes, such as paved parking lots. Indoors, the formulas will typically overestimate the drop in signal level, because reflective surfaces reinforce the signal as it propagates, as is characteristic of a reverberant field.

7.4.2 Barrier Effects
Furthermore, buildings or other large structures in the source–receiver path can create “shadow zones” in which little or no sound is audible. It is for these reasons that the U.S. Department of Defense (2015) recommends that frequencies below 1000 Hz be used for outdoor alarms, since low frequencies are less susceptible to atmospheric absorption and diffract more readily around barriers. Similar problems can be encountered indoors as well. The general decrease in SPL with increasing distance, as well as shadow zones created by walls, partitions, screens, and machinery/vehicles, must be considered. Since different materials reflect and absorb sound differently depending on its frequency, not only do the sound levels change from position to position, but the spectra of both the noise and the signals/speech can change as well. Finally, since most interior spaces reverberate to some degree, the designer should also be concerned with phase differences between reflected sounds, which can produce superposition effects of enhancement or cancellation of the signals and speech from location to location. For all these reasons, it is necessary to know the SPL at the listener’s location when considering masked thresholds.

7.4.3 Hearing Protection Effects and Auditory Situation Awareness
HPDs are often blamed for exacerbating the effects of noise on the audibility of speech and signals, although, at least for people with normal hearing, protectors may actually facilitate hearing in some noisy situations. Overall, the research evidence on normal hearers generally suggests that conventional passive HPDs have little or no degrading effect on the wearer’s understanding of external speech and detection of signals in ambient noise levels above about 80 dBA and may even yield some improvements, with a crossover between disadvantage and advantage between 80 and 90 dBA. However, HPDs do often cause increased misunderstanding and poorer detection (compared with unprotected conditions) at lower sound levels, where HPDs are not typically needed for hearing protection anyway but may be worn to reduce annoyance (Casali, 2006). In intermittent noise, HPDs may be worn during quiet periods so that when a loud noise occurs, the wearer will be protected. However, during those quiet periods, conventional passive HPDs typically reduce hearing acuity.
In certain of these cases, the family of amplitude-sensitive augmented HPDs may be beneficial. These include devices that, during quiet periods, provide minimal or moderate passive attenuation via acoustic orifice-based filters or valving systems (or, alternatively, more amplification of external sounds via electronically modulated sound transmission through the HPD) but then provide increased passive attenuation (or less amplification) as the incident noise increases. However, the real performance effects of these and other augmented HPDs are very situation-specific, and the interested reader is referred to the reviews of passive devices in Casali (2010a) and active (battery electronic) devices in Casali (2010b). Noise- and age-induced hearing losses generally occur in the high-frequency regions first, and for those so impaired, the effects of conventional passive HPDs on speech perception and signal detection are not clear-cut. Because their already elevated thresholds for mid- to high-frequency speech sounds are raised further by the protector, hearing-impaired persons are usually disadvantaged in their hearing by conventional HPDs. Although there is no consensus across studies, certain reviews have concluded that sufficiently hearing-impaired persons will usually experience additional reductions in communication abilities with conventional HPDs worn in noise. In some instances, HPDs with electronic hearing-assistive circuits, sometimes called sound-transmission or sound-restoration HPDs, can be offered to hearing-impaired persons to determine whether their hearing, especially in quiet to moderate noise levels below about 85 dBA, may be improved with such devices while they still receive a measure of protection. However, as noted above, the realized benefits of such devices depend heavily on the particular signal-in-noise situation as well as on the individual’s particular hearing loss (Casali, 2010b).
Conventional passive HPDs cannot differentiate between, or selectively pass, speech or nonverbal signal energy versus noise energy at a given frequency. Therefore, conventional HPDs do not improve the S/N ratio in a given frequency band, which is the most important factor for achieving reliable signal detection or intelligibility. Conventional HPDs also typically attenuate high-frequency sound more than low-frequency sound, thereby attenuating the power of consonant sounds, which are important for word discrimination, and of most warning signals, both of which lie in the higher frequency range, while allowing low-frequency noise through. Thus, the HPD may enable upward spread of masking to occur if the penetrating noise levels are high enough. Certain augmented HPD technologies help to overcome the weaknesses of conventional HPDs in low-frequency attenuation in particular; these include the aforementioned active noise reduction (ANR) devices, which, via electronic phase-derived cancellation of noise below about 1000 Hz, improve the low-frequency attenuation of passive HPDs. Concomitant benefits of ANR-based HPDs can include reduction of the upward spread of masking of low-frequency noise into the speech and warning signal bandwidths, as well as reduction of noise annoyance in environments dominated by low frequencies, such as certain aircraft cockpits (especially those of propeller-driven fixed-wing planes and helicopters) and the passenger cabins of commercial aircraft (Casali et al., 2004; Casali, 2010b). In any case, by far the most commonly used HPDs in most settings are relatively simple, conventional products for which the paramount objective is passive attenuation of noise and thus prevention of noise-induced hearing loss. However, as noted earlier in this chapter, there are many workplace, military, and leisure situations in which noise may be a hazard to the ears but there is also a critical need to hear external signals and/or speech and, in general, to maintain one’s auditory situation awareness.
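The S/N argument can be checked numerically. In the sketch below, the signal, noise, attenuation, and listener hearing-threshold values are all invented for illustration, and the function names are hypothetical. Because a passive HPD subtracts the same attenuation from signal and noise within a band, per-band S/N is unchanged; however, the lower under-HPD signal level can fall below an elevated (hearing-impaired) absolute threshold, one reason such listeners are often disadvantaged by conventional HPDs.

```python
# Illustrative check: a passive HPD subtracts the same attenuation from signal
# and noise in a band, so the per-band S/N ratio is unchanged.


def attenuate(levels_db, attenuation_db):
    return {f: levels_db[f] - attenuation_db[f] for f in levels_db}


def per_band_snr(signal_db, noise_db):
    return {f: signal_db[f] - noise_db[f] for f in signal_db}


def inaudible_bands(signal_db, hearing_threshold_db):
    """Bands where the signal at the ear is below the listener's absolute threshold."""
    return sorted(f for f in signal_db if signal_db[f] < hearing_threshold_db[f])


signal = {1000: 85.0, 2000: 88.0}
noise = {1000: 80.0, 2000: 82.0}
hpd = {1000: 25.0, 2000: 35.0}                 # more attenuation at high frequency
impaired_threshold = {1000: 20.0, 2000: 55.0}  # hypothetical high-frequency loss

print(per_band_snr(signal, noise))                                  # {1000: 5.0, 2000: 6.0}
print(per_band_snr(attenuate(signal, hpd), attenuate(noise, hpd)))  # {1000: 5.0, 2000: 6.0}
print(inaudible_bands(attenuate(signal, hpd), impaired_threshold))  # [2000]
```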
This requires HPD technology that provides adequate protection while simultaneously being essentially “transparent” in passing signals and speech through to the ear, affording near-natural hearing. Therefore, both passive and active (battery electronic) augmentations have been made in hearing protection designs, particularly designs that attempt to maintain situation awareness while simultaneously providing protection against gunfire and other noises; these are the subject of reviews by Casali (2010a, passive; 2010b, active). Nonetheless, human factors research since about 2000, as reviewed in Casali (2018), has demonstrated that even the most advanced Tactical Communications and Protective Systems (TCAPS), which provide covert two-way communications, signal/speech pass-through capabilities, and protection that responds to sudden gunfire, can significantly compromise auditory situation awareness for the user. There are large differences among devices, and also among certain types of signals (e.g., broadband weapons fire vs. high-frequency weapon “cocking”), which result in differences in detection, recognition, and localization of the signals through an HPD or TCAPS. The reasons for the deleterious effects of certain HPDs and TCAPS on the user’s auditory situation awareness encompass a myriad of design issues, because any device that covers or seals the ears and passes external acoustic signals through to the ears (either passively or with amplification) leaves an artificial “imprint” on that signal. This may range from something as simple as covering the pinnae with earmuff cups, which negates the pinnae’s important role in sound localization, to more complex gain (amplification circuit) and compression (electronic sudden shutoff) influences on a signal’s spectral and temporal signature when it is electronically passed through the protector (Casali, 2018).
Because the effects of HPDs and TCAPS on auditory situation awareness have such profound implications for military operational effectiveness, as well as for the consistent use of these products for protection against high-level noise, objective measurement of those effects is important. Accordingly, measurement systems and protocols have been
developed and validated for this purpose. One of these is the "DRILCOM" system, which measures both open-ear (without protector) and occluded (with protector) performance on four distinct task elements that make up auditory situation awareness: Detection, Recognition/Identification, Localization, and Communication (e.g., Lee & Casali, 2017). It is evident that, at least with the technology available as of this chapter's publication, augmented HPDs and TCAPS do not provide true "transparency" for incoming external signals and speech, equivalent to the natural hearing provided by the open ear (Casali & Lee, 2018). It has therefore been advocated that users of augmented HPDs and TCAPS be trained with them prior to deployment, in hopes of bolstering their auditory situation awareness while wearing the devices. Indeed, structured training regimens using auditory simulators designed for this purpose have been shown to improve auditory abilities such as sound localization, both with the open ear and with certain (but not all) devices. More information on this subject may be found in Cave, Thompson, Lee, and Casali (2019) and Lee and Casali (2019).
7.4.4 Hearing-Aided Users
People with a hearing loss sufficient to require the use of hearing aids are already at a disadvantage when attempting to hear auditory alarms, warnings, or speech, and this disadvantage is exacerbated when noise levels are high. Activating hearing aids in high noise levels to improve hearing of speech or signals can increase the risk of additional hearing damage, because the aids amplify the ambient noise (Humes & Bess, 1981). Shutting off the hearing aids, however, increases the chance that signals will be missed. Vented and open-fit hearing aid inserts have also been shown to function poorly as hearing protectors, so switching them off still carries a risk of further hearing damage.
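A hearing aid may amplify ambient noise, or it may add its output beneath an earmuff. In either case, its contribution combines with the other sound reaching the ear on an energy basis rather than arithmetically. The sketch below shows the standard decibel-addition rule. The levels are hypothetical, and treating the aid's output and the noise leaking past the protector as uncorrelated sources is an assumption. Amplified noise is partly correlated with the direct path, so a measured sum can differ from this estimate.

```python
import math


def combine_levels_db(*levels_db: float) -> float:
    """Energy (incoherent) sum of sound pressure levels, in dB.

    Each level is converted to relative intensity (10 ** (L / 10)),
    the intensities are added, and the total is converted back to dB.
    Assumes the sources are uncorrelated.
    """
    if not levels_db:
        raise ValueError("at least one level is required")
    total = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total)


if __name__ == "__main__":
    # Hypothetical levels at the eardrum of a worker wearing an earmuff.
    leak_through_muff = 80.0   # dB, noise passing the earmuff
    hearing_aid_output = 80.0  # dB, aid output with volume raised
    print(round(combine_levels_db(leak_through_muff, hearing_aid_output), 1))  # 83.0
```

Two equal sources raise the level by about 3 dB. A source 10 dB below another adds only about 0.4 dB.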
It has been suggested that some benefit may be obtained if a programmable in-the-ear (ITE) hearing aid is worn under an earmuff (Berger, 1987). However, doing so offers no guarantee that the individual will indeed be able to hear the necessary alarms. Furthermore, the hearing aid's output may add to the total noise exposure, especially if its volume is turned up under the muff, and its contribution to the overall exposure cannot be measured without specialized instrumentation. Other options for hearing-aided personnel include placing a redundant visual warning near the worker to augment the auditory warnings, or providing personally worn vibratory pagers or other messaging devices. These devices, however, must be integrated into the facility's warning systems. Given the lack of quantitative research in this area, no blanket recommendation can be made to accommodate all individuals with hearing impairment. Each case must be evaluated individually. Candidate solutions should be considered and implemented on a trial basis until a workable solution acceptable to both employer and employee is found. In some cases, it may be necessary to reassign the employee from a hearing-critical job (Casali & Tufts, 2021).
7.5 Summary for Improving Audibility and Reducing Effects of Noise on Signals and Speech
7.5.1 Recommendations for Improving Signal Audibility
The following principles regarding masking effects on nonverbal signals are offered as a summary for general guidance:
1. Determine and apply the most appropriate consensus standard or regulation for guidance in selecting or designing an alarm or warning device. Even if no standard or regulation covers the specific application, those that do exist can still be used to guide the selection process.
2. Alarm and warning signals should have fundamental frequencies between about 800 and 2000 Hz. Where hearing loss is an issue for signal recipients, alarm signals should include sufficient energy below 1500 Hz, as specified, for example, in ISO 7731:2003.
3. If the signal and masker are tonal in nature, the primary masking effect occurs at the fundamental frequency of the masker and at its harmonics, so avoid these frequencies in the signal itself. For instance, if a masking noise has its primary frequency content at 1000 Hz, this frequency and its harmonics (2000, 3000, 4000 Hz, etc.) should be avoided.
4. As the level of the masker increases, the masking effect spreads upward in frequency, often causing signal frequencies higher than the masker's to be missed (i.e., upward spread of masking).
5. Some warning signal guidelines recommend midrange and high-frequency signals (about 1000–4000 Hz) for detectability. It is therefore important to consider that the masking effects of noise dominated by lower frequencies can spread upward and interfere in this range. If the noise has its most significant energy in the 1000–4000 Hz range, a low-frequency signal, say 500–750 Hz, may be necessary.
6. Signals intended for use outdoors, or that must be heard over great distances, should have fundamental frequencies below 1000 Hz so that they are less susceptible to atmospheric and ground absorption. If significant structural barriers are present, fundamental frequencies should be at or below 500 Hz, to enable diffraction around and/or transmission through the barriers.
7. Signals should be at least 6–10 dB, and preferably 15 dB, above their masked thresholds. An S/N of at least 10 dB is desirable in at least one octave band from 200 to 8000 Hz, and an S/N of 20 dB is desirable if the signal falls within the critical band of the masking noise.
8. Signals that are more than 20–25 dB above their masked thresholds may elicit an undesirable startle response from individuals in the area. Furthermore, the contribution of particularly intense signals, especially those at or above 115 dBA, to hearing hazard risk must be considered. In certain situations, however, such signals may be necessary to avoid catastrophe, serious injury, or death (ISO, 2003).
9. In extremely loud environments of about 110 dB and above, nonauditory signal channels such as visual and vibrotactile displays should be considered as alternatives to auditory displays. They should also be used for redundancy in some lower-level noise environments where the auditory signal may be overlooked or may blend into the background as the noise varies, and also where people with hearing loss must attend to the signal.
10. Limit the number of discrete signals to seven or eight, and preferably to no more than four for absolute discrimination (U.S. Department of Defense, 2012). More signals than this could cause confusion about their meanings and overtax auditory memory.
11. When localization of signals in azimuth is important, include frequency content below 1000 Hz to capitalize on interaural time differences and above 3500 Hz to capitalize on interaural level differences. If localization in elevation is important, ensure that the listener can face the sound source to maximize the use of pinna cues in the frontal plane.
12. Ensure that the signals contrast well with the background noise and that the various signals sound sufficiently different from one another to avoid confusion. Spectral and temporal contrast are useful in this regard.
13. Whenever possible, calculate the audibility of signals in the noise in which they are to be heard using a standardized analytical method, and then verify the results with a listening test. Perhaps the most comprehensive and defensible calculation method is outlined in ISO 7731:2003 (ISO, 2003).
14. Signal and noise levels at the listener's location, not simply at a general or average location, should be used for all such calculations.
15. Signals should be selected so that their perceived urgency matches the condition to which they call attention. This applies both to iconic signals that convey meaning and to "earcon" or tonal signals, which have no inherent meaning.
16. Signal levels and frequency of occurrence should be closely monitored to ensure that signals do not add to the noise exposure of employees in the area. If the noise level is such that excessively high signal levels are required, measures should be taken to reduce the background noise.
17. Include material on the audibility of auditory signals in workforce training. Stress the potential positive as well as negative effects of HPDs on the audibility of such signals, including the implications for situation awareness.
18. If HPDs or TCAPS are to be used by operators in work or military situations, the operators should receive prior training and experience with them on representative auditory tasks that involve situation awareness elements.
19. HPDs or TCAPS may be used in dynamic workplace or military situations where maintaining auditory situation awareness is important. In such cases, the devices must be evaluated before selection and deployment for their effects on the component tasks of situation awareness (detection, recognition/identification, localization, communication).
20. Encourage input and feedback from employees regarding the audibility of signals in their workplaces. Such information can identify problem areas before an accident occurs.
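Several of the recommendations above can be screened numerically before the fuller standardized analysis is carried out: avoiding a tonal masker's fundamental and harmonics, keeping a signal 6–15 dB above its masked threshold without exceeding the 20–25 dB startle range, and seeking about 10 dB of S/N in at least one octave band. The helpers below are hypothetical and are not the ISO 7731:2003 procedure. They rest on three illustrative assumptions:

- a plain signal-minus-noise difference per octave band is used instead of a computed masked threshold;
- a ±10% window defines "near a harmonic";
- octave-band centers of 250–8000 Hz stand in for the 200–8000 Hz range.

All levels should be those at the listener's location.

```python
# Hypothetical screening helpers for a candidate warning signal.
# Thresholds follow the guidance text; the harmonic tolerance and the
# octave-band set are illustrative assumptions.

OCTAVE_CENTERS_HZ = (250, 500, 1000, 2000, 4000, 8000)


def harmonic_conflicts(signal_hz, masker_fundamental_hz, n_harmonics=8, tolerance=0.10):
    """Return the masker fundamental/harmonics lying within +/- tolerance
    (a fraction of each harmonic's frequency) of the signal frequency."""
    conflicts = []
    for k in range(1, n_harmonics + 1):
        harmonic = k * masker_fundamental_hz
        if abs(signal_hz - harmonic) <= tolerance * harmonic:
            conflicts.append(harmonic)
    return conflicts


def classify_margin(margin_db):
    """Classify a signal's level above its masked threshold, in dB."""
    if margin_db < 6:
        return "insufficient"      # below even the low end of the 6-10 dB minimum
    if margin_db < 15:
        return "minimum met"       # minimum reached; 15 dB is preferred
    if margin_db <= 20:
        return "preferred"
    return "possible startle"      # 20-25 dB and above may startle listeners


def best_octave_snr(signal_db, noise_db):
    """Return (center_hz, snr_db) for the octave band with the largest
    signal-minus-noise difference; levels are dicts keyed by center Hz."""
    snrs = {
        f: signal_db[f] - noise_db[f]
        for f in OCTAVE_CENTERS_HZ
        if f in signal_db and f in noise_db
    }
    if not snrs:
        raise ValueError("no common octave bands")
    band = max(snrs, key=snrs.get)
    return band, snrs[band]


if __name__ == "__main__":
    print(harmonic_conflicts(2000, 1000))   # [2000]
    print(harmonic_conflicts(1500, 1000))   # []
    noise = {250: 90, 500: 88, 1000: 85, 2000: 80, 4000: 75, 8000: 70}
    signal = {250: 80, 500: 85, 1000: 92, 2000: 93, 4000: 84, 8000: 60}
    band, snr = best_octave_snr(signal, noise)
    print(band, snr, snr >= 10)             # 2000 13 True
    print(classify_margin(8), classify_margin(17), classify_margin(24))
```

A signal that passes these screens still needs the standardized calculation and the listening test recommended above.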
7.5.2 Recommendations for Improving Speech Communications Intelligibility
Just as for the audibility of nonverbal signals noted above, speech communications are negatively affected as noise levels increase, degrading the intelligibility of the heard message. Thus, the following principles regarding masking effects on speech signals are offered as a summary for general guidance:
1. Whenever possible, quantify the speech reception and intelligibility problem using the most appropriate standardized computational method available (such as PSIL or SII).
2. Noise levels and, if possible, speech levels at the listener's location should be used for all computational predictions of speech intelligibility.
3. Speech intelligibility predictions based on simple SNRs alone should normally not be relied on. However, in general terms, SNRs of 15 dB or higher should result in 80% intelligibility performance for normal-hearing persons in broadband noise (Acton, 1970).
4. Above speech output levels of about 85 dBA, the voice begins to distort and intelligibility declines somewhat even if the SNR is held constant (Pollack, 1958).
5. Electronic speech communications systems should accurately reproduce speech frequencies in the 500–6000 Hz range, which encompasses the most sensitive range of hearing and includes the speech sounds important for message understanding. More specifically, much of the information required for word discrimination lies in the consonants, which are toward the higher end of this frequency range and of low power. The vowels, which carry more power, form the peaks of the speech waveform. Electronic peak clipping and re-amplification of the waveform may therefore improve intelligibility, because the power of the consonants is boosted relative to that of the vowels. Furthermore, to maintain intelligibility it is critical that at least the 1000–3000 Hz band be faithfully reproduced in electronic communication systems. Filtering out frequencies outside this range will not appreciably affect word intelligibility but will influence the quality of the speech (Sanders & McCormick, 1993).
6. Whenever possible, decrease the distance between the speaker and the listener, and encourage the use of hand and facial cues. These cues are recommended only as ancillary aids and should not take the place of adequate verbal stimuli.
7. Encourage employees to speak more forcefully to overcome the tendency to lower the voice while wearing HPDs. Keep in mind that high vocal effort should not be sustained for long periods, because it may irritate the speaker's vocal tract.
8. Include material on speech intelligibility in noise in employee training, and cover the potential positive as well as negative effects of HPDs and headsets on communications.
9. Select HPDs that are appropriate for the noise environment and the task, including special augmented HPDs with passive or active systems for communications pass-through. The selected HPDs should be adequate for the exposure but should not overprotect in terms of spectral attenuation. In particular, excessive HPD attenuation at frequencies above 1000 Hz can reduce the level of the consonants important to intelligibility to the point that these critical speech sounds become inaudible; such attenuation may also affect the detectability of warning signals.
10. Provide high-attenuation communications headsets in noisy situations where voice understanding is critical. In intense noise dominated by low frequencies below about 1000 Hz, active noise reduction devices may prove helpful, because their low-frequency attenuation bias can help reduce upward spread of masking into the speech bandwidth. With headsets, provide noise-rejecting or noise-cancelling microphones for transduction of speech, or use alternative technologies such as in-ear-canal microphones if circumstances warrant.
11. Improve message content by encouraging and implementing consistent sentence construction for standard messages.
12. Avoid the use of single letters; use whole words (e.g., a phonetic alphabet such as alpha, bravo, charlie, etc.) or complete sentences whenever possible.
13. Recognize that, if given a choice, actual human speech typically yields higher intelligibility in noise than computer-generated speech, and that synthesizers also differ in their intelligibility (Morrison & Casali, 1994). However, speech synthesizers have improved markedly in the past decade.
14. Encourage input and feedback from employees regarding speech intelligibility in their workplaces. Such information can identify problem areas before an accident occurs.
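The reasoning behind electronic peak clipping and re-amplification of speech, recommended above, can be checked with synthetic signals. In the sketch below, a loud 300 Hz tone stands in for a vowel and a quiet 4000 Hz tone stands in for a consonant. These stand-ins, the 16 kHz sample rate, the 0.2 clipping level, and the helper names are all illustrative assumptions, not a model of real speech or of any particular communications system. Clipping flattens the high-amplitude vowel peaks while leaving the low-amplitude consonant untouched. Re-amplifying to the original peak then raises the consonant relative to the vowel.

```python
import math

FS_HZ = 16_000          # sample rate (illustrative assumption)
SEG = FS_HZ // 10       # 100 ms per synthetic segment


def tone(freq_hz, amplitude, n_samples, fs_hz=FS_HZ):
    """Sampled sine wave used as a stand-in for a speech sound."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def clip_and_reamplify(x, clip_level):
    """Symmetrically clip x at +/-clip_level, then rescale so the
    original peak amplitude is restored."""
    peak = max(abs(v) for v in x)
    clipped = [max(-clip_level, min(clip_level, v)) for v in x]
    gain = peak / min(clip_level, peak)
    return [gain * v for v in clipped]


def consonant_to_vowel_db(vowel, consonant, clip_level):
    """Consonant-to-vowel RMS ratio (dB) before and after processing
    a vowel-then-consonant sequence."""
    before = 20 * math.log10(rms(consonant) / rms(vowel))
    processed = clip_and_reamplify(vowel + consonant, clip_level)
    n = len(vowel)
    after = 20 * math.log10(rms(processed[n:]) / rms(processed[:n]))
    return before, after


vowel = tone(300, 1.0, SEG)        # loud, low-frequency "vowel"
consonant = tone(4000, 0.1, SEG)   # quiet, high-frequency "consonant"

if __name__ == "__main__":
    before_db, after_db = consonant_to_vowel_db(vowel, consonant, clip_level=0.2)
    print(f"consonant-to-vowel ratio: {before_db:.1f} dB before, {after_db:.1f} dB after")
```

In this idealized case, the ratio moves from about -20 dB to about -8.6 dB, an improvement of roughly 11 dB. Clipping also adds harmonic distortion, which this sketch does not account for.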
REFERENCES
Acton, W. I. (1970). Speech intelligibility in a background noise and noise-induced hearing loss. Ergonomics, 13(5), 546–554. American National Standards Institute (ANSI) (1974). Method for the measurement of real-ear protection of hearing protectors and physical attenuation of earmuffs. ANSI S3.19-1974. New York: ANSI. American National Standards Institute (ANSI) (2010). Methods for the measurement of insertion loss of hearing protection devices in continuous or impulsive noise using microphone-in-real-ear or acoustic test fixture procedures. ANSI S12.42-2010. New York: ANSI. American National Standards Institute (ANSI) (2014). American national standard electroacoustics – sound level meters Part 1: specifications. ANSI S1.4-2014. New York: ANSI. American National Standards Institute (ANSI) (2016). Methods for measuring the real-ear attenuation of hearing protectors. ANSI S12.6-2016. New York: ANSI. American National Standards Institute (ANSI) (2017). Methods for the calculation of the speech intelligibility index. ANSI S3.5-1997(R2017). New York: ANSI. American National Standards Institute (ANSI) (2020a). Specification for personal noise dosimeters. ANSI S1.25-1991(R2020). New York: ANSI. American National Standards Institute (ANSI) (2020b). Methods for measuring the intelligibility of speech over communications systems. ANSI S3.2-2009(R2020). New York: ANSI. Berger, E. H. (1987). EARLog #18 – Can hearing aids provide hearing protection? American Industrial Hygiene Association Journal, 48(1), A20–A21. Berger, E. H. (2003). Hearing protection devices. In E. H. Berger, L. H. Royster, J. D. Royster, D. P. Driscoll, & M. Layne (Eds.), The noise manual (revised 5th ed., pp. 379–454). Fairfax, VA: American Industrial Hygiene Association. Berger, E. H., & Voix, J. (2020). Hearing protection devices. In D. K. Meinke, E. H. Berger, D. P. Driscoll, R. L. Neitzel & K. Bright (Eds.), The noise manual (6th ed.). 
Falls Church, VA: American Industrial Hygiene Association. Casali, J. G. (1999). Litigating community noise annoyance: a human factors perspective. In Proceedings of the 1999 Human Factors and Ergonomics Society 43rd Annual Conference, Houston, TX, September 27–October 1 (pp. 612–616). Casali, J. G. (2006). Hearing protection devices: Regulation, current trends, and emerging technologies. In C. G. Le Prell, D. Henderson, R. R. Fay, & A. N. Popper (Eds.), Noise-induced hearing loss: Scientific advances (pp. 257–284). New York: Springer. Casali, J. G. (2010a). Passive augmentations in hearing protection technology circa 2010 including flat-attenuation, passive level-dependent, passive wave resonance, passive adjustable
attenuation, and adjustable-fit devices: Review of design, testing, and research. International Journal of Acoustics and Vibration, 15(4), 187–195. Casali, J. G. (2010b). Powered electronic augmentations in hearing protection technology circa 2010 including active noise reduction, electronically-modulated sound transmission, and tactical communications devices: Review of design, testing, and research. International Journal of Acoustics and Vibration, 15(4), 168–186. Casali, J. G., & Alali, K. (2009). Vehicle backup alarm localization (or not): Effects of passive and electronic hearing protectors, ambient noise level, and backup alarm spectral content. In Proceedings of the Human Factors and Ergonomics Society 53rd Annual Meeting, San Antonio, TX, October 19–23 (pp. 1617–1621). Casali, J. G., & Lee, K. (2018). Auditory situation awareness: The conundrum of providing critical aural cues while simultaneously protecting hearing, with implications for training. Spectrum, 35(3), 12–28. Casali, J. G., & Robinson, G. S. (1999). Noise in industry: Auditory effects, measurement, regulations, and management. In W. Karwowski & W. Marras (Eds.), Handbook of occupational ergonomics (pp. 1661–1692). Boca Raton, FL: CRC Press. Casali, J. G., Robinson, G. S., Dabney, E. C., & Gauger, D. (2004). Effect of electronic ANR and conventional hearing protectors on vehicle backup alarm detection in noise. Human Factors, 46(1), 1–10. Casali, J. G., & Tufts, J. B. (2021). Auditory situation awareness and speech communication in noise. In D. K. Meinke, E. H. Berger, D. P. Driscoll, R. L. Neitzel & K. Bright (Eds.), The noise manual (6th ed.). Falls Church, VA: American Industrial Hygiene Association. Casto, K. L., & Casali, J. G. (2013). Effects of headset, flight workload, hearing ability, and communications message quality on pilot performance. Human Factors, 55(3), 486–498. Cave, K. M., Thompson, B., Lee, K., & Casali, J. G. (2019). 
Optimization of an auditory azimuth localization training protocol for military service members. International Journal of Audiology, 59(Suppl. 1), 1708–1886. Code of Federal Regulations (2006). Locomotive horn. Code of Federal Regulations, § 229.129, 71 FR 47666, August 17. Crocker, M. (Ed.). (1998). Handbook of acoustics. New York: Wiley. Deatherage, B. H. (1972). Auditory and other sensory forms of information presentation. In H. P. Van Cott & R. G. Kincade (Eds.), Human engineering guide to equipment design (pp. 123–160). New York: Wiley. Defense Occupational and Environmental Health Readiness System Data Repository (DOEHRS-DR) (2007). U.S. Army Center for Health Promotion and Preventive Medicine, Aberdeen Proving Ground, MD. Defense Occupational Health Readiness System Data Repository (DOEHRS-DR) (2016). U.S. Army Public Health Command, Aberdeen Proving Ground, MD. https://doehrswww.apgea.army.mil/doehrsdr Driscoll, D. P. (2020). Noise control engineering. In D. K. Meinke, E. H. Berger, D. P. Driscoll, R. L. Neitzel & K. Bright (Eds.), The noise manual (6th ed.). Falls Church, VA: American Industrial Hygiene Association. Driscoll, D. P., Stewart, N. D., Anderson, R. A., & Leasure, J. (2020). Community noise. In D. K. Meinke, E. H. Berger, D. P. Driscoll, R. L. Neitzel & K. Bright (Eds.), The noise manual (6th ed.). Falls Church, VA: American Industrial Hygiene Association. Earshen, J. J. (1986). Sound measurement: instrumentation and noise descriptors. In E. H. Berger, W. D. Ward, J. C. Morrill, & L. H. Royster (Eds.), Noise and hearing conservation manual (pp. 38–95). Akron, OH: American Industrial Hygiene Association. Egan, J. P., & Hake, H. W. (1950). On the masking pattern of a simple auditory stimulus. Journal of the Acoustical Society of America, 22(5), 622–630. Elbit, G., & Hansen, M. (2007). Mass law – calculations and measurements. SAE Technical Paper Series, 2007-01-2201, Warrendale, PA.
Fidell, S. M., & Pearsons, K. S. (1997). Community response to environmental noise. In M. L. Crocker (Ed.), Encyclopedia of acoustics (pp. 1083–1091). New York: Wiley. Flamme, G. A., & Murphy, J. (2020). Brief high-level sounds. In D. K. Meinke, E. H. Berger, D. P. Driscoll, R. L. Neitzel & K. Bright (Eds.), The noise manual (6th ed.). Falls Church, VA: American Industrial Hygiene Association. Fletcher, H. (1940). Auditory patterns. Reviews of Modern Physics, 12, 47–65. Fletcher, H. (1953). The masking of pure tones and of speech by white noise. In Speech and hearing in communication. Princeton, NJ: Van Nostrand Reinhold. French, N., & Steinberg, J. (1947). Factors governing the intelligibility of speech sounds. Journal of the Acoustical Society of America, 19, 90–119. Gerges, S., & Casali, J. G. (2007). Ear protectors. In M. L. Crocker (Ed.), Handbook of noise and vibration control (pp. 364–376). New York: Wiley. Gold, S. (2003). Clinical management of tinnitus and hyperacusis. American Speech-Language-Hearing Association Leader, 8(20), 4–25. Groenwold, M. R., Tak, S., & Matterson, E. (2011). Severe hearing impairment among military veterans – United States, 2010. Journal of the American Medical Association, 306(11), 1192–1194. Harris, C. M. (Ed.). (1991). Handbook of acoustical measurements and noise control. New York: McGraw-Hill. Hawkins, J. E., & Stevens, S. S. (1950). The masking of pure tones and of speech by white noise. Journal of the Acoustical Society of America, 22(1), 6–13. Humes, L. E., & Bess, F. H. (1981). Tutorial on the potential deterioration in hearing due to hearing aid usage. Journal of Speech and Hearing Research, 24(1), 3–15. International Organization for Standardization (ISO) (2003). Ergonomics – danger signals for public and work areas – auditory danger signals. ISO 7731:2003. Geneva, Switzerland: ISO. International Organization for Standardization (ISO) (2010). 
Acoustics – determination of sound power levels and sound energy levels of noise sources using sound pressure – engineering/survey methods for use in situ in a reverberant environment. ISO 3747:2010. Geneva: ISO. Kryter, K. D. (1974). Speech communication. In H. P. Van Cott & R. G. Kincade (Eds.), Human engineering guide to equipment design (pp. 161–226). New York: Wiley. Kryter, K. D. (1994). The handbook of hearing and the effects of noise. New York: Academic Press. Lee, K., & Casali, J. G. (2011). Investigation of the auditory occlusion effect with implications for hearing protection and hearing aid design. In Proceedings of the Human Factors and Ergonomics Society 55th Annual Meeting. Las Vegas, NV, September 19–23 (pp. 178–187). Lee, K., & Casali, J. G. (2017). Development of an auditory situation awareness test battery for advanced hearing protectors and TCAPS: Detection subtest of DRILCOM (Detection-Recognition/Identification-Localization-Communication). International Journal of Audiology, 56(Suppl. 1), 22–3. Lee, K., & Casali, J. G. (2019). Learning to localize a broadband tonal complex signal with advanced hearing protectors and TCAPS: The effectiveness of training on open-ear vs. device-occluded performance. International Journal of Audiology, 58(Suppl. 1), 3–11. Matterson, E. A., Bushnell, P. T., Themann, C. L., & Morata, T. C. (2016). Hearing impairment among noise-exposed workers – United States, 2003–2012. MMWR. Morbidity and Mortality Weekly Report 65 (15), April 22 (pp. 389–394). doi:10.15585/mmwr.mm6515a2. McIlwain, S. D., Gates, K., & Ciliax, D. (2008). Heritage of Army audiology and the road ahead: The Army Hearing Program. American Journal of Public Health, 98(12), 2167–2172.
Meinke, D. K., Berger, E. H., Driscoll, D. P., Neitzel, R. L., & Bright, K. (Eds.). (2020). The noise manual (6th ed.). Falls Church, VA: American Industrial Hygiene Association. Melnick, W. (1991). Hearing loss from noise exposure. In C. M. Harris (Ed.), Handbook of acoustical measurements and noise control (pp. 18.1–18.19). New York: McGraw-Hill. Mine Safety and Health Administration (MSHA) (1999). Health standards for occupational noise exposure; final rule. 30 CFR Part 62, 64, MSHA, Code of Federal Regulations, Federal Register, Washington, DC. Morrison, H. B., & Casali, J. G. (1994). Intelligibility of synthesized voice messages in commercial truck cab noise for normal-hearing and hearing-impaired listeners. In Proceedings of the 1994 Human Factors and Ergonomics Society 38th Annual Conference, Nashville, TN, October 24–28 (pp. 801–805). National Institute for Occupational Safety and Health (NIOSH) (1980). Compendium of materials for noise control, DHHS-NIOSH No. 80-116. Cincinnati, OH: Department of Health and Human Services – NIOSH. National Institute for Occupational Safety and Health (NIOSH) (1988). Occupational noise exposure – revised criteria 1998, DHHS-NIOSH No. 98-126. Cincinnati, OH: Department of Health and Human Services – NIOSH. National Institute for Occupational Safety and Health (NIOSH) (1994). The NIOSH compendium of hearing protection devices, DHHS-NIOSH No. 94-130. Cincinnati, OH: Department of Health and Human Services – NIOSH. National Institute for Occupational Safety and Health (NIOSH) (2001). Work-related hearing loss. Cincinnati, OH: Department of Health and Human Services – NIOSH. National Institutes of Health (NIH) Consensus Development Panel (1990). Noise and hearing loss. Journal of the American Medical Association, 263(23), 3185–3190. Occupational Safety and Health Administration (OSHA) (1971a). Occupational noise exposure (general industry); final rule. 
29CFR1910.95, OSHA, Code of Federal Regulations, Federal Register, Washington, DC. Occupational Safety and Health Administration (OSHA) (1971b). Occupational noise exposure (construction industry); final rule. 29CFR1926.101, OSHA, Code of Federal Regulations, Federal Register, Washington, DC. Occupational Safety and Health Administration (OSHA) (1980). Noise control: A guide for workers and employers, OSHA 3048. Washington, DC: U.S. Department of Labor. Occupational Safety and Health Administration (OSHA) (1983). Occupational noise exposure; hearing conservation amendment; final rule. 29CFR1910.95, OSHA, Code of Federal Regulations, Federal Register, Washington, DC. Ostergaard, P. (2003). Physics of sound and vibration. In E. H. Berger, L. H. Royster, J. D. Royster, D. P. Driscoll, & M. Layne (Eds.), The noise manual (revised 5th ed., pp. 19–39). Fairfax, VA: American Industrial Hygiene Association. Park, M. Y., & Casali, J. G. (1991). A controlled investigation of in-field attenuation performance of selected insert, earmuff, and canal cap hearing protectors. Human Factors, 33(6), 693–714. Patterson, R. D. (1982). Guidelines for auditory warning systems on civil aircraft. Paper 82017, Cheltenham: Civil Aviation Authority, Airworthiness Division. Peterson, A. (1979). Noise measurements: instruments. In C. M. Harris (Ed.), Handbook of noise control (pp. 515–519). New York: McGraw-Hill. Peterson, A., & Gross, E., Jr. (1978). Handbook of noise measurement (8th ed.). Concord, MA: General Radio Co. Pollack, I. (1958). Speech intelligibility at high noise levels: Effects of short-term exposure. Journal of the Acoustical Society of America, 30, 282–285. Poulton, E. (1978). A new look at the effects of noise: a rejoinder. Psychological Bulletin, 85, 1068–1079.
Rabinowitz, P. M., Davies, H. W., & Meinke, D. K. (2020). Auditory and non-auditory health effects of noise. In D. K. Meinke, E. H. Berger, D. P. Driscoll, R. L. Neitzel & K. Bright (Eds.), The noise manual (6th ed.). Falls Church, VA: American Industrial Hygiene Association. Rhebergen, K. S., Versfeld, N. J., & Dreschler, W. A. (2006). Extended speech intelligibility index for the prediction of the speech reception threshold in fluctuating noise. Journal of the Acoustical Society of America, 120(6), 3988–3997. Robinson, G. S., & Casali, J. G. (2003). Speech communications and signal detection in noise. In E. H. Berger, L. H. Royster, J. D. Royster, D. P. Driscoll, & M. Layne (Eds.), The noise manual (revised 5th ed., pp. 567–600). Fairfax, VA: American Industrial Hygiene Association. Rossing, T. (1990). The science of sound. Reading, MA: Addison-Wesley. Royster, L. H., Royster, J. D., & Dobie, R. A. (2020). Prediction and analysis of the hearing characteristics of noise-exposed populations or individuals. In D. K. Meinke, E. H. Berger, D. P. Driscoll, R. L. Neitzel & K. Bright (Eds.), The noise manual (6th ed.). Falls Church, VA: American Industrial Hygiene Association. Sanders, M. S., & McCormick, E. J. (1993). Human factors in engineering and design (7th ed.). New York: McGraw-Hill. Saunders, G. H., & Griest, S. (2009). Hearing loss in veterans and the need for hearing loss prevention programs. Noise and Health, 11(42), 14–21. Sorkin, R. D. (1987). Design of auditory and tactile displays. In G. Salvendy (Ed.), Handbook of human factors (pp. 549–576). New York: McGraw-Hill. Stenfelt, S., & Reinfeldt, S. (2005). Human body conduction sensitivity in a sound field. Paper No. 20,022. In Proceedings of the 2005 International Congress and Exhibition of Noise Control Engineering, Rio de Janeiro, Brazil, August 7–10. Stevens, S. S. (1936). A scale for the measurement of a psychological magnitude: Loudness. 
Psychological Review, 43, 405–416. Stevens, S. S. (1972). Perceived level of noise by Mark VII and decibels. Journal of the Acoustical Society of America, 51(2), 575–601. Taylor, W., Pearson, J., Mair, A., & Burns, W. (1964). Study of noise and hearing in jute weavers. Journal of the Acoustical Society of America, 38, 113–120. U.S. Department of Defense (DoD) (2010). Department of Defense Instruction – Hearing Conservation Program DoDI 6055.12, 12/3/10 and incorporating change 2, 8/31/18, Under Secretary of Defense for Acquisition, Technology and Logistics, Washington, DC. U.S. Department of Defense (DoD) (2012). Department of Defense design criteria standard: Human engineering, MIL-STD-1472G, Washington, DC.
U.S. Department of Defense (DoD) (2015). Department of Defense design criteria standard: Noise limits, MIL-STD-1474E, Washington, DC. U.S. Department of Housing and Urban Development (HUD) (2009). The noise guidebook, March 2009, Washington, DC. U.S. Department of the Air Force (U.S. Air Force) (1948). Precautionary measures against noise hazards, Air Force Regulation 160-3, Washington, DC. U.S. Department of the Army (U.S. Army) (2015). Army hearing program, Pamphlet 40-501, January 8, Washington, DC. U.S. Department of Veterans Affairs. (2010). Annual benefits report fiscal year 2010. Retrieved January 29, 2018, from https://www.benefits.va.gov/REPORTS/abr/2010_abr.pdf U.S. Department of Veterans Affairs. (2016). Annual benefits report fiscal year 2016. Retrieved July 22, 2017, from http://www.benefits.va.gov/REPORTS/abr/ABRCompensation-FY16-0613017.pdf U.S. Environmental Protection Agency (EPA) (1979). Noise labeling requirements for hearing protectors. 40CFR211, Federal Register, Vol. 44, No. 190, 56130–56147. U.S. Environmental Protection Agency (EPA) (1981). Noise in America: The extent of the noise problem. Report 550/9-81-101, Washington, DC: EPA. U.S. Environmental Protection Agency (EPA) (2009). Product noise labeling hearing protection devices; proposed rule. 40 CFR Part 211, Federal Register, EPA Docket No. EPA-HQ-OAR-2003-0024; FRL-8934-9, EPA, August 5, Washington, DC: EPA. Valimont, R. B., Casali, J. G., & Lancaster, J. A. (2006). ANR vs. passive communications headsets: investigation of speech intelligibility, pilot workload, and flight performance in an aircraft simulator. In Proceedings of the Human Factors and Ergonomics Society 50th Annual Meeting, San Francisco, CA, October 16–20 (pp. 2143–2147). Wegel, R. L., & Lane, C. E. (1924). The auditory masking of one pure tone by another and its probable relation to the dynamics of the inner ear. Physical Review, 23, 266–285. Wilkins, P. A., & Martin, A. M. (1982). 
The effects of hearing protection on the perception of warning sounds. In P. W. Alberti (Ed.), Personal hearing protection in industry (pp. 339–369). New York: Raven. Wilkins, P. A., & Martin, A. M. (1985). The role of acoustical characteristics in the perception of warning sounds and the effects of wearing hearing protection. Journal of Sound and Vibration, 100(2), 181–190. Zwicker, E. (1960). Ein Verfahren zur Berechnung der Lautstärke. Acustica, 10, 304–308.
CHAPTER 19
VIBRATION AND MOTION
Neil J. Mansfield, Nottingham Trent University, Nottingham, United Kingdom
Michael J. Griffin, Institute of Sound and Vibration Research, University of Southampton, Southampton, United Kingdom
1 INTRODUCTION
2 MEASUREMENT OF VIBRATION AND MOTION
2.1 Vibration Magnitude
2.2 Vibration Frequency
2.3 Vibration Direction
2.4 Vibration Duration
2.5 Instrumentation for Measuring Human Vibration Exposures
3 WHOLE-BODY VIBRATION
3.1 Vibration Discomfort
3.2 Interference with Activities
3.3 Health Effects
3.4 Disturbance in Buildings
3.5 Biomechanics
3.6 Protection from Whole-Body Vibration
4 MOTION SICKNESS
4.1 Causes of Motion Sickness
4.2 Sickness Caused by Oscillatory Motion
4.3 Habituation to Motion Sickness
5 HAND-ARM VIBRATION
5.1 Sources of Hand-Arm Vibration
5.2 Effects of Hand-Arm Vibration
5.3 Preventative Measures
5.4 Standards for Evaluation of Hand-Arm Vibration
REFERENCES
1 INTRODUCTION
In work and leisure activities the human body experiences movement. The motion may be voluntary (as in some sports) or involuntary (as for passengers in vehicles). Movements may occur simultaneously in six different directions: three translational directions (fore-and-aft, lateral, and vertical) and three rotational directions (roll, pitch, and yaw). Translational movements at constant velocity (i.e., with no change of speed or direction, such as when cruising in an aircraft) are mostly imperceptible, except where a change of position relative to other objects is detected through visual (relative motion) or auditory (e.g., Doppler) signals. Translational motion can also be detected when the velocity changes, causing acceleration or deceleration of the body that can be perceived via tactile and balance sensations (e.g., vestibular, cutaneous, kinesthetic, or visceral sensory systems). Rotation of the body at constant velocity may be detected because it gives rise to translational acceleration in the body, because it reorients the body relative to the gravitational force of Earth, or because the changing orientation relative to other objects is perceptible. Even outside the Earth's gravitational field, the vestibular system can signal motion owing to the inertia of the fluid in the semicircular canals. Vibration is oscillatory motion: the velocity is continually changing, so the movement is detectable by the human sensory system. Vibration of the body may be desirable or undesirable. It can be described as pleasant or unpleasant; it can interfere with the performance of various tasks and cause injury and disease. Low-frequency oscillations of the body and movements of visual displays can cause motion sickness.

It is convenient to consider human exposure to oscillatory motion in three categories:
1. Whole-body vibration occurs when the body is supported on a surface that is vibrating (e.g., sitting on a seat that is in motion, standing on a moving floor, or lying on a moving surface). Whole-body vibration occurs in transport (e.g., road, off-road, rail, air, space, and marine transport) and near some machinery. Its effects are usually most important between 1 and 20 Hz, although some applications require consideration of signals outside this range (Mansfield, 2004).
2. Motion sickness can occur when real or illusory movements of the body or the environment lead to ambiguous inferences as to the movement or orientation of the human body. The movements associated with motion sickness are always of low frequency, usually below 1 Hz.
3. Hand-transmitted vibration is caused by various processes in industry, agriculture, mining, construction, transport, and health care where vibrating tools or workpieces are grasped or pushed by the hands or fingers. Its effects are usually most important between 8 and 1000 Hz, although some applications require consideration of signals outside this range.

There are many different effects of oscillatory motion on the body and many variables influencing each effect. The variables may be categorized as extrinsic variables (those occurring outside the human body) and intrinsic variables (the variability that occurs between and within people), as in Table 1. Some variables, especially intersubject variability, have large effects but are not easily measured. Consequently, it is often not practicable to make highly accurate predictions of the discomfort, interference with activities, or health effects for an individual. However, the average effect, or the probability of an effect, can be predicted for groups of people. This chapter introduces human responses to oscillatory motion, summarizes current methods of evaluating exposures to oscillatory motion, and identifies some methods of minimizing unwanted effects of vibration.

Table 1 Variables Influencing Human Responses to Oscillatory Motion
Extrinsic variables
  Vibration variables: vibration magnitude; vibration frequency; vibration direction; vibration input positions; vibration duration; vibration waveform
  Other variables: other stressors (noise, temperature, etc.); seat dynamics; personal protective equipment
Intrinsic variables
  Intrasubject variability: body posture; body position; body orientation (sitting, standing, recumbent); physiological state (muscle tonus)
  Intersubject variability: body size, weight, and anthropometry; body dynamic response; age; biological sex; experience, expectation, attitude, and personality; fitness; training
2 MEASUREMENT OF VIBRATION AND MOTION
2.1 Vibration Magnitude
When vibrating, an object has alternately a velocity in one direction and then a velocity in the opposite direction. This change in velocity means that the object is constantly accelerating, first in one direction and then in the opposite direction. Figure 1 shows the displacement, velocity, and acceleration waveforms for a movement occurring at a single frequency (i.e., a sinusoidal oscillation). The magnitude of a vibration can be quantified by its displacement, its velocity, or its acceleration. For practical convenience, the magnitude of vibration is now usually expressed in terms of the acceleration and measured using accelerometers. The units of acceleration are meters per second per second (i.e., m s−2, or m/s²). The acceleration due to gravity on Earth is approximately 9.81 m s−2. The magnitude of an oscillation can be expressed as the difference between the maximum and minimum values of the motion (e.g., the peak-to-peak acceleration) or as the maximum deviation from some central point (e.g., the peak acceleration). Most often, magnitudes of vibration are expressed in terms of an average measure of the oscillatory motion, usually the root-mean-square (r.m.s.) value of the acceleration (i.e., m s−2 r.m.s. for translational acceleration, rad s−2 r.m.s. for rotational acceleration). For a sinusoidal motion, the r.m.s. value is the peak value divided by $\sqrt{2}$ (i.e., the peak value divided by approximately 1.4).

Figure 1 Displacement, velocity, and acceleration waveforms for a sinusoidal vibration (amplitudes ±D, ±V, and ±A plotted against time). If the vibration has frequency f (in hertz) and peak displacement D (in meters), the peak velocity is $V = 2\pi f D$ (in meters per second) and the peak acceleration is $A = (2\pi f)^2 D$ (in meters per second per second).

When observing vibration, it is sometimes possible to estimate the displacement caused by the motion. For a sinusoidal motion, the acceleration a can be calculated from the frequency f in hertz and the displacement d:

$$a = (2\pi f)^2 d$$

For example, a sinusoidal motion with a frequency of 1 Hz and a peak-to-peak displacement of 0.1 m will have an acceleration of 3.95 m s−2 peak to peak, 1.97 m s−2 peak, and 1.40 m s−2 r.m.s. Because the acceleration is proportional to the square of the frequency for a given displacement, the visible range of motion is a poor indicator of severity: the same peak-to-peak displacement of 0.1 m would correspond to an acceleration of 98.7 m s−2 peak to peak at 5 Hz (25 times higher, and an exceptionally severe magnitude). Although this expression can be used to convert acceleration measurements to corresponding displacements, it is only accurate when the motion occurs at a single frequency (i.e., it has a pure sinusoidal waveform as shown in Figure 1).

2.2 Vibration Frequency
The frequency of vibration is expressed in cycles per second using the SI unit hertz (Hz). The frequency of vibration influences the extent to which vibration is transmitted to the surface of the body (e.g., through seating), the extent to which it is transmitted through the body (e.g., from seat to head), and the responses to vibration within the body. As shown in Section 2.1, the relation between the displacement and the acceleration of a motion depends on the frequency of oscillation.

2.3 Vibration Direction
The responses of the body to motion differ according to the direction of the motion. Vibration is often measured at the interfaces between the body and the vibrating surfaces in three orthogonal directions.
Figure 2 shows a coordinate system used when measuring the vibration of a hand holding a tool. The x, y, and z axes correspond to vibration from palm to back of hand,
DESIGN FOR HEALTH, SAFETY, AND COMFORT
parallel to a handle, and in the direction away from the wrist in a neutral posture, respectively.

Figure 2 Axes of vibration (Xh, Yh, Zh) used to measure exposures to hand-arm vibration.

The three principal directions of whole-body vibration for seated and standing persons are the x axis (fore-and-aft), y axis (lateral), and z axis (vertical). The vibration is measured at the interface between the body and the surface supporting the body (e.g., on the seat beneath the ischial tuberosities for a seated person, beneath the feet for a standing person). Figure 3 illustrates the translational and rotational axes for an origin at the ischial tuberosities on a seat and the translational axes at a backrest and the feet of a seated person.

Figure 3 Axes of vibration used to measure exposures to whole-body vibration (origins at the back, the ischial tuberosities, and the feet).

2.4 Vibration Duration
Some human responses to vibration depend on the duration of exposure, and exposure to vibration can accelerate the fatigue experienced by seat occupants (Mansfield, Mackrill, Rimell, & MacMull, 2014). Some assessment methods are affected by measurement duration and need to be corrected for the exposure duration. Alternative assessment methods can be used depending on the duration of the vibration events of interest; the r.m.s. acceleration may not provide a good indication of vibration severity if the vibration is intermittent, contains shocks, or otherwise varies in magnitude from time to time (see, e.g., Section 3.3).

2.5 Instrumentation for Measuring Human Vibration Exposures
Systems for measuring human exposure to vibration usually comprise a precision accelerometer, signal conditioning, and a data processing system. Each of these needs to be specified for the application, and a system suitable for whole-body vibration might not be suitable for hand-arm vibration. While apps installed on mobile devices can use built-in accelerometers to give indications of vibration magnitudes, these systems are not suitable for precision assessments. Accelerometers must be correctly attached to the surface of the object being measured at the site of input to the body; a poorly attached accelerometer will give unreliable measurements. Instrumentation should comply with ISO 8041 (2017); compliance indicates that it has been tested to meet quality standards.
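The sinusoidal relations of Section 2.1 and Figure 1 can be checked numerically. The sketch below is a minimal illustration that assumes a pure single-frequency motion; the function names are hypothetical, and it is not a substitute for the frequency analysis needed for real, multi-frequency signals.

```python
import math

def sinusoid_acceleration(freq_hz, displacement_pp_m):
    """Acceleration of a pure sinusoid from its frequency and peak-to-peak displacement.

    Uses a = (2*pi*f)**2 * d, which holds only for single-frequency motion.
    Returns (peak-to-peak, peak, r.m.s.) acceleration in m/s^2.
    """
    omega_sq = (2.0 * math.pi * freq_hz) ** 2
    a_pp = omega_sq * displacement_pp_m   # peak-to-peak acceleration
    a_peak = a_pp / 2.0                   # peak is half of peak-to-peak
    a_rms = a_peak / math.sqrt(2.0)       # r.m.s. of a sinusoid = peak / sqrt(2)
    return a_pp, a_peak, a_rms

def sampled_rms(samples):
    """Root-mean-square value of a sequence of acceleration samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

if __name__ == "__main__":
    # Same 0.1 m peak-to-peak displacement at two frequencies
    for f in (1.0, 5.0):
        a_pp, a_peak, a_rms = sinusoid_acceleration(f, 0.1)
        print(f"{f:.0f} Hz: {a_pp:.2f} pk-pk, {a_peak:.2f} pk, {a_rms:.2f} r.m.s. (m/s^2)")

    # Check the peak/sqrt(2) rule on one sampled cycle of a 1 Hz sinusoid
    fs, peak = 1000, 1.974
    wave = [peak * math.sin(2 * math.pi * 1.0 * n / fs) for n in range(fs)]
    print(f"sampled r.m.s. = {sampled_rms(wave):.3f} m/s^2")
```

For 1 Hz and 0.1 m the function reproduces the 3.95, 1.97, and 1.40 m s−2 values quoted in Section 2.1, and the 5 Hz case is 25 times larger, illustrating why visible displacement alone is a poor guide to severity.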
3 WHOLE-BODY VIBRATION
Whole-body vibration may affect health, comfort, and the performance of activities. The comments of persons exposed to vibration mostly derive from the sensations produced by vibration rather than certain knowledge that the vibration is causing harm or reducing their performance. Vibration of the whole body is produced by various types of industrial machinery and by all forms of transport (including road, off-road, rail, sea, air, and space transport).

3.1 Vibration Discomfort
The relative discomfort caused by different oscillatory motions can be predicted from measurements of the vibration. For very low magnitude motions, it is possible to estimate the percentage of persons who will be able to feel the vibration and the percentage who will not. For higher vibration magnitudes, an approximate indication of the extent of subjective reactions is available in a semantic scale of discomfort. Limits appropriate to the prevention of vibration discomfort vary between different environments (e.g., between buildings and transport), between different types of transport (e.g., between cars and trucks), and within types of vehicle (e.g., between sports cars and limousines). The design limit depends on external factors (e.g., cost and speed) and on the comfort of alternative environments (e.g., competing vehicles). Understanding discomfort requires consideration of human behavior and psychology in addition to product design and physical factors such as vibration (Mansfield, Naddeo, Frohriep, & Vink, 2020).

3.1.1 Effects of Vibration Magnitude
The absolute threshold for the perception of vertical whole-body vibration in the frequency range 1 to 100 Hz is, very approximately, 0.01 m s−2 r.m.s.; a magnitude of 0.1 m s−2 r.m.s. will be easily noticeable; magnitudes around 1 m s−2 r.m.s. are usually considered uncomfortable; and magnitudes of 10 m s−2 r.m.s. are usually dangerous. The precise values depend on the vibration frequency and the exposure duration, and they differ for other axes of vibration (Morioka & Griffin, 2006a, 2006b). A doubling of vibration magnitude (expressed in m s−2) produces, very approximately, a doubling of the sensation of discomfort; the precise increase depends on the frequency and direction of vibration. For many motions, a halving of the vibration magnitude therefore greatly reduces discomfort.
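The semantic scale of discomfort in Table 4 (from ISO 2631-1) uses deliberately overlapping bands, so a single r.m.s. value can fall into two categories. The lookup below is a minimal sketch, assuming the ISO 2631-1 band limits (less than 0.315 m s−2 "not uncomfortable" up to greater than 2 m s−2 "extremely uncomfortable"); the function and table names are hypothetical, and, as the text stresses, the scale is only an approximate guide for groups of people.

```python
# Approximate discomfort bands after ISO 2631-1 (see Table 4).
# Values are r.m.s. frequency-weighted acceleration in m/s^2; bands overlap on purpose.
DISCOMFORT_BANDS = [
    (0.0, 0.315, "not uncomfortable"),
    (0.315, 0.63, "a little uncomfortable"),
    (0.5, 1.0, "fairly uncomfortable"),
    (0.8, 1.6, "uncomfortable"),
    (1.25, 2.5, "very uncomfortable"),
    (2.0, float("inf"), "extremely uncomfortable"),
]

def discomfort_labels(a_w_rms):
    """Return every semantic label whose band contains the given magnitude.

    Lower band limits are inclusive and upper limits exclusive.
    """
    if a_w_rms < 0:
        raise ValueError("r.m.s. acceleration cannot be negative")
    return [label for lo, hi, label in DISCOMFORT_BANDS if lo <= a_w_rms < hi]
```

For example, `discomfort_labels(0.9)` returns both "fairly uncomfortable" and "uncomfortable", reflecting the overlap between adjacent bands.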
3.1.2 Effects of Vibration Frequency and Direction
The dynamic responses of the body and the relevant physiological and psychological processes dictate that subjective reactions to vibration depend on the frequency and the direction of vibration. The extent to which a given acceleration will cause a greater or lesser effect on the body at different frequencies is reflected in frequency weightings: frequencies capable of causing the greatest effect are given the greatest "weight," and others are attenuated according to their relative importance. Frequency weightings for human response to vibration have been derived from laboratory experiments in which volunteer subjects have been exposed to a set of motions having different frequencies. The subjects' responses are used to determine equivalent comfort contours (Morioka & Griffin, 2006a). The reciprocal of such a curve forms the shape of the frequency weighting. Figure 4 shows frequency weightings as defined in International Standard 2631 (ISO, 1997, amended 2010). The most commonly used weightings are Wk and Wd, which are used primarily for the assessment of vibration on seat surfaces. Other weightings are applied for predicting motion sickness (Wf; see also Section 4) and for vibration at other input positions and in other directions (Table 2). To minimize the number of frequency weightings, some are used for more than one axis of vibration, with different axis-multiplying factors allowing for overall differences in sensitivity between axes (see Table 3). The frequency-weighted acceleration should be multiplied by the axis-multiplying factor before the component is compared with components in other axes or included in any summation over axes. Although seat-back measurements are encouraged for health risk assessments, ISO 2631-1 does not include backrest vibration in calculations of vibration severity for health effects. Vibration occurring in several axes is more uncomfortable than vibration occurring in a single axis.
To obtain an overall ride value, the root-sum-of-squares of the component ride values is calculated:

$$\text{Overall ride value} = \left[\sum (\text{component ride values})^2\right]^{1/2}$$
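As an illustration of this root-sum-of-squares combination, the sketch below applies the discomfort axis-multiplying factors listed in Table 3 to frequency-weighted r.m.s. values that are assumed to have been measured already (the frequency weightings themselves are not implemented here). The dictionary layout, the position labels "seat", "back", and "feet", and the function name are hypothetical conveniences, not part of ISO 2631-1.

```python
import math

# Discomfort axis-multiplying factors for seated persons, from Table 3 (after ISO 2631-1).
# Keys are (input position, axis); labels are illustrative only.
COMFORT_FACTORS = {
    ("seat", "x"): 1.0, ("seat", "y"): 1.0, ("seat", "z"): 1.0,
    ("seat", "rx"): 0.63, ("seat", "ry"): 0.4, ("seat", "rz"): 0.2,
    ("back", "x"): 0.8, ("back", "y"): 0.5, ("back", "z"): 0.4,
    ("feet", "x"): 0.25, ("feet", "y"): 0.25, ("feet", "z"): 0.4,
}

def overall_ride_value(weighted_rms):
    """Root-sum-of-squares of component ride values.

    weighted_rms maps (position, axis) to a frequency-weighted r.m.s. value.
    Each value is multiplied by its axis factor to give a component ride value.
    Any subset of the 12 axes may be supplied; unknown keys raise KeyError.
    """
    total = 0.0
    for key, a_w in weighted_rms.items():
        component = COMFORT_FACTORS[key] * a_w  # component ride value
        total += component ** 2
    return math.sqrt(total)
```

For example, seat x, y, and z values of 0.3, 0.2, and 0.5 m s−2 combine to about 0.62 m s−2, a value that could then be compared with the discomfort scale in Table 4.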
Table 2 Guide for Application of Frequency Weighting Curves as Defined in ISO 2631-1
Weighting name | Health | Comfort/perception | Motion sickness
Wk | z-axis, seat surface | z-axis, seat surface; z-axis, standing; vertical recumbent (except head); x-, y-, z-axes, feet (sitting; comfort only) | –
Wd | x-axis, seat surface; y-axis, seat surface | x-axis, seat surface; y-axis, seat surface; x-, y-axes, standing; horizontal recumbent; y-, z-axes, seat back (comfort only) | –
Wc | x-axis, seat back | x-axis, seat back | –
We | – | rotational motion around the x-, y-, z-axes, seat surface | –
Wf | – | – | vertical
Source: ISO, 1997. © 1997 ISO.
Overall ride values from different environments can be compared: a vehicle having the highest overall ride value would be expected to be the most uncomfortable with respect to vibration. The overall ride values can also be compared with the discomfort scale shown in Table 4. This scale indicates the approximate range of vibration magnitudes that are significant in relation to the range of vibration discomfort that might be experienced in vehicles. The full "12-axis" assessment is rarely achieved in practice due to its complexity and cost. The most important directions for assessment are the translational x-, y-, and z-axes; other measurements have little effect on overall assessments (e.g., Marjanen & Mansfield, 2010).

Figure 4 Acceleration frequency weightings Wk, Wd, Wf, Wc, and We (weighting gain against frequency in Hz) for whole-body vibration and motion sickness as defined in ISO 2631-1. (Source: ISO 1997. © 1997 ISO.)

Table 3 Application of Frequency Weightings for Evaluation of Vibration with Respect to Health and Discomfort for Seated Persons
Input position | Axis | Weighting | Axis-multiplying factor, health | Axis-multiplying factor, discomfort
Seat | x | Wd | 1.4 | 1.0
Seat | y | Wd | 1.4 | 1.0
Seat | z | Wk | 1.0 | 1.0
Seat | rx (roll) | We | – | 0.63
Seat | ry (pitch) | We | – | 0.4
Seat | rz (yaw) | We | – | 0.2
Seat back | x | Wc | * | 0.8
Seat back | y | Wd | – | 0.5
Seat back | z | Wd | – | 0.4
Feet | x | Wk | – | 0.25
Feet | y | Wk | – | 0.25
Feet | z | Wk | – | 0.4
Source: ISO, 1997. © 1997 ISO.
* ISO 2631-1 encourages measurement of x-axis vibration at the backrest and states that a multiplying factor of 0.8 should be applied; however, it does not include this measurement in risk calculations because of a lack of epidemiological data.

Table 4 Scale of Vibration Discomfort from International Standard 2631
r.m.s. weighted acceleration (m s−2) | Discomfort
Less than 0.315 | Not uncomfortable
0.315 to 0.63 | A little uncomfortable
0.5 to 1.0 | Fairly uncomfortable
0.8 to 1.6 | Uncomfortable
1.25 to 2.5 | Very uncomfortable
Greater than 2.0 | Extremely uncomfortable
Source: ISO, 1997. © 1997 ISO.

3.1.3 Effects of Vibration Duration
The overall magnitude of vibration alone is insufficient for a full understanding of the perception of comfort in a vehicle seat. Ebe and Griffin (2000a, 2000b) described how static and dynamic factors need to be considered in order to obtain an overall understanding of seat discomfort. This model was extended to consider the duration of exposure, showing not only that discomfort increased over time but also that exposure to vibration accelerated the onset of discomfort (Figure 5; Mansfield et al., 2014). The temporal factors can be partly mitigated by taking breaks during travel, and the benefit is enhanced when breaks are combined with some light physical exercise (Sammonds, Mansfield & Fray, 2017).

Figure 5 Conceptual model of overall car seat discomfort showing static, dynamic, and temporal factors, with dynamic fatigue (overall seat discomfort plotted against vibration magnitude and time). (Source: Mansfield et al., 2014. Licensed under CC BY 3.0.)

3.2 Interference with Activities
Vibration and motion can interfere with the acquisition of information (e.g., by the eyes), the output of information (e.g., by hand or foot movements), or the complex central processes that relate input to output (e.g., learning, memory, decision making). Effects of oscillatory motion on human performance may impair safety. The strongest evidence of whole-body vibration affecting performance concerns input processes (mainly vision) and output processes (mainly continuous hand control). In both cases there may be a disturbance occurring entirely outside the body (e.g., vibration of a viewed display or vibration of a hand-held control), a disturbance at the input or output (e.g., movement of the eye or hand), and a disturbance within the body affecting the peripheral nervous system (i.e., afferent or efferent nervous system). Central processes may also be affected by vibration, but understanding is currently too limited to make confident generalized statements (see Figure 6).

The effects of vibration on vision and manual control are most usually caused by the movement of the affected part of the body (i.e., eye or hand). The effects may be decreased by reducing the transmission of vibration to the eye or to the hand, or by making the task less susceptible to disturbance (e.g., increasing the size of a display or reducing the sensitivity of a control). Often, the effects of vibration on vision and manual control can be greatly reduced by redesigning the task.

3.2.1 Vision
Reading a screen in a moving vehicle may be difficult because the screen is moving, the eye is moving, or both the screen and the eye are moving.
There are many variables that affect visual performance in these conditions: the effects of vibration on vision cannot be represented adequately without considering the effects of these variables.

Figure 6 Information flow in a simple system and the areas where vibration may affect human activities. Vibration in the environment acts on the input device (display) and the output device (control); within the human body, information passes from the sensory system (eye) through the afferent system, the CNS, and the efferent system to the output system (hand).

Stationary Observer
When a stationary observer views a moving display, the eye may be able to track the position of the display using pursuit eye movements. This closed-loop reflex will give smooth pursuit movements of the eye and clear vision if the display is moving at frequencies less than about 1 Hz and with a low velocity. At slightly higher frequencies of oscillation (the precise value depending on the predictability of the motion waveform), the eye will make saccadic eye movements, redirecting the eye with small jumps. At frequencies greater than about 3 Hz, the eye will best be directed to one extreme of the oscillation and attempt to view the image while it is momentarily stationary as it reverses direction (i.e., at the "nodes" of the motion).

In some conditions, the absolute threshold for the visual detection of the vibration of an object occurs when the peak-to-peak oscillatory motion gives an angular displacement at the eye of approximately 1 min of arc. The acceleration required to achieve this threshold is very low at low frequencies but increases in proportion to the square of the frequency, becoming very high at high frequencies. When the vibration displacement is greater than the visual detection threshold, there will be perceptible blur if the vibration frequency is greater than about 3 Hz. The effects of vibration on visual performance (e.g., effects on reading speed and reading accuracy) may then be estimated from the maximum time that the image spends over some small area of the retina (e.g., the period of time spent near the nodes of the motion with sinusoidal vibration). For sinusoidal vibration this time is inversely proportional to the frequency of vibration and to the square root of the displacement, so reading errors increase as either increases (O'Hanlon & Griffin, 1971). With dual-axis vibration (e.g., combined vertical and lateral vibration of a display) this time is greatly reduced and reading performance drops greatly (Meddick & Griffin, 1976).
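The two geometric statements above can be sketched numerically: the acceleration needed for a sinusoidally vibrating object to reach a peak-to-peak angular displacement of about 1 min of arc at the eye, and the time a sinusoidally moving image spends near one node of the motion. This is a minimal sketch assuming pure sinusoidal motion perpendicular to the line of sight; the viewing distance, the tolerance band around the node, and the function names are illustrative assumptions, not values or methods from the cited studies.

```python
import math

ARC_MINUTE_RAD = math.radians(1.0 / 60.0)

def threshold_rms_acceleration(freq_hz, viewing_distance_m, angle_pp_rad=ARC_MINUTE_RAD):
    """R.m.s. acceleration of a sinusoidally vibrating object whose peak-to-peak
    motion subtends angle_pp_rad at the eye (default: about 1 min of arc)."""
    # Peak-to-peak linear displacement that subtends the angle at this distance
    d_pp = 2.0 * viewing_distance_m * math.tan(angle_pp_rad / 2.0)
    a_peak = (2.0 * math.pi * freq_hz) ** 2 * (d_pp / 2.0)  # a = (2*pi*f)^2 * D
    return a_peak / math.sqrt(2.0)

def node_dwell_time(freq_hz, peak_disp, band):
    """Time per cycle that x(t) = peak_disp * cos(2*pi*f*t) stays within `band`
    of one extreme value, i.e., near one node of the motion.

    Exact result: arccos(1 - band/peak_disp) / (pi * f). For a small band this is
    approximately sqrt(2*band/peak_disp) / (pi * f), i.e., inversely proportional
    to the frequency and to the square root of the displacement.
    """
    if not 0 < band <= 2 * peak_disp:
        raise ValueError("band must be positive and no larger than the peak-to-peak displacement")
    return math.acos(1.0 - band / peak_disp) / (math.pi * freq_hz)
```

At an assumed 0.5 m viewing distance, the threshold at 1 Hz is of the order of 0.002 m s−2 r.m.s. and a hundred times larger at 10 Hz, illustrating the square-law increase; doubling the frequency halves the dwell time, and quadrupling the displacement roughly halves it.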
With narrow-band random vibration there is a greater probability of low image velocity than with sinusoidal vibration of the same magnitude and predominant frequency, so reading performance tends to be less affected by random vibration than sinusoidal vibration (Moseley, Lewis & Griffin, 1982). Display vibration reduces the ability to see fine detail in displays while having little effect on the clarity of larger forms.
Vibrating Observer
If an observer is sitting or standing on a vibrating surface, the effects of vibration depend on the extent to which the vibration is transmitted to the eye. The motion of the head is highly dependent on body posture but is likely to occur in translational axes (i.e., the x-, y-, and z-axes) and in rotational axes (i.e., the roll, pitch, and yaw axes). Often, the predominant head motions affecting vision are in the vertical and pitch axes of the head. The dynamic response of the body may result in the greatest head acceleration in these axes at frequencies around 5 Hz, but vibration at higher and lower frequencies can also have large effects on vision. The addition of a helmet changes the dynamic response of the head, and the resonance frequency can decrease (e.g., Mansfield, 2020). The pitch motion of the head is well compensated by the vestibulo-ocular reflex, which helps to stabilize the line of sight of the eyes at frequencies less than about 10 Hz (e.g., Benson & Barnes, 1978). So, although there is often pitch oscillation of the head at 5 Hz, there is less pitch oscillation of the eyes at this frequency. Pitch oscillation of the head therefore has a smaller effect on vision than might be expected, unless the display is attached to the head, as with a helmet-mounted display (see Wells & Griffin, 1984). The effects on vision of translational oscillation of the head depend on viewing distance: the effects are greatest when close to a display. As the viewing distance increases, the retinal image motions produced by translational displacements of the head decrease until, when viewing an object at infinite distance, translational head displacement produces no retinal image motion (Griffin, 1976). For a vibrating observer there may be little difficulty with low-frequency pitch head motions when viewing a fixed display and no difficulty with translational head motions when viewing a distant display. The greatest problems for a vibrating observer occur with pitch head motion when the display is attached to the head and with translational head motion when viewing near displays.

Additionally, there may be resonances of the eye within the head, but these are highly variable between individuals and often occur at high frequencies (e.g., 30 Hz and greater); it is often possible to attenuate the vibration entering the body at these high frequencies.
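The dependence on viewing distance (Griffin, 1976) follows from simple geometry: a translational displacement of the eye changes the direction of the line of sight to a fixed target by an angle that shrinks as the target recedes. The sketch below assumes a displacement perpendicular to the line of sight and ignores compensatory eye movements; the function names and example values are illustrative only.

```python
import math

def retinal_angle_from_translation(head_disp_m, viewing_distance_m):
    """Angular change (rad) in the line of sight to a fixed target caused by a
    translational eye displacement perpendicular to the line of sight.

    Returns 0 for a target at infinite distance, as described in the text.
    """
    if math.isinf(viewing_distance_m):
        return 0.0
    return math.atan2(head_disp_m, viewing_distance_m)

def arcmin(angle_rad):
    """Convert radians to minutes of arc."""
    return math.degrees(angle_rad) * 60.0

if __name__ == "__main__":
    # 1 mm of head translation viewed at several distances
    for distance in (0.3, 1.0, 10.0, math.inf):
        angle = retinal_angle_from_translation(0.001, distance)
        print(f"{distance:>6} m: {arcmin(angle):.2f} arcmin")
```

The printed angles fall steadily with distance and reach zero at infinity, which is why translational head motion matters most when viewing near displays.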
Observer and Display Vibrating
When an observer and a display oscillate together, in phase, at low frequencies, the retinal image motions (and decrements in visual performance) are less than when either the observer or the display oscillates separately (Moseley & Griffin, 1986). However, the advantage is lost as the vibration frequency increases because there is an increasing phase difference between the oscillation of the head and the oscillation of the display. At frequencies around 5 Hz the phase lags between seat motion and head motion may be 90° or more (depending on seating conditions), which is sufficient to eliminate any advantage of moving the seat and the display together. Figure 7 shows an example of how the time taken to read information on a screen is affected for the three viewing conditions (display vibration with a stationary observer, vibrating observer with a stationary display, both observer and display vibrating) with sinusoidal vibration in the frequency range 0.5 to 5 Hz.

Figure 7 Average times taken to read information on a display (reading time in seconds against frequency, 0.5 to 5 Hz) for (a) stationary observers reading from a vibrating display, (b) vibrating observers reading from a stationary display, and (c) vibrating observers reading from a vibrating display with observer and display vibrating in phase. Data obtained with sinusoidal vertical vibration at 2.0 m s−2 r.m.s. (Source: Based on Moseley and Griffin, 1986.)

3.2.2 Manual Control
Simple and complex manual control tasks can also be impeded by vibration. The characteristics of the task and the characteristics of the vibration combine to determine the effects of vibration on activities: a given vibration may greatly affect the performance of one task but have little effect on the performance of another.
Effects Produced by Vibration
The most obvious consequence of vibration on a continuous manual control task is the direct mechanical loading of the hand causing unwanted movement of the control. This is sometimes called breakthrough, feedthrough, or vibration-correlated error. The inadvertent movement of a finger while using a touch screen in a vehicle is a form of vibration-correlated error. In a simple tracking task, where the operator is required to follow movements of a target, some of the error will also be correlated with the target movements. This is called input-correlated error and often mainly reflects the inability of an operator to follow the target without the delays inherent in visual, cognitive, and motor activity. The part of the tracking error that is correlated with neither the vibration nor the tracking task is called the "remnant." This includes operator-generated noise and any source of nonlinearity: drawing a freehand straight line does not result in a perfect straight line even in the absence of environmental vibration. The effects of vibration on vision can increase the remnant with some tracking tasks, and some studies show that vibration, usually at frequencies greater than about 20 Hz, interferes with neuromuscular processes, which may be expected to increase the remnant. The causes of the three components of the tracking error are shown in the model presented as Figure 8.

Effects of Task Variables
The gain (i.e., sensitivity) of a control determines the control output corresponding to a given force, or displacement, applied to the control by the operator. The optimum gain in static conditions (high enough not to cause fatigue but low enough to prevent inadvertent movement) is likely to be greater than the optimum gain during exposure to vibration, where inadvertent movement is more likely (Lewis & Griffin, 1977).
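The three-way split of tracking error shown in Figure 8 can be illustrated with a deliberately simplified simulation. The sketch below assumes that the error is a static (frequency-independent) linear combination of the target signal and the vibration plus independent operator noise; real transfer functions are frequency dependent, so a frequency-domain treatment would be needed for measured data. The gains, signals, and function names are hypothetical. Ordinary least squares on the two known inputs recovers the input-correlated and vibration-correlated parts, and the residual plays the role of the remnant.

```python
import math
import random

def decompose_error(error, target, vibration):
    """Split a tracking-error record into input-correlated, vibration-correlated,
    and remnant parts by least-squares regression on the two known inputs.

    Assumes zero-mean signals and a static linear relation.
    Returns (gain_on_target, gain_on_vibration, remnant_samples).
    """
    # Normal equations for error ~ g1*target + g2*vibration
    stt = sum(t * t for t in target)
    svv = sum(v * v for v in vibration)
    stv = sum(t * v for t, v in zip(target, vibration))
    ste = sum(t * e for t, e in zip(target, error))
    sve = sum(v * e for v, e in zip(vibration, error))
    det = stt * svv - stv * stv
    if det == 0:
        raise ValueError("target and vibration signals are collinear")
    g1 = (ste * svv - sve * stv) / det
    g2 = (sve * stt - ste * stv) / det
    # Whatever neither input explains is treated as the remnant
    remnant = [e - g1 * t - g2 * v for e, t, v in zip(error, target, vibration)]
    return g1, g2, remnant

def rms(x):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(s * s for s in x) / len(x))

if __name__ == "__main__":
    rng = random.Random(1)
    n, fs = 5000, 100.0  # 50 s sampled at 100 Hz
    # Hypothetical signals: slow 0.3 Hz target, 4 Hz vibration, random operator noise
    target = [math.sin(2 * math.pi * 0.3 * k / fs) for k in range(n)]
    vibration = [math.sin(2 * math.pi * 4.0 * k / fs) for k in range(n)]
    error = [0.4 * t + 0.2 * v + rng.gauss(0.0, 0.05) for t, v in zip(target, vibration)]

    g1, g2, remnant = decompose_error(error, target, vibration)
    print(f"input-correlated gain {g1:.3f}, vibration-correlated gain {g2:.3f}")
    print(f"remnant r.m.s. {rms(remnant):.3f}")
```

With these assumed signals the recovered gains are close to the simulated 0.4 and 0.2, and the remnant r.m.s. is close to the 0.05 noise level that was added.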
First-order and second-order control tasks (i.e., rate and acceleration control tasks) are more difficult than zero-order tasks (i.e., displacement control tasks) and so tend to give more errors. However, such controls may sometimes offer an advantage because they are less affected by vibration breakthrough at higher vibration frequencies. In static conditions, isometric controls (which respond to force without movement) tend to result in better tracking performance than isotonic controls (which respond to movement but require no applied force). However, several studies show that isometric controls may suffer more from the effects of vibration (e.g., Allen, Jex, & Magdaleno, 1973; Levison & Harrah, 1977). The relative merits of the two types of control and the optimum characteristics of a spring-centered control will depend on control gain and control order. The results of studies investigating the influence of the position of a control appear consistent with differences being dependent on the transmission of vibration to the hand in different positions (e.g., Shoenberger & Wilburn, 1973). Torle (1965) showed that the provision of an armrest could substantially reduce the effects of vibration on the performance of a task with a side-arm controller, and this was reinforced by Newell and Mansfield (2008), who showed reduced workload when armrests were used with controllers. The shape and orientation of controls may also be expected to affect performance, either by modifying the amount of vibration breakthrough or by altering the proprioceptive feedback to the operator.
Vibration may also affect the performance of tracking tasks by reducing the visual performance of the operator. Wilson (1974) and McLeod and Griffin (1990) have shown that collimating a display by means of a lens, so that the display appears to be at infinity, can reduce, or even eliminate, errors with some tasks. It is possible that visual disruption has played a significant part in the performance decrements reported in other experimental studies of the effects of vibration on manual control.
At very low levels of vibration, a task might be unaffected by vibration, a region denoted as automatic in Figure 9. As vibration increases, there will be vibration breakthrough: objective task performance may remain unaffected, but the operator's workload increases as they adapt to the physical environment (adaptation). The next threshold occurs when objective performance is affected by the vibration and there is an observable compromise in performance (compromise). The final threshold occurs at the tolerance limit, when the vibration is so severe that the task is no longer possible (failure). With higher skill, the initial performance can be elevated, giving a higher margin that extends the compromise zone and therefore extends the range of vibration tolerance before failure occurs. A better-designed task will decrease the gradients of the increase in workload and the decrease in performance, again extending the range of vibration tolerance before failure occurs.
Figure 8 Linear model of a pursuit manual control system showing how tracking errors may be caused by the vibration (vibration-correlated error), the task (input-correlated error), or some other cause (remnant). [Diagram elements: vibration, biomechanical transfer function, summing junction (∑, + and –), control transfer function, tracking output, tracking error.]
Figure 9 Model of effect of vibration on workload to complete a task and performance of task completion. [Vibration, from zero to high with the tolerance limit marked, on the horizontal axis; workload and performance, from low to high, on the vertical axis; zones labelled automatic, adaptation, compromise, and failure.]
Effects of Vibration Variables
The vibration transmissibility of the body is approximately linear (i.e., doubling the magnitude of vibration at the seat may be expected to approximately double the magnitude of vibration at the head or at the hand). Vibration-correlated error, task completion time, and workload may therefore increase in approximately linear proportion to vibration magnitude (Baker & Mansfield, 2010).
There is no simple relation between the frequency of vibration and its effects on control performance. The effects of frequency depend on the control order (which varies between tasks) and on the biodynamic responses of the body (which vary with posture and between operators). With zero-order tasks and the same magnitude of acceleration at each frequency, the effects of vertical seat vibration may be greatest in the range 3–8 Hz, since transmissibility to the shoulders is greatest in this range (see McLeod & Griffin, 1989). With horizontal whole-body vibration (i.e., in the x- and y-axes of the seated body), the greatest effects appear to occur at lower frequencies: around 2 Hz or below. Again, this corresponds to the frequencies at which there is greatest transmission of vibration to the shoulders. The axis of the control task most affected by vibration may not be the same as the axis in which most vibration occurs at the seat. Often, fore-and-aft movements of the control (which generally correspond to vertical movements on a display) are most affected by vertical whole-body vibration. Few controls are sensitive to vertical hand movements, and these have rarely been studied. Multiple-frequency vibration causes more disruption to manual control performance than any one of its constituent single frequencies presented alone. Similarly, the effects of multiple-axis vibration are greater than the effects of any single-axis vibration.
Other Variables
Repeated exposure to vibration may allow subjects to develop techniques to minimize vibration effects, for example, by adjusting body posture to reduce the transmission of vibration to the head or the hand, or by learning how to recognize images blurred by vibration (e.g., Mansfield, 2020). Results of experiments performed in a single session of vibration exposure may not apply to situations where operators have an opportunity to learn techniques to ameliorate the effects of vibration. There have been few investigations of the effects of vibration on common everyday tasks. Corbridge and Griffin (1991) found that the effects of vertical whole-body vibration on spilling liquid from a hand-held cup were greatest close to 4 Hz. They also found that writing speed and subjective estimates of writing difficulty were most affected by vertical vibration in the range 4–8 Hz. Although 4 Hz was a sensitive frequency for both the drinking task and the writing task, the dependence on frequency of the effects of vibration was different for the two activities.
3.2.3 Cognitive Tasks
To be useful, studies of cognitive effects of vibration must be able to show that any changes associated with exposure to vibration were not caused by vibration affecting input processes (e.g., vision) or output processes (e.g., hand control). Only a few investigators have addressed possible cognitive effects of vibration with this care and demonstrated such effects (e.g., Sherwood & Griffin, 1990, 1992; Shoenberger, 1974). The effects of vibration on cognitive tasks can also be studied through workload, for which the NASA-TLX is the most commonly used metric (Hart & Staveland, 1988). Such studies have reliably shown that while it is possible to compensate for the mechanical effects of vibration, there is an associated workload cost (e.g., Paddan et al., 2012; Figure 9).
3.3 Health Effects
Epidemiological studies have reported disorders among persons exposed to vibration from occupational, sport, and leisure activities (see Bovenzi, 2009; Bovenzi & Hulshof, 1998; Dupuis & Zerlett, 1986; Griffin, 1990; Mansfield & Marshall, 2001; National Institute for Occupational Safety & Health (NIOSH), 1997). The studies do not all agree on either the type or the extent of disorders, and the findings have rarely been related to measurements of the vibration exposures. However, the incidence of some disorders of the back (back pain, displacement of intervertebral discs, degeneration of spinal vertebrae, osteoarthritis, etc.) appears to be greater in some groups of vehicle operators, and it is thought that this is sometimes associated with their vibration exposure. There may be several alternative causes of an increase in disorders of the back among persons exposed to vibration (e.g., poor sitting postures, heavy lifting).
It is not always possible to conclude confidently that a back disorder in a person occupationally exposed to whole-body vibration is solely, or primarily, caused by vibration. Mechanical shocks can cause acute injuries such as spinal fractures and disc herniation. Such shocks can be particularly severe in high-speed marine craft and military applications.
3.3.1 Vibration Evaluation
Epidemiological data alone are not sufficient to define how to evaluate whole-body vibration so as to predict the relative risks to health from the different types of vibration exposure. Current guidance is therefore based on such data considered in combination with an understanding of biodynamic responses and subjective responses. The manner in which the health effects of oscillatory motions depend upon the frequency, direction, and duration of motion is currently assumed to be similar to that for vibration discomfort (see Section 3.1). It is important to consider not only the vibration magnitude but also the vibration duration, and to combine all exposures within a given period into a quantity analogous to a vibration 'dose'.
3.3.2 International Standard 2631-1
International Standard 2631-1 (ISO, 1997) offers two different methods of evaluating vibration severity with respect to health effects. The first, the frequency-weighted r.m.s. acceleration, $a_{w,\mathrm{r.m.s.}}$, uses an exponent of 2 and averages the frequency-weighted acceleration, $a_w(t)$, over the measurement duration, $T$. Short and long measurements of the same steady vibration exposure will therefore give identical values:

$$a_{w,\mathrm{r.m.s.}} = \left[\frac{1}{T}\int_0^T a_w^2(t)\,dt\right]^{1/2}$$
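A discrete version of the r.m.s. integral above is simply the square root of the mean of the squared samples. The sketch below assumes the samples are already frequency weighted (the weighting filters of the standard are not implemented here), and `weighted_rms` and `sine` are illustrative names. It shows that a 1-s and a 60-s record of the same steady sinusoid give the same value.

```python
import math


def weighted_rms(samples):
    """Discrete estimate of a_w,r.m.s. (m/s^2) from uniformly sampled,
    already frequency-weighted acceleration: sqrt(mean of squares)."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(a * a for a in samples) / len(samples))


def sine(amplitude, freq_hz, duration_s, fs_hz):
    """Uniformly sampled sinusoid, standing in for weighted acceleration."""
    n = int(round(duration_s * fs_hz))
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / fs_hz) for i in range(n)]


# Same steady 5-Hz vibration (amplitude sqrt(2) m/s^2) measured for 1 s and for 60 s.
short_rms = weighted_rms(sine(math.sqrt(2), 5.0, 1.0, 1000.0))
long_rms = weighted_rms(sine(math.sqrt(2), 5.0, 60.0, 1000.0))
print(round(short_rms, 6), round(long_rms, 6))  # 1.0 1.0
```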
The 2010 amendment of ISO 2631-1 added the 8-hour dose, A(8):

$$A(8) = \sqrt{\frac{1}{8}\sum_{n=1}^{N} a_{wn}^2\, t_n}$$

where $N$ is the number of different exposures being considered, $a_{wn}$ is the frequency-weighted r.m.s. acceleration for exposure $n$, and $t_n$ is the duration of exposure $n$ (in hours, consistent with the 8-h reference period). This method allows multiple exposures of different durations and vibration magnitudes to be combined into a single value for comparison with exposure criteria.
The second method of vibration evaluation is the vibration dose value (VDV), which uses an exponent of 4 with no averaging over time:

$$\mathrm{VDV} = \left\{\int_0^T [a_w(t)]^4\,dt\right\}^{1/4}$$

As a result, the VDV emphasizes high-acceleration events more than the r.m.s. does, and its value continually increases over the course of a measurement (Mansfield, 2004). However, it is difficult to extrapolate from measurements to an overall daily dose measure.
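The A(8) and VDV formulas above can be sketched in a few lines. This is an illustration under stated assumptions, not an implementation of the standard: durations for A(8) are in hours, VDV samples are assumed to be already frequency weighted and uniformly sampled, and the caution-zone helper uses the 8.5 and 17 m s−1.75 VDV boundaries quoted in ISO 2631-1, with the handling of a value exactly on a boundary being an assumption. All function names are hypothetical.

```python
import math


def a8(exposures):
    """A(8) in m/s^2 from (a_wn in m/s^2 r.m.s., t_n in hours) pairs."""
    return math.sqrt(sum(a_wn ** 2 * t_n for a_wn, t_n in exposures) / 8.0)


def vdv(samples, fs_hz):
    """Discrete VDV estimate (m/s^1.75): (sum of a^4 * dt) ** (1/4)."""
    dt = 1.0 / fs_hz
    return (sum(a ** 4 for a in samples) * dt) ** 0.25


def rms(samples):
    """Plain r.m.s., used only to contrast with the VDV."""
    return math.sqrt(sum(a * a for a in samples) / len(samples))


def hgcz_by_vdv(vdv_value):
    """Position relative to the HGCZ VDV boundaries (8.5 and 17 m/s^1.75).

    Assigning a value exactly on a boundary to the lower band is an assumption.
    """
    if vdv_value <= 8.5:
        return "below caution zone"
    if vdv_value <= 17.0:
        return "within caution zone"
    return "above caution zone"


# Two exposures in one day: 1.0 m/s^2 for 2 h, then 0.5 m/s^2 for 6 h.
print(round(a8([(1.0, 2.0), (0.5, 6.0)]), 3))  # 0.661

# A steady 1 m/s^2 signal (100 Hz, 60 s) versus the same signal with one
# 10 m/s^2 spike: the VDV reacts far more to the spike than the r.m.s. does.
steady = [1.0] * 6000
spiky = steady[:-1] + [10.0]
print(rms(steady), rms(spiky))
print(vdv(steady, 100.0), vdv(spiky, 100.0))
```

Because the VDV has no averaging, quadrupling the duration of a steady signal multiplies it by the fourth root of 4 (about 1.41), whereas the r.m.s. is unchanged.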
ISO 2631-1 defines a health guidance caution zone between 10 min and 24 h (Figure 10). Below a boundary corresponding to a vibration dose value of 8.5 m s−1.75, it is suggested that "health risks have not been objectively observed"; between 8.5 and 17 m s−1.75, "caution with respect to health risks is indicated"; and above 17 m s−1.75, "health risks are likely." Equivalent boundaries of 0.43 m s−2 and 0.87 m s−2 are defined for A(8). The zones defined by the two methods largely overlap between 4 h and 8 h but are not identical, and they diverge for longer and shorter durations. At durations of about 10 minutes, an exposure can exceed the zone when assessed by the VDV yet fall below it when assessed by the r.m.s.; the method must therefore be used with caution.
3.3.3 International Standard 2631-5
International Standard 2631-5 (ISO, 2018) gives methods for the assessment of vertical shocks of up to 14 g. The methods are defined both for cases where free fall occurs before the shock and for less severe motions without free fall. A model of the spinal response is applied to the measured acceleration, and the positive peaks in the resulting signal are identified. The method uses an exponent of 6 to calculate an acceleration dose, $D_z$, thereby emphasizing the shocks in a signal. The dose is defined by

$$D_z = 1.07\left(\sum_i A_{z,i}^6\right)^{1/6}$$

where $A_{z,i}$ is the $i$th peak in acceleration. The peak-finding algorithm is illustrated in Figure 11, which shows how discrete points in the waveform are considered in later stages of the calculation. A daily dose, $D_{zd}$, can be calculated by extrapolating the acceleration dose:

$$D_{zd} = D_z\left(\frac{t_d}{t_m}\right)^{1/6}$$

where $t_d$ is the time period of the exposure and $t_m$ is the time period of the measurement.
3.3.4 EU Physical Agents Directive (2002)
The Parliament and Commission of the European Community have defined minimum health and safety requirements for the exposure of workers to the risks arising from vibration
(European Parliament and the Council of the European Union, 2002). For whole-body vibration, the Directive defines an 8-h equivalent exposure action value of 0.5 m s−2 r.m.s. and an 8-h equivalent exposure limit value of 1.15 m s−2 r.m.s. At the heart of the Directive is the requirement to minimize risks from vibration. The Directive says that workers shall not be exposed above the "exposure limit value." If the "exposure action value" is exceeded, the employer shall establish and implement a program of technical and/or organizational measures intended to reduce to a minimum exposure to mechanical vibration and the attendant risks. The Directive says that workers exposed to vibration in excess of the exposure action values shall be entitled to appropriate health surveillance. Health surveillance is also required if there is any reason to suspect that workers may be injured by the vibration, even if the exposure action value is not exceeded. The probability of injury arising from occupational exposures to whole-body vibration at the exposure action value and the exposure limit value cannot be estimated, because epidemiological studies have not yet produced dose–response relationships. However, it seems clear that the Directive does not define safe exposures to whole-body vibration: because the values are 8-h equivalent r.m.s. values, short exposures can reach extraordinarily high magnitudes of vibration (and shock) before exceeding them, and such exposures may be assumed to be hazardous (Griffin, 2004). The vibration dose value procedure suggests more reasonable vibration magnitudes for short-duration exposures, and ISO 2631-5 does so for extreme shocks (Rantaharju, Mansfield, Ala-Hiiro, & Gunston, 2015).
Figure 10 Health guidance caution zones (HGCZ) as defined in ISO 2631-1 [weighted acceleration (m s−2) against exposure duration (s) on logarithmic axes, with 10 min, 4 h, and 8 h marked]. Dotted lines show the zone defined using r.m.s.; solid lines show the zone defined using VDV. Between 4 and 8 hours the zone is similar for both calculation methods.
Figure 11 Plots of an earth-moving machine whole-body vibration z-axis acceleration (m s−2) against time (3308.0 to 3311.0 s): (a) unfiltered waveform, (b) waveform after application of the ISO 2631-5 spinal response model, with peaks identified by the peak identification algorithm highlighted by an "x" symbol. Positive acceleration in (b) corresponds to compressive load.
3.4 Disturbance in Buildings
Acceptable magnitudes of vibration in buildings are generally close to, or below, vibration perception thresholds (Morioka & Griffin, 2008).
The effects of vibration in buildings are assumed to depend on the use of the building, in addition to the vibration frequency, the vibration direction, and the vibration duration. International Standard 2631-2 (ISO, 2003) provides some information on the measurement and evaluation of building vibration, but limited practical guidance. British Standard 6472-1 (BSI, 2008) offers guidance on the measurement, the evaluation, and the assessment of vibration in buildings, and BS 6472-2 (BSI, 2008) defines a method for assessing the vibration of buildings caused by blasting. Using the guidance in BS 6472-1 (BSI, 2008), it is possible to predict the acceptability of vibration in different types of building by reference to a simple table of vibration dose values (see Table 5). The vibration dose values in Table 5 are applicable irrespective of whether the vibration occurs as continuous vibration, intermittent vibration, or repeated shocks.
Table 5 Vibration Dose Value Ranges Expected to Result in Various Degrees of Adverse Comment in Residential Buildings
Place | Low probability of adverse comment (m s−1.75) | Adverse comment possible (m s−1.75) | Adverse comment probable (m s−1.75)
Residential buildings, 16-h day | 0.2–0.4 | 0.4–0.8 | 0.8–1.6
Residential buildings, 8-h night | 0.1–0.2 | 0.2–0.4 | 0.4–0.8
Source: Based on BSI, 2008.
Note: For offices and workshops, multiplying factors of 2 and 4, respectively, can be applied to the vibration dose value ranges for a 16-h day.
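Table 5 lends itself to a small lookup. The sketch below is illustrative only: `building_comment` is a hypothetical name, values below the lowest tabulated range are assumed to fall under "low probability of adverse comment", a value exactly on a range edge is assigned to the lower category (an assumption), and the office and workshop factors are applied only to the 16-h day ranges, as the table note states.

```python
# Upper edges of the Table 5 VDV ranges (m/s^1.75) for residential buildings.
TABLE5_UPPER_EDGES = {
    "16-h day": (0.4, 0.8, 1.6),
    "8-h night": (0.2, 0.4, 0.8),
}
LABELS = (
    "low probability of adverse comment",
    "adverse comment possible",
    "adverse comment probable",
)


def building_comment(vdv, period, multiplying_factor=1.0):
    """Classify a VDV against Table 5.

    multiplying_factor: 1 for residential, 2 for offices, 4 for workshops
    (the table note applies these to the 16-h day ranges only).
    """
    if multiplying_factor != 1.0 and period != "16-h day":
        raise ValueError("multiplying factors apply to the 16-h day ranges only")
    edges = [edge * multiplying_factor for edge in TABLE5_UPPER_EDGES[period]]
    for edge, label in zip(edges, LABELS):
        if vdv <= edge:
            return label
    return "above the tabulated ranges"


print(building_comment(0.3, "16-h day"))   # low probability of adverse comment
print(building_comment(0.3, "8-h night"))  # adverse comment possible
```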
3.5 Biomechanics
The human body is a complex mechanical system that does not, in general, respond to vibration in the same manner as a rigid mass: relative motions between the body parts vary with the frequency and the direction of the applied vibration. Although there are resonances in the body, it is an oversimplification to summarize the dynamic responses of the body by merely mentioning one or two resonance frequencies. The biomechanics of the body affect human responses to vibration, but the discomfort, the interference with activities, and the health effects of vibration cannot be well predicted solely by considering the body as a mechanical system.
3.5.1 Transmissibility of the Human Body
The extent to which the vibration at the input to the body (e.g., the vertical vibration at a seat) is transmitted to a part of the body (e.g., vertical vibration at the head or the hand) is described by the transmissibility. At low frequencies of oscillation (e.g., less than about 1 Hz), the oscillations of the seat and the body are very similar, and so the transmissibility is approximately 1.0. With increasing frequency of oscillation, the motions of the body rise above those measured at the seat; the ratio of the motion of the body to the motion of the seat will reach a peak at one or more frequencies (i.e., resonance frequencies). At high frequencies, the body motion will be less than that at the seat. The resonance frequencies and the transmissibilities at resonance vary with where the vibration is measured on the body and with the posture of the body. For seated persons, there may be resonances to the head and the hand at frequencies in the range 4–12 Hz when exposed to vertical vibration, at frequencies less than 4 Hz when exposed to x-axis (i.e., fore-and-aft) vibration, and at frequencies less than 2 Hz when exposed to y-axis (i.e., lateral) vibration (see Paddan & Griffin, 1988a, 1988b).
A seat back can greatly increase the transmission of fore-and-aft seat vibration to the head and upper-body of seated people, and bending of the legs can greatly affect the transmission of vertical vibration to the heads of standing persons. 3.5.2 Mechanical Impedance of the Human Body Mechanical impedance reflects the relation between the driving force at the input to the body and the resultant movement of the body. If the human body were rigid, the ratio of force to acceleration applied to the body would be constant and indicate the mass of the body. Because the human body is not rigid, the ratio of force to acceleration is only close to the body mass at very low frequencies (less than about 2 Hz with vertical vibration; less than about 1 Hz with horizontal vibration). Measures of mechanical impedance usually show a principal resonance for the vertical vibration of seated people at about 5 Hz, and sometimes a second resonance in the range 7 to 12 Hz (Fairley & Griffin, 1989; Griffin, 2000, 2001; Mansfield & Maeda, 2007; Matsumoto, Nawayseh & Griffin, 2003). Unlike some of the resonances affecting the transmissibility of the body, these resonances are only influenced by movement of large masses close to the input of vibration to the body. The large difference in impedance between that of a rigid mass and that of the human body means that the body cannot usually be represented by a rigid mass when measuring the vibration transmitted through seats. 3.5.3 Biomechanical Models Various mathematical models of the responses of the body to vibration have been developed (Mansfield, 2019). A simple model with one or two degrees-of-freedom can provide an adequate representation of the vertical mechanical impedance
of the body (e.g., Fairley & Griffin, 1989; Wei & Griffin, 1998a) and be used to predict the transmissibility of seats (Wei & Griffin, 1998b) or construct an anthropodynamic dummy for seat testing (Lewis & Griffin, 2002). Compared with mechanical impedance, the transmissibility of the body is affected by many more variables and so requires a more complex model reflecting the posture of the body and the translation and rotation associated with the various modes of vibration (Matsumoto & Griffin, 2001). 3.6 Protection from Whole-Body Vibration Wherever possible, vibration should be reduced at the source. This may involve reducing the undulations of the terrain, or reducing the speed of travel of vehicles, or improving the balance of rotating parts. Methods of reducing the transmission of vibration to operators require an understanding of the characteristics of the vibration environment and the route for the transmission of vibration to the body. For example, the magnitude of vibration often varies with location: lower magnitudes will be experienced in some areas adjacent to machinery or in different parts of vehicles. 3.6.1 Seating Dynamics Almost all vehicle seats exhibit a resonance at low frequencies that results in a range of frequencies where higher magnitudes of vertical vibration occur on the seat surface than on the floor. At high frequencies there is usually attenuation of vibration. The resonance frequencies of common seats are usually in the region of 4 Hz (see Figure 12). The amplification at resonance is partially determined by the “damping” in the seat. Increases in the damping of a seat cushion tend to reduce the amplification at resonance but increase the transmission of vibration at higher frequencies. The variations in transmissibility between seats are sufficient to result in significant differences in the vibration experienced by people supported by different seats. 
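The damping trade-off described above can be illustrated with the textbook transmissibility of a single-degree-of-freedom system: a mass on a spring and viscous damper, driven at its base. This is a deliberately crude stand-in for a seat and occupant; as noted earlier, the human body is not a rigid mass, so the sketch only shows the qualitative effect of damping. The 4-Hz natural frequency echoes the typical seat resonance mentioned in the text, and the damping ratios are arbitrary illustrative values.

```python
import math


def sdof_transmissibility(freq_hz, natural_freq_hz, damping_ratio):
    """|T| for a rigid mass on a spring and viscous damper driven at its base."""
    r = freq_hz / natural_freq_hz
    two_zeta_r_sq = (2.0 * damping_ratio * r) ** 2
    return math.sqrt((1.0 + two_zeta_r_sq) / ((1.0 - r * r) ** 2 + two_zeta_r_sq))


# More damping lowers the peak at 4 Hz but raises transmission well above it.
for zeta in (0.1, 0.3, 0.5):
    print(zeta, [round(sdof_transmissibility(f, 4.0, zeta), 2) for f in (2.0, 4.0, 8.0, 16.0)])
```

In this idealization every curve passes through a transmissibility of 1 at √2 times the natural frequency; above that frequency, the more heavily damped system transmits more vibration.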
A simple numerical indication of the isolation efficiency of a seat for a specific application is provided by the seat effective amplitude transmissibility (SEAT) (Griffin, 1990). A SEAT value greater than 100% indicates that, overall, the vibration on the seat is "worse" than the vibration on the floor beneath the seat:

$$\mathrm{SEAT}(\%) = \frac{\text{ride comfort on seat}}{\text{ride comfort on floor}} \times 100$$

Values below 100% indicate that the seat has provided some useful attenuation. Seats should be designed to have the lowest SEAT value compatible with other constraints. SEAT values tend to cluster within a range that depends on the vehicle type, but it is not uncommon to measure a SEAT value greater than 100%, meaning that the seat has increased the vibration magnitude (Paddan & Griffin, 2002). In practice, the SEAT value is a mathematical procedure for predicting the effect of a seat on ride comfort. The ride comfort that would result from sitting on the seat or on the floor can be predicted using the frequency weightings in the appropriate standard. The SEAT value may be calculated from the r.m.s. values or the vibration dose values of the frequency-weighted acceleration on the seat and the floor; using vibration dose values, for example:

$$\mathrm{SEAT}(\%) = \frac{\text{vibration dose value on seat}}{\text{vibration dose value on floor}} \times 100$$

The SEAT value is a characteristic of the vibration input and not merely a description of the dynamics of the seat: different values are obtained with the same seat in different vehicles. The SEAT value indicates the suitability of a seat for a particular type of vibration.
In suspension seats, a separate suspension mechanism is provided beneath the seat pan. These seats, used in some off-road vehicles, trucks, and coaches, have low resonance frequencies (often less than about 2 Hz) and so can attenuate vibration at frequencies much greater than 2 Hz. The transmissibilities of these seats are usually determined by the seat manufacturer, but their isolation efficiencies vary with operating conditions. For high-shock environments, such as high-speed marine craft and some military applications, suspension seats can have long travel, enabling the seat to absorb a significant amount of shock and vibration energy. Even with such seats, the residual vibration exposure can still exceed regulatory vibration limits.
Figure 12 Comparison of vertical transmissibilities and SEAT values for 10 alternative cushions of passenger railway seats [transmissibility against frequency, 0–25 Hz; cushions: spring cushion, spring case A, spring case B, foam A, foam cushion, rubberized hair, foam B, foam C, 60-mm foam, 30-mm foam; SEAT values shown range from 102% to 141%]. (Source: Data from Corbridge et al., 1989.)
4 MOTION SICKNESS
Motion sickness is not an illness but a normal response to motion that is experienced by many fit and healthy people. A variety of different motions can cause sickness and reduce the comfort, impede the activities, and degrade the well-being of both those directly affected and those associated with the motion-sick person. Although vomiting can be the most inconvenient consequence, other effects (e.g., yawning, cold sweating, nausea, stomach awareness, dry mouth, increased salivation, headaches, bodily warmth, dizziness, and drowsiness) can also be unpleasant. In some cases, the symptoms can be so severe as to reduce motivation to survive difficult situations.
4.1 Causes of Motion Sickness
Motion sickness can be caused by many different movements of the body (e.g., translational and rotational oscillation, constant speed rotation about an off-vertical axis), movements of the visual scene, and various other stimuli producing sensations associated with movement of the body (see Griffin, 1991). Motion sickness is neither explained nor predicted solely by the physical characteristics of the motion, although some motions can reliably be predicted as more nauseogenic than others. Motion sickness has been well described in many transport systems including sea, land, air and space travel. It is also common in environments where there is apparent movement such as when viewing screens with a wide field of view (e.g., gaming, CAD, cinema), or with VR headsets. Mitigation of motion sickness for autonomous vehicles is a significant challenge (Diels & Bos, 2016).
Motions of the body may be detected by three basic sensory systems: the vestibular system, the visual system, and the somatosensory system. An additional "control" system can also be considered for drivers and operators (Mansfield, 2005). The vestibular system is located in the inner ear and comprises the semicircular canals, which respond to rotation of the head, and the otoliths, which respond to translational forces (either translational acceleration or rotation of the head relative to an acceleration field, such as the force of gravity). The eyes may detect relative motion between the head and the environment, caused by head movements (in translation or rotation), by movements of the environment, or by a combination of the two. The somatosensory systems respond to force and displacement of parts of the body and give rise to sensations of body movement, or force. It is assumed that in "normal" environments the movements of the body are detected by all sensory systems and that this leads to an unambiguous indication of the movements of the body in space; for a driver, this indication will also correspond to the control inputs. In some other environments, the three sensory systems may give signals corresponding to different motions (or to motions that are not realistic), leading to some form of conflict. This leads to the idea of a sensory conflict theory of motion sickness, in which sickness occurs when the signals from the sensory systems conflict with one another. However, this attributes some absolute significance to sensory information, whereas the "meaning" of the information is probably learned.
This led to the sensory rearrangement theory of motion sickness, which states that all situations that provoke motion sickness are characterized by a condition of sensory rearrangement in which the motion signals transmitted by the eyes, the vestibular system, and the non-vestibular proprioceptors are at variance either with one another or with what is expected from previous experience (Reason, 1970, 1978). Reason and Brand (1975) suggest that the conflict can adequately be considered in two categories: intermodality (between vision and the vestibular receptors) and intramodality (between the semicircular canals and the otoliths within the vestibular system). For both categories, it is possible to identify three types of situations in which conflict can occur (see Table 6). The theory implies that all situations that provoke motion sickness can be fitted into one of the six conditions shown in Table 6 (see Griffin, 1990).
There is evidence that the average susceptibility to sickness among males is less than that among females, and that susceptibility decreases with increasing age among both males and females (Clemes & Howarth, 2005; Lawther & Griffin, 1988a; Turner & Griffin, 1999). However, the individual differences within any group of either gender at any age are larger than these group differences: some people are easily made ill by motions that can be endured indefinitely by others. The reasons for these differences are not properly understood.
4.2 Sickness Caused by Oscillatory Motion
Motion sickness is not caused by oscillation (however violent) at frequencies much greater than about 1 Hz: the phenomenon arises from motions at the low frequencies associated with normal postural control of the body. Various experimental investigations have explored the extent to which vertical oscillation causes sickness at different frequencies. These studies have allowed the formulation of a frequency weighting, $W_f$ (see Figure 4), and the definition of a motion sickness dose value, MSDV. The frequency weighting $W_f$ reflects greatest sensitivity to acceleration in the range 0.125–0.25 Hz, with a rapid reduction in sensitivity at higher frequencies. The motion sickness dose value predicts the probability of sickness from knowledge of the frequency and magnitude of vertical oscillation (see International Organization for Standardization (ISO), 1997; Lawther & Griffin, 1987):

$$\mathrm{MSDV} = a_{\mathrm{rms}}\, t^{1/2}$$

where $a_{\mathrm{rms}}$ is the r.m.s. value of the frequency-weighted acceleration (m s−2 r.m.s.) and $t$ is the exposure period (seconds), so that the MSDV has units of m s−1.5. The percentage of unadapted adults who are expected to vomit is given by MSDV/3. (These relationships have been derived from exposures in which up to 70% of persons vomited during exposures lasting between 20 min and 6 h.)
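As a worked illustration of the two relationships above, the sketch below computes the MSDV for an assumed 2-h exposure to 0.5 m s−2 r.m.s. of $W_f$-weighted vertical acceleration and converts it to the expected percentage of unadapted adults who vomit. The input values and function names are hypothetical; the weighting itself is not implemented, and the result should not be trusted outside the 20 min to 6 h range from which the relationship was derived.

```python
import math


def msdv(a_rms, t_s):
    """Motion sickness dose value (m/s^1.5) = a_rms * t^(1/2).

    a_rms: r.m.s. of Wf-weighted vertical acceleration (m/s^2), assumed given.
    t_s: exposure period in seconds.
    """
    return a_rms * math.sqrt(t_s)


def percent_expected_to_vomit(msdv_value):
    """MSDV/3 for unadapted adults (derived from 20 min to 6 h exposures)."""
    return msdv_value / 3.0


two_hours = 2 * 3600
m = msdv(0.5, two_hours)
print(round(m, 1), round(percent_expected_to_vomit(m), 1))  # 42.4 14.1
```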
The motion sickness dose value has been used for the prediction of sickness on various marine craft (ships, hovercraft, and hydrofoil) in which vertical oscillation has been shown to be a prime cause of sickness (Lawther & Griffin, 1988b). Vertical oscillation is not the principal cause of sickness in many road vehicles (Griffin & Newman, 2004; Turner & Griffin, 1999)
Table 6 Examples of Types of Motion Cue Mismatch Produced by Various Provocative Stimuli
Category of motion cue mismatch | Visual (A)/Vestibular (B) | Canal (A)/Otolith (B)
Type I: A and B simultaneously give contradictory or uncorrelated information | Watching waves from a ship; Watching a video in a moving vehicle | Making head movements while rotating (Coriolis or cross-coupled stimulation)
Type IIa: A signals in absence of expected B signals | Fixed-base simulator sickness; VR headsets | Space sickness; Positional alcohol nystagmus; Caloric reflex test (balance test using temperature-controlled water)
Type IIb: B signals in absence of expected A signals | Looking inside a moving vehicle without external visual reference (e.g., below deck in a boat); Reading in a moving vehicle | Low-frequency (
Source: Adapted from Benson, 1984.
Figure 17 Flow chart showing actions necessary to comply with the EU Physical Agents (Vibration) Directive (2002). [Recoverable chart elements: exposure to vibration; can vibration exposure be eliminated? (yes: eliminate vibration; no: assess or measure vibration exposure, A(8)); > action value but < limit value: minimise exposure, minimise risks, health surveillance; > limit value: immediate action to reduce exposure below limit value; information and training of workers; regular reviews.] (Source: Mansfield, 2004. © 2004 Taylor & Francis.)
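The comparison step of the Figure 17 flow chart, after A(8) has been assessed, can be sketched as a simple threshold test. This is a hypothetical helper, not the Directive's legal text: it uses the whole-body values quoted in Section 3.3.4 (0.5 and 1.15 m s−2 r.m.s.) and the hand-transmitted values quoted in Section 5.4.5 (2.5 and 5.0 m s−2 r.m.s.), treats a value exactly at a threshold as not exceeding it (an assumption), and paraphrases the actions from the recoverable chart boxes. The branch for exposures at or below the action value cannot be recovered from the chart here, so it returns no actions.

```python
# 8-h equivalent (action value, limit value) in m/s^2 r.m.s., as quoted in this chapter.
ACTION_LIMIT_VALUES = {
    "whole-body": (0.5, 1.15),
    "hand-arm": (2.5, 5.0),
}


def directive_branch(a8_value, vibration_type):
    """Return (branch, actions) for an assessed A(8), following Figure 17."""
    action, limit = ACTION_LIMIT_VALUES[vibration_type]
    if a8_value > limit:
        return "above limit value", ["immediate action to reduce exposure below limit value"]
    if a8_value > action:
        return "above action value", ["minimise exposure", "minimise risks", "health surveillance"]
    # Branch not recoverable from the chart; surveillance may still be needed
    # if there is any reason to suspect injury.
    return "at or below action value", []


print(directive_branch(0.8, "whole-body"))
```

Whatever branch is returned, the Directive text quoted in this chapter also requires health surveillance whenever there is reason to suspect injury, so the output is a starting point for assessment rather than a compliance decision.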
5.4.5 EU Physical Agents Directive (2002)
For hand-transmitted vibration, the EU Physical Agents Directive defines an 8-h equivalent exposure action value of 2.5 m s−2 r.m.s. and an 8-h equivalent exposure limit value of 5.0 m s−2 r.m.s. (Figure 16) (European Parliament and the Council of the European Union, 2002). The Directive says workers shall not be exposed above the exposure limit value. If the exposure action values are exceeded, the employer shall establish and implement a program of technical and/or organizational measures intended to reduce to a minimum exposure to mechanical vibration and the attendant risks (Figure 17). The Directive requires that workers exposed to mechanical vibration in excess of the exposure action values shall be entitled to appropriate health surveillance. However, health surveillance is not restricted to situations where the exposure action value is exceeded: health surveillance is required if there is any reason to suspect that workers may be injured by the vibration, even if the action value is not exceeded. According to ISO 5349-1 (ISO, 2001), the onset of finger blanching would be expected in 10% of persons after 12 years at the EU exposure action value and after 5.8 years at the exposure limit value. The exposure action value and the exposure limit value in the EU Directive do not define safe exposures to hand-transmitted vibration (Griffin, 2004).
REFERENCES
Allen, R. W., Jex, H. R., & Magdaleno, R. E. (1973). Manual control performance and dynamic response during sinusoidal vibration. AMRL-TR-73-78. Ohio: Aerospace Medical Research Laboratory, Wright-Patterson Air Force Base.
Baker, W. D., & Mansfield, N. J. (2010). Effects of horizontal whole-body vibration and standing posture on activity interference. Ergonomics, 53(3), 365–374.
Benson, A. J. (1984). Motion sickness. In M. R. Dix & J. S. Hood (Eds.), Vertigo. New York: Wiley.
Benson, A. J., & Barnes, G. R. (1978). Vision during angular oscillation: The dynamic interaction of visual and vestibular mechanisms. Aviation, Space and Environmental Medicine, 49(1), Section II, 340–345.
Bovenzi, M. (2009). Metrics of whole-body vibration and exposure-response relationships for low back pain in professional drivers: A prospective cohort study. International Archives of Occupational and Environmental Health, 82, 893–917.
Bovenzi, M., & Hulshof, C. T. J. (1998). An updated review of epidemiologic studies on the relationship between exposure to whole-body vibration and low back pain. Journal of Sound and Vibration, 215(4), 595–611.
Brammer, A. J., Taylor, W., & Lundborg, G. (1987). Sensorineural stages of the hand-arm vibration syndrome. Scandinavian Journal of Work, Environment and Health, 13(4), 279–283.
British Standards Institution (BSI) (1987). Measurement and evaluation of human exposure to whole-body mechanical vibration and repeated shock. BS 6841. London: BSI.
British Standards Institution (BSI) (2008). Guide to the evaluation of human exposure to vibration in buildings. Part 1: Vibration sources other than blasting. BS 6472-1. London: BSI.
Clemes, S. A., & Howarth, P. A. (2005). The menstrual cycle and susceptibility to virtual simulation sickness. Journal of Biological Rhythms, 20(1), 71–82.
Corbridge, C., & Griffin, M. J. (1991). Effects of vertical vibration on passenger activities: Writing and drinking. Ergonomics, 34(10), 1313–1332.
Corbridge, C., Griffin, M. J., & Harborough, P. (1989). Seat dynamics and passenger comfort. Proceedings of the Institution of Mechanical Engineers, 203, 57–64.
Diels, C., & Bos, J. E. (2016). Self-driving carsickness. Applied Ergonomics, 53, 374–382.
Dupuis, H., & Zerlett, G. (1986). The effects of whole-body vibration. Berlin: Springer-Verlag.
Ebe, K., & Griffin, M. J. (2000a). Qualitative models of seat discomfort including static and dynamic factors. Ergonomics, 43(6), 771–790.
Ebe, K., & Griffin, M. J. (2000b). Quantitative prediction of overall seat discomfort. Ergonomics, 43(6), 791–806.
European Parliament and the Council of the European Union (2002). On the minimum health and safety requirements regarding the exposure of workers to the risks arising from physical agents (vibration). Directive 2002/44/EC, Official Journal of the European Communities, L177/13–19, July 6, 2002.
European Parliament and the Council of the European Union (2006). Directive of the European Parliament and of the Council on Machinery, and Amending Directive 95/16/EC. Official Journal of the European Communities, L157, 24–86.
Fairley, T. E., & Griffin, M. J. (1989). The apparent mass of the seated human body: Vertical vibration. Journal of Biomechanics, 22(2), 81–94.
Gemne, G., Pyykko, I., Taylor, W., & Pelmear, P. (1987). The Stockholm Workshop Scale for the classification of cold-induced Raynaud’s phenomenon in the hand-arm vibration syndrome (Revision of the Taylor-Pelmear scale). Scandinavian Journal of Work, Environment and Health, 13(4), 275–278.
Goggins, K. A., Oddson, B. E., Lievers, W. B., & Eger, T. R. (2020). Anatomical locations for capturing magnitude differences in foot-transmitted vibration exposure, determined using multiple correspondence analysis. Theoretical Issues in Ergonomics Science, 78, 1–15.
Griffin, M. J. (1976). Eye motion during whole-body vertical vibration. Human Factors, 18(6), 601–606.
Griffin, M. J. (1990). Handbook of human vibration. London: Academic.
Griffin, M. J. (1991). Physical characteristics of stimuli provoking motion sickness. Paper 3 in Motion sickness: Significance in aerospace operations and prophylaxis. AGARD Lecture Series LS-175, NATO Research and Technology Organisation.
Griffin, M. J. (1997). Measurement, evaluation, and assessment of occupational exposures to hand-transmitted vibration. Occupational and Environmental Medicine, 54(2), 73–89.
Griffin, M. J. (1998). Evaluating the effectiveness of gloves in reducing the hazards of hand-transmitted vibration. Occupational and Environmental Medicine, 55(5), 340–348.
Griffin, M. J. (2004). Minimum health and safety requirements for workers exposed to hand-transmitted vibration and whole-body vibration in the European Union; a review. Occupational and Environmental Medicine, 61, 387–397.
Griffin, M. J., & Bovenzi, M. (2002). The diagnosis of disorders caused by hand-transmitted vibration: Southampton Workshop 2000. International Archives of Occupational and Environmental Health, 75(1–2), 1–5.
Griffin, M. J., Bovenzi, M., & Nelson, C. M. (2003). Dose response patterns for vibration-induced white finger. Occupational and Environmental Medicine, 60, 16–26.
Griffin, M. J., & Newman, M. M. (2004). Effects of the visual field on motion sickness in cars. Aviation, Space and Environmental Medicine, 75, 739–748.
Hart, S. G., & Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Advances in Psychology (vol. 52, pp. 139–183). Amsterdam: North-Holland.
International Organization for Standardization (ISO) (1997). Mechanical vibration and shock: Evaluation of human exposure to whole-body vibration. Part 1: General requirements. ISO 2631-1. Geneva: ISO (as amended, 2010).
International Organization for Standardization (ISO) (2001). Mechanical vibration: Measurement and evaluation of human exposure to hand-transmitted vibration. Part 1: General requirements. ISO 5349-1:2001(E). Geneva: ISO.
International Organization for Standardization (ISO) (2002). Mechanical vibration: Measurement and evaluation of human exposure to hand-transmitted vibration. Part 2: Practical guidance for measurement at the workplace. ISO 5349-2:2001(E). Geneva: ISO.
International Organization for Standardization (ISO) (2003). Mechanical vibration and shock: Evaluation of human exposure to whole-body vibration. Part 2: Vibration in buildings (1 Hz to 80 Hz). ISO 2631-2. Geneva: ISO.
International Organization for Standardization (ISO) (2008a). Mechanical vibration: Hand-held and hand-guided machinery. Principles for evaluation of vibration emission. ISO 20643, +A1 2012. Geneva: ISO.
International Organization for Standardization (ISO) (2008b). Acoustics: Preferred reference quantities for acoustic levels. ISO 1683. Geneva: ISO.
International Organization for Standardization (ISO) (2009). Hand-held portable power tools: Test methods for evaluation of vibration emission. Part 1: Angle and vertical grinders. ISO 28927-1. Geneva: ISO.
International Organization for Standardization (ISO) (2013). Mechanical vibration and shock: Hand-arm vibration: Method for the measurement and evaluation of the vibration transmissibility of gloves at the palm of the hand. ISO 10819. Geneva: ISO.
International Organization for Standardization (ISO) (2018). Mechanical vibration and shock: Evaluation of human exposure to whole-body vibration. Part 5: Method for evaluation of vibration containing multiple shocks. ISO 2631-5:2018. Geneva: ISO.
Lawther, A., & Griffin, M. J. (1987). Prediction of the incidence of motion sickness from the magnitude, frequency, and duration of vertical oscillation. Journal of the Acoustical Society of America, 82(3), 957–966.
Lawther, A., & Griffin, M. J. (1988a). A survey of the occurrence of motion sickness amongst passengers at sea. Aviation, Space and Environmental Medicine, 59(5), 399–406.
Lawther, A., & Griffin, M. J. (1988b). Motion sickness and motion characteristics of vessels at sea. Ergonomics, 31(10), 1373–1394.
Levison, W. H., & Harrah, C. B. (1977). Biomechanical and performance response of man in six different directional axis vibration environments. AMRL-TR-77-71. Ohio: Aerospace Medical Research Laboratory, Wright-Patterson Air Force Base.
Lewis, C. H., & Griffin, M. J. (1977). The interaction of control gain and vibration with continuous manual control performance. Journal of Sound and Vibration, 55(4), 553–562.
Lewis, C. H., & Griffin, M. J. (2002). Evaluating vibration isolation of soft seat cushions using an active anthropodynamic dummy. Journal of Sound and Vibration, 253(1), 295–311.
Loriga, G. (1911). Il lavoro con i martelli pneumatici [The use of pneumatic hammers]. Bollettino del Ispettorato del Lavoro, 2, 35–60.
Mansfield, N. J. (2004). Human response to vibration. Boca Raton, FL: CRC Press.
Mansfield, N. J. (2019). Models of the human in dynamic environments. In DHM and Posturography (pp. 487–496). London: Academic Press.
Mansfield, N. J. (2020). Optimizing athlete performance in race vehicle systems exposed to mechanical shock and vibration. In P. Salmon, A. Hulme, S. McLean, C. Dallat, N. Mansfield, & C. Solomon (Eds.), Human factors and ergonomics in sports. Boca Raton, FL: CRC Press.
Mansfield, N. J., Mackrill, J., Rimell, A. N., & MacMull, S. J. (2014). Combined effects of long-term sitting and whole-body vibration on discomfort onset for vehicle occupants. ISRN Automotive Engineering, 2014.
Mansfield, N. J., & Maeda, S. (2007). The apparent mass of the seated human exposed to single-axis and multi-axis whole-body vibration. Journal of Biomechanics, 40(11), 2543–2551.
Mansfield, N. J., & Marshall, J. M. (2001). Symptoms of musculoskeletal disorders in stage rally drivers and co-drivers. British Journal of Sports Medicine, 35(5), 314–320.
Mansfield, N., Naddeo, A., Frohriep, S., & Vink, P. (2020). Integrating and applying models of comfort. Applied Ergonomics, 82, 102917.
Marjanen, Y., & Mansfield, N. J. (2010). Relative contribution of translational and rotational vibration to discomfort. Industrial Health, 48(5), 519–529.
Matsumoto, Y., & Griffin, M. J. (2000). Comparison of biodynamic responses in standing and seated human bodies. Journal of Sound and Vibration, 238(4), 691–704.
Matsumoto, Y., & Griffin, M. J. (2001). Modelling the dynamic mechanisms associated with the principal resonance of the seated human body. Clinical Biomechanics, 16(Suppl. 1), S31–S44.
McLeod, R. W., & Griffin, M. J. (1989). A review of the effects of translational whole-body vibration on continuous manual control performance. Journal of Sound and Vibration, 133(1), 55–115.
McLeod, R. W., & Griffin, M. J. (1990). Effects of whole-body vibration waveform and display collimation on the performance of a complex manual control task. Aviation, Space and Environmental Medicine, 61(3), 211–219.
Meddick, R. D. L., & Griffin, M. J. (1976). The effect of two-axis vibration on the legibility of reading material. Ergonomics, 19(1), 21–33.
Morioka, M., & Griffin, M. J. (2006a). Magnitude-dependence of equivalent comfort contours for fore-and-aft, lateral and vertical whole-body vibration. Journal of Sound and Vibration, 298(3), 755–772.
Morioka, M., & Griffin, M. J. (2006b). Magnitude-dependence of equivalent comfort contours for fore-and-aft, lateral and vertical hand-transmitted vibration. Journal of Sound and Vibration, 298(3), 633–648.
Morioka, M., & Griffin, M. J. (2008). Absolute thresholds for the perception of fore-and-aft, lateral, and vertical vibration at the hand, the seat, and the foot. Journal of Sound and Vibration, 314, 357–370.
Moseley, M. J., & Griffin, M. J. (1986). Effects of display vibration and whole-body vibration on visual performance. Ergonomics, 29(8), 977–983.
Moseley, M. J., Lewis, C. H., & Griffin, M. J. (1982). Sinusoidal and random whole-body vibration: Comparative effects on visual performance. Aviation, Space and Environmental Medicine, 53(10), 1000–1005.
National Institute for Occupational Safety and Health (NIOSH) (1997). Musculoskeletal disorders and workplace factors: A critical review of epidemiologic evidence for work-related disorders of the neck, upper extremities, and low back. Ed. B. P. Bernard. DHHS (NIOSH) Publication No. 97-141. Washington, DC: U.S. Department of Health and Human Services, NIOSH.
Nawayseh, N., & Griffin, M. J. (2003). Non-linear dual-axis biodynamic response to vertical whole-body vibration. Journal of Sound and Vibration, 268, 503–523.
Newell, G. S., & Mansfield, N. J. (2008). Evaluation of reaction time performance and subjective workload during whole-body vibration exposure while seated in upright and twisted postures with and without armrests. International Journal of Industrial Ergonomics, 38(5–6), 499–508.
O’Hanlon, J. G., & Griffin, M. J. (1971). Some effects of the vibration of reading material upon visual performance. Technical Report No. 49. Southampton: Institute of Sound and Vibration Research, University of Southampton.
Paddan, G. S., & Griffin, M. J. (1988a). The transmission of translational seat vibration to the head. I. Vertical seat vibration. Journal of Biomechanics, 21(3), 191–197.
Paddan, G. S., & Griffin, M. J. (1988b). The transmission of translational seat vibration to the head. II. Horizontal seat vibration. Journal of Biomechanics, 21(3), 199–206.
Paddan, G. S., & Griffin, M. J. (2002). Effect of seating on exposures to whole-body vibration in vehicles. Journal of Sound and Vibration, 253(1), 215–241.
Paddan, G. S., Holmes, S. R., Mansfield, N. J., Hutchinson, H., Arrowsmith, C. I., King, S. K., … & Rimell, A. N. (2012). The influence of seat backrest angle on human performance during whole-body vibration. Ergonomics, 55(1), 114–128.
Rantaharju, T., Mansfield, N. J., Ala-Hiiro, J. M., & Gunston, T. P. (2015). Predicting the health risks related to whole-body vibration and shock: A comparison of alternative assessment methods for high-acceleration events in vehicles. Ergonomics, 58(7), 1071–1087.
Reason, J. T. (1970). Motion sickness: A special case of sensory rearrangement. Advancement of Science, 26, 386–393.
Reason, J. T. (1978). Motion sickness adaptation: A neural mismatch model. Journal of the Royal Society of Medicine, 71, 819–829.
Reason, J. T., & Brand, J. J. (1975). Motion sickness. London: Academic.
Sammonds, G. M., Mansfield, N. J., & Fray, M. (2017). Improving long term driving comfort by taking breaks: How break activity affects effectiveness. Applied Ergonomics, 65, 81–89.
Sherwood, N., & Griffin, M. J. (1990). Effects of whole-body vibration on short-term memory. Aviation, Space and Environmental Medicine, 61(12), 1092–1097.
Sherwood, N., & Griffin, M. J. (1992). Evidence of impaired learning during whole-body vibration. Journal of Sound and Vibration, 152(2), 219–225.
Shoenberger, R. W. (1974). An investigation of human information processing during whole-body vibration. Aerospace Medicine, 45(2), 143–153.
Shoenberger, R. W., & Wilburn, D. L. (1973). Tracking performance during whole-body vibration with side-mounted and center-mounted control sticks. AMRL-TR-72-120. Ohio: Aerospace Medical Research Laboratory, Wright-Patterson Air Force Base.
Taylor, W., & Pelmear, P. L. (Eds.) (1975). Vibration white finger in industry. London: Academic Press.
Torle, G. (1965). Tracking performance under random acceleration: Effects of control dynamics. Ergonomics, 8(4), 481–486.
Turner, M., & Griffin, M. J. (1999). Motion sickness in public road transport: The relative importance of motion, vision and individual differences. British Journal of Psychology, 90, 519–530.
Wasserman, D., Taylor, W., Behrens, V., Samueloff, S., & Reynolds, D. (1982). Vibration white finger disease in U.S. workers using pneumatic chipping and grinding handtools. I: Epidemiology. Technical Report, DHHS (NIOSH) Publication No. 82-118. Washington, DC: U.S. Department of Health and Human Services, National Institute for Occupational Safety and Health.
Wei, L., & Griffin, M. J. (1998a). Mathematical models for the apparent mass of the seated human body exposed to vertical vibration. Journal of Sound and Vibration, 212(5), 855–874.
Wei, L., & Griffin, M. J. (1998b). The prediction of seat transmissibility from measures of seat impedance. Journal of Sound and Vibration, 214(1), 121–137.
Wells, M. J., & Griffin, M. J. (1984). Benefits of helmet-mounted display image stabilisation under whole-body vibration. Aviation, Space and Environmental Medicine, 55(1), 13–18.
Wilson, R. V. (1974). Display collimation under whole-body vibration. Human Factors, 16(2), 186–195.
CHAPTER 20
HUMAN ERRORS AND HUMAN RELIABILITY
Peng Liu
Zhejiang University
Hangzhou, China
Renyou Zhang, Zijian Yin, and Zhizhong Li
Tsinghua University
Beijing, China
1 INTRODUCTION
1.1 Prevalence of Human Error
1.2 Defining Human Error
1.3 Human Contribution to System Safety
2 WHY HUMANS ERR
2.1 The Human Information Processing (HIP) Model
2.2 GEMS
2.3 Human Cognition Processes from a Macro-Cognitive Perspective
3 HUMAN ERROR CLASSIFICATION, PREDICTION, DETECTION, AND ANALYSIS
3.1 Human Error Classification
3.2 Human Error Prediction
3.3 Human Error Detection
3.4 Human Error Analysis
4 HUMAN ERROR CONTROL
4.1 Strategies for Human Error Control at the Task Level
4.2 Strategies for Human Error Control at the Organization Level
4.3 Hierarchy of Human Error Control
4.4 Technological Measures
4.5 Administrative Measures
4.6 Cultural Measures
5 HUMAN ERROR IN EMERGING AREAS
5.1 Opportunities and Challenges of New Technologies
5.2 Human Error in Human–Automation Interaction
5.3 Human Error in Automated Vehicles
5.4 Human Error in Cybersecurity
6 HUMAN RELIABILITY ANALYSIS
6.1 The HRA Process
6.2 HRA Methods
6.3 Performance-Shaping Factors
6.4 The HRA Database
6.5 HRA Validation and Comparison
6.6 Other Remarks
7 CONCLUSIONS
ACKNOWLEDGMENTS
REFERENCES

1 INTRODUCTION
1.1 Prevalence of Human Error
It is well recognized that human error contributes considerably to accidents in various working environments. The aviation industry, in which safety is a paramount concern, has abundant records of accidents and incidents. As mentioned by Feggetter (1982), approximately 70% of aircraft accidents and incidents have been attributed to human error. A review of US Naval Safety Center data indicated that over 75% of naval aviation mishaps were attributable, at least in part, to some form of human error (Wiegmann & Shappell, 1999). In commercial aviation, nearly 70% of the accidents occurring between 1990 and 2002 were associated with some manner of aircrew or supervisory error (Shappell et al., 2006). The nuclear power industry also requires event reporting, so a large body of data on human error is available in this field, such as that compiled by Dhillon (2018).
A study of licensee reports performed by the US Nuclear Regulatory Commission (NRC) showed that approximately 65% of nuclear system failures were due to human error (Trager, 1985). According to Moore (1993) and Hee et al. (1999), more than 80% of high-consequence marine accidents can be attributed to compounded human and organizational errors. Similar figures can be found in other industries. In general, human error is a primary cause of 60–90% of major accidents in complex systems, and the figures in all these studies show how broad and dominant a role human error plays in industrial accidents. In 2006, the Institute of Medicine (IOM) reported that medication errors harmed 1.5 million people each year in the US (Peters & Peters, 2007). An estimate that has attracted much attention is that of Makary and Daniel (2016): more than 250,000 deaths per year in the US are due to medical error, which would make it the third-leading cause of death after heart disease and cancer. While this number may be controversial, it is well agreed that medical error has contributed to too many deaths. Human error also contributes to many accidents in other service industries and in everyday life.
The prevalence of human error has made it a topic of great concern in most industries and services, especially in safety-critical domains. The yearly number of publications on this topic has increased consistently over the past decades; according to the Web of Science, it grew from about 200 in 2000 to more than 1600 in 2019. Many books and studies focusing on human errors in various industries, such as aviation, nuclear power plants (NPP), railway and road transportation, and medicine and health care, have been published, and the study of human error has been extended from traditional safety-critical systems to other industries and services.
It is worth mentioning that human errors in workplaces and in daily activities do not always result in accidents, and that certain errors happen so often that they have come to be considered “normal.” Violations (e.g., not using a safety belt when working at height, taking alcohol or drugs, or not following a specified procedure) can be observed in many workplaces. Runway incursions happen at airports, and red-light running is common in road transportation systems. Such errors attract little concern until they cause a disaster. However, this does not mean that we should blame those who committed errors; rather, it indicates the importance of studying human error. Criminalization of human error is seen as a threat to safety (Dekker, 2011). “To err is human” is a famous saying in the safety field. Reason (1997) states that the majority of unsafe acts (perhaps 90% or more) were blameless. It is widely agreed that humans make errors because they are put into unfavorable conditions. From this viewpoint, human error is a consequence rather than a cause (Reason, 1997).
People make more errors because they have been asked to perform riskier tasks, not because they are consistently error-prone. Human errors are inevitable, no matter how skilled and careful we are; however, as accident reports reveal, they are also preventable. There are always technological or administrative measures that could have prevented the human errors in an accident, and many products have been redesigned to be inherently safer by learning from accidents.
1.2 Defining Human Error
There are various definitions of human error, reflecting different understandings of or viewpoints on the concept as well as different objectives of studying it. Reason (1990) gives the following definition: “[e]rror will be taken as a generic term to encompass all those occasions in which a planned sequence of mental or physical activities fails to achieve its intended outcome, and when these failures cannot be attributed to the intervention of some chance agency.” In this definition, the key criterion for a human error is whether the intended outcome is achieved.
Other researchers define human error in terms of the consequence, or unintended outcome, of a human activity. A classic definition was given by Sanders and McCormick (1993): “an inappropriate or undesirable human decision or behavior that reduces, or has the potential for reducing effectiveness, safety, or system performance.” According to this definition, a decision or action that could potentially lead to undesirable consequences is considered a human error, even if it has not actually resulted in such consequences. Such a definition is meaningful because many improper human decisions and actions do not cause immediate effects. In fact, latent errors are quite common; they may contribute to a disaster at an unpredictable time after the error has been made. The disaster of Japan Air Lines Flight 123 is a famous example. This deadliest single-aircraft disaster in history (520 deaths in total) happened on August 12, 1985, mainly because of a flawed repair by Boeing engineers following a tail-strike incident on June 2, 1978. Sanders and McCormick’s definition is also meaningful in that the consequence of an improper human activity may depend on other conditions, such as the performance of protective equipment or systems, the activities of co-workers, or situational factors. Thus, it is inappropriate to judge whether a human error occurred solely from its actual consequence.
Heinrich’s accident triangle theory (Anderson & Denkl, 2010) holds that for every major-injury accident there are 29 minor-injury accidents and 300 no-injury incidents. Regardless of whether this ratio is valid in different domains, the triangle theory reveals that serious accidents, minor accidents, and no-injury incidents may share common causes, which mostly consist of dangerous human behaviors. This leads to the safety philosophy that efforts to reduce the number of no-injury incidents or minor accidents will also reduce the number of major accidents. Thus, attention should be paid to no-injury incidents and minor accidents, rather than waiting for a major accident to happen. “Near misses,” incidents that have the potential to cause damage but do not actually result in undesired consequences, can also provide vital information for improving safety.
In practice, whether a human decision or behavior is “appropriate” has to be determined against some standard, which practitioners find difficult to establish, since it may be unavailable or unknown before an accident takes place. It is not unusual for a human error to be identified only after a thorough accident investigation has found that it led to the undesired consequence. In this sense, human error is an attribution after the fact (Woods et al., 2010, p. 2).
Thus, human errors are not always apparent to operators; in fact, many (if not most) errors are never noticed or reported. Conversely, conforming to a preset standard does not guarantee that human errors are avoided, since a common standard may not suit all situations.
Another definition of human error was given by Hagen and Mays (1981): “a failure on the part of the human to perform a prescribed act (or the performance of a prohibited act) within the specified limits of accuracy, sequence, or time, which could result in damage to equipment and property or disruption of scheduled operations.” This definition is suitable for the analysis of actions with specified procedures and performance standards. Such actions have been identified in advance as carrying significant risk and have been studied carefully. However, this does not mean that other actions cannot contribute to accidents; because they were not previously considered risky, they tend to be overlooked in practice, and as a consequence numerous human errors occur in them.
When investigating human errors, we have to specify the scope of the investigation. Do we consider only the people directly involved, or do we also include supervisors, middle managers, or even top managers? It is unsurprising that faults can be found at all levels, since management deficiencies always exist.
Taking an aircraft crash accident as an example, apart from hardware failures and software bugs, the causes may include: (1) the pilot’s unawareness of alarms, imperfect understanding of the situation, incorrect judgment or decision, and incorrect operations; (2) poor communication, cooperation and coordination within the crew or between the crew and air traffic controllers; (3) passengers’ violations or violence; (4) imperfect maintenance by the ground service technicians; (5) deficient training, qualification, and licensing system; (6) substandard management system and negative safety culture of the airline company; or (7) ineffective air traffic safety
516
A system involving humans would bring up safety issues as they made mistakes, and currently, it is virtually impossible to totally remove humans from a system. Even for fully automatic or autonomous systems that do not require on-line operators, they still need humans to inspect their status periodically to avoid unpredicted failures, they have to be repaired by maintenance technicians once they have failed, they need humans to set up the working parameters, and they are initially designed by humans. At any point where a human is involved, there
Safety
Software
1.3 Human Contribution to System Safety
will be possibilities of human error that may lead to a system failure. For complex systems, there are always such possibilities. Thus, the safety of a complex system highly depends on human performance. A system can be complex in structure, functions, operations, management, and other aspects. These complex systems are usually socio-technical systems (Walker et al., 2008; Woo & Vicente, 2003) rather than purely technological (hardware and/or software) systems. An air transportation system involves not only aircraft, but also pilots (current autopilot technology can be good enough from a technological standpoint but passengers would always rather have human pilots in the cockpit), ground support facilities and maintenance staff, air traffic controllers, pilots in other aircrafts (e.g., the wrong response when approaching each other), or even other people outside the airports (e.g., radio interference). Railway systems are featured with complex rail networks and work organization. Many humans, including employees and passengers, are involved in railway safety. A nuclear power plant is a highly automated complex system, but humans are still needed for its operation and management. Its safety relies not only on the reliability of the hardware and software, but also on the qualified operators and high-quality work organization, especially in an emergency. Health care is another example of a socio-technical system where human error can cause catastrophic consequences. In all the above examples, safety is very important, thus the systems are regarded as safety-critical systems. Although much research has been done on such systems, too many questions remain so research can be furthered. The safety of a complex system is like a boulder supported by three pillars, i.e., hardware, software, and human, as shown in Figure 1. The hardware pillar is the strongest one. 
administration by the governmental agency. For higher-level management or administration, it becomes more complicated to set up a performance standard, since the link to accidents is weaker. On the other hand, it is always good to identify all related (direct or indirect) issues in an accident investigation so that we can learn more about how to avoid accidents in the future. If we trace back through all the people involved in the life cycle of a product, we may conclude that almost every mishap can ultimately be attributed to human errors, such as operation and maintenance errors, manufacturing and assembly errors, and design errors (Hammer, 1972). Human faults can be found in the three most serious nuclear power plant (NPP) accidents, i.e., the Three Mile Island accident in 1979, the Chernobyl disaster in 1986, and the Fukushima Daiichi nuclear disaster in 2011, and can be categorized as primary events, intermediate events, or design and management defects. The scope of a human error investigation should be specified according to the objective of the investigation and the possibility of making changes. Nevertheless, to prevent human errors, it is always recommended to first consider changes in design and the adoption of technological measures, rather than changes in procedures, training, supervision, and other managerial aspects. The identification of a human error is often done together with the analysis of its cause. The cause of a human error can be vague. Environmental factors (e.g., weather, temperature, noise, vibration, confined space, and air quality), individual factors (e.g., personality and physical or mental status), team factors (e.g., climate and spirit), organizational factors (e.g., rules, policy, and culture), and social factors (e.g., administration, legislation and enforcement, and social culture) are sometimes considered to be accident causes, since these factors have contributed to an accident.
However, the influence of these factors is indirect: they make humans more likely to err. Thus, in the field of human error and reliability, these factors are termed performance-shaping factors (PSFs), error-forcing conditions (EFCs), common performance conditions (CPCs), and the like. These factors do not determine the occurrence of human errors but mostly have moderating effects on human performance. Some individual factors (e.g., experience and skills) and task factors (e.g., complexity and time limit) have direct or mediating effects on human performance and are regarded as accident causes. Whether a factor plays a direct or indirect role in human error can depend on the scenario. In addition, there may be complicated interactions among factors. In real situations, it is not easy to identify all these effects and interactions. Without a deep understanding of human error mechanisms, the effects of various factors on human error will remain confusing, confounded, or uncertain. The above definitions of human error do not distinguish whether an action is intentional or not. Unintended actions (e.g., missing a warning message or skipping a step in a procedure) are slips and lapses, while an intended action with unintended consequences is regarded as a mistake (Reason, 1990). All of these cases should be considered human errors. In contrast, an intended action with intended (harmful) consequences is criminal and is excluded from taxonomies of human errors.
Figure 1 Safety of a complex system supported by three pillars: hardware, software, and humans at positions and levels.
Very high reliability can be achieved for hardware through the adoption of advanced theories, designs, materials, and manufacturing technologies, together with good practice in quality control methods and maintenance policies, provided there are no human errors in these aspects. Few accidents are caused solely by hardware failures. More likely, a hardware failure creates a stressful condition in which operators are prone to make errors, leading to undesired results, or it becomes a trap if operators are unaware of the failure. Occasionally, hardware fails because of human errors. Overall, hardware is the pillar that can be best controlled and is the most dependable.
Software is rather different from hardware. Most theories and technologies for achieving high hardware reliability do not work for software. It is controversial to use the term “reliability” to describe software. Software represents logic, which is either correct or incorrect. It does not wear, perish, or break. Multiple copies with the same logic do not make software more reliable. Software fails because invisible bugs (wrong logic) are triggered: if the conditions for a bug are met, it will definitely fail; otherwise, it does not fail. Software for a complex system is usually complex in certain aspects. With the application of digital technology, hardware is simplified, but software becomes more complex, with countless invisible interfaces between functions. Thorough software testing is practically impossible, so there are always undiscovered bugs. Thus, the quality of a piece of software is only partly under control. However, once software has been proven to work under certain conditions, it is dependable under those conditions. One critical issue that must be pointed out is that software is vulnerable to computer viruses and network intrusions.
The human pillar cannot be well controlled and is not dependable. Too many factors can influence human performance, such as fatigue, stress, knowledge and experience, time limits, task complexity, workload, environmental conditions, interface design, work organization, and so on. Apart from failing to complete their tasks, humans may do things that lead to hardware failures and software malfunctions. Quite typically, a piece of software does not work as expected because humans give it the wrong inputs. Human errors appear not only at an individual level, but also at a team level (e.g., communication and collaboration) and at managerial and organizational levels (e.g., training, job arrangement, planning, regulations, and policies). The human aspect will remain the focus of safety for many years, decades, and possibly centuries. It is always recommended to strengthen the hardware and software pillars as much as possible with feasible technologies at an acceptable cost, and to minimize reliance on the human pillar.
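The remark that multiple copies with the same logic do not make software more reliable can be made concrete with a small sketch. It assumes, hypothetically, that redundant hardware units fail independently. Under that assumption a 1-out-of-n parallel arrangement fails only if every unit fails, which gives a reliability of 1 − (1 − r)^n. Identical software copies, by contrast, execute the same logic, so they succeed or fail together on the same input. The function names, the bug condition, and the value r = 0.99 are illustrative assumptions, not data from any real system.

```python
from typing import Callable


def parallel_hardware_reliability(r: float, n: int) -> float:
    """Reliability of n redundant hardware units in parallel (1-out-of-n),
    assuming each unit works with probability r and failures are independent."""
    return 1.0 - (1.0 - r) ** n


def identical_copies_work(triggers_bug: Callable[[int], bool], x: int, copies: int) -> bool:
    """Run `copies` identical software copies on input x; the system works if any copy works.
    Every copy evaluates the same deterministic logic, so all outcomes are identical."""
    outcomes = [not triggers_bug(x) for _ in range(copies)]
    return any(outcomes)


if __name__ == "__main__":
    r = 0.99  # hypothetical reliability of one hardware unit
    for n in (1, 2, 3):
        print(f"{n} hardware unit(s) in parallel: {parallel_hardware_reliability(r, n):.6f}")

    def zero_input_bug(x: int) -> bool:
        # Hypothetical latent bug: the logic is wrong only when the input is 0.
        return x == 0

    for copies in (1, 10):
        works = identical_copies_work(zero_input_bug, 0, copies)
        print(f"{copies} software copies, input 0: works = {works}")
```

Under these assumptions, adding hardware units raises reliability from 0.99 to about 0.9999 and then 0.999999. Ten software copies, however, fail on the triggering input exactly as one copy does. The independence assumption for hardware is itself optimistic, so the hardware figures are best read as upper bounds.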
The weaknesses of the human pillar and the complex nature of the human role in a complex system are well illustrated by a railway accident. It happened in the early morning of the 28th of April 2008 on the Jiaozhou-Jinan railway in China, resulting in 70 deaths and 416 injuries. The T195 train, running from Beijing to Tsingtao, derailed because it was speeding on a section with a small radius of curvature. From a technological standpoint, trains in China cannot exceed speed limits, because they use an operation monitoring and control system that works very reliably and strictly follows the given speed-limit data for all sections. The problem lay in the wrong speed-limit data that Beijing Railway staff had set up for the accident section. How this error came about is a complicated story. A high-speed railway line was to be constructed on the Jiaozhou-Jinan Railway. To meet the standard of high-speed rail transportation, a large bridge was needed at the intersection between the railway and the No. 309 national road. To keep the railway running, an S-shaped casual line was built, with a smallest radius of curvature of about 400 meters. The casual line came into service in March 2008. The Jinan Railway Administration issued a temporary order imposing a speed limit of 80 km/h on the casual line, sent directly to all control centers, stations, locomotive depots, and trains. The order was strictly followed. Then, in April, a new train operation diagram was to take effect on the 28th. The Jinan Railway Administration decided to label this speed limit a regular limit on the diagram and thus to abolish the temporary order. The diagram, which took effect from 0:00 a.m. on the 28th of April, was released on the 23rd of April. It was sent to the Beijing Railway Administration by public mail but was not received before the accident.
An order revoking the temporary speed limit was issued on the 26th of April, but without any explanation of why it was revoked (the temporary speed limit was to be changed to a regular one). The Beijing Locomotive Depot received this order and removed the temporary speed limit from the operation monitoring and control system, so the default speed limit of 140 km/h was reapplied. Before the T195 reached the S section, other trains reported that the speed data were inconsistent with the speed signs. The Jinan Railway Administration realized the problem and issued an urgent order regarding the speed limit for the section at 4:02 a.m., but the T195 train was not on the list of direct recipients. The dispatcher at Wangchun station, located just before the S section, received the order but did not communicate the message to the drivers of the T195 when it passed through. In China, a station dispatcher is required to communicate speed information to the locomotive driver, but this requirement is not strictly followed. The operation monitoring and control system works so reliably that dispatchers did not think it necessary to carry out this communication. The dispatcher at Wangchun station did not mean to overlook the urgent order. He simply did his job in his usual way, skipping the “useless” and somewhat “silly” communication. He did not realize the importance of the urgent order, since the speed limit was the same as the previous one. The T195 train then approached the S section at 131 km/h with only one of its two drivers on duty. After a five-hour driving shift, the drivers were tired, especially in the early morning (a common time for accidents). Reportedly, the driver on duty failed to notice the T speed sign 1400 m before the section. At about 4:38 a.m. he noticed the yellow light 800 m before the section and started to brake, but it was too late. Nine of the 17 carriages derailed. Even more disastrous, two carriages were thrown onto the track used by trains coming from the opposite direction. Three minutes later, another train (5034) came from the opposite direction and hit the two carriages, although its drivers applied the emergency brake. Visibility was less than 300 m in the dark, even with the train lights on.
The collision drastically increased the number of casualties. For easier understanding, Figure 2 presents the storyline of the accident. This catastrophic accident was clearly preventable. It would not have happened if the Jinan Railway Administration had not decided to label the speed limit a regular one on the operation diagram, if the Jinan Railway Administration had explained why it cancelled the temporary speed limit, if the diagram had been sent to the Beijing Railway Administration through the official channel rather than by public mail, if the Beijing Locomotive Depot had checked with the Jinan Railway Administration before removing the speed limit, if the Jinan Railway Administration had sent the urgent order to the T195 train, if the dispatcher at Wangchun station had communicated with the train drivers, if more drivers had been on duty … Many accidents have similar “if” scenarios. We can see how errors can occur at every level, from individuals to organizations. The accident also shows that reliable hardware and software do not guarantee safety, because humans are part of the system. China's high-speed railway network has now largely been built and has proved to be very safe. However, we can never ignore the human aspects and should always keep a sense of unease about safety. Humans learn from errors, and experience with errors can make a person more dependable. Identifying people who are “error-prone” and removing them from the system or their workplace is not wise. The right thing to do is to understand why they err and then determine what measures should be taken. This is the most effective way to meet safety objectives. Here we return to the argument that humans make errors because they are put in unfavorable conditions. Instead of being blamed, the people on site should be encouraged, and the people who create the unfavorable conditions should take responsibility (but not be judged). Humans do not just play a negative role in a complex system.
Some tasks are very difficult for technology to implement but very easy for humans, such as differentiating a stone from a plastic bag on a railway line. Furthermore, humans can recognize and recover from their own errors, and they can also solve hardware and software failures. In unforeseeable conditions, humans have the intelligence to collect the necessary information, make a diagnosis, and solve the problem, whereas technological systems are designed to work only under predicted conditions. This is why some critical steps in a safety-critical system always require (or at least allow) operators to make the final decisions. Thus, complex systems should also be designed so that operators can play their positive roles better. In summary, humans play important but complicated roles in complex systems. Compared to hardware and software, humans are less predictable and less dependable. When evaluating the risks of a system, the human contribution cannot be ignored. It is impossible to completely eliminate human errors. In system design and operation, however, there are always human factors principles that can be followed to reduce human errors, mitigate their consequences, and help humans play their positive roles better. Human-centered design is one such philosophy, one “that aims to make systems usable and useful by focusing on the users, their needs and requirements, and by applying human factors/ergonomics, usability knowledge, and techniques” (International Organization for Standardization, 2019). Standards and guidelines are now available but are not yet well practiced in industry. Apart from human-centered design, the development of a good management system and safety culture is also important, since deficiencies in them have been found to be a common cause of many human issues.
Figure 2 Storyline of the Jiaozhou-Jinan Railway accident on April 28, 2008.
2 WHY HUMANS ERR
It is essential to understand why human errors occur and which underlying factors may trigger them. Human errors have been classified in various ways, and three levels of classification can be distinguished: behavioral, contextual, and conceptual. These approximately answer the “What,” “Where,” and “How” questions about human errors, respectively (Reason, 1990). The behavioral-level classification is based on the observable characteristics of a human error, such as the characteristics of the erroneous behavior itself and its immediate consequences. The contextual-level classification takes contextual triggering factors into consideration and gives an overview of the interaction between a human and the context. Both of these levels are valuable in practical application, but their drawback is also apparent: they do not explain error mechanisms. The conceptual-level classification, based on human cognitive processes, tries to infer the cognitive mechanisms underlying the observable behaviors. In this section, we concentrate on the cognitive mechanisms related to the occurrence of human errors. This theoretical foundation can serve as the technical basis for further research, such as error detection, error prediction, error analysis, and error correction. Plenty of psychological studies have focused on human cognitive processes that could explain why and how humans err. We assume that many readers of this handbook are practitioners rather than psychologists, so we present a brief introduction to some classical cognitive theories to provide a basic understanding of human cognitive processes. More details can be found in psychological publications such as Cognitive Psychology (Solso, MacLin, & MacLin, 2005) and Engineering Psychology and Human Performance (Wickens, Hollands, Banbury, & Parasuraman, 2013). In addition, we present the generic error modeling system (GEMS) proposed by Reason (1990) (see Section 2.2) and a macro-cognitive perspective on human errors. In a complex sociotechnical system, various contextual factors, such as the design of the human–system interface (HSI), the environment, and the organization, can also lead to human errors. These factors can contribute to the failure of specific cognitive mechanisms and ultimately lead to human errors. Thus, they are called performance shaping/influencing factors (refer to Sections 4 and 6.3 for details).
2.1 The Human Information Processing (HIP) Model
To start this section, we present a holistic overview of human information processing, as shown in Figure 3. This model depicts a common series of mental operations that characterizes the information flow when an individual performs a task (Wickens et al., 2013). Put simply, information in the environment is first sensed and perceived, and important information is selected and further manipulated mentally. Higher-level thinking activities, such as problem-solving, reasoning, and decision-making, then occur with the support of working memory. Working memory involves complex interactions among newly perceived information and information from short-term and long-term memory. All stages and processes need resources, known as attention. As discussed below, each stage of information processing has specific cognitive mechanisms. Because of the cognitive characteristics and limitations of human beings, failures can occur at each stage and lead to observable human errors. However, readers should keep in mind that these stages are not independent. They combine into a continuous cognitive process with complex interactions and a flexible order.
2.1.1 Sensation
Sensation refers to “the initial detection of energy from the physical world,” including vision, hearing, taste, smell, and touch (Solso, MacLin, & MacLin, 2005). It is the first stage of human information processing: the initial detection of stimuli without further mental manipulation. Each sense has a threshold, although thresholds are not constant and are related to various factors such as fatigue and practice. As Figure 4 shows, the lower the stimulus energy, the less likely a human is to detect the stimulus. Human errors will occur if an individual misses important information because the stimuli are too weak. The smallest change in a stimulus that a human can detect also has a threshold, which is roughly proportional to the intensity of the original stimulus; this is known as Weber’s law. Our sensory system has its own memory, called sensory memory: a “brief storage of information from each of the senses, in a relatively unprocessed form beyond the duration of a stimulus, for recoding into another memory or for comprehension” (American Psychological Association, n.d.c). Sensory memory for visual stimuli is called iconic memory, and sensory memory for auditory stimuli is called echoic memory. Incoming information in the sensory systems disappears quickly if it is not processed further, for example, within 250 ms to 4 s for iconic memory.
Figure 3 Human information processing model (elements: attention resources, selection, sensory processing with the short-term sensory store (STSS), perception, working memory and cognition, long-term memory, response selection, response execution, and feedback from the system environment). (Source: Wickens et al., 2013. © 2013 Taylor & Francis.)
Figure 4 Psychometric curve: probability of detection (0 to 1.0) as a function of stimulus intensity, with the threshold marked. (Source: Smith, 2008. © 2008 John Wiley & Sons.)
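The threshold idea in Figure 4 and Weber’s law from Section 2.1.1 can be sketched numerically. The sketch assumes a logistic psychometric function, which is one common choice among several (a cumulative normal is also widely used), and defines the threshold as the intensity detected 50% of the time. The threshold, spread, and Weber fraction values are hypothetical.

```python
import math


def detection_probability(intensity: float, threshold: float, spread: float) -> float:
    """Logistic psychometric function: probability of detecting a stimulus.
    At intensity == threshold the probability is exactly 0.5; `spread` (> 0)
    controls how gradually detection rises around the threshold."""
    return 1.0 / (1.0 + math.exp(-(intensity - threshold) / spread))


def just_noticeable_difference(base_intensity: float, weber_fraction: float) -> float:
    """Weber's law: the smallest detectable change is a constant fraction k of the
    original stimulus intensity I, i.e. delta_I = k * I."""
    return weber_fraction * base_intensity


if __name__ == "__main__":
    threshold, spread = 10.0, 1.5  # hypothetical values in arbitrary intensity units
    for intensity in (5.0, 10.0, 15.0):
        p = detection_probability(intensity, threshold, spread)
        print(f"intensity {intensity:>4}: P(detect) = {p:.2f}")

    k = 0.1  # hypothetical Weber fraction
    for base in (50.0, 200.0):
        print(f"base {base}: smallest detectable change = {just_noticeable_difference(base, k)}")
```

Under these assumptions, a stimulus of intensity 5 is detected only about 3% of the time. The smallest detectable change grows from 5 to 20 units as the base intensity grows from 50 to 200.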
2.1.2 Perception
Perception refers to “the process or result of becoming aware of objects, relationships, and events by means of the senses, which includes such activities as recognizing, observing, and discriminating” (American Psychological Association, n.d.d). It involves higher-order cognition that enables the individual to interpret sensory information. Generally speaking, perception consists of two processes: a bottom-up process (driven by sensory stimuli) and a top-down process (based on our previous knowledge), also known as direct perception and constructive perception, respectively. Specific perceptual mechanisms let us perceive more than the sensory stimuli alone indicate. This is a complex and flexible process that is influenced by many other factors, such as the context and the individual’s prior knowledge. For instance, in Figure 5 the same letter can be recognized as “A” or “H” depending on the context. The internal mechanisms that process these signals can be described as “heuristics” and “algorithms.” A heuristic is a guess based on a “rule of thumb,” while an algorithm is a set of rules to follow (Solso et al., 2005). Because we rely on heuristics, our brain can make errors when some mechanisms work inappropriately; such errors are often called “illusions.” One famous example is the Müller-Lyer illusion, shown in Figure 6. People usually view line B as longer, but in fact the two lines are equal in length. Another example is the Kanizsa triangle in Figure 7, which produces illusory contours. Individuals often see a triangle that appears whiter than the surrounding region and seems to float above the background, although no such triangle exists physically. There are a number of explanations for such illusions. What matters here is that illusions are a natural characteristic of human perception.
Designers should consider these potential illusions in interface design to avoid perception errors. Several theories try to describe the cognitive process of perception. For visual pattern recognition, for example, the proposed models include template-matching models, prototype models, distinctive-features models, the pandemonium model, and Gestalt theory. Here we take the Gestalt principles as an example to show what we tend to perceive. Humans tend to perceive objects as integrated patterns, and the Gestalt principles account for the nature of this visual grouping. They include proximity, similarity, continuity, closure, symmetry, figure/ground, common fate, etc. (Palmer, 2002). Proximity refers to the tendency to believe that objects close to each other form one group; for example, in Figure 8 (B) we tend to view the eight dots as four groups. Similarity refers to the tendency to group objects with similar characteristics; for example, in Figures 8 (C), (D), and (E), we tend to group objects of similar color, size, and orientation, respectively. Continuity means that we tend to perceive objects as continuous forms rather than distinct parts; for example, in Figure 8 (I) we are more likely to see two continuous intersecting lines than two angles sharing one vertex. Closure means that we tend to view objects as closed forms even when they are not closed; for example, in Figure 8 (J) we are more likely to see two angles, and in this case closure governs grouping rather than continuity. Symmetry refers to the tendency to view objects as simple, symmetric parts. For example, in Figure 8 (G) we tend to group the 1st and 2nd, the 3rd and 4th, the 5th and 6th, and the 7th and 8th lines together, while excluding the 9th line. Figure/ground means that we unconsciously divide a scene into a primary part (figure) and a secondary part (ground), with the figure being allocated more attention. Common fate refers to the tendency to group objects that move together; for example, in Figure 8 (F) the direction of movement determines our perception. Such perceptual characteristics enable humans to perceive more than what visual signals show, but they can also lead to errors if they function inappropriately.
Figure 5 The effect of context on human recognition. (Source: Selfridge, 1955. © 1955 Association for Computing Machinery.)
Figure 6 Müller-Lyer illusion. (Source: Müller-Lyer, 1889.)
Figure 7 Kanizsa Triangle. (Source: Kanizsa, 1955.)
Figure 8 Examples of Gestalt principles: (A) no grouping, (B) proximity, (C) similarity of color, (D) similarity of size, (E) similarity of orientation, (F) common fate, (G) symmetry, (H) parallelism, (I) continuity, (J) closure. (Source: Palmer, 1999. Reproduced with permission of MIT press.)
2.1.3 Attention
Attention is “a state in which cognitive resources are focused on certain aspects of the environment rather than on others and the central nervous system is in a state of readiness to respond to stimuli” (American Psychological Association, n.d.e). It is one of the three main limits of human information processing, along with storage (memory) and speed (response time) (Wickens & McCarley, 2007). To help readers better understand attention, Figure 9 presents a metaphorical illustration. On the one hand, attention serves as a filter that selects relevant and useful stimuli; two or more channels may be involved if divided attention is needed. On the other hand, attention also serves as the fuel for information processing and is a limited resource. Attention can be divided into several varieties: focused, selective, divided, and sustained (Wickens & McCarley, 2007). Studies have confirmed that humans are good at processing one of two concurrent sources of information, although information from the non-selected source can “break through” (Reason, 1990). However, many tasks require humans to deal with two or more concurrent sources of information.
Related studies and experiments have examined human abilities when people face two concurrent tasks. Their results indicate that dual-task interference can occur at different stages of the information processing sequence. This interference can take several forms, such as a complete pause in one task, a reduced response to one task, and crosstalk errors (Reason, 1990). Because of limited channel capacity, humans cannot process all cues from the sensory systems and can attend selectively to only a portion of them, which is known as selective attention (Solso et al., 2005). Three classical theories of selective attention are compared in Figure 10. In brief, the filter theory (Broadbent, 1958) holds that signals first pass through a selective filter before they are processed, and this filter screens them according to their physical characteristics. It is also known as the early-selection model. The attenuation model (Treisman, 1960) argues that unattended information is not completely filtered out but rather reduced to a weak signal that can still be processed later. The late-selection model (Deutsch & Deutsch, 1963) offers another explanation: both attended and ignored information are processed equally in the perceptual system, because the capacity limitation lies in the later response system. Regardless of where the bottleneck lies, human attention resources are limited, and only some information can be chosen for processing. Some models, such as the SEEV model (Wickens et al., 2001) and the N-SEEV model (Wickens, McCarley, & Steelman-Allen, 2009), try to predict where attention will be allocated. Human errors will occur if an individual fails to attend to important cues because the attentional process fails. Several specific types of attention failure have been identified and studied, such as inattentional blindness (Mack & Rock, 1998) and change blindness (Simons & Levin, 1997).
2.1.4 Memory
Memory is a complex process that includes encoding, storage, and retrieval. Apart from the sensory memory introduced in Section 2.1.1, the human memory system also includes short-term memory (STM) and long-term memory (LTM). Sensory memory is the storage of stimuli in our sensory systems, and it can store information only for seconds. The information is then stored in STM for a period of 10–30 seconds (American Psychological Association, n.d.c), with a limited capacity (Miller, 1956) captured by the famous “7±2” principle. Highly encoded and structured information is stored in LTM, which theoretically has unlimited storage (Miller, 1956). The content stored in LTM can be divided into three categories: episodic memory (about experiences and events), semantic memory (general knowledge), and procedural memory (about the relation between stimuli and responses).
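The limited capacity and short retention of STM described above can be illustrated with a toy buffer. It is not a cognitive model. The capacity of seven items and the 20-second retention limit are illustrative assumptions taken from the “7±2” and 10–30 second figures quoted above, and rehearsal, chunking, and interference are ignored. `collections.deque` with `maxlen` silently drops the oldest item when a new one arrives, which mimics displacement.

```python
from collections import deque


class ShortTermStore:
    """Toy short-term store with a fixed number of slots and a retention limit.
    The capacity (7) and retention (20 s) defaults are illustrative assumptions."""

    def __init__(self, capacity: int = 7, retention_s: float = 20.0):
        self.items = deque(maxlen=capacity)  # oldest item is displaced when full
        self.retention_s = retention_s

    def add(self, item: str, t: float) -> None:
        """Store an item with the time (in seconds) at which it was encoded."""
        self.items.append((item, t))

    def recall(self, now: float) -> list:
        """Return items still held: not displaced and not older than the retention limit."""
        return [item for item, t in self.items if now - t <= self.retention_s]


if __name__ == "__main__":
    store = ShortTermStore()
    for i in range(1, 10):  # nine hypothetical procedure steps, one per second
        store.add(f"step {i}", t=float(i - 1))
    print(store.recall(now=8.0))
    print(store.recall(now=25.0))
```

In this toy, nine steps entered one second apart leave only steps 3 to 9 in the store. By 25 seconds, only steps 6 to 9 remain, which is one way to picture how a step in a procedure can be lost.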
Figure 9 A simple model of attention: the filter and the fuel (elements: events pass through a filter shaped bottom-up by salience and effort and top-down by expectancy and value; attention resources act as the fuel for perception, cognition, and action, and the multiple resource model of divided attention sets the limits of multitasking). (Source: Wickens and McCarley, 2007. © 2007 Taylor & Francis.)
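The bottom-up (salience, effort) and top-down (expectancy, value) factors in Figure 9 are the four terms of the SEEV model mentioned in Section 2.1.3. The sketch below is a minimal version built on several assumptions. It uses the additive form P(A) = s·S − ef·EF + ex·EX + v·V often used to summarize the model (some formulations combine expectancy and value multiplicatively instead), with unit weights and hypothetical ratings for three display areas. A normalization step, also an assumption, turns the scores into predicted shares of attention. None of these numbers come from the original studies.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AreaOfInterest:
    name: str
    salience: float    # S: conspicuity of events in the area (bottom-up, attracts attention)
    effort: float      # EF: effort to move attention there (bottom-up, inhibits attention)
    expectancy: float  # EX: expected rate of relevant events (top-down)
    value: float       # V: importance of the information to the task (top-down)


def seev_score(a: AreaOfInterest, s: float = 1.0, ef: float = 1.0,
               ex: float = 1.0, v: float = 1.0) -> float:
    """Additive SEEV-style score s*S - ef*EF + ex*EX + v*V (weights are assumptions)."""
    return s * a.salience - ef * a.effort + ex * a.expectancy + v * a.value


def attention_shares(areas: List[AreaOfInterest]) -> Dict[str, float]:
    """Predicted share of attention per area: negative scores are clipped to zero
    and the remaining scores normalized to sum to 1 (a simplifying assumption)."""
    scores = {a.name: max(0.0, seev_score(a)) for a in areas}
    total = sum(scores.values())
    if total == 0.0:
        return {name: 0.0 for name in scores}
    return {name: score / total for name, score in scores.items()}


if __name__ == "__main__":
    areas = [  # hypothetical ratings on a 0-3 scale
        AreaOfInterest("primary display", salience=2, effort=0, expectancy=3, value=3),
        AreaOfInterest("secondary display", salience=1, effort=1, expectancy=2, value=3),
        AreaOfInterest("status panel", salience=1, effort=1, expectancy=1, value=1),
    ]
    for name, share in attention_shares(areas).items():
        print(f"{name}: {share:.2f}")
```

With these ratings, about 53% of attention is predicted for the primary display and about 13% for the status panel. Under this model, an area that matters but is rarely active and inconspicuous receives only a small share, which illustrates how important cues can be missed.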
Figure 10 Comparison of three selective attention theories: Broadbent’s filter theory (input, sensory register, selective filter, limited-capacity short-term memory), Treisman’s attenuation theory (input, sensory register, attenuator, short-term memory), and the Deutsch and Deutsch theory (input, sensory register, short-term memory). (Source: Eysenck and Keane, 2015. © 2015 Taylor & Francis.)
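The three theories in Figure 10 differ mainly in where unattended information is lost. The toy sketch below is a simplified caricature rather than an implementation of any published model. It assumes a dichotic-listening setup with two channels, and the `pertinence` ratings, the attenuation gain of 0.3, and the recognition threshold of 0.25 are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Message:
    channel: str       # e.g. "left" or "right" ear in a dichotic-listening task
    content: str
    pertinence: float  # hypothetical 0..1 importance to the listener


def filter_theory(messages: List[Message], attended: str) -> List[str]:
    """Broadbent (early selection): only the attended channel passes the filter;
    unattended messages are lost before their meaning is analyzed."""
    return [m.content for m in messages if m.channel == attended]


def attenuation_theory(messages: List[Message], attended: str,
                       gain: float = 0.3, threshold: float = 0.25) -> List[str]:
    """Treisman: unattended messages are weakened, not blocked. A weakened message
    is still recognized if gain * pertinence reaches the recognition threshold."""
    return [
        m.content
        for m in messages
        if m.channel == attended or gain * m.pertinence >= threshold
    ]


def late_selection_theory(messages: List[Message]) -> Tuple[List[str], str]:
    """Deutsch & Deutsch: every message is analyzed for meaning; the bottleneck is
    at the response stage, which here picks the most pertinent message."""
    analyzed = [m.content for m in messages]
    response = max(messages, key=lambda m: m.pertinence).content
    return analyzed, response


if __name__ == "__main__":
    msgs = [
        Message("left", "weather report", 0.2),
        Message("right", "stock prices", 0.1),
        Message("right", "your name", 0.9),
    ]
    print(filter_theory(msgs, attended="left"))
    print(attenuation_theory(msgs, attended="left"))
    print(late_selection_theory(msgs))
```

With the left ear attended, the filter theory passes only the weather report. The attenuation theory also lets a highly pertinent unattended message, such as one’s own name, break through. The late-selection theory analyzes all three messages and resolves the bottleneck only when a response is chosen.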
A natural characteristic of our memory system is forgetting. Forgetting can be caused by several mechanisms:
- encoding failure (failing to encode material into LTM)
- consolidation failure (poorly formed memories due to organic disruption)
- amnesia (caused by problems in the brain)
- decay (fading of memory over time)
- interference (confusion of similar memories)
- retrieval failure (inability to find the cues needed for retrieval)
- motivated forgetting (to avoid dealing with experiences), among others (Solso et al., 2005).
Reason (1990) identified two important retrieval mechanisms: similarity-matching (matching the calling conditions of a query to the attributes of knowledge stored in semantic memory) and frequency-gambling (preferring more frequently encountered items). Another problem is false memory. Information stored in memory is pieced together rather than being an exact replica of the real world. This construction process can be influenced by many factors, such as prior experience, post-event information, perceptual factors, social factors, and one’s desires (Solso et al., 2005). Any negative influence from these factors can lead to a constructed but incorrect memory. Another important component of the memory system is working memory. The term refers primarily to the model introduced by Baddeley and Hitch (1974) for “the short-term maintenance and manipulation of information necessary for performing complex cognitive tasks such as learning, reasoning, and comprehension” (American Psychological Association, n.d.a). When first proposed, working memory included three components: (1) a limited-capacity central executive, the control system that allocates attention and coordinates the other components; (2) a phonological loop that temporarily stores and manipulates verbal (sound-based) information; and (3) a visuospatial sketchpad that temporarily stores and manipulates visual and spatial information.
Another component, the episodic buffer, was added later (Baddeley, 2000). It integrates information about a single stimulus or event from the different subsidiary systems. The complete working memory model is shown in Figure 11. Working memory differs from STM: in STM information is only stored, whereas in working memory it is both stored and mentally manipulated. Working memory has a limited capacity and is constantly refreshed by new information. Because of this limited capacity, important information being manipulated in working memory may be interfered with, which can lead to cognitive human errors.
Figure 11 Working memory model, comprising the central executive; the visuospatial sketchpad, episodic buffer, and phonological loop; and the long-term systems of visual semantics, episodic LTM, and language. (Source: Baddeley, 2000. © 2000 Elsevier.)
2.1.5 The Organization of Knowledge
Knowledge is the well-organized, structured information stored in LTM. Several models have been proposed to explain how knowledge is organized and stored in the brain. For example, the semantic feature-comparison model (Smith, Shoben, & Rips, 1974) holds that knowledge is stored in memory on the basis of semantic features. The network model (e.g., Quillian, 1966) represents stored knowledge as a network that connects independent units. Schema theories (e.g., Rumelhart & Ortony, 1976) borrow the term "schema" from computer science and model stored knowledge as a slot structure, in which the slots are the various attributes of an entity. Another important construct is the mental model, an internal mental representation of a system (Gentner & Stevens, 1983). The purpose of a mental model is to "allow the person to understand and to anticipate the behavior of a physical system" (Gentner & Stevens, 1983). A good mental model is the essential knowledge basis for performing a task. Studies have found that the quality of a mental model is a good predictor of human performance (e.g., Lyu & Li, 2019).
HUMAN ERRORS AND HUMAN RELIABILITY
2.1.6 Problem-Solving
In general, problem-solving can be viewed as a search through a problem space consisting of various states. The initial problem is the start state, and the goal is the goal state. Problem-solving operators are the actions that transform one state into another (Anderson, 2014). The problem-solving process can thus be imagined as finding a sequence of operators that leads from the start state to the goal state. When problem solvers face a problem they have never encountered before, they have to find new operators to solve it. One typical method for choosing operators is the difference-reduction method, also called hill climbing. Here the problem solver takes steps that reduce the difference between the current state and the goal state (Anderson, 2014). However, this method is not guaranteed to be effective, because the problem solver may arrive at a local optimum rather than the global optimum. Another well-known method, means-ends analysis, overcomes this shortcoming to some extent. The main difference between the two methods is that means-ends analysis identifies the largest difference between the current state and the goal state. It also does not abandon a temporarily unavailable path: if one path to the original goal is blocked, a new subgoal is set to make that path available again (Anderson, 2014). The problem-solving process can be influenced by several factors, and experts can perform better than novices in many aspects of problem solving.
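The failure mode of difference reduction can be made concrete with a small search. The Python sketch below is a hypothetical illustration, not Anderson's (2014) formulation, and it rests on these assumptions:

- States are the integers 0 to 10.
- The two operators move one state left or right.
- `DIFF` is an assumed perceived difference between each state and the goal state 9.

Started at state 0, the greedy method stops at state 3, where every operator increases the difference, even though the goal is reachable. Started at state 6, it reaches the goal.

```python
from typing import Callable, List


def hill_climb(start: int,
               neighbors: Callable[[int], List[int]],
               difference: Callable[[int], float]) -> int:
    """Difference reduction: repeatedly apply the operator that most reduces
    the difference to the goal; stop when no operator reduces it further."""
    current = start
    while True:
        candidates = neighbors(current)
        if not candidates:
            return current
        best = min(candidates, key=difference)
        if difference(best) >= difference(current):
            return current  # no improving move: a local (not necessarily global) optimum
        current = best


# Hypothetical 1-D problem space: states 0..10, goal state 9.
# DIFF[s] is an assumed perceived difference between state s and the goal.
DIFF = [5, 4, 3, 2, 3, 4, 3, 2, 1, 0, 1]


def neighbors(state: int) -> List[int]:
    # Two operators: move one state left or right, staying inside the space.
    return [s for s in (state - 1, state + 1) if 0 <= s < len(DIFF)]


def difference(state: int) -> float:
    return DIFF[state]


print(hill_climb(0, neighbors, difference))  # 3: stuck at a local optimum
print(hill_climb(6, neighbors, difference))  # 9: reaches the goal state
```

Means-ends analysis would instead set a subgoal when a path looks blocked. In this toy space, that means accepting the temporary increase in difference needed to leave state 3.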
Experts benefit from their problem-specific knowledge and can view and solve a problem at a deeper level (Galotti, 2007). However, experience can also bias an individual's preference for certain operators. This is known as the mental set, "the tendency to adopt a certain framework, strategy, or procedure or, more generally, to see things in a certain way instead of in other, equally plausible ways" (Galotti, 2007). A mental set can help problem solvers solve a class of problems more easily. However, they may find it nearly impossible to solve a problem if they are stuck in the wrong mental set. One example of a mental set is functional fixedness: the difficulty of using available resources in a novel way, beyond their customary function. A classic experiment is that of Maier (1931), in which subjects were required to tie together two cords hung from the ceiling. The distance between the cords and their length prevented subjects from grasping both cords at the same time. Several objects were provided, such as poles, ring stands, clamps, pliers, extension cords, tables, and chairs. The experiment focused on the most challenging solution: tying a weight to the end of one cord and swinging it as a pendulum. However, fewer than half of the subjects reached this solution within the time limit. Lastly, the type of problem (insight vs. non-insight) can also influence problem-solving. An insight problem can be described as one "solved by a sudden flash of illumination" (Metcalfe & Wiebe, 1987). Although it sometimes looks like a routine problem, an insight problem can be difficult because a familiar solution process is not necessarily the correct one to follow (Chu & MacGregor, 2011).
2.1.7 Reasoning
Deductive reasoning and inductive reasoning are the two major forms of reasoning; unlike deduction, induction cannot guarantee
the correctness of its conclusions. A common form of deductive reasoning is conditional reasoning, which follows the logic of "if-then." The "if" part is called the antecedent and the "then" part the consequent. Two basic valid rules guide inference: modus ponens and modus tollens (Anderson, 2014). Given the conditional "if A, then B," modus ponens infers from A being true that B is true. Modus tollens infers from B being false that A is false. Two other inference patterns are sometimes accepted by reasoners, but the conclusions they produce are invalid: affirming the consequent (inferring A from B) and denying the antecedent (inferring not-B from not-A). Deductive reasoning is a complex cognitive activity, and the incorrect conclusions it sometimes produces may lead to human errors. Several cognitive biases can affect the reasoning process, for example, belief bias and confirmation bias. The belief bias effect is the tendency for reasoning to be influenced by one's prior knowledge, so that people tend to endorse as valid the arguments whose conclusions they believe (Sternberg & Leighton, 2004). Confirmation bias (e.g., Wason, 1968) is the human tendency to try to confirm a hypothesis rather than to falsify it.
2.1.8 Judgment and Decision-Making
The probabilistic estimation of uncertainty has been well studied in probability theory. One central result is Bayes' theorem, which describes the relationship between prior probability, conditional probability, and posterior probability. However, humans are not skilled at probabilistic judgment. One typical bias is base-rate neglect: in some cases, people fail to take the prior probability into account (Anderson, 2014).
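As a hedged illustration of Bayes' theorem for a binary hypothesis, the Python sketch below computes a posterior probability from a prior and the two conditional probabilities of the evidence. The function name `posterior` and its parameters are illustrative assumptions. The numbers are taken from the two experiments described next.

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' theorem for a hypothesis H and evidence E:
    P(H|E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|not H)P(not H)]."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1.0 - prior))


# Disease screening (Hammerton, 1973): 1% base rate, positive result for
# 90% of sufferers and for 1% of healthy people.
print(round(posterior(0.01, 0.90, 0.01), 3))  # 0.476, far below the 85% median estimate

# An uninformative description (equally likely for engineers and lawyers)
# should leave the prior unchanged in both groups.
print(round(posterior(0.70, 0.5, 0.5), 2))  # 0.7
print(round(posterior(0.30, 0.5, 0.5), 2))  # 0.3
```

The screening result is low because healthy people vastly outnumber sufferers. The 1% false-positive rate applied to the 99% who are healthy yields about as many positive results as the true cases do.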
In the experiment by Kahneman and Tversky (1973), for example, participants were divided into two groups. The engineer-high group was told that the person to be described had been selected from a group of 70 engineers and 30 lawyers. The engineer-low group was told the person came from a group of 30 engineers and 70 lawyers. Participants were then given the following description of the person, "Dick": Dick is a 30-year-old man. He is married with no children. A man of high ability and high motivation, he promises to be quite successful in his field. He is well liked by his colleagues. Because this description was uninformative, the posterior probability of his profession should have equaled the prior probability (70% or 30%). However, the median estimate of the probability that he was an engineer was 50% in both groups; that is, participants failed to take the prior probability into account. Another example is given by Hammerton (1973), in which participants were asked to estimate the probability that a person had a disease. Roughly 1% of the population had the disease. A detection test gave a positive result for 90% of real sufferers but also for 1% of healthy people. Participants' median estimate of the probability that a person with a positive result had the disease was 85%, while the true probability was around 48%. These two examples illustrate underweighting of the prior probability; sometimes the evidence can also be underweighted, which is known as conservatism (Edwards, 1968). The general decision-making process is shown in Figure 12. It involves many basic cognitive functions, such as perception, attention, LTM, and working memory. Here, perception often serves to estimate cues, and it has systematic biases when perceiving proportions, projections, and randomness in the environment (Wickens et al., 2013).
DESIGN FOR HEALTH, SAFETY, AND COMFORT
Figure 12 The decision-making process (elements shown: environment, cues, senses, selective attention, cue filtering, perception, situation awareness, diagnosis among hypotheses H1, H2, ..., working memory and cognition, long-term memory, attention resources, effort, meta-cognition, confirmation, options, risks (values), choice, response selection, and response execution). (Source: Wickens et al., 2013. © 2013 Taylor & Francis.)
Because attentional resources are limited, selective attention plays an important role in filtering cues, and it can
lead to several vulnerabilities, such as missed cues, information overload, salience bias, and the as-if heuristic. The cues are then integrated to form an awareness or assessment of the current situation, sometimes called "diagnosis." This process is full of heuristics and biases, such as the representativeness heuristic, the availability heuristic, the anchoring heuristic, confirmation bias, and overconfidence bias (Wickens et al., 2013). These heuristics often allow people to make quick and reasonably accurate judgments and assessments in complex situations, but they can also lead people to believe a wrong result. After an assessment is formed, decision-makers must choose how to act. This usually requires considering both the value and the probability of the possible outcomes. One typical account of decision-making under uncertainty is based on utility. As shown in Figure 13, the utility (value) function differs for gains and losses, and different individuals have different utility functions. Inappropriate decisions may result from inaccurate estimates of utility and probability. Summaries of various cognitive biases can be found in, for example, Illankoon and Tretten (2019) and Heick (2019).
Figure 13 A hypothetical value (utility) function, plotting value against losses and gains. (Source: Kahneman and Tversky, 1984.)
2.2 GEMS
Reason (1990) proposed two control modes (the attentional mode and the schematic mode) and two cognitive structures (the workspace and the knowledge base) in human cognition. The attentional mode refers to "controlled or conscious processing," while the schematic mode refers to "automatic or unconscious processing." Cognitive activities are organized and controlled by these two control modes. Correspondingly, the workspace, or working memory, is related to the attentional mode, while the knowledge base is related to the schematic mode.

The attentional control mode represents a problem-solving process that requires people to exert cognitive effort to analyze a situation and work out how to deal with the problem. As Reason (1990) noted, cognitive activities under this control mode are slow and effortful. There is no template for people to follow, and the response depends heavily on the specific situation. Such activities include problem diagnosis and solving: people must find out what the problem is and where its root causes lie, and decide how to act under real-world constraints. This process requires people to make the best use of their knowledge and to devise a response plan by themselves, and it is therefore a demanding cognitive process.

Compared with the attentional control mode, the schematic control mode places fewer demands on cognition, as it mainly concerns cognition related to recurrent experiences. When people have accumulated experience with certain patterns of work, the schematic control mode is more likely to be activated when they encounter familiar or similar situations. In such cases, the cognitive process does not build an entirely novel solution but matches and combines previous experiences. Because previous experiences can serve as a template for dealing with a new situation, this control mode usually works rapidly.

Based on the SRK framework (Rasmussen, 1983), Reason (1990) proposed GEMS to provide an integrated picture of error mechanisms, as shown in Figure 14. GEMS assumes that human behaviors occur at skill-based, rule-based, and knowledge-based levels. Skill-based and rule-based behaviors are feedforward.
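The decision flow outlined in Figure 14 can be sketched as a short Python function that records which performance level handles a situation. This is a simplified, hypothetical rendering rather than Reason's (1990) specification:

- The rule base is a plain dictionary of IF (situation) THEN (action) pairs.
- `solves` stands in for the "Is problem solved?" check.
- Only one rule attempt is made.
- The search for a higher-level analogy is folded into the knowledge-based step.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (performance level, action taken)


def gems_episode(problem_detected: bool,
                 situation: str,
                 stored_rules: Dict[str, str],
                 solves: Callable[[str], bool]) -> List[Step]:
    """Trace the performance levels visited in one episode, loosely
    following the GEMS outline (simplified: a single rule attempt)."""
    trace: List[Step] = [("skill-based", "routine action with attentional checks")]
    if not problem_detected:
        return trace  # attentional check is OK: continue toward the goal state
    # Problem detected: consider local state information. Is the pattern familiar?
    rule_action = stored_rules.get(situation)
    if rule_action is not None:
        trace.append(("rule-based", rule_action))  # IF (situation) THEN (action)
        if solves(rule_action):
            return trace
    # Unfamiliar pattern, or the stored rule did not solve the problem:
    # revert to the mental model of the problem space and reason it out.
    trace.append(("knowledge-based", "analyze mental model, infer diagnosis, act, observe"))
    return trace


def levels(trace: List[Step]) -> List[str]:
    return [level for level, _ in trace]


# Hypothetical rule base with a single IF-THEN rule.
RULES = {"feedwater pump trip": "start standby pump"}

print(levels(gems_episode(True, "feedwater pump trip", RULES, lambda a: True)))
# ['skill-based', 'rule-based']
print(levels(gems_episode(True, "unfamiliar alarm pattern", RULES, lambda a: True)))
# ['skill-based', 'knowledge-based']
```

In the sketch, only the final branch corresponds to slow, attentional-mode processing; the earlier branches model the fast schematic mode.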
Routine actions in a familiar environment are usually skill-based behaviors. Such actions require little mental effort and can be completed automatically. Similarly, the individual performs rule-based behaviors under the guidance of rules, and this process also places low demands on cognition. These two types of behavior are controlled by automatic units stored in the knowledge base and thus belong to the schematic control mode. When the situation is found to be beyond the pre-stored problem-solving routines, the individual has to deal with the problem through a slow and effortful cognitive process, i.e., knowledge-based behavior. Knowledge-based behaviors are under the attentional control mode.
Figure 14 An outline of the Generic Error Modelling System. Skill-based level (slips and lapses): routine actions in a familiar environment, with attentional checks on the progress of action; if all is OK, the goal state is reached, otherwise a problem is detected. Rule-based level (RB mistakes): consider local state information; if the pattern is familiar, apply a stored rule IF (situation) THEN (action) and check whether the problem is solved. Knowledge-based level (KB mistakes): if the pattern is not familiar, find a higher-level analogy. If none is found, revert to the mental model of the problem space and analyze more abstract relations between structure and function. Then infer a diagnosis, formulate and apply corrective actions, observe the results, and make subsequent attempts. (Source: Reason, 1990. © 1990 Cambridge University Press.)
Under GEMS, Reason identified a set of
error modes in each performance level, which are summarized in Table 1. These error modes reflect how people make errors at different performance levels.
Table 1 Error Modes in Each Performance Level
Performance level | Error mode | Specific failure types
Skill-based performance | Inattention | Double-capture slips; omissions following interruptions; reduced intentionality; perceptual confusions; interference errors
Skill-based performance | Over-attention | Omissions; repetitions; reversals
Rule-based performance | Misapplication of good rules | First exceptions; countersigns and nonsigns; informational overload; rule strength; general rules; redundancy; rigidity
Rule-based performance | Application of bad rules | Encoding deficiencies; action deficiencies
Knowledge-based performance | Selectivity; workspace limitations; out of sight out of mind; confirmation bias; overconfidence; biased reviewing; illusory correlation; halo effects; problems with causality |
Knowledge-based performance | Problems with complexity | Problems with delayed feedback; insufficient consideration of processes in time; difficulties with exponential developments; thinking in causal series not causal nets; thematic vagabonding; encysting
Source: Reason 1990. © 1990 Cambridge University Press.
2.3 Human Cognition Processes from a Macro-Cognitive Perspective
The concept of macro-cognition was developed to "indicate a level of description of the cognitive functions that are performed in natural decision-making settings" (Klein et al., 2003). This term should be distinguished from "micro-cognition." Studies of micro-cognition usually follow standard laboratory paradigms and focus on the basic building blocks of cognitive processes. In the real world, however, the context is more complex and less controlled than in the laboratory. According to Klein et al. (2003), real-world situations have these features:
(1) large amounts of information and complex decisions;
(2) decisions often made under time pressure and with high risks;
(3) participants who are real domain practitioners rather than experimental subjects;
(4) goals that are sometimes ill-defined; and
(5) many conditions that are hard to control.
Given such a context, researchers who focus on real-world problems are more interested in people's macro-cognition. Take decision-making strategies as an example. Normative models, such as utility theory, assume that people understand the problem thoroughly and have enough time to think of and compare all possible alternatives. In real-world contexts, however, people under various constraints usually cannot reach the best choice and instead make an acceptable decision. To deal with such situations, macro-cognition models such as the recognition-primed decision (RPD) model (Klein, 1993) have been developed. Their aim is to understand how experienced decision-makers generate effective plans under these constraints. In fact, the RPD model can be viewed as a combination of three decision heuristics (micro-cognition): the availability heuristic, the simulation heuristic, and the representativeness heuristic (Klein, 1993). The availability heuristic refers to the fact that people often judge the likelihood of an item by how easily they can recall similar items from memory. The representativeness heuristic refers to the fact
that people often judge the likelihood of an item by how consistent it is with the characteristics they expect from memory. The simulation heuristic refers to the fact that people often judge the likelihood of an event by how easily they can mentally picture it. This discussion shows that micro- and macro-cognition are not in conflict. They are two different perspectives for analyzing human cognitive processes, and there is no clear boundary between them. In practice, they can be distinguished by observability and measurability: if a process can be directly observed and measured, it can be viewed as macro-cognition; otherwise, it is micro-cognition. Several macro-cognition models have been developed to represent human cognitive processes, including those of Klein et al. (2003), Patterson and Hoffman (2012), and Whaley et al. (2016). In these models, the main building blocks, or information processing stages, of human cognition are represented by macro-cognitive functions. Here, a brief introduction is given to the macro-cognition model proposed by Whaley et al. (2016). It is the cognitive basis
of the Integrated Human Event Analysis System (IDHEAS) recently developed by the U.S. Nuclear Regulatory Commission (NRC). The model includes five macro-cognitive functions to describe operators' cognitive processes and behaviors in the context of MCRs in NPPs: detecting and noticing, understanding and sensemaking, decision-making, action, and teamwork.
- Detecting and noticing are the processes by which people perceive information from the environment and carry out a preliminary screening for further mental processing.
- Understanding and sensemaking are the processes people use to derive meaning from the perceived information, ranging from automatic recognition to effortful thinking.
- Decision-making is the process of goal selection, planning, re-planning and adapting, evaluating options, and selection.
- Action is the process by which people physically carry out predetermined manual actions.
- Teamwork concerns the processes people use to interact with each other in an operating crew to complete a task together.
The macro-cognitive functions operate in parallel and cyclically, and each function overlaps and interacts with the others.
2.3.1 Detecting and Noticing
The detecting and noticing function mainly involves three pathways:
- the pre-attentive pathway, including visual segmentation and the pop-out of salient stimuli;
- the bottom-up pathway, which is stimulus-driven; and
- the top-down pathway, which is memory-driven.
Several micro-cognitive processes are related to this macro-cognitive function, mainly visual signal processing, segmentation and pop-out, visual feature perception, and pattern and object integration. In the detecting and noticing function, errors can occur due to the failure of the following mechanisms:
1. Cue content. The type and quality of cues influence this function. For example, an alarm must be sufficiently salient, visually or aurally, for the operator to detect and perceive it easily, especially in an information-overload scenario.
2. Vigilance. Vigilance is "a state of extreme awareness and watchfulness directed by one or more members of a group toward the environment, often toward potential threats" (American Psychological Association, n.d.b). For operators in the Main Control Room (MCR) of an NPP, monitoring is a routine task, and they must maintain vigilance to detect abnormal conditions in time. Vigilance can decline as a result of mental fatigue, workload, stress, etc.
3. Attention. Attention is an important part of detecting and noticing because it directly guides where cognitive resources are focused. Since cognitive resources are limited, how attention is allocated can strongly influence human performance.
4. Expectation. When the top-down process of perception is activated, perception is mainly driven by knowledge. MCR operators of NPPs are well trained, and many have years of working experience. Their knowledge and experience lead them to form expectations when they encounter a situation. Correct expectations can help them detect and solve problems quickly, whereas wrong expectations will bias their cognitive processes.
5. Working memory. Working memory has limited capacity. If the current situation is complex and large amounts of information are presented, operators' working memory will be challenged.
2.3.2 Understanding and Sensemaking
The cognitive process underlying the understanding and sensemaking function is described by the data-frame theory proposed by Klein et al. (2007). The data are the pieces of information detected and perceived by the operator. According to Klein et al. (2007), the frame is the operator's knowledge structure or mental model. It helps the operator understand the current situation and guides the search for new information. The complete framework of sensemaking activities in the data-frame theory consists of several stages:
1. Selecting an initial frame to match the data. This is done by pattern matching and recognition, often at a preconscious level. As the starting point, the initial frame strongly influences the rest of the understanding process.
2. Elaborating the frame in more detail. One may need to seek more detailed information from the environment to obtain an accurate match.
3. Questioning the frame. When unforeseen situations occur, such as data that do not match the frame or new information that is inconsistent with expectations, one questions whether the frame is appropriate.
4. Preserving the frame. If the reason for a contradiction is not found, one option is to preserve the frame and ignore the contradiction. However, this choice may lead to misunderstanding.
5. Comparing frames to determine which best fits the current situation. Often, people can compare no more than three frames at the same time.
6. Re-framing. When the selected frame cannot fit the current situation, one needs to replace it, re-frame, and seek new information on that basis.
7. Seeking a new frame. If a frame is totally rejected, one has to find a new frame to interpret the current situation.
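The stage sequence above can be sketched as a toy program. This is only an illustrative assumption, not Klein et al.'s (2007) formalization:

- A frame is modeled as the set of cues it expects.
- "Fit" is the share of observed cues a frame accounts for.
- The 0.75 threshold and the frame names are hypothetical.

The sketch covers selecting an initial frame, elaborating a fitting frame, comparing up to three candidate frames, re-framing, and seeking a new frame. Preserving a frame despite a contradiction is deliberately not modeled.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Frame = FrozenSet[str]  # toy assumption: a frame is the set of cues it expects


def fit(frame: Frame, data: FrozenSet[str]) -> float:
    """Share of the observed cues that the frame accounts for."""
    return len(frame & data) / len(data) if data else 1.0


def sensemaking_step(data: FrozenSet[str], frames: Dict[str, Frame],
                     current: Optional[str] = None, threshold: float = 0.75,
                     max_compared: int = 3) -> Tuple[Optional[str], str]:
    """One pass through a simplified data-frame cycle; returns (frame, stage)."""
    ranked = sorted(frames, key=lambda name: fit(frames[name], data), reverse=True)
    if current is None:
        return ranked[0], "select initial frame"  # pattern matching on the data
    if fit(frames[current], data) >= threshold:
        return current, "elaborate frame"  # data still fit: keep elaborating
    # Data contradict the frame: question it and compare a few alternatives.
    for name in ranked[:max_compared]:
        if name != current and fit(frames[name], data) >= threshold:
            return name, "re-frame"
    return None, "seek new frame"  # no stored frame fits


# Hypothetical frames an MCR operator might hold.
FRAMES = {
    "loss of coolant": frozenset({"pressure low", "temperature high", "level low"}),
    "sensor fault": frozenset({"pressure low", "other channels normal"}),
}
print(sensemaking_step(frozenset({"pressure low", "temperature high"}), FRAMES))
# ('loss of coolant', 'select initial frame')
print(sensemaking_step(frozenset({"pressure low", "other channels normal"}),
                       FRAMES, current="loss of coolant"))
# ('sensor fault', 're-frame')
```

The `max_compared=3` default mirrors the observation that people rarely compare more than three frames at once.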
In the understanding and sensemaking function, errors can occur due to the failure of the following mechanisms:
1. Working memory. Working memory is an important mechanism underlying understanding, because perceived information and retrieved knowledge are all manipulated there. Working memory failures due to time constraints or limited capacity can lead to errors in understanding.
2. Attention. Attention is also important for understanding, because individuals must continuously attend to new information to build their situation awareness. Failure or inappropriate control of attention directly affects the quality of situation awareness and of comprehension.
3. Frame. The frame is stored in LTM and serves as an important guide for interpreting cues and searching for more information. The quality of the frame determines the quality of comprehension. Someone with a wrong mental model will find it very difficult to reach a correct understanding of a complex situation.
4. Mental manipulation processes. In the data-frame theory, many mental activities are essential for seeking a frame, matching information with frames, re-framing, etc. Any of these activities can fail, and such failures may lead to an incorrect understanding.
2.3.3 Decision Making
One representative model of how people make decisions in real-world contexts is the RPD model proposed by Klein (1993). It consists of three phases: situation recognition, option evaluation, and mental simulation. An experienced person first assesses the situation and judges whether he or she has encountered it before. In the second phase, he or she generates candidate courses of action and evaluates them to decide which one to adopt. A mental simulation is then conducted to evaluate the likely results of executing the plan. If the result of the mental simulation is acceptable, a plan consisting of a series of actions is generated to guide manual execution. In the decision-making function, errors can occur due to the failure of the following mechanisms:
1. The goal selection process. Goal selection sets the aim of decision-making, and decision-making will fail if the goal is set wrongly. Goal selection can fail in several ways, such as incorrect goals, incorrect prioritization, and incorrect criteria for judgment.
2. The pattern matching process. Another important phase in decision-making is pattern matching. Here, experienced individuals compare the current situation to their mental model and formulate a response plan based on their experience. Pattern matching can fail in several ways, for example, failure to retrieve experience, incorrect recall of experience, or incorrect comparison with the current situation.
3. The mental simulation process. After setting a goal and planning a series of response actions, the individual is supposed to conduct a mental simulation to evaluate the feasibility and effectiveness of the decision. Specific failure forms include inaccurate portrayal of actions, incorrect inclusion of alternatives, inaccurate portrayal of the system response, etc.
2.3.4 Action
Manual actions also require a cognitive process to control the sequence of actions. From a neurophysiological perspective, multiple cortical regions are involved and connected in a network, with three main pathways:
(1) a hierarchical pathway involving movement programming, storage, sequencing, and execution;
(2) action automaticity, which carries out actions without occupying the brain with the required low-level details; and
(3) sensory feedback.
In this function, errors can occur due to the failure of the following mechanisms:
1. Control selection. Errors occurring in the action-initiating stage are related to control selection. Several neurophysiological mechanisms can lead to such errors, including monitoring information to initiate actions and detecting feedback to correct actions.
2. Cognitive issues for action sequencing. Executing actions requires a cognitive process to control them, and failure of such processes can cause action errors. These failures can result from task interference, memory limitations, attention, compatibility, etc.
3. Physiological issues for action execution. Action control mechanisms can only encode the direction and amplitude of a movement, while continuous execution requires continuous sensory feedback. Physiological mechanisms that can lead to errors include automaticity control, habit intrusion, population stereotypes, motor learning, etc.
2.3.5 Teamwork
The macro-cognition model by Whaley et al. (2016) identifies three basic teamwork functions: (1) communication; (2) coordination; and (3) collaboration. Communication is the process by which team members exchange information with each other. Coordination refers to the organization of team members' activities. Collaboration refers to the process by which several team members work together to achieve a goal. Any individual error that impairs these functions will lead to team errors.
2.3.6 The Overall Framework
The overall cognitive framework of the macro-cognition model by Whaley et al. (2016) is presented in Figure 15.
- The macro-cognitive function is the first level of the framework.
- The cognitive mechanism is "the process by which macro-cognitive functions work" (Whaley et al., 2016) and includes many basic concepts of human cognition. One example is working memory, a very important part of human information processing. A cognitive mechanism may fail in some cases; for example, working memory overload can lead to misunderstanding.
- Proximate causes: Whaley et al. summarized the possible outcomes of cognitive mechanism failures and grouped them into readily identifiable types of failures of macro-cognitive functions, named "proximate causes." A proximate cause results from cognitive mechanism failures and in turn leads to a human error.
- Performance influencing factors (PIFs) are the contextual factors that can affect human performance; other studies also call them performance-shaping factors (PSFs). These factors can have positive or negative effects on the working of cognitive mechanisms.
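The layered structure of this framework can be sketched as nested data types. In this hypothetical Python sketch:

- Each macro-cognitive function lists its proximate causes.
- Each proximate cause lists the cognitive mechanisms whose failure produces it.
- Each mechanism lists the PIFs assumed to degrade it.

Only negative PIF effects are modeled. All names in the example are illustrative and are not taken from Whaley et al. (2016).

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Mechanism:
    name: str
    negative_pifs: List[str] = field(default_factory=list)  # PIFs assumed to degrade it


@dataclass
class ProximateCause:
    name: str
    mechanisms: List[Mechanism] = field(default_factory=list)


@dataclass
class MacrocognitiveFunction:
    name: str
    proximate_causes: List[ProximateCause] = field(default_factory=list)


def failure_paths(function: MacrocognitiveFunction,
                  active_pifs: Set[str]) -> List[Tuple[str, str, str]]:
    """List (PIF, mechanism, proximate cause) chains through which the
    currently active PIFs could lead to failure of this function."""
    paths = []
    for cause in function.proximate_causes:
        for mechanism in cause.mechanisms:
            for pif in mechanism.negative_pifs:
                if pif in active_pifs:
                    paths.append((pif, mechanism.name, cause.name))
    return paths


# Illustrative instance based on the working-memory example in the text.
working_memory = Mechanism("working memory", ["high information load", "time pressure"])
misunderstanding = ProximateCause("misunderstanding of the situation", [working_memory])
sensemaking = MacrocognitiveFunction("understanding and sensemaking", [misunderstanding])

print(failure_paths(sensemaking, {"time pressure", "noise"}))
# [('time pressure', 'working memory', 'misunderstanding of the situation')]
```

Tracing from PIFs upward in this way mirrors how the framework links context to a failed macro-cognitive function.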
Figure 15 The overall cognitive framework of the macro-cognition model. Performance influencing factors (PIF 1, ...) act on cognitive mechanisms (Mechanism A, ...), whose failures give rise to proximate causes (Proximate Cause 1, ...) that lead to failure of a macrocognitive function. (Source: Modified from Whaley et al., 2016.)
3 HUMAN ERROR CLASSIFICATION, PREDICTION, DETECTION, AND ANALYSIS
Human error classification is fundamental to human error research, investigation, prediction, detection, analysis, and control. In past decades, various classification schemes have been proposed, and techniques for human error prediction, detection, and analysis have been developed. Some classic ones are introduced in this section.
3.1 Human Error Classification
As stated in Section 1.2, there are different definitions of human error. Some researchers define it as people failing to achieve the goal of a task, while others define it as people causing (potentially) undesired consequences. Accordingly, there are also different ways to categorize human errors. The two most cited models are James Reason's model and Jens Rasmussen's model, which are also the source of many later human error classification models (Sutcliffe & Rugg, 1998). As suggested by Reason (1990), human failures can be divided into two broad categories: errors and violations. Errors are unintentional failures and can be categorized as slips, lapses, and mistakes. Slips are "potentially observable as externalized actions-not-as-planned";
lapses involve failures of memory; and mistakes are "deficiencies or failures in the judgmental and/or inferential processes involved in the selection of an objective or in the specification of the means to achieve it." Violations are defined as any conscious behavior that does not follow rules, procedures, instructions, or regulations for safe operation. One example of a violation occurred in the 2003 blowout of a gas well in Chongqing. The field engineers had realized that there was no back-pressure valve (used to prevent blowouts), but they violated the safety regulations and continued operating the drilling rig. The resulting blowout caused 243 fatalities and injured more than 2000 people. Rasmussen's model categorizes human errors according to three levels of human performance: skill-based, rule-based, and knowledge-based (Rasmussen, 1983). These three performance levels were described in Section 2.2. Within each level, errors can be further classified into different modes (Reason, 1990):
- Skill-based performance has two failure modes: inattention and over-attention.
- Rule-based performance has two failure modes: misapplication of good rules and application of bad rules.
- Knowledge-based performance has more failure modes: selectivity, workspace limitations, out of sight out of mind, confirmation bias, overconfidence, biased reviewing, illusory correlation, halo effects, problems with causality, and problems with complexity.
Reason's and Rasmussen's models are related as follows: slips and lapses are generally attributed to failures in skill-based performance, whereas failures in rule-based or knowledge-based performance cause mistakes (Reason, 1990).
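Reason's distinctions, and their link to the SRK levels, can be summarized as a small decision procedure. The Python sketch below is a simplification for illustration, not a published algorithm. It assumes an unsafe act has already occurred and that three yes/no judgments about it are available:

- whether rules were consciously breached;
- whether the plan was adequate;
- whether the failure was one of memory.

```python
from enum import Enum
from typing import Optional


class UnsafeAct(Enum):
    SLIP = "slip"
    LAPSE = "lapse"
    MISTAKE = "mistake"
    VIOLATION = "violation"


def classify(conscious_rule_breach: bool, plan_adequate: bool,
             memory_failure: bool) -> UnsafeAct:
    """Classify an unsafe act using Reason's (1990) distinctions (simplified)."""
    if conscious_rule_breach:
        return UnsafeAct.VIOLATION  # deliberate departure from rules or procedures
    if not plan_adequate:
        return UnsafeAct.MISTAKE  # failure in choosing the objective or the means
    if memory_failure:
        return UnsafeAct.LAPSE  # adequate plan, but a step or intention was forgotten
    return UnsafeAct.SLIP  # adequate plan, action not carried out as planned


def srk_level(act: UnsafeAct, stored_rule_applied: bool = False) -> Optional[str]:
    """Map an error type to the SRK level at which it typically arises."""
    if act in (UnsafeAct.SLIP, UnsafeAct.LAPSE):
        return "skill-based"
    if act is UnsafeAct.MISTAKE:
        return "rule-based" if stored_rule_applied else "knowledge-based"
    return None  # violations are not errors in this scheme, so no level is assigned


# The Chongqing blowout case: engineers knowingly kept drilling without the valve.
print(classify(conscious_rule_breach=True, plan_adequate=True, memory_failure=False))
# UnsafeAct.VIOLATION
```

The order of the checks matters. A deliberate rule breach is classified as a violation regardless of whether the plan was otherwise sound.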
Partly based on Reason’s and Rasmussen’s taxonomies, Senders and Moray (1991) proposed three taxonomies of human error: phenomenological taxonomies (phenotypes), cognitive mechanism taxonomies (genotypes), and taxonomies of biases or deep-rooted tendencies, which answer “what happened,” “how it happened,” and “why it happened,” respectively. Phenomenological taxonomies refer to observable events. The classification of “what happened” normally consists of action problems, information issues, and mismatch problems. Action problems involve actions or checks made too early or too late, actions or checks omitted entirely or partially, too much or too little of an action, and actions that last too long or too short; mismatch problems include the right action or check on the wrong object, the wrong action or check on the right object, actions in the wrong direction, and so on; and information issues involve information not obtained or wrong information obtained. Cognitive mechanism taxonomies classify human errors according to the stage of human information processing at which they arise: discrimination, input information processing, recall, inference, and physical coordination. Table 2 illustrates how errors can happen at each stage. The third taxonomy, of deep-rooted tendencies, is illustrated in Table 3. Another human error classification method is the Technique for the Retrospective and Predictive Analysis of Cognitive Errors (TRACEr), which was developed from the human information processing model and the simple model of cognition (Graziano, Teixeira, & Soares, 2016). It is a structured method that initially comprises three main taxonomies, or classification schemes: (1) the context of the incident; (2) the production of the error (operator context); and (3) the recovery of the incident. Each main group is further subdivided, as detailed in Table 4.
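The phenomenological (“what happened”) categories lend themselves to a mechanical comparison of what was planned with what was observed. The following sketch assumes a simple `Action` record and two tolerance parameters (`time_tol_s`, `rel_tol`) chosen purely for illustration; it is not part of Senders and Moray’s taxonomy, and it covers only the action and mismatch problems listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    target: str        # object acted on, e.g. "valve A" (hypothetical)
    verb: str          # what was done, e.g. "open"
    start_s: float     # start time in seconds
    duration_s: float  # how long the action lasted, in seconds
    amount: float      # magnitude, e.g. valve turns
    direction: int     # +1 or -1


def phenotypes(planned: Action, observed: Optional[Action],
               time_tol_s: float = 5.0, rel_tol: float = 0.1) -> list[str]:
    """Observable 'what happened' categories for one planned action."""
    if observed is None:
        return ["action omitted"]
    found = []
    # Mismatch problems: the action/object pairing or the direction is wrong.
    if observed.target != planned.target and observed.verb == planned.verb:
        found.append("right action on wrong object")
    if observed.target == planned.target and observed.verb != planned.verb:
        found.append("wrong action on right object")
    if observed.direction != planned.direction:
        found.append("action in wrong direction")
    # Action problems: timing outside an absolute tolerance window.
    if observed.start_s < planned.start_s - time_tol_s:
        found.append("too early")
    elif observed.start_s > planned.start_s + time_tol_s:
        found.append("too late")
    # Action problems: magnitude and duration outside a relative tolerance.
    if observed.amount > planned.amount * (1 + rel_tol):
        found.append("too much")
    elif observed.amount < planned.amount * (1 - rel_tol):
        found.append("too little")
    if observed.duration_s > planned.duration_s * (1 + rel_tol):
        found.append("too long")
    elif observed.duration_s < planned.duration_s * (1 - rel_tol):
        found.append("too short")
    return found


plan = Action("valve A", "open", start_s=10.0, duration_s=4.0, amount=2.0, direction=+1)
late = Action("valve A", "open", start_s=30.0, duration_s=4.0, amount=1.0, direction=+1)
print(phenotypes(plan, late))  # ['too late', 'too little']
```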
Table 2
Example of “How It Happened”
Stage: “How it happened”
Discrimination: Stubborn; Familiar short-cut; Familiar pattern not recognized
Input information processing: Information not received; Misunderstanding; Improper assumption
Recall: Forget; Memory error
Inference: Realities or side effects not considered
Physical coordination: Movement variability; Spatial disorientation

Table 3
Example of “Why It Happened”
Performance level: Skill-based error; Rule-based error; Knowledge-based error
“Why it happened”: Too frequent in use; Environmental control signals; Shared schema properties; Mismatching; Overconfidence; Oversimplification; Wrong selection; Insufficient working memory; Out-of-sight-out-of-mind; Matching bias; Mental model issue

Table 4
Taxonomies in TRACEr
Context of the incident: Task error; Error information; Casualty level
Operator context: External error mode; Cognitive domain; Internal error mode; Psychological error mechanism; Performance shaping factors
Error recovery: Error recovery
Source: Graziano et al., 2016. © 2016 Elsevier.

Other human error taxonomies include those proposed by Meister (1962, from Dhillon, 2003) as installation errors, assembly errors, inspection errors, design errors, operator errors, and
maintenance errors; by Payne and Altman (1965) as input errors, mediation errors, and output errors; by Swain and Guttmann (1983) as errors of omission and errors of commission; by Taylor-Adams and Kirwan (1995) as external error mechanisms, internal error mechanisms, psychological error mechanisms, and performance shaping factors; by Sutcliffe and Rugg (1998) as timing, action/sequence, force, duration, direction, and object errors; and by Shappell and Wiegmann (2001) as decision errors, skill-based errors, and perceptual errors. Recently, researchers have combined several of these taxonomies into integrated human error classification models. The Korea Institute of Nuclear Safety constructed an integrated model that classifies human errors into three groups (Cho & Ahn, 2019). Group I is Swain and Guttmann’s taxonomy, which contains errors of omission and errors of commission; Group II is Reason’s model, which includes mistakes, slips/lapses, and violations; and Group III covers latent errors and active errors. This integrated classification has effectively supported facility failure and human error analysis for nuclear power plants (NPPs) in South Korea (Cho & Ahn, 2019).

3.2 Human Error Prediction
Human behavior is hard to predict in general, but specific human errors may be predictable in given situations. To predict human errors, a detailed task analysis is necessary to recognize mismatches between the demands of a task and the operator’s capability to perform it (Senders & Moray, 1991). Among the many task analysis techniques, two well-known and widely used methods are Hierarchical Task Analysis (HTA) and Cognitive Task Analysis (CTA). HTA has been applied successfully in a wide variety of domains since it was first published in 1967 (Stanton, 2006). HTA is designed to understand and describe human–machine and human–human interactions (Rose & Bearman, 2012), and it has also been adopted as an essential part of human error prediction. HTA starts with a main task defined by its intended goal and decomposes it into sub-tasks. Each sub-task has its own sub-goal, and together the sub-goals achieve the main task goal. Sub-tasks can be broken down further; the level of decomposition depends on the complexity of the task and the aim of the analysis. Table 5 presents a sample HTA for oil ship loading. CTA is a technique that can be used to identify the cognitive processes that support an operator in conducting specific tasks.
The cognitive processes include defining tasks, identifying essential information and patterns of cues, assessing situations, making discriminations, addressing issues strategically, making judgments, and forming decisions (Militello & Hutton, 1998). Implementing CTA requires a task analysis process that describes a task (e.g., its steps, procedures, and actions) and analyzes what operators need in order to complete the task successfully within a system (Stanton, 2005). Successful application of CTA to enhance system performance depends on understanding both the cognitive processes underlying human performance in the work domain and the constraints that the work domain imposes on cognitive processing (Vicente, 1999). As with HTA, considerable effort (time and resources) is needed to complete a CTA project. Building on task analysis techniques, several structured methods have been developed to predict human errors systematically. Among them, Predictive Human Error Analysis (PHEA) and the Systematic Human Error Reduction and Prediction Approach (SHERPA) are two techniques widely applied in practice.

3.2.1 PHEA
PHEA is a user-friendly tool for predicting human errors. It requires a task analysis method such as HTA to decompose and describe the selected task (Bligård & Osvalder, 2014). For each task step, the applicable error codes and explanations are identified by checking Table 6 (Bligård & Osvalder, 2014; Jahangiri, Derisi, & Hobobi, 2015). Next, the error details are described, followed by an evaluation of the error consequences. Finally, subject-matter experts determine prevention and mitigation strategies. During a PHEA, a worksheet is used to record the step number of the selected task, the error code/explanation, the error description, the error consequences, and the prevention and recovery strategy.

DESIGN FOR HEALTH, SAFETY, AND COMFORT
Table 5
The Explanation of Each Task Code in the Oil Ship Loading Task
Sub-task 1. Safety check before oil offloading
Sub-task 1.1 Inspect each piece of safety-critical equipment to ensure it is in the correct position.
Sub-task 1.2 Test sensors and the monitoring system to make sure they are functional.
Sub-task 1.3 Check oil transfer arms, pipelines, valves, and flanges to ensure there is no leakage.
Sub-task 1.4 Maintain communication with the central control rooms on both the oil ship and the oil port.
Sub-task 1.5 Complete all documentation and obtain approval from both the oil port and the oil ship.
Sub-task 1.6 Obtain official permission for the oil loading work.
Sub-task 2. Operation of oil loading arms
Sub-task 2.1 Start the oil loading arms one by one and move them towards the oil ship.
Sub-task 2.2 Connect the oil loading arms one by one to the manifolds on the oil ship.
Sub-task 3. Oil loading process control
Sub-task 3.1 Periodically inspect pipelines, valves, flanges, and transfer arms for safety.
Sub-task 3.2 Continuously monitor the ship’s conditions and maintain effective communication.
Sub-task 4. Oil loading arm disconnection
Sub-task 4.1 Vent the remaining oil from each transfer arm.
Sub-task 4.2 Disconnect each transfer arm from the manifolds on the oil ship.
Sub-task 4.3 Quickly install a blind flange and seal it on the manifolds to avoid oil leakage.
Sub-task 4.4 Control the transfer arms and move them towards the oil port.
Sub-task 4.5 Position the oil transfer arms correctly and lock them.
Sub-task 4.6 Finish the documentation work.

3.2.2 SHERPA
SHERPA is another structured human error prediction method, proposed by Embrey (1986a). Like PHEA, SHERPA combines task analysis with an error taxonomy to identify credible errors associated with human activities (Hughes et al., 2015). The standard SHERPA procedure has eight steps:
1. HTA
2. Task classification
3. Human error identification
4. Consequence analysis
5. Error recovery analysis
6. Ordinal probability analysis
7. Criticality analysis
8. Remedy analysis
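Both PHEA and SHERPA start from an HTA and then examine the task one step at a time. As a minimal sketch, assuming a plain tree representation (the `Task` class and `bottom_level_steps` function are hypothetical, not part of any HTA tool), part of the oil ship loading HTA in Table 5 can be encoded and its bottom-level steps numbered hierarchically:

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    goal: str
    subtasks: list["Task"] = field(default_factory=list)


def bottom_level_steps(task: Task, code: str = "") -> list[tuple[str, str]]:
    """Number the tree hierarchically (1, 1.1, ...) and return its leaves."""
    if not task.subtasks:
        return [(code, task.goal)]
    steps = []
    for i, sub in enumerate(task.subtasks, start=1):
        child_code = f"{code}.{i}" if code else str(i)
        steps.extend(bottom_level_steps(sub, child_code))
    return steps


# Part of Table 5 (sub-task 1 abridged, sub-task 4 omitted).
hta = Task("Load oil onto the ship", [
    Task("Safety check before oil offloading", [
        Task("Inspect safety-critical equipment positions"),
        Task("Test sensors and monitoring system"),
    ]),
    Task("Operation of oil loading arms", [
        Task("Start loading arms and move them towards the ship"),
        Task("Connect loading arms to the ship's manifolds"),
    ]),
    Task("Oil loading process control", [
        Task("Periodic safety inspection of pipelines, valves, flanges, arms"),
        Task("Monitor ship conditions and keep communication"),
    ]),
])

for code, goal in bottom_level_steps(hta):
    print(code, goal)  # e.g. "2.2 Connect loading arms to the ship's manifolds"
```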
Table 6
Error Taxonomy and Description in PHEA
Planning errors
P1: Ignore preconditions of a plan
P2: Execute wrong plan
P3: Execute inappropriate plan
P4: Right plan but conducted in wrong order
P5: Execute correct plan too soon/late
Action errors
A1: Action too late
A2: Action too early
A3: Action omitted
A4: Action too fast
A5: Action too slow
A6: Action too much
A7: Action too little
A8: Action incomplete
A9: Action in wrong direction
A10: Wrong action on correct object
Checking errors
C1: Checking too late
C2: Checking too early
C3: Correct check on wrong object
C4: Check incomplete
C5: Checking omitted
C6: Wrong check on wrong object
Retrieval errors
R1: Information not obtained
R2: Wrong information obtained
R3: Information retrieval omitted
Information communication errors
I1: Information not communicated
I2: Wrong information communicated
I3: Information communication incomplete
Selection errors
S1: Selection omitted
S2: Wrong selection made
Source: Modified from Bligård and Osvalder, 2014.
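A PHEA worksheet line pairs an HTA step with a Table 6 error code and the analyst’s notes. The sketch below assumes one record per credible error; the `PheaRow` class is hypothetical, it rejects codes that are not in Table 6, and the example row for sub-task 4.3 of Table 5 is invented for illustration rather than taken from a published analysis.

```python
from dataclasses import dataclass

# PHEA error codes from Table 6 (modified from Bligård and Osvalder, 2014).
ERROR_CODES = {
    "P1": "Ignore preconditions of a plan", "P2": "Execute wrong plan",
    "P3": "Execute inappropriate plan",
    "P4": "Right plan but conducted in wrong order",
    "P5": "Execute correct plan too soon/late",
    "A1": "Action too late", "A2": "Action too early", "A3": "Action omitted",
    "A4": "Action too fast", "A5": "Action too slow", "A6": "Action too much",
    "A7": "Action too little", "A8": "Action incomplete",
    "A9": "Action in wrong direction", "A10": "Wrong action on correct object",
    "C1": "Checking too late", "C2": "Checking too early",
    "C3": "Correct check on wrong object", "C4": "Check incomplete",
    "C5": "Checking omitted", "C6": "Wrong check on wrong object",
    "R1": "Information not obtained", "R2": "Wrong information obtained",
    "R3": "Information retrieval omitted",
    "I1": "Information not communicated", "I2": "Wrong information communicated",
    "I3": "Information communication incomplete",
    "S1": "Selection omitted", "S2": "Wrong selection made",
}
ERROR_TYPES = {"P": "Planning", "A": "Action", "C": "Checking",
               "R": "Retrieval", "I": "Information communication",
               "S": "Selection"}


@dataclass(frozen=True)
class PheaRow:
    """One worksheet line holding the five columns of a PHEA worksheet."""
    step: str         # HTA step number
    code: str         # error code from Table 6
    description: str  # how the error would show up in this step
    consequence: str  # credible consequence of the error
    remedy: str       # prevention and recovery strategy

    def __post_init__(self) -> None:
        if self.code not in ERROR_CODES:
            raise ValueError(f"unknown PHEA error code: {self.code}")

    @property
    def error_type(self) -> str:
        return ERROR_TYPES[self.code[0]] + " error"

    @property
    def explanation(self) -> str:
        return ERROR_CODES[self.code]


# Hypothetical worksheet line for Table 5, sub-task 4.3.
row = PheaRow("4.3", "A3", "Blind flange not installed on the manifold",
              "Oil leaks from the open manifold",
              "Add a checklist item and a second-person check")
print(row.step, row.code, row.error_type, "-", row.explanation)
# 4.3 A3 Action error - Action omitted
```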
So far, SHERPA has been applied in diverse settings, such as the transportation of hazardous chemicals (Kirwan, 1994), the control units of petrochemical plants (Ghasemi et al., 2013), the development of compensatory cognitive rehabilitation strategies for stroke patients (Hughes et al., 2015), and medical administration work in a cardiac telemetry unit (Bhuvanesh et al., 2008).

3.3 Human Error Detection
Human error detection is the process of making operators aware of an error that is about to occur or has already occurred; it is independent of understanding the nature and cause of the error (Zapf et al., 1994). A book chapter on human error detection (Sharit, 2012) classifies it into two primary forms: (1) cognitive strategies for error detection and (2) the use of redundancy for error detection. Kontogiannis and Malakis (2009) proposed a four-stage error detection strategy comprising awareness-based, planning-based, action-based, and outcome-based detection. In awareness-based detection, introspection is used to check one’s own awareness of the situation for completeness, coherence, and reliability, and additional data are sought so that hidden and untested errors receive enough consideration. In planning-based detection, previously established plans are reviewed and revised, which requires an appropriate time scale; in addition, conflicting goals are balanced by mentally simulating the risks of carrying out alternative plans (Sharit, 2012). Action-based detection takes place when erroneous actions are perceived auditorily, visually, or proprioceptively (Sellen, 1994); this stage involves checking activities to detect errors before and after a specific task. Outcome-based detection includes strategies such as checking time-dependent changes, checking mismatches between actual and expected outcomes, and considering the effects of interventions by other agents (Kontogiannis & Malakis, 2009). Redundancy is generally classified into two main categories: forcing functions and detection by other people (Sellen, 1994). A forcing function is a built-in constraint that prevents potential mismatches. For instance, in aviation, a computer algorithm can serve as a forcing function that gives commands to a pilot in response to an erroneous action that could put the aircraft in a dangerous situation (Kontogiannis & Malakis, 2009). For detection by other people, cross-checking is one of the most effective methods, and the people performing the cross-check should preferably come from a third party outside the operational situation (Sharit, 2012). Some researchers have attempted to detect human error with physiological data; for instance, mean pupil diameter change and the electroencephalogram have each been used as indicators for detecting human error in real-time monitoring (Cha & Lee, 2019; Damacharla et al., 2018). However, such methods still need practical validation.
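Outcome-based detection, in particular checking mismatches between actual and expected outcomes and checking time-dependent changes, has a simple mechanical analogue that a monitoring display might compute for the operator. The sketch below assumes numeric readings with per-quantity tolerances; the function names, parameter names, and readings are hypothetical, and the code illustrates the idea rather than Kontogiannis and Malakis’s (2009) cognitive strategy itself.

```python
def outcome_mismatches(expected: dict[str, float], observed: dict[str, float],
                       tolerance: dict[str, float]) -> list[str]:
    """Names of outcomes that are missing or deviate beyond their tolerance."""
    flagged = []
    for name, target in expected.items():
        if name not in observed:
            flagged.append(name)  # expected feedback never arrived
        elif abs(observed[name] - target) > tolerance.get(name, 0.0):
            flagged.append(name)  # reality does not match the expectation
    return flagged


def trend_mismatch(samples: list[float], expected_sign: int) -> bool:
    """True if the net change over time lacks the expected sign (+1, -1, 0)."""
    change = samples[-1] - samples[0]
    actual_sign = (change > 0) - (change < 0)
    return actual_sign != expected_sign


# Hypothetical readings after an operator opens a feed valve.
expected = {"flow_m3h": 50.0, "tank_level_m": 2.0}
observed = {"flow_m3h": 12.0, "tank_level_m": 2.05}
tolerance = {"flow_m3h": 5.0, "tank_level_m": 0.1}
print(outcome_mismatches(expected, observed, tolerance))  # ['flow_m3h']
print(trend_mismatch([1.00, 1.02, 0.98, 0.95], expected_sign=+1))  # True
```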
3.4 Human Error Analysis
Human error analysis requires comprehensive consideration of “what happened,” “where it happened,” “how it happened,” and “why it happened.” This part introduces two widely applied comprehensive methods, the Human Factors Analysis and Classification System (HFACS) and AcciMap, both of which support high-level human error analysis in a structured way.

3.4.1 HFACS
HFACS (Shappell & Wiegmann, 2000) is a well-recognized systematic human error analysis method. It is a further development of the Taxonomy of Unsafe Operations (TOS) model designed by Shappell and Wiegmann in 1997 (O’Hare, 2000), and it also extends Reason’s (1990) well-known Swiss cheese model. The aim of HFACS is to provide a comprehensive and straightforward framework that helps practitioners identify, investigate, and analyze human errors in different engineering cases. As shown in Figure 16, human errors in an accident are analyzed at four levels: organizational influences, unsafe supervision, preconditions for unsafe acts, and unsafe acts. In general, the level of organizational influences includes resource management, organizational climate, and the organizational process; the level of unsafe supervision consists of inadequate supervision, planned inappropriate operations, failure to correct known problems, and supervisory violations; the level of preconditions for unsafe acts deals with substandard conditions and practices of operators; and the level of unsafe acts includes errors and violations (Akyuz & Celik, 2014).

Figure 16 A typical HFACS model: organizational influences, unsafe supervision, and preconditions for unsafe acts (latent failures) lead through unsafe acts (active failures) to accidents. (Source: Modified from Akyuz and Celik, 2014.)

In 2006, a modified HFACS model was proposed that adds external factors as a new first level above organizational influences; this level covers management, economic, policy, and historical factors (Reinach & Viale, 2006). Because of its usefulness for comprehensive human error analysis, HFACS has been widely applied in practice, and considerable efforts have been made to adapt the original model to different domains (Yildirim, Başar, & Uğurlu, 2019), such as coal mines (R. Liu et al., 2019), marine accidents (Akyuz & Celik, 2014), ship–icebreaker collisions in ice-covered waters (Zhang et al., 2019), and passenger vessel accidents (Uğurlu et al., 2018); it has also been combined with a fuzzy Bayesian network to analyze accidents in process industries (Zarei et al., 2019).

3.4.2 AcciMap
AcciMap is an extension of Rasmussen’s risk management framework (Rasmussen, 1997). It uses graphical models to analyze and describe the failures, decisions, and actions involved in an accident (Salmon, Cornelissen, & Trotter, 2012). The method involves six hierarchical tiers: government policy and budgeting; regulatory bodies and associations; local area government planning and budgeting; technical and operational management; physical processes and actor activities; and equipment and surroundings. Contributory factors are identified and mapped onto these six tiers and then linked across tiers according to their cause–effect relations. Finally, a hierarchical evaluation is conducted to determine the significant factors of a potential accident. Figure 17 shows the framework of AcciMap. The method has been used in aviation (Thoroman, Salmon, & Goode, 2020), road transportation (Hamim et al., 2020), and the mining industry (Stemn, Hassall, & Bofinger, 2020).
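An AcciMap is essentially a directed graph whose nodes are contributory factors assigned to tiers. The sketch below assumes that representation; the `AcciMap` class and the factor names are hypothetical, and ranking factors by how many downstream factors they influence is one simple stand-in for the hierarchical evaluation, not the scoring prescribed by the method.

```python
from collections import defaultdict

# The six AcciMap tiers, from top (0) to bottom (5).
TIERS = [
    "Government policy and budgeting",
    "Regulatory bodies and associations",
    "Local area government planning and budgeting",
    "Technical and operational management",
    "Physical processes and actor activities",
    "Equipment and surroundings",
]


class AcciMap:
    def __init__(self) -> None:
        self.tier_of: dict[str, int] = {}
        self.effects: dict[str, set[str]] = defaultdict(set)

    def add_factor(self, name: str, tier: int) -> None:
        """Map a contributory factor onto one tier (0 = top)."""
        if not 0 <= tier < len(TIERS):
            raise ValueError(f"tier must be between 0 and {len(TIERS) - 1}")
        self.tier_of[name] = tier

    def link(self, cause: str, effect: str) -> None:
        """Record a cause-effect relation between two mapped factors."""
        for name in (cause, effect):
            if name not in self.tier_of:
                raise KeyError(f"map {name!r} onto a tier first")
        self.effects[cause].add(effect)

    def downstream(self, name: str) -> set[str]:
        """All factors reachable from `name` via cause-effect links."""
        seen: set[str] = set()
        stack = [name]
        while stack:
            for nxt in self.effects.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def ranked(self) -> list[tuple[str, int]]:
        """Factors ordered by how many downstream factors they influence."""
        scores = [(f, len(self.downstream(f))) for f in self.tier_of]
        return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


am = AcciMap()
am.add_factor("Budget cuts", 0)
am.add_factor("Weak inspection regime", 1)
am.add_factor("Maintenance deferred", 3)
am.add_factor("Operator skips check", 4)
am.add_factor("Valve fails", 5)
am.link("Budget cuts", "Weak inspection regime")
am.link("Weak inspection regime", "Maintenance deferred")
am.link("Maintenance deferred", "Valve fails")
am.link("Operator skips check", "Valve fails")
print(am.ranked()[0])  # ('Budget cuts', 3)
```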
Figure 17 The framework of AcciMap. (Tiers: government policy and budgeting; regulatory bodies and associations; local area government planning and budgeting; technical and operational management; physical processes and actor activities; equipment and surroundings. Nodes represent failures, decisions, and actions.)

4 HUMAN ERROR CONTROL
This section discusses general strategies and measures for human error control. Many publications illustrate specific human errors in different industries; for example, Kletz (2001) presents numerous engineering examples of real problems, mainly in the petroleum and chemical industries, together with the corresponding measures to prevent or solve them. For a control to be considered effective, it should reduce the probability that a human error will occur or mitigate the severity of its consequences, and it should not create new hazards that could cause injury or increase the human error rate.

4.1 Strategies for Human Error Control at the Task Level
If we accept the view that humans make errors because they are put in unfavorable conditions, the primary strategy for human error control is to control the factors that create those conditions and then to enhance humans’ ability to work in unfavorable conditions. Humans, organized in some specific form, perform tasks using tools or interfaces to interact with job objects in a certain physical and social environment. From this viewpoint, the factors influencing human performance can be identified from the aspects of the humans and their organization, the task characteristics, the physical environment, the social environment, tools and interfaces, job objects, and working processes, as illustrated in Figure 18. These factors can also be classified into internal, situational, and external factors, or according to other taxonomies. Each factor may have its own main effects and may interact with other factors, with direct or indirect effects on human performance.
Figure 18 Human performance influencing factors:
Human. Individual: personality, cognitive style, physical status, intelligence, training, experience; responsibility, motivation, attitude; awareness, emotion, stress, attention, workload, etc. Team: size, structure, role, climate, shared mental model, etc.
Task: complexity, difficulty, time requirement, qualification requirement, repetitiveness, regularity, criticality, etc.
Environment. Physical environment: temperature, humidity, noise, vibration, illumination, air quality, altitude, ground/floor condition, weather, visibility, radiation, etc. Social environment: social relationships, management system and culture (leadership, policy, incentive system, rules, authority, job shifts, job rotation, responsibilities and rights, communication channels and style, open discussion, government administration and policy, etc.).
Tool/interface: task support, usability, job aids, etc.
Job object: visibility, feedback, interface, danger, etc.
Working process: job methods, procedures, communication, collaboration, process control, etc.
The key factors favoring human errors during task performance can be identified only on the basis of knowledge of human performance and a complete understanding of the task. The measures to control a factor depend greatly on its nature, but they are either technological or administrative. Prevention by design is always the first recommendation. Many factors (e.g., the physical environment, task characteristics, tools and interfaces, job objects, and working processes) can be improved by proper design. Automation can be adopted to avoid having humans perform critical tasks in unfavorable conditions. Measures to enhance humans’ ability to work, or to lower the probability of their errors, in unfavorable conditions are rather limited. The traditional measures are personnel selection and training. Selection criteria for a profession or a specific task type are often hard to identify and measure; some criteria, such as the ability to cope with stress, are abstract and not clearly defined, and it is often difficult to set selection standards. Training is effective for technical skills, but not for non-technical skills such as cooperation, communication, situation awareness, decision-making, leadership, teamwork, and coping with stress (see Flin, O’Connor, and Crichton, 2008, for more information about non-technical skills).
Design measures can be adopted to mitigate the negative effects of various factors on human performance, such as adaptive interfaces, task simplification, forced checks and confirmations, memory aids, and decision aids. Some monitoring technologies can provide alerts for fatigue and other dangerous states. In military and other specific domains, drugs and other biomedical measures could be adopted to keep humans alert or energetic.

4.2 Strategies for Human Error Control at the Organization Level
The above discussion focused on the task level, that is, on how to avoid human errors when performing tasks. For an engineering project, a systematic human error reduction program is necessary to keep the project on schedule and make it ultimately successful. Such a program should cover the whole life cycle of the project (as shown in Figure 19) and involve preliminary hazard analysis (including critical human errors) at the system definition stage, defining safety acceptance criteria at the system requirements stage, designing human error prevention features at the design and implementation stage, preventing human errors at the manufacturing/assembly/installation stages, validating safety features, training operators, and undertaking monitoring and administrative measures at the operation/maintenance/disposal stages. Note that human errors at later stages can always be traced back to insufficient consideration of human error prevention at earlier stages. Solving a problem at an earlier stage is more effective; if the problem is identified too late, it becomes more difficult to solve (requiring more cost and time). In addition, accident reports reveal that almost all safety problems can be traced back to management deficiencies, indicating the importance of establishing a systematic and comprehensive management system and developing a good safety culture. Such a system and culture focus on learning and improving rather than blaming and punishing. When a specific safety problem such as a human error is identified, we should continue to look for its causes and related factors, from both the technical and management aspects. There is always room for management improvement, and management review has proved to be a positive practice of organizational learning.

Figure 19 Human error prevention in the life cycle of a product/project. Human errors can occur at any stage (system definition, system requirements, design/implementation, manufacturing/assembly/installation, validation, operation/maintenance, and disposal), with decreasing impact on human errors from earlier to later stages; solutions are therefore sought at upper stages, and the management system and culture underlie all stages, since almost all safety problems can be traced back to management deficiencies.
4.3 Hierarchy of Human Error Control
Figure 20 presents the general hierarchy of human error control. The hierarchy reflects the philosophy of first reducing the probability of human errors and then mitigating their consequences, and of adopting technological measures before administrative ones. The hierarchy applies to all categories of human errors. Again, when implementing controls, there are always human factors principles that can be followed to reduce human errors and/or mitigate their consequences. Removing any possible chance of human error by design should be the first priority for human error control; this principle follows the concept of “intrinsic safety.” As per Murphy’s Law, if an unsafe option is available, no matter how low the probability that someone would be careless or foolish enough to choose it, someone will eventually choose it. Automation is the best solution where applicable. Other design solutions include redesigning the structure, replacing dangerous materials or energy, removing hazardous features, isolation by safeguards, and so on. Technological improvements may remove existing chances of human error. For example, if speed limit data can be supplied to all trains directly, errors can be avoided in the communication between two railway administrations and between a station and a train, as well as in data entry by staff at a locomotive depot. If the chances of human error cannot be entirely eliminated, they must be reduced as much as possible. Here too, design measures should be considered first, because technological solutions are more reliable than administrative ones. Examples of design measures include simplifying tasks, providing task-support functions (such as memory aids, decision aids, and teamwork tools), improving interface usability, and presenting relevant and sufficient information. Checklists, personnel redundancy, procedures, and warning signs are examples of administrative measures, which do not control human behavior but help humans avoid errors. An online monitoring system can help detect human errors; for example, video technology can record human actions and detect wrong actions in real time. Using preset rules or simulation, abnormal human inputs can be detected and the operator asked to correct them before they take effect on the system. Feedback from a system helps users become aware of their errors. It is good practice to require a compulsory final check or to ask for confirmation before releasing critical commands, and the system must allow users to cancel their inputs and commands. Another effective way to find possible errors is to have another person double-check, since each person thinks independently and is unlikely to make the same errors. These measures can greatly reduce human errors. If the elimination of errors cannot be guaranteed, effective ways should be found to mitigate their consequences in unforgiving situations (Reason, 1990). In most cases, human errors cannot be totally avoided.
The further control of human errors is to tolerate them, that is, to avoid unwanted consequences when they occur. A safety-critical system should be designed with the ability to mitigate the influence of human errors; such a resilient system meets the concept of “fail-safe.”
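The detect-and-correct measures described above, namely preset rules that screen abnormal inputs, feedback, a confirmation step before critical commands, and the ability to cancel, can be combined in a few lines. This is a minimal sketch assuming a single numeric input; the speed-limit rules (echoing the train example) and the `screen` and `release` functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    ok: Callable[[float], bool]  # True if the input satisfies the rule


# Hypothetical preset rules for a speed-limit entry, in km/h.
RULES = [
    Rule("non-negative", lambda v: v >= 0),
    Rule("at most 120 km/h", lambda v: v <= 120),
    Rule("multiple of 5 km/h", lambda v: v % 5 == 0),
]


def screen(value: float, rules: list[Rule]) -> list[str]:
    """Names of the preset rules that the input violates."""
    return [rule.name for rule in rules if not rule.ok(value)]


def release(value: float, rules: list[Rule], critical: bool,
            confirm: Callable[[str], bool]) -> str:
    """Reject abnormal inputs, and require confirmation for critical ones."""
    problems = screen(value, rules)
    if problems:
        # Feedback: tell the operator what to correct before anything happens.
        return "rejected: " + ", ".join(problems)
    if critical and not confirm(f"Release speed limit {value} km/h?"):
        return "cancelled"  # the operator can still back out
    return "released"


print(release(62, RULES, critical=True, confirm=lambda msg: True))
# rejected: multiple of 5 km/h
print(release(60, RULES, critical=True, confirm=lambda msg: False))
# cancelled
```

In practice `confirm` would be wired to a dialog or a physical confirm key; passing it in as a function keeps the sketch testable.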
Figure 20 A general hierarchy of human error control. In order of priority, from high to low: make human errors impossible to occur (remove opportunities for human errors by design); make human errors more difficult to occur (reduce the possibility of human errors by design and/or administrative measures); detect and correct human errors (provide opportunities to find and recover from human errors by design and/or administrative measures); tolerate human errors (enable a system to mitigate the influence of human errors and avoid unwanted consequences by design and/or administrative measures); and respond to human errors (prepare resources to mitigate the final consequences by technical and administrative measures). The levels are grouped into reducing human error opportunities and mitigating human error consequences.
With an interlock, a human error does not lead to an accident by itself but only in combination with further conditions, so the error is tolerated as long as the other interlocking conditions are not also met. Fault Tree Analysis can help identify how a human error could contribute to an accident. Backup is a typical administrative measure to mitigate the consequences of human errors for electronic resources. If a human error is leading to a dangerous situation, an emergency stop must be available as a last resort to restore the system to a safe state, and emergency procedures must be prepared to control the consequences of an accident. When selecting human error control measures, technological measures should have the highest priority, since they are more reliable; in contrast, the effectiveness of administrative measures depends greatly on humans. These control strategies and measures are not mutually exclusive; their combined use often works better. Using multiple measures both to prevent a human error and to prevent it from causing unwanted consequences reflects the idea of “defense-in-depth.” At the organizational level, establishing a systematic Safety Management System (SMS) and developing a good safety culture help prevent human errors. An SMS and a safety culture do not deal with specific human errors; instead, they regulate humans and have broad influence on human behavior, as many accident reports have demonstrated. From a macro viewpoint, another general hierarchy of human error control runs from technological solutions, through administrative solutions and SMS, to safety culture, as shown in Figure 21.
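Fault Tree Analysis makes the effect of an interlock and of defense-in-depth explicit: an AND gate requires every input event, while an OR gate requires any one of them. The sketch below assumes independent basic events and uses made-up probabilities purely for illustration; `and_gate` and `or_gate` are hypothetical helper names.

```python
from math import prod


def and_gate(*probs: float) -> float:
    """Output event occurs only if every input occurs (independence assumed)."""
    return prod(probs)


def or_gate(*probs: float) -> float:
    """Output event occurs if any input occurs (independence assumed)."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical per-demand probabilities, for illustration only.
p_error = 1e-2       # operator selects the wrong valve
p_check_miss = 1e-1  # second person fails to catch the error
p_interlock = 1e-3   # interlock fails to block the action
p_hardware = 1e-5    # unrelated equipment failure path

# Single barrier: the human error alone can cause the top event.
bare = or_gate(p_error, p_hardware)
# Defense-in-depth: the error must also slip past the check and the interlock.
layered = or_gate(and_gate(p_error, p_check_miss, p_interlock), p_hardware)
print(f"{bare:.2e} -> {layered:.2e}")  # 1.00e-02 -> 1.10e-05
```

The independence assumption matters: if the operator and the checker share the same misleading information, their errors are correlated and the AND-gate product underestimates the true probability, which is one reason cross-checkers are best drawn from outside the operational situation.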
•
•
•
•
Figure 21 A general hierarchy of human error control at a macro viewpoint: technological measures first (implement technological solutions wherever possible for the most reliable control) and administrative measures (where technological measures are not realized, or as supplements to them) provide specific control; the safety management system and safety culture provide general control.
4.4 Technological Measures According to the literature review and the above discussion, technological measures can be summarized as:
• Automation. An automation solution should always be considered in human error control, since it removes the possibility of human error, especially when the working conditions are unfavorable and difficult to change, or the task is so challenging that most human operators would not be able to perform it even after reasonable training. Various mechanical, electronic, and computer-based technologies can be applied to replace humans in daily operations and emergency response, shifting the risk of human error to hardware and software failures. An automation project begins with understanding the existing process, then simplifying the process, and finally automating it (the so-called USA principle).
• Redesigning with a different theory, principle, structure, configuration, etc. Besides automation, there is a broad spectrum of other design solutions that reduce opportunities for human error or a system's dependence on human performance. For example, new NPP designs with large water tanks on top of the containment could protect an NPP from meltdown without requiring operator action for 72 hours.
• Replacing dangerous materials or energy. For example, replacing high voltage with low voltage would reduce the risk that a human error results in electric shock; replacing a flammable material with a non-flammable one would reduce the demands on workers to prevent fire.
• Isolation. Safeguards such as protective covers, shields, and barriers are commonly used to isolate operators from rotating or moving machine components and hazardous materials, so that operators cannot unintentionally contact the hazards. The isolation principle can also be used to protect humans from released energy and materials.
• Simplification of system and task. System complexity and task complexity are critical factors influencing human error. Many contemporary systems are so complex that operators have difficulty understanding their behavior and may not be able to respond correctly to system failures. Task complexity goes hand in hand with system complexity: the more mentally demanding a task is, the higher the probability that the task performer will err. It is not surprising to see so many human error reports from complex safety-critical systems. Thus, whatever is complex should be simplified.
A good design can be seen as one that does not require operators to think too much.
• Task support. When tasks cannot be simplified, supporting functions should be implemented. Even simple aids can help operators avoid memory and decision errors, and various teamwork tools can facilitate team communication and cooperation.
• Information supply. Providing relevant and sufficient information in a proper way is critical to planning, decision-making, and diagnosis tasks. However, too much information should be avoided: when working with computerized systems, an operator may be overloaded with information while the most relevant information is buried. She, Li, and Ma (2019) found that information quality, rather than quantity, was positively correlated with team mutual awareness.
• Online monitoring and alarms. Such systems are essential for complex safety-critical systems, since operators cannot stay vigilant enough to notice a deviation in any one of many system parameters. In long-distance travel, such a system can alert a driver to fatigue and sleepiness; on an assembly line, an online system can identify a wrong operation and ask for correction.
• Improving interface usability. Better interface usability means the interface is easier to learn and use, so more reliable performance can be expected. Usability principles can be found in standards and other literature.
DESIGN FOR HEALTH, SAFETY, AND COMFORT
• Timely feedback. Without feedback, an operator cannot know whether a command has been executed, whether a parameter setting has been accepted, or whether a piece of delivered information has been received. Feedback provides confirmation and makes operators better aware of the system status.
• Confirmation. This measure makes sure that an action is intentional rather than a misoperation, such as a careless touch of a button; it gives the operator a chance to check whether the input is correct, or confirms that the correct information has been received.
• Interlock. Interlocks can be used to prevent the combination of conditions that leads to an accident, or to prevent a human error from causing unwanted consequences by combining it with other preventive conditions. For example, interlocking the running of a machine with the closing of its door means that the machine stops immediately when the door is opened.
• Fail-safe design. Under this principle, a system remains in or is directed to a safe mode when hardware, software, or humanware fails. Rail traffic lights are required to turn red once the ATP (Automatic Train Protection) system fails. Many systems will not execute an invalid operator input.
• Simulation. Simulations can help examine whether an action, a plan, or a set of settings would work as intended, so errors can be identified before they take effect.
Note that the above list is not exhaustive. The optimal measure can be selected according to the nature of the real problem; Kletz (2001) presents many real examples.
4.5 Administrative Measures To control a specific human error, the following administrative measures can be considered:
• Job redesign
• Standardization and procedurization
• Checklists
• Personnel redundancy
• Double check/monitoring
• Working rules
• Warning signs or notices
• Reduction of distractions
• Slowing down the working pace
• Training (knowledge, technical skills, non-technical skills, attitude, etc.)
• Fatigue management
• Backup
• Version control
• Encouraging reporting and open discussion of human errors
Again, this is not an exhaustive list. The selection of a measure should be based on a thorough error investigation, including task analysis. For general control, it is essential to establish an SMS. An SMS is a systematic approach to managing safety, including the necessary organizational structures, accountabilities, policies, and procedures (International Civil Aviation Organization, 2013). Personnel selection, training and qualification, physical and mental status examination, identification and control of safety hazards (including human errors), and safety incentives are traditional measures included in an SMS. An effective SMS adopts a continuous improvement model featuring continual reviews and follow-up actions. As a key part of an SMS, the identification and control of safety hazards is assisted by various analysis tools. Reason (1997) summarizes several human error management tools, such as Tripod-Delta, Review, Managing Engineering Safety Health (MESH), Human Error Assessment and Reduction Technique (HEART), Influence Diagram Methodology (IDM), and Maintenance Error Decision Aid (MEDA). These tools can be used to reveal and correct error-producing factors at both the workplace and the organizational levels. Incident reporting is commonly required for safety-critical systems. Templates and forms with standardized options and keywords are provided for such reporting so that important information is included in a consistent manner. Database systems are now used for incident reporting to support computerized analysis and thus better organizational learning.
4.6 Cultural Measures As mentioned before, a safety culture influences human behavior. A good safety culture is a barrier against a wide spectrum of human errors rather than against a specific one. There are various definitions and understandings of safety culture. As pointed out by Reason (1997), “[F]ew phrases occur more frequently in discussions about hazardous technologies than safety culture. Few things are so sought after and yet so little understood.” Reason and Hobbs (2003, Chapter 11) distinguish two parts of safety culture: the first comprises the shared beliefs, attitudes, and values of an organization’s membership regarding the pursuit of safety; the second embraces the structures, practices, controls, and policies that an organization possesses and employs to achieve greater safety. The second part is more concrete and is more properly referred to as safety management. Thus, in this chapter, safety culture refers to the first part.
The following important attributes of safety culture are emphasized by Reason and Hobbs (2003):
• A safe culture is the “engine” that continues to drive the organization toward the goal of maximum attainable safety, regardless of current commercial pressures or who is occupying the top management posts.
• A safe culture reminds the organization’s members to expect that people and equipment will fail.
• A safe culture is an informed culture, which can only be achieved by creating an atmosphere of trust in which people are willing to confess their errors and near misses.
• An informed culture is a just culture enabling an effective reporting culture.
• A safe culture is a learning culture.
In short, with a good safety culture, safety is always the top priority for people at all levels in all activities. “Safety First” is not just a catchword but is reflected in every activity. When performing tasks, making technological decisions, arranging jobs, or making policies and rules, one question always kept in mind is whether safety is optimized. We believe that people intend to do a good job and do not like to make errors; only in very few cases are people reckless enough to warrant disciplinary action. Reason (1997, p. 208, Figure 9.4) proposes a decision tree for determining the culpability of an unsafe act. Questions to ask include whether the actions were intended, whether the consequences were as intended, whether an unauthorized substance was ingested, whether taking an unauthorized substance was the result of a medical condition, whether operating procedures were violated knowingly, whether the procedures were available, workable,
HUMAN ERRORS AND HUMAN RELIABILITY
Figure 22 A conventional control room of a nuclear power plant. (Source: Liu and Li, 2016a. © 2016 Elsevier.)
intelligible, and correct, whether the person in question could pass a substitution test, whether there were deficiencies in training and selection or inexperience, and whether the unsafe acts had been committed before. Depending on the answers, the act is classified as sabotage, substance abuse without mitigation, substance abuse with mitigation, a possible reckless violation, a system-induced violation, a possible negligent error, a system-induced error, a blameless error requiring corrective training or counselling, or a blameless error requiring no corrective action. Such a decision tree supports a just judgment of human errors. Only when employees need not worry about disciplinary action will they tell the true and complete stories of their errors, so that safety issues can be identified and remedied. A blame-free culture is emphasized in industries such as nuclear power. In these industries, open discussion of human errors is considered good behavior and is highly encouraged and rewarded, so that lessons can be learned and safety performance maximized. In contrast, any form of penalty (even being required to attend a training class) hampers the reporting of personal faults and is thus harmful to safety. The Japanese train accident analyzed by Chikudate (2009) illustrates this well. “Foster blame-free culture, reinforce openness” is suggested as a sign of good safety culture practices by the International Atomic Energy Agency (IAEA, 1995). When dealing with human errors, managers should always ask themselves what the best move to improve safety performance would be, so that they truly practice the policy of “Safety First.”
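The question sequence above can be read as a decision procedure. The Python sketch below is a simplified reading of it, not a reproduction of Reason's Figure 9.4: the branches follow the order in which the questions are listed in the text, and wherever the published figure branches differently, the figure governs. The field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UnsafeAct:
    """Yes/no answers to the culpability questions (hypothetical field names)."""
    actions_intended: bool
    consequences_intended: bool
    unauthorized_substance: bool
    medical_condition: bool                # explains the substance use
    knowingly_violated_procedures: bool
    procedures_adequate: bool              # available, workable, intelligible, correct
    passes_substitution_test: bool         # would a comparable peer have acted the same?
    training_or_selection_deficient: bool  # or inexperience
    previous_unsafe_acts: bool


def culpability(act: UnsafeAct) -> str:
    """Walk the questions in order and return the first category reached."""
    if act.actions_intended and act.consequences_intended:
        return "sabotage"
    if act.unauthorized_substance:
        if act.medical_condition:
            return "substance abuse with mitigation"
        return "substance abuse without mitigation"
    if act.knowingly_violated_procedures:
        if act.procedures_adequate:
            return "possible reckless violation"
        return "system-induced violation"
    if not act.passes_substitution_test:
        if act.training_or_selection_deficient:
            return "system-induced error"
        return "possible negligent error"
    if act.previous_unsafe_acts:
        return "blameless error, corrective training or counselling indicated"
    return "blameless error"


# Hypothetical case: an unintended slip that most peers would also have made.
slip = UnsafeAct(
    actions_intended=False, consequences_intended=False,
    unauthorized_substance=False, medical_condition=False,
    knowingly_violated_procedures=False, procedures_adequate=True,
    passes_substitution_test=True, training_or_selection_deficient=False,
    previous_unsafe_acts=False,
)
print(culpability(slip))  # blameless error
```

Most of the outcomes are reached through questions about the system (procedures, training, selection) rather than the person, which is why such a tree supports the just and blame-free culture described above.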
5 HUMAN ERROR IN EMERGING AREAS
5.1 Opportunities and Challenges of New Technologies The past decade has seen new technologies adopted in more and more safety-critical systems, both new and modernized. In many cases, new technologies are introduced for their obvious merits, such as more powerful functions, higher flexibility and adaptivity, better usability, and lower cost. Sometimes old technologies have to be upgraded because there is no other choice, especially in systems with a long lifespan, such as NPPs. Almost all existing NPPs need to modernize their Instrumentation and Control (I&C) systems and associated HSIs (Alonso, Illobre, & Pascual, 2015). Boring et al. (2019) point out that “[t]here remain few suppliers to make comparable I&C, which over time forces the need for modernization
in the main control room. There is no imperative that the technology does not work; rather the imperative is simply that the technology is no longer readily available or maintainable.” The introduction of new technologies may greatly change the nature of a system. Figure 22 and Figure 23 illustrate the differences between a conventional main control room (MCR) of an NPP and a digital one (Liu & Li, 2016a). The nature of the work performed by operators has changed dramatically. Operators in a digital MCR do not walk around analog panels but have to remain seated at their workstations; they do not need to obtain information by communicating with others, because all information is available from the computerized interfaces; and instead of looking at different displays and operating different controls, they look at computer screens and operate with keyboards and mice. With new technologies, much more information can be collected and displayed and many more functions can be implemented, making a system more powerful but also more complex. This brings both opportunities and challenges for safety. On the one hand, traditional safety issues may be solved by adopting digital technology, such as better interfaces (e.g., higher-level information, better organization and presentation of information, adaptive interfaces) and embedded safety functions (e.g., task aids, decision support, human error detection and feedback, computerized procedures, automatic data collection). From this perspective, many human errors could be avoided. On the other hand, new safety issues arise. The system becomes so complex that operators cannot easily understand its behavior and decisions. Too much information can overload operators. If interface navigation is not well designed, operators may get lost in the depths of a complex interface space. The modes of communication and collaboration within a crew also change. The complexity factors of the two MCRs also differ (Liu & Li, 2016b).
The change in human–system interaction is fundamental. Clicking the wrong object or making incorrect keyboard inputs are likely errors. The adoption of new technologies requires more research on human errors: (1) identifying new error modes, mechanisms, and influencing factors, and proposing countermeasures; (2) developing new ideas for interface design to reduce mental workload, improve situation awareness, support task performance, provide decision aids, facilitate teamwork, and ultimately reduce human errors, especially for emergency and other critical tasks; (3) validating and comparing interface designs; and (4) collecting human performance data to enable in-depth understanding of human error and support human reliability analysis.
Figure 23 A digital control room of a nuclear power plant. (Source: Liu and Li, 2016a. © 2016 Elsevier.)
As an example, Xu et al. (2008) compared two computerized presentations of an emergency operating procedure and found that the two-dimensional flow chart presentation (middle of Figure 24) worked better (a significantly lower error rate) than the logic tree presentation (upper-right corner of Figure 25). An analysis of the cognitive process reveals that the logic tree requires extra working memory to keep the results of previous judgments before making the final one, whereas with the two-dimensional flow chart an operator simply follows the path and does not need to memorize previous judgments. The logic tree presentation was originally proposed by experts in nuclear engineering. This example demonstrates the necessity of human factors experiments to validate engineering designs. With empirical data, we can show the differences and help make a decision; more importantly, if the underlying mechanism can be discovered, our conclusions become more persuasive and enrich our knowledge for understanding other cases.
Figure 24 Two-dimensional flow chart presentation of an emergency operating procedure. (Source: Xu et al., 2008. © 2008 Elsevier.)
Figure 25 Logic tree presentation of an emergency operating procedure. (Source: Xu et al., 2008. © 2008 Elsevier.)
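The working-memory explanation for the Xu et al. (2008) result can be illustrated with a toy model. Both functions below reach the same decision for a hypothetical procedure step whose action requires three conditions to be true; they differ only in how many intermediate results must be held before the decision is made. The plant variables and the purely conjunctive logic are assumptions for illustration, and the counter is a crude proxy for memory load, not the cognitive analysis performed in the study.

```python
# Hypothetical plant state and procedure step, for illustration only.
state = {"pressure_low": True, "level_low": True, "pump_running": False}

# Each judgment is a yes/no question the operator answers from the displays.
judgments = [
    lambda s: s["pressure_low"],
    lambda s: s["level_low"],
    lambda s: not s["pump_running"],
]


def logic_tree(checks, s):
    """Answer every judgment first, hold all results, then combine them."""
    held = [check(s) for check in checks]  # results kept in working memory
    return all(held), len(held)


def flow_chart(checks, s):
    """Follow the path: each answer immediately selects the next box."""
    for check in checks:
        if not check(s):
            return False, 0  # branch off; no earlier answer needs to be recalled
    return True, 0


for name, method in [("logic tree", logic_tree), ("flow chart", flow_chart)]:
    decision, items_held = method(judgments, state)
    print(f"{name}: take action = {decision}, results held = {items_held}")
```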
Mobile devices are being introduced into safety-critical systems. These devices feature touch screens of limited size and wireless communication. They can provide various on-site aids for tasks such as inspection and maintenance and can thus be very helpful for field operators. Controls on the touch screen should be large enough for finger operation. In addition, touch screen operation in a vibrating environment is unsuitable, since accidental touches are likely. The small screen allows only simple interactions involving a limited amount of information. Another technology, augmented reality, could also be very helpful for field tasks. Artificial intelligence (AI) technologies have been tried in safety-critical systems for fault alarms (e.g., Yang & Chang, 1991), fault diagnosis (e.g., Martin & Nassersharif, 1990; Reifman, 1997; Dong, Zhou, & Zhang, 2018), and operation and maintenance support (e.g., Himeno et al., 1992; Jenkinson, Shaw, & Andow, 1991). They received much attention in the 1990s and have recently regained attention. Many experts take a conservative attitude toward the use of AI in safety-critical systems because of its opaque internal logic and insufficient accuracy. Given the high reliability requirements of such systems, how high the accuracy must be remains controversial. In addition, these problems can lead operators to distrust the AI. Overall, using AI as an aid or reference for operators, rather than as automation, seems preferable. There are many research opportunities in the area of human–AI interaction. Current research on the effects of new technologies on safety in terms of human errors is insufficient. Prudent decisions should be made based on validation experiments, and safety assessment has to be conservative. In addition, since new technologies may change the way people work, the management system may need to be reviewed and reformed.
The baseline for approving the applications of new technologies is that the new system should be “at least as safe as the technology it replaces” (Boring et al., 2019).
5.2 Human Error in Human–Automation Interaction In the 1960s, the estimated contribution of human error to accidents was around 30%, whereas over the last 40 years human error has been implicated in 70% or more of all accidents in complex systems (Hollnagel, 1998). Have human beings become more fallible? The answer is “Not necessarily” (Reason, 2013). There are a number of reasons why the human error contribution is now more salient than ever before. For instance, technical reliability has become markedly high, reducing its contribution to accidents and in turn making human errors more evident. The introduction of digitalization, automation, and computerization greatly increases system reliability but changes the contribution and types of human error. Reason (2013) summarized the various potential consequences of higher automation. For example, while it might reduce the total number of errors, it increases the probability of mistakes and lapses, which are hard to detect and likely to cause greater damage (Reason, 2013). Higher automation makes the system more opaque to frontline operators, reduces the quality of their mental models, and creates new knowledge-based errors such as mode errors, which are highlighted below. Mode errors, well recognized on automated flight decks (Sarter, 2008; Sarter & Woods, 1995), result from a lack of mode awareness. Mode awareness refers to an operator’s knowledge and understanding of the current and future automation configuration (e.g., its status, targets, and behaviors). A mode error happens when an action is taken that would be appropriate for the assumed but not the actual configuration (Sarter, 2008). It is a byproduct of automated systems offering different automation levels, which create the possibility of confusion between them. Factors contributing to mode errors on flight decks include (1) low visibility due to poor automation feedback; (2) gaps and misconceptions in pilots’ mental models of highly
complex automation; (3) a high degree of system autonomy (the ability of the automation to initiate actions without immediately preceding pilot input); and (4) a high degree of system coupling (a high level of interdependence between various components of the automation) (Sarter, 2008). In particular, frontline operators tend to over-rely on automation, a phenomenon called “automation complacency” (Parasuraman & Riley, 1997). Excessive automation dependency (i.e., automation complacency) is a contributory factor in aviation accidents and incidents. For instance, based on a sample of 161 automation incident reports related to general aviation flights from the Aviation Safety Reporting System (ASRS; https://asrs.arc.nasa.gov/search/database.html), Taylor, Keller et al. (2020a) found that 73 incidents were caused by automation dependency, 34 by automation malfunction, 42 by air traffic control, and 12 by lack of training or familiarity with the automated system. Thus, almost half of the automation-related incidents in general aviation (73/161) were due to pilots’ overreliance on cockpit automation. Taylor et al. also identified the specific causes within the four major categories. For instance, among the 73 incidents related to automation dependency, 36 were due to pilot distraction, 26 to improper automation monitoring, 6 to lack of vigilance, and the remaining 5 to override failure. That is, automation complacency may be not only a propensity but also a product of contextual factors. Erroneous actions in human–automation interaction (HAI) can never be eliminated, making error management in HAI pivotal.
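The four categories appear to be mutually exclusive (their counts sum exactly to the 161 reports), so a short tally can check the counts and turn them into shares. The Python sketch below only re-expresses the figures quoted above from Taylor, Keller et al. (2020a); it adds no data.

```python
# Counts quoted above from Taylor, Keller et al. (2020a); 161 ASRS reports.
main_causes = {
    "automation dependency": 73,
    "air traffic control": 42,
    "automation malfunction": 34,
    "lack of training or familiarity": 12,
}
dependency_causes = {
    "pilot distraction": 36,
    "improper automation monitoring": 26,
    "lack of vigilance": 6,
    "override failure": 5,
}


def shares(counts):
    """Return the total count and each category's share of it."""
    total = sum(counts.values())
    return total, {name: n / total for name, n in counts.items()}


total, main_share = shares(main_causes)
sub_total, sub_share = shares(dependency_causes)

# If the categories are mutually exclusive, the counts must add up exactly.
assert total == 161
assert sub_total == main_causes["automation dependency"]

for name, share in main_share.items():
    print(f"{name:32s} {share:6.1%}")
```

Automation dependency comes out at 73/161, about 45%, which is the “almost half” stated in the text; within it, pilot distraction accounts for 36/73, also about half.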
Four types of characteristics influence error management: automation characteristics (e.g., reliability, error type, level, feedback), individual characteristics (e.g., complacency potential, training, knowledge of automation), task characteristics (e.g., automation-error consequences, verification costs, human accountability), and emerging variables (e.g., trust in automation, workload, situation awareness) (McBride, Rogers, & Fisk, 2014). Error management consists of three phases: (1) detection (i.e., realizing that something has gone wrong or an error has occurred); (2) explanation (i.e., identifying the nature of the error and understanding its underlying cause); and (3) correction and recovery (i.e., correcting the error, and modifying the existing plan or developing a new one to counter the potential adverse influence of the error). In the automation and error management literature, most efforts have focused on the first phase (error detection). McBride et al. (2014) developed a framework outlining general actions that can be taken in each of these three phases. For instance, disabling automation, employing stored rules, and applying corrective actions can be used in the correction phase. As automated systems become more autonomous and opaque, error management becomes increasingly challenging.
5.3 Human Error in Automated Vehicles Automated vehicles (AVs) are now appearing on public roads. According to the Society of Automotive Engineers’ definitions for vehicle automation (SAE, 2018), there are six levels of automation: Level 0 (No Automation), Level 1 (Driver Assistance), Level 2 (Partial Automation), Level 3 (Conditional Automation), Level 4 (High Automation), and Level 5 (Full Automation). A Level 3 AV needs a human driver’s intervention when the automated driving system cannot manage certain conditions; that is, the human driver and the automated system share vehicle control.
Thus, vehicle automation does not remove the human element from the operation of an AV. The safe operation of AVs depends not only on human–automation co-driving within the vehicle but also on human–AV interactions on public roads. There are different
lines of research on human error and reliability issues in both contexts (within and outside AVs). The first line of research examines human driver errors within AVs, which can be studied separately from the perspectives of human factors and traffic safety. As in the aviation sector, mode confusion is also found in AVs. For instance, Endsley (2017) reported that, during six months of personal use of a Tesla Model S (Level 2), she experienced an automation-related problem on 30% of her trips. Mode confusion was the most frequent problem she encountered, usually because of the physical design of the controls on the steering column. She found it difficult to understand why the Tesla Autopilot behaved the way it did and faced the challenge of forming an accurate mental model for understanding and predicting Autopilot. Similar mode confusion phenomena were found in other on-road studies involving Tesla Level 2 vehicles (Banks et al., 2018; Wilson et al., 2020): human drivers thought that the Autopilot was engaged when in fact it was not. Potential reasons include unintentional deactivation of Autopilot, failure to engage Autopilot properly in the first instance, and misunderstanding of the internal human–machine interface. The occurrence of mode confusion can compromise overall system safety. Thus, preventing unintentional mode transitions and mode confusion should be a top priority for AV designers. Complacency and over-trust were also observed to be sources of driver error (Wilson et al., 2020). Traffic safety research also indicates that manually initiated transitions account for 20% (Favarò, Eurich, & Rizvi, 2019) to 30% (Alambeigi, McDonald, & Tankasala, 2020) of AV crashes, suggesting that these transitions may be a significant impediment to traffic safety involving AVs.
Presently, there is insufficient empirical knowledge about the driver errors during these transitions that could compromise traffic safety (Demeulenaere, 2020), and the topic deserves more scholarly attention. The second line of research involves the behavior of human drivers of conventional cars when they share public roads with AVs. Such mixed traffic could raise many safety issues and challenge road traffic safety. An examination of Google cars tested in automated mode from 2009 to 2015 reported that 7 out of 10 incidents were due to a human driver rear-ending the automated car, whereas only 14% of conventional cars were rear-ended by another car (Teoh & Kidd, 2017). Similarly, based on AV crash reports from the California Department of Motor Vehicles, Alambeigi et al. (2020) and Petrović et al. (2020) found that the majority of AV crashes (more than 60%) were rear-end collisions. These over-represented rear-end collisions probably occurred because AVs behaved in ways that human drivers did not anticipate or want (Schwarting et al., 2019). This raises safety concerns about improper interactions between human drivers and AVs in mixed traffic (P. Liu et al., 2020). The public and consumers have higher requirements for AV safety: they implicitly require AVs to be four to five times as safe as human drivers (Liu, Wang, & Vincent, 2020; P. Liu, Yang, & Xu, 2019). Thus, the above-mentioned sources of human error could hinder the deployment of AVs (Demeulenaere, 2020). These sources include: (1) human drivers within AVs may misunderstand and misinterpret the automated driving system; and (2) human drivers of conventional cars may misinterpret AVs’ intentions and behaviors or drive aggressively toward AVs.
Thus, there is an emerging need for human error and reliability studies focusing on how human drivers inside an AV interact with the automated system and resume vehicle control, and on how human drivers of conventional cars and other road users interact with AVs.
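The definition of a mode error from Section 5.2 (an action appropriate for the assumed, but not the actual, configuration) and the three error-management phases can be combined in a toy model of the Autopilot confusion described above. The Python sketch assumes just two modes and one expected driver action per mode; the names and the mapping are hypothetical simplifications. In a real vehicle the detection step is the hard part, because the driver's believed mode is not directly observable.

```python
from __future__ import annotations

from enum import Enum


class Mode(Enum):
    MANUAL = "manual"
    AUTOPILOT = "autopilot"


# Hypothetical, simplified: the driver action each mode expects.
EXPECTED_ACTION = {
    Mode.MANUAL: "steer",
    Mode.AUTOPILOT: "monitor",
}


def is_mode_error(actual: Mode, believed: Mode, action: str) -> bool:
    """True if the action fits the believed mode but not the actual mode."""
    return action == EXPECTED_ACTION[believed] and action != EXPECTED_ACTION[actual]


def manage_error(actual: Mode, believed: Mode, action: str) -> list[str]:
    """Trace the detection, explanation, and correction phases for one action."""
    if not is_mode_error(actual, believed, action):
        return []
    return [
        # (1) Detection: the action does not fit the real system state.
        f"detect: '{action}' is inappropriate in {actual.value} mode",
        # (2) Explanation: the driver's mental model holds the wrong mode.
        f"explain: driver assumes {believed.value}, system is in {actual.value}",
        # (3) Correction and recovery: act according to the actual mode.
        f"correct: {EXPECTED_ACTION[actual]}",
    ]


# Autopilot was unintentionally deactivated, but the driver still believes it is on.
for step in manage_error(Mode.MANUAL, Mode.AUTOPILOT, "monitor"):
    print(step)
```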
5.4 Human Error in Cybersecurity The wide deployment of computers and the spread of connectivity through the “Internet-of-Things” are changing the ways in which humans interact with human–machine systems. For instance, they enable a person to monitor their home remotely through security cameras, work together with remote team members, or perform financial transactions electronically. Correspondingly, computer hacking is becoming more widespread and damaging. As a major national challenge, cybersecurity calls for individual, organizational, and national attention to securing information and systems. The human factors community has a growing interest in addressing human factors issues in cybersecurity (Mouloua et al., 2019). For instance, Proctor and Chen (2015) identified two research areas to which human factors specialists can contribute: (1) classification and identification of phishing attacks; and (2) protection of private data held on mobile devices from malicious apps. In particular, cybersecurity is receiving increasing recognition because of emerging threats in autonomous transportation, and it has received more attention in research on unmanned aerial vehicles and self-driving vehicles (Linkov et al., 2019). Research on cybersecurity behavior highlights the human as the weakest link in cybersecurity (Mitnick & Simon, 2002), which is also called “the weakest link phenomenon” (Sasse, Brostoff, & Weirich, 2001). An international survey (CompTIA, 2016) reported that, for most companies, 58% of security breaches were due to human error. A British study of 322 information security incidents found that 298 (92.5%) were due to human error (Evans et al., 2019). Human factors specialists and psychologists may help prevent human errors in cyberspace through different lines of research.
The first line of research aims to understand individual and organizational factors contributing to riskier cybersecurity decisions and behaviors, such as gender, attitude, motivation, personality, awareness, education and experience with technology, and organizational culture (Kraemer, Carayon, & Clem, 2009; Yan et al., 2018). For instance, men reported riskier behaviors in cyberspace (Anwar et al., 2017). Yan et al. (2018) reported that about one quarter (23%) of student participants made correct judgments in fewer than half of the experimental cybersecurity scenarios (that is, their performance was below that of random guessing) and only 4% were consistently correct (correctness rate > 90%). Canfield et al. (2016) examined vulnerability to phishing attacks and found that phishing-related decisions depend on individuals’ detection ability, response bias, confidence, and perceived consequences. Another line of research examines human and organizational factors in cyberspace at a higher and broader level and provides causal evidence on the contribution of human and organizational factors to cybersecurity failures. Liginlal et al. (2009) applied Reason’s generic error modeling system (GEMS) (Reason, 1990) to capture the manifestations of human errors in privacy breach incidents and found that human errors accounted for about 67% of the incidents and malicious acts for the remaining 33%. Among the human error-related incidents, 74% were due to mistakes and 26% to slips. Further, they found that 88% of the mistakes occurred during the information processing stage and 73% of the slips during the information dissemination stage. In a qualitative study, Kraemer et al. (2009) interviewed cybersecurity experts and produced a causal network analysis of the pathways from human and organizational factors to cybersecurity vulnerabilities. Evans et al.
(2019) analyzed human error-related incidents, identified their primary and secondary contributory factors, and found out that about 60% of the incidents were due to a shortage of time available for error detection and correction.
Existing studies have also examined how to design cybersecurity technologies and education and training strategies to reduce human errors (Yan et al., 2018). For instance, Liginlal et al. (2009) suggested a defense-in-depth error management strategy and offered possible solutions for error avoidance, interception, and correction.
6 HUMAN RELIABILITY ANALYSIS
In general, the human contribution to overall system performance is regarded as greater than that of hardware and software. To achieve a precise and accurate measure of system reliability, human error must therefore be considered. Human reliability analysis (HRA) is the field concerned with concepts, approaches, methods, and models for understanding, predicting, assessing, communicating, managing, preventing, and governing human errors. For HRA practitioners, its most important function is to quantify the human error probability (HEP) of a task or scenario of interest in risk assessments. HEP is the number of errors observed divided by the number of opportunities for error.
The origin of HRA lies in the early probabilistic risk assessments (PRAs) performed as part of the US nuclear energy development program in the 1960s. PRA is a qualitative and quantitative assessment of the risk associated with plant operation and maintenance (Adhikari et al., 2009). It is a logic structure that relates accident initiators, equipment and human successes and failures, and their consequences in a logic array through their probabilities (Spurgin, 2010). The logic array (i.e., event trees and fault trees) represents the various combinations of possibilities that may occur. For an introduction to event and fault trees, see the review by Sharit (2012).
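A minimal sketch of the two calculations just described: a point estimate of HEP from observed counts, and the AND/OR gate arithmetic a fault tree uses to combine human and equipment failures. The function names, counts, and probabilities are hypothetical, and both gates assume statistically independent events; real PRA models have to treat dependence explicitly (see Section 6.2.1).

```python
from math import prod


def hep_estimate(errors_observed: int, opportunities: int) -> float:
    """Point estimate of HEP: errors observed / opportunities for error."""
    if opportunities <= 0:
        raise ValueError("need at least one opportunity for error")
    if not 0 <= errors_observed <= opportunities:
        raise ValueError("errors must lie between 0 and the number of opportunities")
    return errors_observed / opportunities


def and_gate(*probs: float) -> float:
    """Output event occurs only if all input events occur (independence assumed)."""
    return prod(probs)


def or_gate(*probs: float) -> float:
    """Output event occurs if at least one input event occurs (independence assumed)."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical data: 3 errors observed in 1,500 opportunities.
hep = hep_estimate(3, 1500)
# Hypothetical top event: (pump fails AND operator fails to start the backup) OR valve fails.
top = or_gate(and_gate(1e-2, hep), 1e-4)
print(f"HEP = {hep:.4f}, top event probability = {top:.3e}")
```

The OR gate is computed as the complement of "no input occurs", which stays exact for independent inputs instead of simply adding probabilities.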
In other domains in which human errors can be a major source of vulnerability, HRA has also been widely considered and adopted, for example in oil and gas (Bye et al., 2017), aviation (Burns & Bonaceto, 2020; Gibson & Kirwan, 2008), spaceflight (Calhoun et al., 2014; Chandler et al., 2006), health care and surgery (Onofrio & Trucco, 2020; Sujan, Embrey, & Huang, 2020), railways (Kyriakidis, Kant, et al., 2018), cybersecurity (Evans et al., 2019), and human–autonomy interaction (Ramos et al., 2020).
6.1 The HRA Process
Recommended practices for conducting HRA can be found in the Institute of Electrical and Electronics Engineers (IEEE) standard 1082-2017 (also IEC 63260) (IEEE, 2017), the American Society of Mechanical Engineers (ASME) standard ASME/ANS RA-Sa–2009 (ASME, 2013), and the IAEA publication IAEA-TECDOC-1804 (IAEA, 2016). These documents set out requirements and processes for conducting HRA and for incorporating HRA results into probabilistic safety assessment (PSA), a term used largely interchangeably with PRA. However, HRA can also be carried out independently of PRA and other risk assessment activities. Kirwan (1994) proposed a general HRA process with 10 steps, as shown in Figure 26.
The problem definition step (Step 1) determines the scope of the HRA. A number of key questions must be answered, for instance, “Is the HRA part of a PSA or is it a stand-alone assessment?” and “How extensive are the resources available for the HRA?” The task analysis step (Step 2) formally describes and analyzes human–system interactions and the roles of the operators within the system. The human error identification step (Step 3) identifies the forms of human error, from simple errors of psychomotor control to more complex errors associated with diagnosis and problem solving during emergencies. The human error representation step (Step 4) represents human errors, along with other failures, in fault or event trees. This enables the use of mathematical formulae to calculate all potential combinations of failures that could result in the accidents under consideration and to add together the individual probabilities of each failure combination to obtain the total risk for each accident. Dependence between adjacent human errors can also be modeled in this step (see Section 6.2.1 for more information about dependence). Next, in the screening step (Step 5), each error is assigned a pessimistic probability in an initial PSA run to determine whether more accurate human error quantification is needed. Human errors are then predicted and quantified in Step 6, for which a number of quantification methods are available (see Section 6.2). Once human errors are represented and quantified, PSA or other quantitative risk assessments can be carried out to determine the overall system risk (impact assessment, Step 7). If the system is not acceptably safe, risk-significant human errors can be targeted in the error reduction step (Step 8). Potential error reduction measures include redesigning the work environment or placing barriers in the system. The final two steps are quality assurance (Step 9) and documentation (Step 10), in which the results obtained and the methods applied are documented. Quality assurance covers two aspects: assurance that a qualified HRA has been carried out, and assurance that the human–work system design changes will be implemented so that the error reduction measures have their intended impact.
DESIGN FOR HEALTH, SAFETY, AND COMFORT
[Flowchart: Problem definition (1) → Task analysis (2) → Human error analysis (3) → Human error representation (4) → Screening, if required (5; insignificant errors are not studied further) → Quantification (6) → Impact assessment (7) → if human reliability is not acceptably high, Error reduction (8) through error avoidance, improving performance, and addressing the factors influencing performance and the error causes or mechanisms → Quality assurance (9) → Documentation (10).]
Figure 26 The HRA process. (Source: Adapted from Kirwan, 1994.)
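The representation and screening steps can be illustrated with a small sketch. It assumes that the fault tree has already been reduced to minimal cut sets (combinations of basic events that together cause the accident), treats basic events as independent, and uses the rare-event approximation, in which the accident probability is the sum of the cut-set probabilities. The screening HEP of 0.1, the significance threshold, and all event names and probabilities are hypothetical placeholders, not values from Kirwan (1994).

```python
from math import prod


def cut_set_probability(cut_set, probs):
    """Probability that every basic event in one minimal cut set occurs (independence assumed)."""
    return prod(probs[event] for event in cut_set)


def total_risk(cut_sets, probs):
    """Rare-event approximation: sum of the minimal cut-set probabilities."""
    return sum(cut_set_probability(cs, probs) for cs in cut_sets)


def screen_human_errors(cut_sets, hardware_probs, human_errors,
                        screening_hep=0.1, threshold=1e-3):
    """Give every human error a pessimistic screening HEP, then return
    (errors needing detailed quantification, screened total risk)."""
    probs = dict(hardware_probs)
    probs.update({h: screening_hep for h in human_errors})
    significant = []
    for h in human_errors:
        # Risk contributed by all cut sets that contain this human error.
        contribution = sum(cut_set_probability(cs, probs)
                           for cs in cut_sets if h in cs)
        if contribution > threshold:
            significant.append(h)
    return significant, total_risk(cut_sets, probs)


# Hypothetical accident model reduced to three minimal cut sets.
cut_sets = [["pump_A_fails", "HE_start_backup"], ["valve_fails"], ["HE_misdiagnosis"]]
hardware = {"pump_A_fails": 1e-3, "valve_fails": 1e-5}
significant, risk = screen_human_errors(
    cut_sets, hardware, ["HE_start_backup", "HE_misdiagnosis"])
print(significant, f"{risk:.5f}")
```

In this toy model only the misdiagnosis error survives screening; the backup-start error is masked by the low pump failure probability and would not be studied further.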
6.2 HRA Methods
Over 50 HRA methods and tools have been proposed (Bell & Holroyd, 2009), and more are still emerging. Different taxonomies or classification frameworks have been suggested to organize these methods. For instance, Spurgin (2010) grouped methods into task-related, time-related, and context-related methods. In some task-related methods, an HRA event tree is built by decomposing a task into a number of subtasks, and the quantification of this event tree relies on the HEP of each subtask (e.g., see Section 6.2.1). In other task-related methods, the HEP of a task is quantified holistically at the level of the whole task (e.g., see Section 6.2.4). The concept behind the time-related HRA methods (e.g., see Section 6.2.2) is that a crew will eventually respond to an accident given sufficient time, so its HEP decreases as the available time increases. Their focus is on building a curve that relates time to reliability (non-success or non-response). In the task-related and time-related methods, the task and time elements, respectively, are important in predicting HEP. In the context-related methods (e.g., see Sections 6.2.5 and 6.2.6), by contrast, the context under which the task is performed is what matters, and HEP is determined by the influential context elements. Another common taxonomy classifies HRA methods as first or second generation (Dougherty, 1990). The first-generation methods (see the methods shown in Sections 6.2.1–6.2.4) are very similar to those in other areas of reliability analysis; that is, human error is assessed through a simple event tree analysis, and a human operator is treated as just another component of the system. The first-generation methods tend to be atomistic in nature and require analysts to break a task into subtasks and then consider the potential impact of modifying certain factors (e.g., stress, time pressure). Usually, their theoretical basis consists of an error classification into errors of omission (EOOs) and errors of commission (EOCs) (in most applications, only EOOs are considered), a definition of performance shaping factors, and a cognitive model (skill-based and rule-based behaviors). These methods have come under severe criticism (Dougherty, 1990; Swain, 1990); for instance, they fail to model the more cognitive aspects of human performance in complex scenarios appropriately, and they rest on highly questionable assumptions about human behavior and thus lack psychological realism. Despite these criticisms, many are in regular use for PRA (Bell & Holroyd, 2009). The second-generation methods (see the methods shown in Sections 6.2.5–6.2.8) try to overcome these limitations by recognizing the importance of “cognitive errors” and “errors of commission,” providing guidance on an operator’s possible and probable decision paths based on mental process modeling from cognitive psychology, considering the dynamic aspects of human–system interaction, and highlighting the impact of “contexts” on human behavior. However, these methods are not well validated, and few of them (see Section 6.2.5) are in regular use.
HUMAN ERRORS AND HUMAN RELIABILITY
In addition, some researchers have proposed a third generation of HRA methods, also called “dynamic HRA methods” (Boring, 2007; Di Pasquale et al., 2015; Onofrio & Trucco, 2020), which focus on combining simulation and modeling with HRA to account for the dynamic progression of human behavior leading up to and following human failure events. Dynamic HRA methods are usually based on the first- and second-generation HRA methods. For instance, the Human Error Assessment and Reduction Technique (see Section 6.2.3), a first-generation method, was integrated with the dynamic event tree and Monte Carlo simulation (Onofrio &
Trucco, 2020). Until the first- and second-generation methods can make accurate predictions, dynamic HRA will remain largely conceptual.
In the following sections, we review a sample of HRA methods, including their major steps, theoretical basis, and quantification. Their strengths and shortcomings are summarized based on previous reviews (Bell & Holroyd, 2009; Chandler et al., 2006; Spurgin, 2010) and on the International HRA Empirical Study and the US Empirical Study (Bye et al., 2010; Liao et al., 2019b). A number of other important methods are not covered due to space limits.
6.2.1 THERP
The Technique for Human Error Rate Prediction (THERP) was first introduced in 1962 at a Human Factors Society symposium by the human factors specialist Alan Swain (Boring, 2012). THERP was later refined for nuclear power applications and, in 1983, was published as a handbook, NUREG/CR-1278 (Swain & Guttmann, 1983), sponsored by the US NRC. As the first HRA method, it mirrors the task analysis approach, with each element or subtask modeled as a reliability element (Spurgin, 2010). The whole task is then composed of these elements or subtasks, which together are used to predict the reliability of operators performing the task. In essence, the method is driven by decomposition followed by aggregation. An HRA with THERP comprises four phases and 12 steps, as shown in Figure 27.
[Flowchart. Phase 1, Familiarization: Step 1, plant visit; Step 2, review information from system analysts. Phase 2, Qualitative assessment: Step 3, conduct talk- and walk-throughs; Step 4, perform task analysis; Step 5, develop HRA event trees. Phase 3, Quantitative assessment: Step 6, assign nominal HEPs; Step 7, estimate the effects of PSFs; Step 8, assess dependence; Step 9, determine success and failure probabilities; Step 10, determine the effects of recovery factors. Phase 4, Incorporation: Step 11, perform a sensitivity analysis, if warranted; Step 12, supply information to system analysts.]
Figure 27 THERP process.
In the first phase (familiarization), the HRA analyst gathers plant-specific and event-specific information through plant visits (Step 1) and by reviewing information from system analysts (Step 2). In the second phase (qualitative assessment), the HRA analyst performs a series of qualitative assessment activities. Talk- and walk-throughs (Step 3) are performed to determine the boundary conditions under which tasks are performed (e.g., time and skill requirements, information cues, and recovery factors). Task analysis (Step 4) is performed for the actions required of the operators in responding to each situation. In this step, operator actions are broken down into tasks and subtasks. The HRA analyst does not have to determine the motivation behind every operator action but needs to identify the most likely operator activities. The HRA analyst then identifies the possible human errors based on actual performance situations. The results of task analysis and error identification are shown in an HRA event tree (Step 5). In this event tree, each relevant discrete subtask, step, or activity is characterized by two limbs (or branches) representing either a failure or a success event. Figure 28 shows an example. Its task, reading an analog meter, is decomposed into two subtasks: selecting the display and reading the meter. Each subtask has two outcomes, failure or success, so four events are specified, as depicted in Figure 28. The failure or success probability of the task is derived from the probabilities of these four events.
In the third phase (quantitative assessment), five steps (Steps 6 to 10) are performed. In Step 6, nominal HEPs (NHEPs) are assigned to each of the branches corresponding to failure. For some tasks, the NHEPs are joint HEPs (JHEPs) because they refer to the performance of a crew rather than of an individual operator. These HEP data are derived from operational experience, expert judgment, simulators, or laboratory experiments, and the THERP handbook provides tables for choosing them. Step 7 accounts for the influences on reliable performance of more specific individual, crew, environmental, and task-related factors, called performance-shaping factors (PSFs; more information about PSFs is given in Section 6.3). That is, NHEPs are modified by the influences of these PSFs, resulting in basic HEPs (BHEPs).
The THERP handbook provides tables indicating the direction and extent of PSF influences (e.g., see its Table 18-1 for the modification due to stress and experience). When multiple PSFs exist, the BHEP is estimated as the product of the NHEP and the weights of these PSFs. In Step 8, a dependence model is incorporated, which considers positive dependencies between adjacent branches in the event tree, resulting in conditional HEPs (CHEPs). This model accounts for situations in which the failure of one activity (A) can influence the error probability of the subsequent activity (B). THERP provides equations for converting BHEPs (i.e., the failure probability of activity B when it occurs alone) into CHEPs (i.e., the conditional failure probability of B given the failure of activity A), based on the extent to which the HRA analyst judges dependencies to exist (see Figure 29). The five levels of dependence are zero dependence (ZD), low dependence (LD), medium dependence (MD), high dependence (HD), and complete dependence (CD). For instance, assume that the BHEP for an activity B is 0.01. Its CHEP given failure on activity A is 0.01 under ZD, 0.0595 under LD, 0.1514 under MD, 0.505 under HD, and 1.0 under CD. For small values of BHEP, CHEPs converge to approximately 0.5, 0.15 (more precisely 1/7 ≈ 0.14), and 0.05 for HD, MD, and LD, respectively. The translation from dependence levels to CHEPs is based on expert judgment. To support experts in assigning the dependence level, THERP suggests several guidelines (see THERP Table 10-1) that consider the following factors: closeness in time and space, functional relatedness (e.g., tasks related to the same subsystem), stress, and similarity of the performers. However, these guidelines cannot be applied systematically as a basis for assigning the dependence level, so the assignment lacks transparency and traceability (Podofillini et al., 2010). Further discussion of dependence is given in Section 6.6.6. Step 10 considers the ways in which errors can be recovered. A recovery factor is any system element that acts to prevent deviant conditions from producing unwanted outcomes. Most recovery factors in NPPs are based on information from displays or other visible indications, direct observation of others’ work, or checking of an operator’s work (e.g., human redundancy). As in conventional event trees, these recovery paths can be represented in HRA event trees, and recovery failure probabilities can be estimated in a similar way to HEPs. Further discussion of recovery is given in Section 6.6.7. In the fourth phase (incorporation), the HRA analyst deals with the use of the HRA results. A sensitivity analysis can be performed to examine repeatedly the impact of the assumptions made in the analysis on the probability of system failure. The results can then be input into system risk assessments such as PRA. THERP was adopted immediately in the early 1980s and is still the most widely used method in NPP applications.
It laid the foundations for HRA, including human failure events, task analysis, PSFs, HEPs, dependence, event trees, and recovery (Boring, 2012), and it strongly influenced its successors. It has two simplified descendants: the Accident Sequence Evaluation Program (Swain, 1987) and the Standardized Plant Analysis Risk-Human Reliability Method (Gertman et al., 2005). However, THERP has been criticized for the granularity of its task descriptions and its HEP data (Kirwan, 2008). It was thought to be too “decompositional” in nature, and the origins of its HEP database have never been published. Researchers from the human factors and ergonomics community were concerned about its “broad brush” treatment of psychological and human factors aspects (Kirwan, 2008). Applying THERP to complex systems such as NPPs requires significant resources, which motivated the development of HRA methods that require fewer resources (e.g., methods based on expert judgment). The International HRA Empirical Study (Bye et al., 2010) found that THERP had fair predictive power.
[Event tree. First limb: Event a, select correct display, P(a) = 0.999; Event A, select wrong display, P(A) = 0.001 (failure F1). Second limb, given a: Event b, read meter correctly, P(b) = 0.997 (success S1); Event B, read meter incorrectly, P(B) = 0.003 (failure F2). Probability of success P(S) = P(S1) = P(ab) = 0.996. Probability of failure P(F) = P(F1 + F2) = P(A) + P(aB) = 0.004; also P(F) = 1 − P(S) = 0.004.]
Figure 28 An example of a THERP event tree: determining the probability of misreading an analog meter. Its events are assumed to be independent.
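The arithmetic of Figure 28 generalizes to any chain of sequential subtasks in which an unrecovered error on any subtask fails the task. The sketch below sums the failure end states exactly as the figure does and, as a simplifying assumption for Step 10, lets each subtask carry a probability that an error on it goes unrecovered; a recovered error is treated here as a success. The function name and the recovery value are hypothetical.

```python
def therp_series(heps, unrecovered=None):
    """Return (P(success), P(failure)) for sequential subtasks.

    heps[i] is the HEP of subtask i; unrecovered[i] is the probability that an
    error on subtask i is NOT recovered (1.0 means no recovery credit).
    Events are assumed independent, as in Figure 28.
    """
    if unrecovered is None:
        unrecovered = [1.0] * len(heps)
    if len(unrecovered) != len(heps):
        raise ValueError("one recovery value per subtask is required")
    p_reach = 1.0    # probability of reaching the next subtask with no failure so far
    p_failure = 0.0  # running sum of the failure end states F1, F2, ...
    for hep, r in zip(heps, unrecovered):
        p_fail_here = p_reach * hep * r  # error made on this subtask and not recovered
        p_failure += p_fail_here
        p_reach -= p_fail_here
    return p_reach, p_failure


# Figure 28: select display (HEP 0.001), then read meter (HEP 0.003).
p_s, p_f = therp_series([0.001, 0.003])
print(f"P(S) = {p_s:.3f}, P(F) = {p_f:.3f}")

# Hypothetical recovery factor: a checker misses a misreading 10% of the time.
p_s_rec, p_f_rec = therp_series([0.001, 0.003], unrecovered=[1.0, 0.1])
print(f"with recovery: P(F) = {p_f_rec:.4f}")
```

Without recovery the sketch reproduces P(S) ≈ 0.996 and P(F) ≈ 0.004; crediting the hypothetical checker lowers P(F) to about 0.0013.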
[Log–log plot of the conditional HEP of B given failure of A (CHEP) against the basic HEP of B (BHEP), from 0.0001 to 1, for five dependence levels: ZD, $\mathrm{CHEP} = \mathrm{BHEP}$; LD, $\mathrm{CHEP} = (1 + 19\,\mathrm{BHEP})/20$; MD, $\mathrm{CHEP} = (1 + 6\,\mathrm{BHEP})/7$; HD, $\mathrm{CHEP} = (1 + \mathrm{BHEP})/2$; CD, $\mathrm{CHEP} = 1$. As BHEP approaches 0, the LD, MD, and HD curves level off at 1/20, 1/7, and 1/2.]
Figure 29 THERP dependence model.
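Steps 7 and 8 can be sketched in a few lines: the BHEP is the NHEP multiplied by the PSF weights, and the dependence equations of Figure 29 then convert the BHEP of activity B into its CHEP given failure of A. The cap at 1.0 in `basic_hep` and any PSF weights passed to it are assumptions for illustration, not THERP handbook values.

```python
from math import prod

# THERP dependence equations (Figure 29): CHEP of B given failure of A, from the BHEP of B.
CHEP_EQUATIONS = {
    "ZD": lambda bhep: bhep,
    "LD": lambda bhep: (1 + 19 * bhep) / 20,
    "MD": lambda bhep: (1 + 6 * bhep) / 7,
    "HD": lambda bhep: (1 + bhep) / 2,
    "CD": lambda bhep: 1.0,
}


def basic_hep(nhep, psf_weights):
    """Step 7: NHEP times the weights of all applicable PSFs (capped at 1.0, an assumption)."""
    return min(1.0, nhep * prod(psf_weights))


def conditional_hep(bhep, level):
    """Step 8: convert a BHEP into a CHEP for the judged dependence level."""
    if not 0.0 <= bhep <= 1.0:
        raise ValueError("BHEP must be a probability")
    return CHEP_EQUATIONS[level](bhep)


# Reproduce the worked example in the text (BHEP of B = 0.01).
for level in ("ZD", "LD", "MD", "HD", "CD"):
    print(level, round(conditional_hep(0.01, level), 4))

# Joint failure of A and B when both have BHEP 0.01 and dependence is judged high:
p_joint = 0.01 * conditional_hep(0.01, "HD")  # about 0.005, versus 0.0001 under ZD
```

The example shows why the dependence judgment matters: moving from ZD to HD raises the joint failure probability of the two activities by a factor of about 50.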
6.2.2 Time-Related HRA Models
Time-related HRA models are based on time-reliability curves or time-reliability correlations (TRCs), which focus on the time it takes a crew or operator to complete a task in an emergency or accident. The best-known TRC model is the human cognitive reliability (HCR) model (Hannaman, Spurgin, & Lucki, 1984). HCR estimates the probability $P_t$ that a crew does not respond to a problem within the available time. Its major steps are as follows:
Step 1. Identify the actions to be analyzed using a task analysis method.
Step 2. Classify the type of cognitive processing required by the actions. Rasmussen (1983) described three levels of mental processing: skill-based, rule-based, and knowledge-based (SRK). Activities at the skill-based level are highly practiced routines requiring little conscious attention. Activities at the rule-based level are performed according to a set of rules that have been established in long-term memory based on past experience. At the knowledge-based level, the stored rules are not effective, and the crew or operator must, for example, devise plans that involve exploring and testing hypotheses, which places heavy demands on cognitive resources. A core assumption of HCR is that the three SRK levels can be distinguished by different TRC models.
Step 3. Determine the nominal median response time ($T^{*}_{1/2}$). This is the time at which the probability that the crew will successfully perform the required tasks under normal conditions is 0.5.
Step 4. Adjust the nominal median response time to account for the influence of PSFs by means of the PSF coefficients $K_1$ (operator experience), $K_2$ (stress), and $K_3$ (quality of the operator/plant interface), using the following formula:
$$T_{1/2} = T^{*}_{1/2}\,(1 + K_1)(1 + K_2)(1 + K_3)$$
Step 5. Estimate the non-response probability using a three-parameter Weibull distribution (a lognormal distribution is used in some other applications):
$$P_t = \exp\left\{-\left[\frac{t/T_{1/2} - B}{A}\right]^{C}\right\}$$
where $t$ is the time available to the crew to complete the actions following a stimulus; $T_{1/2}$ is the estimated median time taken by the crew to complete the actions; $t/T_{1/2}$ is the “normalized time”; and $A$, $B$, and $C$ are coefficients associated with the level of mental processing required (skill-, rule-, or knowledge-based).
Several large-scale, full-scope simulator studies have been conducted to validate HCR, but taken as a whole they failed to do so, and its validity has been questioned (Adhikari et al., 2009). There is no basis for connecting Rasmussen’s SRK model (1983) with the three curves of the HCR model. One of its developers, Anthony J. Spurgin, commented that “The HCR model formulation has significant issues and should not be used” (Spurgin, 2010, p. 97).
6.2.3 HEART, NARA, and CARA
The Human Error Assessment and Reduction Technique (HEART), developed by Williams (1988), aims to assess tasks in a more holistic way (in contrast to THERP), based on the assumption that human reliability depends on the generic nature of the tasks to be performed. Its development was based on empirical findings from the human performance literature. Its key quantification steps are as follows:
Step 1. Classify the task under consideration into one of the nine generic task types (GTTs). Table 7 shows the taxonomy of GTTs in HEART.
Step 2. Assign a nominal HEP to the task. The nominal HEPs for these generic tasks, with their 5th and 95th percentile bounds, are given in Table 7.
Table 7
Taxonomy of Generic Task Types in HEART
Generic task type: proposed nominal human unreliability (5th–95th percentile bounds)
Totally unfamiliar, performed at speed with no real idea of likely consequences: 0.55 (0.35–0.97)
Shift or restore system to a new or original state on a single attempt without supervision or procedures: 0.26 (0.14–0.42)
Complex task requiring high level of comprehension and skill: 0.16 (0.12–0.28)
Fairly simple task performed rapidly or given scant attention: 0.09 (0.06–0.13)
Routine, highly-practiced, rapid task involving relatively low level of skill: 0.02 (0.007–0.045)
Restore or shift a system to original or new state following procedures, with some checking: 0.003 (0.0008–0.007)
Completely familiar, well-designed, highly practiced, routine task occurring several times per hour, performed to highest possible standards by highly-motivated, highly-trained and experienced person, totally aware of implications of failure, with time to correct potential error, but without the benefit of significant job aids: 0.0004 (0.00008–0.009)
Respond correctly to system command even when there is an augmented or automated supervisory system providing accurate interpretation of system state: 0.00002 (0.000006–0.009)
Miscellaneous task for which no description can be found: 0.03 (0.008–0.11)
Source: Based on Williams, 1988.
Step 3. Identify which error-producing conditions (EPCs; i.e., PSFs) may affect task reliability. There is a range of EPCs to choose from (n = 38). EPCs are assumed to have generally consistent effects on human reliability across different conditions. For each EPC there is a maximum amount by which the nominal HEP can be multiplied, which was based on an extensive analysis of the human performance literature (Williams, 1988). For instance, the maximum effect for “unfamiliarity with a situation which is potentially important but which only occurs infrequently or which is novel” is 17. Deciding which EPCs apply is important in HEART and is based on the analyst’s earlier qualitative analysis.
Step 4. Define the extent of the negative influence of each identified EPC. This is the proportion of the maximum effect that the EPC can have, termed the “assessed proportion of affect” (APOA). The maximum effect of an EPC and its APOA are similar to the “weight” and “rating” of PSFs, respectively. The assessment of APOA is highly judgmental, and HEART gives no guidance for it. It takes a numerical value between 0.1 (a very weak effect) and 1 (a full effect).
Step 5. Calculate the final HEP. The weighting factor (WF) for each EPC and the HEP are calculated as follows:
$$WF_i = (EPC_i - 1) \times APOA_i + 1$$
$$HEP = GT_{HEP} \times \prod_i WF_i$$
where $EPC_i$ is the maximum effect of the $i$th EPC; $APOA_i$ is the assessed proportion of affect of the $i$th EPC for the task; $WF_i$ is the weighting factor of the $i$th EPC for the task; $GT_{HEP}$ is the nominal HEP of the generic task that the task fits; and $HEP$ is the HEP of the task.
The Nuclear Action Reliability Assessment (NARA; Gibson et al., 2008) was developed along the lines of HEART to meet the needs of the UK nuclear industry. Their differences lie in
three areas: the use of the Computerized Operator Reliability and Error database (CORE-DATA) for the HEP values of GTTs; the substitution of the HEART GTTs with a set of NARA tasks (n = 18); and the incorporation of a human performance limit value (HPLV) when multiple HEPs occur together. The developers modified the HEP data based on CORE-DATA (see Section 6.4), which contains data collected from the nuclear, offshore, manufacturing, railway, chemical, and aviation industries (Kirwan, Basra, et al., 1997). NARA identifies four groups of GTTs: A, task execution; B, ensuring correct plant status and availability of plant resources; C, alarm/indication response; and D, communication. The tasks within these GTT groups are often linked with each other. For instance, in a crew’s response to an accident, the first task may be a response to an alarm (type C); after the type C task, the crew would discuss the situation (type D); and before a type A task, the crew checks the availability of systems and components (type B). NARA appears to lie between HEART and THERP. All three methods focus on the task, and one of their differences is how the task is broken down; NARA tasks can be combined to describe more complex tasks. Each task within each GTT group has an associated nominal HEP value, and EPCs are used, as in HEART, to adjust the nominal HEP values. For instance, its GTT A1 is “Carry out simple manual action with feedback. Skill-based and therefore not necessarily procedure,” with a nominal HEP of 0.005 (0.002–0.01) (Gibson et al., 2008; Kirwan, 2008). NARA provides guidance on determining the appropriate APOA. Similarly, the Controller Action Reliability Assessment (CARA; Gibson & Kirwan, 2008) is tailored to the context of air traffic management in aviation. HEART and its modifications have also been applied in sectors such as cybersecurity (Evans et al., 2019) and medicine (Onofrio & Trucco, 2020), to name a few. Spurgin (2010) evaluated the strengths and weaknesses of HEART and NARA.
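The HEART calculation in Steps 3 to 5 reduces to a product of weighting factors, which the following sketch makes explicit. The generic-task HEP of 0.003 comes from Table 7 and the maximum effect of 17 is the unfamiliarity EPC quoted above; the second EPC (maximum effect 3) and both APOA values are hypothetical, and capping the result at 1.0 is an assumption added to keep the output a valid probability.

```python
from math import prod


def heart_hep(gt_hep, epcs):
    """HEART Step 5.

    gt_hep: nominal HEP of the generic task type (Table 7).
    epcs:   (maximum effect, APOA) pairs, one per applicable EPC.
    """
    weighting_factors = []
    for max_effect, apoa in epcs:
        if not 0.0 < apoa <= 1.0:
            raise ValueError("APOA must lie in (0, 1]")
        # WF_i = (EPC_i - 1) * APOA_i + 1
        weighting_factors.append((max_effect - 1.0) * apoa + 1.0)
    # Cap at 1.0 (assumption) so that the result stays a probability.
    return min(1.0, gt_hep * prod(weighting_factors))


# "Restore or shift a system ... following procedures, with some checking": 0.003.
hep = heart_hep(0.003, [(17, 0.4),   # unfamiliarity EPC, hypothetical APOA
                        (3, 0.5)])   # hypothetical second EPC
print(f"{hep:.4f}")
```

Here the weighting factors are 7.4 and 2.0, so the HEP rises from 0.003 to about 0.044; an APOA of 1 would apply each EPC's full maximum effect.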
Spurgin found that HEART is appreciated by engineers, fairly easy to apply, and reasonably well documented. However, the selection of generic tasks is not easy, and their descriptions are vague. Some of the 38 EPCs (e.g., the 38th EPC, on age) do not apply in the PSA/HRA field. Spurgin also argued that the HEP data are questionable. Note, however, that in a
recent review, Bell and Williams (2018) concluded that the nominal and limit values of HEP in the HEART GTTs are similar to those derived from the human performance literature of the last 30 years (i.e., since HEART was developed). In evaluating NARA, Spurgin (2010) acknowledged its strengths (e.g., relatively easy to apply, improved HEP data, reasonable documentation) but criticized that its selection of GTTs is not easy, its HEP data are still questionable, and the selection of APOA is highly dependent on expert judgment. Regarding HEART, the International HRA Empirical Study (Bye et al., 2010) pointed to its lack of guidance in identifying the GTTs applicable to human failure events and in assessing the APOA of the applicable EPCs, difficulties in modeling complex human failure events, and the absence of any means to model interactions between negative EPCs or the mitigating effects of positive EPCs.
6.2.4 SLIM
The Success Likelihood Index Methodology (SLIM) is an expert-based method (Embrey, 1986b). Its underlying assumptions are that (1) the HEP of a task depends on the combined effect of a set of PSFs, and (2) these PSFs can be identified and appropriately evaluated through expert judgment (usually that of domain experts). Its major steps are as follows:
Step 1. Derive a set of PSFs that would influence the HEP in the situations or tasks of interest. This set should account for most of the variability in success likelihood, and its PSFs should be independent of each other.
Step 2. Rank the PSFs in order of importance, listing them from the most (or the least) important.
Step 3. Weight each PSF ($w_i$). The largest weight is assigned to the most important PSF, and so on, where $\sum_{i=1}^{N} w_i = 1$ and
$N$ is the number of PSFs.
Step 4. Rate the “good” or “bad” condition of each PSF ($r_i$) in the task on a linear scale from 1 to 9 (e.g., 1 = “high time pressure” and 9 = “low time pressure”). A high rating means a better condition.
Step 5. Calculate the success likelihood index (SLI) as
$$SLI = \sum_{i=1}^{N} w_i r_i$$
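The SLI computation of Step 5, together with the calibration equation of Step 6 described next, can be sketched as follows. The sketch assumes base-10 logarithms in the calibration equation and exactly two anchor tasks, so that $a$ and $b$ are solved directly; with more anchors one would fit the line by least squares instead. The importance scores, ratings, and anchor HEPs are hypothetical.

```python
import math


def normalize_weights(raw_importance):
    """Scale importance scores so that the PSF weights sum to 1 (Step 3)."""
    total = sum(raw_importance)
    return [w / total for w in raw_importance]


def success_likelihood_index(weights, ratings):
    """Step 5: SLI = sum of w_i * r_i, with ratings on the 1-9 scale."""
    if len(weights) != len(ratings):
        raise ValueError("one rating per PSF is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if any(not 1 <= r <= 9 for r in ratings):
        raise ValueError("ratings must lie between 1 and 9")
    return sum(w * r for w, r in zip(weights, ratings))


def calibrate(anchor_1, anchor_2):
    """Step 6: solve log10(HEP) = a * SLI + b from two (SLI, HEP) anchor tasks."""
    (sli_1, hep_1), (sli_2, hep_2) = anchor_1, anchor_2
    a = (math.log10(hep_1) - math.log10(hep_2)) / (sli_1 - sli_2)
    b = math.log10(hep_1) - a * sli_1
    return a, b


def sli_to_hep(sli, a, b):
    """Invert the calibration line to obtain the HEP of the task."""
    return 10 ** (a * sli + b)


weights = normalize_weights([5, 3, 2])             # three hypothetical PSFs
sli = success_likelihood_index(weights, [6, 4, 4])
a, b = calibrate((8.0, 1e-4), (2.0, 1e-1))         # hypothetical anchor tasks
print(f"SLI = {sli:.2f}, HEP = {sli_to_hep(sli, a, b):.2e}")
```

With these anchors the line is $\log_{10}(HEP) = -0.5\,SLI$, so the task's SLI of 5 maps to an HEP of about 0.003; the result is only as good as the anchor values, which is exactly the weakness noted below.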
Step 6. Convert the SLIs into HEPs using a calibration equation:
$$\log(HEP) = a \cdot SLI + b$$
Step 7. Calculate uncertainty bounds and examine inter-judge consistency.
The calibration constants $a$ and $b$ can be obtained when two or more reference tasks are available whose absolute HEPs (i.e., anchor HEP values) are known. When their HEPs are unavailable, expert judgment must be used to estimate them. SLIM was the first HRA method to derive HEPs from context and is fairly easy to apply (Spurgin, 2010). However, it lacks guidance on selecting PSFs, which are required to be independent of each other, and it has problems in determining their relative importance and difficulties in selecting appropriate domain experts and appropriate anchor values (Spurgin, 2010). Some of these weaknesses have been alleviated by subsequent work. For instance, SLIM has been combined with the analytic hierarchy process (AHP; Saaty & Vargas, 2006) to better determine the relative importance of PSFs (Park & Lee, 2008). Considering that AHP cannot deal with the dependence problem
associated with a number of PSFs, the analytic network process (ANP; Saaty & Vargas, 2006) was incorporated into SLIM; it is considered able to deal with PSF dependence while determining the relative importance of the PSFs (De Ambroggi & Trucco, 2011; Kyriakidis, Majumdar, et al., 2018). In addition, SLIM should be validated against real anchor values.
6.2.5 SPAR-H and Petro-HRA
The Standardized Plant Analysis Risk-Human Reliability Method (SPAR-H; Gertman et al., 2005) is a relatively simple HRA quantification method for estimating HEPs in support of plant-specific PRA models. Its major steps are as follows:
Step 1. Categorize the human failure event (HFE) as diagnosis and/or action. The underlying psychological basis is a human performance model. More specifically, analysts decide whether a task involves diagnosis (cognitive processing), action (execution), or both; the default modeling in SPAR-H should include both. The nominal HEP (NHEP) is 0.01 for a diagnosis task and 0.001 for an action task.
Step 2. Rate the PSFs. Eight PSFs are considered: available time, stress/stressors, complexity, experience/training, procedures, ergonomics/human–machine interface, fitness for duty, and work processes. Each is specified with several levels, including a nominal level. Associated with each level is a multiplier (i.e., weight) that determines the extent of the negative or positive impact of the PSF on the HEP. Some multipliers depend on whether the activity is diagnosis or action. Take the experience/training PSF as an example. In determining its level, analysts should consider the operator’s or crew’s years of experience, whether the operator/crew has been trained on the type of accident, the time elapsed since training, and whether the scenario is novel or unique. It has four levels:
• Low level (multiplier for diagnosis = 10 and for action = 3): less than 6 months experience and/or training.
This level of experience/training does not provide the level of knowledge and the deep understanding required to adequately perform the required tasks; does not provide adequate practice in those tasks; or does not expose individuals to various abnormal conditions. • Nominal level (multiplier = 1 for both): more than 6 months experience and/or training. This level of experience/training provides an adequate amount of formal schooling and instruction to ensure that individuals are proficient in day-to-day operations and have been exposed to abnormal conditions. • High level (multiplier = 0.5 for both): extensive experience; a demonstrated master. This level of experience/training provides operators with extensive knowledge and practice in a wide range of potential scenarios. Good training makes operators well prepared for possible situations. • Insufficient information (multiplier = 1 for both): if analysts do not have sufficient information to choose among the other alternatives, assign this PSF level. Step 3: Calculate the PSF-modified HEP. The final HEP values are arrived at by multiplying the NHEP by the product of the multipliers of PSFs. In cases where the modified HEP exceeds 1.0, researchers use a mathematical modification of NHEP. The modification is not formulated from a psychological viewpoint nor from a theory; it uses a
mathematical solution that simply keeps the modified HEP from exceeding 1.0, as given below:
$$\mathrm{HEP} = \frac{\mathrm{NHEP} \cdot \prod_{i=1}^{8} \mathrm{PSF}_i}{\mathrm{NHEP} \cdot \left( \prod_{i=1}^{8} \mathrm{PSF}_i - 1 \right) + 1}$$
where PSF_i is the multiplier of the ith PSF and NHEP is the nominal HEP of the task (diagnosis or action). Step 4: Account for dependence and calculate the conditional HEP. SPAR-H uses the five THERP dependence levels (see Figure 29) and considers four factors to determine the dependence level: crew (same/different), time (close in time/not close in time), location (same/different), and cues (additional cues/no additional cues). These factors form a dependence matrix that yields 16 dependence rules mapping onto four dependence levels (from low to complete); a 17th rule accounts for no dependence. More about dependence is given in Section 6.6.6. Step 5: Address the uncertainty. SPAR-H uses a “constrained non-informative prior” (CNI) distribution (Atwood, 1996), which is approximated by a beta distribution with two parameters, 𝛼 and 𝛽. Atwood (1996) provides a table of 𝛼 values as a function of the mean HEP, and 𝛽 = 𝛼(1 − HEP)/HEP. Once the two parameters are known, the 5th, 95th, or any other percentile of the HEP can be computed, e.g., with the function BETAINV(0.05, 𝛼, 𝛽) in Microsoft Excel. Owing to its simplicity, SPAR-H has been used in many PSA/HRA applications. However, this ease of use may be misleading, because in practice the underlying qualitative analysis, which is the premise for performing SPAR-H, can be highly complex (Bye et al., 2010). SPAR-H lacks guidance for qualitative analysis, for PSF ratings (there are no clear indicators for determining PSF levels, and some PSFs overlap), and for dependence. More criticisms can be found elsewhere (Bye et al., 2010; Laumann & Rasmussen, 2016; Liu, Qiu, et al., 2020). Petro-HRA uses SPAR-H as the basis for its quantitative model and adapts it to the petroleum industry (Bye et al., 2017; Taylor, Øie et al., 2020b).
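The arithmetic in SPAR-H Steps 3 to 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the official SPAR-H worksheet. The function names are hypothetical. Step 4 uses the standard THERP conditional-probability equations, which this section does not spell out. The mapping from the four dependence factors to a level is left to the analyst. The 𝛼 value for Step 5 must come from Atwood's (1996) table, which is not reproduced here.

```python
import math
from typing import Sequence, Tuple

# Nominal HEPs from SPAR-H Step 1.
NHEP_DIAGNOSIS = 0.01
NHEP_ACTION = 0.001


def spar_h_hep(nhep: float, multipliers: Sequence[float]) -> float:
    """PSF-modified HEP (Step 3): NHEP times the product of PSF multipliers.

    Assumption: the adjustment formula is triggered whenever the plain
    product would exceed 1.0, as described in the text; consult the
    SPAR-H guidance for its exact trigger condition.
    """
    composite = math.prod(multipliers)
    plain = nhep * composite
    if plain <= 1.0:
        return plain
    # Adjusted form stays below 1.0 for any composite multiplier.
    return plain / (nhep * (composite - 1.0) + 1.0)


def therp_conditional_hep(p: float, level: str) -> float:
    """Conditional HEP given failure of the preceding task (Step 4),
    using the standard THERP dependence equations."""
    equations = {
        "zero": lambda x: x,
        "low": lambda x: (1.0 + 19.0 * x) / 20.0,
        "moderate": lambda x: (1.0 + 6.0 * x) / 7.0,
        "high": lambda x: (1.0 + x) / 2.0,
        "complete": lambda x: 1.0,
    }
    return equations[level](p)


def cni_beta_parameters(mean_hep: float, alpha: float) -> Tuple[float, float]:
    """Beta parameters approximating the CNI distribution (Step 5).

    alpha must be read from Atwood's (1996) table for the given mean HEP;
    it is an input here because that table is not reproduced.
    """
    beta = alpha * (1.0 - mean_hep) / mean_hep
    return alpha, beta


if __name__ == "__main__":
    # Diagnosis task, low experience/training (x10), other seven PSFs nominal.
    print(round(spar_h_hep(NHEP_DIAGNOSIS, [10, 1, 1, 1, 1, 1, 1, 1]), 6))  # prints 0.1
```

If SciPy is available, `scipy.stats.beta.ppf(0.05, alpha, beta)` serves the same purpose as Excel's BETAINV for obtaining percentiles.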
Compared with SPAR-H (a pure quantification method), Petro-HRA is intended to be a complete method. It offers guidance on how to perform the necessary qualitative analysis, including scenario definition, qualitative data collection, task analysis, and human error identification, modeling, and reduction. Its quantification differs from SPAR-H in several major ways. First, it does not separate diagnosis from action, because its developers consider all tasks to be a combination of both; the NHEP for all tasks modeled in Petro-HRA is 0.01. Second, it substantially changes the definitions of certain PSFs (e.g., work processes, fitness for duty) as well as the levels and associated multipliers of all PSFs. The analyst has to evaluate whether a PSF affects the operator’s performance in the given task, and the extent of the effect determines the multiplier. The levels of effect on operator performance, from the most positive to the most negative, are: moderate positive (multiplier = 0.1), low positive (multiplier = 0.5), nominal (multiplier = 1), very low negative (multiplier = 2), low negative (multiplier = 5), moderate negative (multiplier = 10–15), high negative (multiplier = 20–25), very high negative (multiplier = 50), and extremely high negative (HEP = 1). A description is provided for each level.
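Petro-HRA quantification can be sketched as follows, under stated assumptions. The names are hypothetical. The analyst picks a value inside the ranges given for moderate and high negative effects. Capping the result at 1.0 is a safeguard added here, not something this section specifies.

```python
from typing import Dict, Optional, Sequence, Tuple

NHEP = 0.01  # single nominal HEP for all Petro-HRA tasks

# Allowed multiplier range per level of effect; None marks the
# "extremely high negative" level, which sets the HEP to 1 directly.
MULTIPLIER_RANGES: Dict[str, Optional[Tuple[float, float]]] = {
    "moderate positive": (0.1, 0.1),
    "low positive": (0.5, 0.5),
    "nominal": (1.0, 1.0),
    "very low negative": (2.0, 2.0),
    "low negative": (5.0, 5.0),
    "moderate negative": (10.0, 15.0),
    "high negative": (20.0, 25.0),
    "very high negative": (50.0, 50.0),
    "extremely high negative": None,
}


def petro_hra_hep(ratings: Sequence[Tuple[str, float]]) -> float:
    """HEP from (level, chosen multiplier) pairs, one pair per rated PSF."""
    product = 1.0
    for level, multiplier in ratings:
        allowed = MULTIPLIER_RANGES[level]
        if allowed is None:
            return 1.0  # extremely high negative effect: HEP = 1
        low, high = allowed
        if not low <= multiplier <= high:
            raise ValueError(f"multiplier {multiplier} outside {allowed} for {level!r}")
        product *= multiplier
    # Assumption of this sketch: never report a probability above 1.0.
    return min(1.0, NHEP * product)


# Moderate negative effect (10) and low positive effect (0.5): 0.01 x 10 x 0.5 = 0.05
hep = petro_hra_hep([("moderate negative", 10.0), ("low positive", 0.5)])
```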
Its guideline offers no empirical basis for the multipliers. Taylor, Øie et al. (2020b) documented lessons learned from applying this method in the petroleum industry. The qualitative and quantitative predictive power of this still-evolving method in the petroleum context needs further validation.
6.2.6 HDT
The Holistic Decision Tree (HDT) method, developed by Spurgin (2010), is another context-related method. It combines a tree structure with anchor values to determine the end-state HEPs for a particular accident scenario. Its quantification is related to SLIM: like SLIM, it assumes a log-linear relationship between HEP and PSFs. Spurgin (2010) demonstrated the method on International Space Station (ISS) accident scenarios. Its major steps in the ISS case are summarized below. Step 1. Draw up a set of potential influence factors (IFs; i.e., PSFs). Step 2. Sort the list into scenario-specific and global IFs. HEPs can be affected by scenario-specific (local) or global influences, and the global IFs are found in every scenario. In the ISS application, all IFs were global, and the same ones were used for all scenarios considered. Step 3. Rank the IFs in order of importance and select the most important ones. In the ISS study, six IFs were considered most important: communication, procedures, training, man-machine interface, workload, and command, control & decision making (CC&DM). Spurgin (2010) gives their explicit definitions. Step 4. Select and define the quality factor (QF) levels to be used for each IF and draw the decision tree. For instance, the CC&DM IF has three QF levels: “efficient,” “adequate,” and “deficient.” These levels are defined explicitly.
For instance, the “adequate” level of CC&DM refers to the “collaboration among team members, roles and responsibilities, and decision making are needed to identify the problems and perform all required tasks neither encourages nor interferes with the expeditious return to the desired state of the system,” and examples of this level include “defined lines of authority, but some uncertainty” and “established protocol, but not experienced leadership” (Spurgin, 2010). Step 5. Estimate the importance weight of each IF. AHP was used in the ISS study. Step 6. Determine the upper and lower anchor values for the HEP. The upper anchor value can be assumed to be 1.0, meaning that if all IFs are poor, the operators are certain to fail. The lower anchor value corresponds to the most favorable context, in which all IFs are at their best values; it can vary, and its uncertainty can be included in the analysis. In the ISS case, each scenario was assigned a different distribution for the lower anchor value. For most scenarios, the 5th percentile of this anchor value was set at 1.0E−4 and the 95th percentile at 1.0E−3. Step 7. Using the QF levels, rate each IF for each scenario. In this rating, a quality value (QV) of 1 (e.g., “good”), 3 (e.g., “fair”), or 9 (e.g., “poor”) is assigned. Thus, a factor of 3 represents each transition, from good to fair and from fair to poor. Step 8. Determine the incremental changes in HEP associated with each QF level in a decision tree. Figure 30 shows a portion of the decision tree for a coolant loop leak scenario. The HEPs in the end states of the decision
[Figure: HDT for ISS PRA scenario, coolant loop leak. The tree branches on six IFs: communications (weight 0.19), procedures (0.21), training (0.16), man–machine interface (0.17), workload (0.13), and command, control & decision making (0.14). Each IF is rated good, fair, or poor (quality factors 1, 3, and 9). The tree lists the weighted sum and resulting HEP for each end state; low HEP anchor = 4.04E−04, high HEP anchor = 1.00E+00.]
Figure 30 Representation of a portion of a holistic decision tree for a coolant loop leak scenario. (Source: Spurgin, 2010. © 2010 Taylor & Francis.)
tree are computed with the aid of a Microsoft Excel spreadsheet from the IF importance weights, the QVs, and the anchor values, using the following equations:
$$\ln(\mathrm{HEP}_i) = \ln(\mathrm{HEP}_l) + \ln\left(\frac{\mathrm{HEP}_h}{\mathrm{HEP}_l}\right)\left(\frac{S_i - S_l}{S_h - S_l}\right)$$
$$S_i = \sum_{j=1}^{N} (QV_j)\, I_j, \qquad \sum_{j=1}^{N} I_j = 1$$
where HEP_i is the HEP of the ith pathway through the HDT; HEP_l is the low HEP anchor value; HEP_h is the high HEP anchor value; S_l is the lowest possible value of S_i (= 1 in the current formulation); S_h is the highest possible value of S_i (= 9 in the current formulation); QV_j is the quality value of the jth IF (1, 3, or 9 in the current formulation); and I_j is the importance weight of the jth IF. Spurgin (2010) himself noted that the strengths of HDT include: (1) it deals with a crew’s whole response to an accident; (2) it focuses on the context of the accident; (3) it is easy to
understand; and (4) it clearly indicates which parameters can be changed to improve crew reliability. Its shortcomings include: (1) it relies heavily on expert judgment; (2) it does not explain the detailed failures associated with tasks and subtasks; (3) it ignores dependence among the IFs; and (4) it relies on expert judgment to obtain the anchor values.
6.2.7 CREAM
The Cognitive Reliability and Error Analysis Method (CREAM) was developed by Hollnagel (1998). It assumes that human performance in a task is determined more by the context in which the task is performed than by the inherent properties of the task itself. It has two versions for estimating HEPs: basic and extended. Basic CREAM estimates a general action failure probability, and extended CREAM estimates a specific action failure probability. In both versions, HEPs are estimated from assessed PSFs, which are called common performance conditions (CPCs). Basic CREAM associates various combinations of the nine CPCs with four contextual control modes, each of which has a specific HEP interval. Extended CREAM is expected to offer more accurate HEP predictions than basic CREAM and is thus our focus. Its major steps are as follows. Step 1. Build a cognitive demands profile, which indicates the cognitive activities required by the task as a whole. Critical cognitive activities include coordinate, communicate, compare, diagnose, evaluate, execute, identify, maintain, monitor, observe, plan, record, regulate, scan, and verify. Each cognitive activity can be described by the combination of the four cognitive functions (observation, interpretation, planning, and execution) that it requires. For instance, coordination involves planning and execution.
A cognitive demands profile can then be built from the cognitive activities and their associated cognitive functions by counting how many times each of the four cognitive functions is involved. Step 2. Identify possible cognitive failures. As shown in Table 8, 13 generic, important cognitive function failures are defined in relation to the four cognitive functions of the associated cognitive model. In this step, analysts assess the failures likely to occur in each task segment of the task at hand, based on their knowledge of the task and of the CPCs under which the task is being performed. Step 3. Determine the nominal cognitive failure probability (CFP; similar to HEP) for each of the likely cognitive function failures. The nominal CFPs are based on a variety of sources, including THERP (Swain & Guttmann, 1983) and HEART (Williams, 1988). For instance, for the observation error “wrong object observed” (O1), the nominal CFP is 1.0E-3, with a lower bound of 3.0E-4 and an upper bound of 3.0E-3. Step 4. Adjust the nominal CFPs based on the couplings between the nine CPCs and the four cognitive functions. The nine CPCs are: adequacy of organization, working conditions, adequacy of man–machine interface and operational support, availability of procedures/plans, number of simultaneous goals, available time, time of day (circadian rhythm), adequacy of training and experience, and crew collaboration quality. Each CPC influences the cognitive functions differently. For instance, the CPC “availability of procedures/plans” is expected to have a strong influence on “planning” but a weak influence on “interpretation.” Hollnagel (1998) assigned weighting factors to each of the four cognitive functions for each CPC level. For example, consider the influence of the “working conditions” CPC on the cognitive function failure “wrong identification” (O2), whose nominal CFP is 0.07. The weighting factors of the three levels (advantageous, compatible, and incompatible) of the “working conditions” CPC used to modify “wrong identification” are 0.8, 1.0, and 2.0. Thus, if working conditions are rated incompatible, the resulting CFP for “wrong identification” is 0.07 × 2.0 = 0.14 (with the lower and upper bounds adjusted accordingly). When multiple CPCs apply, the nominal CFP is multiplied by the product of the weighting factors of all these CPCs. Step 5.
Finally, the adjusted CFP values are incorporated into event trees in the PSA, which is beyond the scope of this chapter. CREAM is somewhat similar to HEART, NARA, and CARA (Spurgin, 2010). All of them start from a nominal HEP corresponding to a task or cognitive action and then modify that value by a set of multipliers. One difference is that CREAM considers both positive and negative effects of CPCs.
Table 8 Generic Cognitive Function Failures
Cognitive function / Potential cognitive function failure
Observation errors
O1 Observation of wrong object. A response is given to the wrong stimulus or event.
O2 Wrong identification made (e.g., due to a mistaken cue or partial identification).
O3 Observation not made (i.e., omission), overlooking a signal or a measurement.
Interpretation errors
I1 Faulty diagnosis, either a wrong diagnosis or an incomplete diagnosis.
I2 Decision error, either not making a decision or making a wrong or incomplete decision.
I3 Delayed interpretation (i.e., not made in time).
Planning errors
P1 Priority error, as in selecting the wrong goal (intention).
P2 Inadequate plan formulated, when the plan is either incomplete or directly wrong.
Execution errors
E1 Execution of wrong type performed, with regard to force, distance, speed, or direction.
E2 Action performed at wrong time, either too early or too late.
E3 Action on wrong object (neighbor, similar, or unrelated).
E4 Action performed out of sequence, such as repetitions, jumps, and reversals.
E5 Action missed, not performed (i.e., omission), including the omission of the last actions in a series (“undershoot”).
Source: Based on Hollnagel, 1998.
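The Step 4 adjustment in extended CREAM can be sketched as follows. The sketch includes only the CFP values and “working conditions” weights quoted in the text. The names are hypothetical, and Hollnagel's (1998) full tables would be needed for any real analysis. The sketch does not handle results above 1.0.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Bounds = Tuple[float, Optional[float], Optional[float]]  # (nominal, lower, upper)

# Nominal CFPs quoted in the text; bounds are quoted only for O1.
NOMINAL_CFP: Dict[str, Bounds] = {
    "O1": (1.0e-3, 3.0e-4, 3.0e-3),  # wrong object observed
    "O2": (0.07, None, None),        # wrong identification
}

# Weighting factors of the "working conditions" CPC used to modify O2.
WORKING_CONDITIONS_O2 = {"advantageous": 0.8, "compatible": 1.0, "incompatible": 2.0}


def adjust_cfp(failure: str, weights: Sequence[float]) -> Bounds:
    """Multiply the nominal CFP and any known bounds by the product of the
    weighting factors of all relevant CPCs (Step 4). No cap at 1.0."""
    factor = math.prod(weights)
    nominal, lower, upper = NOMINAL_CFP[failure]

    def scale(value: Optional[float]) -> Optional[float]:
        return None if value is None else value * factor

    return nominal * factor, scale(lower), scale(upper)


if __name__ == "__main__":
    # "Working conditions" rated incompatible; other CPCs treated as neutral (1.0).
    print(adjust_cfp("O2", [WORKING_CONDITIONS_O2["incompatible"]]))  # prints (0.14, None, None)
```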
CREAM may have been modified more widely than any other HRA method, with adaptations for nuclear power (He et al., 2008), health care (Zheng et al., 2020), space flight (Calhoun et al., 2014), railways (Sun et al., 2020), and maritime transportation (Yang et al., 2013), to name a few. Its validity was examined in the International HRA Empirical Study (Bye et al., 2010). The predictive power of extended CREAM was found to be fair. Its analysts identified only a single negative CPC, which suggests that the method may not be sensitive to the effects of negative CPCs on performance. As a result, it tends to produce similar predictions across different scenarios (see Figure 33 in Section 6.5.3). The assignment of cognitive function failure types, which determines the nominal HEPs, is subjective, and the process of determining the dominant failure type is complicated. This validation study also indicated that the effect of the CPC multipliers on increasing or decreasing the nominal HEP may be negligible in CREAM.
6.2.8 IDHEAS and IDHEAS-G
The Integrated Human Event Analysis System (IDHEAS) was sponsored by the US NRC in a version called the “IDHEAS At-Power Application” (Xing et al., 2017). The method was subsequently updated as the General Methodology of an Integrated Human Event Analysis System (IDHEAS-G) (Xing & Chang, 2018; Xing, Chang, & DeJesus, 2020). Owing to space limits, this chapter focuses on IDHEAS-G, which is still under development. IDHEAS-G uses a cognitive model as the basis to analyze an event scenario, model important human actions, and quantify HEPs. Because the IDHEAS-G report has not yet been finalized, the following overview of its general process and main steps may change in the future. Step 1. Scenario analysis. This step identifies the event context (e.g., system, personnel, and task context) and develops the operational narrative (e.g., initial conditions, initiating events, and consequences of interest) for the event scenarios. Step 2.
Identify and define important human actions pertinent to the mission of the event. Step 3. Task analysis. This step analyzes the tasks required for each human action and characterizes the critical tasks for HEP quantification; the HEP of an important human action is quantified from its critical tasks. Step 4. Identify applicable cognitive failure modes (CFMs) of the critical tasks. Macro-cognitive functions are used to model the cognitive process of performing critical tasks. The five functions are Detection, Understanding, Decision Making, Action Execution, and Teamwork (note that other sources may use a different taxonomy of macro-cognitive functions). Failures of the macro-cognitive functions serve as CFMs that represent various task failures, and the failure of a critical task can be represented by one or several CFMs. IDHEAS-G organizes the CFMs in a three-level structure (high, middle, and detailed levels). For instance, the high-level CFM for detection is the failure to detect cues/information. The middle-level CFMs for detection are D1 “Fail to establish the correct mental model or to initiate detection,” D2 “Fail to select, identify, or attend to sources of information,” D3 “Incorrectly perceive or classify information,” D4 “Fail to verify perceived information,” and D5 “Fail to retain, record, or communicate the acquired information.” Each middle-level CFM has several detailed CFMs; e.g., the detailed CFMs for D2 include D2-1 “Fail to access the source of information” and D2-2 “Attend to wrong source of information.”
Step 5. Assess the relevant PIFs (i.e., PSFs). PIFs are classified into three categories, and each category has high-level PIFs describing specific aspects of systems, tasks, and personnel. System-related PIFs include the availability and reliability of systems and of instrumentation and controls, environmental factors, work location accessibility and habitability, and tools and equipment. Task-related PIFs include information availability and reliability, scenario familiarity, multi-tasking, interruptions and distractions, cognitive complexity, mental fatigue and stress, and physical demands. Personnel-related PIFs include human–system interface, staffing, training, procedures/guidelines/instructions, teamwork factors, and work process. Note that this PIF taxonomy has not yet been finalized. Step 6. Estimate the HEP. Unlike previous methods, IDHEAS-G models two kinds of HEP. The first is the error probability attributed to uncertainties in the time available and the time needed for the action (P_t). The second is the error probability attributed to the CFMs of all critical tasks (P_c). P_t characterizes the feasibility of the action (see Section 6.6.3 for more information on P_t), whereas P_c is modeled as a function of various PIFs. Objective HEP data for weighting combinations of PIF states are usually lacking. Thus, the IDHEAS At-Power Application (Xing et al., 2017) used a structured expert judgment elicitation process to estimate CFM probabilities under different combinations of PIFs. In developing the quantification model of CFMs, the developers found that three PIFs can drive an HEP anywhere from a minimal value to 1. These are information availability and reliability, task complexity, and scenario familiarity, referred to as the “base PIFs.” The other PIFs, termed “modification PIFs,” typically modify base HEPs by a weight factor. Thus, the base HEP is determined by the states of the base PIFs and is then modified by the other PIFs to obtain the final HEP.
The quantification model is described below:
• Model PIF states. Each PIF has a base state and several poor states. The base state, defined in IDHEAS-G, means that the PIF has no observable impact on HEPs. Specific attributes (i.e., indicators) of each PIF describe the poor states that increase HEPs. For instance, attributes of the PIF “information availability and reliability” can include “inadequate updates of information,” “information of different sources is not synchronized,” “conflicts in information,” and “information is misleading or wrong.” Each PIF attribute increases the likelihood of one or several CFMs. Each attribute can define a PIF state. Alternatively, PIF states can be simplified into several discrete states by grouping the attributes according to their impacts on CFMs, e.g., into low-, moderate-, and high-impact states.
• Model the impact of PIF states on CFMs as follows:
$$w_i = \frac{ER_{\mathrm{PIF}} - ER_{\mathrm{PIF\,Base}}}{ER_{\mathrm{PIF\,Base}}}$$
where ER_PIF is the human error rate at the given PIF state and ER_PIF Base is the human error rate at the base state of the PIF.
• Estimate base HEPs for every CFM. The base HEPs for the poor states of the three base PIFs are obtained from the human performance literature or from structured expert judgments.
• Calculate the HEP of a CFM for a given set of PIF states (P_CFM) as given below:
$$P_{\mathrm{CFM}} = P_{\mathrm{CFM\,Base}} \times \left(1 + \sum_{i=1}^{n} w_i\right) \times C \times \frac{1}{R_e}$$
where P_CFM Base is the base HEP of a CFM for the given states of the three base PIFs; w_i is the PIF weight for the given state of the ith modification PIF; C accounts for the interaction between PIFs and is set to 1 (a linear combination of PIF weights) unless empirical data suggest otherwise; and R_e accounts for the potential recovery from failure of a task and is set to 1 unless empirical data suggest otherwise. Step 7. Document the uncertainties in the event analysis and assess the dependence between important human actions. The current version of IDHEAS-G makes no new developments in the treatment of dependence; information about dependence is given in Section 6.6.6. Compared with other HRA methods, this complex method is expected to greatly strengthen the cognitive basis of HRA (e.g., a complete set of CFMs and a clear association between CFMs and PIFs). However, several difficulties and challenges might affect its development and application. It would be challenging to identify the critical tasks, their associated macro-cognitive functions (probably more than one per task), and the associated CFMs for each macro-cognitive function. Modeling the detailed CFMs seems subjective. For each CFM, the developers used their expert judgment to build a decision tree covering the likelihood of that CFM under different combinations of specific PIFs. Validating these decision trees will be difficult. In the quantification model, there may be insufficient data to estimate P_t. When estimating P_c, IDHEAS-G does not provide numerical values for the base HEPs of every CFM at the poor states of the three base PIFs, nor for the weights of the given states of the modification PIFs. The complexity of this method could weaken HRA analysts’ and engineers’ willingness to use it. The future of IDHEAS is uncertain.
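The two IDHEAS-G quantification formulas above can be sketched as follows. The function names are hypothetical. IDHEAS-G does not publish base HEPs or PIF weights, so the numbers in the usage lines are purely illustrative. As in the formula, the result is not capped at 1.0.

```python
from typing import Sequence


def pif_weight(error_rate_pif: float, error_rate_base: float) -> float:
    """w_i: relative increase of the error rate at a PIF state over the base state."""
    return (error_rate_pif - error_rate_base) / error_rate_base


def cfm_probability(p_cfm_base: float, weights: Sequence[float],
                    c: float = 1.0, r_e: float = 1.0) -> float:
    """P_CFM = P_CFM,Base * (1 + sum of w_i) * C / R_e.

    C and R_e default to 1, as the text prescribes in the absence of
    empirical data.
    """
    return p_cfm_base * (1.0 + sum(weights)) * c / r_e


if __name__ == "__main__":
    # Hypothetical inputs only.
    w1 = pif_weight(error_rate_pif=0.02, error_rate_base=0.01)   # 1.0
    w2 = pif_weight(error_rate_pif=0.015, error_rate_base=0.01)  # about 0.5
    print(cfm_probability(1.0e-3, [w1, w2]))                     # about 2.5e-3
```

Keeping C and R_e as explicit parameters makes it easy to plug in empirical values later without changing the additive treatment of the PIF weights.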
6.3 Performance-Shaping Factors
Human reliability is influenced by multiple PSFs, which are also called PIFs in IDHEAS, EPCs in HEART, and CPCs in CREAM. Any factor that influences human performance, negatively or positively, is designated as a PSF (Swain & Guttmann, 1983). Usually, we focus on their negative influences. Swain and Guttmann (1983) argued that PSFs are major determinants of HEPs and thus based HEP quantification on combinations of PSFs in THERP, the first HRA method. Following this tradition, HEP quantification in other HRA methods also relies heavily on PSFs. There are at least three types of PSF-based HRA methods (P. Liu et al., 2016; P. Liu et al., 2017):
• First, PSF multipliers are used to modify a task’s nominal HEP to obtain its final HEP; see SPAR-H (Gertman et al., 2005) and HEART (Williams, 1988).
• Second, the relative importance of PSFs is obtained to build a predictive function between PSFs and HEPs; see SLIM (Embrey et al., 1984) and HDT (Spurgin, 2010).
• Third, HEPs are estimated directly for various combinations of PSFs, mainly through expert judgment or other techniques; see IDHEAS (Xing et al., 2017) and A Technique for Human Event Analysis (ATHEANA) (Cooper et al., 1996).
Generally, these methods ignore or do not explicitly consider failure mechanisms (i.e., descriptions of why and how a human failure event could occur because of these PSFs). Our knowledge of PSFs is critically important for accurate HEP quantification. PSFs are a heavily researched area in HRA and other human performance fields. Next, we review several issues relevant to PSFs.
6.3.1 Conceptualization and Identification
The first issue is which PSFs should be included in HRA, which involves conceptualizing and identifying the PSFs of the system of interest. PSFs encompass a wide range of factors, from the HSI and procedural guidance to training, experience, and organizational safety culture. In THERP, Swain and Guttmann (1983) classified PSFs into three types: external PSFs (e.g., work environment), internal PSFs (e.g., individual characteristics of operators, including skills and motivation), and psychological and physiological stressors. This taxonomy is rather coarse. PSF taxonomies have been the subject of considerable academic research and debate over the years. PSFs can be conceptualized and identified based on detailed retrospective analyses of accidents and incidents (e.g., Kyriakidis et al., 2015), knowledge and domain expert judgments (e.g., Onofrio & Trucco, 2020), literature review (e.g., P. Liu et al., 2016), or a combination of these. PSF taxonomies are often created from data and research specific to domains such as the nuclear, aviation, and healthcare sectors, and they can be modified for other sectors. A major difference between sectoral and generic PSF taxonomies is the level of detail. Generic taxonomies tend to be defined and described at a high, abstract level, covering the basic elements of any system, e.g., operators, HSI, and organization. Sectoral taxonomies tend to be defined at a lower, more specific level, consisting of contingent and detailed characteristics of a specific system or domain.
Recommended PSFs in nuclear power are documented in existing guidance and standards, including the ASME/ANS standard (ASME, 2013), IAEA-TECDOC-1804 (IAEA, 2016), and NUREG-1792 Good Practices for Implementing HRA (Kolaczkowski et al., 2005). Each HRA method is equipped with its own PSF taxonomy to characterize the context of human tasks (see Section 6.2). Next, we highlight several recent efforts to model PSFs and improve our understanding of what may influence human performance. What these efforts have in common is that they develop or conceptualize PSFs in a hierarchical or otherwise systematic manner. Groth and Mosleh (2012) proposed a set of principles to guide the development of PSF sets: (1) analysis should consider only the PSFs that directly influence operator performance; (2) PSFs must be defined orthogonally (i.e., as separately defined entities); and (3) PSFs should be “value neutral,” with the ability to expand in characterizing context. Based on these guidelines, Groth and Mosleh (2012) developed a PSF hierarchy for qualitative and quantitative HRA. PSFs in this hierarchy were extracted by aggregating information from existing HRA methods, expert workshops, and retrospective data. The top level of the hierarchy contains five categories: organization-based, team-based, person-based, situation/stressor-based, and machine-based. Each category has its own subcategories. Ekanem et al. (2016) built a similar PSF taxonomy, in which PSFs are divided into groups and classified into levels within groups. The result is a hierarchical, three-level structure that can be expanded or collapsed. The nine groups (Level 1 PSFs) are HSI, Procedures, Resources, Team Effectiveness, Knowledge/Abilities, Bias, Stress, Task Load, and Time Constraint. As an example of the three-level structure, Communication (a Level 2 PSF) in the Team Effectiveness group has two Level 3 PSFs: Communication Quality and Communication Availability.
Kyriakidis, Kant, et al. (2018) developed a generic framework with a standardized list of PSFs across sectors, referred to as cross-sectoral performance shaping factors (C-PSFs), to describe immediate and latent PSFs. A generic PSF framework supports an integrative view of human performance in system safety across various systems and enables trans-sector learning, safety enhancement, and human error prevention. In Kyriakidis et al.’s taxonomy, PSFs are divided into two main categories: independence-based and relational-based. Independence-based categories contain factors that conceptually do not depend on other entities for their description. Relational-based categories include factors that interlink at least two of the entities in a system (person, task, and environment). Each main category has five subcategories, for a total of ten (the first five belong to the independence-based categories):
• Static factors (person-based) that do not change constantly and are usually stable over time, e.g., familiarity, individual characteristics, and motivation.
• Operational environment factors that are involved in the immediate technical context, e.g., system design and working environment.
• Environmental factors, e.g., visibility and weather conditions.
• Team factors related to team functioning, e.g., communication between employees, relations within the team, and trust in information.
• Organizational factors, e.g., communication within the organization, leadership, safety culture, and shift pattern.
• Dynamic relational category I factors (cognitive emphasis), e.g., decision-making skills, fatigue, stress, vigilance, and situational awareness.
• Dynamic relational category II factors (task emphasis), e.g., monotony, complexity, routine, and time pressure.
• Human-technology coupling factors, e.g., HSI.
• Team-organization coupling factors.
• Organization-macro issues coupling factors (legislation/governmental policy).
P. Liu et al. (2016) developed a full-set PSF model with the aim of identifying the key PSFs in NPP MCRs. In their
work, PSFs were systematically conceptualized from three levels—components, factors, and indicators—following the general conceptualization process in social science. This conceptualization has three steps. Step 1 is to identify and define components in the PSF system. PSFs are the description and characterization of work environments or task contexts, which usually consist of task, operator, team/crew, etc. Thus, P. Liu et al. (2016) decomposed PSFs into the following eight components in work environments or task contexts, as below: 1. 2. 3. 4. 5. 6. 7. 8.
Operator: Individuals operating NPPs, including MCR operators and field operators. This covers individual characteristics. Crew: Operating crews in control rooms. This covers crew characteristics. Organization: Support from higher-level organizations, including resource support, training support, safety culture, management support and policies. HSI: Ways and means of interaction between the crew and the system. This mainly refers to the displays and controls in HSI. System: Physical system per se. Working Environment: Internal and external environments in control rooms. Procedure: Computerized and paper procedures, guidelines, checklists, etc. Task: High-level cognitive activities in specific task environments.
Step 2 is to identify PSFs, including indicators for each MCR component; 500 indicators were identified from existing HRA methods, human performance databases, human event reports, and other sources. Step 3 is to regroup and reclassify the PSF indicators to identify PSF factors. In the end, the eight components were divided into 30 PSF factors and further into 140 PSF indicators. The expected result of the conceptualization process is a hierarchical model with three levels in the PSF system (see Figure 31). PSF components are common "modules" in any complex system, whereas PSF factors and indicators can be tailored and modified to reflect the unique characteristics of a specific system. This full set includes the PSFs that potentially influence operator and crew performance. Furthermore, P. Liu et al. (2017) developed a risk-based method to identify the important PSFs from this full set, which is introduced in Section 6.3.2.
Figure 31 Components, factors, and indicators of the PSF system. [Diagram: PSF components 1 Operator, 2 Crew, 3 Organization, 4 Human-System Interface, 5 System, 6 Working Environment, 7 Procedure, 8 Task; PSF factors under Operator: 1.1 Fatigue, 1.2 Experience/Training/Skill, 1.3 Stress, 1.4 Responsibility, 1.5 Bias; PSF indicators under Fatigue: 1.1.1 Working for a considerable number of hours, 1.1.2 Working without rest for a considerable time, 1.1.3 Night shift, 1.1.4 Frequent changes of shift, 1.1.5 Low vigilance, 1.1.6 Lack of sleep.] (Source: P. Liu et al., 2016. © 2016 Springer Nature.)
There are other PSF taxonomies in the literature, and researchers and analysts within and beyond the HRA community differ in their understanding of PSF taxonomies. Although the taxonomies share similarities, this domain is far from standardized (Porthin, Liinasuo, & Kling, 2020). We summarize potential problems related to PSFs in HRA, some of which have been reported previously (Forester et al., 2014; P. Liu et al., 2016). First, PSFs may overlap with each other and their definitions may be unclear, which may lead to double counting of the effect of specific PSFs on human performance (Gertman et al., 2005; Groth & Mosleh, 2012) or to the assignment of the same situation to multiple PSFs (Forester et al., 2014). Second, PSFs are defined at different levels in different HRA methods, which hampers an appropriate comparison of their qualitative and quantitative results. Third, several important factors, e.g., organizational or HSI factors, are omitted from certain HRA methods (organizational factors are discussed further in Section 6.6.4). In short, different HRA methods focus on different PSFs, PSF taxonomies differ considerably, and PSFs are defined and measured in different ways across HRA methods. Given these debates, a consensus should be reached on which PSFs should be considered and on the appropriate number of PSFs to include in an HRA method.
6.3.2 Measurement and Quantification
PSF taxonomies are a start in understanding the factors that potentially influence human performance, but several reasons lead us to measure and quantify PSFs. At least two kinds of PSF quantification are required in HRA. First, HRA requires measuring the relative importance of PSFs.
Such measurement can identify the influential PSFs from a growing list of PSFs (Liao et al., 2019c; P. Liu et al., 2017). Assessing the relative importance of PSFs to operational success or failure is also required by several HRA methods that predict HEP as a function of PSFs, such as SLIM (Embrey, 1986b) and HDT (Spurgin, 2010). Quantifying the importance of PSFs (e.g., in terms of impact, frequency, complexity) can help to identify the key PSFs. Two major types of measures (objective and subjective) used to assess the importance of PSFs in human performance are reviewed below. Objective measures are usually based on statistical data on incidents and accidents. The importance of a PSF is generally defined as the number of incidents and accidents involving the PSF divided by the total number of incidents and accidents (i.e., the relative likelihood that it will cause a failure event). Strater (2005) adopted this metric to analyze the contribution of PSFs to non-compliance with operational procedures and found that the top-ranked causal factors were the possibility of confusing equipment due to poor ergonomic design, incomplete procedures, and lack of knowledge of the procedure. Strater argued that this metric can point toward effective countermeasures. Several studies have used the number of PSF occurrences in incidents and accidents to identify the PSFs with the greatest impact (Sasangohar & Cummings, 2010). For example, Kyriakidis et al. (2015) identified the PSFs that occurred most frequently in serious accidents, accidents, and incidents in railway operations and proposed a "lite" version of the PSF set comprising 12 factors that account for more than 90% of the total occurrences analyzed. Factors in this lite version include safety culture, fatigue (shift pattern), training (experience), expectation (familiarity), and distraction. Kyriakidis et al.
(2015) also examined the contribution of the individual PSFs in the lite version to different types of railway accidents through multinomial regression analysis and identified "safety culture" as the prime contributor to accidents and serious accidents and "distraction" as the prime contributor to incidents. When data on incidents and accidents in a specific domain are lacking, the objective measures above cannot be applied, and subjective measures must be used instead. Psychometric scaling techniques and paired-comparison techniques have been used for this purpose, including ordinal scales (Sasangohar & Cummings, 2010) and interval scales (usually Likert-type scales) (Liao & Chang, 2011; Liu & Li, 2016a, 2016b). For instance, Liu et al. (2017) suggested a risk-based approach to identify the key PSFs by ranking their risks to operator performance: they elicited NPP operators' perceptions of the frequency and impact of PSF indicators in NPP control rooms, quantified the perceived risk score of each PSF by multiplying its perceived frequency and impact, and classified PSF risk levels accordingly. AHP has been used to evaluate the relative importance of organizational factors (Davoudian, Wu, & Apostolakis, 1994) and of CPCs in CREAM (Marseguerra, Zio, & Librizzi, 2007). Unlike AHP, ANP can handle dependencies between PSFs. In an air traffic control context, De Ambroggi and Trucco (2011) used ANP and observed that traffic and airspace, as well as training and experience, were the two factors with the greatest weights. The fuzzy cognitive maps method has also been used to assess the relative importance of PSFs for human reliability (Bertolini, 2007). Second, certain PSF-based methods (e.g., SPAR-H) require measuring the level or status of a PSF in order to modify or weight the nominal HEP of a task. Some PSFs, such as "the required time to successfully complete the task," can be measured directly and objectively, whereas others, such as "fitness for duty," can only be measured indirectly through other measures and PSFs, such as fatigue measures (Boring et al., 2007).
DESIGN FOR HEALTH, SAFETY, AND COMFORT
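The two importance measures described above, the objective share of events involving a PSF and the risk score of Liu et al. (2017) (perceived frequency multiplied by perceived impact), reduce to simple arithmetic. The sketch below assumes events have already been coded as sets of PSF labels and that operator ratings are averaged per PSF; the event data, the 1-to-5 ratings, and the risk-level cut-offs are all hypothetical.

```python
from collections import Counter


def importance_ratios(events: list[set[str]]) -> dict[str, float]:
    """Objective measure: share of coded events in which each PSF was involved."""
    counts = Counter(psf for event in events for psf in event)
    return {psf: n / len(events) for psf, n in counts.items()}


def risk_scores(ratings: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Subjective measure: perceived frequency x perceived impact per PSF."""
    return {psf: freq * impact for psf, (freq, impact) in ratings.items()}


def risk_level(score: float, medium: float = 6, high: float = 12) -> str:
    """Classify a risk score; the cut-offs are hypothetical."""
    if score >= high:
        return "high"
    return "medium" if score >= medium else "low"


# Hypothetical coded events: the set of PSFs involved in each incident/accident.
events = [{"fatigue", "distraction"}, {"distraction"},
          {"safety culture", "fatigue"}, {"distraction", "procedure"}]
ratios = importance_ratios(events)
print(sorted(ratios.items(), key=lambda kv: -kv[1]))

# Hypothetical mean operator ratings on 1-to-5 scales: (frequency, impact).
scores = risk_scores({"fatigue": (4, 3), "procedure": (2, 5), "noise": (3, 1)})
print({psf: (s, risk_level(s)) for psf, s in scores.items()})
```

Because each event is a set, a PSF is counted at most once per event, so the ratio reads directly as "fraction of events involving this PSF."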
No consensus on how to measure PSFs has been reached yet, as time-related PSFs illustrate. The required time to successfully complete the task is usually considered a measure of time availability or temporal load. In a working environment, the required time varies widely among operators or crews. Different estimates of the required time for a group of crews/operators have been proposed: the mean response time (Park & Jung, 2007), the median response time (Hannaman et al., 1984), and the time required by a better-than-average operator (Whaley et al., 2011). Unlike most other PSFs, time is objective, so researchers might be expected to reach a consensus on it easily; nevertheless, even for this objective PSF, no consensus has been reached. The lack of agreement on measuring PSFs makes it very difficult to share empirical knowledge on the influences of PSFs on human performance across sectors, and it leaves the level, magnitude, or rating of PSFs heavily dependent on HRA analysts' judgments. In the International HRA Empirical Study (Bye et al., 2010), the level, magnitude, or rating of important PSFs was inconsistent across HRA analyst teams, which produced observable variations in the HEP predictions for the same human failure events, both across methods and across analyst teams using the same HRA method (Liao et al., 2019c). The validity and reliability of PSF measures and selections should be further addressed.
6.3.3 Effects and Multiplier Design of PSFs
The predictive ability of HRA for HEP depends on our empirical knowledge of the effects of individual PSFs on HEP. The multiplier design (or weight design), which concerns the degree of PSFs' influence on HEP, is used to modify the nominal HEP to obtain the final HEP. A fundamental issue troubling many PSF-based HRA methods
is the lack of a validated PSF multiplier design. Most multiplier designs are heavily based on expert judgment, and the multiplier design for individual PSFs is inconsistent across PSF-based HRA methods. For instance, at the most negative level of the available-time PSF, SPAR-H (Gertman et al., 2005) and Petro-HRA (Bye et al., 2017) set the HEP to 1.0, whereas CREAM (Hollnagel, 1998) applies a multiplier of 5. The International HRA Empirical Study (Bye et al., 2010) highlighted the importance of improving the PSF multiplier design, and determining the multiplier design of individual PSFs is a heavily researched area. The ideal approach to determine the magnitude of a PSF's influence is to manipulate the level of this PSF in full-scope simulators and examine the resulting change in HEP. However, this approach is very costly and complex, and it usually does not allow HRA researchers to manipulate the level of an individual PSF. A compromise is the so-called "relative effect" method, in which a PSF is classified into several levels (usually a nominal level, a negative level, and a more negative level). HEP0 is calculated for cases in which this PSF is assessed as nominal (ideally with the other PSFs also nominal, although in practice usually regardless of how the other PSFs are assigned), and HEP1 for cases in which this PSF is assessed at the negative level. The multiplier for the negative level of this PSF is then HEP1 divided by HEP0. Based on this approach and HEP data from full-scope, high-fidelity simulators or operating experience, several studies have determined the influence of PSFs (e.g., Kubicek, 2009). Besides full-scope simulators, cognitive experiments in laboratories or microworld simulators are another important source for determining the effects of certain PSFs (e.g., interface, time availability) on HEP (Kim et al., 2015; Liu & Li, 2014; Xing et al., 2015).
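The relative-effect calculation, and the way a multiplier is then applied to a nominal HEP, can be sketched as follows. This is an illustration under stated assumptions: the counts are hypothetical, the function names are illustrative, and the cap at 1.0 for fewer than three negative PSFs is an assumption of this sketch. The adjustment applied when three or more PSFs are negative follows the formula given in the SPAR-H documentation (Gertman et al., 2005), HEP = NHEP × PSF_composite / [NHEP × (PSF_composite − 1) + 1].

```python
from math import prod


def observed_hep(errors: int, opportunities: int) -> float:
    """HEP observed in a data set: errors / opportunities."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return errors / opportunities


def relative_effect(hep_level: float, hep_nominal: float) -> float:
    """Relative-effect multiplier of a PSF level: HEP1 / HEP0."""
    if hep_nominal <= 0:
        raise ValueError("nominal HEP must be positive")
    return hep_level / hep_nominal


def final_hep(nominal_hep: float, multipliers: list[float]) -> float:
    """Multiplicative adjustment of a nominal HEP.

    With three or more negative PSFs (multiplier > 1), apply the SPAR-H
    adjustment factor, which keeps the result below 1; otherwise cap at 1.0
    (the cap is an assumption of this sketch).
    """
    composite = prod(multipliers)
    if sum(m > 1 for m in multipliers) >= 3:
        return nominal_hep * composite / (nominal_hep * (composite - 1) + 1)
    return min(nominal_hep * composite, 1.0)


# Hypothetical simulator counts for one PSF at its nominal and negative levels.
hep0 = observed_hep(4, 200)    # PSF assessed as nominal
hep1 = observed_hep(12, 150)   # PSF assessed at the negative level
m_negative = relative_effect(hep1, hep0)
print(m_negative)                        # multiplier for the negative level
print(final_hep(0.001, [10, 2, 5]))      # three negative PSFs, so adjusted
```

The adjustment matters only when the composite multiplier is large relative to 1/NHEP; for a single moderate multiplier the result is simply the nominal HEP times that multiplier.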
Several HRA methods, such as HEART (Williams, 1988), rely on data from controlled experiments in laboratories or microworld simulators. Based on HEP data from controlled experiments in the human factors literature, Bell and Williams (2018; Williams & Bell, 2015) evaluated the appropriateness of the multiplier design in HEART and modified it according to the new data. Expert judgment also plays a critical role in the PSF multiplier design of certain HRA methods, such as SPAR-H (Gertman et al., 2005) and Petro-HRA (Bye et al., 2017). Boring and Blackman (2007) described the PSF multiplier design in SPAR-H, which was based on data available within HRA, particularly the data provided in THERP, and stated that "[w]here error rates were not readily derived from THERP and other sources, expert judgment was used to extrapolate appropriate values" (p. 177). Notably, the origins of THERP's database have never been published (Kirwan, 2008), and its primary author, Alan Swain, emphasized that the data tables in THERP were not written in stone (Boring, 2012). Liu, Qiu, et al. (2020) applied two expert judgment techniques, absolute probability judgment and ratio magnitude estimation, to update the PSF multiplier design in SPAR-H. They recruited licensed NPP operators to use these two scales, found that the two scales yielded comparable multipliers, compared these multipliers with those from empirical studies in the human performance literature, and finally suggested a multiplier design of PSFs for SPAR-H based on these heterogeneous data sources.
6.3.4 The Influencing Mechanism of PSFs on HEP
We should understand how individual PSFs influence HEP through a psychological or information-processing mechanism that triggers a human failure event. Compared with the degree of PSFs' influence on HEP, the causal mechanism linking PSFs to HEP has been largely ignored (Ekanem, Mosleh, & Shen, 2016).
Without a clear, convincing basis for error probabilities, it is difficult to offer a theoretical foundation for quantitatively predicting HEPs in work environments with varying PSFs. The human error and human factors literature can provide empirical knowledge about why a PSF triggers a human failure event (e.g., Reason, 1990). For instance, Liu, Qiu, et al. (2020) reviewed the literature to discuss the potential error mechanisms triggered by several PSFs in SPAR-H, bridging PSFs, human errors, and cognition to strengthen the psychological basis of SPAR-H. Knowledge of the mechanisms by which PSFs trigger human failure events will form the cognitive basis for HRA.
6.3.5 Inter-Relationships among PSFs
The many PSFs defined in the above taxonomies and others suggest that these PSFs may be inter-related. Two kinds of inter-relationships among PSFs should be clarified: mediating effects and moderating effects. A mediating effect means that a PSF influences HEP indirectly through another PSF, implying a causal relationship between these PSFs. Mediating effects thus reflect dependency, overlap, or non-orthogonality between PSFs. Existing HRA methods include dozens of PSFs, from thirty (Liu et al., 2016) to sixty (Groth & Mosleh, 2012), and the number of PSFs in HRA tends to increase, so these PSFs are highly likely to overlap. A moderating effect means that the influence of a PSF on HEP depends on the level of another PSF (called the "moderator" in statistics), which is equivalent to an interaction effect between the two PSFs. Moderating effects of PSFs have attracted less attention than mediating effects. The side effects of PSF dependence are obvious; for instance, PSF dependence results in double-counting of their effects when quantifying HEP (Groth & Mosleh, 2012).
Thus, their inter-relationships should be considered when accounting for their influences on HEP. Several studies (e.g., Groth & Swiler, 2013) have argued that the Bayesian network (BN) is a promising technique for revealing PSFs' inter-relationships because it can describe causal relationships between PSFs. However, a BN by itself cannot detect the causal relationships among PSFs: its network structure (i.e., the causal model) is usually based on the judgments of several analysts and is thus vulnerable to subjectivity. PSFs' inter-relationships can be examined in controlled experiments (Hallbert et al., 2004). For instance, Liu and Li (2014) examined three inter-relationships between multiple PSFs: (1) the moderating role of time availability in the relationship between task complexity and HEP; (2) the moderating role of experience/training in the relationship between task complexity and HEP; and (3) the mediating role of time pressure in the relationship between time availability and HEP. They found that task complexity significantly and positively influenced HEP, although this effect depended on time availability and experience, and that time availability affected HEP through subjective time pressure. From a statistical perspective, controlled experiments are appropriate for examining PSFs' inter-relationships. However, because controlled experiments are difficult and costly to run, only a limited number of PSFs (usually two or three) can be examined at a time, so conventional methods such as controlled experiments cannot capture the causal relationships among a large number of PSFs. Methods from other fields have been used instead, one of which is Interpretive Structural Modeling (ISM; Warfield, 1973). ISM uses experts' practical experience and knowledge to decompose a complicated system into several sub-systems (or elements) and to develop a multilevel hierarchical structure model.
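The core computation commonly used in ISM can be sketched as follows, assuming the experts' pairwise judgments have already been coded as a binary "element i influences element j" matrix. The final reachability matrix is taken as the reflexive-transitive closure (computed here with Warshall's algorithm), and elements are then peeled off level by level: an element belongs to the current level when its reachability set is contained in its antecedent set. The factors and links in the example are hypothetical.

```python
def reachability_matrix(adjacency: list[list[int]]) -> list[list[int]]:
    """Final reachability matrix: reflexive-transitive closure (Warshall)."""
    n = len(adjacency)
    reach = [[1 if i == j or adjacency[i][j] else 0 for j in range(n)]
             for i in range(n)]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach


def level_partition(reach: list[list[int]]) -> list[list[int]]:
    """Iteratively extract ISM levels; level 1 is the top of the hierarchy."""
    remaining = set(range(len(reach)))
    levels = []
    while remaining:
        level = []
        for i in sorted(remaining):
            reachable = {j for j in remaining if reach[i][j]}
            antecedents = {j for j in remaining if reach[j][i]}
            # Element i sits at this level when R(i) is a subset of A(i).
            if reachable <= antecedents:
                level.append(i)
        if not level:
            raise ValueError("matrix is not a transitive reachability matrix")
        levels.append(level)
        remaining -= set(level)
    return levels


# Hypothetical factors and expert-judged direct influences (i -> j).
factors = ["safety culture", "training", "time availability", "stress", "HEP"]
adj = [[0] * len(factors) for _ in factors]
for a, b in [(0, 1), (1, 3), (2, 3), (3, 4)]:
    adj[a][b] = 1

levels = level_partition(reachability_matrix(adj))
for depth, level in enumerate(levels, start=1):
    print(depth, [factors[i] for i in level])
```

In this toy example the performance measure (HEP) ends up at the top level, stress is a direct influencing factor, training and time availability act through stress, and safety culture sits at the bottom as a root factor.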
The direct influential factors, indirect influential factors, and root factors of a performance measure can be isolated from all of the factors, which helps to clarify the dependence among factors and provides a clear structure for multiple influential factors. That is, ISM can be used to construct a structural model and illustrate the inter-relationships (both direct and indirect effects) between its elements. Its procedure is not discussed here. Several studies (Shakerian et al., 2019; Wang et al., 2018) have demonstrated the application of ISM in HRA, and its results can be input into the BN technique. Once the inter-relationships of PSFs have been identified, do they increase the quantitative predictive performance of HRA methods? The answer depends on how we treat PSFs:
• From a statistical perspective, whether or not we distinguish the different types of a PSF's effect (indirect and direct) on HEP does not affect our analysis of the PSF's total contribution (i.e., total effect) to HEP; that is, considering potential mediating effects does not increase the explained variance of HEP.
• In practice, however, mediating effects lead to double-counting some PSFs. For example, assume PSF2 fully mediates the relationship between PSF1 and HEP, and the effects of PSF1 and PSF2 on HEP are obtained separately. If we assess the joint effect of PSF1 and PSF2 on HEP by adding their single effects, we will double-count the effect of PSF1 on HEP.
• From a pragmatic perspective, considering the moderating effects between pairs of PSFs adds limited value in increasing the explained variance of HEP (which will be analyzed in Section 6.3.5).
To conclude, to increase the predictive ability of PSF-based HRA methods, we need to understand the dependence among PSFs and construct a PSF taxonomy consisting of orthogonal PSFs. The inter-relationships between PSFs in NPPs and other sectors are far from clear and remain an important research area.
6.3.6 Combined Effect of PSFs
Usually, a human failure event (HFE) is due not to the influence of a single PSF but to the combined influences of multiple PSFs. Swain and Guttmann (1983) suggested that task performance is more reliable when there is a good match between external and internal PSFs than when there is a mismatch, implying that PSFs influence human performance jointly. Coming to the joint effect of PSFs, a fundamental question in HRA is: What is the functional form of the combined effect of multiple PSFs on human unreliability? The combined effect of multiple PSFs is usually treated either as the product of the effects of single PSFs on HEP (a multiplicative model) or as the sum of the effects of single PSFs (an additive model). (In addition, experts may estimate the probability of an HFE or a failure mode for a given set of PSFs without explicitly modeling how these PSFs combine; this is beyond our discussion.) Both models are used in HRA methods. As introduced in Section 6.2, HEART, NARA, CARA, CREAM, and SPAR-H adopt the multiplicative model, which can be traced back to THERP (Swain & Guttmann, 1983). The multiplicative model is preferred in HRA partly because its supporters are concerned that the additive model produces less conservative estimates. By contrast, in several HRA methods (e.g., SLIM and HDT), HEP is a linear, additive function of the weights and quality of PSFs. Two studies (Liu & Liu, 2020; Xing et al., 2015) extracted data on the combined effect (Me), multiplicative effect (Mm), and additive effect (Ma) of multiple PSFs on HEP from the human performance literature, investigated the relationships and differences among Me, Mm, and Ma, and found that Ma, rather than Mm, is closer to Me. Liu and Liu (2020) considered the cases of two, three, and four PSFs. Figure 32 shows their result for three PSFs. In this case, Me was lower than Mm (sign test: Z = 5.38, p < .001); Me did not differ significantly from Ma (Z = 1.42, p = .156); and Mm exceeded Ma (Z = 7.68, p < .001). Thus, the additive model, not the multiplicative model, matched the empirical data, and the multiplicative model tended to produce pessimistic estimates. This result is in line with empirical evidence from other human performance fields (e.g., Lamb & Kwok, 2016; Van Iddekinge et al., 2018) that the additive model outperforms the multiplicative model in describing the combined effects of multiple PSFs on task or job performance.
Figure 32 Kendall–Theil robust line between the empirical combined effect (Me) of three PSFs and (a) their multiplicative effect (Mm) / (b) additive effect (Ma). [Panel (a): Me versus Mm, fitted line Me = 3.19 + 0.12Mm, Kendall's rk = .598***, reference line Me = Mm. Panel (b): Me versus Ma, fitted line Me = 0.65 + 0.70Ma, Kendall's rk = .622***, reference line Me = Ma.] ***p < .001. (Source: Liu and Liu, 2020. © 2020 Taylor & Francis.)
6.4 The HRA Database
The availability of human performance data is problematic in HRA. The lack of appropriate and sufficient HEP data is a key factor affecting the quality of HRA results. Numerous HEP databases have been established over the years, but only a few have survived, and certain HEP databases are still being developed. Ideally, HEP data should come from relevant operating experience or similar industrial experience, as these sources can provide data with high ecological validity.
However, many factors make collecting HEP data from these event and operational reports challenging. Well-established and widely accepted human performance analysis methods, upon which human performance data collection could be based, are lacking. Many operators are unwilling to report errors. The error probability in pre-specified human failure events is usually very low, so event reports provide little data for HRA. Operator errors in a task or scenario can be documented, but operators' successes in the task or scenario are usually not recorded. It is also difficult to know whether particular PSFs were present when an operator error occurred, compared with normal operations. As a result, little HRA data has come from event reports and operating experience. However, in some cases the number of times that periodically performed tasks were carried out and the number of operator errors in these tasks are recorded in event reports or can be inferred from on-site investigations, so their HEPs can be estimated. Preischl and Hellmich (2013, 2016) developed a statistical method to estimate HEPs from such counts collected from German licensee event reports and generated and tabulated 74 HEP estimates for a wide variety of tasks. PSFs relevant to operator errors in these tasks were derived from these reports or identified by retrospective investigations (e.g., plant visits and interviews). Similarly, Park et al. (2018) estimated the number of task opportunities for specific periodically conducted tasks from investigation reports and calculated the nominal HEPs of 15 tasks. In addition, the Human Event Repository and Analysis (HERA) database, sponsored by the US NRC, was generated from event investigations at NPPs (Hallbert et al., 2006) to make empirical and experimental human performance data suitable for HRA available to HRA and human factors practitioners.
HERA was expected to provide a basis for selecting empirical data sources (e.g., operations and event reports) and experimental data sources (e.g., full-scope simulators) of human performance, to provide a formal method for decomposing events into a series of sub-events related to plant systems or plant personnel, and to analyze the significant PSFs. About 10 events were analyzed with the HERA system. However, it ceased after a few years of operation, for several reasons. It had a relatively low data generation rate and relatively high operating costs. Post-event inspection reports usually do not contain sufficient detail to analyze operator performance. Finally, there was no clear path for using HERA qualitative data (e.g., event analyses and PSFs) to generate HEP estimates; to generate useful HRA data, it must be possible to infer both the number of errors in a task and the number of opportunities for task errors. Despite the difficulties in extracting HRA data from event reports or operating experience, HRA data from these sources will play an important role in developing HRA databases and improving HRA quality.
Simulator data (e.g., NPP operator performance in full-scope simulators) are a good HRA data source because NPP operators must routinely perform simulator training and examinations to maintain their licenses and qualifications. Although they do not come from actual events, simulator data still have high ecological validity. Thus, several HRA databases are based entirely on full-scope simulator data, including the Scenario Authoring, Characterization, and Debriefing Application (SACADA) (Chang et al., 2014) and the Human Reliability data EXtraction (HuREX) databases (Jung et al., 2020). Full-scope simulators seem to be the most promising source for building an HRA database, and it is important to build long-term, sustainable programs to collect data.
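Whether the counts come from event reports or from simulator records, the basic quantity is the number of errors divided by the number of task opportunities. The sketch below pools hypothetical records per task type and reports both the point estimate k/n and, as one common way of handling the very low (often zero) error counts noted above, the posterior mean under a Jeffreys Beta(0.5, 0.5) prior, (k + 0.5)/(n + 1). The Jeffreys prior is an assumption of this sketch and is not necessarily the statistical method used by Preischl and Hellmich (2013, 2016); the task names and counts are invented for illustration.

```python
from collections import defaultdict


def pooled_hep(records):
    """Pool (task, errors, opportunities) records per task type.

    Returns task -> (point estimate k/n, Jeffreys posterior mean (k+0.5)/(n+1)).
    """
    totals = defaultdict(lambda: [0, 0])  # task -> [errors, opportunities]
    for task, errors, opportunities in records:
        if opportunities <= 0 or not 0 <= errors <= opportunities:
            raise ValueError(f"invalid counts for {task!r}")
        totals[task][0] += errors
        totals[task][1] += opportunities
    return {task: (k / n, (k + 0.5) / (n + 1)) for task, (k, n) in totals.items()}


# Hypothetical counts inferred from reports of periodically performed tasks.
records = [("valve line-up", 1, 400), ("valve line-up", 0, 350),
           ("instrument calibration", 2, 120), ("pump start", 0, 500)]
for task, (point, jeffreys) in pooled_hep(records).items():
    print(f"{task}: k/n = {point:.2e}, Jeffreys mean = {jeffreys:.2e}")
```

The zero-error "pump start" task shows why a prior is useful: the point estimate is exactly 0, which is rarely a credible HEP, whereas the Jeffreys mean stays small but positive.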
The US NRC funded work on the SACADA database and established an agreement with the South Texas Project Nuclear Operating Company (STPNOC) in the USA to support SACADA development (Chang et al., 2014). SACADA is implemented in the NPP operator training program to collect operator performance data during retraining in full-scope NPP simulators. The resulting software aims to support STPNOC's operator training program (e.g., authoring simulation scenarios, facilitating post-simulation debriefing, expediting crew performance communication, and exporting information for statistical analysis of crew performance) and to support the US NRC in improving HRA quality. This promising database was started in 2011 and has received continuous data input. The elements of SACADA are as follows. Its data structure is consistent with the simulator scenarios used for operator training. A simulator training scenario usually consists of a few plant malfunctions, which are actuated by instructors to evaluate operator performance in response to them. Training objective elements (TOEs), pre-specified for each malfunction and usually based on procedure instructions, are used to evaluate operator performance and serve as the basic units of the SACADA taxonomy. Each TOE contains two data segments. The first segment is predictive information, which specifies the TOE's task context and characterizes the challenges in performing it. Each TOE's context characterization is based on the macro-cognitive functions relevant to human performance (e.g., monitoring/detecting, understanding, deciding, manipulation, and communication and coordination). Each macro-cognitive function has a few factors that characterize its performance context, and each factor has a few discrete states. The second segment of a TOE is retrospective information, including three performance classification levels (satisfactory, satisfactory but with performance deficiency, and unsatisfactory). For each performance deficiency (the latter two performance levels), additional information is collected, including the macro-cognitive functions involved, error modes, error causes, error recovery, effect on the scenario, and remediation.
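The two-segment TOE record described above maps naturally onto a small data structure, and pairing the predictive context with the retrospective classification is what makes HEP-style rates computable. The sketch below uses a hypothetical schema, not SACADA's actual data format: the class and field names, the factor name "cue salience", and its states are illustrative assumptions, and whether a "satisfactory but with performance deficiency" rating counts as a failure is left as a parameter because it is a modeling choice.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    SAT = "satisfactory"
    SAT_DEFICIENT = "satisfactory but with performance deficiency"
    UNSAT = "unsatisfactory"


@dataclass(frozen=True)
class TOERecord:
    """One TOE performance record (hypothetical schema, not SACADA's own)."""
    toe_id: str
    # Predictive segment: (macro-cognitive function, factor, discrete state).
    context: tuple[tuple[str, str, str], ...]
    # Retrospective segment: performance classification by the evaluators.
    rating: Rating


def failure_rate_by_state(records, function, factor, failures=(Rating.UNSAT,)):
    """Share of records rated as a failure, per state of one context factor."""
    tally = defaultdict(lambda: [0, 0])  # state -> [failures, records]
    for rec in records:
        for fn, fac, state in rec.context:
            if (fn, fac) == (function, factor):
                tally[state][1] += 1
                tally[state][0] += int(rec.rating in failures)
    return {state: k / n for state, (k, n) in tally.items()}


F, C = "monitoring/detecting", "cue salience"  # hypothetical factor name
records = [
    TOERecord("T1", ((F, C, "prominent"),), Rating.SAT),
    TOERecord("T2", ((F, C, "prominent"),), Rating.SAT),
    TOERecord("T3", ((F, C, "subtle"),), Rating.UNSAT),
    TOERecord("T4", ((F, C, "subtle"),), Rating.SAT_DEFICIENT),
]
print(failure_rate_by_state(records, F, C))
print(failure_rate_by_state(records, F, C,
                            failures=(Rating.UNSAT, Rating.SAT_DEFICIENT)))
```

With real data, each rate would rest on many more TOE records, and sparse context states would need the kind of small-count treatment discussed for event-report data.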
Two kinds of information in the SACADA taxonomy are important for HEP quantification: the TOEs' context (pre-defined by instructors) and the overall performance classification (determined by instructors and shift managers). To date, the US NRC has made a portion of the SACADA database available to the public (see https://www.nrc.gov/data/). The HuREX database, developed by the Korea Atomic Energy Research Institute, is used to collect and analyze data from NPP simulators to generate HRA data, including HEPs and the associations between PSFs and HEPs (Jung et al., 2020). It provides a structured process, with methods, a set of forms, and guidance, for collecting and analyzing data obtained from NPP simulators. Its basic unit of analysis is the unsafe act (UA) as defined by Reason (1990). The UA information indicates whether a human behavior is a UA, which procedural task the behavior relates to, and which type of cognitive activity is associated with it. The four cognitive activities considered are information gathering and reporting, situation interpreting, response planning and instruction, and execution. UA types are classified into EOO and EOC (EOC modes in manipulation tasks are further classified into wrong device, wrong direction, and wrong quantity). The HuREX process has four phases: preparation, data collection, data analyses, and data reporting (see Jung et al., 2020, for more information). Unlike SACADA, which relies on instructors and shift managers to collect data, HuREX draws its raw data from video records, simulator logs, and even interviews, and both qualitative and quantitative data analyses are performed. At present, 37 HEPs have been quantified for 21 generic task types. In addition, Kirwan, Basra, et al. (1997) built the COREDATA (Computerized Operator Reliability and Error) database, which contains HRA data extracted from simulator experiments, event reports, expert judgments, direct observations, etc.
As the basis for the development of NARA and CARA (see Section 6.2.3), this database contains some high-quality data from domains such as NPPs, the oil, gas, and chemical industries, manufacturing, and aviation. It contains 370 HEPs, of which roughly 110 are related to NPPs. It is being released under the auspices of the Human Reliability Analysis Society. This database has not been updated.

6.5 HRA Validation and Comparison

Given the significant differences in scope, approach, data, and underlying models among HRA methods, their validity and reliability need to be compared (Boring et al., 2010). In a qualitative comparison, methods are compared against a range of criteria relevant to reliability and validity, usually based on expert judgment. In a quantitative comparison (i.e., validation or benchmarking), the HEP predictions of a method are compared with actual data or with the predictions of another method.

6.5.1 Qualitative Comparison

Such subjective comparisons can provide insights into the strengths and weaknesses of HRA methods and inform the selection of HRA methods in practical applications. Kirwan (1992) qualitatively assessed HRA methods against a set of criteria: comprehensiveness, accuracy, consistency, theoretical validity, usefulness, resource use, and auditability/acceptability. In the Human Reliability Analysis Methods: Selection Guidance for NASA report (Chandler et al., 2006), a list of 17 attributes was used to compare more than ten HRA methods. Certain attributes are specific to aerospace (e.g., Suitability for NASA Use), whereas most are broadly relevant to HRA (e.g., PSF list and causal model, resource requirements, task dependencies and recovery). Based on these attributes and input from HRA experts, four methods (THERP, CREAM, NARA, and SPAR-H) were recommended for NASA’s HRA needs. Expert judgments on the relative advantages of one method over another can be elicited and aggregated using a formal method. For instance, Petruni et al.
(2019) applied AHP to elicit and compare expert judgments on HRA methods in order to select a suitable HRA method for the automotive industry. Their experts were safety specialists working in the automotive industry and human factors researchers. The selection criteria were suitability, economic, usability, and utility, each with several sub-criteria. They found that Hazard and Operability Analysis and CREAM were preferred over THERP and SPAR-H. In addition, Adhikari et al. (2009), Bell and Holroyd (2009), and Spurgin (2010) systematically evaluated the advantages and criticisms of each HRA method they considered, which can help practitioners compare and select an HRA method for practical use.

6.5.2 Kirwan’s Quantitative Validation

Kirwan, Kennedy, et al. (1997) validated three HRA methods used in the UK nuclear industry: THERP, HEART, and Justification of Human Error Data Information (JHEDI). Their validation study used 30 real HEPs (ranging from 1.0 to 1E-5) from CORE-DATA (Kirwan, Basra, et al., 1997). A between-subjects design was employed in which 10 HRA analysts were engaged for each of the three methods. Analysts were selected on the premise that they had appropriate experience in using the assigned method. They were given scenario information (e.g., a general description of the scenario, relevant PSF information, and a simple task analysis) and asked to use the method independently to model and quantify human errors within two days. Their estimates were compared with the HEPs from CORE-DATA. Twenty-three of the 30 analysts showed a significant
correlation between their estimates and the empirical HEPs. In terms of overall precision, 72% of all estimates were within a factor of 10 of the empirical HEPs and 38% were within a factor of three. When the estimates were imprecise, they tended to be pessimistic rather than optimistic. Kirwan et al.’s quantitative benchmark offers important insights into the reasons for inconsistencies (i.e., poor reliability) across and within methods. For instance, they found variability in analysts’ selection of the generic error probability in HEART and in task modeling and decomposition in THERP and JHEDI, as well as difficulty in consistently modeling EOCs in HEART and JHEDI, slips in HEART, and cognitive tasks in THERP.

6.5.3 International and US HRA Empirical Studies

As the first major efforts in recent years to benchmark HRA methods, the International HRA Empirical Study (hereafter “the International Study”) and the US HRA Empirical Study (hereafter “the US Study”) aimed to develop an empirically based understanding of the performance, strengths, and weaknesses of HRA methods by comparing HRA predictions against actual operator performance in simulated accident scenarios on NPP simulators. Organizations from ten countries participated in the studies, representing industry, utilities, regulators, and the research community. In the International Study, the predictions of 13 HRA methods were benchmarked against observed crew performance in the Halden Reactor Project’s HAlden huMan-Machine LABoratory (HAMMLAB) research simulator in Halden, Norway. Because 12 of the 13 methods (e.g., THERP, SPAR-H, CREAM, HEART) were each applied by only one HRA team, the study could not identify how inconsistent application of a given method by different teams contributed to the variability of HRA predictions.
Thus, the US Study was conducted on a US NPP full-scope simulator, where four HRA methods (e.g., SPAR-H) were each applied by at least two HRA teams, enabling intra-method comparisons. Both studies used full-scope simulators to run multiple accident scenarios with multiple licensed operating crews, with multiple HRA methods applied to predict crew performance in each scenario. In the International Study, four scenarios were simulated by 14 NPP crews in a simulator with a computerized human–machine interface at HAMMLAB. In the US Study, three scenarios were simulated in a simulator with a conventional human–machine interface. HRA teams were unable to observe the simulator exercises in the International Study but were able to do so in the US Study. The assessment criteria used in these two validation studies included qualitative predictive power (e.g., whether the methods provided the capability to identify important PSFs and whether HRA teams used these methods to perform adequate qualitative analysis), quantitative predictive power, adequacy of method guidance, insights for error reduction, and inter-analyst reliability. Although quantitative HRA results are essential to PRA, the qualitative criterion was prioritized over the quantitative criterion, not only because the small number of scenarios and crews may lead to uncertainty in the empirical HEPs, but also because the qualitative analysis serves as the basis for the quantitative analysis. Two scenarios, steam generator tube rupture (SGTR) and loss of feedwater (LOFW), each with a base and a complex version, were designed in the International Study. Several human failure events (HFEs, the unit of assessment) were defined in the two scenarios. Three scenarios were also designed in the US Study. Further details of scenario design and simulation are not discussed here (see Liao et al., 2019a). In the International Study, the crew failure rates and the difficulty ranking of the HFEs specified in the accident scenarios were obtained. In easy HFEs, no failures occurred; in difficult HFEs, the majority of the crews failed. As an example, Figure 33 plots the HEPs predicted by the HRA methods for the SGTR scenarios against the 5th and 95th percentile Bayesian bounds of the empirical HEPs, with the HEPs predicted by CREAM highlighted as a solid line.

[Figure 33: predicted HEPs (means, log scale from 1E-4 to 1E+0) for HFEs 5B1, 1B, 3B, 3A, 1A, 2A, 2B, 5B2, and 4A of the SGTR scenarios, ordered by decreasing difficulty per simulator data.]
Figure 33 HEPs by HFE difficulty with Bayesian Uncertainty Bounds of the empirical HEPs in the International HRA Empirical Study. Dashed lines are the two uncertainty bounds. The solid line represents the predictions by CREAM. Circles represent the predicted HEPs. (Source: Bye et al., 2010. © 2010 OECD Halden Reactor Project.)

Detailed comparisons and assessments of each HRA method are given in Bye et al. (2010) and Lois et al. (2009). The major findings of the International Study were as follows (Bye et al., 2010; Liao et al., 2019b). The estimated HEPs showed a decreasing trend, consistent with the difficulty ranking of the HFEs. Each HFE had one or more outlying predictions (outside the Bayesian uncertainty bounds). Significant disagreement was found among the HEPs predicted by the HRA methods. This variability was present for both easy and difficult HFEs and was not correlated across HFEs. None of the methods was systematically more conservative or more optimistic than the others. The HEP predictions of some methods could not differentiate easy HFEs from difficult ones; that is, the range of these predictions across the set of HFEs was rather narrow, indicating insufficient discriminating power. Some HRA teams applying methods that rely heavily on PSFs to estimate HEPs did not explicitly consider failure mechanisms. The methods based on the identification of failure mechanisms produced richer qualitative analyses, which, however, did not lead to greater quantitative predictive power. In the US Study, intra-method comparisons revealed significant variability across two or three HRA teams for one of the four methods. These comparisons identified a number of method-driven and team-driven factors, and their interactions, that contributed to the variability in predictive results. As summarized by Liao et al.
(2019b), factors contributing to differences in qualitative predictions included: (1) the approach, scope, and depth of the qualitative analysis (e.g., sufficient guidance for qualitative analysis was lacking for some methods, and a consistent approach or guidance for task analysis was lacking for most HRA methods); (2) estimation of the time required (e.g., variability in HRA teams’ judgments of the time required to perform actions, and timing analyses lacking traceability); and (3) difficulty in understanding and treating complexity (e.g., lack of guidance on dealing with complexity in complex scenarios). Factors contributing to differences in quantitative predictions included: (1) judgments about crediting recovery (e.g., HRA teams were left to decide whether to consider error recovery, and their preference for conservative results affected this decision); (2) attempts to compensate for an inadequate range of PSFs (e.g., some teams considered specific PSFs not addressed by their method, whereas other teams did not); (3) differences in choosing PSFs and their levels (e.g., overlapping definitions of the PSFs involved and inadequate guidance on determining their levels); (4) inadequate means of relating detailed qualitative analysis to quantitative analysis; and (5) compensation for poor treatment of diagnosis (e.g., the lack of guidance addressing the full scope of cognitive activities resulted in inadequate or inconsistent consideration of operators’ cognitive activities). Strictly speaking, none of the HRA methods was validated, owing not only to method-related factors but also to analyst-related factors. These two landmark benchmark exercises reveal the inaccuracy of HRA predictions of human errors in complex systems and warn of the risks of applying these methods to evaluate the contribution of human elements to system reliability. However, the small sample sizes in these two studies mean that their findings are far from conclusive. Our focus should therefore be on the lessons learned from these two studies to further improve HRA. Liao et al. (2019c) summarized several important lessons:
• Comprehensively considering cognitive activities, including initial diagnosis (understanding the plant situation and deciding on an appropriate response) and decision making related to executing the response plan, is necessary. The challenges in cognitive activities during response execution are underestimated and not properly modeled in existing PSA/HRA.
• Thoroughly identifying failure mechanisms and contextual factors is necessary for consistent and reasonable predictions. For PSF-based HRA methods (e.g., SPAR-H), considering the possible failure mechanisms or causes might provide a rationale for identifying important PSFs and the degree of their effects.
• Guidance on choosing the right PSFs and judging their influence should be improved.
• Because addressing a wide range of PSFs in a richer qualitative analysis did not always produce reasonable HEP predictions, one should consider whether a method using a key subset of PSFs, a corresponding qualitative analysis, and a dovetailing quantitative process can provide reasonable predictions. Parsimony need not weaken the validity of HRA.
• Because expert judgments were heavily used in almost all steps of all HRA methods, developing detailed, structured guidance and tools to support these judgments is essential for reducing variability.
• The treatment of dependence between adjacent human failure events should be substantially improved for all methods. Considering dependence usually led to pessimistic predictions. To treat dependence better, it is more important to understand the dynamic nature of HFEs (e.g., plant status evolution, information flow, and corresponding procedural guidance) than static factors (e.g., same crew, same procedure, or same location). Dependence is further examined in Section 6.6.6.
• Traceability in the quantification (e.g., the choice of nominal HEPs and PSF multipliers) and in the translation of the qualitative analysis (e.g., identification of important PSFs) into HEPs is important for ensuring the reproducibility of HEP predictions.
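The two quantitative yardsticks used in Sections 6.5.2 and 6.5.3 can be sketched in a few lines: the share of predictions within a factor of k of the empirical HEPs (as in Kirwan, Kennedy, et al., 1997), and a check of each prediction against the 5th and 95th percentile Bayesian bounds of an empirical HEP estimated from crew failures. This is an illustrative sketch only: the Jeffreys Beta(0.5, 0.5) prior, the Monte Carlo quantile estimate, and all numbers are assumptions, since the prior used by Bye et al. (2010) is not stated here.

```python
import random
from typing import Sequence, Tuple


def fraction_within_factor(predicted: Sequence[float], empirical: Sequence[float],
                           k: float) -> float:
    """Share of predictions within a factor of k (either direction) of the data.
    Assumes strictly positive HEPs."""
    hits = sum(1 for p, e in zip(predicted, empirical)
               if max(p / e, e / p) <= k * (1 + 1e-9))  # tolerance for float ratios
    return hits / len(predicted)


def pessimistic_share(predicted: Sequence[float], empirical: Sequence[float],
                      k: float) -> float:
    """Among estimates outside a factor of k, the share that overestimate the HEP."""
    misses = [p > e for p, e in zip(predicted, empirical)
              if max(p / e, e / p) > k * (1 + 1e-9)]
    return sum(misses) / len(misses) if misses else 0.0


def bayesian_bounds(failures: int, crews: int, n_draws: int = 100_000,
                    seed: int = 0) -> Tuple[float, float]:
    """5th/95th percentiles of the posterior HEP, assuming a binomial likelihood
    and a Jeffreys Beta(0.5, 0.5) prior; quantiles estimated by Monte Carlo."""
    rng = random.Random(seed)
    alpha, beta = failures + 0.5, crews - failures + 0.5
    draws = sorted(rng.betavariate(alpha, beta) for _ in range(n_draws))
    return draws[int(0.05 * n_draws)], draws[int(0.95 * n_draws)]


def is_outlier(predicted_hep: float, bounds: Tuple[float, float]) -> bool:
    """True if a predicted HEP falls outside the empirical uncertainty bounds."""
    lower, upper = bounds
    return predicted_hep < lower or predicted_hep > upper


# Hypothetical predictions vs. empirical HEPs
pred = [0.1, 0.05, 1e-4]
emp = [0.01, 0.02, 1e-2]
print(fraction_within_factor(pred, emp, 10), fraction_within_factor(pred, emp, 3))

# A hypothetical easy HFE: no failures among 14 crews
bounds = bayesian_bounds(failures=0, crews=14)
print(bounds, is_outlier(0.3, bounds))
```

Comparing ratios rather than differences treats a tenfold overestimate of 1E-4 the same as one of 1E-1, which matches the log-scale spirit of both benchmarks.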
6.6 Other Remarks

6.6.1 HRA for Digital Systems

As a trend in safety-critical complex systems, digitalization changes the way humans interact with systems. For instance, analog MCRs in NPPs, which are characterized by paper-based procedures, hard-wired indicators, and analog controls, have been upgraded to or replaced by digital MCRs, which are characterized by computer-based procedures, digital HSIs, and soft controls (Liu & Li, 2016b). Digital systems are expected to offer many benefits over analog systems in NPPs, for example, improved system performance in terms of accuracy and computational capabilities, higher data handling and storage capacities, greater ease of use, and more flexibility (National Research Council, 1997). However, digital systems may create different, and sometimes more, opportunities for human error, because digitalization and automation may produce the following side effects relevant to human error and human reliability: new mode errors (Sarter & Woods, 1995), new knowledge demands in understanding the interactions between coupled system elements (Sarter et al., 1997), secondary interface management complexity (O’Hara et al., 2002), and new coordination demands between crew members (Salo et al., 2006). Empirical studies on the influence of digitalization on operators in NPP MCRs are limited. Liu and Li (2016a, 2016b) compared complexity factors in conventional and digital NPP MCRs and found that operators in digital NPP MCRs experience higher
complexity and workload, implying potential side effects of digitalization. HSI type did not significantly influence error rates in identification tasks (Massaiu & Fernandes, 2017). However, based on HRA data from event investigation reports, Ham and Park (2020) reported HEPs ten times higher for tasks with digital HSIs than for tasks with analog ones. Therefore, in light of current empirical knowledge, it cannot be concluded that “either analog or digitalized HSI of the control room would always be better than the other” (Porthin et al., 2020). The effect of digital HSIs depends heavily on the specific design and its technological realization. This poses a challenge for HRA methods, which were developed in the era of analog systems: in digital MCRs, they should appropriately capture the specific design factors supporting or impeding operator performance in order to give realistic and consistent HEP assessments. As stated in the NUREG-0700 guidelines (O’Hara & Fleger, 2020), certain considerations should be addressed, including whether the original HRA assumptions are still valid in digital systems, whether the human errors analyzed in existing HRA are still relevant, whether the probability of operator errors may change, whether new errors not modeled by existing HRA may be introduced, and whether the consequences of errors modeled in existing HRA may change. That is, all main elements of HRA may be affected by the transition from analog to digital systems. The suitability of existing HRA methods for digital systems is often questioned (Hickling & Bowie, 2013; Zou et al., 2017). Limited effort has been devoted to improving the suitability of HRA for digital systems, mainly focusing on collecting HEP data for tasks and determining the effects of PSFs in these systems. For instance, such data (HEP data and PSF multiplier data) are being collected through full-scope simulators of digital MCRs (Chang et al., 2014; A.R. Kim et al., 2018; Y.
Kim et al., 2018), partial-scope and microscale simulators (Y. Kim et al., 2015; Liu & Li, 2014), cognitive experiment literature (Williams & Bell, 2015; Xing et al., 2015), and expert judgments (Liu et al., 2020). PSFs in digital control rooms should be redefined, and the extent of their effect on HEP should be reassessed (Porthin et al., 2020). Only a few HRA methods have been developed for digital HSIs, including EMpirical data-Based crew Reliability Assessment and Cognitive Error analysis (EMBRACE; Kim et al., 2019) and the work of Zhang and his colleagues on human reliability in digital NPPs in China (e.g., Zhang, 2019; Zou et al., 2017).

6.6.2 Expert Judgments

Although expert judgments have long been criticized (Mosleh et al., 1988), they remain critical for HRA. Without them, it would be difficult or even impossible to conduct a PRA/HRA (Spurgin, 2010).

HRA highly depends on the use of expert judgments, whether it is to identify relevant human interactions; provide a lower, nominal, or upper bound estimation of a human failure probability; identify contextual factors such as common performance conditions that could influence performance in a given scenario; generate important weights and quality ratings for those factors; resolve the effects of dependencies among human activities and factors defining work contexts; or provide guidance on how to extrapolate human error data to new contexts. (Sharit, 2012, p. 782)

The International HRA Empirical Study and the US Empirical Study provided strong evidence that all HRA methods continue to involve significant expert judgments, and that the quality of the results can depend to a great extent on decisions about what to include in
the analysis and how to include it, and decisions about the level or expected impact of PSFs. In some cases, analyst expertise was used to extend the methods and improve their overall performance. (Liao et al., 2019c, p. 4)

Several HRA methods rely strongly on expert judgments to generate HEP data. One rationale for expert judgment is the “wisdom of crowds”: under the right circumstances, groups of people can be remarkably intelligent and make accurate assessments and predictions. Another is necessity: when objective data are unavailable, expert judgment is the only way to generate HEP data. In the 1980s, the US NRC published several guidelines supporting the application of expert judgment methods for estimating HEPs, including paired comparison, ranking/rating, absolute probability judgment (APJ; also called direct numerical estimation), and ratio magnitude estimation (RME; also called indirect numerical estimation) (Comer et al., 1984; Seaver & Stillwell, 1983). For instance, in the work sponsored by the US NRC, APJ relies on experts to estimate HEPs directly on a logarithmic scale from 1.0 to 10⁻⁵, whereas RME requires experts to make ratio judgments of the relative likelihood of human failures in pairs of tasks based on their knowledge and experience. Compared with other techniques, APJ and RME have relatively higher empirical support and lower data processing requirements, but lower acceptability to experts (Seaver & Stillwell, 1983). Further, the US NRC funded work on another HRA method, A Technique for Human Event Analysis (ATHEANA) (Forester et al., 2004), whose quantification relies largely on expert judgment. Its quantification is a two-step process: (1) identification of contextual information (i.e., PSFs and plant conditions); and (2) translation of the contextual information into probability distributions for HFEs.
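Whichever elicitation technique is used, several experts’ APJ estimates or RME ratio judgments must eventually be turned into a single HEP. The sketch below assumes two common conventions that the text does not specify: individual APJ estimates are aggregated by their geometric mean (the arithmetic mean on the logarithmic scale the experts judge on), and RME ratios are converted to an HEP by scaling a reference task whose HEP is treated as known. Function names and numbers are hypothetical.

```python
import math
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean, i.e., the arithmetic mean on the log scale."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def aggregate_apj(estimates: Sequence[float]) -> float:
    """Combine experts' direct HEP estimates made on the 1.0 to 1e-5 scale."""
    if any(not 1e-5 <= e <= 1.0 for e in estimates):
        raise ValueError("APJ estimates are expected to lie between 1e-5 and 1.0")
    return geometric_mean(estimates)


def rme_to_hep(ratios: Sequence[float], reference_hep: float) -> float:
    """Convert judgments of the form 'task HEP is r times the reference HEP'
    into an HEP, using a reference (anchor) task with a known HEP."""
    return min(geometric_mean(ratios) * reference_hep, 1.0)  # cap at probability 1


# Three experts judge the same task directly ...
print(aggregate_apj([1e-2, 1e-3, 1e-4]))           # approximately 1e-3
# ... and two experts judge it 2x and 8x as error-prone as an anchor task with HEP 1e-3
print(rme_to_hep([2.0, 8.0], reference_hep=1e-3))  # approximately 4e-3
```

The geometric mean keeps one expert’s order-of-magnitude outlier from dominating the result, which an arithmetic mean of probabilities would not.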
In the more recent IDHEAS (Xing et al., 2017), a formal expert elicitation process was adopted to estimate HEP distributions (including the 5th, 50th, and 95th percentiles) for different decision tree paths; this process is not discussed here. Expert judgment also plays a critical role in determining the relative importance of a set of PSFs, e.g., in SLIM (Embrey et al., 1984), and in designing PSF multipliers, e.g., in SPAR-H (Gertman et al., 2005) and Petro-HRA (Bye et al., 2017). Liu, Qiu, et al. (2020) applied absolute probability judgment and ratio magnitude estimation to update the PSF multiplier design in SPAR-H, based on judgments from licensed NPP operators. Expert judgments have also been used for dependence assessment, that is, to assess how an operator’s failure to perform one task influences the HEP of the subsequent task (Podofillini et al., 2010). Several issues should be noted to increase the validity and reliability of expert judgments. First, regarding the selection of experts, there are two types of experts (Spurgin, 2010): knowledge experts and domain experts. The former are expected to understand the whole field of HRA in terms of methods and models (e.g., HRA practitioners and researchers); the latter are expected to have sufficient exposure to and understanding of the circumstances of interest (e.g., NPP operators and instructors). Spurgin (2010) stressed that “[i]t is not acceptable for an analyst, who is a knowledge expert but not a domain expert, to guess (estimate) HEPs” (p. 80). Kirwan (2008) argued that involving domain experts can maximize expertise input, but he also cautioned about potential biases in judgment processes. Domain experts with long operational experience are preferred; for example, Kirwan (2008) recommended a minimum of 10 years of operational experience and that the expert still be an active operator. That length of experience should allow domain experts to be fully aware of what they are judging (Spurgin, 2010).
For instance, when asked to judge the extent of a PSF’s effect on HEP, domain experts have a better understanding of how this PSF affects human errors.
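SLIM, mentioned above, turns expert-elicited PSF weights and ratings into an HEP. A minimal sketch of its usual formulation follows: a success likelihood index (SLI) is the weighted sum of PSF ratings (weights normalized to sum to 1), and log10(HEP) is taken to be linear in the SLI, with the two coefficients calibrated from two anchor tasks of known HEP. The PSF names, the 1 to 9 rating scale (higher meaning more favorable conditions), and the anchor values are hypothetical.

```python
import math
from typing import Dict, Tuple


def success_likelihood_index(weights: Dict[str, float],
                             ratings: Dict[str, float]) -> float:
    """Weighted sum of PSF ratings, with weights normalized to sum to 1."""
    total = sum(weights.values())
    return sum(w / total * ratings[psf] for psf, w in weights.items())


def calibrate(anchor_a: Tuple[float, float],
              anchor_b: Tuple[float, float]) -> Tuple[float, float]:
    """Solve log10(HEP) = a * SLI + b from two (SLI, HEP) anchor tasks."""
    (sli_1, hep_1), (sli_2, hep_2) = anchor_a, anchor_b
    if sli_1 == sli_2:
        raise ValueError("anchor tasks must have different SLIs")
    a = (math.log10(hep_1) - math.log10(hep_2)) / (sli_1 - sli_2)
    b = math.log10(hep_1) - a * sli_1
    return a, b


def slim_hep(sli: float, a: float, b: float) -> float:
    """Convert an SLI to an HEP with the calibrated log-linear relation."""
    return 10 ** (a * sli + b)


# Hypothetical elicitation: relative importance weights and 1-9 ratings
weights = {"time available": 2.0, "training": 1.0, "procedures": 1.0}
ratings = {"time available": 3.0, "training": 9.0, "procedures": 5.0}
sli = success_likelihood_index(weights, ratings)   # 5.0
a, b = calibrate((9.0, 1e-4), (1.0, 1e-1))         # hypothetical anchor tasks
print(sli, slim_hep(sli, a, b))                    # HEP about 3.2e-3
```

Both the weights and the ratings come from experts, which is why the selection of domain experts discussed above bears directly on the resulting HEP.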
Second, to facilitate expert judgment and reduce biases, a formal expert judgment process (see the IDHEAS report) and supporting tools should be used. Because people, including well-trained experts, may be biased in interpreting very small probabilities (Kirwan, 2008), a logarithmic scale with numerical probabilities (e.g., 1 in 10 and 0.01) and corresponding verbal probabilities (e.g., “infrequently fails”), or a qualitative scale (e.g., a condition producing a “moderate impact” might mean that it causes personnel to fail with a probability of 0.01), can be used for direct estimation of HEP (e.g., Forester et al., 2004; Pandya et al., 2020; Spurgin, 2010). An open issue is that the correspondence between the numerical and verbal probabilities and the qualitative scale has yet to be validated. Third, to obtain more acceptable predictions, combining different data sources (e.g., knowledge and domain expert judgments, different expert judgment techniques or different experts’ judgments, cognitive experimental studies, and full-scope simulator studies) is necessary and is expected to improve the empirical base of HRA (Mkrtchyan, Podofillini, & Dang, 2016). A formal process for aggregating these data sources should be developed to inform HRA.

6.6.3 Time-Related PSFs

Limited research effort in HRA has been devoted to building quantitative relationships between individual PSFs and HEP. Time-related PSFs are one exception. Time is a special PSF: it can be used as a criterion to determine the success or failure of a task. A speed–accuracy trade-off (SAT) (Fitts, 1966) exists such that people can trade speed for accuracy or vice versa; that is, speed and accuracy are negatively related. In HRA, the SAT is described by the time–reliability curve, which is the core assumption of the HCR model (Hannaman et al., 1984). As reviewed in Section 6.2.2, abandoning the HCR model has been suggested.
However, several researchers have recently highlighted the importance of time-related PSFs and attempted to build other models describing the relationship between these PSFs and HEP. Researchers developing IDHEAS-G proposed explicitly considering the contribution of insufficient time to HEP (Xing et al., 2020). The HEP of an HFE is quantified in two parts, the error probability attributed to insufficient time ($P_t$) and the error probability attributed to failure of the macro-cognitive functions ($P_c$), as follows:

$$\mathrm{HEP} = 1 - (1 - P_t)(1 - P_c) = P_t + (1 - P_t)P_c$$

$P_t$ accounts for feasibility, that is, whether there is adequate time for the required action. It does not account for human errors caused by time pressure: even when operators have adequate time, they may still be under time pressure, so that they try to complete the action as fast as possible or the time pressure disrupts their HIP processes. In this case, the IDHEAS-G researchers suggested modeling time pressure as a PSF in assessing $P_c$. A convolution method was suggested to calculate $P_t$, in which this error probability is obtained by convolving the probability density functions of the required time and the available time, that is, as the cumulative probability that the required time exceeds the available time. However, it is extremely challenging, if not practically impossible, to obtain the probability density functions of the required and available times of HFEs in NPPs and other complex, safety-critical systems. Thus, a different approach is needed to relate time factors to HEP. The difference between available time and required time is the time margin. Some researchers believe that HEP is closely related to the time margin and that HEP can be predicted
by using the time margin. An inverse S-shaped relationship might exist between HEP and the time margin. Meister (1976) proposed that human performance remains unchanged as task demand increases, then declines as task demand continues to increase, and finally remains at a minimum level even with further increases in task demand. The time margin can be regarded as an indicator of task demand: a smaller time margin means a higher temporal demand. This inverse S-shaped relationship can be described by a logistic model (Xing, 2015):

$$\mathrm{HEP} = \frac{1}{1 + k \times \exp(a \times \text{time margin})}$$
where k and a are two parameters estimated through expert judgment or data fitting. Liu and Li (2020) investigated and compared two models (logistic and linear) for explaining HEP based on the time margin. The time margin was operationalized as the difference between the time available to complete a task and the time required to complete it successfully, divided by the required time. In explaining HEP data from a microworld simulator study (Liu & Li, 2014), both models exhibited acceptable and equivalent explanatory power. In explaining HEP data from a full-scope simulator study (Bye et al., 2010), the logistic model explained more variance in HEP, although both models exhibited acceptable explanatory ability. Thus, Liu and Li (2020) indicated the potential of the logistic model to explain and predict HEP based on the time margin in time-critical tasks. Using the time margin or other time-related PSFs to explain and predict HEP does not mean ignoring other PSFs. Other PSFs still influence the time required to complete a task, so their contribution to HEP is still partly reflected in such a model. Where time is not critical, however, this model loses its predictive ability. More studies are needed to model the quantitative relationship between HEP and individual PSFs.

6.6.4 Organizational Factors

Man-made disasters, such as the 2011 Fukushima nuclear disaster (Alvarenga & Frutuoso-e-Melo, 2015) and the 2015 Tianjin Port explosion (Liu & Wang, 2019), repeatedly remind us that organizational problems are usually at the root of disasters (Gould, 2020; Paté-Cornell & Murphy, 1996). Thus, organizational factors should be included in risk assessments. As suggested by Rasmussen (1997), risk management in sociotechnical systems should span multiple levels, from legislators, managers, and work planners to frontline operators. Many studies have addressed this topic, within and beyond HRA, along different lines of research.
In the HRA realm, organizational factors are usually regarded as important PSFs in second-generation methods, used to modify nominal HEPs or to predict HEPs. The HRA community places great emphasis on individual and team problems and does not appropriately or sufficiently treat organizational problems (Dallat et al., 2019). Owing to a lack of relevant empirical data, HRA methods cannot accurately quantify the influence of organizational PSFs on HEP. In practice, this gap rarely troubles HRA practitioners, because they usually ignore organizational PSFs or treat them as nominal. Several methods have been proposed to explicitly incorporate human and organizational factors into PSA and quantitative risk assessment, including, to name a few, the Model of Accident Causation using Hierarchical Influence NEtwork (MACHINE; Embrey, 1992), the Work Process Analysis Model (WPAM; Davoudian et al., 1994), System-Action-Management (SAM; Paté-Cornell & Murphy, 1996), and Socio-Technical Risk Analysis (SoTeRiA; Mohaghegh et al., 2009). These methods
share certain commonalities: a set of organizational factors; a conceptual model linking hierarchical organizational factors to human and/or equipment reliability (usually a linear cause-and-effect model); and a set of quantification techniques (e.g., fault and event trees and Bayesian networks). For instance, the WPAM considers two levels of organizational factors: the top level, represented by the overall culture and its constituents (e.g., organizational culture, safety culture, and time urgency), and the second level, represented by organizational factors related to decision making, communication, administrative knowledge, and human resource allocation. Organizational factors influence personnel and/or equipment performance through their influence on the quality and efficiency of a given work process. SoTeRiA (Mohaghegh et al., 2009) offers a very complex theoretical framework for the causal relationships among multiple levels of organizational factors. An example causal chain (where “→” denotes a cause–effect link) is: industrial and business environment → organizational culture (including safety culture) → organizational structure and practices → organizational climate → group climate → individual PSFs (including psychological safety climate) → “unit process model” (representing the performance of units such as operation and maintenance units). These methods have been criticized on several grounds: most of them are conceptual discussions or models and are thus unable to make quantitative predictions; they usually rely on expert judgments in their quantification process and lack scientific data to validate their results; and they presume linear cause-and-effect models to describe the complex relations among humans, organizations, and technology (Dallat et al., 2019; Leveson, 2012).
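To make the idea of quantifying a hierarchical chain concrete, the sketch below propagates probabilities through a three-node chain (safety culture → a PSF → human error) by exact enumeration, the simplest form of Bayesian-network inference, and also computes the diagnostic posterior of poor culture given an observed error. It is not the WPAM or SoTeRiA model: the node names, states, and all probabilities are hypothetical, and the strictly linear chain deliberately mirrors the cause-and-effect structure that critics of these methods point to.

```python
from itertools import product
from typing import Dict

# Hypothetical conditional probability tables for culture -> PSF -> error
P_CULTURE = {"good": 0.8, "poor": 0.2}
P_PSF_GIVEN_CULTURE = {
    "good": {"nominal": 0.9, "degraded": 0.1},
    "poor": {"nominal": 0.5, "degraded": 0.5},
}
P_ERROR_GIVEN_PSF = {"nominal": 1e-3, "degraded": 1e-2}


def joint_error(culture: str, psf: str,
                p_culture: Dict[str, float],
                p_psf: Dict[str, Dict[str, float]],
                p_error: Dict[str, float]) -> float:
    """P(culture, psf, error) from the chain factorization."""
    return p_culture[culture] * p_psf[culture][psf] * p_error[psf]


def marginal_hep(p_culture: Dict[str, float],
                 p_psf: Dict[str, Dict[str, float]],
                 p_error: Dict[str, float]) -> float:
    """P(error): sum the joint over all culture and PSF states."""
    return sum(joint_error(c, s, p_culture, p_psf, p_error)
               for c, s in product(p_culture, p_error))


def culture_given_error(p_culture: Dict[str, float],
                        p_psf: Dict[str, Dict[str, float]],
                        p_error: Dict[str, float]) -> Dict[str, float]:
    """Diagnostic query P(culture | error) via Bayes' rule."""
    hep = marginal_hep(p_culture, p_psf, p_error)
    return {c: sum(joint_error(c, s, p_culture, p_psf, p_error) for s in p_error) / hep
            for c in p_culture}


print(marginal_hep(P_CULTURE, P_PSF_GIVEN_CULTURE, P_ERROR_GIVEN_PSF))  # about 2.6e-3
print(culture_given_error(P_CULTURE, P_PSF_GIVEN_CULTURE, P_ERROR_GIVEN_PSF))
```

Changing `P_CULTURE` and re-running `marginal_hep` shows how such models trace an organizational intervention through to an HEP; the weak point, as the criticisms above note, is grounding the conditional tables in data rather than judgment.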
Another line of research draws on system theory, rather than reliability theory, to capture the non-linear dynamics of interactions among system components and the performance variability in the system. Examples are the System-Theoretic Accident Model and Processes (STAMP; Leveson, 2012) and the Functional Resonance Analysis Method (FRAM; Hollnagel, 2012). These methods do not aim to generate a numerical probability of system failure. Their proponents argue that probabilistic analyses do not represent the risks and are misleading (Dallat, Salmon, & Goode, 2019). Instead, these methods aim to reveal the underlying organizational mechanisms of system accidents. They also have significant weaknesses: they are still not widely applied in practice, they are costly, they place high demands on analysts, and their results are not well validated (Dallat et al., 2019). Sociologists also use organizational theory to conceptualize and describe how organizations can contribute to accidents and disasters. Examples include High Reliability Organization theory (LaPorte & Consolini, 1991) and Normal Accident Theory (Perrow, 1984), which are beyond the scope of this discussion. Further debate on how to treat organizational factors in risk and safety research can be found in two recent reviews (Gould, 2020; Pence & Mohaghegh, 2020), and the debate is far from resolved. As Gould (2020) summarized, “what to model if organizational factors are to be included in risk assessments remains as big a question as how to model.”
6.6.5 Qualitative Analysis
HRA aims to provide quantitative HEP predictions as inputs to PSA and other risk assessment practices. HRA is therefore usually regarded as a method for quantifying HEPs, even though most HRA steps are
HUMAN ERRORS AND HUMAN RELIABILITY
actually qualitative. For instance, qualitative analysis includes problem definition, task and scenario analysis, and human error identification (Kirwan, 2008). Qualitative analysis is the starting point and premise of quantitative analysis. However, it has received insufficient attention in HRA (Laumann, 2020). Two large-scale validation studies identified qualitative analysis as a weak point in HRA and confirmed its importance (Bye et al., 2010; Liao et al., 2019c). The researchers observed significant differences in the approach, scope, and depth of the qualitative analyses. None of the HRA methods provides sufficient guidance or an explicit framework for conducting a structured and consistent qualitative analysis. Some HRA methods (e.g., SPAR-H) are by nature quantification methods. Their analysts need only select the relevant PSFs, their levels, and the corresponding multipliers. Other HRA methods (e.g., IDHEAS) support a richer qualitative analysis. This can help analysts understand the context and dynamics of complex scenarios and uncover scenario-specific PSFs. However, a richer qualitative analysis does not necessarily lead to appropriate predictions, because qualitative findings are difficult to translate into quantitative estimates, especially in challenging scenarios. In other words, the interface between qualitative and quantitative analysis is poor. Inadequate guidance for selecting the key PSFs and assigning their ratings can cause this interface to break down, which in turn leads to variability in the quantitative results. In summary, these two validation studies highlighted the importance of qualitative analysis.
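The following sketch shows how little qualitative input a quantification-oriented method of the SPAR-H type needs once the PSF levels are chosen. It assumes the SPAR-H nominal HEPs of 0.01 for diagnosis and 0.001 for action. It also assumes the SPAR-H adjustment formula, which applies when three or more PSFs are rated negatively (Gertman et al., 2005). The function name `spar_h_hep`, the 1.0 cap on the simple product, and the example multiplier values are illustrative assumptions. They are not entries from the SPAR-H worksheets.

```python
"""SPAR-H-style quantification: nominal HEP x PSF multipliers (sketch)."""
from math import prod
from typing import Sequence

# Nominal HEPs assumed from SPAR-H (Gertman et al., 2005).
NOMINAL_HEP = {"diagnosis": 0.01, "action": 0.001}


def spar_h_hep(task_type: str, multipliers: Sequence[float]) -> float:
    """Return the HEP for one task given the analyst's PSF multipliers.

    multipliers: one value per PSF; 1.0 = nominal, > 1.0 = negative influence.
    """
    nhep = NOMINAL_HEP[task_type]
    composite = prod(multipliers)
    negative = sum(1 for m in multipliers if m > 1.0)
    if negative >= 3:
        # Adjustment used when three or more PSFs are negative; it keeps
        # the result below 1.0 even for large composite multipliers.
        return nhep * composite / (nhep * (composite - 1.0) + 1.0)
    # Otherwise a simple product, capped at 1.0 here because it is a probability.
    return min(1.0, nhep * composite)


if __name__ == "__main__":
    # Illustrative multipliers only.
    print(spar_h_hep("diagnosis", [2.0, 2.0, 1.0]))   # two negative PSFs
    print(spar_h_hep("diagnosis", [10.0, 2.0, 2.0]))  # three negative PSFs
```

Every judgment that matters sits in the `multipliers` list. Nothing in the calculation records why a PSF was rated as it was, which is the qualitative-quantitative gap described above.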
Structured and systematic guidance for qualitative analysis should do four things. It should cover a thorough set of scenario conditions and PSFs. It should address potential failure mechanisms, failure modes, and their associated causes. It should carefully examine the crew’s interactions with the procedures, interface, and systems. Finally, it should establish a sound qualitative-quantitative interface. Researchers in these two validation studies made important recommendations for improving qualitative analysis. For instance, they suggested enhancing the PSF rating scales by providing anchors for the ratings. This would promote consistency across analysts, both in the scope of PSFs and levels addressed and in the inputs to the quantification model. Qualitative analysis is a promising direction in HRA and deserves much more scholarly attention. Laumann (2020) summarized the quality criteria for qualitative research methods and discussed their relevance to HRA. These criteria are grouped by the steps of a research study (e.g., background of the analysis, selection and description of the sample, data collection, and ethics). Take the background of the analysis as an example. Its first criterion is “define and describe the purpose of the analysis.” Correspondingly, an HRA analysis should describe its purpose and scope (Kirwan, 1994). Its second criterion is “define and describe theoretical framework.” In HRA, this means providing and discussing different types of background information. This includes which task analysis, error identification, and HRA methods are used, and why each was chosen. Applying these qualitative research criteria is necessary to improve the quality of qualitative analysis in HRA.
6.6.6 Dependence
Dependence between HFEs is a tricky issue in HRA practice. The treatment of dependence is a legacy of Swain and Guttmann (1983) and is based on their judgment.
In THERP, a dependency model (see Section 6.2.1) with five positive dependency levels (from zero dependence to complete dependence) accounts for situations in which failure of one activity (A) can influence the error probability of the subsequent activity (B). When dependence is considered, the HEP of B is assessed as a conditional HEP (CHEP) rather than as its basic HEP (BHEP). Dependence assessment has a significant influence on the overall results, since the CHEP may be an order of magnitude or more larger than the BHEP. Dependence must therefore be considered to avoid underestimating the risk. However, it can also lead to overestimating the risk. Dependence is troubling for many HRA methods and analysts. HRA analysts are usually free to select or modify the guidelines (e.g., see THERP Table 10-1) and procedures they use to assess dependence. Spurgin (2010) suggested using a beta factor to account for dependence:
CHEP = 𝛽 × NHEP
where CHEP is the failure probability of activity B given failure of activity A, NHEP is the failure probability of activity B when it occurs alone, and 𝛽 is the dependency factor. This beta factor (𝛽) is not the same as the factor given in THERP (see Section 6.2.1). HRA analysts should determine which activities in a task sequence depend on which other activities. THERP provides no clear guidance on this point, so assigning a dependence level in THERP is essentially a direct expert judgment. To reduce the subjectivity of this process, a decision tree (DT) is usually suggested to model the influence of dependence in any particular sequence of dependent human activities (Podofillini et al., 2010; Spurgin, 2010). Constructing the DT is therefore central. It must start from an appreciation of the potential dependence-influencing factors among human activities. These factors form the columns of the DT. They can include crew (same/different), time (close in time/not close in time), location (same/different), and cues (additional cues/no additional cues) (Gertman et al., 2005; Podofillini et al., 2010). Two activities that share more of these factors have a higher level of dependence. Each factor has two or more discrete levels. The various paths through the dependence DT correspond to different combinations of dependence influences.
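The five THERP dependence levels are usually quantified with the following equations from Swain and Guttmann (1983). These give the CHEP of activity B as a function of its BHEP. The sketch below encodes them. The function name `therp_chep` is hypothetical.

```python
"""THERP conditional HEP (CHEP) for the five dependence levels.

bhep is the basic HEP of the subsequent activity B; the result is the HEP
of B given that activity A has failed.
"""


def therp_chep(bhep: float, level: str) -> float:
    if not 0.0 <= bhep <= 1.0:
        raise ValueError("bhep must be a probability in [0, 1]")
    if level == "ZD":  # zero dependence: B is unaffected by A's failure
        return bhep
    if level == "LD":  # low dependence
        return (1.0 + 19.0 * bhep) / 20.0
    if level == "MD":  # moderate dependence
        return (1.0 + 6.0 * bhep) / 7.0
    if level == "HD":  # high dependence
        return (1.0 + bhep) / 2.0
    if level == "CD":  # complete dependence: B fails whenever A fails
        return 1.0
    raise ValueError(f"unknown dependence level: {level!r}")


if __name__ == "__main__":
    bhep = 0.001
    for level in ("ZD", "LD", "MD", "HD", "CD"):
        chep = therp_chep(bhep, level)
        print(f"{level}: CHEP = {chep:.4g} ({chep / bhep:.0f} x BHEP)")
```

With a BHEP of 0.001, low dependence already raises the CHEP to about 0.051, roughly 50 times the BHEP. Moderate dependence raises it to about 0.14 and high dependence to about 0.50. This is why the choice of dependence level can dominate the overall result.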
The end branches of these paths are assigned different beta values, with higher beta values producing larger increases in the HEP of activity B. With a DT, the HRA analyst only has to judge the levels of the dependence-influencing factors, which reduces subjectivity. Different DTs exist in the HRA community (e.g., Gertman et al., 2005; Spurgin, 2010), and they can lead to significant differences in HRA results and in the evaluation of risk contributors. The beta values at the end branches, and the relationships between the levels of the dependence-influencing factors in DTs, are usually based on expert judgment. DTs are also less flexible than an analyst's direct judgment, especially in extreme situations. Different expert judgment techniques have been used to improve flexibility and transparency in the expert elicitation process. Examples are the fuzzy expert system (FES; Podofillini et al., 2010) and Dempster-Shafer evidence theory (Su et al., 2015). Their technical details are not discussed here. These methods are still largely based on expert judgment and ignore the psychological processes that may underlie dependence. Researchers in the International HRA Empirical Study and the US Empirical Study (Liao et al., 2019c) contributed new insights into dependence. Most HRA teams analyzed the conditional HFE and addressed potential dependence with the THERP dependence model. They obtained HEP predictions that were pessimistic relative to the empirical HEP data. HRA teams accounted only for positive dependence, which is common practice in HRA: the failure of a preceding task increases the HEP of the subsequent task. In some cases, however, it may be important to consider the potential for negative dependence, even when previous failures occur. All methods need significant improvement in their treatment of dependence.
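The DT approach described above can be sketched as follows. It uses the four factor columns named in the text and Spurgin's CHEP = 𝛽 × NHEP at the end branches. This is a deliberately simplified, hypothetical tree. Paths that take the same number of dependence-increasing branches end in the same branch. The beta values are placeholders, not values from Spurgin (2010), Gertman et al. (2005), or any other published DT. The CHEP is capped at 1.0 because it is a probability.

```python
"""Simplified dependence decision tree feeding CHEP = beta x NHEP (sketch).

Factor columns follow the text (crew, time, location, cues). The beta value
at each end branch is a hypothetical placeholder, not a published value.
"""
from dataclasses import dataclass

# Hypothetical beta for each end branch, indexed by how many
# dependence-increasing branches the path took (0 = none, 4 = all).
ILLUSTRATIVE_BETAS = (1.0, 2.0, 5.0, 10.0, 20.0)


@dataclass(frozen=True)
class ActivityPair:
    """Dependence-influencing factors for activity B following activity A."""
    same_crew: bool
    close_in_time: bool
    same_location: bool
    additional_cues: bool  # new cues for B tend to reduce dependence


def end_branch(pair: ActivityPair) -> int:
    """Walk the columns in order; each dependence-increasing branch moves
    the path one step toward the high-dependence end of the tree."""
    columns = (
        pair.same_crew,
        pair.close_in_time,
        pair.same_location,
        not pair.additional_cues,
    )
    return sum(1 for increases_dependence in columns if increases_dependence)


def chep(nhep: float, pair: ActivityPair) -> float:
    """Conditional HEP of B given failure of A, capped at 1.0."""
    beta = ILLUSTRATIVE_BETAS[end_branch(pair)]
    return min(1.0, beta * nhep)


if __name__ == "__main__":
    pair = ActivityPair(same_crew=True, close_in_time=True,
                        same_location=False, additional_cues=True)
    print(end_branch(pair), chep(0.01, pair))
```

All four factors here are static attributes of the activity pair. None of them captures how the plant state and information flow evolve between activities A and B.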
In particular, improving the treatment of dependence requires analysts to understand how the plant status evolves dynamically, and the information flow and procedural guidance that this evolution entails. Current practice instead emphasizes factors such as the same crew, the same procedure, or the same location, which capture more static aspects (Liao et al., 2019c). Assessing dependence remains as much an art as a science (Sharit, 2012). Empirical knowledge of dependence between HFEs is limited, and the added value of considering it is uncertain. We therefore suggest setting dependence assessment aside until sufficient theoretical and empirical knowledge of dependence between HFEs is available.
6.6.7 Recovery
Recovery is another tricky issue in HRA and another legacy of Swain and Guttmann (1983). In THERP, analysts may choose to address factors related to error detection and the potential for error recovery. HRA analysts can opt to consider error recovery or not. Error recovery has been observed in practice. In the Three Mile Island Unit 2 accident, for example, recovery resulted from the arrival of an outside person during the accident progression, although this intervention by another operator came too late. The recovery error probability depends strongly on error type and PSFs. Error types can be roughly classified into errors of diagnosis and errors of execution. Errors of diagnosis are difficult to recover from, even in the presence of contrary information (Spurgin, 2010; Woods, 1984). People tend to construct explanations that explain away counter-evidence, a tendency known as confirmation bias. If an operator has the correct intention but executes the wrong action (a slip), there is a strong possibility of recovery. In simulated NPP emergencies, Woods (1984) compared how often errors in state identification (diagnosis) and errors in execution were corrected. When errors involved misidentification of the plant state, none of the 14 errors was corrected, either through review or by an external agent. All ten execution errors were corrected, although detection sometimes took a relatively long time: four of the ten were detected almost immediately, but the other six took a long time to be detected.
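The contrast between diagnosis and execution errors can be expressed as a recovery credit. The HEP that remains after recovery is the pre-recovery HEP multiplied by the probability that the error is not recovered. The sketch below estimates that probability from the Woods (1984) counts quoted above. Raw proportions from 14 and 10 errors would give exactly 1 and 0, so the sketch uses the Jeffreys estimate instead (the posterior mean under a Beta(0.5, 0.5) prior). That choice, the function names, and the pre-recovery HEP of 0.01 are assumptions of this illustration. They are not a method prescribed in the text.

```python
"""Recovery credit by error type from the Woods (1984) counts (sketch)."""

# (errors not corrected, errors observed), as reported by Woods (1984).
WOODS_1984 = {"diagnosis": (14, 14), "execution": (0, 10)}


def nonrecovery_probability(not_corrected: int, observed: int) -> float:
    """Jeffreys estimate: posterior mean under a Beta(0.5, 0.5) prior.

    Unlike the raw proportion, it never returns exactly 0 or 1.
    """
    if not 0 <= not_corrected <= observed:
        raise ValueError("counts must satisfy 0 <= not_corrected <= observed")
    return (not_corrected + 0.5) / (observed + 1)


def hep_after_recovery(hep: float, error_type: str) -> float:
    """HEP remaining after recovery: P(error) x P(error not recovered)."""
    return hep * nonrecovery_probability(*WOODS_1984[error_type])


if __name__ == "__main__":
    hep = 0.01  # hypothetical pre-recovery HEP
    for error_type in WOODS_1984:
        print(error_type, round(hep_after_recovery(hep, error_type), 6))
```

With these counts, diagnosis errors earn almost no recovery credit: about 97% remain unrecovered. Execution errors, in contrast, are reduced to roughly 5% of their pre-recovery value. The small samples make both estimates very uncertain. Ignoring recovery altogether corresponds to setting the non-recovery probability to 1.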
Beare and Dorris (1983) also found that, in simulated NPP emergencies, most faulty executions (i.e., slips) were recovered immediately (195 of 225). The observed rate of commission errors of all types was 0.00316 before recovery and 0.00042 after recovery. The HuREX database (Jung et al., 2020) shows non-recovery percentages of almost 50%, considerably higher than the non-recovery HEPs generally used in HRA. Recovery percentages were relatively low in some tasks involving cognitive activities (e.g., parameter comparison and trend evaluation). They were high in tasks involving information verification, response planning, and instruction. Studies of surgery have reported somewhat similar findings. Pugh et al. (2020) examined resident performance in a simulated laparoscopic ventral hernia (LVH) repair on a table-top simulator and provided specific information on error recovery. Almost half of all errors (47%) went completely undetected, which is similar to the finding in the nuclear context (Jung et al., 2020). Among the 53% of errors that were detected, cognitive errors were more likely to be detected late. Execution errors (technical errors in their terminology) were more likely to be detected immediately. Nevertheless, cognitive and execution errors had a similar overall likelihood of being detected in their study. This finding differs from that in the nuclear context (Woods, 1984). The likely reason is that the extreme complexity of nuclear emergencies makes cognitive and diagnostic errors difficult to detect and correct. EOCs were more likely to be detected immediately, while EOOs were more likely to be detected late. EOCs were 93% more likely to be detected than EOOs. Of the errors that residents attempted to recover, 86.0% were recovered successfully (Pugh et al., 2020). EOCs were three times more
DESIGN FOR HEALTH, SAFETY, AND COMFORT
likely to be successfully recovered than EOOs, and execution errors were four times more likely to be successfully recovered than cognitive errors. The recovery success of detected errors is highly context-specific. In NPPs, three possible recovery paths may occur (Spurgin, 2010):

- cycling through the procedure itself;
- the presence of a shift technical advisor or safety engineer using different procedures;
- board operators commenting that the plant state is not stabilizing or improving.

The shift technical advisor or safety engineer may be an important mechanism for prompting operators to reassess their mental model of what is required to recover from a mistake. In other contexts, human–computer interface design can alert the operator to the occurrence of an error. In HRA practice, error recovery is usually not considered, so that predictions stay conservative. The nominal HEP or the PSF weights may already include the recovery effect. When error recovery is considered, full recovery is usually credited (i.e., the recovery error probability is assumed to be zero). More empirical data are needed to support HRA analysts' decisions on recovery. At present, we suggest ignoring recovery to reduce the complexity of the analysis. This does not mean that we deny the existence of error detection and recovery.
6.6.8 Ethics in Human Error and HRA Research
Ethical issues related to HRA are rarely considered explicitly, and Kirwan (2008) is an exception. Because the general HRA literature has seldom discussed this topic, we address it here, albeit briefly. HRA is a discipline that deals with “human reliability” and “human error probability.” Its purpose is not to allocate blame. The HRA literature usually argues that most system accidents and incidents are due to human errors.
However, accidents, incidents, and human errors occur in most cases because of system design and complexity, operational constraints and complexities, and organizational pressures (Kirwan, 2008). Of NPP events, 80% can be attributed to human error and the remaining 20% to equipment failure (International Atomic Energy Agency, 2013). Closer examination indicates that weaknesses in organizational processes and cultures have contributed significantly more to NPP events than individual errors and mistakes have. In fact, 70% of human errors were found to result from organizational weaknesses rather than individual weaknesses (International Atomic Energy Agency, 2013). By its nature, HRA views human error as the result of contextual and situational factors that influence human performance, not as the product of individual weaknesses. HRA analysts attempt to predict how an operator or a pilot could make serious errors that could lead to deaths and system breakdowns, so their analyses should be handled sensitively. Kirwan (2008) proposed that analysts must both convey and believe that their work supports these operators, by revealing system weaknesses and helping to overcome them. In addition, HRA quantification is uncertain and lacks validity, so it is dangerous to rely on HRA results to allocate blame and responsibility. In human error research, Reason (1997) showed that there is no simple causal relation between the frequency of individual errors and the risk of major accidents. He explained how human errors are symptoms that reveal the presence of latent conditions in organizations at large. Reason coined several related terms, such as “organizational accidents” and “latent failures” (later “latent conditions”). This call for new thinking about humans has been widely accepted in human error research. For instance, Woods et al.
(2010) raised the premise that “Erroneous actions and assessments are a symptom, not a cause” (p. 2). Rather than blaming individuals for errors in a given situation, the more important task is to understand how these people acted in that situation and to make sense of their activities (Dekker, 2012).
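The two IAEA (2013) percentages cited in this section can be combined into shares of all NPP events. The calculation below assumes that the 70% figure applies to the human errors behind the 80% of events attributed to human error. This is an interpretation, since that breakdown is not stated explicitly:

$$
\begin{aligned}
\text{share of events from organizational weaknesses} &= 0.80 \times 0.70 = 0.56,\\
\text{share of events from individual weaknesses} &= 0.80 \times 0.30 = 0.24,\\
\text{share of events from equipment failure} &= 0.20.
\end{aligned}
$$

Under that assumption, organizational weaknesses acting through human error would account for roughly 56% of all NPP events. That is more than twice the 24% attributable to individual weaknesses.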
7 CONCLUSIONS
Human error will remain a focus in the various domains where safety is a concern. New technologies will further emphasize the importance of human error research. Humans themselves remain much the same, so existing cognitive theories of human error should remain useful for explaining findings from new technological contexts for quite a long time. New theories may still be needed to explain specific findings associated with the new interaction methods of new technologies. New error mechanisms and modes will be identified, and corresponding control measures will need to be developed. New technologies that serve as effective human error control measures are also needed. On the other hand, our understanding of human error is still limited, even for situations that have been studied for a long time. Future research on human error should shift from studying the relationship between inputs and outputs to studying the underlying processes, to improve our understanding of human cognition.
ACKNOWLEDGMENTS
This work was supported by the National Natural Science Foundation of China (No. 71601139) and the National Key R&D Program of China (No. 2017YFF0208001).
REFERENCES
Adhikari, S., Bayley, C., Bedford, T., et al. (2009). Human reliability analysis: A review and critique. Manchester Business School Working Paper No. 589. Manchester: Manchester Business School. Akyuz, E., & Celik, M. (2014). Utilisation of cognitive map in modelling human error in marine accident analysis and prevention. Safety Science, 70, 19–28. Alambeigi, H., McDonald, A. D., & Tankasala, S. R. (2020). Crash themes in automated vehicles: A topic modeling analysis of the California Department of Motor Vehicles Automated Vehicle Crash Database. In Transportation Research Board 99th Annual Meeting. Washington, DC. Trueba Alonso, P., Illobre, L. F., & Ortega Pascual, F. (2015). Experiences in the application of human factors engineering to human-system interface modernization. ATW-International Journal for Nuclear Power, 60(7), 457–461. Alvarenga, M. A. B., & Frutuoso-e-Melo, P. F. (2015). Including severe accidents in the design basis of nuclear power plants: An organizational factors perspective after the Fukushima accident. Annals of Nuclear Energy, 79, 68–77. American Psychological Association. (n.d.a). Working memory. In APA dictionary of psychology. https://dictionary.apa.org/working-memory. Accessed 15 July 2020. American Psychological Association. (n.d.b). Vigilance. In APA dictionary of psychology. https://dictionary.apa.org/vigilance. Accessed 15 July 2020. American Psychological Association. (n.d.c). Sensory memory. In APA dictionary of psychology. https://dictionary.apa.org/sensory-memory. Accessed 15 July 2020. American Psychological Association. (n.d.d). Perception. In APA dictionary of psychology. https://dictionary.apa.org/perception. Accessed 15 July 2020.
American Psychological Association. (n.d.e). Attention. In APA dictionary of psychology. https://dictionary.apa.org/attention. Accessed 15 July 2020. American Psychological Association. (n.d.f). Short-term memory. In APA dictionary of psychology. https://dictionary.apa.org/short-term-memory. Accessed 15 July 2020. American Society of Mechanical Engineers. (2013). Addenda to ASME/ANS RA-S-2008 Standard for level 1/large early release frequency probabilistic risk assessment for nuclear power plant applications (ASME/ANS RA-Sb-2013). Anderson, J. R. (2014). Cognitive psychology and its implications (8th ed.). New York: Worth Publishers. Anderson, M., & Denkl, M. (2010). Heinrich’s accident triangle – Too simplistic a model for HSE management in the 21st century? In Proceedings of the SPE International Conference on Health, Safety and Environment in Oil and Gas Exploration and Production. Richardson, Texas: Society of Petroleum Engineers. Anwar, M., He, W., Ash, I., et al. (2017). Gender difference and employees’ cybersecurity behaviors. Computers in Human Behavior, 69, 437–443. Atwood, C. L. (1996). Constrained noninformative priors in risk assessment. Reliability Engineering & System Safety, 53(1), 37–46. Baddeley, A. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4(11), 417–423. Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation. New York: Academic Press. Banks, V. A., Eriksson, A., O’Donoghue, J., et al. (2018). Is partially automated driving a bad idea? Observations from an on-road study. Applied Ergonomics, 68, 138–145. Beare, A. N., & Dorris, R. E. (1983). A simulator-based study of human errors in nuclear power plant control room tasks. In Proceedings of the Human Factors Society Annual Meeting, 27(2), 170–174. Bell, J., & Holroyd, J. (2009). Review of human reliability assessment methods (Report No. RR679). London: Health and Safety Executive.
Bell, J. L., & Williams, J. C. (2018). Evaluation and consolidation of the HEART human reliability assessment principles. In R. L. Boring (Ed.), Advances in human error, reliability, resilience, and performance (pp. 3–12). Berlin: Springer. Bertolini, M. (2007). Assessment of human reliability factors: A fuzzy cognitive maps approach. International Journal of Industrial Ergonomics, 37(5), 405–413. Bhuvanesh, A., Wang, S., Khasawneh, M., et al. (2008). Applying SHERPA to analyze medication administration in the cardiac telemetry unit. In IIE Annual Conference and Exposition (pp. 1677–1682). Bligård, L. O., & Osvalder, A. L. (2014). Predictive use error analysis: Development of AEA, SHERPA and PHEA to better predict, identify and present use errors. International Journal of Industrial Ergonomics, 44(1), 153–170. Boring, R. L. (2007). Dynamic human reliability analysis: Benefits and challenges of simulating human performance. In T. Aven & J. E. Vinnem (Eds.), Proceedings of the European Safety and Reliability Conference 2007, ESREL 2007: Risk, Reliability and Societal Safety (Vol. 2, pp. 1043–1049). Boring, R. L. (2012). Fifty years of THERP and human reliability analysis. In Proceedings of the 11th International Probabilistic Safety Assessment and Management Conference (PSAM11) and the Annual European Safety and Reliability Conference (ESREL 2012) (Vol. 5, pp. 3523–3532). Boring, R. L., & Blackman, H. S. (2007). The origins of the SPAR-H method’s performance shaping factor multipliers. In Proceedings of the Joint 8th IEEE Conference on Human Factors and Power Plants and 13th Annual Workshop on Human Performance/Root Cause/Trending/Operating Experience/Self-Assessment (pp. 177–184). Boring, R. L., Griffith, C. D., & Joe, J. C. (2007). The measure of human error: Direct and indirect performance shaping factors.
In 2007 IEEE 8th Human Factors and Power Plants and HPRCT 13th Annual Meeting (pp. 170–176). Boring, R. L., Hendrickson, S. M. L., Forester, J. A., et al. (2010). Issues in benchmarking human reliability analysis methods: A literature review. Reliability Engineering & System Safety, 95(6), 591–605. Boring, R. L., Ulrich, T. A., Lew, R., Kovesdi, C. R., & Rashdan, A. A. (2019). A comparison study of operator preference and performance for analog versus digital turbine control systems in control room modernization. Nuclear Technology, 205(4), 507–523. Broadbent, D. E. (1958). Perception and communication. Oxford: Oxford University Press. Burns, K., & Bonaceto, C. (2020). An empirically benchmarked human reliability analysis of general aviation. Reliability Engineering & System Safety, 194, 106227. Bye, A., Laumann, K., Taylor, C., et al. (2017). The Petro-HRA guideline (IFE/HR/E-2017/001). Halden, Norway: Institute for Energy Technology. Bye, A., Lois, E., Dang, V. N., et al. (2010). The International HRA Empirical Study – Phase 2 report: Results from comparing HRA methods predictions to HAMMLAB simulator data on SGTR scenario (Report No. HWR-915). Halden, Norway: OECD Halden Reactor project. Calhoun, J., Savoie, C., Randolph-Gips, M., et al. (2014). Human reliability analysis in spaceflight applications, Part 2: Modified CREAM for spaceflight. Quality & Reliability Engineering International, 30(1), 3–12. Canfield, C. I., Fischhoff, B., & Davis, A. (2016). Quantifying phishing susceptibility for detection and behavior decisions. Human Factors, 58(8), 1158–1172. Cha, K., & Lee, H. (2019). A novel qEEG measure of teamwork for human error analysis: An EEG hyperscanning study. Nuclear Engineering and Technology, 51(3), 683–691. Chandler, F. T., Chang, Y. H., Mosleh, A., et al. (2006). Human reliability analysis methods selection guidance for NASA. Washington, DC: National Aeronautics and Space Administration. Chang, Y. J., Bley, D., Criscione, L., et al. (2014).
The SACADA database for human reliability and human performance. Reliability Engineering & System Safety, 125, 117–133. Chikudate, N. (2009). If human errors are assumed as crimes in a safety culture: A lifeworld analysis of a rail crash. Human Relations, 62, 1267–1287. Cho, W. C., & Ahn, T. H. (2019). A classification of electrical component failures and their human error types in South Korean NPPs during last 10 years. Nuclear Engineering and Technology, 51, 709–718. Chu, Y., & MacGregor, J. N. (2011). Human performance on insight problem solving: A review. The Journal of Problem Solving, 3(2), 6. Comer, M. K., Seaver, D. A., Stillwell, W. G., et al. (1984). Generating human reliability estimates using expert judgment volume 1. Main report (Report No. NUREG/CR-3688/1 of 2). Washington, DC: US Nuclear Regulatory Commission. CompTIA (2016). International trends in cybersecurity - Overview. https://comptiacdn.azureedge.net/webcontent/docs/defaultsource/research-reports/comptia-intnl-security-snapshotoverview.pdf?sfvrsn=d2a3ea41_2. Accessed 15 July 2020. Cooper, S. E., Ramey-Smith, A. M., Wreathall, J., et al. (1996). A Technique for Human Error Analysis (ATHEANA): Technical basis and methodology description (Report No. NUREG/CR-6350). Washington, DC: US Nuclear Regulatory Commission. Dallat, C., Salmon, P. M., & Goode, N. (2019). Risky systems versus risky people: To what extent do risk assessment methods consider the systems approach to accident causation? A review of the literature. Safety Science, 119, 266–279. Damacharla, P., Javaid, A. Y., & Devabhaktuni, V. K. (2018). Human error prediction using eye tracking to improvise team cohesion in human-machine teams. In Proceedings of the 9th International Conference on Applied Human Factors and Ergonomics, 21–25 July; Orlando, FL.
Davoudian, K., Wu, J. S., & Apostolakis, G. (1994). Incorporating organizational factors into risk assessment through the analysis of work processes. Reliability Engineering & System Safety, 45(1), 85–105. De Ambroggi, M., & Trucco, P. (2011). Modelling and assessment of dependent performance shaping factors through Analytic Network Process. Reliability Engineering & System Safety, 96(7), 849–860. Dekker, S. (2011). The criminalization of human error in aviation and healthcare: A review. Safety Science, 49, 121–127. Dekker, S. (2012). Just culture: Balancing safety and accountability. Farnham: Ashgate Publishing Ltd. Demeulenaere, X. (2020). How challenges of human reliability will hinder the deployment of semi-autonomous vehicles. Technological Forecasting and Social Change, 157, Article 120093. Deutsch, J. A., & Deutsch, D. (1963). Attention: Some theoretical considerations. Psychological Review, 70, 80–90. Dhillon, B. S. (2003). Human reliability and error in medical system. Singapore: World Scientific Publishing. Dhillon, B. S. (2018). Safety, reliability, human factors, and human error in nuclear power plants. Boca Raton, FL: CRC Press. Di Pasquale, V., Miranda, S., Iannone, R., et al. (2015). A Simulator for Human Error Probability Analysis (SHERPA). Reliability Engineering & System Safety, 139, 17–32. Dong, C. L., Zhou, Z. X., & Zhang, Q. (2018). Cubic dynamic uncertain causality graph: A new methodology for modeling and reasoning about complex faults with negative feedbacks. IEEE Transactions on Reliability, 67(3), 920–932. Dougherty, E. M. (1990). Human reliability analysis—Where shouldst thou turn? Reliability Engineering & System Safety, 29(3), 283–299. Edwards, W. (1968). Conservatism in human information processing. In B. Kleinmuntz (Ed.), Formal representations of human judgment (pp. 17–52). New York: Wiley. Ekanem, N. J., Mosleh, A., & Shen, S.-H. (2016).
Phoenix–A model-based Human Reliability Analysis methodology: Qualitative analysis procedure. Reliability Engineering & System Safety, 145, 301–315. Embrey, D. E. (1986a). SHERPA: A systematic human error reduction and prediction approach. In Proceedings of the International Topical Meeting on Advances in Human Factors in Nuclear Power Systems (pp. 184–193). Embrey, D. E. (1986b). SLIM-MAUD: A computer-based technique for human reliability assessment. International Journal of Quality & Reliability Management, 3(1), 5–12. Embrey, D. E. (1992). Incorporating management and organizational factors into probabilistic safety assessment. Reliability Engineering & System Safety, 38(1), 199–208. Embrey, D. E., Humphreys, P., Rosa, E. A., et al. (1984). An approach to assessing human error probabilities using structured expert judgment. Washington, DC: US Nuclear Regulatory Commission. Endsley, M. R. (1988). Design and evaluation for situation awareness enhancement. In Proceedings of the Human Factors Society Annual Meeting (Vol. 32, No. 2, pp. 97–101). Thousand Oaks, CA: SAGE. Endsley, M. R. (2017). Autonomous driving systems: A preliminary naturalistic study of the Tesla Model S. Journal of Cognitive Engineering and Decision Making, 11(3), 225–238. Evans, M., He, Y., Maglaras, L., et al. (2019). Evaluating information security core human error causes (IS-CHEC) technique in public sector and comparison with the private sector. International Journal of Medical Informatics, 127, 109–119. Favarò, F. M., Eurich, S. O., & Rizvi, S. S. (2019). “Human” problems in semi-autonomous vehicles: Understanding drivers’ reactions to off-nominal scenarios. International Journal of Human–Computer Interaction, 35(11), 956–971. Feggetter, A. J. (1982). A method for investigating human factor aspects of aircraft accidents and incidents. Ergonomics, 25, 1065–1075.
HUMAN ERRORS AND HUMAN RELIABILITY
Fitts, P. M. (1966). Cognitive aspects of information processing: III. Set for speed versus accuracy. Journal of Experimental Psychology, 71(6), 849–857.
Flin, R. H., O’Connor, P., & Crichton, M. (2008). Safety at the sharp end: A guide to non-technical skills. Farnham: Ashgate Publishing Ltd.
Forester, J., Bley, D., Cooper, S., et al. (2004). Expert elicitation approach for performing ATHEANA quantification. Reliability Engineering & System Safety, 83(2), 207–220.
Forester, J., Dang, V. N., Bye, A., et al. (2014). The International HRA Empirical Study: Lessons learned from comparing HRA methods predictions to HAMMLAB simulator data (Report No. NUREG-2127). Washington, DC: US Nuclear Regulatory Commission.
Galotti, K. M. (2007). Cognitive psychology: In and out of the laboratory. Belmont, CA: Wadsworth Publishing.
Gentner, D. Q., & Stevens, A. L. (Eds.). (1983). Mental models. Mahwah, NJ: Erlbaum.
Gertman, D. I., Blackman, H., Marble, J., et al. (2005). The SPAR-H Human Reliability Analysis Method (Report No. NUREG/CR-6883). Washington, DC: US Nuclear Regulatory Commission.
Ghasemi, M., Nasleseraji, J., Hoseinabadi, S., et al. (2013). Application of SHERPA to identify and prevent human errors in control units of petrochemical industry. International Journal of Occupational Safety and Ergonomics, 19(2), 203–209.
Gibson, W. H., & Kirwan, B. (2008). Application of the CARA HRA tool to air traffic management safety cases. In Proceedings of the 9th International Conference on Probabilistic Safety Assessment and Management (PSAM 9).
Gibson, W. H., Kirwan, B., Kennedy, R., et al. (2008). Nuclear Action Reliability Assessment (NARA), further development of a data-based HRA tool. In Proceedings of the International Conference on Contemporary Ergonomics (CE2008).
Gould, K. P. (2020). Organizational risk: “Muddling through” 40 years of research. Risk Analysis, 41(3), 456–465.
Graziano, A., Teixeira, A. P., & Soares, C. G. (2016). Classification of human errors in grounding and collision accidents using the TRACEr taxonomy. Safety Science, 86, 245–257.
Groth, K. M., & Mosleh, A. (2012). A data-informed PIF hierarchy for model-based human reliability analysis. Reliability Engineering & System Safety, 108, 154–174.
Groth, K. M., & Swiler, L. P. (2013). Bridging the gap between HRA research and HRA practice: A Bayesian network version of SPAR-H. Reliability Engineering & System Safety, 115, 33–42.
Hagen, E. W., & Mays, G. T. (1981). Human factors engineering in the US nuclear arena. Nuclear Safety, 22(3), 337–346.
Hallbert, B., Boring, R., Gertman, D., et al. (2006). Human Event Repository and Analysis (HERA) System, overview (Report No. NUREG/CR-6903, Vol. 1). Washington, DC: US Nuclear Regulatory Commission.
Hallbert, B., Gertman, D., Lois, E., et al. (2004). The use of empirical data sources in HRA. Reliability Engineering & System Safety, 83(2), 139–143.
Ham, D.-H., & Park, J. (2020). Use of a big data analysis technique for extracting HRA data from event investigation reports based on the Safety-II concept. Reliability Engineering & System Safety, 194, Article 106232.
Hamim, O. F., Hoque, M. S., McIlroy, R. C., et al. (2020). A sociotechnical approach to accident analysis in a low-income setting: Using Accimaps to guide road safety recommendations in Bangladesh. Safety Science, 124, Article 104589.
Hammer, W. (1972). Handbook of system and product safety. Englewood Cliffs, NJ: Prentice Hall.
Hammerton, M. (1973). A case of radical probability estimation. Journal of Experimental Psychology, 101(2), 252–254.
Hannaman, G. W., Spurgin, A. J., & Lukic, Y. (1984). Human cognitive reliability model for PRA analysis (Report No. NUS-4531). Palo Alto, CA: Electric Power Research Institute.
He, X., Wang, Y., Shen, Z., et al. (2008). A simplified CREAM prospective quantification process and its application. Reliability Engineering & System Safety, 93(2), 298–306.
Hee, D. D., Pickrell, B. D., Bea, R. G., et al. (1999). Safety Management Assessment System (SMAS): A process for identifying and evaluating human and organization factors in marine system operations with field test results. Reliability Engineering & System Safety, 65, 125–140.
Heick, T. (2019). The cognitive bias codex: A visual of 180+ cognitive biases. https://www.teachthought.com/critical-thinking/the-cognitive-bias-codex-a-visual-of-180-cognitive-biases. Accessed July 15, 2020.
Hickling, E. M., & Bowie, J. E. (2013). Applicability of human reliability assessment methods to human-computer interfaces. Cognition, Technology & Work, 15(1), 19–27.
Himeno, Y., Nakamura, T., Terunuma, S., et al. (1992). Improvement of man-machine interaction by artificial-intelligence for advanced reactors. Reliability Engineering & System Safety, 38(1–2), 135–144.
Hollnagel, E. (1998). Cognitive reliability and error analysis method. Amsterdam: Elsevier.
Hollnagel, E. (2012). FRAM: The functional resonance analysis method, modelling complex socio-technical systems. Farnham: Ashgate Publishing Ltd.
Hughes, C. M. L., Baber, C., Bienkiewicz, M., et al. (2015). The application of SHERPA (Systematic Human Error Reduction and Prediction Approach) in the development of compensatory cognitive rehabilitation strategies for stroke patients with left and right brain damage. Ergonomics, 58(1), 75–95.
Illankoon, P., & Tretten, P. (2019). Judgemental errors in aviation maintenance. Cognition, Technology & Work. https://doi.org/10.1007/s10111-019-00609-9. Accessed July 15, 2020.
Institute of Electrical and Electronics Engineers. (2017). IEEE Guide for incorporating human reliability analysis into probabilistic risk assessments for nuclear power generating stations and other nuclear facilities (IEEE Standard No. 1082-2017).
International Atomic Energy Agency. (1995). Experience with strengthening safety culture in nuclear power plants (IAEA TECDOC No. 821).
International Atomic Energy Agency. (1998). Developing safety culture in nuclear activities (IAEA Safety Report Series No. 11).
International Atomic Energy Agency. (2013). Managing human performance to improve nuclear facility operation (IAEA Nuclear Energy Series NG-T-2.7).
International Atomic Energy Agency. (2016). Attributes of full scope level 1 probabilistic safety assessment (PSA) for applications in nuclear power plants (IAEA TECDOC No. 1804).
International Civil Aviation Organization. (2013). Safety management manual (3rd ed., ICAO Doc 9859).
International Nuclear Safety Advisory Group. (2002). Key practical issues in strengthening safety culture (Report No. IAEA INSAG-15).
International Organization for Standardization. (2019). Human-centered design for interactive systems (ISO Standard No. 9241-210:2019).
Jahangiri, M., Derisi, F. Z., & Hobobi, N. (2015). Predictive human error analysis in permit to work system in a petrochemical plant. In Proceedings of the 25th European Safety and Reliability Conference (ESREL 2015) (pp. 1007–1010).
Jenkinson, J., Shaw, R., & Andow, P. (1991). Operator support systems and artificial-intelligence. Reliability Engineering & System Safety, 33(3), 419–437.
Jung, W., Park, J., Kim, Y., et al. (2020). HuREX – A framework of HRA data collection from simulators in nuclear power plants. Reliability Engineering & System Safety, 194, Article 106235.
Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80(4), 237–251.
Kahneman, D., & Tversky, A. (1984). Choices, values, and frames. American Psychologist, 39(4), 341–350.
Kanizsa, G. (1955). Margini quasi-percettivi in campi con stimolazione omogenea [Quasi-perceptual margins in homogeneously stimulated fields]. Rivista di Psicologia, 49(1), 7–30.
Kim, A. R., Kim, J. H., Jang, I., et al. (2018). A framework to estimate probability of diagnosis error in NPP advanced MCR. Annals of Nuclear Energy, 111, 31–40.
Kim, J. W., & Jung, W. (2003). A taxonomy of performance influencing factors for human reliability analysis of emergency tasks. Journal of Loss Prevention in the Process Industries, 16(6), 479–495.
Kim, Y., Kim, J., Park, J., et al. (2019). An HRA method for digital main control rooms, Part I: Estimating the failure probability of timely performance (Report No. KAERI/TR-7607/2019). Korea Atomic Energy Research Institute, South Korea.
Kim, Y., Park, J., Jung, W., et al. (2015). A statistical approach to estimating effects of performance shaping factors on human error probabilities of soft controls. Reliability Engineering & System Safety, 142, 378–387.
Kim, Y., Park, J., Jung, W., et al. (2018). Estimating the quantitative relation between PSFs and HEPs from full-scope simulator data. Reliability Engineering & System Safety, 173, 12–22.
Kirwan, B. (1992). Human error identification in human reliability assessment. Part 1: Overview of approaches. Applied Ergonomics, 23(5), 299–318.
Kirwan, B. (1994). A guide to practical human reliability assessment. London: Taylor & Francis.
Kirwan, B. (1997). Validation of human reliability assessment techniques: Part 2 – Validation results. Safety Science, 27(1), 43–75.
Kirwan, B. (2008). Human reliability assessment. In E. L. Melnick & B. S. Everitt (Eds.), Encyclopedia of quantitative risk analysis and assessment (pp. 853–873). Hoboken, NJ: Wiley.
Kirwan, B., Basra, G., & Taylor-Adams, S. E. (1997). CORE-DATA: A computerized human error database for human reliability support. In Proceedings of the 1997 IEEE Sixth Conference on Human Factors and Power Plants (pp. 9-7–9-12). IEEE.
Kirwan, B., Kennedy, R., Taylor-Adams, S. E., et al. (1997). The validation of three human reliability quantification techniques THERP, HEART and JHEDI: Part II – Results of validation exercise. Applied Ergonomics, 28(1), 17–25.
Klein, G. A. (1993). A recognition-primed decision (RPD) model of rapid decision making. In G. A. Klein, J. Orasanu, R. Calderwood, et al. (Eds.), Decision making in action: Models and methods (pp. 138–147). Norwood, NJ: Ablex Publishing.
Klein, G., Phillips, J. K., Rall, E. L., et al. (2007). A data-frame theory of sensemaking. In Proceedings of the Sixth International Conference on Naturalistic Decision Making (pp. 113–155). Mahwah, NJ: Lawrence Erlbaum Associates.
Klein, G., Ross, K. G., Moon, B. M., et al. (2003). Macrocognition. IEEE Intelligent Systems, 18(3), 81–85.
Kletz, T. (2001). An engineer’s view of human error. New York: Routledge.
Kolaczkowski, A., Forester, J., Lois, E., et al. (2005). Good practices for implementing human reliability analysis (HRA) (Report No. NUREG-1792). Washington, DC: US Nuclear Regulatory Commission.
Kontogiannis, T., & Malakis, S. (2009). A proactive approach to human error detection and identification in aviation and air traffic control. Safety Science, 47(5), 693–706.
Kraemer, S., Carayon, P., & Clem, J. (2009). Human and organizational factors in computer and information security: Pathways to vulnerabilities. Computers & Security, 28(7), 509–520.
Kubicek, J. (2009). Using human event data to validate PSF multipliers: A proof-of-concept study. In Workshop proceedings of simulator studies for HRA purposes. Budapest, Hungary: Nuclear Energy Agency.
Kyriakidis, M., Kant, V., Amir, S., et al. (2018). Understanding human performance in sociotechnical systems: Steps towards a generic framework. Safety Science, 107, 202–215.
Kyriakidis, M., Majumdar, A., & Ochieng, W. Y. (2015). Data based framework to identify the most significant performance shaping factors in railway operations. Safety Science, 78, 60–76.
Kyriakidis, M., Majumdar, A., & Ochieng, W. Y. (2018). The human performance railway operational index: A novel approach to assess human performance for railway operations. Reliability Engineering & System Safety, 170, 226–243.
Lamb, S., & Kwok, K. C. S. (2016). A longitudinal investigation of work environment stressors on the performance and wellbeing of office workers. Applied Ergonomics, 52, 104–111.
LaPorte, T. R., & Consolini, P. M. (1991). Working in practice but not in theory: Theoretical challenges of “high-reliability organizations”. Journal of Public Administration Research and Theory, 1(1), 19–48.
Laumann, K. (2020). Criteria for qualitative methods in human reliability analysis. Reliability Engineering & System Safety, 194, Article 106198.
Laumann, K., & Rasmussen, M. (2016). Suggested improvements to the definitions of Standardized Plant Analysis of Risk-Human Reliability Analysis (SPAR-H) performance shaping factors, their levels and multipliers and the nominal tasks. Reliability Engineering & System Safety, 145, 287–300.
Leveson, N. G. (2012). Engineering a safer world: Systems thinking applied to safety. Cambridge, MA: MIT Press.
Liao, H., & Chang, J. L. (2011). Human performance in control rooms of nuclear power plants: A survey study. Human Factors and Ergonomics in Manufacturing & Service Industries, 21(4), 412–428.
Liao, H., Forester, J., Dang, V. N., et al. (2019a). Assessment of HRA method predictions against operating crew performance: Part I: Study background, design and methodology. Reliability Engineering & System Safety, 191, Article 106509.
Liao, H., Forester, J., Dang, V. N., et al. (2019b). Assessment of HRA method predictions against operating crew performance: Part II: Overall simulator data, HRA method predictions, and intra-method comparisons. Reliability Engineering & System Safety, 191, Article 106510.
Liao, H., Forester, J., Dang, V. N., et al. (2019c). Assessment of HRA method predictions against operating crew performance: Part III: Conclusions and achievements. Reliability Engineering & System Safety, 191, Article 106511.
Liginlal, D., Sim, I., & Khansa, L. (2009). How significant is human error as a cause of privacy breaches? An empirical study and a framework for error management. Computers & Security, 28(3), 215–228.
Linkov, V., Zámečník, P., Havlíčková, D., et al. (2019). Human factors in the cybersecurity of autonomous vehicles: Trends in current research. Frontiers in Psychology, 10, Article 995.
Liu, P., Du, Y., Wang, L., et al. (2020). Ready to bully automated vehicles on public roads? Accident Analysis & Prevention, 137, Article 105457.
Liu, P., & Li, Z. Z. (2014). Human error data collection and comparison with predictions by SPAR-H. Risk Analysis, 34(9), 1706–1719.
Liu, P., & Li, Z. Z. (2016a). Comparison between conventional and digital nuclear power plant main control rooms: A task complexity perspective, Part I: Overall results and analysis. International Journal of Industrial Ergonomics, 51(1), 2–9.
Liu, P., & Li, Z. Z. (2016b). Comparison between conventional and digital nuclear power plant main control rooms: A task complexity perspective, Part II: Detailed analysis and results. International Journal of Industrial Ergonomics, 51(1), 10–20.
Liu, P., & Li, Z. Z. (2020). Quantitative relationship between time margin and human reliability. International Journal of Industrial Ergonomics, 78, Article 102977.
Liu, P., & Liu, J. (2020). Combined effect of multiple performance shaping factors on human reliability: Multiplicative or additive? International Journal of Human–Computer Interaction, 36(9), 828–838.
Liu, P., Lv, X., Li, Z. Z., et al. (2016). Conceptualizing performance shaping factors in main control rooms of nuclear power plants: A preliminary study. In Lecture Notes in Computer Science, vol. 9736, Proceedings of the 13th International Conference on Engineering Psychology and Cognitive Ergonomics, held as part of HCI International 2016 (pp. 322–333). Cham: Springer.
Liu, P., Lyu, X., Qiu, Y. P., et al. (2017). Identifying key performance shaping factors in digital main control rooms of nuclear power plants: A risk-based approach. Reliability Engineering & System Safety, 167, 264–275.
Liu, P., Qiu, Y. P., Hu, J. T., et al. (2020). Expert judgments for performance shaping factors’ multiplier design in human reliability analysis. Reliability Engineering & System Safety, 194, Article 106343.
Liu, P., Wang, L., & Vincent, C. (2020). Self-driving vehicles against human drivers: Equal safety is far from enough. Journal of Experimental Psychology: Applied, 26(4), 692–704.
Liu, P., & Wang, R. (2019). Public attitudes toward technological hazards after a technological disaster: Effects of the 2015 Tianjin Port explosion, Tianjin, China. Disaster Prevention and Management, 28(2), 216–227.
Liu, P., Yang, R., & Xu, Z. (2019). How safe is safe enough for self-driving vehicles? Risk Analysis, 39(2), 315–325.
Liu, R., Cheng, W., Yu, Y., et al. (2019). An impacting factors analysis of miners’ unsafe acts based on HFACS-CM and SEM. Process Safety and Environmental Protection, 122, 221–231.
Lois, E., Dang, V. N., Forester, J., et al. (2009). International HRA empirical study, Phase 1 report: Description of overall approach and pilot phase results from comparing HRA methods to simulator performance data (Report No. NUREG/IA-0216, Vol. 1). Washington, DC: US Nuclear Regulatory Commission.
Lyu, X., & Li, Z. Z. (2019). Predictors for human performance in information seeking, information integration, and overall process in diagnostic tasks. International Journal of Human–Computer Interaction, 35(19), 1831–1841.
Mack, A., & Rock, I. (1998). Inattentional blindness. Cambridge, MA: MIT Press.
Maier, N. R. (1931). Reasoning in humans. II. The solution of a problem and its appearance in consciousness. Journal of Comparative Psychology, 12(2), 181–194.
Makary, M., & Daniel, M. (2016). Medical error – The third leading cause of death in the US. British Medical Journal, 353, Article i2139.
Marseguerra, M., Zio, E., & Librizzi, M. (2007). Human reliability analysis by fuzzy “CREAM”. Risk Analysis, 27(1), 137–154.
Martin, R. P., & Nassersharif, B. (1990). A best-estimate paradigm for diagnosis of multiple failure transients in nuclear-power-plants using artificial-intelligence. Nuclear Technology, 91(3), 297–310.
Massaiu, S., & Fernandes, A. (2017). Comparing operator reliability in analog vs. digital human-system interfaces: An experimental study on identification tasks. In PSAM Topical Conference on Human Reliability, Quantitative Human Factors, and Risk Management. Munich, Germany.
McBride, S. E., Rogers, W. A., & Fisk, A. D. (2014). Understanding human management of automation errors. Theoretical Issues in Ergonomics Science, 15(6), 545–577.
Meister, D. (1962). The problem of human-initiated failures. In Proceedings of the 8th National Symposium on Reliability and Quality Control (pp. 234–239).
Meister, D. (1976). Behavioral foundations of system development. Hoboken, NJ: Wiley.
Metcalfe, J., & Wiebe, D. (1987). Intuition in insight and noninsight problem solving. Memory & Cognition, 15(3), 238–246.
Militello, L. G., & Hutton, R. J. (1998). Applied Cognitive Task Analysis (ACTA): A practitioner’s toolkit for understanding cognitive task demands. Ergonomics, 41(11), 1618–1641.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81–97.
Mitnick, K. D., & Simon, W. L. (2002). The art of deception: Controlling the human element of security. Hoboken, NJ: Wiley.
Mkrtchyan, L., Podofillini, L., & Dang, V. N. (2016). Methods for building conditional probability tables of Bayesian belief networks from limited judgment: An evaluation for human reliability application. Reliability Engineering & System Safety, 151, 93–112.
Mohaghegh, Z., Kazemi, R., & Mosleh, A. (2009). Incorporating organizational factors into Probabilistic Risk Assessment (PRA) of complex socio-technical systems: A hybrid technique formalization. Reliability Engineering & System Safety, 94(5), 1000–1018.
Moore, W. H. (1993). Management of human error in operations of marine systems. Doctoral dissertation, University of California, Berkeley. University Microfilms International.
Mosleh, A., Bier, V. M., & Apostolakis, G. (1988). A critique of current practice for the use of expert opinions in probabilistic risk assessment. Reliability Engineering & System Safety, 20(1), 63–85.
Mouloua, S. A., Ferraro, J., Mouloua, M., et al. (2019). Trend analysis of cyber security research published in HFES proceedings from 1980 to 2018. In Human Factors and Ergonomics Society Annual Meeting 2019 (pp. 1600–1604). Thousand Oaks, CA: SAGE Publications Inc.
Müller-Lyer, F. C. (1889). Optische Urteilstäuschungen [Optical illusions of judgment]. Archiv für Physiologie Suppl., 263–270.
National Research Council. (1997). Digital instrumentation and control systems in nuclear power plants: Safety and reliability issues. Washington, DC: National Academy Press.
O’Hara, J. M., Brown, W. S., Lewis, P. M., & Persensky, J. J. (2002). The effects of interface management tasks on crew performance and safety in complex, computer-based systems: Overview and main findings (Report No. NUREG/CR-6690, Vol. 1). Washington, DC: US Nuclear Regulatory Commission.
O’Hara, J. M., & Fleger, S. (2020). Human-system interface design review guidelines (Report No. NUREG-0700 Rev. 3). Washington, DC: US Nuclear Regulatory Commission.
O’Hare, D. (2000). The ‘Wheel of Misfortune’: A taxonomic approach to human factors in accident investigation and analysis in aviation and other complex systems. Ergonomics, 43(12), 2001–2019.
Onofrio, R., & Trucco, P. (2020). A methodology for Dynamic Human Reliability Analysis in robotic surgery. Applied Ergonomics, 88, Article 103150.
Palmer, S. E. (1999). Vision science: Photons to phenomenology. Cambridge, MA: MIT Press.
Palmer, S. E. (2002). Perceptual organization in vision. In H. Pashler & S. Yantis (Eds.), Stevens’ handbook of experimental psychology, Vol. 1, Sensation and perception (3rd ed., pp. 177–234). Hoboken, NJ: Wiley.
Pandya, D., Podofillini, L., Emert, F., et al. (2020). Quantification of a human reliability analysis method for radiotherapy applications based on expert judgment aggregation. Reliability Engineering & System Safety, 194, Article 106489.
Parasuraman, R. (2003). Neuroergonomics: Research and practice. Theoretical Issues in Ergonomics Science, 4(1–2), 5–20.
Parasuraman, R., & Riley, V. (1997). Humans and automation: Use, misuse, disuse, abuse. Human Factors, 39(2), 230–253.
Park, J., & Jung, W. (2007). OPERA – A human performance database under simulated emergencies of nuclear power plants. Reliability Engineering & System Safety, 92(4), 503–519.
Park, J., Kim, Y., & Jung, W. (2018). Calculating nominal human error probabilities from the operation experience of domestic nuclear power plants. Reliability Engineering & System Safety, 170, 215–225.
Park, K. S., & Lee, J. I. (2008). A new method for estimating human error probabilities: AHP–SLIM. Reliability Engineering & System Safety, 93(4), 578–587.
Paté-Cornell, M. E., & Murphy, D. M. (1996). Human and management factors in probabilistic risk analysis: The SAM approach and observations from recent applications. Reliability Engineering & System Safety, 53(2), 115–126.
Patterson, E. S., & Hoffman, R. R. (2012). Visualization framework of macrocognition functions. Cognition, Technology & Work, 14(3), 221–227.
Payne, D., & Altman, J. (1965). An index of electronic equipment operability. Washington, DC: American Institute of Research.
Pence, J., & Mohaghegh, Z. (2020). A discourse on the incorporation of organizational factors into probabilistic risk assessment: Key questions and categorical review. Risk Analysis, 40(6), 1183–1211.
Perrow, C. (1984). Normal accidents. New York: Basic Books.
Peters, G. A., & Peters, B. J. (2007). Medical error and patient safety: Human factors in medicine. Boca Raton, FL: CRC Press.
Petrović, Đ., Mijailović, R., & Pešić, D. (2020). Traffic accidents with autonomous vehicles: Type of collisions, manoeuvres and errors of conventional vehicles’ drivers. Transportation Research Procedia, 45, 161–168.
Petruni, A., Giagloglou, E., Douglas, E., et al. (2019). Applying Analytic Hierarchy Process (AHP) to choose a human factors technique: Choosing the suitable Human Reliability Analysis technique for the automotive industry. Safety Science, 119, 229–239.
Podofillini, L., Dang, V., Zio, E., et al. (2010). Using expert models in human reliability analysis: A dependence assessment method based on fuzzy logic. Risk Analysis, 30(8), 1277–1297.
Porthin, M., Liinasuo, M., & Kling, T. (2020). Effects of digitalization of nuclear power plant control rooms on human reliability analysis: A review. Reliability Engineering & System Safety, 194, Article 106415.
Preischl, W., & Hellmich, M. (2013). Human error probabilities from operational experience of German nuclear power plants. Reliability Engineering & System Safety, 109, 150–159.
Preischl, W., & Hellmich, M. (2016). Human error probabilities from operational experience of German nuclear power plants, Part II. Reliability Engineering & System Safety, 148, 44–56.
Proctor, R. W., & Chen, J. (2015). The role of Human Factors/Ergonomics in the science of security: Decision making and action selection in cyberspace. Human Factors, 57(5), 721–727.
Pugh, C. M., Law, K. E., Cohen, E. R., et al. (2020). Use of error management theory to quantify and characterize residents’ error recovery strategies. The American Journal of Surgery, 219(2), 214–220.
Quillian, M. R. (1966). Semantic memory. New York: Bolt, Beranek and Newman.
Ramos, M. A., Thieme, C. A., Utne, I. B., et al. (2020). A generic approach to analyzing failures in human–system interaction in autonomy. Safety Science, 129, Article 104808.
Rasmussen, J. (1983). Skills, rules, and knowledge; signals, signs, and symbols, and other distinctions in human performance models. IEEE Transactions on Systems, Man, and Cybernetics, 13(3), 257–266.
Rasmussen, J. (1997). Risk management in a dynamic society: A modelling problem. Safety Science, 27(2–3), 183–213.
Reason, J. (1990). Human error. Cambridge: Cambridge University Press.
Reason, J. (1997). Managing the risks of organizational accidents. Farnham: Ashgate Publishing Ltd.
Reason, J. (1998). Achieving a safe culture: Theory and practice. Work and Stress, 12(3), 293–306.
Reason, J. (2000). Human error: Models and management. British Medical Journal, 320(7237), 768–770.
Reason, J. (2013). A life in error: From little slips to big disasters. Farnham: Ashgate Publishing Ltd.
Reason, J., & Hobbs, A. (2003). Managing maintenance errors: A practical guide. Farnham: Ashgate Publishing Ltd.
Reece, W. J., Gilbert, B. G., & Richards, R. E. (1994). Nuclear computerized library for assessing reactor reliability (NUCLARR) data manual, Part 2: Human error probability (HEP) data (Report No. NUREG/CR-4639, EGG-2458, Vol. 5, Rev. 4). Washington, DC: US Nuclear Regulatory Commission.
Reifman, J. (1997). Survey of artificial intelligence methods for detection and identification of component faults in nuclear power plants. Nuclear Technology, 119(1), 76–97.
Reinach, S., & Viale, A. (2006). Application of a human error framework to conduct train accident/incident investigations. Accident Analysis & Prevention, 38(2), 396–406.
Rose, J. A., & Bearman, C. (2012). Making effective use of task analysis to identify human factors issues in new rail technology. Applied Ergonomics, 43(3), 614–624.
Rumelhart, D. E., & Ortony, A. (1976). The representation of knowledge in memory. In R. C. Anderson, R. J. Spiro, & W. E. Montague (Eds.), Semantic factors in cognition (pp. 99–135). Mahwah, NJ: Erlbaum.
Saaty, T. L., & Vargas, L. G. (2006). Decision making with the analytic network process: Economic, political, social and technological applications with benefits, opportunities, costs and risks. Berlin: Springer.
Salmon, P. M., Cornelissen, M., & Trotter, M. J. (2012). Systems-based accident analysis methods: A comparison of Accimap, HFACS, and STAMP. Safety Science, 50(4), 1158–1170.
Salo, L., Laarni, J., & Savioja, P. (2006). Operator experiences on working in screen-based control rooms. In Proceedings of the 5th ANS International Topical Meeting on Nuclear Plant Instrumentation, Controls, and Human Machine Interface Technology (pp. 451–458). American Nuclear Society.
Sanders, M. S., & McCormick, E. J. (1993). Human factors in engineering and design (7th ed.). New York: McGraw-Hill.
Sarter, N. B. (2008). Investigating mode errors on automated flight decks: Illustrating the problem-driven, cumulative, and interdisciplinary nature of human factors research. Human Factors, 50(3), 506–510.
Sarter, N. B., & Woods, D. D. (1995). How in the world did we ever get into that mode? Mode error and awareness in supervisory control. Human Factors, 37(1), 5–19.
Sarter, N. B., Woods, D. D., & Billings, C. E. (1997). Automation surprises. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (2nd ed.). Hoboken, NJ: Wiley.
Sasangohar, F., & Cummings, M. L. (2010). Human-system interface complexity and opacity, part II: Methods and tools to assess HSI complexity (Report No. HAL2010-03). Cambridge, MA: Massachusetts Institute of Technology.
Sasse, M. A., Brostoff, S., & Weirich, D. (2001). Transforming the ‘weakest link’: A human/computer interaction approach to usable and effective security. BT Technology Journal, 19(3), 122–131.
Schwarting, W., Pierson, A., Alonso-Mora, J., et al. (2019). Social behavior for autonomous vehicles. Proceedings of the National Academy of Sciences of the United States of America, 116(50), 24972–24978.
Seaver, D. A., & Stillwell, W. G. (1983). Procedures for using expert judgment to estimate human error probabilities in nuclear power plant operations (Report No. NUREG/CR-2743). Washington, DC: US Nuclear Regulatory Commission.
Selfridge, O. G. (1955). Pattern recognition and modern computers. In Proceedings of the Western Joint Computer Conference (pp. 91–93). Association for Computing Machinery.
Sellen, A. J. (1994). Detection of everyday errors. Applied Psychology, 43(4), 475–498.
Senders, J. W., & Moray, N. P. (1991). Human error: Cause, prediction and reduction. Hillsdale, NJ: Lawrence Erlbaum Associates.
Shakerian, M., Jahangiri, M., Alimohammadlou, M., et al. (2019). Individual cognitive factors affecting unsafe acts among Iranian industrial workers: An integrative meta-synthesis interpretive structural modeling (ISM) approach. Safety Science, 120, 89–98.
Shappell, S. A., Detwiler, C., Holcomb, K., et al. (2006). Human error and commercial aviation accidents: A comprehensive, fine-grained analysis using HFACS (Report No. DOT/FAA/AM-06/18). Washington, DC: US Department of Transportation, Federal Aviation Administration.
Shappell, S. A., & Wiegmann, D. A. (1997). A human error approach to accident investigation: The taxonomy of unsafe operations. The International Journal of Aviation Psychology, 7(4), 269–291.
Shappell, S. A., & Wiegmann, D. A. (2000). The human factors analysis and classification system – HFACS (Report No. DOT/FAA/AM-00/7). Washington, DC: Office of Aviation Medicine.
Shappell, S. A., & Wiegmann, D. A. (2001). Human error analysis of commercial aviation accidents: Application of the human factors analysis and classification system (HFACS). Aviation, Space, and Environmental Medicine, 72(11), 1006–1016.
Sharit, J. (2012). Human error and human reliability analysis. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (4th ed., pp. 734–800). Hoboken, NJ: Wiley.
She, M. R., Li, Z. Z., & Ma, L. (2019). User-defined information sharing for team situation awareness and teamwork. Ergonomics, 62(8), 1098–1112.
Simons, D. J., & Levin, D. T. (1997). Change blindness. Trends in Cognitive Sciences, 1(7), 261–267.
Smith, E. E., Shoben, E. J., & Rips, L. J. (1974). Structure and process in semantic memory: A featural model for semantic decisions. Psychological Review, 81(3), 214–241.
Society of Automotive Engineers. (2018). Taxonomy and definitions for terms related to on-road motor vehicle automated driving systems (SAE Standard No. J3016_201806). Warrendale, PA: SAE International.
Solso, R. L., MacLin, M. K., & MacLin, O. H. (2005). Cognitive psychology (7th ed.). Auckland: Pearson Education New Zealand.
Spurgin, A. J. (2010). Human reliability assessment theory and practice. Boca Raton, FL: CRC Press.
Stanton, N. A., Salmon, P., Walker, G., et al. (2005). Human factors methods: A practical guide for engineering and design. Farnham: Ashgate Publishing Ltd.
Stanton, N. A. (2006). Hierarchical task analysis: Developments, applications, and extensions. Applied Ergonomics, 37(1), 55–79.
Stemn, E., Hassall, M. E., & Bofinger, C. (2020). Systemic constraints to effective learning from incidents in the Ghanaian mining industry: A correspondence analysis and AcciMap approach. Safety Science, 123, Article 104565.
Sternberg, R. J., & Leighton, J. P. (2004). The nature of reasoning. Cambridge: Cambridge University Press.
Sträter, O. (2005). Cognition and safety: An integrated approach to systems design and assessment. Farnham: Ashgate Publishing Ltd.
Su, X., Mahadevan, S., Xu, P., et al. (2015). Dependence assessment in human reliability analysis using evidence theory and AHP. Risk Analysis, 35(7), 1296–1316.
Sujan, M. A., Embrey, D., & Huang, H. (2020). On the application of Human Reliability Analysis in healthcare: Opportunities and challenges. Reliability Engineering & System Safety, 194, Article 106189.
Sun, Y., Zhang, Q., Yuan, Z., et al. (2020). Quantitative analysis of human error probability in high-speed railway dispatching tasks. IEEE Access, 8, 56253–56266.
Sutcliffe, A., & Rugg, G. (1998). A taxonomy of error types for failure analysis and risk assessment. International Journal of Human-Computer Interaction, 10(4), 381–405.
571 Swain, A. D. (1987). Accident sequence evaluation program human reliability analysis procedure (Report No. NUREG/CR-4772). Washington, DC: US Nuclear Regulatory Commission. Swain, A. D. (1990). Human reliability analysis: Need, status, trends and limitations. Reliability Engineering & System Safety, 29(3), 301–313. Swain, A. D., & Guttmann, H. E. (1983). Handbook of human reliability analysis with emphasis on nuclear power plant applications (Report No. NUREG/CR-1278). Washington, DC: US Nuclear Regulatory Commission. Taylor, C., Keller, J., Fanjoy, R. O., et al. (2020a). An exploratory study of automation errors in Part 91 Operations. Journal of Aviation/Aerospace Education & Research, 29(1), 33–48. Taylor, C., Øie, S., & Gould, K. (2020b). Lessons learned from applying a new HRA method for the petroleum industry. Reliability Engineering & System Safety, 194, Article 106276. Taylor-Adams, S., & Kirwan, B. (1995). Human reliability data requirements. International Journal of Quality & Reliability Management, 12(1), 24–52. Teoh, E. R., & Kidd, D. G. (2017). Rage against the machine? Google’s self-driving cars versus human drivers. Journal of Safety Research, 63, 57–60. Thoroman, B., Salmon, P., & Goode, N. (2020). Applying AcciMap to test the common cause hypothesis using aviation near misses. Applied Ergonomics, 87, Article 103110. Trager, T. A. (1985). Case study report on loss of safety system function events (Report No. AEOD/C 504). Washington, DC: US Nuclear Regulatory Commission. Treisman, A. M. (1960). Contextual cues in selective listening. Quarterly Journal of Experimental Psychology, 12(4), 242–248. U˘gurlu, Ö., Y𝚤ld𝚤r𝚤m, U., Loughney, S., et al. (2018). Modified human factor analysis and classification system for passenger vessel accidents (HFACS-PV). Ocean Engineering, 161, 47–61. US Nuclear Regulatory Commission (2000). Technical basis and implementation guidelines for A Technique for Human Event Analysis (ATHEANA) (Report No. NUREG-1624 Rev. 1). 
Washington, DC: US Nuclear Regulatory Commission. Van Iddekinge, C. H., Aguinis, H., Mackey, J. D., et al. (2018). A meta-analysis of the interactive, additive, and relative effects of cognitive ability and motivation on performance. Journal of Management, 44(1), 249–279. Vicente, K. J. (1999). Cognitive work analysis: Toward safe, productive, and healthy computer-based work. Boca Raton, FL: CRC Press. Walker, G. H., Stanton, N. A., Salmon, P. M., et al. (2008). A review of sociotechnical systems theory: A classic concept for new command and control paradigms. Theoretical Issues in Ergonomics Science, 9(6), 479–499. Wang, W., Liu, X., Qin, Y., et al. (2018). Assessing contributory factors in potential systemic accidents using AcciMap and integrated fuzzy ISM-MICMAC approach. International Journal of Industrial Ergonomics, 68, 311–326. Warfield, J. N. (1973). On arranging elements of a hierarchy in graphic form. IEEE Transactions on Systems, Man, and Cybernetics, SMC-3(2), 121–132. Wason, P. C. (1968). Reasoning about a rule. Quarterly Journal of Experimental Psychology, 20(3), 273–281. Whaley, A. M., Kelly, D. L., Boring, R. L., et al. (2011). SPAR-H Step-by-Step Guidance (Report No. NL/EXT-10-18533, Rev. 2). Idaho National Laboratory. Whaley, A. M., Xing, J., Boring, R. L., et al. (2016). Cognitive basis for human reliability analysis (Report No. NUREG-2114). Washington, DC: US Nuclear Regulatory Commission. Wickens, C. D., & McCarley, J. S. (2007). Applied attention theory. Boca Raton, FL: CRC Press. Wickens, C. D., Helleberg, J., Goh, J., Xu, X., & Horrey, W. J. (2001). Pilot task management: testing an attentional expected value model
572 of visual scanning (Report No. ARL-01-14/NASA-01-7). NASA Ames Research Center. Wickens, C., Hollands, J., Banbury, S., & Parasuraman, R. (2013). Engineering psychology and human performance. Hove: Psychology Press. Wickens, C., McCarley, J., & Steelman-Allen, K. (2009). NT-SEEV: A model of attention capture and noticing on the flight deck. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 53, No. 12, pp. 769–773). SAGE Publications, Inc. Wiegmann, D., & Shappell, S. (1999). Human error and crew resource management failures in Naval aviation mishaps: A review of US Naval Safety Center data, 1990-96. Aviation, Space, and Environmental Medicine, 70(12), 1147–1151. Williams, J. C. (1988). A data-based method for assessing and reducing human error to improve operational performance. In Proceedings of the IEEE 4th Conference on Human Factors in Power Plants. IEEE. Williams, J. C., & Bell, J. L. (2015). Consolidation of the error producing conditions used in the Human Error Assessment and Reduction Technique (HEART). Safety and Reliability, 35(3), 26–76. Wilson, K. M., Yang, S., Roady, T., et al. (2020). Driver trust & mode confusion in an on-road study of level-2 automated vehicle technology. Safety Science, 130, Article 104845. Woo, D. M., & Vicente, K. J. (2003). Sociotechnical systems, risk management, and public health: Comparing the North Battleford and Walkerton outbreaks. Reliability Engineering & System Safety, 80(3), 253–269. Woods, D. D. (1984). Some results on operator performance in emergency events, In D. Whitfield, P. K. Andow, L. Bainbridge, et al. (Eds.), Ergonomics problems in process operations (pp. 21–32). Amsterdam: Elsevier. Woods, D. D., Dekker, S., Cook, R., et al. (2010). Behind human error (2nd ed.). Farnham: Ashgate Publishing Ltd. Woods, D. D., Roth, E. M., Stubler, W. F., et al. (1990). Navigating through large display networks in dynamic control applications. 
In Proceedings of the Human Factors Society 34th Annual Meeting (pp. 396–399). Human Factors Society. Xing, J. (2015). HRA method development-IDHEAS for internal at-power events. ACRS subcommittee briefing on HRA. Washington, DC: US Nuclear Regulatory Commission. Xing, J., & Chang, Y.J. (2018). Use of IDHEAS General Methodology to incorporate human performance data for estimation of human error probabilities. In Proceedings of the 14th Probabilistic Safety Assessment and Analysis (PSAM14). 16–21 September, Los Angeles, CA. Xing, J., Chang, J., & Siu, N. (2015). Insights on human error probability from cognitive experiment literature. In Proceedings of the 2015
DESIGN FOR HEALTH, SAFETY, AND COMFORT International Topical Meeting on Probabilistic Safety Assessment and Analysis. American Nuclear Society. Xing, J., Chang, Y.J., & DeJesus, J. (2020). The general methodology of an integrated human event analysis system (IDHEAS-G) (Draft Report No. NUREG-2198). Washington, DC: US Nuclear Regulatory Commission. Xing, J., Parry, G., Presley, M., et al. (2017). An integrated decision-tree human event analysis system (IDHEAS) method for NPP internal at-power operation (Report No. NUREG-2199). Washington, DC: US Nuclear Regulatory Commission. Xu, S., Song, F., Li, Z., Zhao, Q., Luo, W., He, X., & Salvendy, G. (2008). An ergonomics study of computerized emergency operating procedures: Presentation style, task complexity, and training level. Reliability Engineering and System Safety, 93(10), 1500–1511. Yan, Z., Robertson, T., Yan, R., et al. (2018). Finding the weakest links in the weakest link: How well do undergraduate students make cybersecurity judgment? Computers in Human Behavior, 84, 375–382. Yang, J. O., & Chang, S. H. (1991). An alarm processing system for a nuclear-power-plant using artificial-intelligence techniques. Nuclear Technology, 95(3), 266–271 Yang, Z. L., Bonsall, S., Wall, A., et al. (2013). A modified CREAM to human reliability quantification in marine engineering. Ocean Engineering, 58, 293–303. Yildirim, U., Ba¸sar, E., & U˘gurlu, Ö. (2019). Assessment of collisions and grounding accidents with human factors analysis and classification system (HFACS) and statistical methods. Safety Science, 119, 412–425. Zapf, D., Maier, G. W., Rappensperger, G., et al. (1994). Error detection, task characteristics, and some consequences for software design. Applied Psychology, 43(4), 499–520. Zarei, E., Yazdi, M., Abbassi, R., et al. (2019). A hybrid model for human factor analysis in process accidents: FBN-HFACS. Journal of Loss Prevention in the Process Industries, 57, 142–155. Zhang, L. (2019). 
Human reliability of digitalized nuclear power plants. Beijing, China: National Defense Industry Press (in Chinese). Zhang, M., Zhang, D., Goerlandt, F., et al. (2019). Use of HFACS and fault tree model for collision risk factors analysis of icebreaker assistance in ice-covered waters. Safety Science, 111, 128–143. Zheng, X., Bolton, M.L., Daly, C., et al. (2020). The development of a next-generation human reliability analysis: Systems Analysis for Formal Pharmaceutical Human Reliability (SAFPH). Reliability Engineering & System Safety, 202, Article 106927. Zou, Y., Zhang, L., Dai, L., et al. (2017). Human reliability analysis for digitized nuclear power plants: Case study on LingAo II NPP. Nuclear Engineering and Technology, 49(2), 335–341.
CHAPTER 21
OCCUPATIONAL SAFETY AND HEALTH MANAGEMENT
Jeanne Mager Stellman and Sonalee Rau
Columbia University
New York, New York
Pratik Thaker
New York Presbyterian Hospital
New York, New York
1 INTRODUCTION
2 MANAGEMENT THROUGH LEGISLATION AND REGULATION
2.1 Historical Context
2.2 Workers’ Compensation Systems
2.3 The Occupational Safety and Health Act of 1970
2.4 The General Duty Clause
3 OPERATIONALIZING OSH: BENCHMARKING
3.1 OSHA Recordkeeping Requirements
3.2 The Bureau of Labor Statistics
3.3 New Statistical Systems
3.4 Recording Criteria
3.5 Types of Abnormal Condition or Disorder
3.6 Case Types
4 NUMBERS OF INJURIES, ILLNESSES, AND DEATHS
4.1 The United States
4.2 Global Data
5 OCCUPATIONAL HEALTH MANAGEMENT SYSTEMS
5.1 ILO Management Guidelines
5.2 OHSAS 18001 and ISO 45001
5.3 Basic Building Blocks of OHS Management
6 SYSTEMS ANALYSIS AND OSH IN HEALTH CARE
6.1 Complexity of Regulatory Oversight of Occupational Health and Safety
6.2 OSH Chemical Hazards: Management Approaches
6.3 Total Worker Health: A Holistic Approach
6.4 Creating a Safety Culture: Driving Toward Zero Harm
6.5 New York Presbyterian Case Study: NYP Zero Harm Initiative
7 FUTURE TRENDS AND ISSUES
APPENDIX CROSSWALK BETWEEN OSHA GUIDELINES AND JCAHO REQUIREMENTS
REFERENCES
1 INTRODUCTION
Improvement in the management of occupational health and safety (OHS) has always lagged behind growth in worker productivity and production. Progress has too often been slow, even painful, but major advances are measurable, as we will show throughout this chapter. Many dangerous chemicals have now been banned from commerce. Engineering and safety process controls are incorporated into more production systems worldwide, and regulation in industrialized countries often mandates them. In this chapter we provide examples of the hazards that can be encountered at work. We also describe the recordkeeping systems by which we measure both the prevalence of problems and the success of any systems developed to manage risks. We then present several basic systems approaches to managing occupational health and safety. It should be emphasized, however, that the systems approaches and programmatic goals discussed here are neither universally accepted nor universally applied. Even in industrialized nations with relatively high gross domestic products, large swathes of workers remain exposed to hazardous conditions. Their recourse to regulatory relief and oversight is either limited or non-existent. In addition, according to the International Labour Office, as recently as 2017 at least 40 million people globally were classified as victims of modern slavery, through either forced labor or forced marriages tantamount to enslavement. Children were among the victims, at a rate of approximately 4.4 for every 1,000 children in the world. The COVID-19 pandemic is taking a devastating toll on the world economy at the time of this writing (2020). The numbers of people suffering hazardous working conditions and enslavement are therefore likely to increase dramatically, and their working conditions are likely to deteriorate further. Employment opportunities are decreasing, the number of unemployed is growing to hitherto unseen proportions of the working-age population, and the world economy is shrinking. We can therefore only anticipate serious challenges to worker well-being and to the management of occupational health and safety risks.
Despite this gloomy projection, in this chapter we concentrate on the progress that has been made in improving working conditions and in making OHS a standard aspect of work. Some examples include the following.
• In 2018, the International Organization for Standardization (ISO) published ISO 45001, which “specifies requirements for an occupational health and safety (OH&S) management system, and gives guidance for its use, to enable organizations to provide safe and healthy workplaces by preventing work-related injury and ill health, as well as by proactively improving its OH&S performance” (ISO 45001:2018, 2018).
• The US National Institute for Occupational Safety and Health (NIOSH) actively promotes the concept of Total Worker Health, defined as “policies, programs and practices that integrate protection from work-related safety and health hazards with the promotion of injury and illness prevention efforts to advance worker well-being” (National Institute for Occupational Safety and Health (NIOSH), n.d.; Tamers et al., 2018).
• The drive toward Zero Harm for both patients and workers in the healthcare industry is accelerating (Gandhi, Feeley, & Schummers, 2020). Indeed, many hospital systems have worked the concept into their strategic plans and set goals and thresholds to ensure that it is prioritized. At New York Presbyterian (NYP) Hospital, the concept of a “blame-free environment” has been held up as a rigorous, measurable means of ensuring a culture of Zero Harm. It is ensured by means of a Fair Culture Tool, which assesses whether an intervention meets the standard of care and passes muster under a “Test of Intention.” As of 2019, NYP’s guidelines also include a “Daily Operations Huddle Checklist.” The checklist incorporates a tally and account of “serious safety events, patient falls, patient falls with injuries, employee injuries or risks,” and other factors that contribute to the overall culture of safety (Brady & Jaffer, 2019).
Most important for readers concerned with human factors is the growing realization that health and safety at work require more than merely preventing exposure to toxins and avoiding traumatic injury. The successful design and implementation of OHS workplace management systems require an understanding of human behavior, both at the individual level and within organizations. They also require systems that harness human behavior effectively.
Human factors play a decisive role in creating the culture of safety at work that is essential to the enhancement and promotion of occupational health and safety. A review of practices across industries, and of governmental and professional society guidelines, clearly shows a shift in the focus of occupational health and safety. The field has moved from a hazard-by-hazard approach to a systems approach and to the development of a “culture” of safety within the work enterprise. In June 2020, the AIHA formally announced that it would no longer be the “American Industrial Hygiene Association.” It would instead be known simply by its acronym, AIHA, the American association of occupational health and safety science professionals. The change was undoubtedly influenced by the transition from a one-hazard-at-a-time engineering approach to an integrated systems model. The chapter draws on many examples from the health care industry for two reasons. Health care provides important illustrations of occupational safety and health management systems, and it employs a major share of the total workforce worldwide. In the United States, about 12% of the workforce was employed in health care in 2018. The US Bureau of Labor Statistics, in its Occupational Outlook Handbook, projects that health care occupations will grow by an additional 14% over the next ten years (Office of Occupational Statistics and Employment, 2020).
2 MANAGEMENT THROUGH LEGISLATION AND REGULATION
2.1 Historical Context
Regulation and legislation in occupational health and safety in the United States have evolved slowly. Horror stories exist in a number of trades and make for grim but enlightening reading
(Hamilton, 1985; Hunter, 1978). The first decades of the twentieth century were marked by major worker movements. These movements sought to reduce and regulate the hours of work per day and per week, to limit child labor, and to improve safety. A watershed moment came after the Triangle Shirtwaist Factory Fire of March 25, 1911. In that fire, 145 workers, mostly young Italian and Jewish immigrant women, perished because they were locked in the premises and could not escape. This tragedy ushered in what is often called the “golden age of labor legislation” (Stellman, 1977). The early days of occupational health and safety, of necessity, focused on safety, hours of work, and basic amenities, such as access to clean toilets and drinking water. Measurement and monitoring systems, if they existed at all, were ad hoc and crude. The popular and scientific understanding of the mechanisms of disease and chronic conditions was equally rudimentary. Coal miners brought canaries into the shafts as an early warning system for the presence of deadly carbon monoxide gas. Coal dust, often laden with crystalline silica, threatened miners’ lungs, but this peril was barely considered. It was secondary to safety because accidents presented immediate, often fatal, risks. Of course, exposure to the dusts caused disabling, often fatal pneumoconioses (black lung and silicosis). As with most occupational diseases, however, many years elapsed before they were “recognized” in compensation systems. Federal compensation did not come into being until 1973, when the Black Lung Benefits Act was passed. This delay is sad and ironic. During the depths of the Great Depression, thousands of workers, about three-quarters of whom were Black, flocked to West Virginia to blast through a silica-laden mountain and create the Hawks Nest Tunnel in Gauley Bridge.
Hundreds of workers died from acute silicosis, their lungs torn up by silica dust, their families not informed, and their graves not marked. They died so that a power plant could be constructed to power the growth of the chemical industry in the West Virginia valley (Cherniak, 1989; Lancianese, 2019). The production of matches provides another informative example from the early days of industrialization, when life and limb were sacrificed in order to make a living. “Match girls” were exposed to white phosphorus, which cruelly ate away at their jaws. William Booth, the founder of the Salvation Army, was instrumental in the passage of groundbreaking legislation to eliminate this exposure and in the substitution of the far less dangerous red phosphorus. These efforts helped launch regulatory oversight of working conditions in England. They were also a first step toward the creation of the Health and Safety Executive (HSE) in the UK and of other OHS institutions that have pioneered and inspired much regulatory reform and basic research around the globe. Upton Sinclair’s groundbreaking muckraking novel, The Jungle, offers yet another example bearing on the idea that workers have a right to a safe and healthy workplace. In the book he famously described how workers who fell into open vats in the rendering rooms were rendered, all but their bones, into lard that was sold to the public (Sinclair, 1906). Sinclair’s description of the horrors of the meatpacking industry caused an uproar. Ironically for the workers, the uproar led to the passage of the Pure Food and Drug Act of 1906 (PL 59-384), which controlled the hygiene of food products. It did not lead to regulation of the horrendous working conditions in the stockyards and meatpacking plants. To this day, meatpacking remains one of the most dangerous industries in the United States, even though conditions are orders of magnitude better than a century ago.
Meatpacking and poultry processing plants were among the “hotspots” of the coronavirus pandemic of 2020. Beyond the recent, pointed focus on meatpacking as a hotspot of coronavirus infection, meatpacking workers represent an excellent case study in exposure to the full range of chemical, physical, and safety hazards. The Bureau of Labor Statistics cites the animal slaughter and processing industry as having a rate of injuries and illnesses “higher than for all manufacturing and for all private industry.” Through the past century, the industry has come to stand for the antithesis of a “culture of safety.” Table 1 shows the nature of injuries involving days away from work across private industry as a whole in 2018. The workers most at risk, in the United States and worldwide, face what is, to say the least, a challenging occupational safety landscape. Against this background, the increasing awareness of occupational health and safety, coupled with third-party oversight and tracking, is encouraging. The moral imperative to keep workers safe on the job is, by itself, more than reason enough to continue these trends. It is also worth noting the economic cost. The ILO estimates the total economic loss from not investing in the prevention of occupational injury at around 4% of the world’s GDP per year, or about US$2.8 trillion. It is neither just nor cost-effective for employers to ignore the plight of employees subjected to unsafe working conditions. Tables 2 and 3 present US data on the number of nonfatal occupational injuries and illnesses involving days away from work. Together, the three tables illustrate the number of events, the days lost from work, the wide range of types and sources of hazards faced by workers, and the equally diverse array of injuries and conditions.

Table 1 Selected Types of Injuries Involving Days Away from Work in Private Industry, 2018

Nature of injury, illness            Number of incidents
Fractures                             79,470
Sprains, strains, tears              308,630
Amputations                            5,920
Cuts, lacerations, punctures          92,840
  Cuts, lacerations                   77,340
  Punctures (except gunshot wounds)   15,500
Bruises, contusions                   79,250
Chemical burns and corrosions          3,790
Heat (thermal) burns                  14,550
Multiple traumatic injuries           23,370
  With sprains and other injuries     12,340
  With fractures and other injuries    3,350
Soreness, pain                       159,600
Carpal tunnel syndrome                 5,050
Tendonitis                             1,810
All other                            126,100
Total                                900,380

Source: Taken from Table R13, Injuries, Illnesses, and Fatalities, Bureau of Labor Statistics (https://www.bls.gov/iif/oshwc/osh/case/cd_r13_2018.htm).

2.2 Workers’ Compensation Systems
The development of the no-fault workers’ compensation insurance system was a major step forward in reducing rates of fatalities and disabling injuries in the workplace. The concept of workers’ compensation insurance derives from European programs, particularly from Germany. Germany remains at the forefront of enlightened and effective control of occupational health and safety hazards. (Unlike in the United States, the entire German workforce is covered by the insurance system, which is organized by sector of employment; Greiner & Kranig, 1998.)
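Table 2 can also be read as a cumulative distribution. As a small illustration (our own, not a BLS procedure), the Python sketch below copies the seven duration buckets and the reported total from Table 2. It checks that the buckets add up to that total and finds the bucket in which the cumulative share of cases first reaches 50%. The names `DAYS_AWAY_2018`, `cumulative_shares`, and `median_bucket` are hypothetical, and ASCII hyphens replace the en dashes in the bucket labels.

```python
"""Summarize the days-away-from-work distribution in Table 2 (BLS, 2018)."""

# (bucket label, number of cases), copied from Table 2; ASCII hyphens replace en dashes.
DAYS_AWAY_2018 = [
    ("1 day", 123_950),
    ("2 days", 97_350),
    ("3-5 days", 158_540),
    ("6-10 days", 107_620),
    ("11-20 days", 95_440),
    ("21-30 days", 54_370),
    ("31 or more days", 263_100),
]
REPORTED_TOTAL = 900_370  # the "Total" row of Table 2


def cumulative_shares(buckets):
    """Return (label, share, cumulative share) for each bucket, in table order."""
    total = sum(count for _, count in buckets)
    running = 0
    rows = []
    for label, count in buckets:
        running += count
        rows.append((label, count / total, running / total))
    return rows


def median_bucket(buckets):
    """Return the first bucket whose cumulative share reaches 50% of all cases."""
    for label, _, cumulative in cumulative_shares(buckets):
        if cumulative >= 0.5:
            return label
    raise ValueError("distribution is empty")


if __name__ == "__main__":
    # The seven buckets should add up to the table's reported total.
    assert sum(count for _, count in DAYS_AWAY_2018) == REPORTED_TOTAL
    for label, share, cumulative in cumulative_shares(DAYS_AWAY_2018):
        print(f"{label:>16}: {share:6.1%} of cases, {cumulative:6.1%} cumulative")
    print("Median case falls in the bucket:", median_bucket(DAYS_AWAY_2018))
```

On the 2018 figures, the cumulative share is about 42% after the 3–5 day bucket and about 54% after the 6–10 day bucket. By this calculation, the median case falls in the 6–10 day range, and cases lasting 31 or more days make up roughly 29% of the total.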
Table 2 Number of Days Away from Work in All Industry, 2018

Number of days                   Number of incidents
Cases involving 1 day            123,950
Cases involving 2 days            97,350
Cases involving 3–5 days         158,540
Cases involving 6–10 days        107,620
Cases involving 11–20 days        95,440
Cases involving 21–30 days        54,370
Cases involving 31 or more days  263,100
Total                            900,370

Source: Table R25, Bureau of Labor Statistics, Injuries, Illnesses and Fatalities (https://www.bls.gov/iif/oshwc/osh/case/cd_r25_2018.htm).

Table 3 Sources and Numbers of Injuries and Illnesses in Private Industry, 2018

Source of injury, illness                  Number of incidents
Chemicals, chemical products                  12,440
Containers                                   108,130
Furniture, fixtures                           40,280
Machinery                                     51,000
Parts and materials                           77,630
Person, injured or ill worker                122,030
Worker motion or position                    117,010
Person, other than injured or ill workers     65,650
Health care patient                           46,980
Floors, walkways, ground surfaces            146,750
Handtools                                     43,910
Ladders                                       20,250
Vehicles                                     100,740
Trucks                                        26,060
Cart, dolly, hand truck--nonpowered           14,280
All other                                    111,580
Total                                      1,104,720

Source: Based on Table R25, Detailed industry by selected natures (Number), Bureau of Labor Statistics (BLS).

In the United States, workers’ compensation is controlled by the states, not the federal government. Federal government employees and a few occupational categories, like railroad workers and energy workers, are the exception; they are insured under a federal system (US Department of Labor, n.d.). The modern compensation system evolved over decades, and it was not until the middle of the twentieth century that all states had enacted workers’ compensation systems. The underlying premise of workers’ compensation is the so-called no-fault “grand bargain” that “circumvented lengthy, expensive trials where the burden of proof was on the employee and removed a source of financial uncertainty for the employer by removing a worker’s civil right to sue for damages” (Utterback, Meyers, & Wurzelbacher, 2014). Occupational injuries and illnesses were “recognized” by statute, and schedules of compensation were established. This eliminated the need for workers to turn to the civil courts to obtain medical care and financial support. In reality, such suits were not a feasible option for most workers: they required enough knowledge of the system to seek help, the resources to put together their
“case,” and, most important, there was no guarantee of success. For employers, a no-fault system spared them the expenses and uncertainties of civil suits, and they could obtain insurance against claims. Compensation varies drastically from state to state: identical injuries or fatalities may be compensated differently in each of the 50 states. Some injuries and illnesses may not be compensable in all jurisdictions. A major deficiency in the modern workers’ compensation system is its inadequate recognition of occupational diseases. The main reason is that most occupational diseases are chronic in nature (i.e., they develop over time) and have several contributing factors. Chronicity and multiple causation make it difficult to establish cause and effect between specific occupational exposures and specific occupational diseases. In plain language, many people may develop diseases that are at least partially attributable to occupational exposures without realizing it. Their physicians may also not realize the connection, since few medical practitioners receive extensive training in occupational diseases or in properly taking an occupational history. One problem for employers is that in many states it is the last employer who is “liable” for compensation. The actual exposure may have occurred earlier, at a different workplace or perhaps during military service, yet the liability rests with an employer that was not responsible for it. How little we actually know about the extent of occupational disease within the population makes it even harder to establish cause and effect. The relationship between workplace stress and cardiovascular disease is one example. In some states, like Minnesota, police officers and firefighters are eligible for compensation based on a “heart attack presumption” (Sec. 299A.41 MN Statutes, n.d.). Workers in other highly stressful occupations, however, are generally not afforded such a presumption. The costs of cardiovascular disease at least partly attributable to occupation are therefore borne by the general public through the healthcare system, rather than by the workers’ compensation insurance system. The paucity of data on the etiology and course of work-related diseases leaves most occupational diseases largely unrecognized and uncompensated. When a chronic disease is not attributed, in whole or in part, to an occupational cause, it does not affect the employer’s experience rating, and the financial incentive for employers to ameliorate conditions is reduced. The medical cost of care for diseases wholly or partially attributable to an occupation is instead borne by other sectors of the health care system, such as the Veterans Administration and Medicare. This places the burden on the general public and on other agencies (Leigh & Robbins, 2004). One reason that we know so little about the relationships between occupational exposures and chronic disease is that they are not well studied. The federal research budget allocated to occupational health and safety is minuscule. Consider that in FY2020 the total budget for NIOSH was slightly more than $342 million, while the National Institutes of Health operated with a budget of more than $40 billion. Research into occupational disease thus represents a very small portion of federal research dollars. Additional barriers to a worker obtaining compensation for injuries and illnesses include the statutes of limitation for filing that apply in many jurisdictions. Such time constraints effectively preclude compensation for diseases or disorders that do not appear “soon enough” after employment has ended or the exposure has ceased. Industrial hearing loss is one example, since it may take several years post-exposure for
the loss to manifest itself. Another example is mesothelioma, a usually fatal hallmark cancer associated with asbestos exposure, which can have a latency period of 35 years or longer before it is diagnosed.
Despite these deficiencies, the positive effects of the workers’ compensation system on improving workplace health and safety must be emphasized. The costs of insurance coverage are experience-rated, so employers have a strong incentive to minimize accidents and injuries. Doing so keeps insurance costs down and may even permit an employer to self-insure and lower costs further. In Germany, the complex Berufsgenossenschaft accident insurance system for all workers in all industries is an outstanding example of the power of such a system for injury prevention and the improvement of working conditions (Greiner & Kranig, 1998). The Berufsgenossenschaften also sponsor extensive research on workplace improvement and industrial practices. Through this research they continually add to knowledge in the field of occupational safety and health and develop best practices for minimizing hazards. The future direction of the workers’ compensation system in the United States is unclear, because important carriers are leaving the market and taking their research expertise and useful statistical data with them. In 2017, the Liberty Mutual Research Institute abruptly closed its doors when its parent company, Liberty Mutual Insurance, shifted its insurance offerings away from workers’ compensation (Fernandes, 2017). Established in 1954, the Institute had been one of the foremost institutions in the study of workplace injury reduction.
2.3 The Occupational Safety and Health Act of 1970
The passage of the Occupational Safety and Health Act in 1970 ushered in a new era of occupational health and safety management in the United States. Because American practices have a major influence on the world economy, its effects were felt globally as well.
The aim of the OSH Act is to ensure safe and healthy working conditions for working men and women by setting and enforcing standards and by providing training, outreach, education, and assistance. The OSH Act created an administrative agency, the Occupational Safety and Health Administration (OSHA), within the Department of Labor. The Department of Labor is a cabinet-level department whose Secretary reports directly to the President. OSHA is headed by an Assistant Secretary of Labor, who reports directly to the Secretary of Labor. Most private sector employers and their workers in the 50 states, and in those territories under federal jurisdiction, fall under the aegis of OSHA. Over the years, after much political controversy and struggle, the concept of State Plans became a reality. There are currently 28 State Plans, which are OSHA-approved workplace safety and health programs operated by individual states or US territories and monitored by OSHA. State Plans must be at least as effective as OSHA in protecting workers and in preventing work-related injuries, illnesses, and deaths. Some State Plans include coverage of state and local government workers, but in many states these government employees are not protected by OSHA. In some states, local regulation is more stringent than OSHA regulation.
2.4 The General Duty Clause
OSHA does not have specific standards for a very large number of potential exposures and safety hazards. It does, however, have a powerful tool in the General Duty Clause, Section 5(a)(1) of the OSH Act. The General Duty Clause requires
OCCUPATIONAL SAFETY AND HEALTH MANAGEMENT
that each employer “shall furnish to each of his employees employment and a place of employment which are free from recognized hazards that are causing or are likely to cause death or serious physical harm to his employees.” Section 5(a)(2) adds that each employer “shall comply with occupational safety and health standards promulgated under this Act.” Section 5(b) places a corresponding responsibility on employees: “Each employee shall comply with occupational safety and health standards and all rules, regulations, and orders issued pursuant to this Act which are applicable to his own actions and conduct.” In 1989, what is now the European Union adopted a similar policy in its Framework Directive, Directive 89/391/EEC, which states that employers shall take the measures necessary for the safety and health protection of workers, including the prevention of occupational risks. Risks should be avoided and, where this is not possible, evaluated and combatted at their source.
3 OPERATIONALIZING OSH: BENCHMARKING
3.1 OSHA Recordkeeping Requirements
It is not possible to manage any system well without basic metrics of performance. Serious work-related injuries and illnesses are the major metrics by which the success of occupational health and safety management is gauged. Many, but not all, employers with more than 10 employees are required to record such injuries and illnesses on standard OSHA recordkeeping forms. (Minor injuries requiring only first aid do not need to be recorded.) OSHA now allows these records to be kept and submitted electronically. OSHA has also exempted an array of “low-risk” industries from these requirements. Exempted industries and employers with 10 or fewer employees are, however, obligated to report any incident that leads to a fatality, in-patient hospitalization, amputation, or loss of an eye. The reader should be aware that the list of exempt industries contains many workplaces with recognized occupational hazards (e.g., the offices of dentists and physicians and personal care services).
3.2 The Bureau of Labor Statistics
The Bureau of Labor Statistics (BLS) has played a fundamental role in the development of workplace standards and prevention programs since the early twentieth century, beginning with its first report on industrial accidents in the iron and steel industry. BLS data supported the ground-breaking work of early pioneers in the field, such as Dr. Alice Hamilton, who has been called the “mother of occupational medicine” (Hamilton, 1985). The BLS has been conducting its Survey of Occupational Injuries and Illnesses (SOII) since the 1940s, but its recordkeeping was severely hampered by reliance on data that employers recorded and reported voluntarily. Not surprisingly, voluntary reporting left many cases unreported, and the data were incomplete.
In addition to incomplete reporting, another major obstacle to accurate assessment of the extent of injury, illness, and death was the inadequate definition of work injuries themselves. The data also excluded accidents that did not result in days away from work: because only accidents leading to at least one lost day had to be reported, injuries were seriously undercounted. It had been a widespread industrial practice for employers to help injured workers return to the workplace so that they could “punch the clock” and hence not miss work, even though they could not perform their usual duties. This practice arose as part of an effort to reduce workers’ compensation costs: with experience-rated premiums, the fewer the recorded accidents, the fewer the premium increases. When the OSH Act was passed, Congress included a mandate to revise the accident reporting system. The Act required most private industry employers to maintain records and report injuries and illnesses on a specific schedule and, for the first time, made all incidents reportable, whether or not they resulted in days away from work.
3.3 New Statistical Systems
Congress assigned the responsibility for developing a new statistical system to the BLS. Over the years, reporting requirements have evolved, as described in the History of the Survey of Occupational Injuries and Illnesses (SOII) in the very useful online BLS Handbook of Methods (US Bureau of Labor Statistics, n.d.). (The BLS website contains much interesting information on the history of the data that it collects and analyzes.) Even with improved and mandatory recordkeeping and reporting, an undercount of fatalities remained a serious shortcoming. Another major shortcoming was the failure to collect information on worker demographics or on details of accident causation. In 1987, the US National Academy of Sciences (NAS) was mandated by Congress to examine the recordkeeping of injuries and illnesses and to make recommendations for its improvement. The NAS recommendations resulted in a redesign of the SOII and the creation of a new dataset, the Census of Fatal Occupational Injuries (CFOI). The CFOI supplements the reports submitted to OSHA by employers with additional administrative records, such as workers’ compensation reports and death certificates, to improve case finding and thus the accuracy of its count of occupational fatalities (Pollack & Keimig, 1987).
The BLS writes: “More specifically, the Keystone Dialogue Group recommended the development of a method for counting work-related fatalities,” stating that the “development of an accepted count of workplace deaths should mute controversy on this issue stemming from the variety of estimates coming from different sources.” The BLS reported that fatality estimates made by different organizations, such as the National Safety Council and NIOSH (in its study of fatalities resulting from traumatic injuries, 1980–1989), varied widely, from 3,000 to 11,000 deaths nationally per year. A further improvement, the addition of more demographic data to the BLS analyses, made it possible to better understand areas such as women’s occupational health and to study racial and ethnic disparities (Stellman, 1977).
The BLS approaches each accident and injury from five vantage points, as illustrated in Figure 1. Collecting information on the nature of the disabling condition, how it occurred, what body part was affected, what equipment or operation caused the accident, and whether a secondary source contributed to the event yields a richer data source for analysis and program management. The mandatory reporting and retention of OSHA logs are an essential element of this program.
Successful data and benchmarking systems rely on consistent classification of the worker’s industry and occupation. Between 1992 and 2002, the SOII and CFOI used the Standard Industrial Classification (SIC) to define industry. In 2003, a new system, the North American Industry Classification System (NAICS), was adopted to reflect major changes in employment, particularly in information systems, healthcare delivery, high-tech manufacturing, and a host of new service industries, and to align the American, Canadian, and Mexican recordkeeping systems. The SIC and the NAICS do not map onto one another completely, so the data have to be treated as separate series, and comparisons between the years before and after 2003 have to be made very cautiously, if they can be made at all. As the BLS states in its website essay on Concepts of the SOII:
The NAICS is conceptually different from the SIC because it groups establishments together if they use similar raw materials, capital equipment and labor. Every sector of the economy was restructured and redefined under NAICS. A new Information sector combined communications, publishing, motion picture and sound recording, and online services, recognizing our information-based economy. NAICS restructured the manufacturing sector to recognize new high-tech industries. A new subsector was devoted to computers and electronics, including reproduction of software. Retail trade was redefined. In addition, eating and drinking places were transferred to a new accommodation and food services sector. The difference between the retail trade and wholesale trade sectors is now based on how each store conducts business. For example, many computer stores were reclassified from wholesale to retail. Nine new service sectors and 250 new service-providing industries were recognized with the adoption of NAICS in 2003.
DESIGN FOR HEALTH, SAFETY, AND COMFORT
Figure 1 The Bureau of Labor Statistics approaches the circumstances surrounding each case by the five vantage points illustrated. [Figure: a case circumstance (“A factory worker amputates her finger when her clothing is caught in an industrial stamping machine.”) viewed from five vantage points: nature of disabling condition; part of body affected; event or exposure; source directly producing disability; secondary source contributed.]
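Because each case is described from the same five vantage points, it can be pictured as one record with five fields, which is what lets analysts tabulate cases by any single dimension. The sketch below is a minimal illustration under stated assumptions: the class name `CaseRecord` and its field names are hypothetical, the values are plain-text labels rather than actual OIICS codes, and the way the Figure 1 case is split across the fields (for example, treating the clothing as the secondary source) is an illustrative assumption, not an official BLS coding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaseRecord:
    """Hypothetical record of one case seen from the five BLS vantage points.

    Values are plain-text labels, not actual OIICS codes (an assumption).
    """
    nature: str                      # nature of the disabling condition
    part_of_body: str                # part of body affected
    event_or_exposure: str           # how the injury or illness occurred
    source: str                      # source directly producing the disability
    secondary_source: Optional[str]  # secondary source that contributed, if any

    def summary(self) -> str:
        """Join the recorded vantage points into one line for display."""
        fields = [
            f"nature={self.nature}",
            f"part={self.part_of_body}",
            f"event={self.event_or_exposure}",
            f"source={self.source}",
        ]
        if self.secondary_source:
            fields.append(f"secondary={self.secondary_source}")
        return "; ".join(fields)


# Illustrative coding of the Figure 1 case; the field assignments are assumptions.
example = CaseRecord(
    nature="amputation",
    part_of_body="finger",
    event_or_exposure="caught in running machinery",
    source="industrial stamping machine",
    secondary_source="clothing",
)
print(example.summary())
```

Keeping the five dimensions as separate fields, rather than as one free-text description, is what makes it possible to count, for example, all finger injuries regardless of the machine involved.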
3.4 Recording Criteria
There are four different criteria for recording a nonfatal injury or illness, and the presence of any single criterion is sufficient to trigger the recording requirement. The criteria are:
• loss of consciousness
• days away from work
• restricted work activity or job transfer
• medical treatment beyond first aid.
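Because any one of the four general criteria is enough, the test for a nonfatal case is a logical OR. The sketch below is a minimal illustration of that rule only. The class and field names are hypothetical, and it assumes the case has already been judged work-related. It does not model fatalities or the four special instances listed next.

```python
from dataclasses import dataclass


@dataclass
class NonfatalCase:
    """Hypothetical facts about one work-related, nonfatal injury or illness."""
    loss_of_consciousness: bool = False
    days_away_from_work: int = 0
    restricted_or_transferred: bool = False
    medical_treatment_beyond_first_aid: bool = False


def meets_general_criteria(case: NonfatalCase) -> bool:
    """True if any one of the four general recording criteria is present."""
    return any((
        case.loss_of_consciousness,
        case.days_away_from_work > 0,
        case.restricted_or_transferred,
        case.medical_treatment_beyond_first_aid,
    ))


# First aid only: no general criterion is met.
print(meets_general_criteria(NonfatalCase()))  # False
# Sutures count as medical treatment beyond first aid.
print(meets_general_criteria(NonfatalCase(medical_treatment_beyond_first_aid=True)))  # True
```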
In addition, four further instances require recording and reflect requirements in OSHA standards:
• Any needlestick injury or cut from a sharp object that is contaminated with another person’s blood or other potentially infectious material.
• Any case requiring an employee to be medically removed under the requirements of an OSHA health standard.
• Tuberculosis infection as evidenced by a positive skin test or diagnosis by a physician or other licensed healthcare professional after exposure to a known case of active tuberculosis.
• An employee’s hearing test (audiogram) reveals (1) that the employee has experienced a standard threshold shift (STS) in hearing in one or both ears (averaged at 2 kHz, 3 kHz, and 4 kHz) and (2) that the employee’s total hearing level is 25 decibels (dB) or more above audiometric zero (also averaged at 2 kHz, 3 kHz, and 4 kHz) in the same ear(s) as the STS.
3.5 Types of Abnormal Condition or Disorder
As of 2002, OSHA removed the distinction between injuries and illnesses from its recordkeeping guidelines and substituted the concept of “abnormal condition or disorder.” The SOII, however, still separates the data using codes contained in the Occupational Injury and Illness Classification System (OIICS). These codes are applied to the “nature of the disabling condition” described above. BLS classifies as injuries those conditions, such as cuts, fractures, and sprains, that arise from a single instantaneous event or exposure at work. An occupational illness is any abnormal condition caused by exposure to workplace factors other than a single instantaneous event or exposure. The major nature of injury or illness code titles are as follows; each is subdivided into more detailed categories:
1 Traumatic Injuries and Disorders
2 Systemic Diseases and Disorders
3 Infectious and Parasitic Diseases
4 Neoplasms, Tumors, and Cancers
5 Symptoms, Signs, and Ill-defined Conditions
6 Other Diseases, Conditions, and Disorders
7 Exposures to Disease—No Illness Incurred
8 Multiple Diseases, Conditions, and Disorders
9999 Nonclassifiable
3.6 Case Types
The BLS lists the following categories and definitions of case types:
• Days away from work, job restriction, or transfer (DART) cases are those with days away from work beyond the day of injury or onset of illness, days of job transfer or restricted work activity, or both.
• Days away from work (DAFW) cases are those that result in days away from work (beyond the day of injury or onset of illness). The number of days away from work for these cases is the number of calendar days (not workdays) that an employee was unable to work, even if the employee was not scheduled to work on those days. The day on which the employee was injured or became ill is not counted. These cases may also include days of job transfer or restricted work activity in addition to days away from work. Take the case of an employee who suffers a work-related injury resulting in 5 days away from work. Upon returning to work, the employee is unable to perform the normal duties of the job for an additional 3 days (i.e., the employee is on restricted work activity). This case would be recorded as a days-away-from-work case with 5 days away from work and 3 days of restricted work activity. Employers may “cap” the number of days away they record at 180 calendar days.
• Days of job transfer or restriction (DJTR) cases are those that result only in job transfer or restricted work activity. This occurs when, as the result of a work-related injury or illness, an employer or healthcare professional recommends keeping an employee from doing the routine functions of his or her job or from working the full workday that the employee would have been scheduled to work before the injury or illness occurred. This may include the following instances:
• An employee is assigned to another job temporarily.
• An employee works at a permanent job less than full-time.
• An employee works at a permanently assigned job but is unable to perform all duties normally connected with it. The day on which the injury or illness occurred is not counted as a day of job transfer or restriction. Workers who continue working after incurring an injury or illness in their regularly scheduled shift but produce fewer goods or services are not considered to be in restricted activity status. They must be restricted from performing their routine work functions to be counted in this category. • Other recordable cases are those which are recordable injuries or illnesses under OSHA recordkeeping
guidelines but do not result in any days away from work, or in job transfer or restriction, beyond the day of the injury or onset of illness. For example, John cut his finger on machinery during his Wednesday afternoon work shift. The injury required medical attention, and John received sutures at the local emergency room. John was able to return to his normally scheduled workday on the following day (Thursday) and performed his typical work duties without any restrictions.
A number of metrics are derived from these case definitions. The rates are expressed per 100 full-time-equivalent workers. The 200,000-hour base used in them corresponds to 100 full-time workers working 40 hours per week for 50 weeks per year (100 × 40 × 50 = 200,000 hours). The Total Case Incident Rate (TCIR) is the number of recordable work-related injuries and illnesses per 100 full-time workers during a one-year period. OSHA uses the TCIR to monitor high-risk industries. These rates also allow environmental health and safety managers to track incidents and discover patterns across departments or facilities. The TCIR and the DART rate are calculated as follows:
$$\text{TCIR} = \frac{N_{\text{recordable}} \times 200{,}000}{H}, \qquad \text{DART rate} = \frac{N_{\text{DART}} \times 200{,}000}{H}$$
where $N_{\text{recordable}}$ is the total number of recordable injuries and illnesses, $N_{\text{DART}}$ is the number of cases involving days away from work, restricted work activity, or job transfer, and $H$ is the total number of hours worked by all employees during the year.
Rate of Total Incidents: total incidents reported per 100 full-time-equivalent employees.
Rate of OSHA Recordable Incidents: total OSHA recordable incidents per 100 full-time-equivalent employees.
Rate of Incidents with DART: total incidents that resulted in days away from work, restricted work activity, or job transfer per 100 full-time-equivalent employees.
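The case-type definitions and the rate formulas above fit together in a few lines of code. The sketch below is a minimal illustration under stated assumptions. The names (`RecordableCase`, `count_days_away`, `tcir`, `dart_rate`) are hypothetical, and days away are counted from dates on the assumption that the employee returns to work on `return_day`. The hours figure in the usage example is invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# 100 full-time workers x 40 hours/week x 50 weeks/year = 200,000 hours.
BASE_HOURS = 100 * 40 * 50
DAYS_AWAY_CAP = 180  # employers may cap the count of days away at 180


def count_days_away(injury_day: date, return_day: date, cap: int = DAYS_AWAY_CAP) -> int:
    """Calendar days away, excluding the day of injury, capped at `cap`.

    Assumes `return_day` is the first day the employee is back at work.
    """
    days = (return_day - injury_day).days - 1
    return max(0, min(days, cap))


@dataclass
class RecordableCase:
    days_away: int = 0        # calendar days away from work
    days_restricted: int = 0  # days of job transfer or restricted work

    def case_type(self) -> str:
        if self.days_away > 0:
            return "DAFW"  # may also carry restricted days
        if self.days_restricted > 0:
            return "DJTR"
        return "Other recordable"

    def is_dart(self) -> bool:
        return self.case_type() != "Other recordable"


def rate(n_cases: int, hours_worked: float) -> float:
    """Cases per 100 full-time-equivalent workers."""
    return n_cases * BASE_HOURS / hours_worked


def tcir(cases: List[RecordableCase], hours_worked: float) -> float:
    return rate(len(cases), hours_worked)


def dart_rate(cases: List[RecordableCase], hours_worked: float) -> float:
    return rate(sum(c.is_dart() for c in cases), hours_worked)


cases = [
    RecordableCase(days_away=5, days_restricted=3),  # the DAFW example in the text
    RecordableCase(),                                 # John's sutured finger
    RecordableCase(days_restricted=2),                # a hypothetical restricted-duty case
]
hours = 100_000  # hypothetical: 50 full-time employees for one year
print([c.case_type() for c in cases])  # ['DAFW', 'Other recordable', 'DJTR']
print(tcir(cases, hours), dart_rate(cases, hours))  # 6.0 4.0
```

Every DART case is also a recordable case, so for the same establishment and year the DART rate can never exceed the TCIR.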
4 NUMBERS OF INJURIES, ILLNESSES, AND DEATHS
4.1 The United States
Using these benchmarks, Figures 2 and 3 show rates of injuries and illnesses in the United States. Notably, rates have declined steadily over time.
4.2 Global Data
Global data show a much higher toll, particularly in developing countries. The ILO estimates that more than 350,000 occupational fatalities occurred in 2015 and that an additional 2 million deaths were attributable to occupational diseases. A further 313 million workers were estimated to have been involved in accidents that resulted in a work absence. The ILO estimates that the financial cost of this OSH toll is equivalent to about 4% of world GDP (International Labour Office, n.d.). Table 4 shows the nonfatal occupational injury rate in 12 middle-income countries in 2018, along with the differences between male and female rates. The NAICS industry categories generally map onto the International Standard Industrial Classification of All Economic Activities (ISIC), the industry classification used in ILO statistics. Table 5 shows fatal injuries by industry in selected nations. The data show that serious occupational injuries occur in both developed and emerging economies.
Figure 2 Bureau of Labor Statistics trend data on occupational injury and illness rates (1989–2011) and by days away from work (1992–2011), by industry. [Panels: “Injury and Illness Rates by Industry, 1989–2011” (cases per 100 full-time employees) and “Injuries and Illnesses Resulting in Days Away from Work, by Industry, 1992–2011” (cases per 10,000 full-time employees); series: hospitals, manufacturing, construction, and all private industry. Data source: Bureau of Labor Statistics.]
Figure 3 OSHA national benchmarks of recordable rate of incidence and DART rate of incidence, 2013–2017, by major employment sector. [Panels: “National Benchmarks by Industry: OSHA Recordable Rate of Incidence” and “National Benchmarks by Industry: DART Rate of Incidence”; series: hospitals, manufacturing, construction, and private industry.]
Table 4 Non-fatal Occupational Injuries per 100,000 Workers, by Gender, 2018
Country | Male | Female | M/F Ratio
Argentina | 4796.70 | 1990.04 | 2.41
Armenia | 69.75 | 13.05 | 5.34
Azerbaijan | 15.00 | 1.00 | 15.00
Belarus | 74.07 | 22.08 | 3.35
Chile | 3473.84 | 2613.37 | 1.33
Sri Lanka | 21.64 | 12.37 | 1.75
Myanmar | 39.80 | 8.60 | 4.63
Mauritius | 415.00 | 64.00 | 6.48
Pakistan | 880.00 | 256.00 | 3.44
Russian Federation | 138.00 | 81.00 | 1.70
Ukraine | 76.80 | 25.60 | 3.00
Uzbekistan | 41.20 | 10.10 | 4.08
Total | 10041.79 | 5097.21 | 1.97
Source: Data from Safety and health at work, ILO ILOSTAT statistics.
Table 5 Fatal Occupational Injuries by Industry, 2018
Country | Mining and natural resources | Manufacturing | Trade, transportation and utilities | Construction
Argentina | 50 | 44 | – | 55
Armenia | 3 | 6 | – | –
Azerbaijan | 7 | 7 | 9 | 15
Belarus | 24 | 23 | 21 | 19
Chile | 27 | 18 | 53 | 27
Egypt | 0 | 28 | 14 | 0
Israel | 2 | 11 | 9 | 28
Japan | 52 | 183 | 235 | 309
Sri Lanka | 5 | 32 | 7 | 47
Moldova, Republic of | 2 | 6 | 19 | 6
Myanmar | – | 22 | – | –
Mongolia | 13 | 1 | 8 | 4
Mauritius | 0 | 2 | 0 | 0
Malaysia | 11 | 1 | 65 | 47
Panama | 2 | 1 | 5 | 17
Peru | 26 | 17 | 42 | 26
Russian Federation | 253 | 256 | 306 | 190
Singapore | – | 4 | 12 | 14
Seychelles | 0 | 0 | 1 | 0
Ukraine | 75 | 59 | 73 | 27
United States | 704 | 343 | 1526 | 1038
Source: Data from Safety and health at work, ILO ILOSTAT statistics.
5 OCCUPATIONAL HEALTH MANAGEMENT SYSTEMS
5.1 ILO Management Guidelines
In 2001, the International Labour Office published guidelines on occupational safety and health management systems. The need for such guidelines arose from the recognition that new technologies and massive global interconnectedness have brought about continual change in the workplace. This change requires agile management to ensure both economic viability and safe and healthy working conditions, in the same way that new classification systems were recognized as necessary for benchmarking OSH. The ILO (2009) states: [T]echnological progress and intense competitive pressures bring rapid change in working conditions, work processes and organization. Legislation is essential but insufficient on its own to address these changes or to keep pace with new hazards and risks. Organizations must also be able to tackle occupational safety and health challenges continuously and to build effective responses into dynamic management strategies … The positive
impact of introducing occupational safety and health (OSH) management systems at the organization level, both on the reduction of hazards and risks and on productivity, is now recognized by governments, employers and workers.
5.2 OHSAS 18001 and ISO 45001
The publication of the ILO guidelines was followed by the publication of the influential British guidelines, OHSAS 18001 (Occupational Health and Safety Management Systems, 2007). In 2018, the ISO released ISO 45001 (International Standards Organization, 2018), based on these two predecessors, and industries around the globe have been transitioning from OHSAS 18001 to ISO 45001. Implementing the management system in individual industries and workplaces relies on the expertise of its managers and on the buy-in of both management and workers. Basic principles of industrial hygiene, safety practice, and preventive medicine obviously must be incorporated into any system, as must strict adherence to regulations and reporting requirements. Auditing and continual improvement lie at the heart of the system. Further discussion of risk management, which is a major area of endeavor and expertise, is beyond the scope of this chapter. Several excellent texts are available in the area, for example, the work of Tony Boyle (Boyle, 2019).
5.3 Basic Building Blocks of OHS Management
The main building blocks of any OHS management system set out by the ILO are listed below and illustrated schematically in Figure 4.
• Policy
  • Occupational safety and health policy
  • Worker participation
• Organizing
  • Responsibility and accountability
  • Competence and training
  • Occupational safety and health management system documentation
  • Communication
• Planning and implementation
  • Initial review
  • System planning, development and implementation
  • Occupational safety and health objectives
  • Hazard prevention
• Evaluation
  • Performance monitoring and measurement
  • Investigation of work-related injuries, ill health, diseases and incidents, and their impact on safety and health performance
  • Audit
  • Management review
• Action for improvement
  • Preventive and corrective action
  • Continual improvement
Figure 4 Main building blocks of an OHS Management System as set out by the ILO, arranged as a Plan, Do, Check, Adjust cycle. (Source: Based on ILO, 2009.)
6 SYSTEMS ANALYSIS AND OSH IN HEALTH CARE
Virtually all systems that have been developed are variations on the same main themes and interactions set out in the basic ILO standard. This is borne out by the fact that OSHA successfully used the elements of its construction industry management system for the crosswalk between OSHA guidelines and the certification requirements of the hospital accrediting agency JCAHO, the Joint Commission on Accreditation of Healthcare Organizations, commonly referred to as the Joint Commission. This crosswalk is shown in the Appendix (Tables A1–A6). Examination of the elements in the Appendix shows the concordance between the general management principles of both organizations, as well as the specific requirements for the healthcare environment (OSHA, 2008).
6.1 Complexity of Regulatory Oversight of Occupational Health and Safety
In the field of public health, workplace safety, management, and regulation are a constant and exigent concern. Workplace safety falls within the purview of many federal agencies and governing bodies, among them the Occupational Safety and Health Administration (OSHA) at the U.S. Department of Labor, established by the Occupational Safety and Health (OSH) Act of 1970 (OSHA). The purposes of this statute are manifold: it created OSHA (as well as NIOSH, the National Institute for Occupational Safety and Health), assigned OSHA regulatory functions, and imposed specific obligations that prohibit employers from endangering their employees’ health. Its prohibitive provisions stipulate that employers must not introduce serious health and safety hazards into the workplace; that if such hazards arise, they must be mitigated or guarded against at no cost to employees; and that employers must notify employees of all hazards, provide adequate training to deal with them, and make past injury records available.
Its overarching goal is to ensure that workplace health hazards are curbed and that employers and employees have access to safety and health programs. The implementation of the OSH Act of 1970 in practice is obviously a complex endeavor and is guided at a high level by the activities of OSHA. On a jurisdictional level, most employees in the United States are protected under OSHA’s regulatory authority. OSHA monitors most private sector workplaces, in addition to federal workplaces, which are required to have health and safety programs in place that are held to the same standards as those of private employers. State and local government employees are also protected by the OSH Act if working
in states that have received OSHA approval for their state workplace safety programs. OSHA enforces the OSH Act by way of inspections. Because millions of workplaces fall under its jurisdiction, it is impractical for the agency to inspect every single one with any degree of regularity; as a result, it conducts general surveys of the employment landscape to identify workplaces that are considered to be most hazardous, and focuses its inspections on those (Occupational Safety and Health Administration, n.d.a.). OSHA’s purview does have exceptions—for instance, it is not responsible for protecting employees who are protected by other agencies. As an example, the Federal Insecticide, Fungicide and Rodenticide Act (FIFRA) endows the EPA, not OSHA, with the authority to protect agricultural workers from pesticide exposure. This distinction has made it difficult for OSHA to protect agricultural workers, and OSHA has shown itself to be relatively content to shift that responsibility to the EPA. The 1974 case Organized Migrants in Community Action, Inc. (OMICA) v. Brennan led to the EPA issuing Worker Protection Standards for Agricultural Pesticides, but OSHA later argued that applying those standards was solely the EPA’s responsibility (Greenstone, 1975). This gap led to the loss of many agricultural workers’ lives. Interestingly, OSHA regulations do protect non-agricultural workers from incidental workplace exposure to pesticides. Regulatory oversight is far more complex than simple adherence to OSHA regulations and in almost all workplaces there will never be a single agency that covers all aspects of its OSH program. Further, it is nearly impossible to separate protection of the general environment from maintaining a safe and healthy workplace for workers, thus requiring employers to reconcile and adhere to both environmental and OSH standards and guidelines. 
In the case of health care, it is neither feasible nor desirable to separate patient safety from worker OSH, adding yet another layer of complexity to compliance, reporting, and program management. For example, a medical center in New York City will be subject to federal, state, and local regulations, as well as to the requirements of professional accrediting agencies, in the handling of its hazardous waste:
• Resource Conservation and Recovery Act (RCRA)
• Hazardous Waste Program (RCRA Subtitle C)
• Solid Waste Program (non-hazardous, RCRA Subtitle D)
• New York State Environmental Conservation
Transport of biological samples, as for an organ transport, requires compliance with the International Air Transport Association (IATA) Dangerous Goods Regulations (DGR).
6.2 OSH Chemical Hazards: Management Approaches
OSHA standards cover relatively few specific hazards and working conditions compared with the number of hazards commonly encountered in the workplace. This is especially true for the regulation of chemicals. Many OSHA standards were simply incorporated into OSHA in 1970 by adopting and extending standards already established by other federal agencies and standards developed and published by recognized organizations such as the American Conference of Governmental Industrial Hygienists (ACGIH). Since 1970, OSHA has promulgated full standards (through complete Section 6(b) rulemaking) for only 16 agents, and it promulgated a single standard for occupational carcinogens that covered 13 carcinogens without permissible exposure limits (PELs).
There are 400 chemicals with specific permissible exposure limits (PELs) listed in OSHA Table Z-1. OSHA recommends that employers also use other published criteria:
• California Division of Occupational Safety and Health (Cal/OSHA) Permissible Exposure Limits (PELs)
• National Institute for Occupational Safety and Health (NIOSH) Recommended Exposure Limits (RELs)
• ACGIH® Threshold Limit Values (TLVs®) and Biological Exposure Indices (BEIs®).
A recent study dramatically increased the estimated number of chemicals in regular use: more than 350,000 chemicals are now estimated to be in commerce, compared with the previous estimate of approximately 70,000 (Wang, Walker, Muir, & Nagatani-Yoshida, 2020). If the new estimate is even approximately correct, fewer than 1% of the chemicals in commerce have a specific exposure standard attached to them. The absence of specific numeric standards, however, does not mean that chemicals are completely unregulated. First, OSHA has its powerful General Duty Clause. Second, as we discuss in Section 6.2.3, the Globally Harmonized System of Classification and Labeling of Chemicals (GHS) requires manufacturers to identify the chemicals they produce and sell, and employers are obligated to impart this information to employees. Many hazardous workplace chemicals, however, are never made into products, and hence no Hazard Communication Standard (HCS) documentation is available for them. For example, some chemicals are generated as intermediates in the manufacturing process, and some of them may become low-level contaminants in final commercial products. The notorious chemical contaminant commonly called dioxin (2,3,7,8-tetrachlorodibenzo-p-dioxin) is one such unintended by-product; it contaminates the herbicide 2,4,5-T and other related chlorinated chemicals. (Agent Orange was a 50-50 mixture of 2,4,5-T and 2,4-D.)
It was very difficult for scientists to work out the chemical structure of dioxin, even though manufacturers knew that a ‘bad actor’ was present because of the many health effects they observed in exposed workers (Hay, 1982). Although 2,4,5-T is no longer registered for use in most countries, there are still numerous potential exposures to dioxin from other products and processes, because it is so readily generated during combustion and in many different chemical manufacturing processes. No American occupational exposure standards exist for dioxin. The situation is not hopeless, because process controls can be put in place that effectively control the levels of volatile chemicals and effluents, whether or not a specific standard has been worked out. Process controls use basic physical properties, such as molecular weight, volatility, solubility, and boiling point, to develop engineering solutions that drastically reduce or eliminate exposures. Engineering controls are one step in the hierarchy of controls summarized in Table 6. The hierarchy begins with the potential elimination of the particular product. If elimination is not feasible, the next method of exposure control is product substitution. If substitution is not feasible either, the next step is strict engineering controls, followed by administrative and work practice controls. Appropriate personal protective equipment (PPE), such as respirators and gloves that are impermeable to the chemical in question, comes last in the hierarchy and is the least desirable approach. PPE is plagued by problems of proper fit, adequacy, and the knowledge needed to use it effectively against the hazard. Workers may have great difficulty wearing PPE for long periods, as it can be uncomfortable and impair communication, and PPE requires ongoing effort to keep it effective and clean.
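The hierarchy amounts to an ordered search: try each level in turn and stop at the first one that is feasible. The sketch below is a minimal illustration of that ordering, with the levels taken from Table 6. The names `HIERARCHY` and `select_control` are hypothetical, and the feasibility judgment is passed in as a function because it depends on a site-specific assessment that the code cannot make. In practice several controls are often layered. This sketch only identifies the most preferred feasible level.

```python
from typing import Callable, Sequence

# Control levels from most to least preferred, following Table 6.
HIERARCHY = (
    "elimination",
    "substitution",
    "engineering controls",
    "administrative and work practice controls",
    "personal protective equipment",
)


def select_control(is_feasible: Callable[[str], bool],
                   hierarchy: Sequence[str] = HIERARCHY) -> str:
    """Return the most preferred control level that is judged feasible."""
    for level in hierarchy:
        if is_feasible(level):
            return level
    raise ValueError("no feasible control identified; reassess the task")


# Hypothetical assessment: the chemical cannot be eliminated or replaced,
# but enclosing the process is practical.
feasible = {"engineering controls", "personal protective equipment"}
print(select_control(lambda level: level in feasible))  # engineering controls
```

Because PPE sits at the end of the sequence, it is selected only when every higher level has been ruled out, which mirrors its position as the least desirable approach.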
DESIGN FOR HEALTH, SAFETY, AND COMFORT
Table 6 Overview of the Hierarchy of Controls for Chemical Hazards
Type of control | Examples
Elimination/Substitution
• Replace with safer alternatives. [See OSHA (n.d.c)]
Engineering Controls (implement physical change to the workplace, which eliminates/reduces the hazard on the job/task)
• Change process to minimize contact with hazardous chemicals.
• Isolate or enclose the process.
• Use of wet methods to reduce generation of dusts or other particulates.
• General dilution ventilation.
• Use fume hoods.
Administrative and Work Practice Controls (establish efficient processes or procedures)
• Rotate job assignments.
• Adjust work schedules so that workers are not overexposed to a hazardous chemical.
Personal Protective Equipment (use protection to reduce exposure to risk factors)
• Use chemical protective clothing.
• Wear respiratory protection. [See OSHA (n.d.d), Safety and Health Topics page]
• Use gloves.
• Wear eye protection.
Source: Chemical Hazards and Toxic Substances, U.S. Department of Labor.
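The ordering in Table 6 behaves like a preference ranking: choose the highest-ranked control that is feasible, and layer lower-ranked controls beneath it where residual exposure remains. The Python sketch below illustrates only that ranking logic. The `Control` enum and the `ranked_controls` and `preferred_control` helpers are hypothetical names, and whether a control is feasible is an input supplied by the hygienist, not something the code decides. Elimination and substitution are listed separately, as in the text, although Table 6 groups them.

```python
from enum import IntEnum
from typing import Iterable, List


class Control(IntEnum):
    """Hypothetical encoding of the hierarchy of controls.

    A lower value means a more preferred control.
    """
    ELIMINATION = 1
    SUBSTITUTION = 2
    ENGINEERING = 3
    ADMINISTRATIVE = 4
    PPE = 5


def ranked_controls(feasible: Iterable[Control]) -> List[Control]:
    """Order the controls judged feasible from most to least preferred."""
    return sorted(set(feasible))


def preferred_control(feasible: Iterable[Control]) -> Control:
    """Return the most preferred feasible control; PPE wins only as a last resort."""
    ranked = ranked_controls(feasible)
    if not ranked:
        raise ValueError("no feasible control has been identified")
    return ranked[0]


if __name__ == "__main__":
    # Example: elimination and substitution were ruled out for this process.
    options = [Control.PPE, Control.ADMINISTRATIVE, Control.ENGINEERING]
    print(preferred_control(options).name)             # ENGINEERING
    print([c.name for c in ranked_controls(options)])  # ['ENGINEERING', 'ADMINISTRATIVE', 'PPE']
```

Returning the full ranked list, not just the top choice, reflects the common practice of combining controls (for example, engineering controls backed by PPE) rather than relying on a single rung.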
6.2.1 Process Safety Management Standard
OSHA has taken a step toward developing a universal standard for managing chemical exposures with its Process Safety Management Standard (29 CFR 1910.119). The main provision of the standard is a process hazard analysis (PHA) based on a compilation of process safety information. A PHA is a careful review of what could go wrong and what safeguards must be implemented to prevent releases of hazardous chemicals. The standard also mandates written operating procedures, employee training and participation, pre-startup safety reviews, evaluation of the mechanical integrity of critical equipment, contractor requirements, and written procedures for managing change. It requires a permit system for hot work, investigation of incidents involving releases of covered chemicals or “near-misses,” emergency action plans, compliance audits at least every 3 years, and trade secret protection. Much of the safe handling of chemicals depends on employers’ sense of obligation toward minimizing unnecessary exposures and on their technical ability to sample exposures, or to hire appropriate consultants to do so, since sampling must follow highly regulated standard procedures. The NIOSH Pocket Guide to Chemical Hazards (NPG) provides a concise set of data on 677 substance groupings. The categories of information in the NPG are a good roadmap to the information needed to control exposures to chemical hazards. It is available in hard copy, online, and as an app. NIOSH provides a good deal of information on standard methods and also identifies protective clothing materials appropriate for the chemicals listed in the pocket guide. The information for each chemical includes:
• Chemical Name
• Structure/Formula
• CAS Number: Chemical Abstracts Service (CAS) registry number
• DOT ID and Guide Numbers
• Synonyms and Trade Names
• Conversion Factors: for converting ppm (parts of vapor or gas per million parts of contaminated air by volume) to mg/m³ (milligrams of vapor or gas per cubic meter of contaminated air) at 25°C and 1 atmosphere, for chemicals with exposure limits expressed in ppm
• Exposure Limits
  • REL: NIOSH recommended exposure limits
    • TWA: a time-weighted average concentration for up to a 10-hour workday during a 40-hour workweek
    • STEL: short-term exposure limit; unless noted otherwise, the STEL is a 15-minute TWA
    • C-REL: a ceiling value that should not be exceeded at any time
  • PEL: OSHA permissible exposure limits (8-hour work shift/40-hour workweek)
• Physical Description
• Chemical and Physical Properties (including flammability ratings)
• Incompatibilities and Reactivities
• Measurement Methods
• Personal Protection and Sanitation Recommendations
• First Aid
• Respirator Selection Recommendations
• Exposure Route, Symptoms, Target Organs
6.2.2 Laboratory Standard
OSHA developed a standard for handling chemicals in laboratories, the Occupational Exposure to Hazardous Chemicals in Laboratories standard (29 CFR 1910.1450), because managing chemicals in a large chemical processing operation is dramatically different from managing them in most laboratories, which generally hold relatively small quantities of a large variety of chemicals at “laboratory scale.” The standard sets forth the criteria for determining whether a workplace qualifies as the kind of small-scale operation the standard covers.
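The NPG conversion factor and the TWA each reduce to a short calculation. At 25°C and 1 atmosphere, one mole of an ideal gas occupies about 24.45 L, so mg/m³ = ppm × (molecular weight in g/mol) / 24.45. An 8-hour TWA is the sum of each measured concentration multiplied by its duration, divided by 8 hours, the form OSHA uses for cumulative exposure in 29 CFR 1910.1000(d). The sketch below is illustrative only: the function names are hypothetical, treating unsampled time as zero exposure is an assumption, and toluene (molecular weight 92.14 g/mol) is used merely as a sample input.

```python
from typing import Iterable, Tuple

# Volume (litres) occupied by one mole of an ideal gas at 25 °C and 1 atm.
MOLAR_VOLUME_L = 24.45


def ppm_to_mg_per_m3(ppm: float, molecular_weight: float) -> float:
    """Convert a gas/vapor concentration from ppm (by volume) to mg/m³ at 25 °C, 1 atm."""
    return ppm * molecular_weight / MOLAR_VOLUME_L


def mg_per_m3_to_ppm(mg_per_m3: float, molecular_weight: float) -> float:
    """Inverse conversion: mg/m³ to ppm (by volume) at 25 °C, 1 atm."""
    return mg_per_m3 * MOLAR_VOLUME_L / molecular_weight


def time_weighted_average(
    samples: Iterable[Tuple[float, float]], reference_hours: float = 8.0
) -> float:
    """Compute a TWA from (concentration, duration in hours) pairs.

    Time not covered by any sample is treated as zero exposure (an assumption).
    """
    total = 0.0
    for concentration, hours in samples:
        if concentration < 0 or hours < 0:
            raise ValueError("concentrations and durations must be non-negative")
        total += concentration * hours
    return total / reference_hours


if __name__ == "__main__":
    toluene_mw = 92.14  # g/mol, sample input
    print(round(ppm_to_mg_per_m3(1, toluene_mw), 2))  # 3.77
    # 2 h at 100 ppm, 4 h at 50 ppm, 2 h at 0 ppm: 400 ppm-hours over 8 hours
    print(time_weighted_average([(100, 2), (50, 4), (0, 2)]))  # 50.0
```

For a NIOSH REL workday longer than 8 hours, `reference_hours` could be set to the shift length; that parameterization is our choice for the sketch, not an NPG convention.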
OSHA sets forth the following primary elements for developing a Chemical Hygiene Plan (CHP) in the laboratory:
• Minimizing exposure to chemicals by establishing standard operating procedures, requirements for personal protective equipment, engineering controls (e.g., chemical fume hoods, air handlers), and waste disposal procedures.
• For some chemicals, the work environment must be monitored for levels that require action or medical attention.
• Procedures to obtain free medical care for work-related exposures must be stated.
• The means to administer the plan must be specified.
• Responsible persons (a Chemical Hygiene Officer) must be designated for procurement and handling of Safety Data Sheets, organizing training sessions, monitoring employee work practices, and annual revision of the CHP.
OCCUPATIONAL SAFETY AND HEALTH MANAGEMENT
6.2.3 Hazard Communication Standard: A Key to OSH Hazardous Chemical Management
The 1970s were a period of intense advocacy for the improvement of worker health and safety. One major area of activism became known as the worker “Right-to-Know” movement: the right of workers to know the nature of the hazards, especially chemical hazards, that they encountered on the job. Employees do not have an inherent right to measure and monitor their workplaces without agreement from their employers. Before the passage of the Hazard Communication Standard (HCS; 29 CFR 1910.1200, 1915.1200, 1917.28, 1918.90, and 1926.59), workers did not have the right even to know the names of the substances, or the contents of the mixtures, with which they might come into contact. The HCS ensures that information about chemical and toxic substances, and about the protective measures that must be taken to avoid harm, is both made available to workers and presented to them in an understandable fashion. Recently, the HCS was modified to bring it into alignment with the Globally Harmonized System of Classification and Labeling of Chemicals (GHS), a single set of harmonized criteria for classifying chemicals according to their health and physical hazards that also specifies hazard communication elements for labeling and safety data sheets (OSHA, n.d.b).
The HCS requires that, for each substance or mixture present in the workplace, the following must be provided:
• Hazard classification
• Labels
• A Safety Data Sheet
• Information and training.
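Two of the mechanical rules behind the label and Safety Data Sheet elements listed above can be sketched in code. Under the HCS, as under the GHS, each hazard class and category carries its own signal word, but a label shows only one: if any hazard calls for “Danger,” “Warning” does not appear. An SDS, in turn, must follow a fixed 16-section format. The Python below is a hypothetical checker written for illustration (the function names are ours, and matching section titles by exact text is a simplification), not an OSHA tool.

```python
from typing import Iterable, List, Optional

# The 16 SDS sections, in the order the HCS prescribes.
SDS_SECTIONS = (
    "Identification",
    "Hazard(s) identification",
    "Composition/information on ingredients",
    "First-aid measures",
    "Fire-fighting measures",
    "Accidental release measures",
    "Handling and storage",
    "Exposure controls/personal protection",
    "Physical and chemical properties",
    "Stability and reactivity",
    "Toxicological information",
    "Ecological information",
    "Disposal considerations",
    "Transport information",
    "Regulatory information",
    "Other information",
)


def label_signal_word(per_hazard_words: Iterable[Optional[str]]) -> Optional[str]:
    """Pick the single signal word for a label from the words assigned to each hazard.

    "Danger" takes precedence over "Warning"; hazard categories that carry no
    signal word are represented here as None.
    """
    words = {w.strip().capitalize() for w in per_hazard_words if w}
    if "Danger" in words:
        return "Danger"
    if "Warning" in words:
        return "Warning"
    return None


def missing_sds_sections(headings: Iterable[str]) -> List[str]:
    """List required SDS sections absent from a document's headings (case-insensitive)."""
    present = {h.strip().lower() for h in headings}
    return [s for s in SDS_SECTIONS if s.lower() not in present]


if __name__ == "__main__":
    print(label_signal_word(["Warning", "Danger", None]))  # Danger
    print(missing_sds_sections(SDS_SECTIONS[:14]))  # ['Regulatory information', 'Other information']
```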
HCS requires chemical manufacturers and importers to prepare labels and Safety Data Sheets for the products they sell, for use by their downstream customers. Every substance or mixture is to be classified according to its health and physical hazards using specific criteria. Labels must contain a harmonized signal word, pictogram, and hazard statement for each hazard class and category, together with applicable precautionary statements. A 16-section formatted Safety Data Sheet must be available for inspection by any worker requesting it. The elements of a label and Safety Data Sheet, and of HCS communication generally, are shown in Figure 5. All workers must have training on how to read and understand Safety Data Sheets and labels. Employers must prepare and implement a written Hazard Communication Program. HCS also requires that all employees receive training about hazardous chemicals before their initial assignment in a new work area and when new hazards are introduced into the work area. Employers are advised to evaluate and reassess their training programs periodically, although OSHA cannot legally require an employer to carry out a formal evaluation. OSHA has created an array of materials to assist employers in their HCS responsibilities, and OSHA staff can provide technical assistance upon request.
[Figure 5 OSHA hazard communication and labeling requirements. An OSHA hazard communication card (www.osha.gov/hazcom) shows a sample label for carbon monoxide with six numbered elements: (1) signal word (“Danger” for the most severe hazards, “Warning” for less severe ones); (2) symbols (hazard pictograms), red diamond pictograms conveying health, physical, and environmental hazards; (3) product name or identifiers; (4) hazard statements describing the nature of the hazard, e.g., “H220: Extremely flammable gas”; (5) precautionary statements describing preventive, response, storage, or disposal precautions; and (6) manufacturer information (company name, address, and telephone number). The card notes that workers must be trained to understand these pictograms and have the right to know and understand the hazardous chemicals they use and how to work with them safely.]
Occupational Exposure Banding
As mentioned previously, the great majority of chemicals do not have a regulated occupational exposure limit (OEL), that is, a maximum allowable airborne concentration of a hazardous substance in the workplace. For chemicals that lack OELs, NIOSH has designed a system called occupational exposure banding (OEB). NIOSH describes OEBs on its website as “a validated, consistent, and documented approach to characterizing chemical hazards so timely management decisions can be made based on the best available science information,” citing McKernan and Seaton (2014). NIOSH clearly states that OEBs are not meant to replace OELs; they are an additional tool for hygienists to use in the many instances in which an OEL is not available. OEBs use the letters A through E to designate bands of exposure concentration ranges: Band A represents the highest range of exposure concentrations (the lowest toxicity) and Band E the lowest range (the highest toxicity). OEBs are useful because they represent thoughtful judgments based on available toxicological and epidemiological data, but, unlike OELs, they do not need to go through an arduous, lengthy, and usually contentious legal process. OEBs do not have the force of law that OELs have, but they provide a tool that employers can use to assess and manage risk to workers in the workplace. The NIOSH occupational exposure banding process uses a three-tiered approach.
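To make the direction of the band letters concrete, the sketch below maps a concentration to the band whose range contains it. The boundaries are our reading of NIOSH’s published banding guidance (for particulates, in mg/m³: A above 10, B above 1 to 10, C above 0.1 to 1, D above 0.01 to 0.1, E at or below 0.01; for gases and vapors, in ppm, the same pattern with boundaries at 100, 10, 1, and 0.1) and should be checked against the current NIOSH document before any use. In the actual process a band is assigned from hazard data through the three tiers, not from a measured concentration; the `band_for_concentration` helper is hypothetical and only illustrates that lower concentration ranges correspond to more toxic chemicals.

```python
from typing import Dict, List, Tuple

# Exclusive lower bound of each band, ordered from least to most toxic.
# Assumed values; verify against current NIOSH banding guidance.
BAND_LOWER_BOUNDS: Dict[str, List[Tuple[str, float]]] = {
    "particulate": [("A", 10.0), ("B", 1.0), ("C", 0.1), ("D", 0.01)],  # mg/m³
    "gas_vapor": [("A", 100.0), ("B", 10.0), ("C", 1.0), ("D", 0.1)],   # ppm
}


def band_for_concentration(concentration: float, form: str) -> str:
    """Return the band (A to E) whose concentration range contains the value.

    Band E has no lower bound: it covers everything at or below the Band D boundary.
    """
    if concentration <= 0:
        raise ValueError("concentration must be positive")
    if form not in BAND_LOWER_BOUNDS:
        raise ValueError(f"unknown form {form!r}")
    for band, lower in BAND_LOWER_BOUNDS[form]:
        if concentration > lower:
            return band
    return "E"


if __name__ == "__main__":
    print(band_for_concentration(20, "particulate"))    # A
    print(band_for_concentration(0.05, "particulate"))  # D
    print(band_for_concentration(50, "gas_vapor"))      # B
```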
The system relies on the expertise of the person assigning the band and on the use of reference data that meet NIOSH criteria. As with so many aspects of occupational health and safety, the dearth of definitive data on the health effects of chemicals is the limiting factor in this process.
6.3 Total Worker Health: A Holistic Approach
Total Worker Health® (TWH) is a holistic approach to worker well-being developed by NIOSH. We began this chapter by recalling the early days of industrialization, when workers were struggling for shorter work weeks, fire safety, and machines that did not cripple or kill them, and to keep children out of factories. Over the course of the twentieth century, more and more attention was paid to occupational diseases, especially those that develop over time. More recently, significant attention has been paid to the quality of work life and to the long-term effects of stressful conditions on worker health. Total Worker Health® takes this a step further and acknowledges risk factors related to work that contribute to health problems previously considered unrelated to work. In NIOSH’s words, “The TWH approach seeks to improve well-being in the American workforce for the benefit of workers, employers, and the nation by protecting safety and enhancing health and productivity … [and defines itself] as policies, programs, and practices that integrate protection from work-related safety and health hazards with promotion of injury and illness prevention efforts to advance worker well-being.” Two pragmatic examples of applying Total Worker Health® to commonly encountered problems, the risk of musculoskeletal disorders and work stress, were drawn from the NIOSH website (https://www.cdc.gov/niosh/twh/totalhealth.html). The general approaches were:
• To prevent risk of musculoskeletal disorders, consider:
  • reorganizing or redesigning how individuals do their work;
  • providing ergonomic consultations; and
  • providing education on arthritis self-management strategies.
• To reduce work-related stress, consider:
  • implementing organizational and management policies that give workers more flexibility and control over their schedules;
  • providing training for supervisors on approaches to reducing stressful working conditions; and
  • providing skill-building interventions on stress reduction for all workers.
In 2017, NIOSH convened a workshop to consider methodological issues in designing and evaluating Total Worker Health® programs (Tamers et al., 2018).
6.4 Creating a Safety Culture: Driving Toward Zero Harm
The Institute of Medicine report To Err Is Human: Building a Safer Health System kick-started a movement now generally falling under the rubric of Zero Harm (Institute of Medicine, 1999). The primary focus of research and program development in the health care sector has been on reducing patient harm. It is widely recognized, as Gandhi and co-authors (2020) recently noted, that
“achieving this goal requires a comprehensive, systems-focused effort. … and [recognizing] the importance of broadening the definition of harm to include non-physical harms (e.g., psychological harms), harms to caregivers and the healthcare workforce, and harms occurring beyond the hospital and across the care continuum. Four key elements required for successful systems change resulting in safety improvements are discussed: (1) change management, (2) culture of safety, (3) a learning system, and (4) patient engagement and codesign of healthcare.”
Thus, the basic operational elements of striving toward zero harm for patients in health care are, as expected, the same as those set forth for the general management of OSH discussed earlier.
6.5 New York Presbyterian Case Study: NYP Zero Harm Initiative
Most zero harm programs in health care focus predominantly on patient safety, and when worker health and well-being are considered, it is generally with regard to psychosocial issues in the workplace, such as prevention of burnout. Most programs do not address the panorama of potential hazards in the health care environment, such as needlestick injuries, noise exposure, exposure to aerosolized drugs, and ergonomic issues associated with patient handling. The New York Presbyterian Hospital (NYP) health care system in New York has developed the NYP ZeroHarm initiative specifically to reduce worker injuries and health-related incidents to zero. This effort is compliant with the OSHA Safety and Health Management Systems Guidelines and the Joint Commission on Accreditation of Healthcare Organizations Requirements (OSHA, 2012). (See Figure 6 for an overview of the NYP ZeroHarm Initiative.) In no industry is the importance of employee (health care worker) safety to the successful delivery of the service provided (patient care) more immediately apparent than in health care. Yet, despite the overwhelming size and scope of the health care industry, OSHA did not expressly move to regulate occupational safety in the health care workplace until its proposed tuberculosis standard in 1997 (later withdrawn in 2003) (Institute of Medicine Committee on Regulating Occupational Exposure to Tuberculosis, 2001). The intricacies of regulating OSH in health care workplaces also speak to the importance of individual health care entities’ efforts to protect safety and well-being: from a legal standpoint, many employers frequently coexist within one health care workplace, leaving questions of liability difficult to resolve. Moreover, state employees are not expressly protected under OSHA unless their state participates as a State Plan jurisdiction; thus, public versus private employer status can determine whether or not a health care organization is exempt from OSHA regulation. The ZeroHarm initiative puts a large part of the mission of health care workplace safety into health care organizations’ own hands, setting as its goal the dual objective of protecting patients and protecting clinicians, hospital workers, and staff. To this end, NYP has implemented measures including the following:
• Safety Event Reviews (Daily Medical Event Reviews, Patient Safety Huddles, Root Cause Analyses)
  • Daily event reviews and root cause analyses are methods commonly seen in the field of business management and are often implemented in corporate workplaces and offices (Wedell-Wedellsborg, 2017).
  • NYP has effectively transitioned these methods into the health care management sphere and has implemented them to regularly assess workplace safety and identify factors that might jeopardize staff and patient safety.
  • Daily event reviews and patient safety huddles are undertaken with a focus on a list of “key behaviors” conducive to a culture of zero harm (Cooke, Brady, & Jaffer, 2019).
  • Root cause analyses (efforts to identify the original, precipitating cause of health and safety risks, ideally before but possibly after they are realized) are undertaken both for externally reportable hazards and for selected other hazards that are reported only internally.
  • Electronic medical record (EMR) dashboards are used to support analyses of clinician efficacy in avoiding harms.
  • Other “pillars of perioperative safety” followed in order to maintain zero harm include preoperative checklists, surgical pauses and debriefs, and the maintenance of absolute silence during safety checks.
• Comparisons to National Database of Nursing Quality Indicators (NDNQI) Benchmarks
  • Established in 1998 as part of the American Nurses Association’s (ANA’s) Safety and Quality Initiative, the database is owned and supervised by ANA but housed at the University of Kansas School of Nursing.
  • Combining assessments of staffing factors, such as nursing hours per patient day, with injury incidence rates, such as the number of patient falls, the database seeks to quantify both chronic and acute safety hazards.
  • The operationalization of this existing source of rich, longitudinal data on patient and staff safety is a shining example of how the information era has led to the use of available data for public health purposes, here for health care worker and patient safety.
  • Lessons learned from implementing the NDNQI benchmarks nationally include the importance of ongoing quality monitoring checks. That is, implementing a system for assessing workplace safety is crucial, but second-order reviews of the efficacy of that system itself (including ensuring that time worked is accurately reported, that injuries are accurately categorized, and so on) are ultimately equally important (Montalvo, 2007).
• Comparisons to Agency for Healthcare Research and Quality (AHRQ) Quality Indicators (QIs)
  • The Patient Safety and Adverse Events Composite (PSI-90) uses Bayesian statistical analyses to create composite measures of hospital quality as it relates to preventable events that are deleterious to patients.
  • PSI metrics include “In-Hospital Fall with Hip Fracture Rate” and “Retained Surgical Item or Unretrieved Device Fragment Count” (AHRQ, 2016).
  • Downstream harms that might result from these safety hazards in the future are accounted for with “harm weights” calculated from clinical literature reviews.
  • Notably, AHRQ took steps in 2016 to clarify several of these indicators, with an eye toward decoupling them from clinical procedures and accounting for the fact that injuries, such as falls, can happen at any time in the hospital, not just immediately post-operation or iatrogenically.
  • To be sure, distinguishing between iatrogenic and incidental safety hazards in hospitals, and ensuring that both “types” are accounted for, mitigated, and minimized, is a challenge that merits continued study.
• Sustainability and maintenance of a zero harm culture: methods for ensuring the long-term sustainability of the ZeroHarm initiative. The culture at NYP includes:
  • Continued leadership by the COO/Presidents (SVPs).
  • Creation and implementation of a culture of reaching out to injured employees after any incident.
  • Implementation of a daily tiered huddle system focused on staff and patient safety. Tier 1 huddles include frontline staff and an immediate supervisor. Typically, these take place at the start of a shift, with the goal of escalating issues that caused, or may have caused, an injury or risk to patients or staff during the prior shift, or that may occur during the next shift. Daily Tier 2 huddles include supervisors from various areas and their immediate supervisor (Director); Tier 2 huddles aggregate concerns from multiple areas within the organization. Daily Tier 3 huddles escalate the concerns gathered at Tier 2 to leaders from other organizational support areas who may be able to provide immediate assistance in removing or correcting the identified risk.
[Figure 6 Overview of the NYP ZeroHarm Initiative. A concept map of the NYP Management System (tiered huddles, goal boards, coaching, leader standard work, leader rounding, A3 problem solving); the Zero Harm team structure (campus COO, NYPZeroHarm Steering Committee, VP lead, campus coaches, core team, and escalation from the employee to the employee’s supervisor as A3 problem owner, then to the manager and director for coaching and escalating); and key behaviors for team members and team leaders (commitment, visibility, open and honest communication, transparency, identifying problems and sharing information, real-time problem solving, teamwork, support and coaching, following policies and procedures, recognition).]
  • Ensuring that EMR and management systems are used for the ZeroHarm initiative as intended.
• Campus leadership proficiency and ownership of problem solving
  • A3 thinking is institutionalized and becomes the expectation.
  • Employees are empowered to “see, solve and share” problems as they arise.
• Institutionalized training
  • Continued in-person ZeroHarm for Leaders Training for new hires.
  • Additional training to be produced for target audiences (physicians, residents, non-supervisors).
  • Specific training for an identified Campus Lead on root cause analysis/A3 proficiency.
• Empowerment and training of ZeroHarm “Campus Coaches”
  • Develop an army of individuals who can immediately coach leaders on A3 thinking and process improvement after an incident, near miss, or identification of risk.
  • Quarterly summits to educate, share learning, and discuss challenges.
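The PSI-90 composite described earlier combines several indicators into a single number. As a simplified illustration only, the sketch below forms a weighted average of per-indicator observed-to-expected ratios, taking each indicator’s weight as the product of a volume weight and a harm weight. The function name and inputs are hypothetical, the Bayesian smoothing AHRQ applies is omitted, and AHRQ’s own specification should be consulted for the actual method.

```python
from typing import Sequence


def weighted_composite(
    ratios: Sequence[float],
    volume_weights: Sequence[float],
    harm_weights: Sequence[float],
) -> float:
    """Weighted average of indicator ratios (a simplified, unsmoothed composite).

    Each indicator's weight is assumed to be volume_weight * harm_weight,
    normalized so the weights sum to 1.
    """
    if not (len(ratios) == len(volume_weights) == len(harm_weights)):
        raise ValueError("all inputs must have the same length")
    weights = [v * h for v, h in zip(volume_weights, harm_weights)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * r for w, r in zip(weights, ratios)) / total


if __name__ == "__main__":
    # Two hypothetical indicators: one at the expected rate (ratio 1.0) and one at
    # twice the expected rate, the second carrying three times the harm weight.
    print(weighted_composite([1.0, 2.0], [1.0, 1.0], [1.0, 3.0]))  # 1.75
```

Under this construction, a composite above 1.0 indicates more adverse events than expected across the weighted set of indicators, and heavily harm-weighted indicators pull the composite further.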
7 FUTURE TRENDS AND ISSUES
The pandemic year 2020 will undoubtedly have huge ramifications for the health of workers around the globe and for the practice of occupational health and safety management, in ways we cannot possibly project at the time of this writing. Resources will be scarcer and population priorities will be reset. No matter the philosophical and economic constraints imposed on the system, there will be no escaping the need to deal with the following issues, among others:
• The aging workforce. Older workers will require accommodation. The steep decline in the economy may force workers to stay on the job for many years longer than previously intended. While older workers in general have lower accident rates, they may be less able to withstand physical and chemical stresses.
• The gig economy. In the gig economy, workers are no longer tied to workplaces that manage occupational health and safety or provide benefits. Factors that have driven OSH in past decades, like the workers’ compensation system and the power to impose penalties on employers through agencies like OSHA, no longer apply when the employer-employee relationship has been transformed by self-employed contractual labor.
• Work–life balance. The COVID-19 pandemic seems to have rapidly ushered in a “work from home” economy. The implications of this shift for worker well-being are not even remotely understood. The implications for women workers in particular, who continue to bear an unequal share of childcare and household duties, are worrisome. It is not clear how one builds a “culture of safety” or a healthy work–life balance when much of the workforce is spread out and communicating via telecommunications.
• Artificial intelligence, robotics, and total control over work life. Automation was already widespread in the workplace before the pandemic. In a remarkably short period of time, a very large fraction of the still-employed workforce has become tethered to a computer, even those charged with warehousing, packing, delivering, and disposing of goods and services. The rapidity of the change and the anxiety of the global health situation have not yet permitted careful consideration of all aspects of the transformation.
Appendix A.1
APPENDIX CROSSWALK BETWEEN OSHA GUIDELINES AND JCAHO REQUIREMENTS
A crosswalk between the OSHA Safety and Health Management Systems Guidelines and the Joint Commission on Accreditation of Healthcare Organizations (JCAHO) Requirements (OSHA, 2012).
Management Leadership
OSHA Guidelines
Management establishes, documents, and communicates to employees and contractors clear goals that are attainable and measurable.
Management signs a statement of commitment to safety and health.
Management maintains a written safety and health management system that documents the elements and sub-elements, procedures for implementing the elements, and other safety and health programs including those required by OSHA standards.
JCAHO Requirements
LD.02.01.01: The mission, vision, and goals of the hospital support the safety and quality of care, treatment, and service.
LD.03.04.01: The hospital communicates information related to safety and quality to those who need it, including staff, licensed independent practitioners, patients, families, and external interested parties.
LD.04.03.09, EP 2: The hospital describes, in writing, the nature and scope of services provided through contractual agreements.
LD.04.04.01: Leaders establish priorities for performance improvement.
LD.02.01.01, EP 1: The governing body, senior managers, and leaders of the organized medical staff work together to create the hospital’s mission, vision, and goals.
LD.03.03.01: Leaders use hospital-wide planning to establish structures and processes that focus on safety and quality.
LD.04.04.05: The hospital has an organization-wide, integrated patient safety program within its performance improvement activities.
EC.01.01.01, EP 3: The hospital has a written plan for managing … the environmental safety of patients and everyone else who enters the hospital’s facilities.
Appendix A.1 (continued)
OSHA Guidelines
Management identifies persons whose safety and health responsibility includes carrying out safety and health goals and objectives, and clearly defines and communicates their responsibilities in their written job descriptions.
Management provides and directs adequate resources (including time, funding, training, personnel, etc.) to those responsible for safety and health, so they are able to carry out their responsibilities.
Management holds those assigned responsibility for safety and health accountable for meeting their responsibilities through a documented performance standards and appraisal system.
Management integrates safety and health into other aspects of planning, such as planning for new equipment, processes, buildings, etc. Management establishes lines of communication with employees and allows for reasonable employee access to top management at the worksite.
Management sets an example by following the rules, wearing any required personal protective equipment, reporting hazards, reporting injuries and illnesses, and basically doing anything that they expect employees to do.
Management ensures that all employees (including contract employees) are provided equal, high-quality safety and health protection.
Management conducts an annual evaluation of the safety and health management system in order to: maintain knowledge of the hazards of the worksite; maintain knowledge of the effectiveness of system elements; ensure completion of the previous year’s recommendations; and modify goals, policies, and procedures.
JCAHO Requirements
EC.01.01.01, EP 1: Leaders identify an individual(s) to manage risk, coordinate risk reduction activities in the physical environment, collect deficiency information, and disseminate summaries of actions and results.
LD.01.03.01, EP 5: The governing body provides for the resources needed to maintain safe, quality care, treatment, and services.
LD.03.03.01, EP 4: Leaders provide the resources needed to support the safety and quality of care, treatment, and services.
LD.03.06.01, EP 3: Leaders provide for a sufficient number and mix of individuals to support safe, quality care, treatment, and services.
LD.04.01.03: The leaders develop an annual operating budget and, when needed, a long-term capital expenditure plan.
HR.01.07.01, EP 1: The hospital evaluates staff based on performance expectations that reflect their job responsibilities.
LD.03.01.01, EP 4: Leaders develop a code of conduct that defines acceptable behavior and behaviors that undermine a culture of safety.
LD.03.06.01, EP 4: Those who work in the hospital are competent to complete their assigned responsibilities.
LD.03.03.01: Leaders use hospital-wide planning to establish structures and processes that focus on safety and quality.
LD.03.06.01, EP 1: Leaders design work processes to focus individuals on safety and quality issues.
LD.02.03.01: The governing body, senior managers, and leaders of the organized medical staff regularly communicate with each other on issues of safety and health.
LD.03.01.01, EP 8: All individuals who work in the hospital, including staff and licensed independent practitioners, are able to openly discuss issues of safety and quality.
LD.03.04.01: The hospital communicates information related to safety and quality to those who need it, including staff, licensed independent practitioners, patients, families, and external interested parties.
LD.03.01.01: Leaders create and maintain a culture of safety and quality throughout the hospital.
LD.04.04.05, EP 1: The leaders implement a hospital-wide patient safety program.
EC.04.01.01, EP 15: Every 12 months, the hospital evaluates each environment of care management plan, including a review of the plan’s objectives, scope, performance, and effectiveness.
LD.01.03.01, EP 6: The governing body works with the senior managers and leaders of the organized medical staff to annually evaluate the hospital’s performance in relation to its mission, vision, and goals.
LD.03.02.01: The hospital uses data and information to guide decisions and to understand variation in the performance of processes supporting safety and health.
LD.03.02.01, EP 7: Leaders evaluate how effectively data and information are used throughout the hospital.
OCCUPATIONAL SAFETY AND HEALTH MANAGEMENT Appendix A.2
Employee Involvement
OSHA Guidelines
JCAHO Requirements
Employees are trained for the tasks they will perform in support of the safety and health management system, such as conducting inspections, investigations, or audits.
HR.01.02.01, EP 1: The hospital defines staff qualifications specific to their job responsibilities.
HR.01.02.05, EP 3: The hospital verifies and documents that the applicant has the education and experience required by the job responsibilities.
HR.01.04.01: The hospital provides orientation to staff.
Employees receive feedback on any suggestions, ideas, or reports of hazards that they bring to management’s attention. All employees, including new hires, are notified about participation in the safety and health management system and employees’ rights under the OSH Act.
Employees and contractors demonstrate an understanding of the fundamental principles of the safety and health management system.
Contract employees are provided with safety and health protection equal in quality to that provided to employees. All contractors, whether regularly involved in routine site operations or engaged in temporary projects such as construction or repair, are required to follow the safety and health rules of the host. Employers have in place a documented oversight and management system covering applicable contractors.
HR.01.04.01, EP 2: The hospital orients its staff to the key safety content before staff provides care, treatment, and services.
HR.01.04.01, EP 3: The hospital orients staff on the following: Relevant hospital-wide and unit-specific policies and procedures. Completion of this orientation is documented.
HR.01.04.01, EP 4: The hospital orients staff on the following: Their specific job duties, including those related to infection prevention and control and assessing and managing pain. Completion of this orientation is documented.
HR.01.05.03, EP 8: Staff participate in education and training on fall reduction activities.
HR.01.06.01, EP 2: The hospital uses assessment methods to determine the individual’s competence in the skills being assessed.
EC.03.01.01, EP 1: Staff and licensed independent practitioners can describe or demonstrate methods for eliminating and minimizing physical risks in the environment of care.
LD.04.03.09: Care, treatment, and services provided through contractual agreement are provided safely and effectively.
Appendix A.3
Worksite Analysis
OSHA Guidelines
A baseline safety and industrial hygiene hazard analysis is conducted.
A hazard analysis of routine jobs, tasks, and processes is conducted.
A hazard analysis of any significant changes, including but not limited to non-routine tasks (such as those performed less than once a year), new processes, materials, equipment, and facilities, is conducted.
JCAHO Requirements
EC.02.01.01, EP 1: The hospital identifies safety and security risks associated with the environment of care that could affect patients, staff, and other people coming to the hospital’s facilities.
EC.02.02.01, EP 1: The hospital maintains a written, current inventory of hazardous materials and waste that it uses, stores, or generates.
EC.02.04.01, EP 2: The hospital maintains a written inventory of all medical equipment or a written inventory of selected equipment categorized by physical risk associated with use (including all life-support equipment) and equipment incident history.
EM.01.01.01, EP 2: The hospital conducts a hazard vulnerability analysis (HVA) to identify potential emergencies that could affect demand for the hospital’s services or its ability to provide those services, the likelihood of those events occurring, and the consequences of those events.
LD.04.04.05, EP 10: At least every 18 months, the hospital selects one high-risk process and conducts a proactive risk assessment.
EC.02.06.05: The hospital manages its environment during demolition, renovation, or new construction to reduce risk to those in the organization.
OSHA Guidelines
A pre-use analysis is conducted when considering new equipment, chemicals, facilities, or significantly different operations or procedures.
Hazard analyses performed for any significant changes or as a pre-use analysis are documented.
An established set of written procedures ensures routine self-inspections of the workplace and documentation of findings and corrections.
A reliable system is established that enables employees to notify appropriate management personnel in writing about conditions that appear hazardous, and to receive timely and appropriate responses.
A written industrial hygiene (IH) program, which documents procedures and methods for identification, analysis, and control of health hazards for prevention of occupational disease, is established.
Investigations of all accidents and near-misses are performed, and written reports of the investigations are maintained.
Trends are analyzed for information such as injury/illness history, hazards identified during inspections, employee reports of hazards, and accident and near-miss investigations.
JCAHO Requirements
EC.02.02.01: The hospital manages risks related to hazardous materials and waste.
EC.02.06.05: The hospital manages its environment during demolition, renovation, or new construction to reduce risk to those in the organization.
EM.01.01.01, EP 3: The hospital … prioritizes the potential emergencies identified in its hazard vulnerability analysis (HVA) and documents these priorities.
EC.02.04.01, EP 3: The hospital identifies the activities, in writing, for maintaining, inspecting, and testing for all medical equipment on the inventory.
EC.02.04.01, EP 4: The hospital identifies, in writing, frequencies for inspecting, testing, and maintaining medical equipment on the inventory based on criteria such as manufacturers’ recommendations, risk levels, or current hospital experience.
EC.04.01.01, EP 12: The hospital conducts environmental tours every six months in patient care areas to evaluate the effectiveness of previously implemented activities intended to minimize or eliminate environment of care risks.
EC.04.01.01, EP 13: The hospital conducts annual environmental tours in nonpatient care areas to evaluate the effectiveness of previously implemented activities intended to minimize or eliminate risks in the environment.
LD.03.01.01, EP 8: All individuals who work in the hospital, including staff and licensed independent practitioners, are able to openly discuss issues of safety and quality.
LD.04.04.05, EP 6: Leaders provide and encourage the use of systems for blame-free internal reporting of a system or process failure, or the results of a proactive risk assessment.
EC.04.01.01, EP 1: The hospital establishes a process(es) for continually monitoring, internally reporting, and investigating the following:
– Injuries to patients or others within the hospital’s facilities
– Occupational illnesses and staff injuries
– Incidents of damage to its property or the property of others
– Security incidents involving patients, staff, or others within its facilities
– Hazardous materials and waste spills and exposures
– Fire safety management problems, deficiencies, and failures
– Medical or laboratory equipment management problems, failures, and use errors
– Utility systems management problems, failures, or use errors
EC.01.01.01, EP 3: The hospital has a written plan for managing the following: The environmental safety of patients and everyone else who enters the hospital’s facilities.
IC.01.05.01, EP 2: The hospital’s infection prevention and control plan includes a written description of the activities, including surveillance, to minimize, reduce, or eliminate the risk of infection.
EC.02.04.01, EP 5: The hospital monitors and reports all incidents in which medical equipment is suspected in or attributed to the death, serious injury, or serious illness of any individual, as required by the Safe Medical Devices Act of 1990.
EC.04.01.01, EP 6: Based on its process(es), the hospital reports and investigates the following: Security incidents involving patients, staff, or others within its facilities.
EC.04.01.01, EP 8: Based on its process(es), the hospital reports and investigates the following: Hazardous materials and waste spills and exposures.
EC.04.01.01, EP 3: Based on its process(es), the hospital reports and investigates the following: Injuries to patients or others in the hospital’s facilities.
EC.04.01.01, EP 4: Based on its process(es), the hospital reports and investigates the following: Occupational illnesses and staff injuries.
LD.04.04.05, EP 3: The scope of the safety program includes the full range of safety issues, from potential or no-harm errors (sometimes referred to as near misses, close calls, or good catches) to hazardous conditions and sentinel events.
EC.04.01.03, EP 2: The hospital uses the results of data analysis to identify opportunities to resolve environmental safety issues.
LD.03.02.01, EP 5: The hospital uses data and information in decision making that supports the safety and quality of care, treatment, and services.
PI.02.01.01, EP 4: The hospital analyzes and compares internal data over time to identify levels of performance, patterns, trends, and variations.
Appendix A.4
Hazard Prevention and Control
OSHA Guidelines
Access to certified safety and health professionals and other licensed health care professionals is provided.
Types of hazards employees are exposed to, the severity of the hazards, and the risk the hazards pose to employees are all considered in determining methods of hazard prevention, elimination, and control.
The organization complies with any hazard control program required by an OSHA standard, such as PPE, Respiratory Protection, Lockout/Tagout, Confined Space Entry, Process Safety Management, or Bloodborne Pathogens.
Licensed health care professionals are available to assess employee health status for prevention, early recognition, and treatment of illness and injury.
A written preventive and predictive maintenance system is in place for monitoring and maintaining workplace equipment.
A documented system is in place to ensure that hazards identified by any means (self-inspections, accident investigations, employee hazard reports, preventive maintenance, injury/illness trends, etc.) are assigned to a responsible party and corrected in a timely fashion.
A documented disciplinary system is in place and includes enforcement of appropriate action for violations of the safety and health policies, procedures, and rules.
Written procedures for response to all types of emergencies (fire, chemical spill, accident, terrorist threat, natural disaster, etc.) on all shifts are established, follow OSHA standards, are communicated to all employees, and are practiced at least annually.
JCAHO Requirements
LD.03.06.01, EP 3: Leaders provide for a sufficient number and mix of individuals to support safe, quality care, treatment, and services.
EC.02.01.01, EP 3: The hospital takes action to minimize or eliminate identified safety and security risks in the physical environment.
EM.01.01.01, EP 5: The hospital uses its hazard vulnerability analysis as a basis for defining mitigation activities (that is, activities designed to reduce the risk of and potential damage from an emergency).
IC.01.02.01, EP 1: The hospital provides access to information needed to support the infection prevention and control program.
IM.03.01.01, EP 1: The hospital provides access to knowledge-based information resources 24 hours a day, 7 days a week.
LD.04.01.01, EP 2: The hospital provides care, treatment, and services in accordance with licensure requirements, laws, and rules and regulations.
EC.03.01.01, EP 1: Staff and licensed independent practitioners can describe or demonstrate methods for eliminating and minimizing physical risks in the environment of care.
EC.03.01.01, EP 2: Staff and licensed independent practitioners can describe or demonstrate actions to take in the event of an environment of care incident.
EC.04.01.01, EP 10: Based on its process(es), the hospital reports and investigates medical/laboratory equipment management problems, failures, and use errors.
EC.04.01.05, EP 1: The hospital takes action on the identified opportunities to resolve environmental safety issues.
EC.04.01.05, EP 3: The hospital reports performance improvement results to those responsible for analyzing environment of care issues.
HR.01.06.01, EP 15: The hospital takes action when a staff member’s competence does not meet expectations.
EM.02.01.01, EP 2: The hospital develops and maintains a written Emergency Operations Plan that describes the response procedures to follow when emergencies occur.
EC.02.02.01, EP 3: The hospital has written procedures, including the use of precautions and personal protective equipment, to follow in response to hazardous material and waste spills or exposures.
EC.02.03.01, EP 9: The hospital has a written fire response plan.
Appendix A.5
Safety and Health Training
OSHA Guidelines
Training is provided so that managers, supervisors, non-supervisory employees, and contractors are knowledgeable of the hazards in the workplace, how to recognize hazardous conditions, signs and symptoms of workplace-related illnesses, and safe work procedures.
Training required by OSHA standards is provided in accordance with the particular standard.
Managers and supervisors understand their safety and health responsibilities and how to carry them out effectively.
New employee orientation/training includes, at a minimum, discussion of hazards at the worksite, protective measures, emergency evacuation, and employee rights under the OSH Act.
JCAHO Requirements
HR.01.04.01: The hospital provides orientation to staff.
HR.01.05.03, EP 6: Staff participate in education and training that incorporates the skills of team communication, collaboration, and coordination of care.
HR.01.04.01, EP 3: The hospital orients staff on relevant hospital-wide and unit-specific policies and procedures.
HR.01.04.01, EP 2: The hospital orients its staff to the key safety content before staff provides care, treatment, and services.
EC.03.01.01, EP 1: Staff and licensed independent practitioners can describe or demonstrate methods for eliminating and minimizing physical risks in the environment of care.
HR.01.04.01: The hospital provides orientation to staff.
HR.01.05.03, EP 7: Staff participate in education and training that includes information about the need to report unanticipated adverse events and how to report these events.
EM.02.01.01, EP 7: The Emergency Operations Plan identifies alternative sites for care, treatment, and services that meet the needs of the hospital’s patients during emergencies.
OSHA Guidelines
Training is provided for all employees regarding their responsibilities for each type of emergency.
Persons responsible for conducting hazard analysis, including self-inspections, accident/incident investigations, and job hazard analysis, receive training to carry out these responsibilities.
Training attendance is documented and meets OSHA standards or, for non-OSHA-required training, is provided at adequate intervals.
Training curricula are up-to-date; are specific to worksite operations; are modified when needed to reflect changes and/or new workplace procedures, trends, hazards, and controls identified by hazard analysis; and are understandable by all employees.
Persons who have specific knowledge or expertise in the subject area conduct training.
Employees understand where personal protective equipment (PPE) is required, why it is required, its limitations, how to use it, and maintenance procedures.
JCAHO Requirements
EM.02.02.07, EP 2: The Emergency Operations Plan describes the roles and responsibilities of staff for communications, resources and assets, safety and security, utilities, and patient management during an emergency.
EM.02.02.07, EP 7: The hospital trains staff for their assigned emergency response roles.
HR.01.04.01, EP 4: The hospital orients staff on their specific job duties, including those related to infection prevention and control and assessing and managing pain.
HR.01.02.05, EP 3: The hospital verifies and documents that the applicant has the education and experience required by the job responsibilities.
HR.01.05.03, EP 1: Staff participate in ongoing education and training to maintain or increase their competency; staff participation is documented.
HR.01.05.03, EP 4: Staff participate in ongoing education and training whenever staff responsibilities change.
HR.01.05.03, EP 5: Staff participate in education and training that is specific to the needs of the patient population served by the hospital.
HR.01.06.01, EP 3: An individual with the educational background, experience, or knowledge related to the skills being reviewed assesses competence.
EC.02.02.01, EP 3: The hospital has written procedures, including the use of precautions and personal protective equipment, to follow in response to hazardous material and waste spills or exposures.
EM.01.01.01, EP 8: The hospital keeps a documented inventory of the resources and assets it has on site that may be needed during an emergency, including, but not limited to, personal protective equipment, water, fuel, and medical, surgical, and medication-related resources and assets.
IC.01.05.01, EP 7: The hospital has a method for communicating responsibilities about preventing and controlling infection to licensed independent practitioners, staff, visitors, patients, and families.
IC.02.01.01, EP 2: The hospital uses standard precautions, including the use of personal protective equipment, to reduce the risk of infection.
Appendix A.6
Annual Evaluation
OSHA Guidelines
A system and written procedures are in place to guide an annual evaluation of the safety and health management system.
The evaluation covers all elements and sub-elements of the safety and health management system.
The evaluation identifies the strengths and weaknesses of the safety and health management system, and opportunities for improvement.
Managers, qualified corporate staff, or outside experts conduct the evaluation with participation from employees.
JCAHO Requirements
EC.04.01.01, EP 15: Every 12 months, the hospital evaluates each environment of care management plan, including a review of the plan’s objectives, scope, performance, and effectiveness.
LD.04.04.05, EP 10: At least every 18 months, the hospital selects one high-risk process and conducts a proactive risk assessment.
PI.01.01.01: The hospital collects data to monitor its performance.
EC.04.01.03, EP 2: The hospital uses the results of data analysis to identify opportunities to resolve environmental safety issues.
PI.01.01.01, EP 8: The hospital uses the results of data analysis to identify improvement opportunities.
EC.04.01.03, EP 1: Representatives from clinical, administrative, and support services participate in the analysis of environment of care data.
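Several of the requirements quoted in this appendix carry explicit intervals. Environmental tours are required every six months in patient care areas (EC.04.01.01, EP 12) and annually in nonpatient care areas (EP 13). Each environment of care management plan is evaluated every 12 months (EP 15). A proactive risk assessment is required at least every 18 months (LD.04.04.05, EP 10). OSHA also calls for an annual evaluation of the safety and health management system.

The tracking sketch below is a minimal illustration. It assumes that each interval is counted from the date the activity was last completed and that "every N months" means N calendar months. Those counting rules, the activity labels, and the sample dates are illustrative assumptions, not accreditation guidance.

```python
"""Minimal sketch: flag recurring safety activities whose interval has lapsed.

Assumptions (not from the source): each interval is counted from the date the
activity was last completed, and "every N months" means N calendar months.
The activity labels are hypothetical keys for the requirements quoted above.
"""
import calendar
from datetime import date

# Maximum interval, in months, between completions of each activity.
INTERVAL_MONTHS = {
    "EC.04.01.01 EP 12 tour (patient care areas)": 6,
    "EC.04.01.01 EP 13 tour (nonpatient care areas)": 12,
    "EC.04.01.01 EP 15 EOC plan evaluation": 12,
    "LD.04.04.05 EP 10 proactive risk assessment": 18,
    "OSHA annual SHMS evaluation": 12,
}


def add_months(d: date, months: int) -> date:
    """Shift d by whole months, clamping to the last day of the target month."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def overdue(last_completed: dict[str, date], today: date) -> list[str]:
    """Return activities whose due date (last completion + interval) is before today."""
    return [
        name
        for name, done in last_completed.items()
        if add_months(done, INTERVAL_MONTHS[name]) < today
    ]


if __name__ == "__main__":
    # Hypothetical completion dates.
    log = {
        "EC.04.01.01 EP 12 tour (patient care areas)": date(2020, 5, 1),
        "LD.04.04.05 EP 10 proactive risk assessment": date(2019, 1, 10),
    }
    for name in overdue(log, today=date(2020, 9, 1)):
        print("overdue:", name)
```

Clamping to the end of the month keeps a due date valid. For example, six months after August 31 falls on the last day of February.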
REFERENCES
AHRQ. (2016). PSI 90 Fact Sheet. Agency for Healthcare Research and Quality. https://www.qualityindicators.ahrq.gov/News/PSI90_Factsheet_FAQ.pdf
Boyle, T. (2019). Health and safety: Risk management (5th ed.). London: Taylor & Francis.
BSI. (2007). Occupational health and safety management systems—specification (BS OHSAS 18001:2007). BSI.
Cherniak, M. (1989). The Hawk’s Nest incident: America’s worst industrial disaster. New Haven, CT: Yale University Press.
Cooke, J., Brady, O., & Jaffer, A. K. (2019, May 2). Dual healthcare transformation through respect, high reliability, data transparency, accountability, and service lines. https://www.beckershospitalreview.com/pdfs/May2nd/Track%20B/Jaffer_Cooke_Brady_Track%20B_1125am.pdf
Fernandes, D. (2017, May 12). Liberty Mutual closing its research unit. Boston Globe. https://libertymutualresearch.org/boston-globe/
Gandhi, T. K., Feeley, D., & Schummers, D. (2020). Zero Harm in health care. NEJM Catalyst, 1(2). https://doi.org/10.1056/CAT.19.1137
Greenstone, E. S. (1975). Farmworkers in jeopardy: OSHA, EPA, and the pesticide hazard. Ecology Law Quarterly, 51, 69–137.
Greiner, D., & Kranig, A. (1998). Prevention, rehabilitation and compensation in the German accident insurance system. In J. M. Stellman (Ed.), ILO encyclopaedia of occupational health and safety (4th ed., Vols. 1–4). Geneva: International Labour Office.
Hamilton, A. (1985). Exploring the dangerous trades: The autobiography of Alice Hamilton, M.D. (Reprint, first published 1942). Boston: Northeastern University Press.
Hay, A. (1982). The chemical scythe: Lessons of 2,4,5-T, and dioxin. New York: Plenum Press.
Hunter, D. (1978). The diseases of occupations (6th ed.). London: Hodder & Stoughton.
Institute of Medicine. (1999). To err is human: Building a safer health system. https://doi.org/10.17226/9728
Institute of Medicine Committee on Regulating Occupational Exposure to Tuberculosis. (2001). OSHA in a health care context. In Tuberculosis in the workplace. New York: National Academies Press (US). https://www.ncbi.nlm.nih.gov/books/NBK222461
International Labour Office. (n.d.). Global trends on occupational accidents and diseases. Retrieved June 19, 2020, from https://www.ilo.org/legacy/english/osh/en/story_content/external_files/fs_st_1-ILO_5_en.pdf
International Labour Office. (2009). Guidelines on occupational safety and health management systems, ILO-OSH 2001. Geneva: International Labour Office.
International Organization for Standardization (ISO). (2018). ISO 45001:2018, Occupational health and safety management systems—Requirements with guidance for use. https://www.iso.org/files/live/sites/isoorg/files/archive/pdf/en/iso_45001_briefing_note.pdf
Lancianese, A. (2019, January 20). Before Black Lung, The Hawks Nest Tunnel Disaster Killed Hundreds. In Investigations. NPR. https://www.npr.org/2019/01/20/685821214/before-black-lung-the-hawks-nest-tunnel-disaster-killed-hundreds
Leigh, J. P., & Robbins, J. A. (2004). Occupational disease and workers’ compensation: Coverage, costs, and consequences. The Milbank Quarterly, 82(4), 689–721. PubMed. https://doi.org/10.1111/j.0887-378X.2004.00328.x
McKernan, L., & Seaton, M. (2014, May). The banding marches on. The Synergist, 44–46.
Montalvo, I. (2007, June 19). American Nurses Association Nursing Sensitive Measures: National Database of Nursing Quality Indicators (NDNQI®). Washington, DC: National Committee on Vital Health and Statistics Quality Workgroup (QWG). https://www.ncvhs.hhs.gov/wp-content/uploads/2014/05/070619p8.pdf
National Institute for Occupational Safety and Health (NIOSH). (n.d.). Total Worker Health. Retrieved June 14, 2020, from https://www.cdc.gov/niosh/TWH/
Occupational Safety and Health Administration. (n.d.-a). General facts: Inspections. Retrieved June 21, 2020, from https://www.osha.gov/OshDoc/data_General_Facts/factsheet-inspections.pdf
Occupational Safety and Health Administration. (n.d.-b). Facts on aligning the hazard communication standard to the GHS. Retrieved June 19, 2020, from https://www.osha.gov/as/opa/facts-hcs-ghs.html
Occupational Safety and Health Administration. (n.d.-c). Transitioning to safer chemicals: A toolkit for employers and workers. https://www.osha.gov/dsg/safer_chemicals
Occupational Safety and Health Administration. (n.d.-d). Respiratory protection. https://www.osha.gov/respiratory-protection
Office of Occupational Statistics and Employment. (2020). Occupational outlook handbook. Washington, DC: Bureau of Labor Statistics. https://www.bls.gov/ooh/healthcare/home.htm
OSHA. (2008). OSHA Directive No. CSP-03-01-003, Chapter III, Requirements for star, merit, resident contractor, construction industry, and federal agency worksites. https://www.osha.gov/OshDoc/Directive_pdf/CSP_03-01-003.pdf
OSHA. (2012). Safety and health management systems and Joint Commission Standards: A comparison. https://www.osha.gov/dsg/hospitals/documents/2.2_SHMS-JCAHO_comparison_508.pdf
Pollack, E. S., & Keimig, D. F. (Eds.). (1987). Counting injuries and illnesses in the workplace: Proposals for a better system. New York: National Research Council, National Academy Press.
Sec. 299A.41 MN Statutes, Minnesota Legislature (testimony of Office of the Revisor of Statutes). Retrieved June 22, 2020, from https://www.revisor.mn.gov/statutes/cite/299A.41
Sinclair, U. (1906). The jungle. New York: Doubleday.
Stellman, J. M. (1977). Women’s work, women’s health: Myths and realities. New York: Pantheon.
Tamers, S. L., Goetzel, R., Kelly, K. M., Luckhaupt, S., Nigam, J., Pronk, … Sorensen, G. (2018). Research methodologies for Total Worker Health®: Proceedings from a workshop. Journal of Occupational and Environmental Medicine, 60(11), 968–978. PubMed. https://doi.org/10.1097/JOM.0000000000001404
US Bureau of Labor Statistics. (n.d.). Handbook of methods. Washington, DC: U.S. Department of Labor, Bureau of Labor Statistics. Retrieved June 15, 2020, from https://www.bls.gov/opub/hom/home.htm
US Department of Labor. (n.d.). Workers’ compensation. Retrieved June 20, 2020, from https://www.dol.gov/general/topic/workcomp
Utterback, D. F., Meyers, A. R., & Wurzelbacher, S. J. (2014). Workers’ compensation insurance: A primer for public health. DHHS Publication No. (NIOSH). https://stacks.cdc.gov/view/cdc/21466
Wang, Z., Walker, G. W., Muir, D. C. G., & Nagatani-Yoshida, K. (2020). Toward a global understanding of chemical pollution: A first comprehensive analysis of national and regional chemical inventories. Environmental Science & Technology, 54(5), 2575–2584. https://doi.org/10.1021/acs.est.9b06379
Wedell-Wedellsborg, T. (2017, January 1). Are you solving the right problems? Harvard Business Review, January–February 2017. https://hbr.org/2017/01/are-you-solving-the-right-problems
CHAPTER 22
MANAGING LOW-BACK DISORDER RISK IN THE WORKPLACE
William S. Marras, Biodynamics Laboratory, The Ohio State University, Columbus, Ohio
Waldemar Karwowski, University of Central Florida, Orlando, Florida
1 INTRODUCTION
2 MAGNITUDE OF LOW-BACK PAIN PROBLEM AT WORK
3 EPIDEMIOLOGY OF WORK RISK FACTORS
4 OCCUPATIONAL BIOMECHANICS LOGIC
5 BIOMECHANICS OF RISK
5.1 Relationship Between Tissue Stimulation and Pain
5.2 Functional Lumbar Spinal Unit Tolerance Limits
5.3 Ligament Tolerance Limits
5.4 Facet Joint Tolerance
5.5 Adaptation
5.6 Psychophysical Limits as a Tolerance Threshold
5.7 Physiological Tolerance Limits
5.8 Psychosocial Pathways
5.9 Spine-Loading Assessment
6 ASSESSMENT METHODS AND IDENTIFICATION OF LOW-BACK DISORDER RISK AT WORK
6.1 3DSSPP
6.2 Job Demand Index
6.3 NIOSH Lifting Guide and Revised Lifting Equation
6.4 NIOSH Lifting Index and Risk of Exposure to Low-Back Disorders
6.5 Video-Based Biomechanical Models
7 PRACTICAL INDUSTRY GUIDELINES
7.1 ACGIH Guidelines: Lifting Threshold Limit Values (LTVs)
7.2 Guidelines for One-Handed and Two-Handed Manual Lifting
7.3 Guidelines for Manual Pushing and Pulling Tasks
7.4 Guidelines for Lifting in Confined Spaces
7.5 Workplace Design Guidelines for Asymptomatic vs. Low-Back-Injured Workers
7.6 Exoskeletons and Wearable Robotics
8 PROCESS OF IMPLEMENTING ERGONOMIC CHANGE
9 CONCLUSION
REFERENCES

1 INTRODUCTION
Low-back disorders (LBDs) resulting in low-back pain (LBP) are a common experience in life. Although LBDs appear to occur more frequently as people age, they are not an inevitable result of aging. There is also abundant information about the work relatedness of LBDs: both the physical and the organizational/psychosocial aspects of work have independently been associated with higher rates of LBDs. Superficially, these findings may appear to present a paradox regarding LBD causality, and there has been significant debate about the contribution of work factors compared with individual factors in defining risk. However, for most of us these factors coexist and are, for the most part, inextricably linked. When one steps away from the opinionated motivations behind many of the causal claims, it is clear that aging has a natural degenerative impact on the spine that is capable of leading to pain in some people, and that this degenerative process can be greatly accelerated by work exposure, leading to a greater incidence of LBDs in the workplace.
The above suggests that one can never totally eliminate the risk of LBD in the workplace, since a natural or base rate of LBDs would be expected to occur due to individual factors such as heredity and aging. However, through the proper design of work it is possible to minimize the additional (and often substantial) risk contributed by workplace risk factors. This chapter therefore focuses primarily on what we now know about the causal factors leading to LBDs and LBP, and on how the workplace can be assessed and designed to minimize its contribution to this additional risk. In other words, the chapter concentrates on the preventive aspects of workplace design from an ergonomics standpoint.

The science of ergonomics is concerned primarily with prevention (Karwowski, 2005; Karwowski & Marras, 2003; Marras & Karwowski, 2006). Many large and small companies have permanent ergonomic programs (processes) in place and have successfully controlled the risk as well as the costs associated with musculoskeletal disorders (GAO, Government Accountability Office, 1997). Ergonomic approaches attempt to alter the work environment with the objective of controlling risk exposure and optimizing efficiency and productivity. Two categories of risk control (interventions) are used in the workplace. The first category involves engineering controls, which physically change the orientation of the work environment relative to the worker. Engineering controls alter the workplace to create a “smart” work environment in which the risk has been minimized, so that the work–person interface is optimal for productivity and minimal for risk. The second category involves administrative controls, which are employed when it is not possible to provide engineering controls. It should be understood that administrative controls do not eliminate the risk. They attempt to control risk by managing the time of exposure to the risk in the workplace and, thus, require active management. Administrative controls often consist of rotating workers among tasks, with non-risk-exposure tasks scheduled so that workers have adequate time to recover from exposure to risk. While ergonomics typically addresses all aspects of musculoskeletal disorders as well as performance issues, this chapter is limited to the issues and principles associated with the prevention of LBDs due to repetitive physical work (not including vibration).
2 MAGNITUDE OF LOW-BACK PAIN PROBLEM AT WORK
Since most people work, workplace risk factors and individual risk factors are difficult to separate (de Campos et al., 2020; Foster et al., 2018; Hartvigsen et al., 2004; Hoy et al., 2010; Lee et al., 2001; National Research Council, 2001). Nonetheless, the magnitude of LBDs in the workplace can be appreciated via surveys of working populations. In the United States, back disorders are associated with more days away from work than disorders of any other part of the body (Jacobs et al., 2008; National Institute for Occupational Safety and Health (NIOSH), 2000). A study of 17,000 working-age men and women in Sweden (Vingård et al., 2002) indicated that 5% of workers sought care for a new LBP episode over a three-year period. In addition, the authors reported that many of these LBP cases became chronic. Assessment of information gathered in the National Health Interview Survey (NHIS) found that back pain accounts for about one-quarter of the workers' compensation claims in the United States (Guo et al., 1995). About two-thirds of these LBP cases were related to occupational activities. The prevalence of lost workdays due to back pain was found to be 4.6% (Guo, Tanaka, Halperin, & Cameron, 1999). Many studies have concluded that people with physically demanding jobs, people with physical and mental comorbidities, smokers, and obese individuals have the greatest risk of reporting LBP (Buchbinder et al., 2018; Hartvigsen et al., 2018; Hoogendoorn et al., 2000; Steffens et al., 2016). Table 1 presents the main facts relevant to LBP incidence worldwide.
The efforts reported through the Bone and Joint Decade study (Jacobs et al., 2008) have evaluated the burden of low-back problems on U.S. workers. This assessment reports that about 32% of the population reports pain that limits their ability to work and 11% report pain that prevents them from doing any work. In these categories, 62% and 63% of workers, respectively, report low-back dysfunction as the factor responsible for their work limitations. When work limitation due to back pain is considered as a function of gender, females report slightly more back pain than males. As shown in Figure 1, back pain that limits work or prevents one from working occurs more frequently with age up to 65–74 years and then decreases slightly at 75 years and over. Certain occupations have also reported significantly greater rates of LBP. Reported risk was greatest for construction laborers (prevalence 22.6%), followed by nursing aides (19.8%) (Guo et al., 1995). However, a literature review (Hignett, 1996) concluded that the annual LBP prevalence in nurses is as high as 40–50%. Figure 2 summarizes the distribution of lost-time back cases in private industry as a function of the type of work and the source of the injury, based upon a NIOSH analysis of work-related LBDs (NIOSH, 2000).
Figure 2 suggests that the service industry, followed by manufacturing, accounts for nearly half of all occupationally related LBD cases. It also indicates that handling of containers and worker motion or position assumed during work are very often associated with LBDs in industry. Therefore, these data strongly suggest that occupational factors can be related to the risk of LBDs. Recently, Ferguson et al. (2019) studied the prevalence of low back pain, of seeking medical care for low back pain, and of lost time due to low back pain among manual material handling workers in the United States. The study analyzed low back health reported over the previous 12 months, including: (1) LBP lasting 7 days; (2) seeking medical care for LBP; and (3) taking time off work due to LBP.
Table 1 Basic Information about Low Back Pain (LBP) and Its Worldwide Impact on the Workplace
• LBP is an extremely common symptom in populations worldwide and occurs in all age groups.
• LBP was responsible for over 60 million disability-adjusted life-years in 2015, an increase of 54% since 1990, with the biggest increase seen in low-income and middle-income countries.
• Disability from LBP is highest in working age groups worldwide, which is especially concerning in low-income and middle-income countries, where informal employment is common and possibilities for job modification are limited.
• Most episodes of LBP are short-lasting with little or no consequence, but recurrent episodes are common, and LBP is increasingly understood as a long-lasting condition with a variable course rather than as episodes of unrelated occurrences.
• LBP is a complex condition with multiple contributors to both the pain and the associated disability, including physical/biomechanical factors, psychological factors, social factors, biophysical factors, comorbidities, and pain-processing mechanisms.
• For the vast majority of people with LBP, it is currently not possible to accurately identify the specific nociceptive source.
• Lifestyle factors that relate to poorer general health, such as smoking, obesity, and low levels of physical activity, are also associated with LBP episodes.
• Costs associated with health care and work disability attributed to LBP vary considerably between countries and are influenced by social norms, health-care approaches, and legislation.
• The global burden of LBP is projected to increase even further in the coming decades, particularly in low-income and middle-income countries.
Source: Modified from Hartvigsen et al. (2018).
MANAGING LOW-BACK DISORDER RISK IN THE WORKPLACE
Figure 1 Age distribution and its relationship to work limitations. [Chart: percentage of respondents reporting back pain that limits the amount of work they can do and pain that keeps them from working, by age group (18–44, 45–64, 65–74, and 75 and over, years).]
Figure 2 (a) Number and distribution of back cases with days away from work in private industry by industry division in 1997 (NIOSH, 2000). (b) Number and distribution of back cases with days away from work in private industry by source of the disorder in 1997 (Jacobs et al., 2008). [Panel (a): services 28%, manufacturing 21%, retail trade 16%, transportation and public utilities 13%, construction 9%, other 13%. Panel (b): containers 26%, worker motion or position 17%, parts and materials 12%, floors, walkways, or ground surfaces 10%, health care patient 10%, other 25%.]
The results showed that during the 12-month period, the prevalence of low back pain lasting 7 days among the nearly 2000 US manual material handling workers was 25%. Furthermore, 14% of these workers sought medical care, and 10% reported lost time due to LBP. There were no statistically significant differences in gender, age, or weight between cases and non-cases for any prevalence measure.
3 EPIDEMIOLOGY OF WORK RISK FACTORS
Numerous literature reviews have endeavored to identify specific risk factors that may increase the risk of LBDs in the workplace. One of the first attempts at consolidating this information was performed by NIOSH. In this critical review of the epidemiological evidence associated with musculoskeletal disorders (Putz-Anderson et al., 1997), five categories of risk factors were evaluated. This evaluation suggested that strong evidence existed for an association between LBDs and lifting/forceful movements and between LBDs and whole-body vibration. In addition, the evaluation concluded that there was significant evidence establishing associations between heavy physical work and awkward postures and back problems. However, insufficient evidence was available to draw any conclusions about the association between static work postures and LBD risk.
Independent, methodologically rigorous literature reviews by Hoogendoorn and colleagues (1999) supported these conclusions. Specifically, they concluded that manual materials handling, bending and twisting, and whole-body vibration were all significant risk factors for back pain. Numerous investigations have attempted to assess the potential dose–response relationship between work risk factors and LBP. In particular, studies have examined whether an occupational "cumulative load" relationship with LBD exists. Two studies (Kumar, 1990; Norman et al., 1998) suggested the existence of such a cumulative load–LBD relationship in the workplace, although Videman et al. (1990) suggested that the relationship might not be linear. Videman et al. found that the relationship between history of physical loading due to occupation (cumulative load) and history of LBP was "J shaped," with sedentary jobs associated with moderate levels of risk, heavy work associated with the greatest risk, and moderate exposure to loading associated with the lowest risk (Figure 3).
Figure 3 Relationship between risk of LBP and work intensity exposure. [Plot: risk of low-back pain (vertical axis) versus work load (horizontal axis), from sedentary to heavy.]
Seidler and colleagues (2001) have suggested a multifactor relationship with risk, in that the combination of occupational lifting, trunk flexion, and the duration of these activities significantly increased risk. Several studies have been able to identify risk with high levels of sensitivity and specificity when continuous dynamic biomechanical measures are employed (Marras, Lavender, Ferguson, et al., 2010a, 2010b). These efforts indicated that collective exposure to dynamic sagittal bending moments above 49 N-m, lateral trunk velocities greater than 84.1 deg/s, and a moment occurring late in the lift (more than 47.6% of the way through the lift duration) yielded a sensitivity of 85% and a specificity of 87.5% in identifying jobs resulting in reduced spine function.
Studies have also implicated psychosocial factors in the workplace as work risk factors for LBDs (Bigos et al., 1991; Bongers, de Winter, Kompier, & Hildebrandt, 1993; Gray, Adefolarin, & Howe, 2011; Hartvigsen et al., 2004; Hoogendoorn et al., 2000; Iles, Davidson, & Taylor, 2008; Karasek et al., 1998; Schultz et al., 2004; Serranheira et al., 2020; van Poppel et al., 1998; Wai et al., 2010). These studies have indicated that monotonous work, high perceived workload, time pressure, low job satisfaction, and lack of social support were all related to LBD risk. Yet the specific relationship with LBD remains unclear. Davis and Heaney (2000) found that the impact of psychosocial factors was diminished, although still significant, once biomechanical factors were accounted for in the study designs. Several secondary prevention investigations of LBDs have begun to explore the interaction between LBDs, physical factors, and psychosocial factors. Frank and colleagues (1996) as well as Waddell (1992, 2004) have concluded that much of LBP treatment is multidimensional. Primary prevention epidemiological studies have indicated that multiple categories of risk, such as physical stressors and psychosocial factors, play a role in LBD risk (Krause et al., 1998). Other studies have reported that low social support at the workplace and bending at work were strongly associated with extended work absences due to LBP (Bento et al., 2020; Dueñas et al., 2020; McKillop et al., 2017; Tubach et al., 2002).
Perhaps the most comprehensive review of the epidemiological literature was performed by the National Research Council/Institute of Medicine (NRC, 2001). This assessment concluded that there is a clear relationship between LBDs and the physical load imposed by manual material handling, frequent bending and twisting, physically heavy work, and whole-body vibration. Using the concept of attributable risk (attributable fraction), this analysis estimated the portion of LBP that would have been avoided if workers had not been exposed to specific risk factors. As indicated in Table 2, the vast majority of high-quality epidemiological studies have associated LBDs with these risk factors, and as much as two-thirds of the risk can be attributed to materials handling activities. It was concluded that preventive measures may reduce exposure to risk factors and reduce the occurrence of back problems.
Table 2 Summary of Epidemiological Evidence with Risk Estimates (Attributable Fraction) of Associations with Work-Related Factors Associated with LBDs
Work-related risk factor: attributable fraction (%) range
Manual material handling: 11–66
Frequent bending and twisting: 19–57
Heavy physical load: 31–58
Static work posture: 14–32
Repetitive movements: 41
Whole-body vibration: 18–80
Source: NRC (2001).
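The text does not say exactly how NRC (2001) derived these attributable fractions. As a hedged illustration only, the sketch below uses Levin's formula for the population attributable fraction, a standard epidemiological definition: AF = p(RR − 1) / [1 + p(RR − 1)], where p is the proportion of the population exposed and RR is the relative risk of LBP among exposed workers. The function name and the example values are hypothetical and are not taken from Table 2.

```python
def attributable_fraction(exposed_proportion: float, relative_risk: float) -> float:
    """Population attributable fraction via Levin's formula (illustrative sketch).

    exposed_proportion: share of the population exposed to the risk factor (0 to 1).
    relative_risk: LBP risk in exposed workers divided by risk in unexposed workers.
    Returns the share of all cases that would be avoided if the exposure were
    removed, assuming the association is causal and unconfounded.
    """
    if not 0.0 <= exposed_proportion <= 1.0:
        raise ValueError("exposed_proportion must be between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = exposed_proportion * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical example: half of a workforce handles materials, relative risk of 2.
af = attributable_fraction(0.5, 2.0)
print(f"Attributable fraction: {af:.0%}")  # Attributable fraction: 33%
```

Because the fraction depends on both how common the exposure is and how strongly it raises risk, the same factor can plausibly yield different fractions in different study populations, which is consistent with the table reporting ranges rather than single values.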
4 OCCUPATIONAL BIOMECHANICS LOGIC
While epidemiological findings help us understand which exposure factors could be associated with work-related LBDs, the literature is problematic in that it cannot prescribe an optimal level of exposure that minimizes risk. The previous section concluded that moderate levels of exposure are least risky for LBDs; however, we do not know precisely what constitutes a moderate level of exposure. The National Research Council/Institute of Medicine's (NRC, 2001) review of epidemiological evidence and LBDs states that "epidemiologic evidence itself is not specific enough to provide detailed, quantitative guidelines for design of the workplace, job, or task." This lack of specificity results from a dearth of continuous exposure measures. Most epidemiological studies have documented workplace exposures in a binary fashion, recording only whether a specific threshold of exposure has been exceeded. For example, many studies document whether or not workers lift more than 25 lb. Without continuous measures, it is impossible to ascertain the specific "levels" of exposure that would be associated with an increased risk of LBDs (NRC, 2001). In addition, from a biomechanical standpoint, we know that risk is a much more complex issue. We need to understand the load origin, in terms of distance from the body and height off the floor, as well as the load destination, if we are to understand the forces imposed on the body by the lifting task. Risk is most likely multidimensional. Hence, in order to understand more fully "how much exposure is too much exposure," it is necessary to understand how work-related factors interact and lead to LBDs. Causal pathways are therefore addressed through biomechanical and ergonomic analyses. Taken as a whole, the biomechanical literature provides the needed specificity of exposure and a promising approach to controlling LBD risk in the workplace.
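To make the contrast between a binary exposure flag and continuous, multidimensional measures concrete, the sketch below applies the three dynamic thresholds reported by Marras et al. (2010a, 2010b) in Section 3 and computes sensitivity and specificity against known outcomes. The rule that a job is flagged only when all three thresholds are exceeded is an assumption made for illustration; the published decision logic may differ. The names `LiftExposure`, `binary_flag`, `flags_high_risk`, and `sensitivity_specificity` are hypothetical, and the job data are invented.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Thresholds quoted in the text (Marras et al., 2010a, 2010b).
SAGITTAL_MOMENT_LIMIT_NM = 49.0       # dynamic sagittal bending moment, N-m
LATERAL_VELOCITY_LIMIT_DEG_S = 84.1   # lateral trunk velocity, deg/s
MOMENT_TIMING_LIMIT_PCT = 47.6        # point in the lift (% of duration) at which the moment occurs


@dataclass
class LiftExposure:
    """Continuous dynamic measures for one job (hypothetical structure)."""
    sagittal_moment_nm: float
    lateral_velocity_deg_s: float
    moment_timing_pct: float


def binary_flag(load_lb: float, threshold_lb: float = 25.0) -> bool:
    """The kind of yes/no exposure flag used in many epidemiological studies."""
    return load_lb > threshold_lb


def flags_high_risk(job: LiftExposure) -> bool:
    """Assumed rule: flag the job only if all three thresholds are exceeded."""
    return (
        job.sagittal_moment_nm > SAGITTAL_MOMENT_LIMIT_NM
        and job.lateral_velocity_deg_s > LATERAL_VELOCITY_LIMIT_DEG_S
        and job.moment_timing_pct > MOMENT_TIMING_LIMIT_PCT
    )


def sensitivity_specificity(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative outcome")
    return tp / (tp + fn), tn / (tn + fp)


# Invented jobs; an outcome of True means the job was linked to reduced spine function.
jobs = [
    LiftExposure(60.0, 90.0, 55.0),
    LiftExposure(60.0, 70.0, 55.0),
    LiftExposure(30.0, 95.0, 60.0),
    LiftExposure(80.0, 100.0, 50.0),
]
outcomes = [True, True, False, False]
predicted = [flags_high_risk(job) for job in jobs]
print(sensitivity_specificity(predicted, outcomes))  # (0.5, 0.5)
```

Two lifts of the same 30 lb box receive the same `binary_flag` result, yet they can differ in moment, trunk velocity, and timing depending on where the load starts and ends, which is the information the continuous measures retain.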
Biomechanical logic provides a structure that helps us understand the mechanisms that might affect the development of an LBD (Chaffin et al., 1977; Gatchel & Schultz, 2014; Marras, 2000, 2012; Urban & Fairbank, 2020; Wu et al., 2020). At the center of this logic is the notion that risk can be defined by comparing the load imposed upon a structure with the tolerance of that same structure. As shown in Figure 4a, McGill (1997) suggests that, during work, the structures and tissues of the spine undergo a loading pattern with each repeated job cycle. When the magnitude of the load imposed upon a structure or tissue exceeds the structural tolerance limit, tissue damage occurs. The tissue damage might be capable of setting off the sequence of events that could lead to an LBD. With this logic, if the magnitude of the imposed load is below the structural tolerance, the task can be considered free of risk to the tissue. The distance between the structure loading and the tolerance can be thought of as a safety margin. On the other hand, if the load exceeds the tolerance, significant risk is present.
Biomechanical reasoning can also be employed to describe the processes believed to be at play during cumulative trauma exposure. When the work is performed repeatedly, the loading pattern would be expected to remain relatively constant, whereas with overuse the tolerance limit would be expected to degrade over time (Figure 4b). This process makes it more probable that the tissue load will exceed the tissue tolerance and trigger a potential disorder.
Figure 4 Biomechanical load–tolerance relationships. (a) When the tolerance exceeds the load, the situation is considered safe, with the distance between the two benchmarks considered a safety margin. (b) Cumulative trauma occurs when the tolerance decreases over time. [Plots of spinal load versus time showing a repeated loading pattern beneath a tolerance level that is constant in (a) and declining in (b).]
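The load–tolerance logic and the cumulative-trauma case of Figure 4b can be sketched numerically. In this minimal sketch, the load per job cycle is held constant and tolerance is assumed to shrink by a fixed fraction after each cycle with no recovery. The geometric decay model, the rate, the example forces, and the function names are hypothetical illustrations, not a validated tissue model.

```python
from typing import Optional


def safety_margin(load_n: float, tolerance_n: float) -> float:
    """Distance between tolerance and load; negative means the load exceeds tolerance."""
    return tolerance_n - load_n


def first_cycle_exceeding_tolerance(
    load_n: float,
    initial_tolerance_n: float,
    loss_per_cycle: float,
    max_cycles: int,
) -> Optional[int]:
    """Return the first cycle (0-based) in which load exceeds tolerance, or None.

    Assumes a constant load per cycle and a tolerance that shrinks geometrically
    by the hypothetical fraction `loss_per_cycle` after each cycle.
    """
    if not 0.0 <= loss_per_cycle < 1.0:
        raise ValueError("loss_per_cycle must be in [0, 1)")
    tolerance = initial_tolerance_n
    for cycle in range(max_cycles):
        if load_n > tolerance:
            return cycle
        tolerance *= 1.0 - loss_per_cycle
    return None


# Hypothetical numbers: a 3000 N load against a tolerance that starts at 4000 N.
print(safety_margin(3000.0, 4000.0))                               # 1000.0
print(first_cycle_exceeding_tolerance(3000.0, 4000.0, 0.10, 100))  # 3
print(first_cycle_exceeding_tolerance(3000.0, 4000.0, 0.0, 100))   # None
```

The 10% loss per cycle is deliberately exaggerated so the effect shows within a few cycles; the point is only that a task with a comfortable initial safety margin can still end up with load exceeding tolerance once tolerance declines.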
5 BIOMECHANICS OF RISK
There are numerous pathways to pain perception associated with LBDs. These pain pathways are the key to understanding how tissue loading results in LBP. In addition, if one appreciates how pain is related to the factors associated with tissue loading, one can use this knowledge to minimize the exacerbation of pain through workplace design. Thus, this knowledge forms the basis of ergonomics thinking. For ergonomic purposes, the limit above which a pain pathway is initiated can be treated quantitatively as a tolerance limit. Although these pathways have not been explicitly defined, designing tasks relative to these general principles is appealing, since they represent biologically plausible mechanisms that are consistent with the injury associations derived from the epidemiological literature. Three general pain pathways are believed to be present for the spine that may affect the design of the workplace. These pathways are related to: (1) structural and tissue stimulation; (2) physiological limits; and (3) psychophysical acceptance. Each of these pathways is expected to have a different tolerance to the mechanical loading of the tissue. Thus, in order to design a workplace optimally, one must configure the specific tasks so that the ultimate tolerance within each category is not exceeded.
5.1 Relationship Between Tissue Stimulation and Pain
There are several structures in the back that, when stimulated, are capable of initiating pain perception. Both cellular and neural mechanisms can initiate and exacerbate pain perception. Several investigations have described the neurophysiological and neuroanatomical origins of back pain (Bogduk, 1995; Cavanaugh, 1995; Cavanaugh et al., 1997; Kallakuri et al., 1998; Siddall & Cousins, 1997b). These pathways involve the application of force or pressure on a structure, which can directly stimulate pain receptors and can trigger the release of pain-stimulating chemicals. Pain pathways in the low back have been identified for pain originating from the facet joints, the disc, and the longitudinal ligaments, as well as for sciatica. Facet joint pain is believed to be associated with the distribution of small nerve fibers and endings in the lumbar facet joint, nerves containing substance P (a pain-enhancing biochemical), high-threshold mechanoreceptors in the facet joint capsule, and sensitization and excitation of nerves in the facet joint and surrounding muscle when the nerves are exposed to inflammatory biochemicals (Dwyer, April, & Bogduk, 1990; Özaktay, Cavanaugh, Blagoev, & King, 1995; Yamashita et al., 1996). The pathway for disc pain is believed to be activated through an extensive distribution of small nerve fibers and free nerve endings in the superficial annulus of the disc, as well as small fibers and free nerve endings in the adjacent longitudinal ligaments (Bogduk, 1991, 1995; Cavanaugh et al., 1995; Kallakuri et al., 1998). Sciatic pain is thought to be associated with mechanical stimulation of some of the spine structures. Moderate pressure placed on the dorsal root ganglia can result in vigorous and long-lasting excitatory discharges that could easily explain sciatica. In addition, sciatica might be explained through excitation of the dorsal root fibers when the ganglia are exposed to the nucleus pulposus. Stimulation and nerve function loss in nerve roots exposed to phospholipase A2 could also explain the pain associated with sciatica (Cavanaugh et al., 1997; Chen et al., 1997; Özaktay, Kallakuri, & Cavanaugh, 1998). Studies have demonstrated the importance of proinflammatory agents such as tumor necrosis factor alpha (TNFα) and interleukin-1 (IL-1) (Dinarello, 2000) in the development of pain. Proinflammatory agents are believed to upregulate vulnerability to inflammation under certain conditions and set the stage for pain perception. Thus, it is thought that mechanical stimulation of tissues can initiate this sequence of events and thereby become the initiator of pain.
It may be possible to consider the role of these agents in a load–tolerance model in which tolerance is the point at which these agents are upregulated. A preliminary study (Yang et al., 2011) has demonstrated that loads on spine structures due to occupational tasks are capable of initiating such a chemical reaction. This body of work provides a framework for a logical link between the mechanical stimulation of spinal tissues and structures and the sensation of LBP, which is the foundation of occupational biomechanics and ergonomics.
5.2 Functional Lumbar Spinal Unit Tolerance Limits
Individual structure tolerances within lumbar functional spinal units are often considered, collectively, as part of the structural support system. The vertebral body can withstand fairly large loads when compressed, and since the end plate is usually the first structure to yield, the end-plate tolerance is often considered a key marker of spine damage leading to pain. A review by Jäger (1987) indicated that the compressive tolerance of the end plate reported in the literature can be large (over 8 kN), especially in upright postures, but is highly variable (depending greatly on age), with some specimens failing at 2 kN. Damage to human vertebral cancellous bone often results from shear loading, and its ultimate strength is correlated with tissue stiffness when exposed to compressive loading (Fyhrie & Schaffler, 1994). Bone failure typically occurs along with disc herniation and annular delamination (Gunning, Callaghan, & McGill, 2001). Thus, damage to the bone itself often appears to be part of the cascading series of events associated with LBP (Brinckmann, 1985; Kirkaldy-Willis, 1988; Siddall & Cousins, 1997b). Figure 5 depicts the ten physical activities that produced the highest compressive forces; lifting weights from the ground, holding weights at arm level, and moving weight in front of the body produced the largest of these forces (Marras, Mageswaran, Khan, & Mendel, 2018; Rohlmann et al., 2014).
Figure 5 Ten physical activities with the highest compressive forces for five patients (WP1-5). Source: Marras et al. (2018). [Bar chart: maximum resultant force (N, 0–1750) for patients WP1–WP5 across activities including lifting weight from ground, arm elevation with weight in hands, moving weight in front of body, standing up/sitting down, staircase walking, tying shoes, upper body flexion, lifting a carried weight, washing face, moving from lying to sitting, and walking.]
There are several lines of thinking about how vertebral end-plate microfractures can lead to low-back problems. One line of thinking contends that the health of the vertebral body end plate is essential for proper mechanical functioning of the spine. When damage occurs to the end plate, the nutrient supply to the disc is restricted, and this can lead to degeneration of the disc fibers and disruption of spinal function (Moore, 2000). The literature supports the notion that disruption of nutrient flow is capable of initiating a cascading series of events leading to LBP (Brinckmann, 1985; Kirkaldy-Willis, 1988; Siddall & Cousins, 1997a, 1997b). The literature also suggests that the end plate is often the first structure to be damaged when the spine is loaded, especially at low load rates (Brinckmann et al., 1988; Callaghan & McGill, 2001; Holmes, Hukins, & Freemont, 1993; Moore, 2000). Vertebral end-plate tolerance levels have been documented in numerous investigations. End-plate failure typically occurs when the end plate is subjected to compressive loads of 5.5 kN (Holmes et al., 1993). In addition, end-plate tolerances decrease by 30–50% with exposure to repetitive loading (Brinckmann et al., 1988), which suggests that the disc is affected by cumulative trauma. The literature suggests that spine integrity can also be weakened by anterior–posterior (forward–backward) shear loading. Shear tolerance limits beginning at 1290–1770 N for soft tissue and 2000–2800 N for hard tissue have been reported for spinal structures (Begeman et al., 1994; Cripton, 1995). A more recent study suggested limits based upon lifting frequency, with 1000 N being the limit for 100 or fewer loadings per day and 700 N representing the limit for 100–1000 loadings per day (Gallagher & Marras, 2012). Load-related damage might also be indicated by the presence of Schmorl's nodes in the vertebral bodies.
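The quantitative limits in the preceding paragraph lend themselves to a small lookup. The sketch below encodes the frequency-dependent shear limits attributed to Gallagher and Marras (2012) and the 30–50% reduction of the 5.5 kN end-plate compressive tolerance under repetitive loading. Assigning exactly 100 loadings per day to the 1000 N band is our reading of the overlapping ranges in the text, and returning `None` above 1000 loadings per day only reflects that the text quotes no limit there. The function names are hypothetical.

```python
from typing import Optional, Tuple

END_PLATE_TOLERANCE_N = 5500.0  # compressive end-plate failure load (Holmes et al., 1993)


def shear_limit_n(loadings_per_day: int) -> Optional[float]:
    """Anterior-posterior shear limit by daily loading frequency, as quoted in the text."""
    if loadings_per_day < 0:
        raise ValueError("loadings_per_day must be non-negative")
    if loadings_per_day <= 100:
        return 1000.0
    if loadings_per_day <= 1000:
        return 700.0
    return None  # the text quotes no limit for higher frequencies


def repetitive_end_plate_range_n(
    static_tolerance_n: float = END_PLATE_TOLERANCE_N,
    min_reduction_pct: int = 30,
    max_reduction_pct: int = 50,
) -> Tuple[float, float]:
    """Return (lowest, highest) end-plate tolerance after a 30-50% reduction."""
    lowest = static_tolerance_n * (100 - max_reduction_pct) / 100
    highest = static_tolerance_n * (100 - min_reduction_pct) / 100
    return lowest, highest


print(shear_limit_n(80), shear_limit_n(500), shear_limit_n(2000))  # 1000.0 700.0 None
print(repetitive_end_plate_range_n())  # (2750.0, 3850.0)
```

Under repetitive loading the end-plate tolerance falls to roughly 2.75–3.85 kN, a range that contains the 3400 N NIOSH compressive limit discussed later in this section.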
Some have suggested that Schmorl's nodes could be remnants of healed end-plate fractures (Vernon-Roberts & Pirie, 1973, 1977) and might be linked to trauma (Kornberg, 1988; Vernon-Roberts & Pirie, 1973). The position or posture of the spine is also closely related to end-plate tolerance to loading. Flexed spine postures greatly reduce end-plate tolerance (Adams & Hutton, 1982; Gunning et al., 2001). In addition, trunk posture has been documented as an important consideration for occupational risk assessment. Industrial surveillance studies by Punnett et al. (1991), Marras et al. (1993, 1995), and Norman and colleagues (1998) have all suggested that LBD risk increases when trunk posture deviates from a neutral upright posture during the work cycle. Individual factors also appear to influence end-plate integrity. Most notably, age and gender appear to greatly influence the biomechanical tolerance of the end plate (Jäger et al., 1991), because age and gender are related to bone integrity. Brinckmann et al. (1988) have demonstrated that bone mineral content and end-plate cross-sectional area are responsible for much of the variance in tolerance (within 1 kN). There is little doubt that the disc can be damaged by sufficient loading. Disc herniations occur frequently when the spine is subjected to compression while positioned in an excessively flexed posture (Adams & Hutton, 1982). In addition, repeated flexion, even under moderate compressive loading conditions, can produce repeated disc herniations (Callaghan & McGill, 2001). Under anterior–posterior shear conditions, avulsion of the lateral annulus can occur (Yingling & McGill, 1999a, 1999b). The torsion tolerance limit of the disc can be exceeded at loads as low as 88 N-m in an intact disc and 54 N-m in a damaged disc (Adams & Hutton, 1981; Farfan et al., 1970). When the spine is loaded in multiple dimensions simultaneously, risk also increases.
The literature indicates that when the spine assumes complex postures such as hyperflexion with lateral bending and twisting, disc herniation becomes increasingly likely (Adams & Hutton, 1985; Gordon et al., 1991). Disc tolerance can also be associated with diurnal cycles, that is, with the time of day at which the lifting exposure occurs. Snook and associates (1998) found that flexion early in the day was associated with an increased risk of an LBP report. In addition, Fathallah and colleagues (1995) reported similar results, finding that the risk of injury was greater early in the day, when disc hydration was high. Therefore, the temporal component of risk associated with work exposure must be considered when assessing risk. This brief review of the spine's tolerance limits indicates that the tolerance limits of the functional lumbar spinal unit vary considerably. Adams et al. (1993) describe a process in which repeated vertebral microfractures and scarring of the end plate can interrupt nutrient flow to the disc. This process can weaken the annulus, which can result in protrusion of the disc into the surrounding structures. In addition, the weakened disc can result in spinal instability. The end plate and most of the inner portions of the annulus are not capable of sensing pain. However, once disc protrusion and/or disc instability occurs, loads are transmitted to the outer portions of the annulus and surrounding tissues, and these structures are indeed capable of sensing pain. In addition, inflammatory responses can occur, and nociceptors of surrounding tissues can be further sensitized and stimulated, thus initiating a sequence of events resulting in pain. Quantitative ergonomics approaches attempt to design work tasks so that spine loads are well within the tolerance limits of the spine structures. While a wide range of tolerance limits has been reported for the functional lumbar spinal unit, most authorities have adopted the NIOSH lower limit of 3400 N for compression as the protective limit for most male workers and 75% of female workers (Chaffin et al., 1977). This limit represents the point at which end-plate microfracture is believed to begin within a large, diverse population of workers.
Similarly, 6400 N of compressive load represents the limit at which 50% of workers would be at risk (NIOSH, 1981). Furthermore, current quantitative assessments are recognizing the complex interaction of spine position, frequency, and complex spine forces (compression, shear, and torsion) as more realistic assessments of risk. However, these complex relationships have not yet found their way into ergonomic assessments, nor have they resulted in best practices or standards.
5.3 Ligament Tolerance Limits
Ligament tolerances are affected primarily by load rate (Noyes, De Lucas, & Torvik, 1994). Avulsion occurs at low load rates, and tearing occurs mostly at high load rates. Therefore, load rate may explain the increased risks associated with bending kinematics (velocity) that have been identified as risk factors in surveillance studies (Fathallah et al., 1998), as well as injuries from slips or falls (McGill, 1997). Posture also appears to play a role in tolerance. While loaded, the architecture of the interspinous ligaments can result in significant anterior shear forces imposed on the spine during forward flexion (Heylings, 1978). This finding is consistent with field observations of risk (Marras et al., 1993, 1995; Marras, Allread, et al., 2000; Norman et al., 1998; Punnett et al., 1991). Field observations have identified 60 N-m as the point at which tissue damage is initiated (Adams & Dolan, 1995). Similarly, surveillance studies (Marras et al., 1993, 1995) have identified exposures to external load moments of at least 73.6 N-m as being associated with greater risk of occupationally related LBP reporting. Also reinforcing these findings was a study by Norman and colleagues (1998), who reported nearly 30% greater load moment exposure in the jobs associated with risk of LBP. In this study, the mean moment exposure associated with the back pain cases was 182 N-m of total load moment (load lifted plus body segment weights).
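The compressive limits given at the end of Section 5.2 (3400 N and 6400 N) and the 73.6 N-m external load moment benchmark can be combined into a rough screening sketch. Estimating the external moment as load mass × g × horizontal distance from the spine is a simplification assumed for illustration: it ignores body segment weights, which Norman and colleagues (1998) included in their 182 N-m figure, and real compression estimates require a biomechanical model. Function names and example values are hypothetical.

```python
G = 9.80665  # standard gravity, m/s^2

COMPRESSION_LOWER_LIMIT_N = 3400.0  # NIOSH protective limit (Chaffin et al., 1977)
COMPRESSION_UPPER_LIMIT_N = 6400.0  # limit at which 50% of workers would be at risk (NIOSH, 1981)
MOMENT_BENCHMARK_NM = 73.6          # external load moment linked to LBP reporting (Marras et al., 1993, 1995)


def external_load_moment_nm(load_kg: float, horizontal_distance_m: float) -> float:
    """Moment of the handled load alone about the spine (body segment weights ignored)."""
    return load_kg * G * horizontal_distance_m


def classify_compression(compression_n: float) -> str:
    """Place an estimated spinal compression value relative to the two limits."""
    if compression_n < 0:
        raise ValueError("compression must be non-negative")
    if compression_n <= COMPRESSION_LOWER_LIMIT_N:
        return "at or below 3400 N"
    if compression_n <= COMPRESSION_UPPER_LIMIT_N:
        return "between 3400 N and 6400 N"
    return "above 6400 N"


def exceeds_moment_benchmark(moment_nm: float) -> bool:
    """True when the external moment reaches the 73.6 N-m surveillance benchmark."""
    return moment_nm >= MOMENT_BENCHMARK_NM


# Hypothetical lift: a 20 kg box held 0.4 m in front of the spine.
moment = external_load_moment_nm(20.0, 0.4)
print(round(moment, 1), exceeds_moment_benchmark(moment))  # 78.5 True
print(classify_compression(4200.0))  # between 3400 N and 6400 N
```

Because the moment scales linearly with horizontal distance, holding the same box 0.2 m from the spine halves the external moment to about 39 N-m, well below the benchmark, which illustrates why the load's distance from the body matters as much as its weight.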
Spine curvature or lumbar lordosis may also affect the loading and tolerance of the spinal structures. Findings from Canadian researchers have demonstrated that when lumbar spinal curvature is maintained during bending, the extensor muscles support the shear forces of the torso. However, when the spine (and posterior ligaments) are flexed during bending, significant shear can be imposed on the ligaments (McGill et al., 1994; McGill & Norman, 1987; Potvin et al., 1991). The shear tolerance of the spine (2000–2800 N) can easily be exceeded when the spine is in full flexion (Cripton, 1995). As with most living tissues, temporal factors play a large role in the recovery of the ligaments. Solomonow and colleagues have found that ligaments require long periods of rest to regain structural integrity. During this recovery period, compensatory muscle activities are likely to be recruited (Gedalia et al., 1999; Solomonow et al., 1998, 1999, 2000, 2002; Stubbs et al., 1998; Wang, Parnianpour, Shirazi-Adl, & Engin, 2000), and these muscle activities can easily increase spine loading. The required recovery time has been estimated to be several times the duration of the loading period and thus may easily exceed the work–rest cycles common in industry.
5.4 Facet Joint Tolerance
The facet joints are capable of supporting a significant portion of the load transmitted along the spinal column. Therefore, it is important to understand the tolerance limits of this structure. Facet joint failure can occur in response to shear loading of the spine. McGill and colleagues have reported that many of the tissues that load the facets are capable of producing significant horizontal forces and thus place these structures at risk during occupational tasks (McGill, 2002). Cripton (1995) estimates a shear tolerance for the facet joints of 2000 N. These findings are consistent with industrial observations indicating that exposure to lateral motions and shear loading was associated with an increased risk of LBDs (Marras et al., 1993, 1995; Norman et al., 1998).
In addition, laboratory-based studies confirm that exposure to high lateral velocity can result in significant lateral shear forces in the lumbar spine (Marras & Granata, 1997b). Torsion can also load the facet joints to the point of failure (Adams & Hutton, 1981). Exposure to excessive twisting moments, especially when combined with high-velocity motions, has been associated with excessive tissue loading (Marras & Granata, 1995; McGill, 1991; Pope et al., 1986, 1987). Field-based studies have also identified these movements as being associated with high-risk (for LBP) tasks (Marras et al., 1993, 1995; Norman et al., 1998). The load imposed upon the spinal tissues by torsional moments also depends greatly upon the posture of the spine, with greater loads occurring when postures deviating from neutral are adopted (Marras & Granata, 1995). The specific loading pattern of each structure depends upon both the posture assumed during the task and the curvature of the spine, since a great amount of load sharing occurs between the apophyseal joints and the disc (Adams & Dolan, 1995). Therefore, spine posture dictates both the nature of spine loading and the degree of risk to the facet joints or the disc.
5.5 Adaptation
An important consideration in the interpretation of the load–tolerance relationship of the spine is adaptation. Wolff’s law suggests that tissues adapt and remodel in response to the imposed load. In the spine, adaptation in response to load has been reported for bone (Carter, 1985), ligaments (Woo et al., 1985), disc (Porter et al., 1989), and vertebrae (Brinckmann et al., 1989). Adaptation may explain the observation that the greatest risk has been associated with jobs involving either high or low levels of spinal load, whereas jobs associated with moderate spine loads appear to enjoy the lowest levels
DESIGN FOR HEALTH, SAFETY, AND COMFORT
of risk (Chaffin & Park, 1973; Videman et al., 1990). Hence, there appears to be an optimal loading zone for the spine that minimizes the risk of exceeding the tolerance limit.
5.6 Psychophysical Limits as a Tolerance Threshold
Tolerance limits used in biomechanical assessments of tissue are typically derived from cadaveric testing studies. While these mechanical limits for tissue strength are probably reasonable for the analysis of tasks resulting in acute-trauma-type injuries, applying cadaver-based tolerances to repetitive tasks is harder to justify. Analyses of repetitive loading must consider both the repeated weakening of the structure and the impact of tissue repair. Because adaptation is a key characteristic of living tissue, quantitative analyses of the load–tolerance relationship under repeated loading become problematic. Highly dynamic tasks may also be difficult to characterize through quantitative biomechanical analyses, and their injury pathways may be poorly understood. Consequently, there is a dearth of biomechanical tolerance limit data describing how living tissues respond to such repeated loading conditions. Where mechanical tolerances are not known, an alternative approach has been to use the psychophysical limit as a tolerance limit. Psychophysics has been used as a means of strength testing in which subjects progressively adjust the amount of load they push, pull, lift, or carry until they feel that the magnitude of the force exertion would be acceptable to them over an 8-h work shift. Work variables in such evaluations typically include lift origin, height, load dimensions, frequency of exertion, push/pull heights, and carrying distance. These variables are systematically altered to yield a database of acceptable conditions, or thresholds of acceptance, that would be tolerable for a specified range of male and female workers.
These data are typically presented in table form and indicate the percentage of subjects who would find a particular load condition acceptable for a given task. Snook and colleagues are best known for publishing extensive descriptions of these psychophysical tolerances (Ciriello et al., 1990; Snook, 1978, 1985a, 1985b, 1987; Snook & Ciriello, 1991). It should be noted that very few investigations have reported whether designing work tasks to these psychophysical tolerance limits minimizes LBP reports at work. However, Snook (1978) reported that low-back-related injury claims were three times more prevalent in jobs whose strength demands exceeded the psychophysically determined tolerance of 75% of men than in jobs demanding less strength.
5.7 Physiological Tolerance Limits
Energy expenditure limits can also be used as tolerance limits for jobs in which physiological load limits the workers’ ability to perform the work. These limits are associated with the ability of the body to deliver oxygen to the muscles. When muscles go into oxygen debt, insufficient adenosine triphosphate (ATP) is made available within the muscle and prolonged muscle contractions cannot be sustained. Therefore, under these extremely high energy expenditure work conditions, aerobic capacity can be considered a physiological tolerance limit for LBP. NIOSH has established physiological criteria for limiting heavy physical work based upon high levels of energy expenditure (Waters et al., 1994). These criteria established an energy expenditure rate of 9.5 kcal/min as a baseline for maximum aerobic lifting capacity. Seventy percent of this baseline limit is considered the aerobic tolerance limit for work
MANAGING LOW-BACK DISORDER RISK IN THE WORKPLACE
that is defined primarily as “arm work.” Some 50%, 40%, and 33% of the baseline energy expenditure have been designated as the tolerance limits for lifting task durations of 1, 1–2, and 2–8 h, respectively. While limited epidemiological evidence is available to support these limits, Cady and associates (1979, 1985) have demonstrated the importance of aerobic capacity limits associated with back problems among firefighters.
5.8 Psychosocial Pathways
A body of literature has attempted to describe how psychosocial factors might relate to the risk of suffering from an LBD. Psychosocial factors have been associated with risk of LBP in several reviews (Bongers et al., 1993; Burton, Tillotson, Main, & Hollis, 1995), and some researchers have dismissed the role of biomechanical factors as a causal factor (Bigos et al., 1991). However, few studies have appropriately considered biomechanical exposure along with psychosocial exposure in these assessments. A study by Davis and Heaney (2000) found that no studies had been able to assess both risk dimensions effectively and concurrently. More sophisticated biomechanical assessments (Davis et al., 2002; Marras, Davis, et al., 2000) have shown that psychosocial stress has the capacity to influence biomechanical loading. These studies demonstrate that individual factors such as personality can interact with the perception of psychosocial stress to increase trunk muscle coactivation and subsequently increase spine loading. Therefore, these studies provide evidence that psychosocial stress is capable of influencing LBP risk through a biomechanical pathway.
5.9 Spine-Loading Assessment
A critical part of evaluating the load–tolerance relationship and the subsequent risk associated with work is an accurate, quantitative assessment of the loading experienced by back and spine tissues.
The tolerance literature suggests that it is important to understand the specific nature of the tissue loading, including compression force, shear force in multiple dimensions, load rates, positions of the spine structures during loading, and frequency of loading. Hence, accurate and specific measures of spine loading are essential if this information is to be used to assess the potential risk associated with occupational tasks. Currently, it is not practical to directly monitor the loads imposed upon living spine structures and tissues while workers perform a work task. Instead, indirect assessments such as biomechanical models are typically used to estimate tissue loads. The goal of biomechanical models is to understand how exposure to external loads results in internal forces that may exceed specific tolerance limits in the body. External loads are imposed on the musculoskeletal system by the external environment (e.g., gravity or inertia) and must be countered or overcome by the worker in order to perform work. Internal forces are supplied by musculoskeletal structures within the body (e.g., muscles, ligaments) that must supply counterforces to support the external load. However, because these internal structures typically act at a severe biomechanical disadvantage (their moment arms about the spine are much shorter than that of the external load), the internal forces can be very large and can impose large forces on spine tissues. Because these internal forces are so large, it is extremely important to assess them accurately in order to appreciate the risk of a musculoskeletal disorder. Several biomechanical modeling approaches have been employed for these purposes, and these approaches often involve significantly different trade-offs between their ability to assess task-related spine loading realistically and accurately and their ease of use.
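The mechanical disadvantage described above can be illustrated with a single-muscle static moment balance about the lumbar spine, in the spirit of the early models discussed next. This Python sketch is illustrative only: the 0.05 m extensor moment arm, the 35 kg upper-body mass and its 0.20 m lever arm, and the assumption of an upright trunk (so that all gravitational load acts along the spine) are hypothetical values chosen for the example, and coactivation is ignored.

```python
from __future__ import annotations

G = 9.81  # gravitational acceleration (m/s^2)


def single_muscle_compression(
    load_kg: float,
    load_lever_m: float,
    upper_body_kg: float,
    upper_body_lever_m: float,
    muscle_moment_arm_m: float = 0.05,  # assumed extensor moment arm
) -> tuple[float, float]:
    """Return (extensor muscle force, spine compression) in newtons.

    Static sagittal-plane balance about the lumbar spine with one
    equivalent extensor muscle and no antagonist coactivation.
    Assumes an upright trunk, so gravitational loads act along the spine.
    """
    external_moment = G * (load_kg * load_lever_m + upper_body_kg * upper_body_lever_m)
    muscle_force = external_moment / muscle_moment_arm_m        # moment balance
    compression = muscle_force + G * (load_kg + upper_body_kg)  # force balance along spine
    return muscle_force, compression


if __name__ == "__main__":
    # Hypothetical task: 10 kg held 0.40 m in front of the spine
    f_m, comp = single_muscle_compression(10.0, 0.40, 35.0, 0.20)
    print(f"muscle force ~ {f_m:.0f} N, compression ~ {comp:.0f} N")
```

With these assumed numbers, the hand load weighs only about 98 N, yet the required extensor force is roughly 2160 N and the estimated compression is roughly 2600 N. The short internal moment arm, not the load weight alone, drives the internal force.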
Early models used to assess spine loading during occupational tasks needed to make assumptions about which trunk muscles supported the external load during a lifting task (Chaffin & Baker, 1970; Chaffin et al., 1977). These initial models assumed that a single “equivalent” muscle vector within the trunk could characterize the trunk’s internal supporting force (and thus define spine loading) required to counteract an external load lifted by a worker. These crude models assumed that a lift could be represented as a static equilibrium situation and that no coactivation occurs among the trunk muscles during lifting. The models employed anthropometric regression relationships to estimate body segment lengths representative of the general population. Two output variables were predicted that could be used in a load–tolerance assessment of work exposure. One commonly used output was spine compression, which is typically compared to the NIOSH-established compression limits of 3400 N and 6400 N. A second output was population static strength capacity at six joints of the body; specifically, lumbosacral (L5/S1) joint strength was used to assess overexertion risk to the back. This model evolved into a personal computer-based program and was used for general assessments of materials handling tasks involving slow movements (assumed to be quasi-static) where excessive compression loads were suspected of contributing to risk. An example of the program output is shown in Figure 6. Early field-based risk assessments of the workplace used this method to assess spine loads on the job (Herrin et al., 1986). As computational power became more available, workplace biomechanical models were expanded to account for the contribution of multiple internal muscle reactions in response to the lifting of an external load. The assessment of multiple muscles resulted in models that were much more accurate and realistic.
In addition, the spine tolerance literature was beginning to recognize the significance of three-dimensional spine loads, as opposed to purely compressive loads, in characterizing potential risk. The multiple-muscle biomechanical models were capable of predicting spine shear forces as well as compression forces. The first multiple-muscle system model was developed by Schultz and Andersson (1981). This model demonstrated how loads manipulated outside the body could impose large spinal loads through the coactivation of the trunk muscles necessary to counteract the external load. The model was also able to predict asymmetric loading of the spine and hence represented an advance in realism over previous models. However, the approach was indeterminate: the model contained more unknown muscle forces than equilibrium equations available to solve for them, so no unique solution existed. Modeling efforts attempted to overcome this difficulty by assuming that certain muscles are inactive during the task (Bean, Chaffin, & Schultz, 1988; Hughes & Chaffin, 1995; Schultz et al., 1982). These efforts resulted in models that worked well for steady-state static representations of a lift but not for dynamic lifting situations (Marras, King, & Joynt, 1984). Later efforts accounted for the influence of muscle coactivation upon spine loading under dynamic, complex lifting situations by directly monitoring muscle activity with electromyography (EMG) as an input to multiple-muscle models. EMG measures eliminated the problem of indeterminacy, since the activity of each muscle was uniquely defined through its measured neural activation. Because they used direct measures of muscle activity, these models were termed biologically assisted models.
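The general idea behind EMG-assisted models can be sketched under strong simplifying assumptions. Each muscle’s force is taken as a gain times its normalized EMG times its cross-sectional area, and a single gain is chosen so that the muscles’ net moment matches the external moment. The published models also modulate force by muscle length and velocity, work in three dimensions, and compute shear as well as compression; this Python sketch omits all of that, treats every line of action as parallel to the spine, and uses hypothetical muscle data (areas, moment arms, and EMG levels) purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Muscle:
    name: str
    emg_norm: float      # EMG normalized to maximum voluntary activity (0..1)
    area_cm2: float      # cross-sectional area (hypothetical)
    moment_arm_m: float  # sagittal moment arm: + extensor, - flexor (hypothetical)


def emg_assisted_compression(muscles: list[Muscle], external_moment_nm: float):
    """Return (gain in N/cm^2, per-muscle forces in N, compression in N).

    Simplified sketch: F_i = gain * emg_i * area_i, with one gain chosen so the
    net muscle moment equals the external (flexion) moment. All lines of action
    are assumed parallel to the spine, so compression is the sum of forces.
    """
    moment_per_unit_gain = sum(m.emg_norm * m.area_cm2 * m.moment_arm_m for m in muscles)
    if moment_per_unit_gain <= 0:
        raise ValueError("muscle activity must produce a net extensor moment")
    gain = external_moment_nm / moment_per_unit_gain
    forces = {m.name: gain * m.emg_norm * m.area_cm2 for m in muscles}
    return gain, forces, sum(forces.values())


if __name__ == "__main__":
    extensor_only = [Muscle("erector spinae", 0.60, 40.0, 0.05)]
    with_coactivation = extensor_only + [Muscle("rectus abdominis", 0.20, 10.0, -0.08)]
    for label, group in [("extensor only", extensor_only), ("with coactivation", with_coactivation)]:
        gain, forces, compression = emg_assisted_compression(group, 100.0)
        print(f"{label}: gain {gain:.1f} N/cm^2, compression {compression:.0f} N")
```

With these made-up inputs, adding antagonist activity raises the estimated compression from about 2000 N to about 2500 N for the same 100 N-m external moment, because the extensors must also overcome the flexor’s opposing moment. A model that ignores coactivation misses this increase.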
These biologically assisted models were not only able to accurately assess compression and shear spine loads for specific occupationally related movements (Granata & Marras, 1993, 1995a, 1999; Marras & Davis, 1998; Marras, Davis, & Granata, 1998; Marras, Davis, & Splittstoesser, 2001; Marras & Granata, 1995, 1997b, 1997c; Marras & Sommerich, 1991a, 1991b; McGill, 1991, 1992a, 1992b), but also to predict differences among individuals so that variations in loading among a population could be evaluated (Granata et al., 1999; Marras, Allread, et al., 2000; Marras, Davis, & Jorgensen, 2002; Marras & Granata, 1997b; Mirka & Marras, 1993) (Figure 7).
Figure 6 Example of the static strength prediction program used to assess spine load and strength requirements for a given task. Courtesy of D. Chaffin.
Figure 7 Biologically (EMG) assisted model used to evaluate spine loading during simulated work activities. [Diagram of muscle force vectors acting on the spine; labeled muscles: Lat, ErSp, RAbd, In Obl, Ex Obl.]
Figure 8 Mean peak compression force as a function of lift asymmetry (clockwise [CW] vs. counterclockwise [CCW]) and hand(s) used to lift load. Results derived from EMG-assisted model simulation of tasks. [Plot: compression force (N), 2400 to 4400, versus lift origin asymmetry (deg), 60 CCW to 60 CW; separate curves for left hand only, both hands, and right hand only.]
These models were reported to have excellent external as well as internal validity (Granata et al., 1999; Marras, Granata, & Davis, 1999). The significance of accounting for trunk muscle coactivation when assessing realistic dynamic lifting was demonstrated by Granata and Marras (1995b), who found that models that did not account for coactivation could miscalculate spinal loading by up to 70%. The disadvantage of biologically assisted models is that they require EMG recordings from the worker, which is often not practical in a workplace environment. Hence, many biologically assisted modeling assessments of the spine during work have been performed under laboratory conditions and have attempted to assess specific aspects of the work that may be common to many work conditions. For example, studies have employed EMG-assisted models to assess three-dimensional spine loading during materials handling activities (Hwang et al., 2016; Davis & Marras, 2000; Davis et al., 1998a, 1998b; Marras & Davis, 1998; Marras, Davis, & Splittstoesser, 2001; Marras & Granata,
1997b). In addition, numerous studies using biologically assisted models have yielded information about various dimensions of lifting. Figure 8 illustrates the difference in spine compression when subjects lift with one hand versus two hands as a function of lift asymmetry (Marras & Davis, 1998). This assessment indicates that compressive loading of the spine is not simply a matter of the load weight lifted; considerable trade-offs occur as a function of asymmetry and the number of hands involved in the lift. Trade-offs among workplace factors were also evaluated in a study of order-selection activities in a laboratory setting (Marras, Granata, et al., 1999). Results from this study are shown in Table 3, which highlights the interaction between load weight, location of the lift (region on the pallet), and
Table 3 Percentage of Lifts During Order Selection Tasks in Various Spine Compression Benchmark Zones as a Function of Interaction Between Load Weight, Location of Lift (Region on Pallet), and Presence of Handles
Region on pallet  Benchmark (N)     18.2 kg H  18.2 kg NH   22.7 kg H  22.7 kg NH   27.3 kg H  27.3 kg NH
Front top         < 3400                100.0       100.0       100.0        99.2        99.2       100.0
                  3400–6400               0.0         0.0         0.0         0.8         0.8         0.0
                  > 6400                  0.0         0.0         0.0         0.0         0.0         0.0
Back top          < 3400                 98.2        89.1        84.5        76.4        83.6        67.3
                  3400–6400               1.8        10.9        15.5        23.6        16.4        32.7
                  > 6400                  0.0         0.0         0.0         0.0         0.0         0.0
Front middle      < 3400                 98.7        91.3        94.7        82.7        92.6        76.0
                  3400–6400               1.3         8.7         5.3        17.3         7.4        23.3
                  > 6400                  0.0         0.0         0.0         0.0         0.0         0.7
Back middle       < 3400                 88.7        82.0        80.7        75.3        76.7        64.7
                  3400–6400              11.3        18.0        19.3        24.7        23.3        34.6
                  > 6400                  0.0         0.0         0.0         0.0         0.0         0.7
Front bottom      < 3400                 45.3        30.0        29.3        14.0        16.0         3.3
                  3400–6400              52.0        62.0        62.7        65.3        72.0        66.0
                  > 6400                  2.7         8.0         8.0        20.7        12.0        30.7
Back bottom       < 3400                 35.3        24.0        30.0        10.7         9.3         2.0
                  3400–6400              60.7        67.3        56.7        65.3        71.3        62.0
                  > 6400                  4.0         8.7        13.3        24.0        19.3        36.0
Source: Marras et al. (1999), Ergonomics, Vol. 42, No. 7, pp. 980–996. Note: Spine loads estimated by an EMG-assisted model. H = handles; NH = no handles.
Table 4 Spine Forces (Means and Standard Deviations for Lateral Shear, Anterior–Posterior Shear, and Compression) as a Function of the Number of Hands Used, Number of Feet Supporting Body During Lift, Region of Pallet and Height of Bin When Lifting Items from an Industrial Bin
Independent variable  Condition    Lateral shear (N)   A–P shear (N)       Compression (N)
Hands                 One hand     472.2 (350.5)*      1093.3 (854.7)      6033.6 (2981.2)
                      Two hands    233.8 (216.9)*      1136.9 (964.1)      5742.3 (1712.3)
Feet                  One foot     401.7 (335.1)*      1109.4 (856.1)      6138.6 (2957.5)*
                      Two feet     304.3 (285.1)*      1120.8 (963.3)      5637.3 (2717.9)*
Region of bin         Upper front  260.2 (271.7)a      616.6 (311.1)a      3765.7 (1452.8)a
                      Upper back   317.0 (290.8)a      738.0 (500.0)a      5418.1 (2364.2)b
                      Lower front  414.4 (335.0)b      1498.3 (1037.8)b    6839.8 (2765.4)c
                      Lower back   420.4 (329.0)b      1607.5 (1058.4)b    7528.2 (2978.4)d
Bin height            94 cm        361.9 (328)         1089.9 (800.8)      5795.8 (2660.4)
                      61 cm        344.1 (301)         1140.3 (1009.1)     5980.2 (3027.4)
Source: From Marras, Davis, Kirking, & Bertsche (1999). Note: Values are means (standard deviations). ∗ Significant difference at α = 0.05. Region has four experimental conditions; superscript letters indicate which regions were significantly different from one another at α = 0.05.
presence of handles on spine compression (benchmark zones). The analysis indicates that all three factors influence the loading on the spine. Another study examined the trade-offs between spine compressive and shear loads as a function of the number of hands used by the worker during the lift, whether both feet were in contact with the ground, lift origin, and the height of a bin from which objects were lifted (Ferguson et al., 2002) (Table 4). Other studies have evaluated spine-loading trade-offs associated with team lifting (Marras, Davis, Kirking, & Granata, 1999), patient lifting (Table 5) (Wang, 1999), the assessment of lifting belts (Granata et al., 1997; Jorgensen & Marras, 2000; Marras, Jorgensen, et al., 2000; McGill et al., 1990, 1994), and the use of lifting assistance devices (Marras et al., 1996). Efforts have also been made to translate these in-depth laboratory studies for use in the field through regression models of workplace characteristics (Fathallah et al., 1999; McGill et al., 1996). In addition, biologically assisted models have been used to assess the role of psychosocial factors, personality, and mental processing on spine loading (Davis et al., 2002; Marras, Davis, et al., 2000).
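Table 3 reports, for each condition, the percentage of lifts whose estimated peak compression falls in each benchmark zone, taken here to be bounded by the NIOSH values of 3400 N and 6400 N. The tabulation itself is simple; this Python sketch shows it for a list of per-lift compression estimates. The input values are hypothetical, and assigning exactly 3400 N or 6400 N to the middle zone is an assumption, since the table does not state how boundary values were handled.

```python
from __future__ import annotations

from collections import Counter

ACTION_LIMIT_N = 3400.0
MAX_PERMISSIBLE_N = 6400.0
ZONES = ("< 3400 N", "3400-6400 N", "> 6400 N")


def compression_zone(compression_n: float) -> str:
    """Assign one peak compression estimate to a benchmark zone."""
    if compression_n < ACTION_LIMIT_N:
        return ZONES[0]
    if compression_n <= MAX_PERMISSIBLE_N:  # boundary values go to the middle zone (assumption)
        return ZONES[1]
    return ZONES[2]


def zone_percentages(compressions_n: list[float]) -> dict[str, float]:
    """Percentage of lifts in each zone, as in one region/condition cell group of Table 3."""
    if not compressions_n:
        raise ValueError("need at least one compression estimate")
    counts = Counter(compression_zone(c) for c in compressions_n)
    n = len(compressions_n)
    return {zone: 100.0 * counts[zone] / n for zone in ZONES}


if __name__ == "__main__":
    # Hypothetical peak compression estimates (N) for lifts from one pallet region
    lifts = [2900.0, 3650.0, 4100.0, 5200.0, 6800.0]
    print(zone_percentages(lifts))
```

For the five hypothetical lifts, one falls below 3400 N, three fall between 3400 and 6400 N, and one exceeds 6400 N, giving 20%, 60%, and 20%.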
Table 5 Spine Loads Estimated During Patient Transfer as a Function of the Number of Lifters and Transfer Technique
Transfer task                                  One-person transfers  Two-person transfers
Lateral shear forces (N)
  Lower to wheelchair without an arm from bed  1176.8 (891.0)d       754.0 (144.9)ab
  Lower to bed from wheelchair without an arm  1256.2 (778.8)d       908.6 (589.4)c
  Lower to wheelchair from bed                 1066.8 (490.0)d       639.1 (351.6)a
  Lower to bed from wheelchair                 1017.0 (370.9)d       942.5 (508.3)c
  Lower to commode chair from hospital chair   1146.8 (587.5)d       833.4 (507.3)b
  Lower to hospital chair from commode chair   1104.1 (526.6)d       834.1 (425.6)b
Anterior–posterior shear forces (N)
  Lower to wheelchair without an arm from bed  1031.8 (681.7)a       986.8 (496.8)a
  Lower to bed from wheelchair without an arm  1089.7 (615.6)a       1032.9 (472.1)a
  Lower to wheelchair from bed                 1180.8 (716.7)a       1020.7 (503.4)a
  Lower to bed from wheelchair                 1108.7 (544.5)a       1049.4 (511.4)a
  Lower to commode chair from hospital chair   1137.1 (587.5)a       1018.4 (544.9)a
  Lower to hospital chair from commode chair   1122.0 (536.0)a       982.6 (484.6)a
Compression forces (N)
  Lower to wheelchair without an arm from bed  5895.4 (1998.1)d      4483.2 (1661.7)b
  Lower to bed from wheelchair without an arm  6457.2 (1930.6)e      4663.3 (1719.2)b
  Lower to wheelchair from bed                 5424.0 (2133.8)c      4245.2 (1378.7)a
  Lower to bed from wheelchair                 5744.0 (1728.5)cd     4630.7 (1656.2)b
  Lower to commode chair from hospital chair   6062.3 (1669.7)d      4645.7 (1450.8)b
  Lower to hospital chair from commode chair   6464.7 (1698.0)e      4630.6 (1621.4)b
Source: Marras, Davis, Kirking, & Bertsche (1999). Note: Superscript letters indicate significant difference at p = 0.05.
In an effort to eliminate the need for biological measures (EMG) to assess muscle coactivity and subsequent spine loading, several studies have attempted to use stability as the criterion to govern detailed biologically assisted biomechanical models of the torso (Cholewicki & McGill, 1996; Cholewicki, Simons, et al., 2000; Cholewicki & Van Vliet, 2002; Granata & Marras, 2000; Granata & Orishimo, 2001; Granata & Wilson, 2001; Panjabi, 1992a, 1992b; Solomonow et al., 1999). This is thought to be important because a potential injury pathway for LBDs suggests that the unnatural rotation of a single spine segment may create loads on passive tissue or other muscle tissue that result in irritation or injury (McGill, 2002). However, nearly all of the work performed in this area to date has been directed toward the static response of the trunk or responses to sudden loading (Cholewicki, Polzhofer, et al., 2000; Cholewicki, Simons, et al., 2000; Cholewicki & Van Vliet, 2002; Granata & Orishimo, 2001; Granata et al., 2001; Granata & Wilson, 2001). Thus, these assessments may have limited value for assessing the most common workplace risk factors for LBP.
6 ASSESSMENT METHODS AND IDENTIFICATION OF LOW-BACK DISORDER RISK AT WORK
The logic associated with various risk assessment approaches has been described in previous sections. These approaches have been used to develop a rich body of literature describing spine loading and subsequent risk in response to various work-related factors that are common to workplaces (e.g., one-hand vs. two-hand lifting). These studies can be used as a guide for the proper design of many work situations. However, there is still a need to assess unique work situations that may not have been assessed in these in-depth laboratory studies.
High-fidelity spine-loading assessment techniques (e.g., EMG-assisted models) may not be practical for assessing some work situations, since they require extensive instrumentation and typically require the task to be simulated in a laboratory environment. Therefore, tools with less precision and accuracy may be necessary to estimate risk to the spine at work. This section reviews the methods and tools available for such assessments, along with the literature that supports their use.
6.1 3DSSPP
The three-dimensional static strength prediction program (3DSSPP) has been available for quite some time; the logic associated with this approach was described previously. The computer program considers the load–tolerance relationship from both the spine compression and the joint strength perspectives. Spine compression is estimated with a linked-segment, single-equivalent-muscle model and compared to the NIOSH-established compression tolerance limit of 3400 N. Strength tolerance is assessed by estimating the joint load imposed by a task on six joints and comparing these loads to a population-based static strength database. This strength relationship has been defined as a lifting strength rating (LSR) and has been used to assess low-back injuries in industrial environments (Chaffin & Park, 1973). The LSR is defined as the weight of the maximum load lifted on the job divided by the lifting strength. The assessment concluded that “the incidence rate of low back pain [was] correlated (monotonically) with higher lifting strength requirements as determined by assessment of both the location and magnitude of the load lifted.” This was one of the first studies to emphasize the importance of load moment exposure (the location of the load relative to the body in addition to its weight) when assessing risk.
The study also found that exposure to moderate lifting frequencies appeared to be protective, whereas high or low lift rates were associated with jobs linked to greater reports of back injury.
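Because predicted lifting strength depends on where the load is held, the LSR can differ between lifts of the same job even when the loads are similar. This Python sketch computes the ratio of load to lifting strength for each lift in a hypothetical job and reports the largest one. Evaluating every lift and keeping the maximum is an illustrative choice, and the strength values stand in for population strength predictions such as those 3DSSPP provides; they are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Lift:
    description: str
    load_kg: float
    strength_kg: float  # predicted population lifting strength at this load location (hypothetical)


def lifting_strength_rating(load_kg: float, strength_kg: float) -> float:
    """LSR = load lifted / lifting strength (values above 1.0 exceed the strength estimate)."""
    if strength_kg <= 0:
        raise ValueError("lifting strength must be positive")
    return load_kg / strength_kg


def job_lsr(lifts: list[Lift]) -> tuple[float, str]:
    """Return the largest per-lift LSR in a job and the lift that produced it."""
    if not lifts:
        raise ValueError("job must contain at least one lift")
    worst = max(lifts, key=lambda lift: lifting_strength_rating(lift.load_kg, lift.strength_kg))
    return lifting_strength_rating(worst.load_kg, worst.strength_kg), worst.description


if __name__ == "__main__":
    job = [
        Lift("pallet top, close to body", 15.0, 45.0),
        Lift("floor level, far reach", 12.0, 20.0),
    ]
    rating, where = job_lsr(job)
    print(f"job LSR = {rating:.2f} ({where})")
```

In this hypothetical job, the lighter 12 kg lift from the floor at a far reach yields the higher rating (0.60 versus 0.33), which illustrates why the location of the load, and not only its magnitude, matters.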
One study used both the LSR and estimates of back compression forces to observe job risk over a three-year period in five large industrial plants, where 2934 material handling tasks were evaluated (Herrin, Jaraiedi, & Anderson, 1986). The findings indicated a positive association between the lifting strength rating and back pain incidence rates. This study also found that musculoskeletal injuries were twice as likely when spine compression forces exceeded 6800 N; however, this relationship did not hold for low-back-specific incident reports. The study indicated that injury risk was best predicted by the most stressful tasks (as opposed to indices that aggregate risk across tasks).
6.2 Job Demand Index
Ayoub developed the concept of a job severity index (JSI), which is somewhat similar to the LSR (Ayoub et al., 1978). The JSI is defined as the ratio of the job demands to the lifting capacities of the worker. Job demands include the weight of the object lifted, the frequency of lifting, exposure time, and lifting task origins and destinations; a comprehensive task analysis is necessary to assess the job demands in this context. Worker capacity includes the strength as well as the body size of the worker, where strength is determined via psychophysical testing (as discussed earlier). Liles and associates (1984) performed a prospective study using the JSI and identified a threshold of job demand relative to worker strength above which the risk of low-back injury increased. These authors suggest that this method could identify high-risk (costly) jobs.
6.3 NIOSH Lifting Guide and Revised Lifting Equation
NIOSH has developed two lift assessment tools to help those in industry assess the risk associated with materials handling. The objective of these tools was to “prevent or reduce the occurrence of lifting-related low back pain among workers” (Waters et al., 1993).
These assessments considered biomechanical, physiological, and psychophysical limits as criteria for assessing task risk. The original tool was a guide to help define safe lifting limits based upon biomechanical, physiological, and psychophysical tolerance limits (NIOSH, 1981). This method requires the evaluator to assess workplace characteristics. Based upon these characteristics, the guide estimates the magnitude of the load at which spine compression reaches 3400 N (the action limit, AL) and 6400 N (the maximum permissible limit, MPL). From a biomechanical standpoint, the AL was defined as the spine compression limit at which damage begins to occur in the spine in a large portion of the population. Based upon this logic, “safe” work tasks should be designed so that the load lifted by the worker is below the calculated AL. The AL is calculated through a functional equation in which a constant is multiplied by four discounting factors. The constant (90 lb, or 40 kg) is assumed to be the weight that, when lifted under ideal lifting conditions, would result in a spine compression of 3400 N. The four workplace-based discounting factors are (1) the horizontal distance of the load from the spine, (2) the vertical height of the load off the floor, (3) the vertical travel distance of the load, and (4) the frequency of lifting. These factors are governed by functional relationships that reduce the magnitude of the allowable load (the constant) in proportion to their contribution to increases in spine compression. The MPL is determined by simply multiplying the AL by a factor of 3. If the load lifted by the worker exceeds the MPL, it is assumed that more than 50% of the workers would be at risk of damaging the disc, and engineering controls would be required. If the load lifted is between the AL and the MPL values, then the task is
assumed to place less than half the workers at risk; in this case, either engineering or administrative controls are permitted. If the load lifted is less than the AL, the task is considered safe. This guide was designed primarily for sagittally symmetric lifts that were slow (no appreciable acceleration) and smooth. Only one independent assessment of the guide’s effectiveness could be found in the literature (Marras, Fine, et al., 1999). When its predictions of risk were compared with historical data on industrial back injury reporting, this evaluation indicated an odds ratio of 3.5, with good specificity and low sensitivity. The most recent version of the NIOSH lifting guideline is known as the “revised NIOSH lifting equation” (Waters et al., 1993). The revised equation was developed with the intent of also including asymmetric lifting tasks and tasks with different types of coupling (handle) features in the assessment. The revised equation is similar in form to the 1981 guide in that it includes a load constant that is modified by several work-characteristic “multipliers.” However, there are several differences between the two guides. First, the revised equation yields a recommended weight limit (RWL) instead of an AL or MPL; if the magnitude of the load lifted by the worker is below the RWL, the load is considered safe. Second, the equation’s load constant is reduced to 23 kg (51 lb) from the 40 kg (90 lb) of the 1981 guide. Third, the functional relationships between the equation multipliers and the workplace factors are changed; these relationships are slightly more liberal for the four original factors in order to compensate for the lower value of the load constant. Fourth, two additional multipliers are included to account for task asymmetry and coupling.
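The full equations are given in Chapter 12 of this Handbook. As a rough illustration of the structure described above (a load constant multiplied by discounting factors), the Python sketch below uses the commonly published metric forms of the 1981 action limit and the 1993 RWL, together with the lifting index (load divided by RWL). It is a sketch, not a compliance tool: the frequency and coupling multipliers of the revised equation (and Fmax in the 1981 guide) come from lookup tables that are not reproduced here, so they are passed in as inputs, and the range checks and zero-multiplier cases of the published guides are omitted. The example inputs, including Fmax, FM, and CM, are hypothetical.

```python
"""Sketch of the 1981 NIOSH action limit and the 1993 recommended weight limit.

Metric forms as commonly published; H, V, D in cm, A in degrees, loads in kg.
FM, CM, and Fmax come from published lookup tables and are passed in here.
Range checks and zero-multiplier cases of the published guides are omitted.
"""


def action_limit_1981(h_cm: float, v_cm: float, d_cm: float,
                      freq_per_min: float, f_max: float) -> float:
    """1981 action limit (kg): 40 kg constant times four discounting factors."""
    h = max(h_cm, 15.0)  # horizontal factor is capped at 1.0
    d = max(d_cm, 25.0)  # distance factor is capped at 1.0
    return (40.0
            * (15.0 / h)                          # horizontal distance
            * (1.0 - 0.004 * abs(v_cm - 75.0))    # vertical height
            * (0.7 + 7.5 / d)                     # vertical travel distance
            * (1.0 - freq_per_min / f_max))       # lifting frequency


def classify_1981(load_kg: float, al_kg: float) -> str:
    """Compare a load with the AL and the MPL (MPL = 3 x AL)."""
    mpl_kg = 3.0 * al_kg
    if load_kg > mpl_kg:
        return "above MPL: engineering controls required"
    if load_kg > al_kg:
        return "between AL and MPL: engineering or administrative controls"
    return "below AL: considered safe"


def rwl_1993(h_cm: float, v_cm: float, d_cm: float, asym_deg: float,
             fm: float, cm: float) -> float:
    """Revised equation RWL (kg): 23 kg load constant times six multipliers."""
    hm = 25.0 / max(h_cm, 25.0)              # horizontal multiplier
    vm = 1.0 - 0.003 * abs(v_cm - 75.0)      # vertical multiplier
    dm = 0.82 + 4.5 / max(d_cm, 25.0)        # distance multiplier
    am = 1.0 - 0.0032 * asym_deg             # asymmetry multiplier
    return 23.0 * hm * vm * dm * am * fm * cm


def lifting_index(load_kg: float, rwl_kg: float) -> float:
    """LI = load lifted / RWL."""
    return load_kg / rwl_kg


if __name__ == "__main__":
    load = 15.0
    # Hypothetical task: H = 40 cm, V = 30 cm, D = 60 cm, 1 lift/min
    al = action_limit_1981(40.0, 30.0, 60.0, freq_per_min=1.0, f_max=15.0)  # f_max assumed
    print(f"1981: AL = {al:.1f} kg, MPL = {3 * al:.1f} kg -> {classify_1981(load, al)}")
    rwl = rwl_1993(40.0, 30.0, 60.0, asym_deg=30.0, fm=0.94, cm=0.95)  # fm, cm assumed
    print(f"1993: RWL = {rwl:.1f} kg, LI = {lifting_index(load, rwl):.2f}")
```

With these inputs, the sketch gives an AL of about 9.5 kg (MPL about 28.4 kg), so a 15 kg load falls between the AL and the MPL; the RWL is about 9.0 kg, giving an LI of about 1.67.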
Once the RWL is calculated for a given workplace configuration, the load lifted by the worker is divided by the RWL to yield a lifting index (LI). If the LI is less than 1, the job is considered safe; if the LI is greater than 1, risk is associated with the task. LI values above 3.0 are thought to place many workers at an increased risk of LBP (Waters et al., 1994). The equations that govern both the 1981 and 1993 versions of this guide are described in Chapter 12 of this Handbook. Two effectiveness studies have evaluated the revised equation. The first compared the ability of the revised equation to identify high- and low-risk jobs based upon historical LBP reporting in industry (Marras, Granata, & Davis, 1999). This evaluation reported an overall odds ratio of 3.1. In-depth analyses indicated higher sensitivity than the 1981 guide but lower specificity. A second study assessed odds ratios as a function of LI magnitude (Waters et al., 1999). LIs between 1 and 3 yielded odds ratios ranging from 1.54 to 2.45, suggesting increasing risk with increasing LI. Conversely, when the LI exceeded 3, the odds ratio was lower (1.63), indicating a nonmonotonic relationship between the LI and risk. Other studies have also reported that an LI or Composite Lifting Index (CLI) greater than 1.0 is associated with increased LBP risk (Lavender et al., 1999; Y. L. Wang, 1999; Waters et al., 2011).
6.4 NIOSH Lifting Index and Risk of Exposure to Low-Back Disorders
Recently, Nwafor (2020) developed a practical tool for predicting low back pain based on empirical data from the UAW-Ford Ergonomic Assessment Tool. The Maximum Lifting Index (MaxLI), as defined by the Revised NIOSH Lifting Equation (Waters et al., 1994; Waters et al., 1999; Waters et al., 2011), was used to predict experienced low back pain (eLBP) when performing manual material handling (MMH) tasks. The empirical data included the distribution of cases and non-cases for MMH (i.e., whether the job involved handling weights over 4 pounds), standing or walking for more than 6 hours, and shift length (8 or 10 hours). These variables were entered into logistic regression models to test the relationship between the combination of MaxLI and MMH and eLBP. The following two models were useful in predicting eLBP at work due to manual material handling tasks:
$$\operatorname{logit}(\mathrm{eLBP}) = \alpha + \beta_1\,\mathrm{MaxLI} + \beta_2\,\mathrm{MMH}$$
$$\operatorname{logit}(\mathrm{eLBP}) = \alpha + \beta_1\,\mathrm{MaxLI}$$
The results showed that MaxLI was a reasonable predictor of low-back pain. In addition, MMH identified cases of workers with LBP with a very high sensitivity of 0.96 and a specificity of 0.79.
A MaxLI cutoff of 1.3 was used to determine whether a task puts workers at increased risk of developing LBP, yielding a sensitivity of 0.96 and a specificity of 0.89; however, sensitivity decreased significantly when the LI exceeded 1.5 (see Figure 9).
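How a logistic model and a MaxLI cutoff turn into risk estimates and sensitivity/specificity can be sketched as follows. This is an illustration only: the fitted coefficients α, β1, and β2 are not given in this section, so they are inputs; the rule that a job is flagged when MaxLI is strictly greater than the cutoff is an assumption; and the job data are made up, not taken from Nwafor (2020). Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def probability_from_logit(alpha: float, beta1: float, max_li: float,
                           beta2: float = 0.0, mmh: int = 0) -> float:
    """Inverse logit of alpha + beta1*MaxLI + beta2*MMH (coefficients are inputs)."""
    z = alpha + beta1 * max_li + beta2 * mmh
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity_specificity(max_li: Sequence[float], has_elbp: Sequence[bool],
                            cutoff: float = 1.3) -> Tuple[float, float]:
    """Score the rule 'MaxLI > cutoff means high risk' against reported eLBP."""
    tp = fn = tn = fp = 0
    for li, case in zip(max_li, has_elbp):
        flagged = li > cutoff  # assumption: strictly greater than the cutoff
        if case and flagged:
            tp += 1
        elif case:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Made-up illustration data (not from Nwafor, 2020).
jobs_max_li = [0.8, 1.1, 1.4, 2.0, 1.6, 0.9, 3.1]
jobs_elbp = [False, False, True, True, False, True, True]
sens, spec = sensitivity_specificity(jobs_max_li, jobs_elbp)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # sensitivity=0.75, specificity=0.67
```

Sweeping `cutoff` over a range of values and plotting both outputs would produce curves of the kind shown in Figure 9.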
Figure 9 Sensitivity and specificity vs. Maximum Lifting Index (x-axis: Maximum Lifting Index, 0.0–4.0; y-axis: sensitivity/specificity, 0.0–1.0). Source: Nwafor (2020).
MANAGING LOW-BACK DISORDER RISK IN THE WORKPLACE
[Figure 10 graphic: scales for lift rate (lifts/hr), average twisting velocity (deg/s), maximum moment (ft-lb), maximum sagittal flexion (deg), and maximum lateral velocity (deg/s), each mapped to the probability of high-risk group membership (0–100%).]
Figure 10 Lumbar motion monitor (LMM) risk model. The probability of high-risk (LBP) group membership is quantitatively indicated for a particular task for each of the five risk factors indicating how much exposure is too much exposure for a particular risk factor.
6.5 Video-Based Biomechanical Models
Quantitative video-based assessments have been used to better understand the association of LBP risk with workplace factors. A study by Norman and colleagues (1998) employed a quasi-dynamic two-dimensional (2D) biomechanical model to evaluate cumulative biomechanical loading of the spine in 234 automotive assembly workers. The study identified four independent factors for LBP reporting: integrated load moment (over a work shift), hand forces, peak shear force on the spine, and peak trunk velocity. Workers exposed to the upper 25% of loading on all risk factors had about six times the risk of reporting back pain of those exposed to the lowest 25% of loading.
6.5.1 Lumbar Motion Monitor Risk Assessment
The LBP risk assessment tools reviewed so far have not attempted to capture the role of motion in defining risk. We have known since the days of Sir Isaac Newton that force equals mass times acceleration; hence, motion can have a very large influence on spine loading. Despite this, most available assessment tools represent work tasks as static or quasi-static. The contribution of trunk dynamics, combined with traditional workplace biomechanical factors, to LBP risk has been assessed by Marras and colleagues (1993, 1995). These studies evaluated over 400 industrial jobs (along with their documented LBD risk history) by observing 114 workplace and worker-related variables. Of the variables documented, load moment exposure (load magnitude times the distance of the load from the spine) was identified as the single most powerful predictor of LBD reporting. The studies also identified 16 trunk kinematic variables that yielded statistically significant odds ratios for LBD reporting in the workplace, although none of these individual kinematic variables was as strong a predictor as load moment.
However, when load moment was considered in combination with three trunk kinematic variables (describing the three dimensions of trunk motion) and an exposure frequency measure, a strong multiple logistic regression model quantifying the LBD risk resulting from work design was identified (odds ratio, OR = 10.7). This analysis indicated that risk is multidimensional and that exposure to the combination of the five variables described LBP reporting well. This information was used to develop a functional risk model (Figure 10) that accounts for trade-offs between risk variables. For example, a job that exposes a worker to a low magnitude of load moment can represent a high-risk situation if the other
four variables in the model are of sufficient magnitude. Thus, the model is able to assess the interactions, or collective influence, of the risk variables. This model has been validated via a prospective workplace intervention study (Marras, Allread, Burr, & Fathallah, 2000). The risk model was designed to work with a lumbar motion monitor (LMM) (Figure 11) and a computer program that documents trunk motion exposure on the job. When these conclusions are combined with the findings of epidemiological studies exploring the influence of nonneutral postures in the workplace (Punnett et al., 1991), a potential injury pathway is suggested. These studies indicate that as trunk posture becomes more extreme or trunk motion becomes more rapid during work, LBP reporting increases. From a biomechanical perspective, these results suggest that the occupational risk of LBDs is associated with mechanical loading of the spine and indicate that when tasks involve greater three-dimensional loading, the association with risk becomes much stronger. Fathallah et al. (1998) evaluated data from 126 jobs to assess the complex trunk motions of groups associated with varying degrees of LBP reporting. They found that groups with greater LBP reporting rates exhibited complex trunk motion patterns consisting of high values of combined trunk velocities, especially at extreme sagittal flexion, whereas the low-risk groups did not exhibit these patterns. This suggests that elevated levels of complex simultaneous velocity patterns, along with key workplace factors (load moment and frequency), are correlated with increased LBP risk.
Figure 11 LMM used to track trunk kinematics during occupational activities.
6.5.2 Dynamic Moment Exposure in the Workplace
Since earlier studies (Marras & Granata, 1995; Marras et al., 1993) have shown that exposure to load moment is one of the best indicators of LBP reporting, a recent effort investigated dynamic load moment exposure and its relationship to decrements in low-back function. An ultrasound-based measurement device (Figure 12) was used to monitor dynamic load moment exposure in distribution center workers over extended periods throughout the workday (Marras, Lavender, et al., 2010). This effort precisely documented the range of dynamic moment exposure associated with different types of work (Marras et al., 2010a). Assessment of these exposures relative to the risk of low-back function decrement indicated that lateral velocity, together with dynamic moment exposure and the timing of the peak load exposure, allowed the job characteristics leading to low-back dysfunction to be identified with excellent sensitivity and specificity (Marras et al., 2010b). This study demonstrates that with proper quantification of realistic (dynamic) task exposures and an appreciation of risk factor interactions, one can indeed identify the characteristics of jobs that lead to LBDs.
Figure 12 Moment monitor used to precisely measure exposure to dynamic load moments on the job.
6.5.3 Workplace Assessment Summary
Many literature reviews have not been able to identify relationships between LBP and work factors; however, none of these reviews assessed quantitative studies of workplace exposure. Only with quantitative measures of the workplace can one assess "how much exposure is too much exposure." The studies described in this chapter are insightful in that, even though some of them did not evaluate spinal loading directly, the exposure measures they included can be considered indirect indicators of spinal load. Collectively, these studies suggest that as the risk factors increase in magnitude, the risk increases monotonically. While load location and strength limits both appear to be indicators of risk to the spine, other exposure metrics (load location, kinematics, and three-dimensional analyses) are important from a biomechanical standpoint because they influence the ability of the trunk's internal structures to support the external load. Therefore, as these measures change, they can change the nature of the loading on the back's tissues. These studies indicate that when biomechanically meaningful assessments are collected in the workplace, associations between physical factors and risk of LBD reporting are apparent.
Several common features of biomechanical risk can be identified from these studies. First, LBP risk in the workplace can be identified increasingly accurately when the specific load magnitude and its location relative to the body (load moment) are quantified. Second, studies have demonstrated that increased reporting of LBP can be characterized well when the trunk's three-dimensional kinematic demands due to work are described. Finally, these assessments have shown that LBP risk is multidimensional: there appears to be a synergy among risk factors that is often associated with increased reporting of LBP. Many studies have also suggested that some of these relationships are nonmonotonic. In summary, these efforts suggest that the better the exposure characteristics are quantified in terms of biomechanical demand, the better one can assess the association with risk. As discussed above, to assess the biomechanical demands associated with workplace dynamics and the risk of LBD, Marras et al. (1993, 1995; Marras, Ferguson, et al., 1999) developed a trunk goniometer (the lumbar motion monitor, or LMM). Over the last 25 years, the LMM has been used extensively to accurately measure the trunk motion patterns of workers in high-LBD-risk jobs and to compare them with the trunk motions and workplace conditions associated with low-risk jobs. Marras, Allread, et al. (2000) also developed a multiple logistic regression model of task exposure that predicts the probability of high-risk group membership. The LMM model takes into consideration the following five risk factors: (1) frequency of lifting; (2) load moment (load weight multiplied by the distance of the load from the spine); (3) average twisting velocity (measured by the LMM); (4) maximum sagittal flexion angle through the job cycle (measured by the LMM); and (5) maximum lateral velocity (measured by the LMM).
The LMM risk assessment model can assess the risk associated with three-dimensional trunk motion on the job with a high degree of predictability (odds ratio 10.7) compared with previous attempts to assess work-related LBD risk. The LMM model provides risk information that would otherwise take years to derive from historical incidence rates. It has been prospectively validated (Marras, Ferguson, et al., 1999; Marras, Allread, et al., 2000), and related work has extended the assessment to loading at the lumbar spine during pushing and pulling tasks (Knapik & Marras, 2009). Furthermore, Ferguson, Marras, and Burr (2004) showed that trunk kinematics and the resulting risk estimates are dictated primarily by job design and are not influenced by the worker's low-back health status.
7 PRACTICAL INDUSTRY GUIDELINES
7.1 ACGIH Guidelines: Lifting Threshold Limit Values (TLVs)
The American Conference of Governmental Industrial Hygienists (ACGIH) introduced Lifting Threshold Limit Values (TLVs) as a means of controlling the biomechanical risk of low-back injury in the workplace (ACGIH, 2009). As discussed by Marras and Hamrick (2006), the ACGIH Lifting TLV assessment method (ACGIH, 2009) consists of three tables that consider the horizontal and vertical location of the object to be lifted at the origin, task repetition, and duration of lifting. Specifically, TLVs define lifting weight limits as a function of lift origin "zones" and the repetitions associated with occupational tasks (see Figure 13). Lift origin zones are defined by the height of the lift origin above the ground and its horizontal distance from the spine. Twelve zones have been defined for lifts within ±30° of asymmetry from the sagittal plane. These zones are represented in a series of tables for different lift frequencies and time exposures. Within each zone, weight-lifting limits are specified based upon the best information available from several sources, including (1) EMG-assisted biomechanical models; (2) the 1993 revised NIOSH lifting equation; and (3) the historical risk data associated with the LMM database. The weight lifted by the worker is compared to these limits; according to ACGIH (2009), weights exceeding the zone limit are considered hazards (Splittstoesser, O'Farrell, et al., 2017). Russell et al. (2007) compared the results of the NIOSH, ACGIH TLV, Snook, 3DSSPP, and WA L&I lifting assessment instruments when applied to uniform lifting and lowering tasks. The NIOSH, ACGIH TLV, and Snook instruments provided similar results when assessing the musculoskeletal exposures associated with a lifting job. Furthermore, while the ACGIH TLV, Snook, and WA L&I methods were simpler to use, the NIOSH method offered a greater range of interpretive capabilities for determining which aspects of the lift would benefit most from task redesign. Amick, Zarzar, and Jorgensen (2011) concluded from the magnitude of LBD risk as a function of TLV magnitude that the TLVs represent a low risk for LBDs. They also proposed that the ACGIH TLV lift assessment method may be used to assess the destination of a lift in addition to its origin, for which it was designed. The recommended procedure for determining the Lifting TLVs includes the following steps (ACGIH, 2009; Marras & Hamrick, 2006; Zarzar, 2010):
1. Determine the task duration and lifting frequency of the task.
2. Select the proper TLV table (see Tables 6–9).
Figure 13 Graphic representation of vertical and horizontal zones. [Graphic: horizontal distance markers at 0, 30, 60, and 80 cm; vertical landmarks from top to bottom: top of upper reach limit (shoulder height + 30 cm), shoulder height, bottom of upper reach limit (shoulder height − 8 cm), knuckle height, mid-shin height (halfway between ankle and knee), and floor.] Source: Adapted from ACGIH (2009) TLVs; Marras & Hamrick (2006); Russell et al. (2007).
3. Identify the lifting zone from the initial position (height) of the hands and the horizontal location of the lift (the midpoint between the hands relative to the midpoint between the ankles); see Figure 13.
4. Determine the corresponding zone, compare the lifted weight against the maximum recommended TLV, and report the findings. If the lifted weight exceeds the TLV, an ergonomic intervention should be proposed and implemented so that the weight is less than the TLV.

Table 6 A Guide to Selecting ACGIH Lifting TLV Table
Duration of task per day | Lifts per hour | TLV table
≤2 h | ≤60 | Table 7
≤2 h | >60 and ≤360 | Table 8
>2 h | ≤12 | Table 7
>2 h | >12 and ≤30 | Table 8
>2 h | >30 and ≤360 | Table 9

Table 7 ACGIH Lifting Table: TLVs® for Infrequent Lifting
Vertical zone | Horizontal zone (A): Close (<30 cm) | Intermediate (30 to 60 cm) | Extended (>60 to 80 cm) (B)
Reach limit (C) or 30 cm above shoulder to 8 cm below shoulder height | 16 kg | 7 kg | No known safe limit for repetitive lifting (D)
Knuckle height (E) to below shoulder | 32 kg | 16 kg | 9 kg
Middle shin to knuckle height (E) | 18 kg | 14 kg | 7 kg
Floor to middle shin height | 14 kg | No known safe limit for repetitive lifting (D) | No known safe limit for repetitive lifting (D)
Source: Marras & Hamrick (2006). Note: ≤2 hours per day with ≤60 lifts per hour OR >2 hours per day with ≤12 lifts per hour.

Table 8 ACGIH Lifting Table: Moderately Frequent Lifting
Vertical zone | Horizontal zone (A): Close (<30 cm) | Intermediate (30 to 60 cm) | Extended (>60 to 80 cm) (B)
Reach limit (C) or 30 cm above shoulder to 8 cm below shoulder height | 14 kg | 5 kg | No known safe limit for repetitive lifting (D)
Knuckle height (E) to below shoulder | 27 kg | 14 kg | 7 kg
Middle shin to knuckle height (E) | 16 kg | 11 kg | 5 kg
Floor to middle shin height | 9 kg | No known safe limit for repetitive lifting (D) | No known safe limit for repetitive lifting (D)
Source: Marras & Hamrick (2006). Note: >2 hours per day with >12 and ≤30 lifts per hour OR ≤2 hours per day with >60 and ≤360 lifts per hour.

Table 9 ACGIH Lifting Table: Frequent, Long-Duration Lifting
Vertical zone | Horizontal zone (A): Close (<30 cm) | Intermediate (30 to 60 cm) | Extended (>60 to 80 cm) (B)
Reach limit (C) or 30 cm above shoulder to 8 cm below shoulder height | 11 kg | No known safe limit for repetitive lifting (D) | No known safe limit for repetitive lifting (D)
Knuckle height (E) to below shoulder | 14 kg | 9 kg | 5 kg
Middle shin to knuckle height (E) | 9 kg | 7 kg | 2 kg
Floor to middle shin height | No known safe limit for repetitive lifting (D) | No known safe limit for repetitive lifting (D) | No known safe limit for repetitive lifting (D)
Source: Marras & Hamrick (2006). Note: >2 hours per day with >30 and ≤360 lifts per hour.

The following notation applies to Tables 7–9:
A. Distance from the midpoint between the inner ankle bones to the load.
B. Lifting tasks should not start or end at a horizontal reach distance of more than 80 cm from the midpoint between the inner ankle bones (see Figure 13).
C. Routine lifting tasks should not start or end at heights greater than 30 cm above the shoulder or more than 180 cm above floor level (see Figure 13).
D. Routine lifting tasks should not be performed for shaded table entries marked "No known safe limit for repetitive lifting." While the available evidence does not permit the identification of safe weight limits in the shaded regions, professional judgment may be used to determine whether infrequent lifts of light weight may be safe.
E. The anatomical landmark for knuckle height assumes the worker is standing erect with arms hanging at the sides.
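The four-step TLV procedure can be sketched as a table selection (Table 6) followed by a zone lookup (Tables 7–9). This is a minimal sketch, not an ACGIH tool: the limit values are transcribed from Tables 7–9, the zone keys and function names are hypothetical, `None` stands for "No known safe limit for repetitive lifting," and frequencies above 360 lifts/h or reaches beyond 80 cm are simply reported as outside the tables' coverage.

```python
from typing import Optional

NKSL = None  # "No known safe limit for repetitive lifting" (note D)
HORIZONTAL = ("close", "intermediate", "extended")

# TLVs (kg) by table, vertical zone, and horizontal zone (close, intermediate, extended).
TLV_KG = {
    7: {"reach": (16, 7, NKSL), "knuckle_to_shoulder": (32, 16, 9),
        "shin_to_knuckle": (18, 14, 7), "floor_to_shin": (14, NKSL, NKSL)},
    8: {"reach": (14, 5, NKSL), "knuckle_to_shoulder": (27, 14, 7),
        "shin_to_knuckle": (16, 11, 5), "floor_to_shin": (9, NKSL, NKSL)},
    9: {"reach": (11, NKSL, NKSL), "knuckle_to_shoulder": (14, 9, 5),
        "shin_to_knuckle": (9, 7, 2), "floor_to_shin": (NKSL, NKSL, NKSL)},
}


def select_table(hours_per_day: float, lifts_per_hour: float) -> Optional[int]:
    """Table 6 logic; returns None above 360 lifts/h, which no table covers."""
    if lifts_per_hour > 360:
        return None
    if hours_per_day <= 2:
        return 7 if lifts_per_hour <= 60 else 8
    if lifts_per_hour <= 12:
        return 7
    return 8 if lifts_per_hour <= 30 else 9


def horizontal_zone(h_cm: float) -> Optional[str]:
    """Zone for the distance from the midpoint between the inner ankle bones (note A)."""
    if h_cm < 30:
        return "close"
    if h_cm <= 60:
        return "intermediate"
    if h_cm <= 80:
        return "extended"
    return None  # beyond 80 cm: lifts should not start or end here (note B)


def assess_lift(weight_kg: float, hours_per_day: float, lifts_per_hour: float,
                vertical_zone: str, h_cm: float) -> str:
    """Steps 1-4: pick the table, find the zone limit, compare the weight."""
    table = select_table(hours_per_day, lifts_per_hour)
    zone = horizontal_zone(h_cm)
    if table is None or zone is None:
        return "outside TLV coverage"
    limit = TLV_KG[table][vertical_zone][HORIZONTAL.index(zone)]
    if limit is None:
        return "no known safe limit for repetitive lifting"
    return "within TLV" if weight_kg <= limit else "exceeds TLV: intervene"


# 4 h/day at 20 lifts/h selects Table 8; knuckle-to-shoulder, close zone: 27 kg.
print(assess_lift(20, 4, 20, "knuckle_to_shoulder", 25))  # within TLV
```

Returning a plain label for the "no known safe limit" cells keeps those zones distinct from an ordinary exceedance, since note D calls for professional judgment rather than a numeric comparison.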
7.2 Guidelines for One-Handed and Two-Handed Manual Lifting
One important practical consideration for task design in industry to protect the lower back is one-handed versus two-handed lifting. An early study by Allread, Marras, and Parnianpour (1996) examined trunk kinematic differences between lifts performed using either one hand (unsupported) or two hands. One-handed lifting resulted in significantly larger ranges of motion in the lateral and transverse planes and greater flexion in the sagittal plane. Trunk kinematics (range of motion, velocity, and acceleration) also increased markedly as load positions became more asymmetric. It was concluded that the back motion characteristics associated with LBDs were significantly higher for one-handed lifts. In a related study, Marras and Davis (1998) examined three-dimensional spinal loading associated with asymmetric lifting using either one or two hands. Lifts occurred at 0, 30, or 60 degrees from the sagittal plane on both sides of the body. Compression and lateral shear forces increased as the lift origin became more asymmetric, and did so at about twice the rate when lifting from origins to the left of the sagittal plane as from origins to the right. It was also noted that when lifts were performed beyond 60 degrees of asymmetric origin to the right of the sagittal plane, the relationship between spinal loading and asymmetric origin became nonlinear, indicating much greater costs of lifting from origins more than 60 degrees from the sagittal plane. Overall, one-handed lifts using the hand on the same side of the body as the load resulted in compression forces approximately equal to those observed when lifting with two hands in a sagittally symmetric position. Han, Stobbe, and Hobbs (2005) investigated the effect of asymmetry on lifting capacity using the psychophysical approach for three types of asymmetric lifting tasks: step-turn, middle twist, and twist. They pointed out that while many lifting tasks require task angles between 135 and 180 degrees, under the NIOSH Lifting Equation the asymmetric multiplier (AM) for a step-turn of more than 135 degrees is 0, leading to a recommended weight limit (RWL) of 0. Their proposed new asymmetric multipliers for the NIOSH Lifting Equation are shown in Table 10.

Table 10 New Asymmetric Multiplier for NIOSH Lifting Equation
Task angle (deg) | Step-turn | Middle twist | Trunk twist (linear) | Trunk twist (polynomial)
0 | 1.00 | 1.00 | 1.00 | 1.00
15 | 0.95 | 0.99 | 0.97 | 0.99
30 | 0.91 | 0.99 | 0.95 | 0.98
45 | 0.89 | 0.99 | 0.92 | 0.96
60 | 0.88 | 0.98 | 0.89 | 0.93
75 | 0.87 | 0.97 | 0.86 | 0.89
90 | 0.86 | 0.96 | 0.84 | 0.84
105 | 0.86 | 0.94 | 0.81 | 0.78
120 | 0.86 | 0.93 | 0.78 | 0.71
135 | 0.85 | 0.91 | 0.76 | 0.64
150 | 0.85 | 0.89 | – | –
165 | 0.85 | 0.86 | – | –
180 | 0.85 | 0.84 | – | –
Source: Han, Stobbe & Hobbs (2005).

Recently, Weston, Aurand, et al. (2020) examined lumbar spine loading for one- versus two-handed handling and presented recommendations for one-handed load handling limits that protect the lower back.
The recommended weight limits that represent the low–medium and medium–high risk transition values are given in Table 11. Most medium- and high-risk assessments were due to the level of shear loading or to combinations of compression and shear loading. Physical strength was also considered for one-handed lifting and lowering at waist height; regardless of lift asymmetry or horizontal distance, it led to a medium- or high-risk classification. Furthermore, for exertions with a lifting frequency greater than 255 lifts/h, practitioners should regard the low-risk column as medium risk and the medium-risk column as high risk.

Table 11 Recommended Weight Limits Protective of the Lower Back for Each One-Handed Lifting Condition, Assuming a Lift Frequency below 255 lifts/h
Height | Asymmetry (deg) | Distance | Low risk | Medium risk | High risk
Ankle (~15 cm) | 0 | Close (40 cm) | Unacceptable | 0–2.2 kg | 2.3 kg or more
Ankle (~15 cm) | 0 | Far (70 cm) | Unacceptable | 0–1.0 kg | 1.1 kg or more
Ankle (~15 cm) | 45 | Close (40 cm) | Unacceptable | 0–3.8 kg | 3.9 kg or more
Ankle (~15 cm) | 45 | Far (70 cm) | Unacceptable | 0–2.2 kg | 2.3 kg or more
Ankle (~15 cm) | 90 | Close (40 cm) | Unacceptable | 0–5.4 kg | 5.5 kg or more
Ankle (~15 cm) | 90 | Far (70 cm) | Unacceptable | 0–3.3 kg | 3.4 kg or more
Knee (~50 cm) | 0 | Close (40 cm) | 0–8.0 kg | 8.1–8.5 kg | 8.6 kg or more
Knee (~50 cm) | 0 | Far (70 cm) | 0–4.9 kg | 5.0–6.0 kg | 6.1 kg or more
Knee (~50 cm) | 45 | Close (40 cm) | 0–8.0 kg | 8.1–9.8 kg | 9.9 kg or more
Knee (~50 cm) | 45 | Far (70 cm) | 0–4.9 kg | 5.0–7.0 kg | 7.1 kg or more
Knee (~50 cm) | 90 | Close (40 cm) | 0–8.0 kg | 8.1–11.0 kg | 11.1 kg or more
Knee (~50 cm) | 90 | Far (70 cm) | 0–4.9 kg | 5.0–8.0 kg | 8.1 kg or more
Waist (~93 cm) | 0 | Close (40 cm) | 0–9.3 kg | Up to 11.3 kg | –
Waist (~93 cm) | 0 | Far (70 cm) | 0–9.3 kg | 9.4–10.5 kg | 10.6 kg or more
Waist (~93 cm) | 45 | Close (40 cm) | 0–9.3 kg | Up to 11.3 kg | –
Waist (~93 cm) | 45 | Far (70 cm) | 0–9.3 kg | Up to 11.3 kg | –
Waist (~93 cm) | 90 | Close (40 cm) | 0–9.3 kg | Up to 11.3 kg | –
Waist (~93 cm) | 90 | Far (70 cm) | 0–9.3 kg | Up to 11.3 kg | –
Source: Weston, Aurand, et al. (2020).

Weston, Aurand, et al. (2020) also observed that one-handed lifting/lowering might be preferable to corresponding two-handed exertions in terms of spinal loading. However, they pointed out that, in addition to external lifting characteristics (lift origin/destination height and asymmetry, load mass, and horizontal reach distance), the choice to use one versus two hands for a particular lifting or lowering exertion should take the worker's strength capabilities into account, because workers are generally less capable of lifting with one hand from waist height. As illustrated in Table 12, the proposed one-handed limits are more restrictive than the frequency-independent weight limits calculated by the Revised NIOSH Lifting Equation (RNLE) for the corresponding two-handed exertions in half of the compared conditions, including all ankle-height (15 cm) conditions; in the remaining conditions the one-/two-handed ratio exceeds 1. Finally, the authors caution that the proposed risk model is population-based and should not be used to make inferences about an individual's risk of injury.

Table 12 Best Comparison of One-Handed Weight Limits to Existing Two-Handed Lifting Guidelines
Height (cm) | Asymmetry (deg) | Horizontal distance | One-hand limit (medium–high risk transition point) (kg) | RNLE frequency-independent RWL (kg) | Ratio (one-/two-handed)
15 | 0 | Close (40 cm) | 2.2 | 10 | 0.22
15 | 0 | Far (63 cm) | 1.2 | 6.3 | 0.19
15 | 45 | Close (40 cm) | 3.9 | 8.6 | 0.45
15 | 45 | Far (63 cm) | 2.5 | 5.4 | 0.46
15 | 90 | Close (40 cm) | 5.5 | 7.1 | 0.77
15 | 90 | Far (63 cm) | 3.7 | 4.5 | 0.82
50 | 0 | Close (40 cm) | 8.6 | 11.7 | 0.74
50 | 0 | Far (63 cm) | 6.5 | 7.4 | 0.88
50 | 45 | Close (40 cm) | 9.9 | 10.1 | 0.98
50 | 45 | Far (63 cm) | 7.6 | 6.4 | 1.19
50 | 90 | Close (40 cm) | 11.1 | 8.3 | 1.34
50 | 90 | Far (63 cm) | 8.6 | 5.3 | 1.62
93 | 0 | Close (40 cm) | 11.3 | 13.5 | 0.84
93 | 0 | Far (63 cm) | 11.0 | 8.6 | 1.28
93 | 45 | Close (40 cm) | 11.3 | 11.6 | 0.97
93 | 45 | Far (63 cm) | 11.3 | 7.4 | 1.53
93 | 90 | Close (40 cm) | 11.3 | 9.6 | 1.18
93 | 90 | Far (63 cm) | 11.3 | 6.1 | 1.85
Source: Weston, Aurand, et al. (2020). Notes: One-handed limits are updated to reflect a "far" horizontal distance of 63 cm (as opposed to 70 cm) for a more direct comparison. The Revised NIOSH Lifting Equation limits assume good coupling and a lift duration of 1 hour.

7.3 Guidelines for Manual Pushing and Pulling Tasks
Many materials handling tasks performed in industry today are shifting from lifting to pushing and pulling (Theado, Knapik & Marras, 2007; Knapik & Marras, 2009). Therefore, assessments of push–pull tasks are essential for preventing LBP and LBDs. Knapik and Marras (2009) demonstrated that pulling induced greater spine compressive loads than pushing, whereas the reverse was true for shear loads at the different lumbar levels. They recommended that pushing and pulling loads equivalent to 20% of body weight be the limit of acceptable exertions. They also noted that pulling at low and medium handle heights (50% and 65% of stature) minimized anterior/posterior (A/P) shear forces. In a companion study, Marras, Knapik, and Ferguson (2009) examined the impact of load magnitude, pushing speed, required control, and handle height on spine loads while pushing both carts and overhead suspended loads in industrial settings. A biologically assisted biomechanical model was used to assess compression, A/P shear, and lateral shear forces. A/P shear loads were greatest at the upper levels of the lumbar spine, and all experimental factors except the nature of the load (cart vs. suspended) influenced A/P shear to varying degrees. The study concluded that pushing and pulling are complex biomechanical tasks in which spine shear forces arise from the coactivation of trunk muscles and from spine orientations that are influenced by several occupational factors. Therefore, low back pain rates associated with lifting in some work environments may not be reduced by changing the task from lifting to pushing. Garg et al. (2014) reported psychophysical limits for maximum pushing and pulling forces. Table 13 shows the initial and sustained pushing and pulling forces acceptable to 75% of male and female workers, which can be used to design and analyze pushing and pulling tasks common in industry. Weston, Alizadeh, et al. (2018) reported biomechanically determined hand force limits to protect the low back during occupational pushing and pulling tasks. The developed limits
Table 13 Combinations of Distance and Frequency for Maximum Acceptable Sustained Push/Pull Forces (Snook & Ciriello, 1991) Acceptable to 75% of Workers That Exceed 8-h Physiological Criteria (0.7 L/min for Females and 1.0 L/min for Males)
Frequency (one exertion every) Gender
Distance (m)
Females Females Females Females Females Females Males Males Males Males Males Males
2.1 7.6 15.2 30.5 45.7 61.0 2.1 7.6 15.2 30.5 45.7 61.0
Push
Pull
6 s, 12 s 15 s, 22 s 25 s, 35 s, 1 m 1 m, 2 m 1 m, 2 m 2 m 6 s 15 s, 22 s 25 s, 35 s 1 m, 1 m 1 m, 2 m 2 m
6 s, 12 s 15 s, 22 s 25 s, 35 s, 1 m 1 m, 2 m 1 m, 2 m 2 m 6 s 15 s, 22 s(a) 25 s, 35 s, 1 m(b) 1 m, 2 m(a) 2 m(a)
Source: Adapted from Garg et al. (2014). (a) Exceeds 8-h physiological criteria (1.0 L/min for males) for 64 and 95 cm handle heights. (b) Exceeds 8-h physiological criteria (1.0 L/min for males) for the 64 cm handle height only.
are based on objective hand force limits determined through a biomechanical assessment of lumbar spine forces. Mixed modeling techniques were applied to correlate spinal load with hand force or torque across a wide range of exposures. Exertion type, exertion direction, handle height, and their interactions significantly influenced the dependent measures of spinal load, hand force, and turning torque. The authors pointed out that these biomechanically determined guidelines are up to 30% lower than comparable psychophysically derived limits and are more protective for straight pushing. They recommended that industrial practitioners consider implementing these guidelines in risk assessment and workplace design for occupational pushing and pulling tasks. Table 14 provides the recommended push/pull risk limits.
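Table 14 Most Conservative Risk Limits for Straight and Turning Push/Pull Exertions Protective of Various Percentages of the Population Tested
Exertion | Handle height | Limit | 90% protected | 75% protected | 50% protected | 25% protected | 10% protected
1 Hand Pull | 81.3 cm (32 in) | Straight: HF limit (N) | 212 | 252 | 295 | 338 | 378
1 Hand Pull | 81.3 cm (32 in) | Turning: torque limit (Nm) | 58 | 72 | 87 | 103 | 117
1 Hand Pull | 101.6 cm (40 in) | Straight: HF limit (N) | 184 | 217 | 253 | 289 | 322
1 Hand Pull | 101.6 cm (40 in) | Turning: torque limit (Nm) | 74 | 88 | 104 | 119 | 134
1 Hand Pull | 121.9 cm (48 in) | Straight: HF limit (N) | 164 | 198 | 235 | 272 | 306
1 Hand Pull | 121.9 cm (48 in) | Turning: torque limit (Nm) | 75 | 91 | 108 | 125 | 141
2 Hand Pull | 81.3 cm (32 in) | Straight: HF limit (N) | 203 | 282 | 368 | 454 | 533
2 Hand Pull | 81.3 cm (32 in) | Turning: torque limit (Nm) | 71 | 86 | 102 | 118 | 133
2 Hand Pull | 101.6 cm (40 in) | Straight: HF limit (N) | 260 | 306 | 356 | 406 | 452
2 Hand Pull | 101.6 cm (40 in) | Turning: torque limit (Nm) | 87 | 104 | 124 | 143 | 161
2 Hand Pull | 121.9 cm (48 in) | Straight: HF limit (N) | 301 | 336 | 374 | 412 | 447
2 Hand Pull | 121.9 cm (48 in) | Turning: torque limit (Nm) | 96 | 113 | 133 | 153 | 170
2 Hand Push | 81.3 cm (32 in) | Straight: HF limit (N) | 195 | 243 | 295 | 347 | 395
2 Hand Push | 81.3 cm (32 in) | Turning: torque limit (Nm) | 72 | 88 | 105 | 122 | 138
2 Hand Push | 101.6 cm (40 in) | Straight: HF limit (N) | 217 | 250 | 286 | 322 | 355
2 Hand Push | 101.6 cm (40 in) | Turning: torque limit (Nm) | 94 | 109 | 126 | 142 | 157
2 Hand Push | 121.9 cm (48 in) | Straight: HF limit (N) | 290 | 333 | 380 | 427 | 470
2 Hand Push | 121.9 cm (48 in) | Turning: torque limit (Nm) | 96 | 114 | 133 | 152 | 169
HF = hand force. Percentages denote the percent of the population protected.
Source: Weston, Alizadeh, et al. (2018).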
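As a practitioner-oriented illustration, the straight-exertion hand force limits of Table 14 can be encoded as a lookup that reports the largest tabulated percentage of the population protected at a measured hand force. This is a minimal sketch, not a tool from Weston, Alizadeh, et al. (2018): the function name, the lowercase exertion labels, and the choice to support only the three tabulated handle heights (no interpolation between them) are assumptions. Turning-torque limits could be handled the same way.

```python
from typing import Dict, Optional, Tuple

# Straight-exertion hand force (HF) limits in newtons, copied from Table 14.
# Keyed by (exertion label, handle height in cm); each tuple lists the limits
# that protect 90, 75, 50, 25 and 10 percent of the population.
PERCENT_PROTECTED = (90, 75, 50, 25, 10)
HF_LIMITS_N: Dict[Tuple[str, float], Tuple[int, ...]] = {
    ("1 hand pull", 81.3): (212, 252, 295, 338, 378),
    ("1 hand pull", 101.6): (184, 217, 253, 289, 322),
    ("1 hand pull", 121.9): (164, 198, 235, 272, 306),
    ("2 hand pull", 81.3): (203, 282, 368, 454, 533),
    ("2 hand pull", 101.6): (260, 306, 356, 406, 452),
    ("2 hand pull", 121.9): (301, 336, 374, 412, 447),
    ("2 hand push", 81.3): (195, 243, 295, 347, 395),
    ("2 hand push", 101.6): (217, 250, 286, 322, 355),
    ("2 hand push", 121.9): (290, 333, 380, 427, 470),
}


def percent_protected(exertion: str, height_cm: float, hand_force_n: float) -> Optional[int]:
    """Return the largest tabulated percent of the population protected at this
    hand force, or None if the force exceeds even the 10% limit.
    Only the three tabulated handle heights are supported (KeyError otherwise)."""
    limits = HF_LIMITS_N[(exertion, height_cm)]
    # Limits rise as the protected percentage falls, so the first limit
    # that is not exceeded gives the most protective level satisfied.
    for pct, limit in zip(PERCENT_PROTECTED, limits):
        if hand_force_n <= limit:
            return pct
    return None


print(percent_protected("2 hand push", 101.6, 260))  # 50
```

A 260 N two-handed push at 101.6 cm exceeds the 90% (217 N) and 75% (250 N) limits but not the 50% limit (286 N), so only half of the population would be protected.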
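DESIGN FOR HEALTH, SAFETY, AND COMFORT
Table 15 Comparison of Biomechanically Determined Push/Pull Limits to Psychophysically Determined Equivalents
Exertion | Percent of population protected | Biomechanically determined hand force limit (N), resultant hand force | Adjusted biomechanically determined hand force limit (N), horizontal component of force | Snook & Ciriello (1991) hand force limit (N), horizontal component of force | Percent change
Straight 2 Hand Push, 101.6 cm (40 in) | 90 | 217 | 217 | 239 | −9.1
Straight 2 Hand Push, 101.6 cm (40 in) | 75 | 250 | 250 | 300 | −16.6
Straight 2 Hand Push, 101.6 cm (40 in) | 50 | 286 | 286 | 371 | −22.9
Straight 2 Hand Push, 101.6 cm (40 in) | 25 | 322 | 322 | 437 | −26.3
Straight 2 Hand Push, 101.6 cm (40 in) | 10 | 355 | 355 | 503 | −29.5
Straight 2 Hand Pull, 101.6 cm (40 in) | 90 | 260 | 245 | 240 | +1.9
Straight 2 Hand Pull, 101.6 cm (40 in) | 75 | 306 | 288 | 285 | +0.9
Straight 2 Hand Pull, 101.6 cm (40 in) | 50 | 356 | 335 | 341 | −1.9
Straight 2 Hand Pull, 101.6 cm (40 in) | 25 | 406 | 382 | 391 | −2.4
Straight 2 Hand Pull, 101.6 cm (40 in) | 10 | 452 | 424 | 442 | −4.0
Source: Weston, Alizadeh, et al. (2018).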
In addition, Table 15 shows how the biomechanically determined guidelines differ from the widely used psychophysically derived pushing and pulling guidelines of Snook and Ciriello (1991). Note that the Snook and Ciriello (1991) guidelines are based on experimental data from fewer than 22 subjects in any given contributing study. The new guidelines by Weston, Alizadeh, et al. (2018) are based on data from 62 subjects who were exposed to all experimental conditions and who are likely to be more representative of a normally distributed working population, including the inexperienced and younger workers often employed in industry to perform pushing and pulling tasks.
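The Percent Change column of Table 15 appears to express the adjusted biomechanical limit relative to the Snook and Ciriello (1991) limit. That formula is our inference from the numbers, not a definition stated by the source; the sketch below recomputes the column from the rounded values printed in the table.

```python
# Hand force limits (N) from Table 15, for 90, 75, 50, 25 and 10% protected.
PUSH_ADJUSTED = [217, 250, 286, 322, 355]  # adjusted biomechanical, horizontal
PUSH_SNOOK = [239, 300, 371, 437, 503]     # Snook & Ciriello (1991)
PULL_ADJUSTED = [245, 288, 335, 382, 424]
PULL_SNOOK = [240, 285, 341, 391, 442]


def percent_change(new: float, reference: float) -> float:
    """Percent change of `new` relative to `reference`."""
    return 100.0 * (new - reference) / reference


push = [round(percent_change(a, s), 1) for a, s in zip(PUSH_ADJUSTED, PUSH_SNOOK)]
pull = [round(percent_change(a, s), 1) for a, s in zip(PULL_ADJUSTED, PULL_SNOOK)]
print(push)  # [-9.2, -16.7, -22.9, -26.3, -29.4]
print(pull)  # [2.1, 1.1, -1.8, -2.3, -4.1]
```

The recomputed values agree with the printed column to within 0.2 percentage points; the small differences presumably arise because the published percentages were calculated from unrounded limits. The push row also shows where the "up to 30% lower" statement comes from (about −29% at the 10% protected level).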
7.4 Guidelines for Lifting in Confined Spaces
Working in unusual or restricted postures has been linked to physical limitations and musculoskeletal problems. Gallagher (2005) presented a literature review of epidemiological studies showing that work in unusual and restricted postures was associated with significantly higher rates of musculoskeletal complaints and with reduced strength and lifting capacity, compared to workers not adopting these postures. It was postulated that if awkward postures cannot be eliminated from the workplace, jobs should be designed in accordance with the reduced strength and diminished lifting capabilities of workers in such postures. Furthermore, Splittstoesser et al. (2007) noted that most lifting assessment tools cannot easily be adapted to analyze jobs requiring restricted postures. They evaluated spinal loading during manual materials handling in kneeling postures. An EMG-driven biomechanical model, previously validated for upright lifting, was adapted for use in kneeling tasks. The resulting regression equations predicted peak spine loading in terms of compression, anterior-posterior (AP) shear, and lateral shear forces from subject height, load weight, and destination height. Recently, Weston, Dufour, et al. (2020) investigated torso flexion and three-dimensional lumbar spinal loads when working in a confined vertical space (kneeling, sitting) and in an unconfined space (stooping). The lifting styles examined were consistent with baggage handling inside an aircraft compartment. The results revealed that such lifting exertions posed a significant risk to the lumbar spine (in terms of peak compressive, lateral shear, or resultant spinal loads) regardless of lifting style. From a low-back loading perspective, a kneeling lifting style was recommended over sitting when lifting in confined vertical spaces, such as in airline baggage handling. Kneeling offered the greatest benefit because it allows the torso to be kept upright, which reduces shear forces on the lumbar spine. It was concluded that baggage handling tasks performed in confined vertical spaces pose a high risk of low-back injury regardless of lifting style. Therefore, engineering solutions such as cranes, hoists, or vacuum lifts should be adopted in both confined and unconfined spaces to mitigate the biomechanical risk of injury.
7.5 Workplace Design Guidelines for Asymptomatic vs. Low-Back-Injured Workers
Marras et al. (2004) measured the spine loading patterns of patients with low back pain (LBP) performing symmetric and asymmetric lifting exertions and compared them with those of asymptomatic individuals performing the same tasks. An EMG-assisted model was used to evaluate spine loading during the lifting exertions. Differences in spine loading between the LBP and asymptomatic subjects were noted as a function of the experimental variables. The results showed that patients with LBP experienced greater spine compression and shear forces when performing lifting tasks than asymptomatic individuals did. The study concluded that assessing physical work requirements can help identify the lifting conditions at work that exacerbate spine loading. Behjati and Arjmand (2019) also performed a biomechanical assessment of the NIOSH Lifting Equation in asymmetric load handling activities using a detailed musculoskeletal model. They concluded that the lifting equation does not control for the large compressive and/or shear L5/S1 forces that occur during manual handling tasks with a large load asymmetry near floor level. Therefore, the NIOSH assessment tool should be used with caution when evaluating recommended weight limits for such extreme lifting tasks. Ferguson, Marras, and Burr (2005) developed workplace design guidelines for asymptomatic vs. low-back-injured workers. The proposed three lifting guidelines are based on
MANAGING LOW-BACK DISORDER RISK IN THE WORKPLACE
Table 16 Criteria Levels for Low-Risk, Medium-Risk, and High-Risk Lifting Conditions
Compression Low risk
6400 N
>1000 N
30°
• rapid lift motions from side to side
• one-handed lifting
• lifting while seated or kneeling
• high levels of heat and/or humidity
• lifting unstable or non-rigid objects (e.g., containers of liquid or sacks of powder)
• poor hand-holds: e.g., lack of handles, cut-outs
• unstable footing: e.g., on gravel or sand
• during or immediately after exposure to whole-body vibration at or above the ACGIH TLV for Whole-Body Vibration
Table 3 High Frequency Lifting: >2 Hours/Day &
= 0.6) usually indicates that a model is capturing the major changes in the data.
MATHEMATICAL MODELING IN HUMAN–MACHINE SYSTEM DESIGN AND EVALUATION
• Root-mean-square (RMS). RMS is a statistic for evaluating the difference between the prediction from a model (x′) and the actual data (x). Given a series of events (e.g., t1, t2, t3, …, tN) with a total of N observations, we obtain both the predicted values from the model (x′1, x′2, x′3, …, x′N) and the data (x1, x2, x3, …, xN). The RMS between the model's prediction and the actual data is calculated via Eq. (1):
$$\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - x'_i\right)^2} \qquad (1)$$
The smaller the RMS, the smaller the difference between the model's prediction and the actual data.
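Eq. (1) translates directly into code. The sketch below is a minimal implementation under the definitions above (x′ = model prediction, x = observed data); the reaction-time values in the usage lines are hypothetical and serve only to show the calculation.

```python
import math
from typing import Sequence


def rms(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square difference between model predictions x'_i and data x_i (Eq. 1)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("need at least one observation")
    n = len(observed)
    # Sum the squared prediction errors, average over N, then take the square root.
    squared_errors = ((x - x_pred) ** 2 for x_pred, x in zip(predicted, observed))
    return math.sqrt(math.fsum(squared_errors) / n)


# Hypothetical mean reaction times (s): model predictions vs. observed data.
predicted = [0.50, 0.62, 0.75]
observed = [0.52, 0.60, 0.78]
print(round(rms(predicted, observed), 4))  # 0.0238
```

Because the errors are squared before averaging, RMS is expressed in the same units as the data (seconds here) and weights large misses more heavily than small ones.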
3 WHAT ARE THE KEY FEATURES OF MATHEMATICAL MODELS IN HUMAN FACTORS?
3.1 Theory-Driven Modeling vs. Data-Driven Modeling Approaches
With the current development of artificial intelligence (AI) and data-driven approaches, it is important to understand the differences between (1) theory-driven (top-down) modeling approaches, including both mathematical modeling and simulation modeling, and (2) data-driven (bottom-up) approaches, including both AI models (e.g., deep learning) and statistical modeling (e.g., linear regression) (see Table 1). First, in theory-driven approaches, psychological theories play a crucial role in formulating the structure of the models and are used extensively to specify model details, either as mathematical equations in mathematical models or as production rules implemented in computer code in simulation models. Many examples of theory-driven mathematical models and simulation models (some of which are also called symbolic
models) are described in later sections of this chapter. Second, in data-driven approaches, the extensive use of data to shape or train a model plays a crucial role. For example, artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. The AI approaches developed over the past two decades (including artificial neural network models and deep learning) are data-hungry and strongly data-driven. Several issues should be noted here. First, the two groups of approaches differ in purpose. Theory-driven (top-down) modeling approaches treat understanding the mechanisms of the human or human-machine system as the top priority and regard mimicking human behavior and/or cognition as a consequence of that understanding. In contrast, data-driven (bottom-up) approaches primarily aim to mimic human behavior, or even to exceed human performance, and are largely unconcerned with whether the process (internal mechanism) that generates these results is consistent with human cognition. Accordingly, some people call the top-down approaches strong AI (because the modeler truly understands the system) and the data-driven approaches weak AI (a mapping between input and output without knowing what is inside the black box). As an analogy, researchers using theory-driven (top-down) modeling approaches are like pathologists or physiologists, who try to understand how the human body works, how a person gets sick, and how a medicine or treatment acts in the body to treat a disease. Researchers using data-driven (bottom-up) approaches, by contrast, act more like "quick remedy" testers, whose main concern is whether the patient recovers after taking a cure, not why a remedy worked or failed. As a result, researchers using theory-driven (top-down) modeling
Table 1 A Comparison between the Theory-Driven (Top-Down) Modeling Approaches and Data-Driven (Bottom-Up) Approaches
Dimensions | Theory-driven (top-down): Math models | Theory-driven (top-down): Simulation models | Data-driven (bottom-up): AI models | Data-driven (bottom-up): Statistical models
Purpose | Achieve a deep understanding of how the system works, make predictions, and mimic humans | Achieve a deep understanding of how the system works, make predictions, and mimic humans | Mimic humans or even outperform them | Make predictions
Mechanism | Focus | Focus | Minor | Ignorable
Explicability | Very clear | Clear | Less clear | Less clear
Input | Theory, explaining other work, data² | Theory, explaining other work, data² | Large amount of data | Data
Output | Human behavior¹ | Human behavior | Human-like behavior | Human behavior
Transferability | Retraining with data is not required | Retraining with data is not required | Retraining with data | Retraining with data
0 data prediction | Can work | Can work | Does not work | Does not work
Modeling process | Math derivations | Production rules and simulation | Extensive training | Parameter adjustment with fixed structure
Format | Math equations | Code (production rules), equations | Code and equations | Fixed-structure equations and code
¹ Usually not at verbal level. ² If there is a free parameter in the model.
approaches are able to explain how the results were generated, step by step via equations (mathematical models) or line by line via code (simulation models); however, explaining the weights, thresholds, and connections of thousands of nodes in complex neural network structures after an AI model has been trained remains one of the greatest challenges for AI researchers, especially in deep learning (called the reachability problem; Sankaranaarayanan, Dutta, & Mover, 2019). Second, the sources of a model's input differ greatly between the two groups of approaches. The AI modeling approaches of the past two decades need huge amounts of data to train the mapping between input and output. A small sample of data (e.g., 30 pairs of input and output data) is far too small to train those GPU-powered models. A small sample still works for statistical modeling approaches (e.g., linear or nonlinear regression), which can find the optimal parameter values in pre-structured equations that give the best fit between input and output. Theory-driven (top-down) modeling approaches use existing theories or models to understand the system and then predict its behavior, using context information as the primary model input (e.g., the experimental conditions under which the experimental data were obtained, but not the data to be modeled). Because predicting system or human behavior without data is challenging, some modeling work does use a portion of the data to estimate part of the model (a so-called free parameter), or uses other published work to calibrate the model (Wu & Liu, 2006, 2007). Third, because of this difference in input, good theory-driven (top-down) models can make predictions without data (0 data prediction), whereas this is extremely hard for data-driven approaches.
This 0 data prediction has profound practical and theoretical significance. (1) In practice, AI systems struggle with novel problems or tasks: if the new input differs from anything in the training data, the system will have a higher error rate in recognizing or acting on it. This problem is very common in natural language processing (NLP): NLP systems trained on huge data sets still cannot perfectly answer the many kinds of questions humans ask, especially questions that fall outside their training sets. The same problem arises in computer vision when a system encounters new visual stimuli or real-world objects that were absent from its training. As a result, hybrid systems (mixtures of data-driven and theory-driven models) have been used in practice to solve these problems. (2) In theory, the 0 data prediction feature of top-down models (especially mathematical models) allows researchers to open the door to new areas, and models can guide data collection. For example, Einstein's theory of relativity is a typical product of a top-down modeling process that did not use any data; it opened the door to modern physics, connecting energy and mass, and its predictions were later confirmed by experiments (e.g., Hafele & Keating, 1972). Fourth, another important way to evaluate a model is to assess how easily it transfers from one specific task to another. Bottom-up, data-driven approaches can achieve a single narrow goal extremely well, but often in ways that are tightly centered on that single task and are not robust or transferable to even modestly different circumstances (e.g., to a board of a different size, or from one video game to another with the same logic but different characters and settings) without extensive retraining (Marcus, 2020).
Since both mathematical models and simulation models are top-down models, their transfer from one task to another task does not require training with data from the new task. For all models, including both bottom-up and top-down models, in the current state of the art, transferring the modeling results from one task to another requires the assistance of researchers.
HUMAN PERFORMANCE MODELING
Fifth, the modeling processes differ. On the one hand, theory-driven modeling approaches start with a top-down understanding of human cognition and/or the human-machine system and then build mathematical equations or production rules to model human behavior. Mathematical models use mathematical equations to quantify human cognition or the human-machine system, while simulation models use computer code (mostly production rules) to mimic human cognition and/or the human-machine system. For ideal mathematical models in human factors, the final equations should be derived step by step from the preceding equations. This modeling process resembles proving a mathematical statement (e.g., proving the Pythagorean theorem, a² + b² = c²): just as the statement to be proved is known before the proof is written, researchers sometimes have the data (the target of the model's prediction) before the model is built, even though their work is usually written up with the model first and its verification against data second. Thus, a mathematical modeler should lay out in clear detail, step by step, how the model reaches its final prediction. In other words, for a mathematical model of human performance, the researcher should list all of the derivation steps from the model input to the model output (the prediction of the data) without skipping any important step. Data to be modeled can be used to verify the model's predictions, but they are not required. On the other hand, bottom-up modeling approaches require data at the beginning of the modeling process: recent deep learning models in AI are data-hungry and need extensive training via supervised or unsupervised learning. For statistical models, data are required to adjust the parameters of one of a limited number of pre-existing model structures (e.g., linear, logistic, cubic, quadratic) to achieve the best fit.
In 0 data situations (e.g., a new task or a new situation in which to apply the model), deep learning in AI and statistical models cannot help much; this is one of their limitations, as mentioned in Section 2.2. In summary, top-down and bottom-up modeling approaches differ in many respects, and researchers must choose between them based on their needs. Recent studies (Lin, Wu, & Chaovalitwongse, 2015) have integrated the two approaches, which may overcome the limitations of both. Because this chapter focuses on mathematical models, Section 3.2 discusses the differences between mathematical models and simulation models within the group of top-down modeling approaches.
3.2 Mathematical Modeling vs. Simulation Modeling Approaches
Within the top-down/theory-driven modeling approaches, mathematical models, compared with computer simulation models, can predict, quantify, and analyze human performance, workload, brain waves, and other indexes of human behavior more rigorously. First, mathematical models and equations of human behavior quantify its mechanisms explicitly through the relationships among variables, including the relationship between the input and output of each equation. Users of these models will find it much easier to understand and extract the relationships among variables from equations than from computer code. Second, mathematical models and equations of human behavior can be edited, modified, improved, and combined into new equations relatively easily. Third, mathematical models and equations of human behavior and performance can be implemented relatively easily in different programming languages and embedded in different intelligent systems to work together with system design. The equations from mathematical modeling approaches can be coded directly into a system, whereas it usually takes longer to run a simulation to obtain the same
results. Fourth, mathematical models and equations can lead to analytical solutions, which are more accurate than simulation results. Fifth, some mathematical models and equations can be proved directly by mathematical derivation, with no need for verification against empirical data (Wu, Berman, & Liu, 2010). However, mathematical models have one major limitation compared with simulation models: they are not able to generate or mimic human behavior, especially human language. Mathematical models can describe and predict human or human-machine interaction behavior mathematically, but they cannot replace humans in completing tasks, whereas simulation and computer code based on a human model can generate or mimic human behavior, including human language. Therefore, if the goal is to build a model that generates human language, simulation or code-based models are the better choice. Several examples of integrating mathematical and simulation/symbolic models are provided in a later section of this chapter. In Sections 4, 5, and 6 of this chapter, we provide a brief history of top-down/theory-driven mathematical models, simulation/symbolic models, and their integration. Then, in Section 7, we describe how to build and verify mathematical models, using a concrete example to illustrate the steps involved. In the final section (Section 8), we provide further discussion of mathematical modeling in human factors, together with specific examples as illustrations.
4 MATHEMATICAL MODELING OF SINGLE TASK PERFORMANCE
4.1 Task-Specific Models
This section describes mathematical modeling of human performance of a single task, the simplest case of task performance, in which a person focuses his or her attention on the performance of one task only.
Here, we describe four of the earliest, well-known, and historically important performance time models that were developed for specific single task situations. One of the earliest task-specific mathematical models of human performance is the Hick-Hyman Law of choice reaction time, which shows that the reaction time in a choice reaction task is a logarithmic function of the number of choice options a person faces in performing the choice reaction task. For example, if a user of a computer menu needs to make a choice among N menu items, then the reaction time to make the choice would be a logarithmic function of N, assuming the N items have equal value or are equally likely to be chosen (Hick, 1952; Hyman, 1953). Another of the earliest task-specific models of human performance well known in psychology and the human factors community is the Fitts Law of human movement time (Fitts, 1954). According to the Fitts Law, the movement time for the finger or hand to reach a destination such as a control button is a logarithmic function of the term (2A/W), in which A is the amplitude of the movement from the origin to the destination and W is the width of the intended destination target such as a control button. The third task-specific performance time model we describe here is Neisser’s model of visual scanning or visual search time in a structured visual search task (Neisser, 1964). In a structured visual search task, visual stimuli are displayed in an organized fashion and experimental participants are asked to search the visual display in some strict serial order (e.g., from left to right), similar to reading a name list or inspecting an instrument panel sequentially. A robust finding in the structured visual search is that there is a linear relationship between the serial position of a target in a display and the visual scanning or search time that is needed to find the target. 
More specifically, scanning time = N × I, where N represents that the target is at the Nth position
on a display and I is the average time needed to inspect one item. Furthermore, if the search target is equally likely to appear at each of the M display positions, the average time needed to visually search for and find an item on a display of M items is approximately (M × I)/2 (more precisely, (M + 1) × I/2), where M is the total number of items on the display. The fourth task-specific performance time model we describe here is Sternberg's model of memory scanning time (Sternberg, 1969). In a memory scanning task, participants are first asked to remember a set of items (called the positive set, e.g., a set of letters); they are then shown one target item and asked to scan their memory to decide whether the target item is in memory (i.e., whether it is a member of the positive set). Numerous studies of the memory scanning task have shown that the time needed to respond is linearly dependent on the number of memorized items (i.e., the size of the positive set) but is not determined by the position of the target within the positive set. Mathematically, memory scanning time = a + b × S, where a is the memory scanning initiation time, b is the increment in memory scanning time for each additional memorized item, and S is the size of the positive set. These task-specific models are very valuable in providing a quantitative summary of related empirical phenomena such as choice reaction, finger movement, visual scanning, and memory scanning, and can be applied in the analysis of related simple human factors tasks. However, these are "isolated" models for particular types of tasks; on their own, they do not suggest how they can be linked with each other or with other models to model more complex tasks, in which the specific tasks are only building blocks or components. How should we bring the single task models together to account for human performance of more complex tasks?
One approach is to conduct task-specific empirical experiments that involve two or more component tasks and then find mathematical models (such as statistical regression equations) that account for the collected data. Another approach is to develop more fundamental theories about the psychological processes or the mental structure that underlie the task components. One example of the first approach, i.e., the task-specific empirical or statistical approach, is Liu's (1996a) study of integrated memory and visual tasks that require simultaneous memory scanning of several memorized items and visual scanning of several visual displays. This type of task is typical of many real-world situations. For example, a process controller may need to monitor several visual displays to detect whether any of them shows a system error, where a system error can take several forms (i.e., there are several types of system errors rather than only one). The experimental results of Liu (1996a) were modeled as a set of statistical regression equations, which showed clear interactions between the memory scanning and visual scanning components of the task, demonstrating the need for additional terms in the regression models beyond those based on Sternberg's and Neisser's models alone. The results indicate that models of complex tasks may not be simple summations of the corresponding single task models. While the regression models can be used to estimate or approximate complex task performance in many areas of human factors applications, the method itself is task-specific, and the results cannot be well explained without a fundamental understanding of the underlying psychological processes or mental structure. Developing this fundamental understanding is the mission of the second approach to task component integration: the theory-driven approach.
4.2 Psychological Process Models or Structural Models
In order to establish the scientific basis for linking psychological task components into a coherent whole that enables complex task performance, mathematical psychologists have been investigating the possible arrangements of psychological processes and the mental structures that underlie various tasks. The first mathematical model of human performance and its possible underlying mental structure is a simple model called the subtractive method, developed by Donders ([1868] 1969) more than 150 years ago. Donders assumed that psychological processes are arranged serially, like a chain of components that can be inserted into or deleted from the chain. For example, reaction time in a choice reaction time task involving two choice alternatives might be 350 msec, while a simple reaction time task requiring no choice might be accomplished in 250 msec. Subtracting 250 msec from 350 msec shows that an additional 100 msec is needed to choose between two alternatives. According to Donders, this additional 100 msec indicates an additional psychological process inserted into the chain of processes involved in the original simple reaction time task, and 100 msec is the duration of this inserted process. Donders' model is called the subtractive method because it uses the basic mathematical operation of subtraction to investigate the possible existence of an inserted or deleted process, by examining the difference between the mean duration of a task that does not include the process in question and one that does. It is important to emphasize that the model makes a structural assumption: one task process cannot start until all of its preceding processes in the serial task chain have been completed.
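Donders' serial assumption and the subtraction step can be written in a few lines. In this sketch the stage names and the split of 250 msec and 350 msec into individual stage durations are hypothetical; only the two totals come from the example in the text.

```python
from typing import Sequence


def serial_rt(stage_durations_ms: Sequence[float]) -> float:
    """Donders' structural assumption: stages run strictly in series,
    so reaction time is the sum of the stage durations."""
    return sum(stage_durations_ms)


def inserted_stage_duration(rt_with_ms: float, rt_without_ms: float) -> float:
    """Subtractive method: the difference in mean RT estimates the duration
    of the stage present in one task but not in the other."""
    return rt_with_ms - rt_without_ms


# Hypothetical stage decomposition chosen to reproduce the totals in the text.
simple_task = [100, 150]        # e.g., detect stimulus, execute response
choice_task = [100, 100, 150]   # the same chain with a choice stage inserted
print(inserted_stage_duration(serial_rt(choice_task), serial_rt(simple_task)))  # 100
```

The subtraction is only valid under the serial assumption: if the inserted stage overlapped in time with the others, the RT difference would understate its duration.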
In the past 50 years or so, other mathematical models of psychological processes and mental architecture have been developed that relax this structural assumption of Donders' model so as to cover a broader range of temporal and architectural arrangements that mental task processes might assume. The most prominent of these include the cascade model (McClelland, 1979), which considers continuous, flow-like serial mental processes that do not require a process to wait until all its preceding processes finish, and the program-evaluation-and-review technique (PERT) network model (Schweickert, 1978), also called the critical-path network model, which investigates possible arrangements of mental processes into discrete processing networks. In these networks, processes on the same path operate in strict sequence, as in Donders' model, but processes that are not on the same path can be active at the same time, as in the cascade model (see Liu, 1996b, for details). To model a broader range of possible arrangements of psychological processes, Liu (1996b) developed a class of queueing network (QN) models of mental architecture, which assume that the human mind resembles a queueing network. A queueing network is a network of servers, each providing some kind of service to customers, who may need to wait (queue) for service if the needed server is busy serving other customers. Each server has a waiting space for customers, and multiple queues may exist simultaneously in the system. Customers travel from server to server along the network paths that connect the servers. Queueing networks can be found everywhere in daily life and engineering practice; examples include supermarket checkout lanes, road traffic networks, telecommunication systems, and computer networks.
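The timing logic of two of the structures described above can be sketched briefly. In a critical-path (PERT) network, a process starts when all of its predecessors have finished, so the response time is the length of the longest path. At a single first-come, first-served server of a queueing network, a customer waits whenever the server is still busy. This is only the timing skeleton, not Schweickert's or Liu's full models; the process names, durations, and arrival and service times are hypothetical. `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List


def critical_path_rt(durations: Dict[str, float],
                     predecessors: Dict[str, Iterable[str]]) -> float:
    """Critical-path network: each process starts when all of its predecessors
    finish; processes on different paths may overlap. Returns the finish time
    of the whole network, i.e., the length of its longest path."""
    finish: Dict[str, float] = {}
    for node in TopologicalSorter(predecessors).static_order():
        start = max((finish[p] for p in predecessors.get(node, ())), default=0)
        finish[node] = start + durations[node]
    return max(finish.values())


def fifo_waits(arrivals: List[float], services: List[float]) -> List[float]:
    """Single first-come-first-served server: each customer waits while
    the server is still busy with earlier customers."""
    waits = []
    server_free_at = 0.0
    for arrive, service in zip(arrivals, services):
        start = max(arrive, server_free_at)
        waits.append(start - arrive)
        server_free_at = start + service
    return waits


# Hypothetical network: two encoding processes run in parallel after perception.
durations = {"perceive": 100, "encode_a": 80, "encode_b": 120, "respond": 150}
preds = {"encode_a": ["perceive"], "encode_b": ["perceive"],
         "respond": ["encode_a", "encode_b"]}
print(critical_path_rt(durations, preds), sum(durations.values()))  # 370 450

# Hypothetical arrival and service times (ms) at one server.
print(fifo_waits([0.0, 50.0, 60.0], [100.0, 100.0, 30.0]))  # [0.0, 50.0, 140.0]
```

The network finishes in 370 ms rather than the 450 ms a strictly serial chain of the same processes would take, because encode_a overlaps with encode_b. In the queue, the third item waits 140 ms behind the two items that arrived before it.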
QN models are continuous-flow network models; the class includes all the existing psychological process models described above as special cases and unifies them into a larger, coherent modeling framework. The class of QN models
also offers alternative explanations for many of the conclusions based on the previous models. Furthermore, QN models cover arrangements of mental processes that previous models had not addressed, such as feedback or non-unidirectional information flow, information “overtaking and bypassing,” and process dependencies or non-selective influence of factor effects. All the QN models in Liu (1996b) were discussed in relation to empirical data and can be subjected to well-defined empirical tests. Readers may have noticed that the later mathematical models are more complex than the earlier ones and may naturally ask: What is the necessity and added value of considering the later, more complex models? The answer is twofold, one part concerning scientific understanding and one concerning practical applications. First, from the perspective of scientific understanding, the later and more complex models offer broader and deeper insights into the psychological mechanisms underlying task performance, and they raise many more theoretical, task-independent questions for further investigation than the earlier, simpler task-specific models. Second, for practical applications, whether designers need to adopt the more complex models depends on a tradeoff between the ease and the precision of their modeling work. If a designer only needs to consider relatively simple tasks and is satisfied with quick, simple estimates of human performance, with no desire or resources to deal with complex models, then the simplest model of “addition and subtraction” serves this purpose of quick estimation.
For example, designers could assume that users do everything in sequence: they first perform a visual search on a display, then choose among N alternatives, and finally move a finger to click a button, doing all three activities strictly in sequence with no time overlap or interference among them, and doing nothing else while performing them. With these assumptions, a designer could simply add up the times needed for the three activities, as modeled by the single task performance models described above. However, the designer must be fully aware that this simple estimate may be very inaccurate. If a designer is not satisfied with these assumptions and estimates, for example, when users perform these activities while driving a vehicle, then the simple additive/subtractive models will not suffice. The good news is that researchers have been developing computer software to facilitate the development and use of complex mathematical models, as described elsewhere in this chapter.
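The additive estimate described above can be sketched as follows. For illustration only, the sketch assumes a linear visual-search function, the Hick-Hyman law for the choice among N alternatives, and Fitts’ law in its log2(2D/W) form for the click. All coefficient values are hypothetical placeholders that would need to be estimated from data for any real interface.

```python
import math

# Hypothetical coefficients in seconds; real values must be fitted to data.
SEARCH_INTERCEPT, SEARCH_SLOPE = 0.40, 0.05   # linear search time per display item
HICK_A, HICK_B = 0.20, 0.15                   # Hick-Hyman: RT = a + b * log2(N)
FITTS_A, FITTS_B = 0.10, 0.10                 # Fitts: MT = a + b * log2(2D / W)

def search_time(n_items):
    return SEARCH_INTERCEPT + SEARCH_SLOPE * n_items

def choice_time(n_alternatives):
    return HICK_A + HICK_B * math.log2(n_alternatives)

def movement_time(distance, width):
    return FITTS_A + FITTS_B * math.log2(2 * distance / width)

def additive_estimate(n_items, n_alternatives, distance, width):
    """Strictly serial, no overlap, no interference: add the three stage times."""
    return (search_time(n_items)
            + choice_time(n_alternatives)
            + movement_time(distance, width))

# About 1.9 s with these placeholder coefficients.
print(round(additive_estimate(n_items=8, n_alternatives=4, distance=160, width=10), 3))
```

Because the three stage times are simply summed, the estimate inherits every assumption listed above: no overlap between stages and no competing activity such as driving.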
5 MATHEMATICAL MODELING OF MULTITASK PERFORMANCE
The preceding section discusses mathematical modeling of single task performance with a focus on performance time. These models were proposed to account for task situations in which a person focuses his or her attention on the performance of a single task. This section expands the discussion to mathematical modeling of multitask performance, in which a person needs to devote his or her attention to multiple concurrent tasks, such as looking at an instrument panel while driving a car.
5.1 Single-Channel or Single-Server Models
Mathematical modeling of multitask performance began with single-channel models and single-server queueing models, established by engineers who based their mathematical models on conceptual theories of multitask performance proposed by psychologists. The first conceptual theory in this area is the
MATHEMATICAL MODELING IN HUMAN–MACHINE SYSTEM DESIGN AND EVALUATION
single-channel theory of selective attention, originally proposed by Craik (1947) to explain the psychological refractory period discovered by Telford (1931). Telford found that when two reaction time tasks are presented close together in time, the reaction time to the second stimulus is usually longer than in the corresponding single task condition. The single-channel theory assumes that the human mental system has bottlenecks that can process only one piece of information at a time, and that multitask performance is accomplished by serial selection of which component task or piece of information to process at any instant. Several extensions of the single-channel theory were also proposed, including those of Broadbent (1958), Deutsch and Deutsch (1963), and Welford (1967). Based on the single-channel conceptual theory of selective attention, numerous mathematical models of human multitask performance have been proposed (Carbonell, 1966; Rouse, 1980; Senders, 1964; Senders & Posner, 1976). Mathematically, most of these models represent human multitask performance as a single-server queueing system, in which multiple tasks or diverse sources of information queue for service from a single-server human information processing system that serially and rapidly switches among the queued tasks or information sources. Single-channel assumptions can also be found in several other engineering models. For example, Sheridan (1972) modeled complex task performance as a task sampling and sequencing problem and assumed that there is a mental cost to switching attention among tasks, which determines how often different information sources in the environment are sampled. For a comprehensive and detailed review of this school of single-channel or single-server models, see Liu (1997).
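As a generic illustration of the single-server view (not a reconstruction of any specific model cited above), the textbook M/M/1 queue treats the operator as one server with exponentially distributed service times at rate μ, receiving tasks in a Poisson stream at rate λ. Its standard steady-state results are utilization ρ = λ/μ, mean time in system W = 1/(μ − λ), mean wait in queue Wq = ρ/(μ − λ), and mean number waiting Lq = ρ²/(1 − ρ), valid only when λ < μ. The rates used below are hypothetical.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state M/M/1 results; requires 0 < arrival_rate < service_rate."""
    if not 0 < arrival_rate < service_rate:
        raise ValueError("steady state requires 0 < arrival_rate < service_rate")
    rho = arrival_rate / service_rate                 # fraction of time the server is busy
    w = 1.0 / (service_rate - arrival_rate)           # mean time in system (wait + service)
    wq = rho / (service_rate - arrival_rate)          # mean wait in queue
    lq = rho ** 2 / (1.0 - rho)                       # mean number of tasks waiting
    return {"utilization": rho, "time_in_system": w,
            "wait_in_queue": wq, "queue_length": lq}

# Hypothetical: instrument checks arrive 0.5 per second; the operator clears 1 per second.
print(mm1_metrics(0.5, 1.0))
```

As the arrival rate approaches the service rate, waiting time grows without bound; at λ = 0.9 per second the mean time in system is already about 10 s, which is one way a single-server model expresses overload.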
5.2 Task-Network Models
Another approach to modeling human multitask performance is the task network approach, originally developed by Siegal and Wolf (1969) with the systems analysis of integrated networks of tasks (SAINT) modeling methodology. This approach represents human performance as a sequence of tasks (a path); alternative paths may exist for accomplishing a complex task in different situations, and together these alternative paths form a task network. It should be emphasized that the parallel paths in task network models represent alternatives rather than concurrent processing. A family of task network-based models has been developed; a prominent example is Micro Saint Sharp, a general-purpose discrete-event simulation software tool that has been used successfully in many areas, including human factors and the military, manufacturing, and service sectors (Laughery, 1989). Micro Saint is also the platform for the Army Research Lab (ARL) task network modeling tool IMPRINT (Improved Performance Research Integration Tool). IMPRINT is a discrete-event modeling framework that assumes the performance system can be represented as networked task sequences and that continuous tasks can be approximated as discrete tasks. IMPRINT has been used extensively to represent and analyze human-system performance, in system acquisition, and to estimate user-centered requirements (Hawley, Lockett, & Allender, 2005). These models share the fundamental theoretical assumption that humans can process only one piece of information at a time. Human multitask performance is modeled as a process of selecting tasks for sequential action according to some service discipline or cost function, which is usually calculated with processing time as the crucial metric. Time is the limiting resource for which multiple tasks compete, and a task’s difficulty is a function of its required processing time. These models give little or no attention to other dimensions of task demand.
5.3 Queueing-Network (QN) Models
As described in the earlier section on mathematical modeling of single task performance, QN models of psychological processes include other mathematical models as special cases and can model situations that the other models cannot. Similarly, QN models of multitask performance have been shown to include the single-server and task network models of multitask performance described above as special cases, and they can also model multitask situations that the single-server or task network models cannot. In the earliest article on QN modeling of multitask performance, Liu (1997) describes a three-node queueing network model of human multitask performance that accounts for interference between concurrent spatial and verbal tasks. The model integrates considerations from single-channel queueing-theoretic models of selective attention, from parallel processing, and from multiple-resource models of divided attention, and it provides a computational framework for modeling both the serial processing and the concurrent execution aspects of human multitask performance. Furthermore, the three-node QN model allows task selection strategy, server capacity, and queueing priority to be modeled, so that it can account for a variety of multitask situations. The article reviews a substantial body of existing experimental evidence in support of the queueing network model and discusses the value of using QN methods to integrate currently isolated concepts of human multitask performance. It further discusses modeling human–machine interaction in general, in which humans and machines can be treated as servers in a larger integrated network.
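To illustrate how queueing at a shared server can produce interference between concurrent tasks, the sketch below pushes two hypothetical task entities through a small feed-forward network: each task first visits its own perceptual server and then a central server. The server names, routes, and service times are invented, and the network is not the node structure or parameterization of Liu’s (1997) three-node model; the point is only that delay arises from congestion at a shared server rather than from an explicit executive.

```python
from collections import defaultdict

def simulate_layers(tasks):
    """Push task entities through a feed-forward network, one layer at a time.

    tasks maps a task name to (onset_time, route), where route lists one
    (server, service_time) pair per layer. Each server is assumed to belong to
    a single layer and to serve one entity at a time, first come first served.
    Returns the time at which each task leaves its last server.
    """
    ready = {name: onset for name, (onset, _) in tasks.items()}
    n_layers = len(next(iter(tasks.values()))[1])
    for layer in range(n_layers):
        queues = defaultdict(list)
        for name, (_, route) in tasks.items():
            server, service = route[layer]
            queues[server].append((ready[name], name, service))
        for jobs in queues.values():
            free_at = 0.0
            for arrival, name, service in sorted(jobs):   # FIFO by arrival time
                start = max(arrival, free_at)              # wait if the server is busy
                free_at = start + service
                ready[name] = free_at
    return ready

# Hypothetical spatial and verbal tasks (ms): separate perceptual servers,
# followed by either one shared central server or two separate ones.
shared = {
    "spatial": (0, [("visual", 100), ("central", 150)]),
    "verbal": (50, [("auditory", 100), ("central", 150)]),
}
separate = {
    "spatial": (0, [("visual", 100), ("central_spatial", 150)]),
    "verbal": (50, [("auditory", 100), ("central_verbal", 150)]),
}
for label, tasks in [("shared", shared), ("separate", separate)]:
    done = simulate_layers(tasks)
    print(label, {name: done[name] - tasks[name][0] for name in tasks})
```

With a shared central server, the verbal task’s response time grows from 250 ms to 350 ms because its entity must wait while the spatial entity is served; giving each task its own central server removes the wait.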
6 INTEGRATION OF MATHEMATICAL AND SYMBOLIC MODELS
All the models discussed so far are primarily mathematical or numerical simulation models, which use mathematical equations or numerical simulation software to model human performance quantitatively. Using these mathematical models alone for engineering design and application has two major limitations. First, these models are unable to generate or mimic the detailed actions of a person in specific task situations; that is, they do not represent the specific steps a person may take in performing a task. Second, as mathematical or numerical models, they do not capture or represent the specific cognitive knowledge a person may employ to accomplish his or her goals in a particular real-world situation. In contrast, another group of human performance models, called symbolic models, shows particular strength in addressing these two limitations. More specifically, symbolic models can generate and mimic the detailed procedures and actions a person might take in performing a task, and, as symbolic models, they can naturally represent the specific symbolic and cognitive knowledge a person uses in a task. Since the early 1970s, numerous symbolic models of human performance have been developed. Prominent among these are the cognitive architecture-based models, including the Model Human Processor (MHP) and the GOMS family of models (Card, Moran, & Newell, 1983; John & Kieras, 1996), ACT-R (Anderson & Lebiere, 1998; Anderson et al., 2004), Soar (Laird, Newell, & Rosenbloom, 1987; Newell, 1990), and EPIC (Meyer & Kieras, 1997a, 1997b). However, these symbolic models are not mathematical models in their overall underlying structure.
Although they use mathematics to analyze some of the specific mechanisms or operations of some of the components of the models, they do not have an overarching and coherent mathematical structure to represent the interconnected arrangements of all the components and their interaction patterns.
Clearly, the mathematical and symbolic modeling approaches are complementary, so it is important and valuable to develop modeling methods that integrate the two. In the following, we summarize two modeling methods that bridge the mathematical and symbolic approaches: QN-MHP (Model Human Processor) and QN-ACTR.
6.1 QN-MHP
MHP is a well-known computational modeling method in human–computer interaction and human factors (Card et al., 1983). MHP represents the human information processing system as a series of three discrete processors: perceptual, cognitive, and motor. The three processors operate in strict sequence, and a processor cannot start until its preceding processor has completed its processing. Mathematically and conceptually, MHP is in the same spirit as Donders’ subtractive method described above. QN-MHP integrates QN with MHP and its related GOMS (Goals, Operators, Methods, and Selection rules) method by adopting and modifying several major components of MHP/GOMS (Liu, Feyen, & Tsimhoni, 2006; Wu & Liu, 2008a, 2008b) and representing them in a queueing network. QN-MHP expands the three discrete MHP processors into three continuous QN subnetworks of servers, each performing distinct psychological functions specified with a procedural/symbolic language (see Figure 1). By integrating the mathematical and symbolic approaches, QN-MHP combines the strengths of both and overcomes the weaknesses of each approach used alone. QN-MHP allows both precise mathematical analysis and real-time generation of multitask performance and mental workload, which emerge as the network behavior of multiple streams of information flowing through a network. QN-MHP has been applied to generate and model a variety of tasks, as summarized later in this chapter. The expansion of the three MHP processors into the three QN subnetworks of servers in QN-MHP is based on the extensive literature on the human brain and human cognitive system (see Liu, Feyen, & Tsimhoni, 2006; Wu & Liu, 2008a, 2008b, for details). For example, the perceptual subnetwork includes a visual and an auditory perceptual subnetwork, each composed of four servers. The cognitive subnetwork includes a working memory system, a goal execution system, a long-term memory system, and a complex cognitive processing system. The working memory subnetwork contains four components: a visuospatial sketchpad, a phonological loop, a central executive, and a performance monitor. The long-term memory system represents two types of long-term memory: declarative (facts and events) and spatial memory, and nondeclarative memory (procedural memory and motor programs). The complex cognitive processing system performs complex cognitive functions and is the only serial processing server in QN-MHP. The motor subnetwork contains five servers corresponding to the major brain areas involved in the retrieval, assembly, and execution of motor commands and in sensory information feedback (Liu et al., 2006; Wu & Liu, 2008a). QN-MHP uses related MHP parameters, such as processing cycle times, to set QN server processing times. QN-MHP servers are defined with processing logic to perform certain procedural operations. To represent the procedural aspects of a task, QN-MHP adopts the cognitive task description method described by Kieras for NGOMSL analysis (Kieras, 1988). Currently QN-MHP has a library of about two dozen NGOMSL-style QN-MHP task analysis operators that
Figure 1 The general structure of the queuing network model (QN-MHP). (Source: Based on Lin and Wu, 2012.)
(a) Perceptual Subnetwork: 1. Common visual processing; 2. Visual recognition; 3. Visual location; 4. Visual recognition and location integration; 5. Common auditory processing; 6. Auditory recognition; 7. Auditory location; 8. Auditory recognition and location integration.
(b) Cognitive Subnetwork: A. Visuospatial sketchpad; B. Phonological loop; C. Central executive; D. Long-term procedural memory; E. Performance monitor; F. Complex cognitive function; G. Goal initiation; H. Long-term declarative & spatial memory.
(c) Motor Subnetwork: V. Sensorimotor integration; W. Motor program retrieval; X. Feedback information collection; Y. Motor program assembling and error detecting; Z. Sending information to body parts; 21–25. Body parts: eye, mouth, left hand, right hand, foot.
frequently appear as elemental task components. This library of elemental task operators includes basic motor operators, perceptual processing operators, complex cognitive function operators, memory access operators, and procedure flow operators. Each operator has been implemented in the QN-MHP simulation program and can be used by modelers to build models of complex tasks. Human performance modelers can also develop their own operators in addition to these predefined elemental task operators. Like all human performance modeling methods based on task analysis, including MHP/GOMS, QN-MHP modeling starts with a “task analysis” that systematically identifies and represents the user’s goals in a task and identifies methods for accomplishing those goals. The result of the QN-MHP task analysis is then used as an input to the QN-MHP simulation software, together with other crucial task information such as the task environment and machine interface. To model multitask performance, each task, including its goals and methods for accomplishing them, as well as its associated environmental and device information, is analyzed separately and entered into the corresponding sections of the input files of the simulation software. In QN-MHP, multiple goal lists can be processed simultaneously and independently of one another, simulating the human ability to do more than one thing at a time. The processing procedure for each goal is the same as in the corresponding single-goal task modeling. Each entity flowing through the queueing network is associated with one of the goals, and entities representing different goals may compete at the server level, potentially causing task interference. Depending on the capacity and utilization level of each server, entities associated with different goals can either be processed in parallel or wait until the server is available.
Priority decisions are made in real time and locally, at the server level, rather than centrally at an executive level, as would be required if only one goal at a time could be processed. It is the flow patterns of the entities and potential congestion at the various servers, not the limitations of a particular central executive, that produce task interference. To use the road network as an analogy, travelers on a road network do not need moment-to-moment commands from a traffic controller; congestion may appear at various locations because of differing road capacities and traffic flow demands, not necessarily because of the limitations of a central controller. In a QN-MHP simulation of multitask performance, potential task interference emerges when two or more streams of entities, representing two or more concurrent task goals, traverse the queueing network concurrently and compete for service at the various servers. QN-MHP supports mathematical analysis and real-time simulation of human performance and has been used successfully to generate and model human performance and mental workload in real time, including driver performance (Liu et al., 2006) and driver workload (Wu & Liu, 2007), transcription typing (Wu & Liu, 2008b), the psychological refractory period (Wu & Liu, 2008a), visual search (Lim & Liu, 2004a, 2004b, 2009), visual-manual tracking performance and mental workload measured by NASA-TLX subjective workload (Wu & Liu, 2007), event-related potential (ERP) techniques (Wu, Liu, & Walsh, 2008), driver performance with night vision and pedestrian detection systems (Lim, Tsimhoni, & Liu, 2010a, 2010b), driver lateral control (Bi, Gan, Shang, & Liu, 2012; Bi, Wang, Wang, & Liu, 2015), control of body movements (Fuller, Reed, & Liu, 2012), driver EEG-signal-based steering control (Bi et al., 2017), a usability testing tool for in-vehicle infotainment systems (Feng, Liu, & Chen, 2017), and touch screen drag gestures (Jeong & Liu, 2019).
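The server-level scheduling described above, with local priority decisions and a server capacity that determines whether entities from different goals run in parallel or wait, can be sketched as a generic non-preemptive priority queue with `capacity` parallel channels. This is an illustrative assumption, not the actual server logic of QN-MHP; the entity names, priorities, and times are hypothetical.

```python
import heapq

def serve(entities, capacity=1):
    """Non-preemptive priority server with `capacity` parallel channels.

    entities: list of (arrival_time, priority, name, service_time); a lower
    priority value is served first, ties broken by arrival time. Whenever a
    channel is free, the server itself picks the next waiting entity.
    Returns {name: finish_time}.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    pending = sorted(entities)        # entities not yet arrived, by arrival time
    waiting = []                      # heap of (priority, arrival, name, service)
    in_service = []                   # heap of finish times of entities being served
    finish = {}
    t, i = 0.0, 0
    while i < len(pending) or waiting:
        while i < len(pending) and pending[i][0] <= t:      # admit arrivals
            arrival, priority, name, service = pending[i]
            heapq.heappush(waiting, (priority, arrival, name, service))
            i += 1
        while in_service and in_service[0] <= t:             # free finished channels
            heapq.heappop(in_service)
        while waiting and len(in_service) < capacity:        # start best waiting entity
            _, _, name, service = heapq.heappop(waiting)
            finish[name] = t + service
            heapq.heappush(in_service, t + service)
        next_events = in_service[:1]                         # earliest completion
        if i < len(pending):
            next_events.append(pending[i][0])                # next arrival
        if not next_events:
            break
        t = min(next_events)                                 # jump to the next event
    return finish

# Hypothetical entities (ms) from two goals; goal B's entity has priority 1 (higher).
entities = [(0, 2, "A1", 100), (10, 2, "A2", 50), (20, 1, "B1", 50)]
print(serve(entities, capacity=1))
print(serve(entities, capacity=2))
```

With one channel, `B1` overtakes `A2` even though it arrived later, because the server picks the highest-priority waiting entity whenever it becomes free; with two channels, the entities overlap and each waits less than in the single-channel case.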
QN-MHP can be implemented in any general-purpose simulation program and has thus far been implemented in ProModel
and Matlab Simulink. Easy-to-use graphical interfaces have been developed with which a modeler only needs to select and click menu buttons, with no need to learn a programming language. Using the visualization features of the QN-MHP software, one can visualize the real-time operation of the mental network architecture, including the internal information flows inside “the mind” while the modeled human performs simple or complex tasks, and the travel patterns of the entities representing various tasks and their potential network congestion. Several advanced time-series and statistical results can also be displayed in real time and recorded in external files. Subjective and physiological mental workload is also mathematically analyzed and visually displayed as QN server and subnetwork utilizations (Liu et al., 2006; Wu & Liu, 2007; Wu, Tsimhoni, & Liu, 2008; Feng, Liu, & Chen, 2017). In addition to serving as a human factors applications tool, the QN approach also occupies a distinctive theoretical position in multitask modeling, as discussed in detail in Liu (1996b, 1997, 2007, 2009), Liu, Feyen, and Tsimhoni (2006), and Wu and Liu (2008a). In the QN approach, “queuing” is a unique and fundamental theoretical concept that serves as a major task coordination mechanism, allowing multitask interference and performance patterns to emerge without the need for any supervisory process. As in a roadway network, traffic patterns emerge from the network’s architectural characteristics and its traffic flows. In contrast to strictly serial or strictly parallel processing theoretical positions, QN-MHP has a hybrid cognitive network structure, with both serial and parallel information processing components in its cognitive subnetwork, based on existing experimental evidence in the literature.
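As a rough sketch of how workload might be read off a simulation as utilization, the code below computes each server’s utilization as the fraction of an observation window during which it is busy, and summarizes a subnetwork by the unweighted mean of its servers’ utilizations. The aggregation rule, server names, and busy periods are hypothetical; published QN-MHP work defines its own mapping from utilization to workload (e.g., Wu & Liu, 2007).

```python
def server_utilization(busy_intervals, window_start, window_end):
    """Fraction of [window_start, window_end) during which one server is busy.

    busy_intervals: non-overlapping (start, end) pairs; parts outside the
    window are clipped off.
    """
    span = window_end - window_start
    busy = sum(max(0.0, min(end, window_end) - max(start, window_start))
               for start, end in busy_intervals)
    return busy / span

def subnetwork_utilization(servers, window_start, window_end):
    """Hypothetical aggregate: unweighted mean of member-server utilizations."""
    values = [server_utilization(iv, window_start, window_end)
              for iv in servers.values()]
    return sum(values) / len(values)

# Hypothetical busy periods (ms) logged for two cognitive servers in a 1000 ms window.
cognitive = {
    "central_executive": [(0, 300), (500, 700)],
    "phonological_loop": [(100, 200)],
}
print(subnetwork_utilization(cognitive, 0, 1000))
```

A weighted mean, or the maximum over servers, would be equally easy to plug in; which summary tracks subjective workload best is an empirical question for the modeler.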
Furthermore, in addition to its task generation and simulation capabilities, QN-MHP has the overall mathematical structure of a queueing network, which allows it to model the interactions among the servers mathematically. Wu and Liu (2008a) modeled all of the empirical psychological refractory period data with closed-form mathematical equations, without the need for simulation.
6.2 QN-ACTR
As described in Section 6.1, QN-MHP integrates the mathematical approach of QN with the procedural method of MHP/GOMS and can model a wide range of procedure-based (i.e., IF-THEN type) tasks. However, MHP/GOMS can model only procedural types of tasks; it cannot model more complex cognitive tasks that cannot be accurately represented and analyzed as IF-THEN procedures. Examples of such complex cognitive tasks include memory retrieval, language understanding, and problem solving. Because QN-MHP relies on MHP to represent the task-specific aspects, QN-MHP, like MHP, cannot model these types of complex cognitive tasks either. This section introduces the integrated cognitive architecture QN-ACTR, which was developed to overcome this limitation of MHP and QN-MHP (for more details, see Cao & Liu, 2011, 2012, 2013, 2015). QN-ACTR integrates the QN approach with the ACT-R approach. ACT-R stands for Adaptive Control of Thought-Rational (ACT-R; Anderson et al., 2004), and it is one of the best-known and most developed cognitive architectures. The structure of ACT-R contains several modules, each of which is a cognitive component, such as the vision module and the declarative module. A crucial aspect of ACT-R is that it has two types of knowledge representation: declarative chunks and production rules (rules, for short). A chunk’s retrieval time and error rate are determined by its activation level, which is set by sub-symbolic calculations. Rules represent procedural
knowledge. It is the joint action of the declarative chunks and the production rules that determines task performance. ACT-R has been used to model a wide range of complex cognitive tasks including decision making, language comprehension, skill acquisition, and memory retrieval (for the list of the domains modeled by ACT-R, see http://act-r.psy.cmu.edu/). ACT-R has its strength in modeling tasks that involve complex cognitive activities, thanks to its sophisticated sub-symbolic algorithms for determining activation levels. Although it uses detailed mathematical algorithms in various specific mechanism computations, it does not have an overall mathematical structure to represent its overall architecture. It is mainly a symbolic computational architecture. In comparison, as discussed above, QN has an overall mathematical structure and has demonstrated its strength in modeling multi-task performance using its hybrid queueing network structure and scheduling mechanisms. But QN alone does not have the capacity to represent the detailed procedures or productions of complex cognitive tasks. ACT-R has a sophisticated declarative memory mechanism reflecting activation and association and a procedural knowledge mechanism that can generate new rules. These mechanisms are what the QN architecture lacks. Clearly, QN and ACT-R are complementary, and QN-ACTR aims to integrate these two previously isolated mental architectures and benefit from their integration. QN-ACTR represents the ACT-R cognitive structure as a queueing network (Figure 2), whose servers represent ACT-R’s modules and buffers. ACT-R’s information units such as chunks, production rules, and buffer requests are represented as QN entities, which travel between the QN servers. All the server processing logics in QN-ACTR are identical to the corresponding algorithms in ACT-R, including the sub-symbolic calculations in the production and declarative modules. 
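ACT-R’s sub-symbolic calculations, which QN-ACTR reproduces in its server logic, can be illustrated with three standard ACT-R equations: base-level learning, B = ln(Σ_j t_j^(−d)), where t_j is the time since the j-th use of a chunk and d is the decay rate (0.5 by default); retrieval latency, T = F·e^(−A), with latency factor F; and retrieval probability, P = 1/(1 + e^(−(A − τ)/s)), where τ is the retrieval threshold and s the noise scale. The sketch uses only base-level activation and ignores spreading activation, partial matching, and activation noise; the usage history and the values of F, τ, and s are illustrative, not fitted.

```python
import math

def base_level_activation(lags, decay=0.5):
    """ACT-R base-level learning: B = ln(sum_j t_j ** -d), lags in seconds since each use."""
    return math.log(sum(t ** -decay for t in lags))

def retrieval_latency(activation, latency_factor=1.0):
    """ACT-R retrieval time: T = F * exp(-A)."""
    return latency_factor * math.exp(-activation)

def retrieval_probability(activation, threshold=0.0, noise_s=0.25):
    """Probability that activation exceeds the retrieval threshold tau (logistic, scale s)."""
    return 1.0 / (1.0 + math.exp(-(activation - threshold) / noise_s))

# Hypothetical chunk used 1, 4, and 9 s ago; parameter values are illustrative only.
a = base_level_activation([1.0, 4.0, 9.0])
print(round(a, 3), round(retrieval_latency(a), 3), round(retrieval_probability(a), 3))
```

More recent or more frequent use raises activation, which in turn shortens retrieval time and raises retrieval probability; this is the mechanism the text refers to when it says a chunk’s retrieval time and error rate are determined by its activation level.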
To support queueing-based scheduling of multiple tasks at the local server level, queues are added to the modules that have non-zero processing time. QN-ACTR is currently implemented in Micro Saint Sharp (http://www.maad.com/), a C#-based discrete-event simulation software package that is a network-based simulation platform and provides natural support for QN modeling. Modules and buffers in ACT-R were implemented as servers and as corresponding data objects for storing related parameters. Chunks and production rules were programmed as data objects. ACT-R methods and functions were ported to QN-ACTR functions, which can be called by the related servers. Global parameters were
set to their default values as in ACT-R. Since ACT-R algorithms are mainly implemented in Lisp, the ACT-R algorithm code has been translated from Lisp to C#. A parser function has been built that can directly read the same code that defines chunks, production rules, and parameters in an ACT-R model file. As a result, the task-specific knowledge part of a model in QN-ACTR is exactly the same as that in ACT-R. If an ACT-R model is available, a modeler can simply copy the code from the ACT-R model and paste it into the QN-ACTR model. Usability features have been designed into the QN-ACTR interface to facilitate model development. These features include a click-and-select user interface, visualization of the model execution process and modeling results, and an integrated experiment platform that allows models and humans to be compared in identical experimental setups. More specifically, the click-and-select user interface of QN-ACTR serves as a model setup assistant that guides modelers through the model building process, including task setup (i.e., task procedure, displays, and controls), knowledge setup (i.e., task-specific cognitive knowledge, including production rules and declarative chunks), and parameter setup. A modeler can build a cognitive model by simply filling in tables and selecting from lists, without needing to learn the specific coding language used in ACT-R. However, QN-ACTR also supports sophisticated modelers who want to use ACT-R cognitive knowledge code and syntax through its built-in syntax parser. An easy-to-use template has also been developed for building the task environment part of a model by simply setting the template’s parameters according to the experimental setup. QN-ACTR has been used to model several dozen tasks to evaluate its modeling capability and usability in comparison to QN-MHP and ACT-R.
The results showed that QN-ACTR can model all the tasks that have been modeled by QN-MHP and ACT-R and, furthermore, provides additional capabilities for modeling complex cognitive and multitask situations that QN-MHP or ACT-R alone cannot model, or can model only with difficulty. In addition to serving as an integrated modeling platform that covers a broader range of complex cognitive tasks, QN-ACTR also sheds new light on some theoretical issues. For example, from the QN perspective, QN-ACTR offers new insights into some previously unaddressed theoretical issues in ACT-R, such as concurrent goal scheduling in modeling task or goal conflicts. More specifically, to handle concurrent goal scheduling, QN-ACTR allows concurrent goals to wait
Figure 2 QN-ACTR architecture. (Source: Modified from Cao and Liu, 2011.)
[Diagram. Perceptual Subnetwork: Vision Module; Visual & Visual-location Buffers; Audio Module; Aural & Aural-location Buffers. Cognitive Subnetwork: Production Module (Matching, Selection, Execution); Intentional Module & Buffer; Imaginal Module & Buffer; Declarative Module & Buffer. Motor Subnetwork: Manual & Speech Modules (Preparation, Initiation, Execution); Manual & Vocal Buffers.]
in queues. Concurrent task performance can then emerge as multiple streams of information entities flowing through the QN, with no need for task-specific procedures or an executive process to control task processes interactively. The integration of QN and ACT-R allows further examination of different queueing mechanisms for modeling a wide range of multitask scenarios, especially those involving dynamic and complex cognitive tasks. An illustrative example that demonstrates the unique value of QN-ACTR is its modeling of human performance and workload in complex cognitive and multitask scenarios under time stress (Cao & Liu, 2012, 2015). QN-ACTR was used to simulate two experimental tasks from earlier empirical studies. The first task was a semantic category judgment task (Colle & Reid, 1998) that involved complex cognitive activities in declarative memory. The second was a dual task of target shooting and arithmetic computation (Kerick & Allender, 2004), in which the shooting task used the visual-manual channel and the arithmetic task used the auditory-vocal channel. Subjective workload was measured by NASA-TLX (Hart & Staveland, 1989) in the first experiment and by SWAT (Reid & Nygren, 1988) in the second. As discussed above, modeling the complex declarative memory processes involved in the first task (semantic category judgment) is beyond the scope of MHP and QN-MHP but can be handled by ACT-R. However, to model the second task (the dual task of shooting and computation), ACT-R needs complex scheduling algorithms at the module level to handle task interference, and neither of the two subjective workload assessment techniques is included in the ACT-R architecture. Furthermore, time stress had not been modeled by existing models. For these reasons, these two tasks were chosen as platforms to evaluate the capabilities of QN-ACTR in comparison to QN-MHP and ACT-R. The same ACT-R code, rules, and parameters were used for the QN-ACTR model, which generated results that closely matched the corresponding human performance and workload data in both single and dual task situations under time stress. In Section 7, we describe the detailed steps to build and verify mathematical models in human factors.
7 HOW TO BUILD AND VERIFY MATHEMATICAL MODELS IN HUMAN FACTORS
In this section, we describe the major steps in building a mathematical model in human factors and in verifying it with data. When no data are available, or modelers are unable to conduct experiments to verify the model, researchers can still propose and build a mathematical model without verifying it against data. There are several ways to build mathematical models in human factors. For example, researchers can draw a detailed conceptual model of the human–machine system. The conceptual model is the foundation for detailed mathematical or simulation modeling. There are at least two types of conceptual models a human factors researcher can draw: a task-independent conceptual model (e.g., a model of human cognition that does not change across tasks) and a task-dependent conceptual model (e.g., a model that describes the steps a human operator takes to complete certain tasks). Researchers can then construct the simplest mathematical models that describe the relations among the boxes in these conceptual models. In the following, we use an example of modeling the effect of aging on mental workload to illustrate one way to build and verify mathematical models in human factors.
Step 1 (Understanding the mechanism). Researchers need to understand the human–machine system in depth, because we are using a top-down modeling approach rather than directly mapping inputs to outputs. Researchers need to carefully explore the mechanisms of the system (e.g., causal relations or any neurological/brain mechanisms). For example, in modeling human errors in transcription typing, researchers need to find the root cause of the error distribution of human body movements, which is grounded in the neural firing mechanisms of the human primary motor cortex, and then build models based on these neural firing mechanisms (Wu & Liu, 2008b). As a result of this step, researchers can also draw a conceptual model of the system, or use an existing human cognition or human–machine model to understand the mechanisms.
Step 2 (Mapping human factors concepts to mathematical terms). Following Step 1, if a researcher chooses to quantify a conceptual model, he or she needs to build mathematical equations within and/or between the elements of the model. If a researcher decides to use an existing cognitive or human performance model or architecture, he or she needs to understand the model and map the human factors/psychological terms to the terms in that model. In the current example of modeling the aging effect, the researchers chose a queueing network model (QN-MHP) (Wu & Liu, 2008a), and we can start with an overall mapping between the variables and mathematical terms in the model (e.g., server speed and utilization of subnetworks) (see Figure 3). After performing the same task, why does an older person perceive a higher mental workload than a young person? Many factors can contribute to this phenomenon: when a person gets older, many aspects of cognition and of the physical body involved in performing the same task are affected.
Researchers can choose one of these aspects at an abstract level (e.g., information processing speed) to capture the effects of aging on workload. As a person ages, the information processing speed of human cognition and of different parts of the body slows down. Therefore, with support from the literature, we can use the servers' processing speed to represent human information processing speed as one index of the aging effect (see Number “1” in Figure 3). Following the same logic, and again with support from the literature, researchers can use the utilization of the subnetworks to represent mental workload (see Number “2” in Figure 3).

[Figure 3: An example of mathematical modeling in quantifying the effect of aging on mental workload. Diagram elements: Aging Effect, Server Speed, Utilization of Subnetworks, and Subjective Mental Workload, with links numbered 1, 2, and 3.]

Step 3 (Formulating the mathematical equations). Researchers should not be afraid of using mathematical symbols to describe the system. In this step, we first note that once a person becomes an adult, information processing speed (𝜇) decreases as age increases (age is represented by A, with A = 1 for young adults, following the simplicity principle). There are many ways to quantify an inverse relationship between two variables. However, based on the “simplicity” criterion in modeling introduced in the previous sections, researchers should start with the simplest form of the relation between the variables unless they have support from the literature for a more specific or more complex form. Therefore, in this example, we start with the simplest form of an inverse relation (Eq (2)):

$$\mu = \mu_0 / A \tag{2}$$
where $\mu_0$ represents the information processing speed of young adults. Here, the simplest form means that we do not add extra parameters or variables (e.g., $\mu = a + \mu_0 / A$). Since we start with A = 1, as a person's age increases, the value of A increases, which leads to an ever smaller value of 𝜇. There are a few important principles for specifying the variables in a mathematical equation (called “Principles of variable specification” here). First, we need to specify the range of each variable (e.g., $A \ge 1$, $\mu > 0$), which helps us formulate their relations and avoid modeling errors. For example, if we do not specify $A \ge 1$, the denominator can equal 0, leading to an infinite processing speed, which is inconsistent with reality. Second, we need to understand, or at least be aware of, the unit of each variable. In this example, 𝜇 and $\mu_0$ can both be measured in bits/s; because they share the same unit, A is a dimensionless parameter or variable. However, if the variables on the two sides of an equation have different units (which may also give them different ranges), we may have to add more variables to the equation to convert their units and ranges. Third, we need to pay attention to possible differences between the physical system for which the original mathematical term was developed and the mental system being modeled (see the development of Eq (3) as an example). Fourth, we can use subscripts on each symbol or variable to specify a different person, time, status, location, etc., which is very important in later mathematical operations. Here, we use the subscript 0 in $\mu_0$ to denote the information processing speed of young adults. Moreover, since the human system has many different elements, which are mapped onto different servers in the main model (QN-MHP), we add a second subscript j to indicate the server number. Accordingly, we have Eq (3):

$$\mu_j = \mu_{0,j} / A \tag{3}$$
where $\mu_j$ represents the information processing speed at server j. Following the same logic in this modeling step, we can build the equations that quantify mental workload, here temporal demand (TD), using the subnetwork utilization (𝜌, 0 … 0) here, indicating that TD > 0 when 𝜌 = 0.
Step 4 (Mathematical derivations). After all the human factors or psychological terms or variables (including independent, intermediate, and dependent variables) have been formulated as mathematical equations, this step links them together through mathematical derivations (Number “3” in Figure 3). The result is a set of detailed equations whose left side is the dependent variable(s) and whose right side contains the other variables (see Wu & Liu, 2007, for the detailed derivation process, and Eq (5) here for an example). In the following sections, we describe the details of this equation as an example.

$$TD = \frac{A a}{4T} \int_0^T \frac{\sum_i \lambda_{i,\mathrm{all}}}{\sum_{j=1}^{C_{\mathrm{all}}} \mu_{0,j}} \, dt + b \tag{5}$$
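To make Eqs (3) and (5) concrete, the following Python sketch evaluates them numerically. It assumes one reading of Eq (5), $TD = \frac{A a}{4T}\int_0^T \frac{\sum_i \lambda_{i,\mathrm{all}}}{\sum_{j=1}^{C_{\mathrm{all}}} \mu_{0,j}}\,dt + b$, with the total arrival rate supplied as a function of time. The function names, the trapezoidal rule, and the example numbers are illustrative assumptions, not part of QN-MHP. Rather than hard-coding the factor A, the code substitutes Eq (3) into the utilization $\lambda / \sum_j \mu_j$, which shows where A enters the derivation.

```python
from typing import Callable, Sequence


def aged_server_speeds(mu0: Sequence[float], age_factor: float) -> list[float]:
    """Eq (3): mu_j = mu_{0,j} / A for every server j."""
    # Principles of variable specification: A >= 1 (A = 1 for young adults)
    # and every young-adult speed mu_{0,j} > 0, so no division by zero occurs.
    if age_factor < 1:
        raise ValueError("A must be >= 1")
    if not mu0 or any(m <= 0 for m in mu0):
        raise ValueError("each mu_{0,j} must be > 0")
    return [m / age_factor for m in mu0]


def temporal_demand(
    arrival_rate: Callable[[float], float],
    mu0: Sequence[float],
    age_factor: float,
    a: float,
    b: float,
    trial_length: float,
    steps: int = 1000,
) -> float:
    """Evaluate the assumed reading of Eq (5) with the trapezoidal rule.

    arrival_rate(t) is the total information arrival rate (sum over i of
    lambda_i) at time t; a and b are model parameters; trial_length is T.
    """
    if trial_length <= 0 or steps < 1:
        raise ValueError("T must be > 0 and steps >= 1")
    # Substituting Eq (3) into rho = lambda / sum_j mu_j gives
    # rho = A * lambda / sum_j mu_{0,j}: this is where the factor A comes from.
    total_speed = sum(aged_server_speeds(mu0, age_factor))
    dt = trial_length / steps
    area = 0.0  # approximates the integral of rho(t) from 0 to T
    for k in range(steps):
        rho_start = arrival_rate(k * dt) / total_speed
        rho_end = arrival_rate((k + 1) * dt) / total_speed
        area += 0.5 * (rho_start + rho_end) * dt
    return a * area / (4 * trial_length) + b


if __name__ == "__main__":
    # Illustrative values only (not from any study): constant arrival rate 2,
    # three servers whose young-adult speeds sum to 4, a = 1, b = 0, T = 10.
    for A in (1.0, 2.0):
        td = temporal_demand(lambda t: 2.0, [1.0, 1.0, 2.0], A, 1.0, 0.0, 10.0)
        print(A, round(td, 6))  # prints 1.0 0.125, then 2.0 0.25
```

With b = 0 and everything else fixed, doubling A doubles the predicted TD, which matches the direction expected for the aging effect.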
Step 5 (Checking the equations from a human factors perspective). We are not mathematicians interested only in the mathematical modeling process. After the final equations predicting human performance or cognition are obtained, researchers need to check carefully whether the meaning and structure of the equations are consistent with human factors findings. For example, in Eq (5), obtained through mathematical derivation, the definite integral represents the area under a curve over a given time range, which is consistent with the concept of workload accumulated over a period of time. We can also verify that, as a person's age increases (an increase in A), Eq (5) predicts that temporal demand (TD), one of the dimensions of mental workload, increases, which is consistent with the human factors literature.
Step 6 (Specifying the values of parameters and variables). The previous steps provide the structure of the mathematical formulas, but it is still important to set correct or reasonable values for each parameter or variable on the right side of these equations. In mathematical modeling in human factors research, researchers have multiple sources for setting these values. First, if a researcher is using an existing model or architecture (e.g., QN-MHP), the model itself may provide default values for some variables (e.g., Wu, 2018). Second, researchers can set the values of parameters or variables based on the experimental settings of the study being modeled. For example, T in Eq (4) is the length of a trial in the experiment. Third, researchers can use the existing literature (other than the study being modeled) to set the values of parameters or variables.
For example, for A in our example equation, given a person's age, we can use a study of the aging effect on spatial choice tasks to set A's value (Proctor, Vu, & Pick, 2005). Fourth, for parameters or variables whose values are difficult to obtain from the previous three sources (e.g., the information arrival rate λ), researchers can use simulation models to obtain them. Fifth, the last resort is to use free parameters: researchers can use a small portion of the data being modeled to set the values of a few parameters or variables (the fewer, the better), and the remaining, larger portion of the data is then used to verify the model's predictions.
Step 7 (Verifying the model's predictions and improving the model). Data from experiments, surveys, observations, or other empirical studies can be used to verify a model's predictions. (From a purely mathematical perspective, however, a mathematical model does not require verification if there is no error in the derivation process.) In human factors, when a researcher submits a manuscript to a journal, reviewers and editors usually expect some verification of the mathematical model because of the potential differences between an abstract model and the complexity of real human–machine systems. R square and RMS (root-mean-square) error can both be used to verify a model's predictions: R square captures how well the model follows the pattern of data change over time or across conditions, and RMS error captures the difference between the model's predictions and the data. As a rule of thumb, R square should be at least 0.5, whereas an extremely high R square (e.g., 0.999) usually suggests that the model may contain free parameters, or that it may not be robust in modeling other data sets or other experimental studies because of possible overfitting. If a model's R square is below 0.5, researchers need to re-examine Steps 1–6 of the modeling process carefully. For example, do the researchers really understand how the system works? Is any important perceptual, cognitive, or motor element missing from the model? Are there errors in formulating the mathematical equations, or derivation errors?
Are the formulated equations truly consistent with other findings in human factors research? Researchers can also perform a sensitivity analysis to identify which variables most strongly affect the predictions and to check whether there were errors in obtaining those variables' values. Researchers can describe such improvements to the model explicitly in their papers so that other researchers can follow the detailed modeling and improvement processes.
Step 8 (Playing the model and improving the system). Returning to the purpose of mathematical modeling in human factors: we are not just mathematicians modeling human performance or behavior. The purpose of human factors engineering is to use mathematics as a tool to help improve human–machine systems. Once a model or a set of mathematical models of human performance or behavior has been carefully verified by empirical studies, researchers are encouraged to apply these models in practice (Section 8 of this chapter provides more examples). In our current example, once we have a model of workload, we can develop a workload management system (Wu, Tsimhoni, & Liu, 2008): based on the predicted mental workload (output), an automatic system can adjust the presentation rate of a secondary task so that drivers' workload can be minimized. In summary, the eight steps above constitute one example procedure for building, verifying, and applying mathematical models in human factors; researchers can also build their own models with other procedures. Compared with the relatively large number of experimental studies, mathematical modeling is still a growing area in human factors. In Section 8, we discuss the integration of mathematical and symbolic models.
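Steps 6 and 7 can be illustrated together: calibrate a small number of free parameters on part of the data, then score the predictions on the held-out remainder with R square and RMS error. The sketch below is a hedged illustration with hypothetical function names. It assumes a model that is linear in its free parameters (as Eq (5) is in a and b once the other quantities are fixed), and it computes R square as the squared Pearson correlation between data and predictions, one common convention; some studies instead use 1 − SS_res/SS_tot.

```python
import math
from typing import Sequence


def fit_free_parameters(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Least-squares estimates of a and b in y = a * x + b.

    Intended for a small calibration subset of the data (Step 6, fifth source).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired calibration points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("calibration predictor values must vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation: how well predictions follow the data pattern."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need at least two paired points")
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("both series must vary")
    return cov * cov / (var_o * var_p)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square difference between data and predictions (data units)."""
    n = len(observed)
    if n != len(predicted) or n == 0:
        raise ValueError("need paired, non-empty series")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
```

For Eq (5), x would hold the term $\frac{A}{4T}\int_0^T \frac{\sum_i \lambda_{i,\mathrm{all}}}{\sum_j \mu_{0,j}}\,dt$ computed for each condition and y the observed TD ratings; the fitted a and b then predict the held-out conditions. Because squared correlation ignores sign, predictions that run opposite to the data can still score high, so the sign of the correlation is worth checking alongside the 0.5 rule of thumb.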
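The workload management idea in Step 8 can be sketched as a simple rule that picks the fastest secondary-task presentation rate whose predicted workload stays under a ceiling. This is a hypothetical illustration, not the system described by Wu, Tsimhoni, and Liu (2008): the ceiling, the candidate rates, the function names, and the use of a constant-rate reading of Eq (5), $TD = \frac{A a \lambda}{4 \sum_j \mu_{0,j}} + b$, are all assumptions.

```python
from typing import Iterable, Optional, Sequence


def predicted_td(total_rate: float, mu0: Sequence[float], age_factor: float,
                 a: float, b: float) -> float:
    """Assumed constant-rate reading of Eq (5).

    With a constant rate the integral over [0, T] equals T * rate, so T cancels:
    TD = A * a * rate / (4 * sum_j mu_{0,j}) + b.
    """
    if age_factor < 1 or total_rate < 0 or not mu0 or any(m <= 0 for m in mu0):
        raise ValueError("require A >= 1, rate >= 0, and every mu_{0,j} > 0")
    return age_factor * a * total_rate / (4 * sum(mu0)) + b


def choose_secondary_rate(candidates: Iterable[float], primary_rate: float,
                          mu0: Sequence[float], age_factor: float, a: float,
                          b: float, td_ceiling: float) -> Optional[float]:
    """Fastest secondary-task presentation rate whose predicted TD stays at or
    below td_ceiling; None if even the slowest candidate is too demanding."""
    feasible = [
        r for r in candidates
        if predicted_td(primary_rate + r, mu0, age_factor, a, b) <= td_ceiling
    ]
    return max(feasible) if feasible else None


if __name__ == "__main__":
    rates = [0.0, 0.5, 1.0, 2.0, 4.0]  # illustrative candidate rates only
    for A in (1.0, 2.0):
        print(A, choose_secondary_rate(rates, 2.0, [1.0, 1.0, 2.0], A, 1.0, 0.0, 0.3))
    # prints 1.0 2.0, then 2.0 0.0
```

With these illustrative numbers, the model with A = 2 selects a slower secondary-task rate than with A = 1, because the slower servers raise predicted TD for the same input.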
8 THE APPLICATIONS OF MATHEMATICAL MODELING IN SYSTEM DESIGN AND EVALUATION
In this section of the chapter, we describe how mathematical models can be used in human–machine system design and usability testing.
8.1 The Applications of Mathematical Modeling in Overall Human–Machine System Design
Following Step 8 of mathematical modeling (Playing the model and improving the system) described in Section 7, we now illustrate how mathematical models can be used in human–machine system design. There are several major ways to play, or apply, a mathematical model in system design. One application is to perform optimization with the verified model. For example, suppose researchers have built and verified a mathematical model that quantifies human typing performance (e.g., the number of digits or letters typed per minute, DPM) as a function of various variables, including numerical keyboard design variables (e.g., the gap size between the keys, G, and the size of the keys, S) and other experimental settings (e.g., the speech rate of the letters or digits that a person hears, ISI). See Eq (6); the exact form of the function f is given in the study by Lin and Wu (2012).

$$DPM = f(G, S, ISI, \text{other variables}) \tag{6}$$
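One generic way to search a model such as Eq (6) numerically is an exhaustive grid search over candidate designs. Since the exact form of f (Lin & Wu, 2012) is not reproduced in this section, the sketch below takes the objective as a parameter. The function `grid_search_max` and the toy objective are hypothetical, and brute-force grid search is a generic technique, not necessarily the method the authors use.

```python
import itertools
from typing import Callable, Iterable


def grid_search_max(
    objective: Callable[[float, float], float],
    gap_values: Iterable[float],
    key_sizes: Iterable[float],
    constraint: Callable[[float, float], bool] = lambda g, s: True,
) -> tuple[float, float, float]:
    """Return (best objective value, G, S) over all feasible grid points."""
    best = None
    for g, s in itertools.product(gap_values, key_sizes):
        if not constraint(g, s):
            continue  # skip designs that violate the human factors constraints
        value = objective(g, s)
        if best is None or value > best[0]:
            best = (value, g, s)
    if best is None:
        raise ValueError("no (G, S) pair satisfies the constraints")
    return best


if __name__ == "__main__":
    # Hypothetical stand-in for f with ISI held fixed; NOT the Lin and Wu (2012) model.
    def toy_dpm(g: float, s: float) -> float:
        return 100 - (g - 3) ** 2 - (s - 15) ** 2

    gaps = range(1, 11)    # G in mm (illustrative grid)
    sizes = range(10, 21)  # S in mm (illustrative grid)
    print(grid_search_max(toy_dpm, gaps, sizes, lambda g, s: g + s <= 30))
    # prints (100, 3, 15)
```

ISI and the other variables in Eq (6) can be held at fixed values by wrapping f in a closure, e.g., `lambda g, s: f(g, s, isi, ...)`. A finer grid gives a more precise optimum at the cost of more model evaluations.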
We can perform optimization using Eq (6) in two different ways. The first way is to solve the optimization problem using mathematical derivation directly by setting the objective function Max (DPM), given certain reasonable constraints in human factors (e.g., gap size between the keys (G) >= 1 mm but less than 1000 mm, 1 mm