Applied Cognitive Science and Technology: Implications of Interactions Between Human Cognition and Technology [1st ed. 2023] 9819939658, 9789819939657

This book fills the long-pending gap in consolidating research on applied cognitive science and technology. It explores


English Pages 277 [261] Year 2023



Table of contents:
Preface
Contents
Editors and Contributors
Part I Artificial Intelligence and Agents
1 Toward Behavioral AI: Cognitive Factors Underlying the Public Psychology of Artificial Intelligence
A Revived Human View of Artificial Intelligence
India
New Zealand
United Kingdom
United States
China
Worldwide Growth and An Evolving Need for Behavioral AI
Why Should We Trust Algorithms?
Algorithm Aversion and Appreciation
Domain Specificity and Task Sensitivity
Measuring Preferences for AI Versus Humans in Light of National AI Strategies
Cognitive Factors Related to Preferences Toward AI Algorithms
Transparency and Explanation
Algorithmic Error
Perceived Understanding
Accuracy and Risk Levels
Sense of Uniqueness-Neglect and Responsibility
Summary of Factors
Cognitive Solutions to Increase Acceptability of AI and Enhance Algorithmic Appreciation
Communicate Transparency of Algorithmic Processing
Give Control to Modify Algorithmic Outcomes
Provide Social Proof
Increase Understanding
Frame Algorithms to Be More Humanlike
Conclusion
References
2 Defining the Relationship Between the Level of Autonomy in a Computer and the Cognitive Workload of Its User
Introduction
Review
Levels of Automation
Measurements of Mental Workload
Other Factors that May Impact Relationship
Experimental Design and Methodology
Discussion
Conclusion
References
3 Cognitive Effects of the Anthropomorphization of Artificial Agents in Human–Agent Interactions
Introduction
Anthropomorphic Design
The Uncanny Valley
Social Robotics
Empathy and Attribution of Mind
Physical Human–Robot Interaction
Goal-directed Action and Mirroring
Altruistic and Strategic Behavior Toward Robots
Ethical Considerations
Present Challenges
State of the Field
References
Part II Decision Support and Assistance Systems
4 Psychological Factors Impacting Adoption of Decision Support Tools
Introduction
Decision Support Systems
Adoption
Cognitive Biases
Anchoring Bias
Egocentric Bias
Belief Bias
Familiarity Bias
Automation Bias
Trust
Ethics and Culpability
Mitigating Factors
Requirements
Design
Training
Release
Support
Conclusion
References
5 Model-Based Operator Assistance: How to Match Engineering Models with Humans’ Cognitive Representations of Their Actions?
Introduction
Complex Problem Solving and Situation Awareness in Process Plants
Formal Models in the Engineering Phase of a Plant
Using Engineering Models for Operator Assistance
Abstraction Hierarchies in Systems Engineering and Human Cognition
Abstraction Hierarchies in Systems Engineering
Abstraction Hierarchies in the Cognitive Representation of Actions
Implications for the Use of Engineering Models in Operator Assistance Systems
Models Need to Provide Information on the Right Levels of Abstraction
Selecting Suitable Models by Matching the Levels of Abstraction
Describing the Contents and Capabilities of Models
Conclusion
References
Part III Behavioral Cybersecurity
6 Behavioral Game Theory in Cyber Security: The Influence of Interdependent Information's Availability on Cyber-Strike and Patching Processes
Summary
Introduction
The Markovian Game
Expectations in the Markovian Game
Experiment
Experimental Design
Respondents
Procedure
Results
The Proportion of Strike and Patch Actions in AV and Non-AV Conditions
The Proportion of Strike and Patch Actions in s and ns States
Strike and Patch Proportions in AV and Non-AV Conditions and s and ns States
Strike and Patch Proportions Over Blocks
Discussion and Conclusion
Authors’ Note
References
7 Exploring Cybercriminal Activities, Behaviors, and Profiles
Introduction
The Threat of Cybercrime: Actions and Actors
Cybercriminal Case Studies
Overview and Method of Analysis
Case 1
Case 2
Case 3
Case 4
Case 5
Discussion and Conclusion
References
Part IV Neural Networks and Machine Learning
8 Computer Vision Technology: Do Deep Neural Networks Model Nonlinear Compositionality in the Brain's Representation of Human–Object Interactions?
Introduction
Materials and Methods
Experimental Data
Direct Classification Using a Deep Neural Network
DNN Representations to Predict Voxel Responses
Results
Comparing MVPA Classification with Direct DNN-Based Classification
DNN Representations to Predict Voxel Responses
Clustering over Voxels
Discussion and Conclusions
References
9 Assessment of Various Deep Reinforcement Learning Techniques in Complex Virtual Search-and-Retrieve Environments Compared to Human Performance
Summary
Introduction
Background
The Food Collector Experiment
Participants
Experiment Design
Procedure
Models
Evaluation Metric for Human and Model Performance
Results
Human Experiment Results
Model Results
Discussion
Conclusion
References
10 Cognate Identification to Augment Lexical Resources for NLP
Introduction
Problem Statement
Model
Character Embeddings
LSTM
Attention Layer
Language and Concept Features
Experiments
Datasets
Evaluation
Baseline Models
Experiment 1: Cross-Language Evaluation
Experiment 2: Cross-Family Pre-training
Experiment 3: Cross Concept Evaluation
Hindi–Marathi Domain Experiment
Analysis
Concept Wise Performance
Transcription Tests
Discussion
Conclusion
References
Part V Human Factors
11 Psychophysiological Monitoring to Improve Human–Computer Collaborative Tasks
Introduction
Multi-level Cognitive Cybernetics and Adaptive Automation
Means of Communicating Human Performance Factors
Technology Requirements
Physiological Indicators of Performance
Pupil Diameter
Heart Rate Variability
Electro-dermal Activity
Electroencephalography
Adaptive Automation Empirical Research
Conclusions
References
12 Human–Technology Interfaces: Did ‘I’ do it? Agency, Control, and why it matters
What is a Sense of Agency?
Mechanism Underlying Sense of Agency
Importance of Sense of Agency
Measures of Sense of Agency
Applications of Sense of Agency
Health and Sense of Agency
Sense of Agency and Aging
Sense of Agency and Schizophrenia
Sense of Agency and Autism
Sense of Agency and Immersive Therapy
Sense of Agency and Meditation
Sense of Agency and Prosthetics
Sense of Agency and Technology
Sense of Agency and HCI
Sense of Agency and Gaming Industry and VR
Sense of Agency and Automation
Sense of Agency and Education
Conclusion
References
Part VI Engineering Design
13 Do Analogies and Analogical Distance Influence Ideation Outcomes in Engineering Design?
Introduction
Literature Review
Patents in Design
Effects of Patent-Based Analogies on Ideation
Effects of Analogical Distance on Ideation
Metrics for Assessing the Performance of Ideation
Research Gaps and Questions
Research Methodology
Experiment
Data Analysis
Results
Effects of Stimulation on Quantity, Novelty, and Quality of Concepts
Effects of Analogical Distance on Novelty and Quality of Concepts
Discussion
Conclusions
References
Part VII Critical Considerations
14 Humiliation and Technology: Dilemmas and Challenges for State, Civil Society, and Industry
Humiliation in Human–Technology Relations
What is Humiliation?
Self, Social Interactions, and Culture
Definition and Consequences
Social Identity and Group-Based Humiliation
A Victim-Centered, Agentic, and Multi-level Approach
Fraping and Online Identity
Technology and Persistence of Caste in India
Abu Ghraib and Militant Islamic Terrorism: The Role of Technology
Interventions by State, Civil Society, and Industry
Acknowledge Victimhood
Ensure Proper Control and Deletion of Digital Records of Humiliation
Develop Humiliation Dynamic Informed Platform Governance
Conclusion
References
15 Technology: Does It Help or Harm Intelligence—or Both?
Introduction
Foolishness
Obedience to Authority
Groupthink
Self-Imposed Bias and Limitations in Information-Seeking and Information-Interpretation
Conclusion
References

Sumitava Mukherjee · Varun Dutt · Narayanan Srinivasan (Editors)

Applied Cognitive Science and Technology: Implications of Interactions Between Human Cognition and Technology


Editors
Sumitava Mukherjee, Department of Humanities and Social Sciences, Indian Institute of Technology Delhi, New Delhi, Delhi, India

Varun Dutt, School of Computing and Electrical Engineering, Indian Institute of Technology Mandi, Kamand, Himachal Pradesh, India

Narayanan Srinivasan, Department of Cognitive Science, Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh, India

ISBN 978-981-99-3965-7
ISBN 978-981-99-3966-4 (eBook)
https://doi.org/10.1007/978-981-99-3966-4

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

To wifey (Payel), father (Bapi), mother and guru (Prof. Narayanan Srinivasan) —Sumitava Mukherjee
To my parents, my in-laws, my wife, and my daughter —Varun Dutt
To Prof. Janak Pandey —Narayanan Srinivasan

Preface

Cognitive Science was conceptualized in the early 1950s to become an interdisciplinary study of the mind by integrating ideas from multiple disciplines like psychology, artificial intelligence, linguistics, neuroscience, and anthropology. The goal was to go beyond an intuitive understanding of the mind to develop a new discipline that would form theories of mind based on a certain set of core principles like mental representations and computational mechanisms (Pylyshyn, 1980). This interdisciplinary approach means going beyond the generic conceptualization of cognitive science as a bunch of participating disciplines while at the same time committing to being pluralistic in methods that build up a common core (Thagard, 2020). Applied cognitive science, therefore, should be concerned with the same foundational blocks but concentrate on new knowledge creation and applications within a specific domain of inquiry. This volume also focuses on technology, partly because of its remarkable pervasiveness in human lives, which clearly necessitates charting the interfaces of human cognition with technology.

Researchers in cognitive science have not explicitly focused on constructing a “cognitive science of technology” (Stout, 2021), although there have been dispersed interests in diverse related areas. For example, studies in cognitive neuropsychology have examined the specific impacts of technologies (like cellphones, virtual reality, and augmented reality) on various aspects of cognition like attention, memory, learning, problem-solving, creativity, etc. Researchers in artificial intelligence and psychology have continued to model cognitive mechanisms in computers to better understand the underlying computational mechanisms and build efficient AI systems. Linguists have made theoretical, experimental, and computational developments using language as a window to cognition against the backdrop of emerging technologies.

Pylyshyn, Z. W. (1980). Computation and cognition: Issues in the foundations of cognitive science. Behavioral and Brain Sciences, 3(1), 111–132.
Thagard, P. (2020). Cognitive science. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Winter 2020 Edition). Retrieved from https://plato.stanford.edu/archives/win2020/entries/cognitive-science/
Stout, D. (2021). The cognitive science of technology. Trends in Cognitive Sciences, 25(11), 964–977.


Engineers and cognitive scientists have been exploring human–technology interactions in evolving areas like human–robot interaction and human–machine collaborations. Anthropologists have traced our long-term use of technology and its impact on people and society. The debates about cognition and technology have also been addressed by sociologists and philosophers. A significant number of companies now use, engage with, and are developing solutions that better integrate human cognition with technology. In such a large, growing landscape of intellectual development across a wide variety of domains, it is quite difficult to assimilate some of the major threads that could bind applied cognitive science with technology. We have attempted to build up knowledge in this area through a multidisciplinary intellectual space that supports both disciplinary and cross-disciplinary integration by finding relations between contexts, mechanisms, and functions across technologies. This is facilitated by focusing on specific technologies that can then be examined using multiple methods in line with the core agenda of cognitive science. It can also be advanced by looking at cognitive functions that can be applied to a range of technologies, especially in interaction or design. This volume does both of these and takes a significant step toward filling the long-pending gap in consolidating research on applied cognitive science and technology by bringing together a diversified set of people spanning psychologists, AI researchers, linguists, designers, and engineers. Across the different chapters, the authors address how we cognize technology, the impact of specific facets of technology (especially via human–technology interactions) on cognitive processes, and computational modeling of cognition, and also raise some critical perspectives about the role of technology in society.

Part I is about artificial intelligence (AI) and artificial agents, which continue to impact the way we think and act. Mukherjee et al. discuss how people think about and judge AI algorithms. They summarize a wide range of findings related to the public psychology of AI, documenting psychological responses and charting out cognitive factors underlying such responses. Potential solutions that can increase the acceptability of AI in human–AI interactions are also provided. Moving to a specific aspect of such interactions, Hawkins and Cassenti focus on the level of autonomy in AI systems. Their work charts out the complex relationship between changes in the level of autonomy in an AI system and its effects on the cognitive workload of the user, raising the question of unintended consequences of autonomous AI systems. Vegt and de Kleijn then move to human–non-human agent interactions, including robots, many of which are imbued with AI. They discuss the cognitive and behavioral effects of anthropomorphism as we move toward closer integration with agents in our societies.

Part II concerns decision support and assistance systems. A decision support system (DSS) is built to reduce the constraint of the human in the loop for automated systems and is most useful when both the system and the user bring some skills to bear on the decision. Hawkins traces out the psychological factors impacting the adoption of decision support systems, especially related to trust, and further provides a rubric to measure the psychological readiness of a DSS.
Müller and Urbas discuss complex problem-solving that requires model-based assistance systems like the ones used in process plants, whose operators have to deal with a host of planned and unplanned changes in parameters. For compatibility, the models used by plant engineers and those used by operators have to be in sync, which is not quite the case. The authors propose a concept of abstraction hierarchies that are used by engineers and human operators and argue that model descriptions are a cornerstone of successfully applying engineering models to operator assistance.

Part III addresses the rising concern about cybersecurity threats with increased Internet-based integration. Maqbool et al. discuss results from behavioral experiments about how information about an opponent’s actions in cyber-strike situations affects decision-making. Analysts are expected to over-patch computer systems in the real world, regardless of whether information about opponents is available. It appears that hackers do care whether computer systems are susceptible when striking networks if interdependent information is available. Bada and Nurse suggest that we need to complement technology solutions to this problem by better understanding cybercriminal perpetrators themselves—their use of technology, psychological aspects, and profiles. Their chapter explores psychological aspects of cybercriminal activities and behavior through a series of notable case studies.

Part IV draws attention to computational modeling of cognition through two interrelated approaches that have taken the world over—artificial neural networks and machine learning. These are human-brain-inspired modeling approaches and have been shown to solve a variety of real-world problems. However, from a cognitive science perspective, one key question is to what extent these models indeed depict cognitive mechanisms such that we can accept them as computational mechanisms of cognition. In this part, three important aspects of cognition are addressed—vision, performance, and language. Jha and Agarwal use deep convolutional networks (DNNs) to examine whether they represent object recognition, in light of the recent large-scale implementation of such neural networks in computer vision. They find evidence that DNN representations contain an encoding of compositional information for human–object interactions, thus suggesting that DNNs may indeed be able to model some critical aspects of biological vision. Uttrani et al. evaluate and compare human performance with state-of-the-art deep reinforcement learning algorithms in complex search-and-retrieve tasks using comparative metrics, and conclude that deep reinforcement learning algorithms may guide human decision-makers in their task performance. Kumar, Vaidya, and Agarwal use a recurrent neural network architecture to identify cognates (words across different languages that are known to have a common ancestral origin). Their tests on three different language families confirmed that their neural model showed an improvement in performance and could identify similar word pairs with high accuracy from a pair of closely related languages. These chapters together show models and technologies using neural networks that lead to performance advantages and also closely resemble possible cognitive mechanisms.

Part V draws attention to human factors, which play a vital role in human–technology interfaces. We use technological devices so routinely that it is becoming increasingly difficult to think of tasks that do not require them.
In this state of constant digital–device interaction, it would benefit us to focus research on human performance and experience in collaboration with technological devices.


Cassenti and Hung focus here on adaptive automation, a form of human–AI interaction where AI intervenes with digital aids to help a user who is struggling to perform well. Their focus is on physiological measures (including pupil diameter, heart rate variability, electro-dermal activity, and electroencephalography), which provide continuous measures and do not depend on subjective judgment or modeling. Kumar takes a zoomed-out stance on the experience of agency and control, which guides a large range of our experiences with technological interfaces. After laying out the concept of the sense of agency, the author discusses its potential applications in human–computer interfaces, automation technology, virtual reality immersion therapy, and other areas.

Part VI takes a detour to discuss what goes on in the mind of designers by picking up a conversation about design creativity in engineering. Srinivasan et al. specifically studied the efficacy of using patents as stimuli to support creativity in ideation during the conceptual design of spherical robots as part of an engineering design innovation course. They find empirical evidence that patents boost design creativity and hence can serve as useful stimuli for product designers.

Part VII offers two critical perspectives on technology. Jogdand raises the point that while there has been increasing attention toward technology-facilitated toxicity such as online harassment, cyberbullying, and various other forms of victimization, the phenomenon of humiliation remains poorly understood. A victim-centered, agentic, and multi-level conceptualization of humiliation is proposed, which could be useful for understanding human–technology interactions across different societies and cultures. Jogdand suggests technology plays a critical role in widening the scope and impact of humiliation and calls for the crucial role of the state, civil society, and industry in protecting human dignity. The final chapter by Sternberg and Karami invokes an extremely important question that in a way stands as a reflection of our society—has technology helped or harmed human intelligence? They point to an increase in human IQ in the twentieth century, possibly in response to increasingly complex environments, in part due to emerging technologies throughout the century. However, the authors suggest that wisdom seems to be decreasing globally, which is a big concern as we see large-scale social and environmental problems looming before us, raising deep questions about the implications of technological use for cognition.

Our intention in putting this volume together was to provide the reader with a sense of the landscape in applied cognitive science specifically related to technology while being rooted in the foundational aspects of the discipline—representation and processing of knowledge using diverse methodological angles. This is one of the first such volumes catering to specific questions related to particular technologies, and even though the chapters appear as samples from a much larger set of possibilities for interested researchers to delve deeper into, we believe the volume will motivate readers to tease out the interrelations of these two overlapping aspects integrally connected to our lives. Our work toward building an applied cognitive science of technology, focused on specific technologies and their human interactions, is just the tip of the iceberg.


This is only an initiation to a broad investigation of cognition as we deepen our relationships and co-existence with technologies in the future.

Sumitava Mukherjee, New Delhi, India
Varun Dutt, Kamand, India
Narayanan Srinivasan, Kanpur, India

Acknowledgments The success of the chapters rests completely with the authors. We thank each of the authors who have provided all of their support, contributed enthusiastically, and waited patiently through all the delays. We would also like to thank other authors who had responded to our call, but we were not able to include their chapters. Special thanks go to Satvinder Kaur and Ramesh Kumaran of Springer Nature (India) for patiently working with us and providing much-needed flexibility in timelines all through. We are very glad about the warm interactions and continuous support we have received from the whole team of Springer Nature.

Contents

Part I: Artificial Intelligence and Agents
1. Toward Behavioral AI: Cognitive Factors Underlying the Public Psychology of Artificial Intelligence (Sumitava Mukherjee, Deeptimayee Senapati, and Isha Mahajan), p. 3
2. Defining the Relationship Between the Level of Autonomy in a Computer and the Cognitive Workload of Its User (Thom Hawkins and Daniel N. Cassenti), p. 29
3. Cognitive Effects of the Anthropomorphization of Artificial Agents in Human–Agent Interactions (Bas Vegt and Roy de Kleijn), p. 41

Part II: Decision Support and Assistance Systems
4. Psychological Factors Impacting Adoption of Decision Support Tools (Thom Hawkins), p. 59
5. Model-Based Operator Assistance: How to Match Engineering Models with Humans’ Cognitive Representations of Their Actions? (Romy Müller and Leon Urbas), p. 73

Part III: Behavioral Cybersecurity
6. Behavioral Game Theory in Cyber Security: The Influence of Interdependent Information’s Availability on Cyber-Strike and Patching Processes (Zahid Maqbool, V. S. Chandrasekhar Pammi, and Varun Dutt), p. 91
7. Exploring Cybercriminal Activities, Behaviors, and Profiles (Maria Bada and Jason R. C. Nurse), p. 109

Part IV: Neural Networks and Machine Learning
8. Computer Vision Technology: Do Deep Neural Networks Model Nonlinear Compositionality in the Brain’s Representation of Human–Object Interactions? (Aditi Jha and Sumeet Agarwal), p. 123
9. Assessment of Various Deep Reinforcement Learning Techniques in Complex Virtual Search-and-Retrieve Environments Compared to Human Performance (Shashank Uttrani, Akash K. Rao, Bhavik Kanekar, Ishita Vohra, and Varun Dutt), p. 139
10. Cognate Identification to Augment Lexical Resources for NLP (Shantanu Kumar, Ashwini Vaidya, and Sumeet Agarwal), p. 157

Part V: Human Factors
11. Psychophysiological Monitoring to Improve Human–Computer Collaborative Tasks (Daniel N. Cassenti and Chou P. Hung), p. 177
12. Human–Technology Interfaces: Did ‘I’ do it? Agency, Control, and why it matters (Devpriya Kumar), p. 191

Part VI: Engineering Design
13. Do Analogies and Analogical Distance Influence Ideation Outcomes in Engineering Design? (V. Srinivasan, Binyang Song, Jianxi Luo, Karupppasamy Subburaj, Mohan Rajesh Elara, Lucienne Blessing, and Kristin Wood), p. 211

Part VII: Critical Considerations
14. Humiliation and Technology: Dilemmas and Challenges for State, Civil Society, and Industry (Yashpal Jogdand), p. 233
15. Technology: Does It Help or Harm Intelligence—or Both? (Robert J. Sternberg and Sareh Karami), p. 251

Editors and Contributors

About the Editors

Sumitava Mukherjee is an assistant professor at the Department of Humanities and Social Sciences, Indian Institute of Technology Delhi. Dr. Mukherjee has academic backgrounds in computer engineering and cognitive science. At IIT Delhi, he was a founding member of the Cognitive Science Programme and served as its Co-ordinator from 2021 to 2023. He also initiated Decision Lab as a research-cum-knowledge partner for decision research in India and the Scarcity Lab research group to work on thinking and decision making under scarcity and poverty, and started a public dialogue by setting up the website Humans and Technology. He works on foundational and applied aspects of judgment and decision making that generate socially relevant behavioral insights. He also initiated a focused research agenda to generate insights and have dialogues on a human view of technology based on behavioral science. He was awarded the Emerging Psychologist award in 2014 by NAOP India, the Outstanding Researcher award by Indian Institute of Technology Gandhinagar in 2014, and the Young Psychologist award in 2020 by NAOP India for his academic and professional contributions.

Varun Dutt is an associate professor in the School of Computing and Electrical Engineering at Indian Institute of Technology Mandi, India. Dr. Dutt has applied his knowledge and skills in the fields of psychology, public policy, and computer science to explore how humans make decisions on social, managerial, and environmental issues. Dr. Dutt serves as a senior member of IEEE, as the Chair of the Indian Knowledge System and Mental Health Applications (IKSMHA) Centre, IIT Mandi, and as the Principal Investigator at the Applied Cognitive Science (ACS) Lab, IIT Mandi. He is currently serving as an associate editor of Frontiers in Psychology (Cognitive Science), a review editor of Frontiers in Decision Neuroscience, and a member of the editorial board of the International Journal on Cyber Situational Awareness.


Narayanan Srinivasan is a professor and Head at the Department of Cognitive Science, Indian Institute of Technology, Kanpur, India. Prior to this, he was a Professor at the Centre of Behavioural and Cognitive Sciences, University of Allahabad, India. He studies mental processes, especially attention, emotions, consciousness and decision making using multiple methodologies. Dr Srinivasan has edited eleven books and seven special issues. He has published more than 180 papers in journals, books and conference proceedings. Dr. Srinivasan is a fellow of Association for Psychological Science, National Academy of Psychology (India), and Psychonomic Society. He is an associate editor of Cognitive Processing, Neuroscience of Consciousness, Frontiers in Consciousness Research, and Mindfulness.

Contributors

Sumeet Agarwal, Department of Electrical Engineering and Yardi School of Artificial Intelligence, IIT Delhi, New Delhi, India
Maria Bada, School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
Lucienne Blessing, Engineering Product Development Pillar, Singapore University of Technology and Design, Singapore, Singapore
Daniel N. Cassenti, US Army Research Laboratory, Adelphi, MD, USA
Roy de Kleijn, Cognitive Psychology Unit, Leiden University, Leiden, Netherlands
Varun Dutt, Applied Cognitive Science Lab, Indian Institute of Technology Mandi, Kamand, Himachal Pradesh, India
Mohan Rajesh Elara, Engineering Product Development Pillar, Singapore University of Technology and Design, Singapore, Singapore
Thom Hawkins, US Army, PM Mission Command, Aberdeen, MD, United States
Chou P. Hung, US Army Research Laboratory, Aberdeen, MD, USA
Aditi Jha, Department of Electrical and Computer Engineering, Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
Yashpal Jogdand, Department of Humanities and Social Sciences, Indian Institute of Technology Delhi, New Delhi, India
Bhavik Kanekar, Applied Cognitive Science Lab, Indian Institute of Technology Mandi, Mandi, Himachal Pradesh, India
Sareh Karami, Department of Counseling, Higher Education, Educational Psychology, and Foundation, Mailstop 9727, Mississippi State, USA


Devpriya Kumar, Department of Cognitive Science, Indian Institute of Technology, Kanpur, India
Shantanu Kumar, IIT Delhi, New Delhi, India
Jianxi Luo, Engineering Product Development Pillar, Singapore University of Technology and Design, Singapore, Singapore
Isha Mahajan, Symbiosis School for Liberal Arts, Symbiosis International University, Pune, India
Zahid Maqbool, Government Degree College Dooru, Anantnag, India
Sumitava Mukherjee, Department of Humanities and Social Sciences, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
Romy Müller, Faculty of Psychology, Chair of Engineering Psychology and Applied Cognitive Research, Technische Universität Dresden, Dresden, Germany
Jason R. C. Nurse, School of Computing, University of Kent, Canterbury, Kent, UK
V. S. Chandrasekhar Pammi, Centre of Behavioral and Cognitive Sciences, University of Allahabad, Allahabad, India
Akash K. Rao, Applied Cognitive Science Lab, Indian Institute of Technology Mandi, Mandi, Himachal Pradesh, India
Deeptimayee Senapati, Department of Humanities and Social Sciences, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
Binyang Song, Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
V. Srinivasan, Department of Design, Indian Institute of Technology Delhi (IIT Delhi), New Delhi, India
Robert J. Sternberg, Department of Psychology, College of Human Development, MVR Hall, Cornell University, Ithaca, NY, USA
Karupppasamy Subburaj, Department of Mechanical and Production Engineering - Design and Manufacturing, Aarhus University, Aarhus, Denmark
Leon Urbas, School of Engineering, Chair of Process Control Systems & Process Systems Engineering Group, Technische Universität Dresden, Dresden, Germany
Shashank Uttrani, Applied Cognitive Science Lab, Indian Institute of Technology Mandi, Mandi, Himachal Pradesh, India
Ashwini Vaidya, IIT Delhi, New Delhi, India
Bas Vegt, Cognitive Psychology Unit, Leiden University, Leiden, Netherlands


Ishita Vohra, International Institute of Information Technology Hyderabad, Hyderabad, Telangana, India
Kristin Wood, College of Engineering, Design and Computing, University of Colorado Denver, Denver, CO, USA

Part I: Artificial Intelligence and Agents

Chapter 1

Toward Behavioral AI: Cognitive Factors Underlying the Public Psychology of Artificial Intelligence
Sumitava Mukherjee, Deeptimayee Senapati, and Isha Mahajan

Abstract Companies and governments worldwide are incorporating Artificial Intelligence (AI) algorithms in various sectors to improve products and customer services, along with attempts to improve governance for citizens. We need a research agenda that builds up a behavioral science of AI targeted to gather insights about how the public thinks, judges, decides, and acts toward AI. The first part of the chapter focuses on the underpinnings of a psychological bias to prefer humans over algorithms even though algorithms work better than, or as well as, human experts on many tasks. Using the backdrop of national AI strategies, one can see that preferences for AI algorithms over humans are both domain-specific and task-sensitive. We cannot talk of a general preference for AI but need to take a more nuanced behavioral science approach. The next part discusses cognitive factors that underlie preferences toward AI algorithms, such as transparency, perception of algorithmic error, perceived understanding, risk levels, and uniqueness neglect. The latter part highlights solutions that can attenuate aversion toward algorithms by communicating transparency of algorithmic processing, giving humans some control over algorithms, providing social proof, and making algorithms seem more humanlike. Our review of empirical research, interspersed with practical suggestions on how to make emerging technologies more acceptable and accountable, should be of interest to academics, practitioners, and policymakers seeking to link cognitive science with technology.

Keywords Artificial intelligence · Human–Algorithm interaction · Psychology of algorithms · Algorithm aversion · Algorithmic appreciation · Psychology of technology · Human–Technology interaction · Judgment

S. Mukherjee (B) · D. Senapati Department of Humanities and Social Sciences, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India e-mail: [email protected]; [email protected] I. Mahajan Symbiosis School for Liberal Arts, Symbiosis International University, Pune, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Mukherjee et al. (eds.), Applied Cognitive Science and Technology, https://doi.org/10.1007/978-981-99-3966-4_1


A Revived Human View of Artificial Intelligence Artificial intelligence (AI) is rapidly changing the world around us and is being applied in numerous domains from agriculture to health care. In its essence, artificial intelligence implements tasks that require “intelligence” in non-human systems via computational algorithms. Many of these are typically characteristic of humanlike thinking, problem-solving, decision-making, learning, etc. With the recent advancements in AI technology, such algorithms affect almost all the spheres of our lives and are leading to large-scale development further fueled by national-level government initiatives. Let us take a look at some example uses in recent years across countries that have launched national strategies for AI to get a sense of how important the issue is and how closely it can impact millions of lives every day.

India Niti Aayog, India’s government think tank, has been working toward incorporating AI in various sectors of governance and public policies for over a billion people. It has suggested a hashtag #AIforAll while launching a national strategy on artificial intelligence in 2018–2019 with the aim to realize AI’s potential to transform India in various sectors via research and development (Niti Aaayog, 2018). The underlying idea is that intelligent algorithms will be more effective in adapting according to the unique needs of a country/consumer market and benefit our economic or social needs than is possible at the moment with many humans in the system. It is also poised for the economic development of the country through further automation, augmentation, and innovation where AI will be used to increase productivity and effectiveness or reduce biases in various sectors. Overall, it looks forward to working for the betterment of people from all strata of society for promoting social growth and inclusiveness.1 There has been steady growth in the Indian market share and expenditure in AI.

New Zealand Algorithms are being used in private organizations and government systems in the fields of incarceration, insurance provisions, and security. For the purpose of transparency and overcoming the black-boxed nature of the algorithm used in general, the Stats NZ report was created in 2018, to assess the algorithms being used in the governing systems (Stats NZ, 2018). The country uses a RoC*RoI (Risk of reconviction × Risk of re-imprisonment) model used for imprisonment by the New 1

One should also be careful and critical about various biases noted in algorithms and related societal concerns related to implementing AI as we discuss later.


Zealand Department of Justice and Corrections. The model was created in the mid1990s (Department of Corrections, 2021). The scores from this model are used in “pre-sentencing reports” and reports for the Parole board. The algorithm is mainly used for decisions to determine an inmate’s eligibility for rehabilitation and allot them to sentence management categories. Algorithms are also used in a variety of public and private sectors in New Zealand, such as transportation, employment, refunds, visa provisions, security, health and medical care, policy development, and risk assessment (stats NZ). New Zealand government created the Algorithm Charter, which outlines a set of guidelines for the use of algorithms in the public sector. These guidelines emphasize accountable measurement of algorithmic impact, explainability, and transparency in order to establish public trust in AI.

United Kingdom The UK government proposed the AI Sector deal in 2018. The program aims to prepare the nation economically and socially for the upcoming AI revolution. The main focus of the strategy is on research, education, employment, infrastructure, and a productive business environment (Department for Business, Energy and Industrial Strategy, 2018). The government has invested around $28 million in agriculture to increase productivity, limit pollution, and cut waste. For example, they are planning to use AI to make sure that the plants that have a higher risk of diseases are treated with crop rotation products (Shah, 2019). The transportation department has announced a $ 2.5 million budget for a project where AI will be used to assess the roads’ condition for safety (Department of Transport, 2021).

United States The US national strategy for artificial intelligence was released in 2019 with an aim to make the USA one of the key leaders of AI globally. The program strives to incorporate AI effectively in key domains like defense, health care, economy, justice and security, and science and engineering. The country is also incorporating AI in agriculture to address challenges (The White House, 2019). The National Institute of Food and Agriculture has taken up various projects in this regard, for example, building robots that can take up the labor-intensive jobs of harvesting, using computer vision to distinguish between crops and weeds for robot weeders, building sensors that can help in the early detection of diseases in plants, etc. The Department of Health and Human Services (HHS) contributes toward the incorporation of AI in health care by taking up various related projects. These range from speeding drug development by AI-based trial simulation and dose optimization to detecting diseases using medical images processed through novel deep learning architectures. For example, NIH researchers have developed an algorithm named DeepSeeNet, which can assist


in age-related macular degeneration—an eye disease that can lead to vision loss. The Department of Homeland Security with the collaboration of NASA’s Jet propulsion lab has developed the Assistant for Understanding Data through Reasoning, Extraction, and Synthesis (AUDREY) which helps emergency forces like firefighters, police, paramedics, etc. in taking efficient decisions through better awareness of the surrounding (DHS, 2021).

China There has been a conscious effort to develop and use AI in China for some time. However, prior to 2016, AI was presented merely as one technology among many others, which could be useful in achieving a range of policy goals. In July 2017, the State Council (which is the chief administrative body within China) launched the “New Generation Artificial Intelligence Development Plan” (AIDP) to act as a unified document that outlines China’s AI policy objectives. The overarching aim of the policy is to make China the world center of AI innovation by 2030 and make AI the main driving force for China’s industrial upgrading and economic transformation. China aims to have achieved a “major breakthrough” (as stated in the document) in basic AI theory and to be world-leading in some applications (“some technologies and applications achieve a world-leading level”). Alongside establishing material goals, the AIDP outlines a specific desire for China to become a world leader in defining ethical norms and standards for AI. In March 2019, China’s Ministry of Science and Technology established the National New Generation Artificial Intelligence Governance Expert Committee (Xinhua, 2019). In June 2019, this body released eight principles for the governance of AI. The principles emphasized that, above all else, AI development should begin by enhancing the common well-being of humanity. Respect for human rights, privacy, and fairness were also underscored within the principles.

Worldwide Growth and An Evolving Need for Behavioral AI The OECD report shows that across the world, about 60 countries—most developed nations and developing countries—have launched national AI initiatives that have been driven by academia, private organizations, civil societies, etc. (OECD.ai, 2022). However, barring a few existing behavioral insight teams present in the countries, in most of these initiatives, a much-needed human view is absent or minimally present. It necessitates a dedicated research agenda to build the behavioral science of artificial intelligence which we call Behavioral AI. As part of the agenda, firstly, we need to look beyond engineering and discuss how humans are going to perceive, act, judge, and decide about AI technologies. Secondly, we have to study the behavior of the AI systems themselves as intelligent agents—not


only as a program but rather as a non-human agents interacting with humans.2 This is important because the implementation of large-scale initiatives could face a major challenge stemming from the lack of acceptability and trust in AI systems. Owing to the inaccessibility of information regarding algorithms, the general population may tend to be wary of using algorithms to make decisions, and could prefer humans to make those decisions instead (algorithm aversion; Dietvorst et al., 2014). This is because of people’s desire for a perfect forecast (Einhorn, 1986; Highhouse, 2008), the understanding that algorithms are not able to learn (Dawes, 1979), their inability to include qualitative data (Grove & Meehl, 1996), the sense that algorithms may not be able to cater to an individual (Grove & Meehl, 1996) along with an insistence of “subjective” capabilities in human experts albeit the fact that a lot of human judgments are not of good quality, and the assumption that human forecasters will be able to improve their judgment through experience (Highhouse, 2008). While many of these aspects are no longer valid in cutting-edge algorithms or have at least bridged the gaps as we discuss below, human thinking about algorithms might not have been quite updated in sync with technological progress. This chapter highlights the core agenda of Behavioral AI—cognitive factors underlying our judgments and decisions about AI.

Why Should We Trust Algorithms? The first empirical evidence regarding the competency and efficiency of algorithms can be traced back to the 1950s when Paul Meehl (1954), in his book “Clinical Versus Statistical Predictions: A Theoretical Analysis and Review of the Evidence”, studied the results of around 20 studies across multiple domains to understand the discrepancy between forecasts performed by algorithms and their human counterparts. In almost all cases, algorithms tended to outperform humans. Starting from the 1950s till now, a tremendous amount of research has shown that even the simplest algorithm (linear models) tends to perform better than human experts (Dawes & Corrigan, 1974). A meta-analysis conducted by Grove et al. (2000) investigated the performance of algorithms and human experts in human health and behavior. The results proved that algorithms outperform humans by around 10%. A recent study also suggests that a convolutional neural network trained on skin cancer images outperformed dermatologists in detecting skin cancer (Haenssle et al., 2018). Humans and algorithms can predict crimes and people who commit repeated offenses equally well, but humans fail when they are not provided with immediate feedback on the accuracy of their predictions. Providing feedback to judges in real-time is difficult; hence algorithms here outperform human experts in predicting recidivism without feedback by using additional information better than human experts. Multiple judicial systems have or are seriously considering using algorithms for decision support to judges.

2

This idea related to the agenda has been discussed in Machine behavior (see Rahwan et al., 2019).


In the domain of intuitive strategy, Go is an ancient game that is around 2500 years old and is very complex, which requires excellent strategy and intuition—typically ascribed to humans; but recently AI has outperformed human experts in this game (Metz, 2016). Deepmind, in collaboration with Oxford University, has trained a neural network that can decipher ancient Greek texts faster and more accurately than human experts (Cao, 2019). Natural language generation algorithms are garnering much interest—these algorithms are now able to generate poetries that are at par with poetry composed by esteemed poets. When incentivized participants were asked to distinguish between human-generated poetry pieces and algorithm-generated ones, participants failed to differentiate between the poetry pieces (Kobis & Mossink, 2021). An algorithm named CheXneXt developed by Stanford researchers can simultaneously screen chest X-rays to find out around fourteen potential diseases, and it is at par with the expert radiologists. It was outperformed by the radiologists for three diseases, and in one disease, the algorithm outperformed the radiologists (Rajpurkar et al., 2018). Successful screening of breast cancers in the early stage can ensure efficient treatments for them. For this purpose, mammographic screening procedures are used, but this sometimes leads to false positives and false negatives. An AI system can outperform radiologists in mammographic screening (Killock, 2020; Mckinney et al., 2020). In the areas of work and human resources, algorithms can not only predict individual success like employee performance (Highhouse, 2008) better than humans, but they can also forecast group success better than their human counterparts. Evidence comes from an experiment conducted by Sevaski et al. (2021), where a machine learning algorithm was able to predict group success via visual features better than humans. The task was to analyze patterns of group success from pictures where a group of humans was playing a physical adventure game called “Escape the Room;” the algorithms forecasted the group success with 71.6% accuracy, whereas the humans predicted it with a far lower accuracy level at 58.3%. Thus, algorithms tend to make forecasts that are more accurate than those made by human experts. This has been supported by a tremendous amount of evidence in various fields such as medical diagnosis and treatment (Adams & Chan, 1986; Dawes et al., 1989; Grove et al., 2000), academic performance (Dawes, 1971, 1979), prisoners’ likelihood of recidivism (Thompson, 1952; Wormith & Goldstone, 1984), and so on. Many of these advancements have been in the field of visual processing, facilitated by newer machine learning models, but algorithms are increasingly able to perform better in non-visual domains and recommendation systems (Yeomans et al., 2019).3 Still, there is a public aversion toward their usage. The cognitive factors that underlie behavior toward AI algorithms (including a bias against algorithms) are discussed next. We then lead to another set of cognitive factors which can possibly mitigate public aversion toward AI and boost acceptability. This is of potential interest to practitioners, cognitive scientists, AI researchers, and policymakers. 3

Note that performance is not necessarily tied to the cognitive process such that for the exact same task, an algorithm can depict intelligence, but not necessarily in the manner in which humans depict intelligence (that is there is no assumption of strong equivalence of processes; Pylyshyn, 1980). This means the information processing model could be very different between humans and algorithms.


Algorithm Aversion and Appreciation People have an inflated sense of confidence in the forecast and information given by human experts (Armstrong, 1980). People’s preference toward human experts over algorithms, however, is not only a result of their high confidence in human experts but also of their weariness toward algorithms. Some prefer human input over algorithmic input, despite knowing that the algorithms outperform humans, which has been called algorithm aversion (Dietvorst et al., 2014). Participants also tend to give more weight to the advice that comes from forecasts made by human experts, rather than statistical or computational algorithms (Önkal et al., 2009) though opposite results have also been shown (Logg et al., 2019). Studies have further shown that despite the clear proof of algorithm competence, people still tend to prefer and give more weight to human inputs and forecasts rather than algorithm-based forecasts (Diab et al., 2011; Eastwood et al., 2012) and judge errors made by algorithms more harshly (Dietvorst et al., 2014). Along with this, research has also shown that people tend to judge professionals who depend on algorithms more harshly than those who are aided by their human colleagues (Shaffer et al., 2013). In a study conducted by Yeomans et al. (2019), when it came to subjective tasks such as recommendations for jokes, people usually tend to prefer human recommenders over algorithm-based recommenders, but rate the recommendations given by algorithms higher when they are not informed about the source of the recommendations. In this study, the recommender system outperformed the close friends and spouses in recommending jokes. Even when people liked the jokes recommended by the recommender system, they still preferred humans to recommend jokes. In health care, the aversion toward algorithms is even more. When chest X-rays were given to radiologists to evaluate advice quality and make diagnoses, the pieces of advice that were forwarded to the radiologists were generated by human experts but half of them were framed as coming from an AI system. Ratings given by radiologists for the advice quality were lower when the advice was framed as coming from an AI source (Gaube et al., 2021). Algorithm aversion is prevalent in the domain of education as well. With rapid digitization, the education sector is also being transformed as teachers are expected to use expert models or AI-assisted tutoring systems. However, it was found out that both on-thejob teachers and pre-training teachers are aversive to using expert models and instead were willing to seek help from a human expert (school counselor) for deciding on tasks such as which of the two students need extra hours of tutorials (Kaufman, 2021). In contrast to the above stream of evidence suggesting that people are aversive toward the use of algorithms, another emerging debate in the field argues that people are not always aversive toward algorithms: instead, they appreciate algorithms and put more weight on algorithmic advice compared to human advice. The phenomenon has been called algorithm appreciation. In a study conducted by Logg et al. (2019), participants were given a task to estimate a person’s weight from a photograph. The task was a domain-neutral perceptual estimation task. To enhance the estimation quality, they were provided with advice either coming from a human or an algorithm.


After receiving the advice, the participants could revise their estimates. The weight on the advice was more when it was from algorithms compared to humans. This has been replicated in a few recent studies. For example, in a financial investment scenario, participants preferred decisions suggested by AI-based advisory systems more than advice from human experts when making (hypothetical) strategic decisions for R&D investment (Keding & Meissner, 2021). When prior information related to performance or accuracy is known and when they can understand the task, people tend to choose algorithms over humans (Alexander et al., 2018). However, there is a constant debate between these two different threads of findings that need a more nuanced investigation that depends on the task at hand.

Domain Specificity and Task Sensitivity Most governments and, relatedly, investments by public and private organizations need to think strategically according to the nuances of different sectors. Common among them are education, health care, mobility, finance, and governance. National AI strategies have decided to implement AI solutions in all of these domains. But the public perception of AI algorithms can be different in each sector because people tend to trust algorithms in certain domains and humans in others (Alexander et al., 2018; Dijkstra, 1999; Lee, 2018; Yeomans et al., 2019). For example, Lee (2018) included four kinds of managerial decisions to see how people judge the fairness of these management decisions when taken by humans versus algorithms. It was found that in domains that required mechanical skills, both humans and algorithms were judged equally for fairness, but in the domains that required human skills, the algorithm’s decisions were judged as unfair. Hence in domains that are subjective— where tasks have been done typically by humans such as book recommendation, movie recommendation, joke recommendation (Yeomans et al., 2019), and medical decisions (Promberger & Baron, 2006), algorithm aversion tends to be rather high. In some other domains where algorithms have been used historically like weather forecasting and expert systems solving logic problems (Dijkstra, 1999), people prefer algorithms. This could be a result of the norm (See Norm Theory; Kahneman & Miller, 1986): whatever was inherently a human domain, people find it normative that it continues to be performed by humans. The trust and reliance on algorithms also depend on the nature of the task at hand. If the task is subjective, where it is open to interpretation and there is a need for (human) intuition or personal opinion, then people do not want to rely on algorithms to perform the particular task. On the contrary, when the task is objective and involves facts that are quantifiable and measurable, people prefer that it be done by algorithms. This might be due to people’s belief that algorithms are not good at performing tasks that require subjective input. But the aversion to using algorithms for subjective tasks can be overcome by increasing the task’s perceived objectivity (Castelo et al., 2019).


Measuring Preferences for AI Versus Humans in Light of National AI Strategies As part of a larger study, we gathered public preferences (n = 329; females = 157, age range = 15–50 years; all Indian nationals) on different tasks from the domains highlighted in Niti Aayog’s National AI strategy by the Indian government, which included Smart Mobility, Health Care, Smart Cities, Agriculture, Education, Finance, Entertainment, Lifestyle, and Governance. The tasks we used were those which were either being performed by algorithms already or for which research was ongoing to build algorithms to perform them in the future. The study was described as one aimed at understanding people’s attitudes toward artificial intelligence used in day-to-day lives. For each task (e.g., “Who would you prefer to drive a truck?”), two options were presented: “AI software” and “Human expert.” Participants were asked to make a choice between the two options using a radio button. The list of domains and tasks presented to participants is shown in Table 1.1. The proportions of people preferring the task to be performed by an AI software compared to a human expert were calculated for all seventeen tasks spanning seven domains. Participants showed the highest preference for AI software in tasks like performing surveillance, detecting intrusions, and identifying tax fraud. In contrast, they preferred human experts for providing relationship advice (95.1%), assessing the funniness of jokes, and driving. A Chi-square test revealed that the differences between preferences for either AI or humans were significant for all the tasks except irrigation, as shown in Fig. 1.1. At the level of domains, people preferred human experts for all the tasks in some domains like health care and education, whereas in some other domains like finance and smart cities, people preferred AI software. Hence, people’s preference for humans versus AI varies depending on the domain (Fig. 1.2). Within the same domain, variability was noted across tasks. For example, in the domain of mobility, a human expert was preferred for driving a truck (Human Expert: 84.2%, AI Software: 15.8%), whereas AI software was preferred to decide the price for rides (Human Expert: 38.3%, AI Software: 61.7%). The results show that we cannot talk about algorithm aversion or appreciation in general. People have a selective preference for AI based not only on the domain but also on the specific task within that domain. Castelo et al. (2019) make a similar point and also argue that algorithmic aversion is greater for tasks that seem subjective in nature.4 This shows that preferences for AI algorithms are selective based on the tasks at hand, which necessitates going deeper than arguing whether we are averse or appreciative toward algorithms in general.

4 In a follow-up study, we indeed find that the objectivity of the task and the sense of understanding of how an AI might be doing the task predict preferences. The more objective the task and the greater our sense of understanding about the AI, the higher the preference for AI to do the task, compared to human experts. Senapati, D., Mahajan, I., & Mukherjee, S. (2020 Dec). Selective aversion for Artificial Intelligence: Domain specificity and perceived understanding influences algorithm aversion. Annual Conference of the Society for Judgment and Decision Making, USA.
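As an illustration of the per-task analysis described above, here is a minimal sketch (with made-up counts, not the actual study data) of testing whether a task's AI-versus-human split departs from indifference.

```python
from scipy.stats import chisquare

# Hypothetical counts for one task (n = 329): how many chose AI vs. a human expert.
# These numbers are illustrative only, not the actual study data.
observed = [203, 126]            # [AI software, Human expert]
expected = [329 / 2, 329 / 2]    # indifference would be a 50/50 split

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {stat:.2f}, p = {p_value:.4f}")  # a small p suggests a real preference
```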


Table 1.1 List of the tasks used and the domains they belong to that were used in the study. There are in total 17 tasks, and participants took approximately 10 min to complete the study.

Tasks used in the study (domain in parentheses):
1. Who would you prefer to resolve your queries regarding purchase, refunds, and service issues? (Government mediated services)
2. Who would you prefer to identify tax fraud cases? (Governance)
3. Who would you prefer to decide on the parole term or jail term for offenders? (Governance)
4. Who would you prefer for relationship advice? (Entertainment and lifestyle)
5. Who would you prefer to assess the funniness of jokes? (Entertainment and lifestyle)
6. Who would you prefer to provide recommendations for stock investment? (Finance)
7. Who would you prefer to manage your financial portfolio? (Finance)
8. Who would you prefer to predict student performance? (Education)
9. Who would you prefer to make decisions about admissions to a course/program? (Education)
10. Who would you prefer to manage irrigation for crops? (Agriculture)
11. Who would you prefer to provide suggestions regarding suitable crops for a particular type of land? (Agriculture)
12. Who would you prefer to detect intrusion into a locality or building? (Smart cities)
13. Who would you prefer to perform surveillance operations for security? (Smart cities)
14. Who would you prefer to provide medical treatment suggestions? (Healthcare)
15. Who would you prefer to perform a disease diagnosis? (Healthcare)
16. Who would you prefer to decide the ride prices? (Smart mobility)
17. Who would you prefer to drive a truck? (Smart mobility)

These public judgments and perceptions regarding algorithms can be influenced by various factors beyond task-related differences, such as the properties/nature of the algorithm and individual psychological processes, which are discussed below (additionally, see Mahmud et al., 2022).


Fig. 1.1 The proportion of preferences for humans versus AI

Fig. 1.2 Aggregate preference for humans versus AI in each domain

Cognitive Factors Related to Preferences Toward AI Algorithms This section discusses some salient factors that may substantially affect public psychology toward using AI algorithms.


Transparency and Explanation The extent to which people understand the algorithm that is being used also plays an essential part in its acceptability. Yeomans et al. (2019) found that people found it harder to understand recommendations that came from a computer system than ones that came from other people, and this lack of understanding led to increased distrust in the recommender system. When given an explanation of how the system works, the participants were more willing to use the system’s recommendations. When algorithms are applied in domains like health care and defense (where stakes are quite high), one should understand how an AI system has taken a particular decision. But currently, most algorithms work like black boxes, making it more difficult for people to trust them. Hence, algorithms built on simple, easily explainable machine learning models, and algorithms with a certain amount of transparency about how decisions are taken, are easier to understand (Schemelzer, 2019). Further, Cramer et al. (2008) found that explaining to a user why a particular item is recommended results in higher acceptance of the recommendation. In this study, they tested how transparency affects the acceptance of recommender systems, using three kinds of recommenders that suggest art to users. The first version of the algorithm was not transparent, the second version was transparent about why a particular recommendation was made to the specific user, and the third one gave a rating of how certain it was that the user would like the recommendation. The results suggest that the transparent version was more understandable to the users, and they perceived it to be more competent and accepted its recommendations. Therefore, the explainability of an artificial intelligence system and the ease with which an individual can understand the system’s functioning have a direct impact on the individual’s trust in the algorithm. Along with fostering trust, it is also necessary that AI systems are developed with a sense of responsibility and accountability. In this respect, the black-box nature of machine learning models and algorithmic systems often poses a challenge to maintaining accountability. Therefore, it is important that we make efforts to create artificial intelligence systems that are comprehensible and easy to understand. One way to deal with this issue is to move toward explainable AI. Explainable AI, popularly called XAI, aims to create AI that explains the rationale behind its outputs and the inner processing of its mechanism. Once AI can explain why a particular decision was made, it is believed that people will show more understanding of and trust in its decisions. Doran et al. (2017) suggest that AI systems have three levels of explainability. The first is an opaque system; here no insights are available regarding the algorithmic mechanisms. Users can see only the input; what happens inside is unknown, and the algorithm at the end simply throws out outputs without any rationale (the notion of a “black box”). The second kind of system is an interpretable system. In these systems, the user can mathematically analyze the inner mechanisms. Interpretable XAI is transparent; however, understanding and interpreting the system requires some level of technical knowledge.
AI system builders can understand the algorithmic mechanisms of interpretable systems. The third kind of system is a comprehensible system, which emits symbols along with its outputs to enable users to make sense of how the outcome is caused. Hence, the explanation depends on the users’ implicit knowledge and how they want to draw conclusions. Both interpretable and comprehensible systems lack reasoning or a rationale for the outcomes for end-users who lack technical knowledge. Along with this, the explanation should be generated by the AI itself; otherwise, it would leave room for the biases of human analysts to creep in. XAI is thus an attempt to increase the explainability of a system and reduce the biases that could form as a result of an incomprehensible system. Furthermore, XAI attempts to increase the trust individuals have in a system and therefore helps improve the user experience of the system at large. Thus, XAI is an important step toward fostering trust in systems and improving human–AI interactions.
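As a toy illustration of the interpretable end of this spectrum (our own sketch, not an example from Doran et al.), a simple linear model can expose which inputs pushed a particular output, which is exactly the kind of rationale an opaque system withholds. The feature names and data here are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical loan-approval data with three interpretable features.
feature_names = ["income", "debt_ratio", "missed_payments"]
X = rng.normal(size=(500, 3))
# Synthetic ground truth: income helps, debt and missed payments hurt.
y = (1.5 * X[:, 0] - 1.0 * X[:, 1] - 2.0 * X[:, 2] + rng.normal(size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)

def explain(applicant):
    """Return each feature's contribution (coefficient * value) to the decision score."""
    contributions = model.coef_[0] * applicant
    order = np.argsort(-np.abs(contributions))
    return [(feature_names[i], round(float(contributions[i]), 2)) for i in order]

applicant = np.array([0.2, 1.1, 2.0])   # a made-up applicant
print("approved" if model.predict([applicant])[0] else "declined")
print(explain(applicant))  # e.g., missed_payments dominates the (negative) decision
```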

Algorithmic Error Algorithms, like humans, can make errors. People are more averse to algorithms when they see them make errors than when a human forecaster commits them (Dietvorst et al., 2014). This stems from people’s belief that humans can learn from their errors while algorithms cannot. In another study, participants read a scenario in which an algorithm makes an error either in screening applicants for a job or in analyzing a mortgage application, and the repercussions are borne by a person called John. When participants were asked to judge the algorithm, they showed less acceptance of it and viewed it more negatively. They attributed less blame, accountability, and forgiveness when the algorithm made an error, which might be because of the “nonhumanness” attributed to algorithms. Participants also showed stronger behavioral intentions toward algorithms, such as improving, training, or stopping the use of the erring algorithm. It seems people hold algorithms to higher performance standards, and that might lead to stronger reactions against erring algorithms than against humans with the same level of error (Madhavan & Wiegmann, 2007; Renier et al., 2021). When people were asked to estimate the accuracy of algorithmic decision-making in different domains, they consistently underestimated the level of errors, and their tolerance for actual errors was even lower (Rebitschek et al., 2021). People with diminishing sensitivity to error experience a smaller subjective penalty for each additional unit of error in a forecast: sensitivity to error is intense for differences among near-perfect forecasts but less intense for differences among forecasts that already contain more error. In simple words, people with diminishing sensitivity to error are more concerned with small errors than with relatively large errors, because the intensity of feeling decreases with each marginal unit of error; they are more sensitive to errors near the status quo than to errors remote from it. Algorithmic advice generally produces less variance than human advice because, given the same input, the same output would
be produced by algorithms, but that is not the case with humans. Human advice, with its greater variability, has an upside of producing a near-perfect forecast and a downside of producing a worse one; people with diminishing sensitivity to error overweight the upside and prefer human advice over the better-on-average advice provided by algorithms. Hence, when algorithms provide consistent advice in uncertain domains, the chances are high that even if they are only a little off, people will reject them (Dietvorst & Bharti, 2020).
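A small numerical sketch of the diminishing-sensitivity argument (our illustration with made-up numbers, not the original study's model) shows how a concave subjective penalty can make noisier human advice feel preferable to steadier algorithmic advice even when their average error is the same.

```python
import numpy as np

rng = np.random.default_rng(1)

def subjective_penalty(error, alpha=0.5):
    """Concave penalty: each extra unit of error hurts less than the previous one."""
    return np.abs(error) ** alpha

n = 100_000
# Same mean absolute error (4 units), different variability.
algorithm_errors = np.full(n, 4.0)               # steady, low-variance errors
human_errors = rng.choice([0.5, 7.5], size=n)    # sometimes near-perfect, sometimes far off

print("algorithm:", subjective_penalty(algorithm_errors).mean())  # 2.00
print("human:    ", subjective_penalty(human_errors).mean())      # about 1.72 -> feels better
```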

Perceived Understanding One potential reason for algorithm aversion can be one’s perceived sense of understanding of how the algorithm works in that task domain. Perceived understanding is not actual understanding; rather, it is when individuals think they know or understand something that in reality they may not. Sloman and Fernbach (2018) suggest that people sometimes have a knowledge illusion where they think they know something and tend to rate their knowledge much higher than what they actually know about the particular phenomenon. We suggest that people think they understand how a human being decides or performs a task, but do not think they understand how an algorithm might perform similar tasks. Thus, they have a higher perceived sense of understanding of humans and a lower perceived sense of understanding of algorithms, irrespective of their real understanding of either. In one of our studies,5 we indeed find that the sense of understanding of how an algorithm might be performing a task predicts the acceptability of AI for that task. Hence, if people think they know how humans work, they are able to trust them more (and, as a corollary, since they believe they don’t know how algorithms work, they are not able to trust and accept them). Cadario et al. (2021) tested whether people really understand how humans work and found that people’s objective knowledge regarding humans and algorithms is the same: they don’t really understand how either humans or algorithms work. Interestingly, in support of the proposition stated before, they found an asymmetry in people’s perceived understanding of algorithms and humans but no asymmetry in real understanding.

5 Senapati, D., Mahajan, I., & Mukherjee, S. (2020 Dec). Annual Conference of the Society for Judgment and Decision Making, USA.

Accuracy and Risk Levels One important aspect of aversion toward algorithms is their perceived accuracy. Dietvorst et al. (2014) confirmed that if an algorithm is perceived to be more accurate, then the chances of it being used in forecasting tasks increase. Participants who
used algorithms that made almost zero errors tended to rely on them more and were more likely to choose them for their forecasting tasks. Unfortunately, given the unpredictability of the future, a “perfect forecast” is quite impossible. The fact that algorithms tend to be imperfect severely impacts their acceptance, as discussed above (see section “Algorithmic Error”), and people are less likely to use algorithmic models if they do not trust their accuracy. Following this study, various researchers have theorized that people tend to be reluctant to use algorithms because of their “intolerance towards the inevitable error” (Dietvorst et al., 2016). This stems from the conception that although algorithms are capable of making errors in predictions, humans are capable of perfection through experience and learning from past mistakes (Einhorn, 1986). Along with this, people tend to be more intolerant of the mistakes that algorithms make than of humans’ errors, even when the errors made by the algorithm were smaller than the ones made by humans. Another study, conducted by Madhavan and Wiegmann (2007), led to the same conclusion: humans tend to judge algorithms harshly when they make a mistake, regardless of how severe that mistake was, and their trust in the system decreases significantly. Thus, to increase human trust in algorithms, it is necessary to communicate publicly that while algorithms do make errors, these errors are often quantitatively and qualitatively less severe than the errors made by their human counterparts.

Sense of Uniqueness-Neglect and Responsibility People often ascribe to algorithms an inability to include qualitative data or consider “subjectivity” and hence think that algorithms may not be able to cater to an individual (Grove & Meehl, 1996). In the healthcare domain, people prefer human care providers over automated ones even when the automated providers are more accurate. This asymmetric preference has been attributed to the belief that one’s problems are unique: an algorithm cannot understand them and give customized suggestions, and hence humans are better at recognizing these unique needs and providing care accordingly (Longoni et al., 2019). This phenomenon has been called uniqueness neglect. Seen from another viewpoint, we inherently look for responsibility from those who solve our unique problems. When there is a clear ability to hold another person responsible for the outcome of a decision, people potentially turn toward human experts as opposed to (impersonal) algorithms. Along with this, the preference for the expert is further solidified through the shift of responsibility from the individual to the expert: relying on an expert makes people feel less responsible for their decisions and their outcomes (Promberger & Baron, 2006). However, there is no such shift of responsibility when a person uses forecasts made by statistical models, unless there is some intervention by a human. This rejection of help, hence, does not work with algorithms, as any form of refusal would not affect the automation (Harvey & Fischer, 1997). Therefore, people are less likely to depend on algorithms because the
responsibility for the outcome then lies solely with the individual, or cannot be pinned on a human (or even a company/organization) in many cases of algorithmic decision-making. Part of this could change once we have clear rules for blame attribution and shared responsibility for decisions made by algorithms. There are many other factors, such as the complexity of the algorithm, how the description of the algorithm is framed, and the individual’s familiarity with algorithms, that can reduce trust in, or the responsibility attributed to, an algorithm.

Summary of Factors Acceptance of AI and the rollout of AI strategies are far from uniform. The domain, the task at hand, and the attributes of the task will guide implementation and acceptability. If the task itself is difficult, then people might rely on algorithms (Bogert et al., 2021). The other side of this discussion is the set of factors related to the algorithm. Cognitive factors relevant to the algorithm, both real (e.g., risk level, accuracy) and perceived (e.g., understanding, uniqueness), would be central. Some aspects, like algorithmic errors, could also have a perceived component (perceived error or increased sensitivity to errors). Across the previous sections, we assimilate two kinds of factors: task-related and algorithm-related (see Fig. 1.3).

Cognitive Solutions to Increase Acceptability of AI and Enhance Algorithmic Appreciation To increase the uptake of technology, it is necessary not only to understand these factors but also to find practical ways in which we can build upon them. This section suggests how psychological science can be used to increase algorithmic appreciation.

Communicate Transparency of Algorithmic Processing In general, algorithms are considered to be a “black box” (Felzmann et al., 2019), because consumers are typically neither aware of the data they use nor of the way they function. People’s trust in algorithms tends to increase when it is explained to them how the algorithms operate (Yeomans et al., 2019). Of course, explaining how the algorithms work and arrive at a particular decision can be fruitful only when a critical audience is available (Kemper & Kolkman, 2019) or when we re-think how to communicate the way algorithms work (see, for example, Google’s page on how search recommendations work, written for the general public6).

6 https://www.google.com/intl/en_in/search/howsearchworks/.


Fig. 1.3 Summary of factors influencing public judgments and behavior toward AI algorithms

This transparency can take two forms: prospective and retrospective. Prospective transparency includes providing users with information about the data that the algorithm uses and how it reaches its decisions (Felzmann et al., 2019; Zerilli et al., 2018). Retrospective transparency refers to the explanation of how the algorithm reached the conclusion that it did, i.e., after the decision was made (Paal & Pauly, 2018, as cited in Felzmann et al., 2019). This kind of transparency would mainly help in studying cases where algorithms have made mistakes, to identify what caused the mistake and what rectification is necessary. The need for transparency in algorithms has now become a pressing issue. It is necessary that people have autonomy over their own data and are able to understand how it is being used. Transparency requirements were included in the European Union’s General Data Protection Regulation (GDPR) (Felzmann et al., 2019). This addition requires both prospective and retrospective transparency from the developers of algorithms. Explainable AI, discussed earlier, is an industry-led step in this direction. Cognitively, increased transparency regarding the working of the algorithm will enable users to get a
better understanding of the way the algorithm works, which will help to increase their trust in the algorithm (see section “Transparency and Explanation”). The need for transparency is undeniable, but providing full transparency can make algorithms more vulnerable to cyber-attacks and hacking (see the chapter on behavioral cybersecurity in this volume). There can be three distinct types of communication about algorithmic transparency: the first is to let people know the final decision, the second is to inform them about the process of decision-making, and the third is to provide the basis for the decision made. Research suggests that minimal transparency about the AI program, together with an explanation of why a particular decision has been taken, is sufficient to generate perceived legitimacy of the decision among the public and to increase acceptance (Licht & Licht, 2019). Thus, communicating how an algorithm works, both clearly and in a way that is seen as transparent (along with kinds of communication that resonate with the knowledge and understanding of the users), is one key direction for future work to boost trust in AI.

Give Control to Modify Algorithmic Outcomes Increasing (real or perceived) control among people (especially, experts) can increase acceptability and intention to use algorithms. In a study conducted by Dietvorst et al. (2016), participants were asked to forecast student test scores according to the data provided. They were also provided an algorithm-based forecast and were given the option to use the algorithm. They were told that the algorithm is imperfect; however, one condition of the study let participants modify the forecast of the algorithm. The study showed that participants were more likely to use the algorithm if they were able to modify its forecasts. However, the extent to which they could change it didn’t significantly affect their choice. The freedom to modify the forecast also led to higher satisfaction with the forecast. The participants also had a better opinion of the algorithm and were more likely to use the algorithm to make future predictions. Therefore, giving people control over the algorithm, irrespective of how much control that is, helps reduce aversion. Among consumers, a sense of perceived control could also potentially boost usage. Kumar (in this volume) points to the sense of agency and control being key modifiers of our experiences with technology interfaces.
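A minimal sketch of this design idea (our illustration, loosely inspired by the modify-the-forecast manipulation in Dietvorst et al., 2016; the numbers and bound are hypothetical): the user sees the algorithm's forecast but may adjust it only within a fixed range.

```python
def constrained_adjustment(algorithm_forecast: float,
                           user_forecast: float,
                           max_adjustment: float) -> float:
    """Let the user pull the algorithm's forecast toward their own, within +/- max_adjustment."""
    lower = algorithm_forecast - max_adjustment
    upper = algorithm_forecast + max_adjustment
    return min(max(user_forecast, lower), upper)

# Hypothetical student-test-score forecast
algo = 72.0   # algorithm's predicted score
user = 65.0   # user's own belief
print(constrained_adjustment(algo, user, max_adjustment=5.0))  # 67.0: capped at algo - 5
```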

Provide Social Proof Research suggests that cultural and social factors also affect how we trust and interact with technology. Social proof has previously been related to trust; hence, providing information about high social proof can result in higher acceptance of algorithms (Alexander et al., 2018). This can be used in promotional campaigns that feed into government initiatives. For example, if the government wants to boost financial robo-advisors or algorithmic trading as part of pension fund management, one important
piece of information would be projecting a large-enough number of active takers already. Of course, this should be done responsibly and with due consideration of citizen welfare.

Increase Understanding People often dislike algorithms when they think they don’t understand how they work. The black-box nature of algorithms can instill a feeling of not understanding their mechanisms, and hence of not trusting them. One potential mitigation strategy is to help people understand how they work by providing information about their input, internal processing, and output. In one study, Yeomans et al. (2019) manipulated the amount of information participants received about a joke recommendation algorithm. Participants received either sparse or rich information, and those who received rich information about the algorithm were less averse to it recommending jokes to them. Similar results were replicated in a study in the healthcare domain, which used simple interventions to test whether giving objective knowledge about how both humans and algorithms work would increase algorithm acceptability, and found that providing objective information in a graphical format improves the acceptability of algorithms. We can improve acceptability by providing knowledge about how the algorithms work (Cadario et al., 2021), but providing too much information may backfire. People often overestimate their own forecasting abilities, which leads to poor performance and non-reliance on superior algorithmic advice. Making people experience that their own capabilities are suboptimal and that algorithms are better than them could help. In a share forecasting task, participants were asked to forecast whether share prices would go up or down for forty shares. A forecasting computer with 70% accuracy was available to them. At the beginning of the experiment, in round one, participants’ computer usage was 27.97%. As they progressed and received feedback about their performance, their reliance on the computer increased to 52.45% toward the end (Filiz et al., 2021). Another strategy is to expose people to algorithms that are capable of learning: people rely more on algorithms that learn over time and gradually provide better outputs (Berger et al., 2021).

Frame Algorithms to Be More Humanlike Increasing anthropomorphic attributes to make algorithms more humanlike could increase their emotional appeal. Previous studies suggest that source bias can affect the interaction between machines and humans. People become less willing and attribute less responsibility to robots and computers when they are more machinelike, but this changes when they are more humanlike (Lee, 2018). Experts in particular domains trust their own decisions more than those of the algorithms, and algorithm aversion is more pronounced in experts than in lay people, because experts are less willing to consider others’ advice (Logg et al., 2019). It might be possible to highlight human expert involvement in the algorithmic design or algorithmic processing to increase acceptability. This already happens in industry: knowledge engineering from experts to devise expert systems, crowdsourcing expert inputs to design or train a computational model, or having an expert human in the loop could all work toward bringing human touches to algorithms. Additionally, framing a bio-inspired or neural network-based system to highlight its brain-like properties can be another marketing decision that gains traction. In addition, certain humanlike attributes, like being a bit uncertain or having increased variance in algorithmic output (Dietvorst & Bharti, 2020), can work in favor of algorithms. Being slow to respond is another design attribute that can give the illusion of humanlike labor and increase acceptability; the added advantage of this labor illusion is increased transparency of operation (Buell & Norton, 2011). Further, if we can introduce a feature in an algorithm that makes it look like the solution has been customized, people are more willing to accept it (Longoni et al., 2019) and will possibly not feel that their uniqueness has been neglected. Finally, people are more averse to algorithms when the nature of the task is subjective, but the perception of the nature of the task can be modified by reframing the algorithm or the task itself. Increasing the perceived objectivity of the task can help in reducing algorithm aversion (Castelo et al., 2019), as can increasing the humanlike qualities of the algorithm.

Conclusion The increased use of artificial intelligence has brought about rapid development and promises a revolutionary transformation in the way society is developing and will develop. However, this will only be useful if the technologies are accepted by the population at large, whose lives are impacted by AI regardless of their knowledge of the systems. People all over the world, and especially in India, are slowly getting exposed to AI algorithms, and it will inevitably take time for people to accept new developments. Along with this, it is also necessary to check whether the algorithms put to use are fair and unbiased. Left unchecked, algorithms tend to perpetuate existing biases and discriminatory practices prevalent in society today. To address both algorithm aversion and algorithmic biases, it is necessary to make sure that people understand the way algorithms are used. Transparency regarding the data and the logic of the algorithm that is used to make decisions, along with clear accountability for the decisions implemented by the AI, would significantly improve trust. This would also be an efficient solution for many problems where AI can make a real impact through autonomous, semi-autonomous, or advisory/recommendation roles. In the end, it is necessary to understand that, regardless of the biases and issues prevalent in algorithmic decision-making, the use of AI could significantly improve conditions, reduce human biases, or provide analytics in various aspects of our lives. For the
same, it is essential that we work toward mitigating the aversion against algorithms and also make algorithms more efficient and unbiased. The process of increasing acceptability toward algorithms is a gradual one, with proper checks, including legal liabilities to safeguard citizen interests. A large part of the solution will be driven by cognitive factors underlying the psychology of artificial intelligence and algorithms.

References Adams, I., & Chan, M. (1986). Computer aided diagnosis of acute abdominal pain: A multicentre study. British Medical Journal, 293(6550), 800–804. https://doi.org/10.1136/bmj.293.6550.800 Alake, R. (2020). Algorithm bias in artificial intelligence needs to be discussed (and addressed). Towards Data Science. Retrieved from https://towardsdatascience.com/algorithm-bias-in-artifi cial-intelligence-needs-to-be-discussed-and-addressed-8d369d675a70. Alexander, V., Blinder, C., & Zak, P. (2018). Why trust an algorithm? Computers in Human Behavior, 89, 279–288. https://doi.org/10.1016/j.chb.2018.07.026 Armstrong, J. S. (1980). The seer-sucker theory: The value of experts in forecasting. Technology Review, 83, 18–24. https://repository.upenn.edu/marketing_papers/3. Awad, E., Levine, S., Kleiman-Weiner, M., Dsouza, S., Tenenbaum, J. B., Shariff, A., Rahwan, I., et al. (2020). Drivers are blamed more than their automated cars when both make mistakes. Nature: Human Behaviour, 4(2), 134–143. https://doi.org/10.1038/s41562-019-0762-8. Berger, B., Adam, M., Rühr, A., & Benlian, A. (2021). Watch me improve: Algorithm aversion and demonstrating the ability to learn. Business & Information Systems Engineering, 63(1), 55–68. https://doi.org/10.1007/s12599-020-00678-5 Bogert, E., Schecter, A., & Watson, R. T. (2021). Humans rely more on algorithms than social influence as a task becomes more difficult. Scientific Reports, 11(1), 1–9. https://doi.org/10. 1038/s41598-021-87480-9 Buell, R. W., & Norton, M. I. (2011). The labor illusion: How operational transparency increases perceived value. Management Science, 57(9), 1564–1579. https://doi.org/10.1287/mnsc.1110. 1376 Burton, J. W., Stein, M. K., & Jensen, T. B. (2020). A systematic review of algorithm aversion in augmented decision making. Journal of Behavioral Decision Making, 33(2), 220–239. https:// doi.org/10.1002/bdm.2155 Cadario, R., Longoni, C., & Morewedge, C. K. (2021). Understanding, explaining, and utilizing medical artificial intelligence. Nature: Human Behaviour 5(12), 1636–1642. https://doi.org/10. 1038/s41562-021-01146-0. Cao, S. (2019). Google’s DeepMind AI beats humans again: This time by deciphering ancient Greek text. Observer. Retrieved from https://observer.com/2019/10/google-deepmind-ai-machine-lea rning-beat-human-ancient-greek-text-prediction. Castelo, N., Maarten, W. B., & Lehmann, D. R. (2019). Task-dependent algorithm aversion. Journal of Marketing Research, 56(5), 809–825. https://doi.org/10.1177/0022243719851788 Cramer, H., Evers, V., Ramlal, S., et al. (2008). The effects of transparency on trust in and acceptance of a content-based art recommender. User Model and User-Adapted Interaction, 18(5), 455–496. https://doi.org/10.1007/s11257-008-9051-3 Dawes, R. M. (1971). A case study of graduate admissions: Application of three principles of human decision making. American Psychologist, 26(2), 180–188. https://doi.org/10.1037/h0030868 Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American Psychologist, 34(7), 571–582. https://psycnet.apa.org/doi/10.1037/0003-066X.34.7.571. Dawes, R. M., & Corrigan, B. (1974). Linear models in decision making. Psychological Bulletin, 81(2), 95. https://doi.org/10.1037/h0037613


Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243(4899), 1668–1674. https://doi.org/10.1126/science.2648573 Department for Business, Energy & Industrial Strategy. (2018). The AI sector deal: Policy paper. Department of Corrections. (2021). Risk of reconviction. https://www.corrections.govt.nz/resour ces/research/risk-of-reconviction. Department of Homeland Security (DHS). (2021). Assistant for understanding data through reasoning, extraction and synthesis (AUDREY) fact sheet, video and AUDREY hastings experiment after action report. DHS: Science and Technology. Department of Transport. (2021). DfT to embrace artificial intelligence technology in plans for local roads health-check. Department of Transport: News. Diab, D. L., Pui, S. Y., Yankelevich, M., & Highhouse, S. (2011). Lay perceptions of selection decision aids in U.S. and non-U.S. samples. International Journal of Selection and Assessment, 19(2), 209–216. https://doi.org/10.1111/j.1468-2389.2011.00548.x. Dietvorst, B. J., & Bharti, S. (2020). People reject algorithms in uncertain decision domains because they have diminishing sensitivity to forecasting error. Psychological Science, 31(10), 1302– 1314. https://doi.org/10.1177/0956797620948841 Dietvorst, B., Simmons, J., & Massey, C. (2014). Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology, 144(1), 114–126. https:/ /doi.org/10.1037/xge0000033 Dietvorst, B. J., Simmons, J. P., & Massey, C. (2016). Overcoming algorithm aversion: People will use imperfect algorithms if they can (even slightly) modify them. Management Science, 64(3), 1155–1170. https://doi.org/10.1287/mnsc.2016.2643 Dijkstra, J. J. (1999). User agreement with incorrect expert system advice. Behaviour and Information Technology, 18(6), 399–411. https://doi.org/10.1080/014492999118832 Doran, D., Schulz, S., & Besold, T. R. (2017). What does explainable AI really mean? A new conceptualization of perspectives. arXiv:1710.00794. Eastwood, J., Snook, B., & Luther, K. (2012). What people want from their professionals: Attitudes toward decision-making strategies. Journal of Behavioral Decision Making, 25(5), 458–468. https://doi.org/10.1002/bdm.741 Eckel, C. C., & Grossman, P. J. (1996). Altruism in anonymous dictator games. Games and Economic Behavior, 16(2), 181–191. https://doi.org/10.1006/game.1996.0081. Einhorn, H. J. (1986). Accepting error to make less error. Journal of Personality Assessment, 50(3), 387–395. https://doi.org/10.1207/s15327752jpa5003_8 Felzmann, H., Villaronga, E. F., Lutz, C., & Tamò-Larrieux, A. (2019). Transparency you can trust: Transparency requirements for artificial intelligence between legal norms and contextual concerns. Big Data & Society, 6(1). https://doi.org/10.1177/2053951719860542. Filiz, I., Judek, J. R., Lorenz, M., & Spiwoks, M. (2021). Reducing algorithm aversion through experience. Journal of Behavioral and Experimental Finance, 31 100524. https://doi.org/10. 1016/j.jbef.2021.100524 Gaube, S., Suresh, H., Raue, M., Merritt, A., Berkowitz, S. J., Lermer, E., Ghassemi, M., et al. (2021). Do as AI say: Susceptibility in deployment of clinical decision-aids. Npj Digital Medicine, 4(1), 1–8. https://doi.org/10.1038/s41746-021-00385-9 Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: The clinical-statistical controversy. Psychology, Public Policy, and Law, 2(2), 293–323. 
https://doi.org/10.1037/10768971.2.2.293 Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. (2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12(1), 19. https://doi.org/10.1037/ 1040-3590.12.1.19. Haenssle, H. A., Fink, C., Schneiderbauer, R., Toberer, F., Buhl, T., Blum, A., Kalloo, A., Hassen, A., Thomas, L., Enk, A., Uhlmann, L., Reader study level-I and level-II Groups, Alt, C., Arenbergerova, M., Bakos, R., Baltzer, A., Bertlich, I., Blum, A., Bokor-Billmann, T., Bowling, J., Zalaudek, I., et al. (2018). Man against machine: Diagnostic performance of a deep learning


convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Annals of Oncology: Official Journal of the European Society for Medical Oncology, 29(8), 1836–1842. https://doi.org/10.1093/annonc/mdy166 Harvey, N., & Fischer, I. (1997). Taking advice: Accepting help, improving judgment, and sharing responsibility. Organizational Behavior and Human Decision Processes, 70(2), 117–133. https:/ /doi.org/10.1006/obhd.1997.2697 Highhouse, S. (2008). Stubborn reliance on intuition and subjectivity in employee selection. Industrial and Organizational Psychology: Perspectives on Science and Practice, 1(3), 333–342. https://doi.org/10.1111/j.1754-9434.2008.00058.x Kahneman, D., & Miller, D. T. (1986). Norm theory: Comparing reality to its alternatives. Psychological Review, 93(2), 136–153. https://doi.org/10.1037/0033-295X.93.2.136 Kaufman, E. (2021). Algorithm appreciation or aversion?: Comparing in-service and pre-service teachers’ acceptance of computerized expert models. Computers and Education: Artificial Intelligence. 2.https://doi.org/10.1016/j.caeai.2021.100028. Keding, C., & Meissner, P. (2021). Managerial overreliance on AI-augmented decision-making processes: How the use of AI-based advisory systems shapes choice behavior in R&D investment decisions. Technological Forecasting and Social Change. 171.https://doi.org/10.1016/j.techfore. 2021.120970. Kemper, J., & Kolkman, D. (2019). Transparent to whom?: No algorithmic accountability without a critical audience. Information, Communication & Society, 22(14), 2081–2096. https://doi.org/ 10.1080/1369118X.2018.1477967 Killock, D. (2020). AI outperforms radiologists in mammographic screening. Nature Reviews Clinical Oncology, 17(3), 134. https://doi.org/10.1038/s41571-020-0329-7 Köbis, N., & Mossink, L. D. (2021). Artificial intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry. Computers in Human Behavior, 114.https://doi.org/10.1016/j.chb.2020.106553. Lee, M. K. (2018). Understanding perception of algorithmic decisions: Fairness, trust, and emotion in response to algorithmic management. Big Data & Society, 5(1) https://doi.org/10.1177/205 3951718756684. Licht, J., & Licht, K. (2019). Artificial intelligence, transparency, and public decision-making. AI and Society, 35(4), 917–926. Logg, J. M., Minson, J. A., & Moore, D. A. (2019). Algorithm appreciation: People prefer algorithmic to human judgment. Organizational Behavior and Human Decision Processes, 151, 90–103. https://psycnet.apa.org/doi/10.1016/j.obhdp.2018.12.005. Longoni, C., Bonezzi, A., & Morewedge, C. (2019). Resistance to medical artificial intelligence. Journal of Consumer Research, 46(4), 629–950. https://doi.org/10.1093/jcr/ucz013 Madhavan, P., & Wiegmann, D. A. (2007). Effects of information source, pedigree, and reliability on operator interaction with decision support systems. Human Factors, 49(5), 773–785. https:/ /doi.org/10.1518/001872007x230154 Mahmud, H., Islam, A. N., Ahmed, S. I., & Smolander, K. (2022). What influences algorithmic decision-making?: A systematic literature review on algorithm aversion. Technological Forecasting and Social Change, 175, 121390. https://doi.org/10.1016/j.techfore.2021.121390 McKinney, S. M., Sieniek, M., Godbole, V., Godwin, J., Antropova, N., & Ashrafian, H. (2020). International evaluation of an AI system for breast cancer screening. Nature, 577(7788), 89–94. https://doi.org/10.1038/s41586-019-1799-6 Meehl, P. E. (1954). 
Clinical versus statistical prediction: A theoretical analysis and review of the literature. University of Minnesota Press. Metz, C. (2016). In a huge breakthrough, Google’s AI beats a top player at the game of Go. WIRED. Retrieved from https://www.wired.com/2016/01/in-a-huge-breakthrough-googles-aibeats-a-top-player-at-the-game-of-go/. Niti Aayog. (2018). National strategy for artificial intelligence. Discussion paper. OECD.AI. (2021). Database of national AI policies. https://oecd.ai.


Önkal, D., Goodwin, P., Thomson, M., Gönül, S., & Pollock, A. (2009). The relative influence of advice from human experts and statistical methods on forecast adjustments. Journal of Behavioral Decision Making, 22(4), 390–409. https://doi.org/10.1002/bdm.637 Oosterbeek, H., Sloof, R., & Van De Kuilen, G. (2004). Cultural differences in ultimatum game experiments: Evidence from a meta-analysis. Experimental Economics, 7(2), 171–188. https:// doi.org/10.1023/B:EXEC.0000026978.14316.74 Promberger, M., & Baron, J. (2006). Do patients trust computers? Journal of Behavioral Decision Making, 19(5), 455–468. https://doi.org/10.1002/bdm.542 Rahwan, I., Cebrian, M., Obradovich, N., Bongard, J., Bonnefon, J. F., Breazeal, C., Wellman, M., et al. (2019). Machine behaviour. Nature, 568(7753), 477–486. https://doi.org/10.1038/s41586019-1138-y Rajpurkar, P., Irvin, J., Ball, R. L., Zhu, K., Yang, B., & Mehta, H. (2018). Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLoS Medicine, 15(11). https://doi.org/10.1371/journal.pmed.1002686. Rebitschek, F. G., Gigerenzer, G., & Wagner, G. G. (2021). People underestimate the errors by algorithms for credit scoring and recidivism but tolerate even fewer errors. Scientific Reports, 11(1), 20171. https://doi.org/10.1038/s41598-021-99802-y Renier, L. A., Mast, M. S., & Bekbergenova, A. (2021). To err is human, not algorithmic: Robust reactions to erring algorithms. Computers in Human Behavior, 124(106879). https://doi.org/10. 1016/j.chb.2021.106879. Saveski, M., Awad, E., Rahwan, I., & Cebrian, M. (2021). Algorithmic and human prediction of success in human collaboration from visual features. Scientific Reports, 11(1), 1–13. https://doi. org/10.1038/s41598-021-81145-3 Schemelzer, R. (2019). Understanding explainable AI. Forbes. Retrieved from https://www.forbes. com/sites/cognitiveworld/2019/07/23/understanding-explainable-ai/#298298d47c9e. Shaffer, V. A., Probst, C. A., Merkle, E. C., Arkes, H. R., & Medow, M. A. (2013). Why do patients derogate physicians who use a computer-based diagnostic support system? Medical Decision Making, 33(1), 108–118. https://doi.org/10.1177/0272989X1245350 Shah, S. (2019). UK government invests $28m in AI, IoT and high-tech farming projects. Forbes. Sloman, S., & Fernbach, P. (2018). The knowledge illusion. Penguin Random House. Stats NZ. (2018). The stats NZ annual report, 2018. The White House. (2019). AI research and development: Progress report. Thompson, R. E. (1952). A validation of Glueck prediction scale for proneness to delinquency. Journal of Criminal Law, Criminology, and Police Science, 43(4), 451–470. https://doi.org/10. 2307/1139334 Wormith, J., & Goldstone, C. (1984). The clinical and statistical prediction of recidivism. Criminal Justice and Behavior, 11(1), 3–34. https://doi.org/10.1177/0093854884011001001. Xinhua. (2019). AI association to draft ethics guidelines. https://www.xinhuanet.com/english/201901/09/c_137731216.html. Yeomans, M., Shah, A., Mullainathan, S., & Kleinberg, J. (2019). Making sense of recommendations. Journal of Behavioral Decision Making, 32(4), 403–414. https://doi.org/10.1002/bdm. 2118


Zerilli, J., Knott, A., Maclaurin, J., et al. (2018). Transparency in algorithmic and human decisionmaking: Is there a double standard? Philosophy & Technology, 32(4), 661–683. https://doi.org/ 10.1007/s13347-018-0330-6 Zhang, M. (2015). Google photos tags two African-Americans as gorillas through facial recognition software. Forbes. Retrieved from https://www.forbes.com/sites/mzhang/2015/07/01/googlephotos-tags-two-african-americans-as-gorillas-through-facial-recognition-software/#86a723 9713d8.

Sumitava Mukherjee is a faculty member at the Department of Humanities and Social Sciences at the Indian Institute of Technology Delhi. He leads the Decision lab research group that advances cognitive and behavioral decision research. His interests are in behavioral science, behavioral economics, and human-technology interactions. Deeptimayee Senapati was a PhD student at the Department of Humanities and Social Sciences at the Indian Institute of Technology Delhi working in the Decision lab research group. She is currently a UX researcher and design enthusiast who fuses human cognition and technology to create seamless experiences. Isha Mahajan was a student at Symbiosis School for Liberal Arts and an intern at the Decision Lab research group. She is currently studying Human Computer Interaction and Design at Indiana University, Bloomington. Her interests lie in Human-AI interaction, educational technologies, and accessibility.

Chapter 2

Defining the Relationship Between the Level of Autonomy in a Computer and the Cognitive Workload of Its User

Thom Hawkins and Daniel N. Cassenti

T. Hawkins, US Army, PM Mission Command, 6590 Surveillance Loop, APG, Aberdeen, MD 21005, United States. e-mail: [email protected]
D. N. Cassenti, US Army Research Laboratory, 2800 Powder Mill Road, Adelphi, MD 20783, United States. e-mail: [email protected]

Abstract Artificial intelligence (AI) can offset human intelligence, relieving its users of cognitive burden; however, the trade-off in this relationship between the computer and the user is complicated. The challenge in defining the correlation between an increase in the level of autonomy (LOA) of a computer and a corresponding decrease in the cognitive workload of the user makes it difficult to identify the return on investment for an implemented technology. There is little research to support the assumptions that (1) user workload decreases with greater LOA, (2) greater LOA leads to greater collaborative performance, and (3) factors like trust or automation bias do not vary with LOA. This chapter will discuss the implications of prior research into the relationship between LOA and cognitive workload, including the challenges of accurately and consistently measuring cognitive load using subjective, physiological, and performance-based methods. The chapter will also identify potential experiments and what they might tell us about the relationship between LOA and cognitive workload.

Keywords Artificial intelligence (AI) · Automation · Cognitive burden · Cognitive workload · Level of autonomy (LOA) · Trust

Summary The drive to implement AI or automation is predicated on the assumption that doing so will improve performance and reduce mental workload. Complicating the relationship between LOA and workload are additional variables, such as the impact of concurrent tasks (Wickens, 2002) and the reliability of the technology implemented (Rovira et al., 2002). While LOA has provided a useful theoretical model for degrees of human–machine teaming, empirical research is essential so
that we can gauge the benefits of those options. Not only will a model inform decision-makers with constrained resources whether increasing automation will improve outcomes, but also whether, at certain levels, automation triggers additional factors, such as trust or usability, that impact workload or performance. The weakness of the relationship between LOA and performance raises the question of whether more AI is always better. For example, AI that makes more autonomous decisions also excludes more human input, which may be needed to derive optimal solutions. While LOA has advanced the discussion of human–AI collaboration, it has done so in a largely theoretical way. An empirical foundation will help to inform decisions about whether a higher LOA is worth the effort and cost and provide a framework to prioritize opportunities for implementing AI. The development of an empirically based model relating the degree of automation to improvements in performance and reduction in mental workload will help organizational leaders make decisions about their information technology investments.

Introduction As artificial intelligence (AI) gains traction as a tool for providing insight, executives are forced to face the question of whether a particular investment in automation or AI enhancement is worthwhile from a business perspective. That is, what is the return on investment (ROI), and how should it be measured? Because the benefits of automation are difficult to isolate in operational environments with uncontrolled variables, calculating the cost–benefit of technology investment has usually been tracked at the organizational level (Brynjolfsson & Hitt, 1996; Brynjolfsson & Yang, 1996), rather than at the task level. The drive to implement AI or automation is predicated on the assumption that doing so will improve performance and reduce mental workload. However, there are different ways to measure both performance and mental workload. Level of Automation (LOA) is a way to conceptualize how much AI is doing relative to the user in a human–AI collaborative task. We could use the AI-as-a-teammate framework (see Groom & Nass, 2007) and adopt a more subjective stance for framing our thinking. Cappuccio et al. (2021) claim that this type of anthropomorphizing can have positive implications, but requires a great deal of study and research. Instead of starting with nebulous and subjective factors, we prefer the more objective LOA approach. There are other factors, independent of any assumed relationship between LOA and either mental workload or performance, that could impact the relationship between performance and workload. Studies (e.g., Cassenti et al., 2013) have shown that an assumed linear correlation between independent variables and cognitive workload is not borne out. An advancement from one LOA to another may not always correspond to decreased workload and even if the relationship is consistent, it is unlikely to be strictly linear.


Cassenti and Kelley (2006) found a similar result when researching the relationship between cognitive workload and performance. Through most of the eight levels of workload (i.e., up to eight of the same tasks performed at once), as the workload increased, performance decreased, except at the lowest level of workload. They concluded that participants in the study allowed their attention to wander at the lowest workload and were thus more prone to distraction. Although it may be assumed that greater LOA also decreases a user’s mental workload, human performance is not always best with a lower workload (see Lavie & Tsal, 1994). Workload, like LOA, is a complex concept, governed by various factors. Expecting steady changes in any co-occurring variable with an increase in LOA or workload would be a mistake without empirical evidence. Complicating the relationship between LOA and workload are additional variables, such as the impact of concurrent tasks (Wickens, 2002) and the reliability of the technology implemented (Rovira et al., 2002). Automation bias (Cummings, 2004), in which users assume that computerized functions designed to process a certain type of problem are infallible in solving that problem, may cause the human user to disengage from the task, allowing the AI to take over unchecked. Conversely, the user may not trust AI and may perform all the steps in the task or slow the progress of the AI while seeking to comprehend its logic, increasing redundancy of activity and slowing response time. As useful as LOA is as a conceptual framework, because the relationships between automation and either performance or mental workload are not firmly established, we cannot take for granted that increased levels of automation will benefit human–AI collaborative performance. While LOA has provided a useful theoretical model for degrees of human–machine teaming, empirical research is essential so that we can gauge the benefits of those options. Not only will a model inform decision-makers with constrained resources whether increasing automation will improve outcomes, but also whether, at certain levels, automation triggers additional factors, such as trust or usability, that impact workload or performance. This paper seeks to explore the concepts of LOA and workload; develop an experimental paradigm for studying how LOA changes performance, workload, trust, and stress; and recommend using these empirical results to maximize ROI when developing AI.

Review

Levels of Automation Levels of automation (LOAs) are defined by relative distinctions between human and automation control rather than by regular increments of increased automation. For example, Sheridan (1992) identifies ten LOAs, from full human control to full computer control. The eight levels between those binary options are all versions where the computer offers a complete set of action alternatives, while few situations
The eight levels between those binary options are all versions where the computer offers a complete set of action alternatives, while few situations would allow even advanced AI to be so comprehensive. Endsley (1987) adopted a more functionally defined framework to describe LOAs. While the first and last remain full-human control and full-computer control, the middle levels are binned as decision support (compared to Sheridan levels two through four), consensual artificial intelligence (Sheridan level five), and monitored AI (Sheridan levels six through nine). While Sheridan's degrees of automation were useful frameworks for the human supervisory controls at the time they were developed, Endsley's recategorization is more relevant to today's AI. Even in a game like chess, with a nominally constrained game space and strictly defined movements, there is little value to an exhaustive identification of options (i.e., the trade-off between comprehensiveness and computational expense makes this approach inefficient). The difference in benefit between no automation assistance and a computer offering a comprehensive set of alternatives (i.e., Sheridan's level 1 versus level 2) is greater than the increment between any other two levels—for example, the difference between a computer executing an option then informing the operator (level 7) and a computer informing the operator only if requested (level 8). Meanwhile, the user could be more burdened in a situation where they are offered only a limited time to veto a selected alternative before automatic execution (Sheridan's level 6), because urgency becomes a factor, than at a higher level where their involvement is not needed at all. Endsley and Kaber (1999) similarly place the human-in-the-loop model (where human interaction is required at some point during the action cycle) before the human-on-the-loop model (where a human can see what actions are taking place, but the process executes without their direct involvement). The levels are based on the degree of autonomy, but the degree of autonomy for a system is not the same as the relief of cognitive burden. The differences between one level and the next are qualitative rather than quantitative. While the relationship between LOA and a mental workload measure may change from task to task, the development of a framework to identify that relationship is useful to find the relative value and potential risks of incrementally increasing the LOA.
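To make the binning described above concrete, the following is a minimal sketch, in Python, of Endsley's five-level taxonomy and its approximate correspondence to Sheridan's ten levels as discussed in this section. The enum labels follow the level names used later in this chapter; the code is purely illustrative and not part of either author's formal framework.

```python
from enum import IntEnum


class EndsleyLOA(IntEnum):
    """Endsley's (1987) five levels, ordered from least to most autonomous."""
    MANUAL_CONTROL = 1    # full human control
    DECISION_SUPPORT = 2  # computer recommends, human decides
    CONSENSUAL_AI = 3     # computer selects, human may override case by case
    MONITORED_AI = 4      # computer selects and acts, human monitors
    FULL_AUTOMATION = 5   # full computer control


# Approximate binning of Sheridan's (1992) ten levels into Endsley's five,
# following the correspondence given in the text above.
SHERIDAN_TO_ENDSLEY = {
    1: EndsleyLOA.MANUAL_CONTROL,
    2: EndsleyLOA.DECISION_SUPPORT,
    3: EndsleyLOA.DECISION_SUPPORT,
    4: EndsleyLOA.DECISION_SUPPORT,
    5: EndsleyLOA.CONSENSUAL_AI,
    6: EndsleyLOA.MONITORED_AI,
    7: EndsleyLOA.MONITORED_AI,
    8: EndsleyLOA.MONITORED_AI,
    9: EndsleyLOA.MONITORED_AI,
    10: EndsleyLOA.FULL_AUTOMATION,
}
```

Using an IntEnum emphasizes the point made above: the scale is ordinal, so levels can be ranked, but equal spacing between adjacent levels should not be assumed.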

Measurements of Mental Workload

Eggemeier and O'Donnell (1982) binned workload measures into three categories—(1) subjective measures, (2) physiological measures, and (3) performance-based measures. Each measure has its own limitations. Subjective measures are generally gathered post hoc in the form of surveys that ask participants to rate mental workload. However, these measures rely on after-the-fact assessment that may not be reliable due to forgetting the memory trace or simply not creating a memory trace at all as the participants focus on the task at hand. Subjective measures may also be assessing individual differences more than variations in the workload of a task and are also sensitive to external factors such as concurrent activity (Wickens, 2002).


Physiological measures, e.g., EEG, heart rate variability (HRV), and pupil dilation, can be useful for the measurement of cognitive workload, but can be difficult to quantify and measure accurately. EEG is particularly difficult to isolate from movement artifacts in the electronic signal (Kilicarslan & Vidal, 2019). HRV has been linked to changes in cognitive functions, but these functions tend to be broad areas like language and memory. Changes in memory suppression abilities have been found to be significantly correlated with HRV, but there have been no effects found on verbal or visual-spatial memory (Forte & Casagrande, 2019). Pupil dilation appears to be the most promising physiological measure for workload (Naicker et al., 2016), but there are still questions as to whether this is a measure of “effort exertion, rather than task demand or difficulty” (van der Wel & Steenbergen, 2018). Other factors could also affect pupil size, such as drug use, lighting, and fear (see the chapter in this book, Cassenti and Hung, in press, for more on physiological measures). Performance-based measures such as score or completion time may either be used to gauge the relative mental workload of a task (assuming that higher workload means lower performance), or may be used to gauge the effectiveness of the automation itself. The benefit of performance-based measures is that they are an objective dependent variable and thus have more reliability than subjective measurements.

Other Factors that May Impact Relationship

Beyond the independent variable of LOA and the primary dependent variables of performance and mental workload, other variables may influence the relationship. For example, person-to-person factors such as a predisposed level of trust in automation (or the subject's ability to work with automation), stress vectors outside of the experimental design (e.g., what kind of day the subject has had prior to participation), or familiarity with the domain being automated or the system that provides the automation could also affect the dependent variables. Some of these may be detected through physiological measurements such as HRV for stress (Quigley & Feldman Barrett, 1999). Others can, if not controlled, at least be stratified through subjective measures assessed before or after the experiment, such as grouping those with similar domain experience.
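As a hedged illustration of the stratification idea, the sketch below groups participants on a pre-experiment covariate (here, a hypothetical baseline trust rating) using a simple median split. The variable names and sample values are ours, chosen only for illustration; they do not prescribe a particular procedure.

```python
from statistics import median


def median_split(participants, key):
    """Split participant records into low/high groups on one covariate."""
    cutoff = median(p[key] for p in participants)
    low = [p for p in participants if p[key] <= cutoff]
    high = [p for p in participants if p[key] > cutoff]
    return low, high


# Hypothetical pre-test data; the resulting groups could serve as a
# blocking factor (or covariate check) in later analyses.
sample = [
    {"id": "P01", "baseline_trust": 4.2},
    {"id": "P02", "baseline_trust": 2.8},
    {"id": "P03", "baseline_trust": 5.1},
    {"id": "P04", "baseline_trust": 3.6},
]
low_trust, high_trust = median_split(sample, "baseline_trust")
```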

Experimental Design and Methodology

We advocate an empirical approach to studying LOA. While LOA has advanced the discussion of human–AI collaboration, it has done so in a largely theoretical way. An empirical foundation will help to inform decisions about whether a higher LOA is worth the effort and cost and provide a framework to prioritize opportunities for implementing AI.


Assessing the costs or benefits of any system requires objective measurement. In the case of AI, the benefits come from performance measures, specifically accuracy and response time. A set of empirical results to show how LOA affects performance would be useful to help guide AI development, yet there is more information that we can derive from these studies. In particular, we could find ways to determine why performance increases or decreases. Mental workload, system usability, and trust in automation are factors that should have an effect on performance, whether positive or negative. In this section, we explore the factors of an experimental paradigm to maximize information gain. The need for empirical research on LOA seems clear, but questions remain on how to construct these empirical studies. Generally, a paradigm for the design of an experiment should include certain independent and dependent variables, depending on what conditions and measurements will help test the hypotheses. The general design should inform AI developers about the boundaries and limits of LOA and therefore how to more strategically direct their resources. The first step in building any experimental design is to identify the hypothesis and research questions. In this case, we recommend studying how well and to what extent LOA accounts for differences in human–AI performance. One hypothesis is that as LOA increases, performance steadily increases. Another hypothesis is that the relationship between LOA and performance will depend on the task because the skills required to perform certain tasks could be stronger in humans or AI. For example, tasks that rely more on computational power, one of the strengths of AI relative to human intelligence (see Cummings, 2014), would be easier for AI and a higher LOA would be beneficial. Conversely, if a task relies more on creative solutions, AI would not be of much use and a lower LOA would be more beneficial. Theoretically, the mental workload of the user decreases with increasing LOA, but empirical evidence is lacking. Our third hypothesis is that LOA and mental workload are negatively correlated with one another. This would be independent of performance, but we can measure mental workload and see if it varies with performance as well. It is worth noting that mental workload and performance do not show consistent patterns between studies (see Hancock & Matthews, 2019), so we cannot depend only on performance, but must also look at a subjective measure of workload. Similarly, we can measure trust and system usability to understand how these factors change with LOA. Although we cannot be sure how trust or automation bias will change with LOA increases (i.e., these likely depend on individual differences), we can be confident in hypothesizing that system usability will increase with LOA as the system performs more of the overall work. Testing these four hypotheses necessitates two independent variables, the LOA of the AI and the tasks performed. Each of these can be varied in a within-subjects design. The design must also include performance variables to test these hypotheses. Accuracy and response time are traditional performance variables. Despite all these possibilities, any systematic study of LOA should begin with the simplest case, so that we can focus on creating an experimental paradigm that will determine the basics of the effect of LOA on human–AI performance. LOA should be parameterized as one independent variable with a number of factor levels
(i.e., set degrees of automation). Alternatively, one could run the experiment with a combination of different LOAs, but this would complicate the analyses unnecessarily. Given that models of LOA are organized by how much AI is doing relative to the human user, our selection will have an ordinal scale. We advocate for Endsley's (1987) model, which has five levels, the fewest of the prominent models. This will greatly reduce the potential for indecipherable interactions in follow-up statistics. Endsley (1987) defined LOA levels ranging from Level 1, with no computer assistance (Manual Control), to Level 5, with no human intervention (Full Automation). In between is Level 2, Decision Support, with recommendations from the computer that the user can take or leave with no push or coaxing from the system. At Level 3, Consensual AI, the computer will select options, but the user has the chance to change them on a case-by-case basis. At Level 4, Monitored AI, the computer selects all options, but if the user chooses to change any of them, the AI is deemed untrusted and converts to a decision support system. Other LOA models are available that have more nuanced levels; however, Endsley's model is adequate as a first pass, with other models chosen if closer examination of adjacent levels appears warranted. Task type is another factor we recommend studying. Tasks have a wide range of required skills that makes it difficult to predict how much automation is required. Given that automation and humans have such differing strengths along multiple dimensions of skills (see Fitts, 1951), one might assume that, when tasks are broken down by which skills are required, tasks which favor automation would result in better performance and preference with higher LOAs. Researchers using our recommended experimental paradigm should not assume that the results of one task will replicate another. From the inception of human-computer collaboration, the desire for autonomy has continuously fluctuated between attraction and resistance. Some users want the computer to do as much as possible to reduce the user's workload to a minimum. Other users do not trust computers and want to have as much (human) autonomy as possible. It is impossible to satisfy both sides of that preference; however, we assume that most users fall in the middle of the spectrum rather than at one of the extremes, as in a normal distribution. This addresses the subjective preference predictions. We predict an inverted U-shaped curve: the user does not want to tax working memory too much, and thus Level 1 AI would demonstrate low user preference. The user similarly would not want to cede all autonomy, and therefore Level 5 AI would also demonstrate low user preference. The greatest preference would be for Level 3 in this scenario, the most collaborative of the levels, with preference rising from Level 1 through Level 3 and falling from Level 3 through Level 5. To fully investigate preference, we believe there are three subjective factors that require testing: mental workload, perception of system usability, and trust in automation. As discussed above, mental workload is not just a subjective phenomenon, but causes physiological and performance changes. We will discuss performance measures below, but as discussed in another chapter of this book (Cassenti & Hung, in press), the best physiological measure of workload (i.e., pupil diameter; see Pfleging et al., 2016) is still not entirely fleshed out and requires background research to be used
reliably. As for subjective measures of workload, we recommend the NASA Task Load IndeX (NASA-TLX; Hart & Staveland, 1988) because the measure divides workload into multiple factors (i.e., mental demand, physical demand, temporal demand, perceived performance, effort, and frustration), which aligns with our view that mental workload is complex and multi-faceted. We recommend using only the rating portion of the NASA-TLX to avoid unnecessary complications. There is a second portion where participants do pairwise comparisons of the factors, but past personal experience indicates that participants find this cumbersome and confusing. Furthermore, Galy et al. (2018) found that the global score that is driven by these pairwise comparisons is not as important as the ratings of the six individual factors. Studies of LOA should focus on what makes the LOA level result in more or less workload, which may help in redesigning the AI. For system usability, we recommend the System Usability Scale (Brooke, 1996), which is a simple, 10-item questionnaire and is the most widely used scale for system usability (see Lewis, 2006). The scale is divided between 5 items that are positive attributions for usability and 5 items that posit negative attributions, and allows participants to rate agreement with the statements on a Likert scale. Trust in the levels of AI would be rated using the Trust in Automation Scale (Jian et al., 2000). This is a twelve-item scale in its original construction; however, we recommend using the guidance provided in Cassenti (2016) and removing two of the items. His reasoning was in accordance with an item-by-item analysis of the scale by Spain et al. (2008), who found that five of the items were related to a factor of distrust and seven were related to a factor of trust. To balance the two constructs, Cassenti (2016) removed the two items related to trust with the lowest correlation, so the scale would not bias the participant toward trust in the system. As with the System Usability Scale, this scale uses a Likert scale rating of agreement with statements about trust. Subjective ratings are important when it comes to research in human–AI collaboration because users who do not have positive opinions about a system will either not use it or, if required to use it, minimize its use, perhaps to the detriment of desired outcomes (e.g., Hawkins, also in this book, in press). However, those outcomes are also important to determine the worth of a system. Therefore, we need to also empirically study performance. As stated above, accuracy and response time are universal measures of performance and should be considered required for any study of LOA and human–AI collaboration. There are situations in which only a final outcome is demonstrative, and the researcher is restricted to single values for each of accuracy and response time. Some tasks may also be completed in stages and therefore open the possibility of multiple accuracy and response time measures per trial. For signal-detection-type tasks, there is also the possibility of measuring hit rates, false alarm rates, and sensitivity to targets. Any of these can be measured depending on the task. This section presents our approach to the empirical study of LOA. We hope that this will be informative for human factors specialists and believe that this general approach should encompass a large range of human–AI collaborative tasks. The creation of the experimental paradigm should lead to what we will discuss in the
next section, possible applications of a well-grounded consideration of LOA in the development of AI.
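As a purely illustrative sketch of the paradigm described in this section, the code below builds a within-subjects condition list crossing Endsley's five levels with two hypothetical task types, rotates condition order across participants as a simple counterbalancing scheme, and scores the recommended questionnaires: a raw (rating-only) NASA-TLX as the mean of the six subscales, the standard SUS scoring rule, and a plain mean for the reduced ten-item trust scale. The names and the trial record are our own; the trust-scale scoring in particular is an assumption, since reverse-coding conventions vary.

```python
import itertools
from statistics import mean

LOA_LEVELS = [1, 2, 3, 4, 5]                     # Endsley's five levels
TASK_TYPES = ["computation_heavy", "creative"]   # hypothetical task types


def build_conditions(participant_index):
    """Full within-subjects crossing of LOA x task type, with condition
    order rotated across participants as a simple counterbalancing scheme."""
    cells = list(itertools.product(LOA_LEVELS, TASK_TYPES))
    shift = participant_index % len(cells)
    return cells[shift:] + cells[:shift]


def nasa_tlx_raw(ratings):
    """Raw TLX: mean of the six subscale ratings (0-100), no pairwise weights."""
    assert len(ratings) == 6
    return mean(ratings)


def sus_score(responses):
    """Standard SUS scoring: 10 items on a 1-5 Likert scale; odd items
    contribute (r - 1), even items (5 - r); the sum times 2.5 gives 0-100."""
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))
    return total * 2.5


def trust_score(responses):
    """Illustrative only: mean of the ten retained trust-scale items (1-7
    Likert). Whether to reverse-code the distrust items is left open here."""
    assert len(responses) == 10
    return mean(responses)


# Example of the per-trial record the paradigm would log for later analysis.
trial = {
    "participant": "P01",
    "loa": 3,
    "task": "computation_heavy",
    "accuracy": 0.85,        # proportion correct
    "response_time_s": 4.2,  # seconds
}
```

Post-block questionnaire scores (TLX, SUS, trust) would be attached to each LOA-by-task cell alongside the trial-level accuracy and response time records, so that subjective and objective measures can be analyzed against the same ordinal LOA factor.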

Discussion

The weakness of the relationship between LOA and performance raises the question of whether more AI is always better. For example, AI that makes more autonomous decisions also excludes more human input, which may be needed to derive optimal solutions. Sheridan (1992) warns the reader about these outcomes and advocates for human supervisory control of autonomous systems. While AI may be implemented to counter false heuristics (Kahneman & Tversky, 1980), it has also been known to reflect human biases (Yapo & Weiss, 2018). The ideal situation may be a balance between human and AI contributions to tasks, not only considering the division of labor between human and machine based on relative advantage—time to output, cost per time unit, cognitive burden (human), and processing power (computer) are economically scarce resources—but also considering their combined performance as a system. We are proposing an experiment on the automation of various tasks using two or more levels of automation, with performance gauged by performance-based measures as well as post-hoc subjective assessments. The results would demonstrate the relationship between the baseline ordinal scale for LOA and the corresponding ratio scales for the workload measurements to validate the consistency of the measurements. Measurement of productivity improvements at the industry and firm levels has been carried out for investments in automation and information technology (e.g., Pilat, 2005), with arguments made for and against the evidence of technology-linked productivity (Solow, 1987). The adoption of AI represents the next wave of technology investment, but the low level of adoption and maturity of the technology does not yet provide a sufficient basis for industry- or even firm-level measurement (Seamans & Raj, 2018). In addition, technology investments are not always strictly focused on improvements in gross productivity (Kijek et al., 2019). There are also organizations investing in technology, such as the military, that are not focused on productivity in the traditional sense, but rather on capability and readiness (see TRADOC, 2017). Regardless of how an organization ultimately measures its productivity, reducing the cognitive burden on its workforce is in its interest as a means of furthering those outcomes. The development of an empirically based model relating the degree of automation to improvements in performance and reduction in mental workload will help organizational leaders make decisions about their information technology investments.
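One way to examine the consistency discussed above is to relate the ordinal LOA factor to the per-condition workload and performance measures with a rank correlation, which respects the ordinal nature of the scale. The sketch below shows only how such a call might look; the numeric values are placeholders for illustration, not results, and a real analysis would work from participant-level data (e.g., a repeated-measures model) rather than five condition means.

```python
from scipy.stats import spearmanr

# Hypothetical per-condition aggregates (one value per LOA level, averaged
# over participants); replace with real aggregates from the experiment.
loa_levels = [1, 2, 3, 4, 5]
mean_tlx = [68.0, 61.5, 55.0, 52.5, 58.0]       # placeholder raw-TLX means
mean_accuracy = [0.71, 0.78, 0.84, 0.83, 0.80]  # placeholder accuracy means

# Spearman's rho tests for a monotonic trend without assuming equal
# spacing between adjacent LOA levels.
rho_workload, p_workload = spearmanr(loa_levels, mean_tlx)
rho_performance, p_performance = spearmanr(loa_levels, mean_accuracy)

print(f"LOA vs. workload:    rho={rho_workload:.2f}, p={p_workload:.3f}")
print(f"LOA vs. performance: rho={rho_performance:.2f}, p={p_performance:.3f}")
```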


Conclusion

Office 97 (Microsoft, 1996) debuted Microsoft's digital assistant, "Clippit" (aka "Clippy"). Clippy soon became the mascot of intrusive automation due to its habit of offering assistance when none was needed or desired. Clippy is an example of the wide gulf between full manual control and Sheridan's (1992) next level of automation, where the computer offers a complete set of action alternatives. There is no intermediate step where the user can request assistance, or even one where intervention is triggered only once the user has shown signs of struggle. There is an implicit tradeoff between automation and human autonomy, and levels of automation are expressed by a user's willingness to cede their autonomy to that automation. By assuming a strictly linear relationship between increasing degrees of automation and improved outcomes, Microsoft failed to calibrate the relationship between its users and their erstwhile paperclip agent. Here, we have proposed a methodology to assist with calibration by identifying, for a particular task, the relationship between the LOA and the dependent variables of performance and mental workload. That methodology requires empirical research, as LOA has not to date received the rigor of evidence collection that it deserves. Without evidence, all that we discussed in this chapter is mere speculation. Our approach requires two main thrusts of data collection—objective performance measures and subjective preferences. At first glance, it may seem more important to ensure that human–AI collaboration results in good performance and not be concerned about subjective measures. However, subjective preference is just as important. If a user has negative subjective impressions of AI teammates, then there is little chance that the user will choose to use the AI; they will instead rely on means that are more under their own control. Not only is that an outcome that we would generally wish to avoid for its cost-waste implications, but humans and AI bring together a unique set of skill sets that can theoretically tackle more complex and difficult problems than either alone. In other words, positive subjective preferences for AI will lead to better objective outcomes. We propose using subjective measures of mental workload, trust in automation, and system usability. Using this approach will provide an empirical underpinning for the LOA scales to help organization leaders determine the relative value of investments in automation. Through this future research, we could have new tools available to us to determine a priori the LOA an AI should have to provide the best outcomes for numerous tasks. We could at once create an opportunity for vast cost savings on AI development and ensure better outcomes for human–AI collaboration.


References

Brooke, J. (1996). SUS: A 'quick and dirty' usability scale. In I. McClelland (Ed.), Usability evaluation in industry. Taylor & Francis Ltd.
Brynjolfsson, E., & Hitt, L. (1996). Paradox lost? Firm-level evidence on the returns to information systems spending. Management Science, 42(4), 541–558.
Brynjolfsson, E., & Yang, S. (1996). Information technology and productivity: A review of the literature. Advances in Computers, 43, 179–214.
Cappuccio, M. L., Galliott, J. C., & Sandoval, E. B. (2021). Saving private robot: Risks and advantages of anthropomorphism in agent-soldier teams. International Journal of Social Robotics, 1–14.
Cassenti, D. N. (2016). A robotics operator manager role for military application. The Journal of Defense Modeling and Simulation, 13, 227–237.
Cassenti, D. N., & Hung, C. P. (in press). Psychophysiological monitoring to improve human–computer collaborative tasks. In S. Mukherjee, V. Dutt, & N. Srinivasan (Eds.), Applied cognitive science and technology: Implications of interactions between human cognition and technology. Springer Nature.
Cassenti, D. N., & Kelley, T. D. (2006). Towards the shape of mental workload. In Proceedings of the Human Factors and Ergonomics Society 50th Annual Meeting. Human Factors and Ergonomics Society.
Cassenti, D. N., Kelley, T. D., & Carlson, R. A. (2013). Differences in performance with changing mental workload as the basis for an IMPRINT plug-in proposal. In 22nd Annual Conference on Behavior Representation in Modeling and Simulation, Ottawa, Canada.
Cummings, M. (2004). Automation bias in intelligent time critical decision support systems. In AIAA 1st Intelligent Systems Technical Conference.
Cummings, M. (2014). Man versus machine or man + machine? IEEE Intelligent Systems, 29, 62–69.
Eggemeier, F. T., & O'Donnell, R. D. (1982). A conceptual framework for development of a workload assessment methodology. Wright State University.
Endsley, M. R. (1987). The application of human factors to the development of expert systems for advanced cockpits. In Proceedings of the Human Factors Society Annual Meeting. SAGE Publications.
Endsley, M. R., & Kaber, D. B. (1999). Level of automation effects on performance, situation awareness and workload in a dynamic control task. Ergonomics, 42(3), 462–492.
Fitts, P. M. (1951). Human engineering for an effective air-navigation and traffic-control system.
Forte, G., & Casagrande, M. (2019). Heart rate variability and cognitive function: A systematic review. Frontiers in Neuroscience, 13, 710.
Galy, E., Paxion, J., & Berthelon, C. (2018). Measuring mental workload with the NASA-TLX needs to examine each dimension rather than relying on the global score: An example with driving. Ergonomics, 61(4), 517–527.
Groom, V., & Nass, C. (2007). Can robots be teammates? Benchmarks in human–robot teams. Interaction Studies, 8, 483–500.
Hancock, P. A., & Matthews, G. (2019). Workload and performance: Associations, insensitivities, and dissociations. Human Factors, 61, 374–392.
Hart, S. G., & Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. Advances in Psychology, 52, 139–183.
Hawkins, T. (in press). Psychological factors impacting adoption of decision support tools. In S. Mukherjee, V. Dutt, & N. Srinivasan (Eds.), Applied cognitive science and technology: Implications of interactions between human cognition and technology. Springer Nature.
Jian, J. Y., Bisantz, A. M., & Drury, C. G. (2000). Foundations for an empirically determined scale of trust in automated systems. International Journal of Cognitive Ergonomics, 4(1), 53–71.
Kahneman, D., & Tversky, A. (1980). Prospect theory. Econometrica, 12.
Kijek, A., Kijek, T., Nowak, A., & Skrzypek, A. (2019). Productivity and its convergence in agriculture in new and old European Union member states. Agricultural Economics, 65(1), 1–9.
Kilicarslan, A., & Vidal, J. L. C. (2019). Characterization and real-time removal of motion artifacts from EEG signals. Journal of Neural Engineering, 16, 056027.
Lavie, N., & Tsal, Y. (1994). Perceptual load as a major determinant of the locus of selection in visual attention. Perception & Psychophysics, 56, 183–197.
Lewis, J. R. (2006). System usability testing.
Microsoft. (1996, November 19). Microsoft Office 97 released to manufacturing.
Naicker, P., Anoopkumar-Dukie, S., Grant, G. D., Neumann, D. L., & Kavanagh, J. J. (2016). Central cholinergic pathway involvement in the regulation of pupil diameter, blink rate and cognitive function. Neuroscience, 334, 180–190.
Pfleging, B., Fekety, D. K., Schmidt, A., & Kun, A. L. (2016). A model relating pupil diameter to mental workload and lighting conditions. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (pp. 5776–5788).
Pilat, D. (2005). The ICT productivity paradox. OECD Economic Studies, 2004(1), 37–65.
Quigley, K. S., & Feldman Barrett, L. (1999). Emotional learning and mechanisms of intentional psychological change. In K. Brandstadter & R. M. Lerner (Eds.), Action and development: Origins and functions of intentional self-development (pp. 435–464). SAGE Publications.
Rovira, E., Zinni, M., & Parasuraman, R. (2002). Effects of information and decision automation on multi-task performance. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (pp. 327–331). SAGE Publications.
Seamans, R., & Raj, M. (2018). AI, labor, productivity and the need for firm-level data (No. w24239). National Bureau of Economic Research.
Sheridan, T. B. (1992). Telerobotics, automation, and human supervisory control. MIT Press.
Solow, R. M. (1987). We'd better watch out. New York Times Book Review, 36.
Spain, R. D., Bustamante, E. A., & Bliss, J. P. (2008). Towards an empirically developed scale for system trust: Take two. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 52(19), 1335–1339. SAGE Publications.
TRADOC. (2017). U.S. Army robotic and autonomous systems strategy. Technical Publication. Training and Doctrine Command.
van der Wel, P., & van Steenbergen, H. (2018). Pupil dilation as an index of effort in cognitive control tasks: A review. Psychonomic Bulletin & Review, 25, 2005–2015.
Wickens, C. D. (2002). Multiple resources and performance prediction. Theoretical Issues in Ergonomics Science, 3(2), 159–177.
Yapo, A., & Weiss, J. (2018). Ethical implications of bias in machine learning.

Thom Hawkins is a project officer (civilian) for artificial intelligence and data strategy with the US Army's Project Manager Mission Command. He holds a Master of Library and Information Science degree from Drexel University in Philadelphia, PA, and a Bachelor of Arts degree from Washington College in Chestertown, MD. He also writes on artificial intelligence and data topics for Army AL&T magazine.

Dr. Daniel Cassenti earned his Ph.D. in Cognitive Psychology from Penn State University in 2004. After one year as a post-doc at the Army Research Laboratory, he became a civilian employee and has been working at ARL ever since. Dan has filled many roles, including BRIMS Conference Chair, Senior Co-Chair of the ARL IRB, and Technical Assistant to the ARL Director. He is currently the Cooperative Agreement Manager for the Army AI Innovation Institute.

Chapter 3

Cognitive Effects of the Anthropomorphization of Artificial Agents in Human–Agent Interactions

Bas Vegt and Roy de Kleijn

Abstract As our social interaction with artificial agents is expected to become more frequent, it is necessary to study the cognitive effects evoked or affected by these social interactions. Artificial agents come in all shapes and sizes, from vacuum cleaners to humanoid robots that in some cases can be difficult to distinguish from actual humans. Across this wide range of agents, different morphologies are believed to have different effects on humans in social interactions. Specifically, the extent to which an agent resembles a human has been shown to increase anthropomorphization, the tendency to attribute human characteristics to non-human agents or objects. From an evolutionary perspective, this response is completely reasonable, since for most of our existence as a species, if something looked like a human, it would almost always have behaved like one. However, this is not necessarily the case for artificial agents, whose intelligence can be implemented independently of morphology. In this chapter, we will review the cognitive and behavioral effects of anthropomorphization such as prosocial behavior, empathy, and altruism, as well as changes in subjective experience that can occur when interacting with human-resembling artificial agents. We will discuss results from behavioral experiments, economic games, and psychophysiological evidence. We will first give a review of the current state of the field before discussing some inconsistent findings, and shed light on areas that have been underinvestigated as yet.

Keywords Human–robot interaction · Anthropomorphism · Anthropomorphization · Artificial agents · Robots

B. Vegt · R. de Kleijn
Cognitive Psychology Unit, Leiden University, Leiden, Netherlands


Introduction

Of the many different types of useful robots, a substantial number can work for or alongside humans. Whether performing some service for customers, working together with laborers, or merely sharing a workspace with them, humans and robots will need to socially interact. Considering this, roboticists have been turning to anthropomorphism in their designs to better serve these needs, and to ultimately involve robots in day-to-day life more seamlessly. Intuitively, one may think that robots are easier and more pleasant to interact with if they resemble humans, since such interactions must feel more natural. After all, people are generally more accustomed to interacting with a human than with any given complex technology. However, reality paints a slightly more complex picture, as we will discuss in this chapter. Anthropomorphism is the extent to which an object or agent is reminiscent of a human, and robot behavior and external features can be explicitly designed to be more anthropomorphic—in other words, to represent human features more closely. Under the right conditions, human observers can attribute humanlike characteristics to non-human agents. This tendency is known as anthropomorphization, and can be partly induced by anthropomorphic design, although this feature is neither necessary nor sufficient for the phenomenon to occur. Let us first examine the approach of anthropomorphic design, from its theoretical validity to the degree of success it has had so far and is projected to have in the future. We will also consider its effects on human users, such as anthropomorphization, and its effects in turn.

Anthropomorphic Design

If we want humans and robots to interact, human users should be taken into account in robot design. Any well-designed tool should not only allow the user to fulfill its intended purpose but it should also do so while minimizing harm to and effort from the user, and robots are no exception. If at all possible, we want them to elicit a positive user experience. When we call a product, program, tool, or robot "easy to use," we generally mean that it is effective and intuitive (Norman, 2013). In other words, it should be easy to remember or find out the steps required to yield the desired result, and they should be easy to perform. People generally prefer using familiar tools over learning new ones because they already know how the former work, often as a result of much time spent training, whereas learning to use new tools or conventions from scratch takes more time and effort. Even if new tools promise to be more efficient in the long run, people often stick to what they know either because they deem the time investment required to switch to be too high, or because they do not feel tech-savvy enough to do so (Norman, 2013).


While many older products enjoy the benefit of having defined what many users are now familiar with and thus expect, many newer products take advantage of this by fitting their design to mirror older designs, building on this existing foundation. In other words, rather than making the user familiar with the product, they make the product to be familiar to the user by design. This is one of the two purposes served by robot anthropomorphism: rather than have to learn how to interact with a robot, ideally we would simply interact with one as if it were a human, which is something we all have experience with. The other purpose behind the anthropomorphic design is a more pleasant user experience. There are hurdles in the way of both of these goals, but the pleasantness of interacting with humanlike robots faces a noteworthy one in particular: the uncanny valley.

The Uncanny Valley

First hypothesized by Mori et al. (2012, translated from the original in 1970), the uncanny valley describes a nonlinear relationship between an object's human resemblance and its elicited emotional response. While humans tend to rate an object's aesthetics as more pleasant the more it resembles a human, it seems that high but imperfect resemblance actually evokes negative affect in most people. The response can range from a mild distaste or faint eeriness to profound discomfort and genuine revulsion, explaining the phenomenon's prevalence in design philosophies for fictional horror (Tinwell, 2014). While perhaps it makes intuitive sense that we would be wired to like smaller creatures that resemble ourselves (usually in the form of our children), the severity of the uncanny valley is more difficult to explain. Historically, the term uncanny has been conceptualized as familiar yet unknown—not quite so novel as to be mysterious, but all the more unsettling for appearing in a seemingly wrong, unreal context (Freud, 1919; Jentsch, 1906). Indeed, much research initially seemed to support the suggestion that the eerie feeling comes from an inability to classify an object as either human or non-human (e.g., Burleigh et al., 2013; MacDorman et al., 2009; Strait et al., 2017), which fits alongside another concept from horror and mythological monsters, called category jamming: humans, being categorization machines, deeply dislike being unable to classify something they encounter (Carroll, 1990). For example, most people experience no eerie sensation looking at a Roomba or other cleaning robots designed using simple geometric shapes, and are at worst mildly uncomfortable around stationary mannequins. However, put a mannequin's torso and head on a Roomba, and suddenly shoppers feel uneasy. And indeed, numerous studies attribute the uncanny valley to the same two factors: atypicality and ambiguity (Strait et al., 2017). These studies suggest it is best to keep a robot's features internally consistent, designing either something which does or does not bear resemblance to humans, rather than producing hybrids. It seems, however, that for now the uncanny valley cannot be simply avoided by following the above guidelines. Both Burleigh and Schoenherr (2014) and
MacDorman and Chattopadhyay (2016), previously among those to identify atypicality and ambiguity as the main factors contributing to the effect, have since published results that contradict their previous conclusions. Meanwhile, results confirming their original findings are still being published, indicating that there is no real consensus on this yet.

Social Robotics

Not all machines need to be anthropomorphic in their design. Automatic cars, dishwashers, and smart-home systems, although they may recognize and produce human speech, function perfectly well without a humanlike morphology. However, there are some which benefit immensely from being modeled after a human, because their functionality does not entail mechanical tasks but rather providing a soft service or filling an emotional need. An android serving as a personal assistant, bartender, or home-care giver distinguishes itself from the earlier examples precisely due to its human form factor. So-called social robots rely on establishing a connection we would normally not extend to computers. These robots could in theory navigate spaces and perform much of their operations with any form factor, but they would far less effectively serve their primary function: to put humans at ease or provide company. We visit restaurants, for example, not just for tasty food but also to be treated by friendly and socially apt staff. Similarly, in fields such as education, medical care, and therapeutic interventions, we expect well-trained, compassionate professionals to take good care of us or our children. It is worth noting that some of these roles can also be assumed by robots modeled after pets and other animals rather than humans; nevertheless, these robots all employ the same power: empathy.

Empathy and Attribution of Mind

Empathy is an emotional affect in which a person identifies with another to a sufficient degree to mentally take their perspective, and to experience an emotional state similar to that of the other, albeit in a milder form. It effectively prevents interactions from going south by reflecting part of the negative experience back on the offender (Radzvilavicius et al., 2019). It is simply difficult to do something terrible to someone while looking them in the eye. Empathy is also heavily involved in compassion, although the terms are not synonymous. Briefly, compassion includes a desire to help alleviate the negative situation the victim is in, whereas empathy merely entails feeling the victim's plight. Thus, while each can occur without the other, compassion is almost always preceded by empathy. The effect of empathy is stronger when we feel closer to the offended party, and this emotional distance is strongly determined by perceived similarity and physical distance between the two parties (Bregman, 2019). This is perhaps best illustrated by
how any reasonable person would readily ruin a $100 suit to save a child drowning in front of them, whereas most people would neglect to donate such sums to charity to save lives daily (Singer, 1972). By the same principle, historians describe how it is possible for citizens of opposing countries to do unspeakable things to one another at times of war (Bregman, 2019), group dynamics can explain why there is so much more hostility between than within homogeneous teams at a company (Forsyth, 2006), and sociologists worry about the adverse effects of relative distance and a lack of accountability on social media (Lapidot-Lefler & Barak, 2012). The mechanisms of empathy and compassion can be seen anywhere there is human interaction. At its best, empathy is the most effective hostility dampener yet conceived, and it can protect human–computer interaction in the same way, if only we can extend people's empathic instincts to non-human and even non-sentient targets. There is evidence to suggest that humans can empathize with machines that resemble humans to a sufficient degree (Misselhorn, 2009). This similarity is not limited to the aesthetic sense in which an android looks more like a human than a Roomba does, but extends to the sense in which a Roomba, being a relatively autonomous agent, already bears more resemblance to a human than, for example, a refrigerator. And indeed, we can be made to feel empathy for machines as simple as Roombas under the right conditions, such as when we try to judge the morality of seemingly hostile actions (Hoenen et al., 2016). There are myriad approaches by which one can attempt to judge the morality of an action, but for now we only need to consider what is at the center of moral judgement systems: the lived experiences of conscious agents. An action is morally good because it elicits, or attempts to elicit, on balance, positive experiences. Conversely, it is bad if it elicits, or attempts to elicit, on balance, negative experiences. The way in which these intentions or actions, or the resulting experiences, are judged and weighed differs substantially between worldviews, but experiences of conscious agents are involved directly or indirectly in all moral judgements as a matter of course [see Harris (2010) and Harari (2017) for a more elaborate account and discussion on the topic]. One framework that tries to capture the process of this moral judgment in so-called moral dyads suggests that in order to judge the morality of an action, that action must involve at least one agent on both the causal and receiving sides of that action (Gray et al., 2012a, b). If a person breaks a tool or machine in anger, we might judge them because it speaks ill of their ability to keep their temper and not hit other people in heated situations, or because perhaps it was not their machine, or because it took time, money, materials, and effort to produce it and someone else could have used those resources, all of which translate to the lived experience of others. But if one removes all outside agents from consideration, most people would not judge a person breaking a tool to be a villainous act, nor a tool falling on a person and injuring them, negligence (by another human) notwithstanding (Gray et al., 2012a, b). Of course, this reasoning becomes murkier as it becomes debatable whether the machine in question should be considered a person in its own right.
Humans are not rational beings at the best of times: our intuitions rarely map perfectly onto calculated logic—as illustrated by Singer's (1972) earlier example—and we can be made to make some rather questionable decisions when they are framed in certain
ways (Tversky & Kahneman, 1981). In human–robot interaction, research has suggested we can be tricked into ascribing sentience where we normally would not. Ward et al. (2013) coined the term harm-made mind to describe how participants tended to attribute more "mind" to fictional humanlike inanimate objects (corpses, robots, and permanently comatose patients) when they were damaged or harmed by humans with ill intent. The reasoning goes that people, following the moral dyad intuition, judged harming a patient, humanlike robot, or even a dead body as so morally objectionable that they subconsciously deemed the victims as being more sentient so that they could ascribe a higher degree of moral fault to the offending agent. Participants not only rated these victims' capacity for pain higher than was the case for non-victimized or accidentally harmed equivalents in the control groups, but also felt they were more capable of experience and agency, indicating a higher capacity for planning, self-control, and hunger. In other words, it seems that humans, under the right conditions and to a certain extent, can feel empathy toward non-human and even nonconscious entities. However, Ward et al.'s (2013) findings should be considered carefully. The robot that participants read about in the experimental condition's version of the story was described as being "regularly abused" with a scalpel. This language was not present in the control condition due to the nature of the experiment, and the use of suggestive language to describe the act of damaging a robot or its sensors could have confounded the findings. Since damaging a non-sentient object is not typically cause to accuse someone of "abusing" it, and since it is a word normally reserved only for describing harmful actions against a feeling agent (reminding one of the category mistake famously illustrated by computer scientist Edsger W. Dijkstra: "The question of whether machines can think is about as relevant as the question of whether submarines can swim."), the mere use of this word potentially suggested sentience on the part of the robot, considering subtle differences in language can influence people's accounts of events in substantial ways (Loftus & Palmer, 1974). More recent studies (e.g., Küster & Swiderska, 2020) used visual vignettes instead of text-based ones, largely eliminating language as a potential confound. Although results were largely similar, differences were found between human avatars and humanlike robotic avatars, pertaining to their perceived levels of mind and applied moral standards. With that caveat in mind, it is worth noting that research has since demonstrated the principle of the harm-made mind in a more concrete manner as well (Hoenen et al., 2016). Participants report marginally more compassion for robots which they felt were treated more aggressively. This could also be seen on a neurophysiological level as measured by mirror neuron activity, showing that there is potential for us to sympathize with these mechanical entities. It is one thing to ask someone about the idea that machines could possess a mind; it is quite another to try to determine where we stand with current robots. Findings such as the above suggest that people are generally even open to the possibility of machine consciousness, but only as a function of how humanlike the machine is. However, a common intuition by which people determine whether an agent is conscious or not is
through a combination of its complexity and similarity to a human (Morewedge et al., 2007). When an agent's actual complexity is hard to determine, all that remains to go on are its aesthetic and behavioral similarities to us. The field might benefit from research investigating the upper and lower limits of this theory, searching to identify people's thresholds for mind attribution and empathy. An extended replication of the harm-made mind study could, for example, include a broader range of objects, some intentionally pushing on the edge of the uncanny valley, and others at the other extreme, resembling humans less and less, such as mannequins, dolls, toys, or even planks of wood, to fully test the limits of the suggested effects as a function of complexity and humanlikeness.

Physical Human–Robot Interaction

We also seem to extend certain social norms to robots subconsciously, as seen when participants show physiological signs of emotional arousal when touching a robot's intimate (low-accessibility) body parts (Li et al., 2017). Here as well, the authors pose as an open question to what extent such signs would occur when touching dolls, mannequins, etc. However, the only known attempted replication of this study thus far (Zhou et al., 2021) found no difference in arousal levels between body parts, although this could be due to methodological limitations of that study, including a low number of participants and the use of a robot which lacks some of the most intimate (i.e., least accessible) body parts: this robot does not have legs as such, and thus no inner thighs nor clear buttocks or parts which would correspond to its genitals. As this is a new, yet-to-be-replicated finding, we must employ caution in drawing conclusions on its basis, but at the very least it suggests that people naturally see humanoid robots as something other than mere objects.

Goal-directed Action and Mirroring

Human beings, as well as some other animals, learn by both observing and doing. We observe and mimic the behavior of others to expand our own skillset. It is believed that at the heart of this phenomenon is a subconscious "mirroring" mechanism, which has been identified on a neurological level (Gallese et al., 2004). When somebody observes another perform an action, their brain activity shows patterns similar to when they perform that action themselves (Gallese et al., 1996). We can also infer the intention behind an action as we observe it, allowing us to understand each other's behavior as well as replicate it (Alaerts et al., 2010). The neurological signature of mirroring has been recognized when observing robot actions as well (Oberman et al., 2007). Mirror neurons seem to play a similar role in imitating robot actions as they do in imitating human actions, as long as they are not too repetitious (Gazzola et al., 2007). In one study, participants were instructed to mimic a robot's hand movement
as soon as it was finished, but the timing of their response was instead determined by when the robot looked at them, mimicking a common social cue for turn-taking (Bao & Cuijpers, 2017). The fact that participants responded to this cue despite not being told about it suggests targets of mimicry are readily perceived as social agents with intentionality (Wykowska et al., 2014). However, participants seemed to use an action's movement rather than its goal for mimicry, a phenomenon not observed when mimicking humans unless the goal is unclear (Bao & Cuijpers, 2017). This could be either because a robot's clunkier movements require more conscious effort to imitate or because a robot's actions are represented on a different abstraction level than those of a human.

Altruistic and Strategic Behavior Toward Robots

Economic games can be used to simulate complex real-world situations using relatively simple rulesets in a controlled environment. These experiments can reveal behavior not easily predicted by pure theory, as illustrated when they were used to lay bare the irrational decisions people make (Harsanyi, 1961). For instance, in the dictator game, played with two parties, one party is given a certain amount of money and asked to divide it between the two parties. Whichever distribution the party dictates is then realized. The ultimatum game is similar, but with the addition that the other party may at this point exercise a veto on the offer, in which case neither party gets anything. Strictly rational actors in a single-round ultimatum game would never reject a proposal, as a little bit of money is always better than no money. However, many people are willing to receive nothing if it means preventing the other party from "unfairly" receiving much more (Güth et al., 1982), even though such retribution yields no material benefit for them (though this effect is moderated by absolute amount; Anderson et al. (2011) found that rejection rates approach zero as stake size increases). Such behavior seems to be driven by an emotional response toward a transgression of fairness norms (Moretti & di Pellegrino, 2010). As such, it only appears when playing against opponents that are held to such norms. This is not the case for computers, which are generally perceived to lack intentionality (Moretti & di Pellegrino, 2010; Sanfey et al., 2003). As for humanoid robots, there is evidence to suggest they are treated more like humans than like computers in ultimatum games (Torta et al., 2013). However, behavior in economic games is itself influenced by expectations and social norms, since participants know they are being watched (Fehr & Schmidt, 2006). This objection has important implications for investigating human–robot social interaction, as behavior toward people is guided by different norms than behavior toward non-human animals and inanimate entities. Therefore, we cannot be sure that only the participants' tendency to anthropomorphize was measured. In addition, the results of these studies tend to be rather fragile. For instance, the effect in which participants more readily accept unfair offers from computers than from humans can disappear
or even reverse under certain circumstances (Torta et al., 2013). In an experiment with non-economic games, children’s urge to win was smaller when playing against a humanoid robot rather than a computer (Barakova et al., 2018). The robot was also perceived to be smarter, despite both opponent types using the same game strategy and purposely performing suboptimal moves to give the children an advantage. De Kleijn et al. (2019) continued the research on the topic of robot anthropomorphism and anthropomorphization tendencies in economic games, flipping the earlier paradigm of Torta et al. (2013) on its head somewhat: rather than use behavior as a measuring tool for anthropomorphization, they include the latter as an independent variable to investigate its effects on strategic and altruistic behavior. They found that sharing behavior in the dictator game was influenced by the physical appearance of robots, but not the participants’ anthropomorphization levels, while the reverse was true for the ultimatum game. Based on the results, they posit that playing against entities which they anthropomorphized led people to exhibit more fairness and strategy in their responses, although the anthropomorphization measures were taken after the game, so it is possible participants’ scores were also influenced by the game rather than just vice versa. For example, one might rationalize selfish decisions by retroactively minimizing their opponents’ humanlikeness, or one might attribute more or less humanlikeness to the opponent based on their reaction upon receiving their offer. This study merits careful consideration for the purposes of this chapter. As in other economic game studies, the researchers used the same text-based interaction between participant and opponent, regardless of the opponent type, to control for possible confounds, as is often done in similar studies. As such, the human player could not draw on their charisma and cunningness to plead, threaten, and shame the participant during the bargaining. Restraining the responses of human and robotic opponents diminishes the difference between them, which lies partly in the fact that they do not have access to the same toolset. Thus, the decision to facilitate comparison also necessarily undermines it somewhat. Furthermore, participants in this study anthropomorphized all non-human opponents equally, even though they were meant to be ordinal in this aspect. It is possible that vastly different outcomes could be observed with different opponents. Lastly, the fact that participants shared any money at all in a one-off dictator game versus a laptop is hard to account for based on just anthropomorphization, since people are unlikely to feel much empathy for a laptop. This might indicate that other factors, such as a strategic assumption that their choice would influence a later part of the game, or the fear of being judged as greedy, or simply familiarity, had a larger hand in determining participants’ choices. Participants tend to display more cooperation and desirable behavior when they anthropomorphize (de Kleijn et al., 2019; Waytz et al., 2010). However, clear, consistent data contrasting interactions between human and non-human partners is hard to find, and studies often contradict each other or yield ambiguous results. While people tend to behave differently when they believe to be playing with a human than with a computer, this difference is not uniform in direction or magnitude. 
This is hardly surprising, given that people do not consistently share money the same way even when playing only with humans. There is a large variance in the nature of a given person's interactions with different people. What is more, their strategy (and fairness thereof) is influenced by many situational factors, their impression of their opponent's character being only one.
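
For readers less familiar with these paradigms, the following sketch illustrates the payoff logic that separates the two games. The endowment, offers, and acceptance threshold are arbitrary illustrative values, not parameters from the studies discussed above.

```python
# Toy illustration of the two economic games discussed in the text.
# All numbers are arbitrary; they are not taken from the cited studies.

ENDOWMENT = 10  # money units given to the proposer


def dictator_game(offer: int) -> tuple[int, int]:
    """Dictator game: the recipient has no say, so any offer stands."""
    return ENDOWMENT - offer, offer  # (proposer payoff, recipient payoff)


def ultimatum_game(offer: int, acceptance_threshold: int) -> tuple[int, int]:
    """Ultimatum game: the responder can reject, leaving both with nothing.

    A purely payoff-maximizing responder would accept any offer above zero,
    but human responders often reject offers they perceive as unfair.
    """
    if offer >= acceptance_threshold:
        return ENDOWMENT - offer, offer
    return 0, 0  # rejection punishes both players


if __name__ == "__main__":
    print(dictator_game(offer=2))                            # (8, 2): recipient must accept
    print(ultimatum_game(offer=2, acceptance_threshold=4))   # (0, 0): low offer rejected
    print(ultimatum_game(offer=5, acceptance_threshold=4))   # (5, 5): fair offer accepted
```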

Ethical Considerations In what could well be either the most adorable or the most frightening of the studies discussed so far, Bartneck et al. (2005) replicated Milgram's (1963) infamous obedience study, but with a small robot constructed from Lego bricks. The robot was programmed to tremble, scream, and beg participants to stop administering shocks, but since the whole thing was rather quaint, not a single one of the participants came to its rescue. This would likely turn out very differently if the robot to be shocked were indistinguishable from a human. Even if someone in a lab coat urges the participant on and assures them that "it is only a robot, it cannot feel actual pain," it seems reasonable to assume that this would create considerable stress in participants. Leaving aside for now the fact that institutional review boards would likely not approve of any experiment which could cause emotional distress, take a moment to consider the implications of this suggestion. Consider that, currently, artificial agents exist which have a very convincing ability to exhibit emotions, but in fact do not possess any real consciousness—there is nothing it is like to be them, as they are simply computer programs or characters in a videogame. But suppose that it is in fact possible for non-biological life to exist, fully artificial but satisfying any possible demands you could set for its sentience, intelligence, qualia, etc., and undergoing lived experiences. Given this premise on the one hand and our earlier observation on the other, we can conclude that the capacity for any such entity to feel, should it exist, does not necessarily stand in a 1:1 relationship with its capacity to emote. In other words, agents are imaginable that experience genuine consciousness but are not able to convey it, like a patient with locked-in syndrome. We believe examples such as this highlight the necessity of reasoning about such possibilities, so as not to inadvertently harm a certain kind of life, the existence of which we cannot yet prove or disprove with certainty.

On the other side of the same coin, we must be careful about blurring the line between human and mechanical agents. It has been observed that customers tend to be much ruder to automated customer support services than to human employees providing the same service (Pozharliev et al., 2021), so we should be careful in concluding that we should make robots more convincingly humanlike in order to improve customer appraisal. For as long as customers continue to keep their temper when chatting with fellow people but not when they believe they are talking to a robot, they might well end up lashing out against a human being who has been given a script and is simply trying to do their job.


Present Challenges Assessing the cognitive and behavioral effects of anthropomorphization requires measuring the process of anthropomorphizing in participants. Two components play a role here: how humanlike a robot looks to a participant, and the cognitive and behavioral effects this perceived humanlikeness produces. The interaction between these two components makes it difficult to measure either of them independently. There are considerable challenges involved in assessing the validity of the findings discussed in this chapter, as this is a complex field with many unknowns, due in part to a fundamental ambiguity in the myriad parameters involved. Any given factor A might be used to predict variable B in one study, while it is B that seems to influence A in another, with both interpretations sounding equally reasonable. This frequently leaves us with little idea of the actual causality, not to mention the possibility that the effects are bidirectional. Even in cases where the cause is identified, the directionality of the effect is contested. Reasoning on the basis of theory can yield many alternative, conflicting interpretations and predictions. As a result, almost any possible hypothesis could find favorable outcomes, and vice versa, making it nigh-impossible to prove or disprove much definitively.

For instance, in the economic games discussed in an earlier section, we assumed that participants would display altruistic behavior by sharing at least some amount of their money with people, but would not extend this behavior to entities which are clearly non-human and with whom one does not need to share money, such as a rock or a teddy bear. We could then measure how much money gets shared, on average, with several entities and place them on a scale from 0 to human. This hypothetical scale is flawed in principle, as neither limit can be consistently defined. Different scenarios will cause some participants to share nothing, even without using robots. Likewise, there is no robot that is "so human" that participants give it all of their money, because merely being human is not sufficient to guarantee that outcome. The same problems and more are on display in the ultimatum game. If participants share more money in this game with a human than with a robot, multiple conflicting conclusions can be drawn: either the robot was not human enough to sufficiently elicit empathy, or it was so convincingly human that the participant forgot it was a robot and distrusted it just as they might do with other humans. As one gets closer to the uncanny valley, these considerations only get more complex and multi-layered.
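
To make the argument concrete, the following sketch implements the hypothetical 0-to-human scale described above with invented sharing averages; the entities and values are illustrative only, and the comments mark exactly where the two anchors break down.

```python
# Hypothetical "0-to-human" anthropomorphization scale based on average
# sharing behavior. The entities and averages are invented for illustration.

mean_shared = {
    "rock": 0.4,        # nonzero even for a clearly non-human entity ...
    "laptop": 1.1,      # ... so the lower anchor is not actually 0
    "small robot": 1.8,
    "android": 2.6,
    "human": 3.2,       # and being human does not guarantee a maximal score
}


def scale_position(entity: str, baseline: str = "rock", ceiling: str = "human") -> float:
    """Place an entity between an (unstable) baseline and an (unstable) ceiling."""
    low, high = mean_shared[baseline], mean_shared[ceiling]
    return (mean_shared[entity] - low) / (high - low)


if __name__ == "__main__":
    for entity in mean_shared:
        print(f"{entity:12s} {scale_position(entity):+.2f}")
```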

State of the Field Science relies on the aggregation of data, each study building on the work of previous research. We believe that the field requires a stronger foundation before it can progress further. As our technology and knowledge continue to advance, we must consider these critical implications even before the challenges described above are overcome. With the number of studies performed and published continuously, it is unavoidable that some of them will contradict each other at any given significance level, leaving it up to scholars of any field to examine this data and filter out the noise. Foundational fields, whose theories and paradigms have stood the test of time, do not often get torn down and rebuilt from the ground up. For that reason, when something foundational is called into question, a lot can be at stake because decades of consecutive work rests upon it, but it is smooth sailing otherwise. In contrast, newly evolving fields tend to be quite turbulent as they have not yet existed long enough to rely on such central pillars, but rather are rapidly testing new hypotheses and developing methodologies, which can be hurriedly iterated upon, but can just as easily collapse. In the field of human–robot interaction, the weather is turbulent indeed. The field of psychology is still coming to terms with new physiological and psychometric measures, which themselves are still being refined. At the same time, the stakes are high because modern computer science and engineering are advancing at a staggering rate, and their efforts are making robots and the programs that drive them increasingly important in our lives. Any field that attempts to tackle these challenges in tandem faces a Herculean task indeed.

References Alaerts, K., Swinnen, S. P., & Wenderoth, N. (2010). Observing how others lift light or heavy objects: Which visual cues mediate the encoding of muscular force in the primary motor cortex? Neuropsychologia, 48, 2082–2090. https://doi.org/10.1016/j.neuropsychologia.2010.03.029 Anderson, S., Ertaç, S., Gneezy, U., Hoffman, M., & List, J. A. (2011). Stakes matter in ultimatum games. American Economic Review, 101, 3427–3439. Bao, Y., & Cuijpers, R. H. (2017). On the imitation of goal directed movements of a humanoid robot. International Journal of Social Robotics, 9, 691–703. https://doi.org/10.1007/s12369017-0417-8 Barakova, E. I., De Haas, M., Kuijpers, W., Irigoyen, N., & Betancourt, A. (2018). Socially grounded game strategy enhances bonding and perceived smartness of a humanoid robot. Connection Science, 30, 81–98. https://doi.org/10.1080/09540091.2017.1350938 Bartneck, C., Rosalia, C., Menges, R., & Deckers, I. (2005). Robot abuse – A limitation of the media equation. Proceedings of the Interact 2005 Workshop on Agent Abuse, 54–58. Bregman, R. (2019). De meeste mensen deugen. De Correspondent. Burleigh, T. J., & Schoenherr, J. R. (2014). A reappraisal of the uncanny valley: Categorical perception or frequency-based sensitization? Frontiers in Psychology, 5, 1–19. https://doi.org/10.3389/ fpsyg.2014.01488 Burleigh, T. J., Schoenherr, J. R., & Lacroix, G. L. (2013). Does the uncanny valley exist? An empirical test of the relationship between eeriness and the human likeness of digitally created faces. Computers in Human Behavior, 29, 759–771. https://doi.org/10.1016/j.chb.2012.11.021 Carroll, N. (1990). The philosophy of horror: Or, paradoxes of the heart. Routledge. de Kleijn, R., van Es, L., Kachergis, G., & Hommel, B. (2019). Anthropomorphization of artificial agents leads to fair and strategic, but not altruistic behavior. International Journal of Human Computer Studies, 122, 168–173. https://doi.org/10.1016/j.ijhcs.2018.09.008 Fehr, E., & Schmidt, K. M. (2006). The economics of fairness, reciprocity and altruism – Experimental evidence and new theories. In Handbook of the economics of giving, altruism and reciprocity (Vol. 1, pp. 615–691). Elsevier. https://doi.org/10.1016/S1574-0714(06)01008-6.


Forsyth, D. R. (2006). Intergroup relations. In Group Dynamics (Fourth Ed., pp. 447–484). Thomson Wadsworth. Freud, S. (1919). The Uncanny [2011 archive.org version]. The Uncanny. https://web.archive.org/ web/20110714192553/, http://www-rohan.sdsu.edu/~amtower/uncanny.html. Gallese, V., Fadiga, L., Fogassi, L., & Rizzolatti, G. (1996). Action recognition in the premotor cortex. Brain, 119, 593–609. https://doi.org/10.1093/brain/119.2.593 Gallese, V., Keysers, C., & Rizzolatti, G. (2004). A unifying view of the basis of social cognition. Trends in Cognitive Sciences, 8, 396–403. https://doi.org/10.1016/j.tics.2004.07.002 Gazzola, V., Rizzolatti, G., Wicker, B., & Keysers, C. (2007). The anthropomorphic brain: The mirror neuron system responds to human and robotic actions. NeuroImage, 35, 1674–1684. https://doi.org/10.1016/J.NEUROIMAGE.2007.02.003 Gray, K., Waytz, A., & Young, L. (2012a). The moral dyad: A fundamental template unifying moral judgment. Psychological Inquiry, 23, 206–215. https://doi.org/10.1080/1047840X.2012.686247 Gray, K., Young, L., & Waytz, A. (2012b). Mind perception is the essence of morality. Psychological Inquiry, 23, 101–124. https://doi.org/10.1080/1047840X.2012.651387 Güth, W., Schmittberger, R., & Schwarze, B. (1982). An experimental analysis of ultimatum bargaining. Journal of Economic Behavior & Organization, 3, 367–388. https://doi.org/10.1016/ 0167-2681(82)90011-7 Harari, Y. N. (2017). The odd couple. In Homo Deus: A brief history of tomorrow (pp. 179–199). Harper Collins. Harris, S. (2010). The moral landscape: How science can determine human values. Free Press. Harsanyia, J. C. (1961). On the rationality postulates underlying the theory of cooperative games. Journal of Conflict Resolution, 5, 179–196. https://doi.org/10.1177/002200276100500205 Hoenen, M., Lübke, K. T., & Pause, B. M. (2016). Non-anthropomorphic robots as social entities on a neurophysiological level. Computers in Human Behavior, 57, 182–186. https://doi.org/10. 1016/j.chb.2015.12.034 Jentsch, E. (1906). On the psychology of the Uncanny. Angelaki, 2, 7–16. https://doi.org/10.1080/ 09697259708571910 Küster, D., & Swiderska, A. (2020). Seeing the mind of robots: Harm augments mind perception but benevolent intentions reduce dehumanisation of artificial entities in visual vignettes. International Journal of Psychology, 56, 454–465. https://doi.org/10.1002/ijop.12715 Lapidot-Lefler, N., & Barak, A. (2012). Effects of anonymity, invisibility, and lack of eye-contact on toxic online disinhibition. Computers in Human Behavior, 28, 434–443. https://doi.org/10. 1016/J.CHB.2011.10.014 Li, J. J., Ju, W., & Reeves, B. (2017). Touching a mechanical body: Tactile contact with body parts of a humanoid robot is physiologically arousing. Journal of Human-Robot Interaction, 6, 118. https://doi.org/10.5898/jhri.6.3.li Loftus, E. F., & Palmer, J. C. (1974). Reconstruction of automobile destruction: An example of the interaction between language and memory. Journal of Verbal Learning and Verbal Behavior, 13(5), 585–589. https://doi.org/10.1016/S0022-5371(74)80011-3 MacDorman, K. F., & Chattopadhyay, D. (2016). Reducing consistency in human realism increases the uncanny valley effect; increasing category uncertainty does not. Cognition, 146, 190–205. https://doi.org/10.1016/j.cognition.2015.09.019 MacDorman, K. F., Vasudevan, S. K., & Ho, C. (2009). Does Japan really have robot mania? Comparing attitudes by implicit and explicit measures. AI & Society, 23, 485–510. https://doi. 
org/10.1007/s00146-008-0181-2 Milgram, S. (1963). Behavioral study of obedience. Journal of Abnormal and Social Psychology, 67, 371–378. https://doi.org/10.1037/H0040525 Misselhorn, C. (2009). Empathy with inanimate objects and the uncanny valley. Minds and Machines, 19, 345–359. https://doi.org/10.1007/s11023-009-9158-2 Moretti, L., & di Pellegrino, G. (2010). Disgust selectively modulates reciprocal fairness in economic interactions. Emotion, 10, 169–180. https://doi.org/10.1037/a0017826


Morewedge, C. K., Preston, J., & Wegner, D. M. (2007). Timescale bias in the attribution of mind. Journal of Personality and Social Psychology, 93, 1–11. https://doi.org/10.1037/0022-3514. 93.1.1 Mori, M., MacDorman, K. F., & Kageki, N. (2012). The uncanny valley. IEEE Robotics and Automation Magazine, 19, 98–100. https://doi.org/10.1109/MRA.2012.2192811 Norman, D. (2013). The design of everyday things. Basic Books. https://doi.org/10.1145/1340961. 1340979. Oberman, L. M., McCleery, J. P., Ramachandran, V. S., & Pineda, J. A. (2007). EEG evidence for mirror neuron activity during the observation of human and robot actions: Toward an analysis of the human qualities of interactive robots. Neurocomputing, 70, 2194–2203. https://doi.org/ 10.1016/J.NEUCOM.2006.02.024 Pozharliev, R., De Angelis, M., Rossi, D., Romani, S., Verbeke, W., & Cherubino, P. (2021). Attachment styles moderate customer responses to frontline service robots: Evidence from affective, attitudinal, and behavioral measures. Psychology and Marketing, 38, 881–895. https:/ /doi.org/10.1002/mar.21475 Radzvilavicius, A. L., Stewart, A. J., & Plotkin, J. B. (2019). Evolution of empathetic moral evaluation. ELife, 8.https://doi.org/10.7554/ELIFE.44269. Sanfey, A. G., Rilling, J. K., Aronson, J. A., Nystrom, L. E., & Cohen, J. D. (2003). The neural basis of economic decision-making in the Ultimatum Game. Science, 300, 1755–1758. https:// doi.org/10.1126/science.1082976 Singer, P. (1972). Famine, affluence and morality. Philosophy & Public Affairs, 1, 2229–2243. http:/ /www.jstor.org/stable/2265052. Strait, M. K., Floerke, V. A., Ju, W., Maddox, K., Remedios, J. D., Jung, M. F., & Urry, H. L. (2017). Understanding the uncanny: Both atypical features and category ambiguity provoke aversion toward humanlike robots. Frontiers in Psychology, 8, 1–17. https://doi.org/10.3389/fpsyg.2017. 01366 Tinwell, A. (2014). The Uncanny Valley in games and animation. CRC. Torta, E., Van Dijk, E., Ruijten, P. A. M., & Cuijpers, R. H. (2013). The ultimatum game as measurement tool for anthropomorphism in human–robot interaction. Lecture Notes in Computer Science, 8239, 209–217. https://doi.org/10.1007/978-3-319-02675-6_21 Tversky, A., & Kahneman, D. (1981). The framing of decision and the psychology of choice. Science, 211, 453–458. https://doi.org/10.1126/science.7455683 Ward, A. F., Olsen, A. S., & Wegner, D. M. (2013). The harm-made mind: Observing victimization augments attribution of minds to vegetative patients, robots, and the dead. Psychological Science, 24, 1437–1445. https://doi.org/10.1177/0956797612472343 Waytz, A., Cacioppo, J., & Epley, N. (2010). Who sees human? The stability and importance of individual differences in anthropomorphism. Perspectives on Psychological Science, 5, 219–232. https://doi.org/10.1177/1745691610369336 Wykowska, A., Wiese, E., Prosser, A., & Müller, H. J. (2014). Beliefs about the minds of others influence how we process sensory information. PLoS ONE, 9, e94339. https://doi.org/10.1371/ JOURNAL.PONE.0094339 Zhou, Y., Kornher, T., Mohnke, J., & Fischer, M. H. (2021). Tactile interaction with a humanoid robot: Effects on physiology and subjective impressions. International Journal of SOcial Robotics, 13, 1657–1677. https://doi.org/10.1007/s12369-021-00749-x

Bas Vegt graduated in Applied Cognitive Psychology at Leiden University, with his thesis on Differential Learning Effects in a Probabilistic Serial Reaction Time Task. During his years as a student he also concerned himself with topics of sociology, anthropology and computer science. His academic interests lie at the overlap of natural and social sciences as well as in curating the accumulated wealth of scientific knowledge and making it more generally accessible.


Roy de Kleijn is a computer scientist and psychologist working as an assistant professor at Leiden University. He received his M.S. in computer science from Georgia Tech and his M.S. in cognitive neuroscience from Leiden University, where he also received his PhD in cognitive robotics. His main research interests are evolutionary algorithms, deep reinforcement learning, and human-robot interaction.

Part II

Decision Support and Assistance Systems

Chapter 4

Psychological Factors Impacting Adoption of Decision Support Tools Thom Hawkins

Abstract The general increase in the speed of automation has made humans in the loop a process constraint. Integrated decision support tools can help close the gap between human and computer speed while still preserving safety and other benefits to human intervention. The intent of decision tool developers may not align with user expectations or patterns of employment in their native environment. This chapter investigates the psychological factors that limit or slow adoption of decision support tools, and how those factors can be mitigated. Psychological barriers to the adoption of automated decision support tools include trust and cognitive biases such as anchoring bias, automation bias, and egocentric bias. Sheridan’s trust-causation factors—reliability, robustness, familiarity, understandability, explication of intention, usefulness, and dependence—serve as a rubric for evaluating a system’s readiness for psychological acceptance. Ethics and culpability are also potential issues when providing tools in support of decision-making. Consideration of these factors as part of system design will improve adoption and more seamlessly integrate the human-in-the-loop. Keywords Adoption · Automation · Bias · Decision-making · Decision support · Trust

Summary A decision support system (DSS) is useful when both the system and the user are bringing some skills to bear on the decision. Fitts (1951) delineated between activities for which a human's performance exceeds that of a machine and those where the machine is superior. Ideally, a DSS balances those operations according to relative specialization to specific tasks. Bridging what Norman and Draper (1986) call "the gulfs of execution and evaluation" is necessary to mitigate the limits to system adoption. The psychological factors impacting adoption of a DSS include cognitive biases such as anchoring, egocentric, belief, familiarity, and automation bias. The development of trust is also key to adoption of a DSS, specifically in ensuring the user has an understanding of how the DSS's role affects the outcome reached. Finally, there are concerns related to ethics and culpability that must be addressed with respect to the user's role in relation to the DSS. Opportunities to counter issues with trust and bias in the adoption of decision support exist across the development process, from the development of system requirements through user training, release, and support.

Introduction The perfect object exists only in isolation. Plato identified these ideals as belonging to a “realm of forms” that can only be approximated in the physical world. In “Phaedo,” Socrates describes the feeling of disappointment that emerges when there is a gap between the ideal and the real: “Someone, on seeing something, realizes that that which he now sees wants to be like some other reality but falls short and cannot be like that other since it is inferior” (Plato, 1981). Millennia later, this observation has become rooted in the field of cognitive engineering. Norman and Draper write of the gulf between a physical system and a user’s goals: “Goals and system state differ significantly in form and content, creating the Gulfs that need to be bridged if the system can be used” (Norman & Draper, 1986). In the decades that followed Norman and Draper’s insight, decision support tools, including business intelligence, have developed into a booming business (Negash & Gray, 2008). In the last decade, we’ve seen the rise of applications for deriving insight from the large amounts of data generated daily (Wixom et al., 2014). What has become conventional wisdom is that there is a return on investment for implementing decision support systems (DSSs). This perspective works to the advantage of the development and sale of such systems—but beware Plato’s perfect object—when we add humans in the loop, we’re integrating an unaccountably complex component.

Decision Support Systems Gorry and Scott Morton (1971) define DSSs as "interactive computer-based systems, which help decision-makers utilize data and models to solve unstructured problems." The degree of structure in the decision situation is key, as full structure may not require a human in the loop at all, and a complete lack of structure provides little foothold for technology. Other definitions (e.g., Turban et al., 2001) refer to the use of decision support in "semistructured decision situations." A DSS is useful when both the system and the user are bringing some skills to bear on the decision, as the goal of a DSS is "to form an integrated human–machine team capable of solving difficult, real-world problems more effectively than either individually" (Potter et al., 2002). Paul Fitts, in a 1951 report on automating air traffic control systems, delineated between activities for which a human's performance exceeds that of a machine (e.g., "ability to reason inductively") and those where the machine is superior (e.g., "ability to handle highly complex operations"). Ideally, a DSS balances those operations according to relative specialization in specific tasks. Zachary (1988) identifies six goals for decision support: "(1) projecting into the future despite uncertainty, (2) making trade-offs among competing attributes or goals, (3) managing large amounts of information simultaneously, (4) analyzing complex situations within constraints of time and resources, (5) visualizing and manipulating those visualizations, and (6) making heuristic judgments, even if they are only quantitative" (as quoted in Davis et al., 2005). On their face, some of these goals are best suited for "machine" (e.g., "managing large amounts of information simultaneously") while others are more circumspect (humans notoriously abuse heuristics; Kahneman & Tversky, 1979). The pathway to adoption may differ depending on which user goal is in play.
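
One way to read this discussion is as a rough function-allocation exercise. The sketch below encodes Zachary's six goals with an illustrative lean toward "human," "machine," or "shared"; the assignments are an illustrative reading of the commentary above, not a validated allocation scheme.

```python
# Illustrative function allocation over Zachary's (1988) six decision-support
# goals. The "lean" labels follow the commentary in the text and are meant as
# a discussion aid only.

zachary_goals = {
    "projecting into the future despite uncertainty": "shared",
    "making trade-offs among competing attributes or goals": "human",
    "managing large amounts of information simultaneously": "machine",
    "analyzing complex situations within time/resource constraints": "shared",
    "visualizing and manipulating visualizations": "machine",
    "making heuristic judgments": "human",
}


def allocation_summary(goals: dict) -> dict:
    """Group goals by which side of the human-machine team leads on them."""
    summary = {"human": [], "machine": [], "shared": []}
    for goal, lean in goals.items():
        summary[lean].append(goal)
    return summary


if __name__ == "__main__":
    for lean, goals in allocation_summary(zachary_goals).items():
        print(f"{lean.upper()} leads:")
        for goal in goals:
            print(f"  - {goal}")
```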

Adoption The adoption of technology is not simply a binary, singular choice "to accept an innovation but also the extent to which that innovation is integrated into the appropriate context" (Straub, 2009). Rogers (2010) acknowledges this gradual transition model as an "innovation-decision process," starting with awareness of the innovation and ending with reinforcement of the adoption decision. There are two concepts key to understanding adoption. The first is cognitive dissonance theory (Festinger, 1957), in which users seek to close any gap between their beliefs and behavior. If they have a negative belief about a system, their behavior will reflect that feeling; for example, if they decide through social pressure or a poor first impression that they do not like a system, they may underutilize the system, or reject adoption altogether. Their perception about a system is independent of the system itself. Just as with Plato's "Realm of Forms" (Plato, 1981), where the idea of a thing is different from an instance of that thing, the belief about what a technological innovation is and can do is separate from the technology's actual capability. It is only through sustained use or normalization that this gap can be narrowed. If use of the innovation is voluntary, adoption may fail because behavior is reflecting belief. In the same way, an innovation's champion may overlook its flaws. The other concept is prospect theory (Kahneman & Tversky, 1979), which posits that the framing of a choice—specifically whether it is from the perspective of a prospective loss versus a potential gain—affects the decision, even if the content of the choice is identical. How an innovation is introduced, for instance whether it is framed against an existing process, influences how it is viewed by a potential user.

Davis (1985), in the Technology Acceptance Model (TAM), identified three factors driving a user's motivation with respect to adoption: (1) perceived ease of use; (2) perceived usefulness; and (3) attitude toward using the system. Davis (1989) further found that "usefulness was significantly more strongly linked to usage than was ease of use." While ease of use can be considered a human factors design component, "usefulness" is purely external to design, highlighting the importance of framing in the introduction of the innovation. Making a decision, especially in complex situations that require decision support, is a function normally reserved for a human [i.e., "the ability to exercise judgment" (Fitts, 1951)]. While Rogers' innovation-decision process applies to technology innovation generally, with the adoption of a DSS the user perspective takes a larger role, whereas the adoption of technologies like mobile phones did not encroach on typically human roles. Acknowledging the need for support in carrying out a function may itself affect attitudes toward the technology, driven by existing social structures (Rosette et al., 2015). The user not only has potentially unique goals (Zachary, 1988) but also differing preconceptions or patterns of usage fed by their experience. Users with more domain experience may prefer less structured, naturalistic decision-making methodologies even in the face of semi-structured decision situations, while users with less domain experience may defer more readily to a supporting process as they grapple with the variables (Hawkins, 2019). Even among those with expert domain knowledge, decision styles are not monolithic, and a DSS must balance between customization to accommodate these differences and generalization to maintain efficiency and consistency of outcome (Mayer et al., 2012). One important distinction to make with respect to adoption is that we are not referring to a person who merely makes the decision to acquire a DSS but is not in a position to use the technology. We are also assuming that the adopter has agency in the decision to adopt the DSS, as opposed to a situation where the user is mandated to use a specific system by directive or simply because it is part of a larger process. However, even without agency, understanding the factors that impact adoption can facilitate the process.
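
As a rough illustration of how the TAM variables are typically combined, the sketch below computes a weighted adoption score. The weights are invented and merely echo Davis's finding that usefulness outweighs ease of use; real TAM studies estimate such weights from questionnaire data (e.g., via regression or structural equation modeling) rather than fixing them.

```python
# Illustrative TAM-style scoring. The weights are invented; they only echo
# Davis's (1989) finding that perceived usefulness predicts usage more
# strongly than perceived ease of use. Real studies estimate weights from
# survey data; they are not fixed constants.

def adoption_intention(perceived_usefulness: float,
                       perceived_ease_of_use: float,
                       attitude_toward_use: float) -> float:
    """Combine the three TAM factors (each rated 1-7) into a single score."""
    weights = {
        "usefulness": 0.5,   # strongest driver in Davis (1989)
        "ease_of_use": 0.2,
        "attitude": 0.3,
    }
    return (weights["usefulness"] * perceived_usefulness
            + weights["ease_of_use"] * perceived_ease_of_use
            + weights["attitude"] * attitude_toward_use)


if __name__ == "__main__":
    # A DSS that is easy to use but not seen as useful scores lower than one
    # that is clumsy to use but clearly useful.
    print(adoption_intention(perceived_usefulness=3, perceived_ease_of_use=7, attitude_toward_use=4))
    print(adoption_intention(perceived_usefulness=7, perceived_ease_of_use=3, attitude_toward_use=4))
```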

Cognitive Biases Cognitive biases “are heuristics that shape individual preferences and decisions in a way at odds with means-end rationality” (Moynihan & Lavertu, 2012). There is no unified, faceted list of cognitive biases; the literature describes biases that are particular to the observation, and these can be consolidated into broader bias types, as discussed below. In the wild, biases are also unlikely to be distinct—often multiple biases could be interacting in a given situation. While biases also affect decisionmaking directly [for example, the tendency to overestimate the frequency of an event (Tversky & Kahneman, 2004)] and can be mitigated through the implementation of a rational DSS, the biases discussed below specifically impact the adoption of a DSS, not the use of such a system. Many of the biases discussed below affect the adoption of technological innovations more broadly, but, where applicable, the impact on adoption of DSSs specifically is identified.


Anchoring Bias An anchoring bias is the disproportionate contribution of a user’s first interaction with a technology to their subsequent attitude toward it, regardless of the quality of additional interactions (Furnham & Boo, 2011).1 An anchoring bias is largely a factor of context—how the technology is introduced to the user—including a realized need, a flaw-free deployment, and even the venue through which the technology is presented (e.g., through a trusted peer). This bias could also inform the extent to which or the way in which a user employs a system such as a DSS, even when additional functionality is available or later becomes available. This is particularly notable for machine learning systems which may initially be trained on smaller sets of data until more data is available to retrain or fine-tune the model.

Egocentric Bias An egocentric bias reflects users' reluctance to acknowledge the need for assistance (Hayashi et al., 2012); a familiar example is refusing to stop and ask for directions when lost. This bias may vary with the user's level of experience in the domain in which the DSS is applied. For example, a domain novice may welcome the assistance because their identity is not associated with domain knowledge. In contrast, a domain expert2 may be irritated by offers of assistance, especially when the level of assistance offered is underwhelming, as with Microsoft's "Clippy" virtual assistant. Similar to the egocentric bias, an in-group bias reflects a user's preference for tools or processes developed by themselves or members of their close social connections over tools or processes developed by external or personally unknown entities. In the case of a DSS, this could also extend to a rejection of the expertise that informs the technology.

Belief Bias A belief bias reflects the user's acceptance or rejection of a system or process based on a subjective evaluation of the outcome. This could also be a confirmation (not conformity) bias, where a user "will tend to seek information that supports their behavior and discount critical information" (Moynihan & Lavertu, 2012). Specific to a DSS, and related to all three of the above bias types, if a user arrives at a preliminary decision prior to employing a DSS and the DSS fails to confirm their decision, they may reject not only the result but also the efficacy of the system as a whole.

1 Confirmation bias is also a driving factor for anchoring bias, as a user may seek evidence in subsequent use to confirm their initial impression.
2 Bandura (1982) defines self-efficacy as "judgments of how well one can execute courses of action required to deal with prospective situations."

Familiarity Bias A familiarity or status quo bias reflects a user’s inertia with respect to trading a known tool or process for a new one, even if the new tool or process has superior performance. According to Kahneman et al. (1991), “individuals have a strong tendency to remain at the status quo, because the disadvantages of leaving it loom larger than advantages.”

Automation Bias Finally, an automation bias generally refers to a deference toward technological innovation (Alon-Barkat & Busuioc, 2021). A user may fail to independently confirm the results of a conclusion, or even determine that they are themselves incapable of verifying the results. However, the user may also fear the loss of automation and thus rely on more dependably available manual methods. For example, if a refrigerator failed a couple of times a week, we would find a different method for preserving food. A user may also have an anti-automation bias, which can be driven by an egocentric bias where they believe their own methods are superior to leveraging the DSS for assistance. Trust, whether of technology in general or a specific system, is a component of automation bias.

Trust Sheridan (1992) elucidated the troubled relationship between man and machine: People who no longer understand the basis of what they are asked to do will become confused and distrustful. Especially if they perceive a powerful computer to be mediating between them and what is being produced at the other end, people become mystified about how things work, question their own influence and accountability, and ultimately abandon their sense of responsibility.

Sheridan (1992) further described seven trust-causation factors that could serve as a rubric for evaluating a system's readiness for psychological adoption: reliability, robustness, familiarity, understandability, explication of intention, usefulness, and dependence. With respect to TAM's (Davis, 1985) three factors influencing a user's motivation to adopt, and Davis' subsequent work (1989) that identified perceived usefulness as more strongly correlated to usage than perceived ease of use, we do find usefulness echoed directly in Sheridan's trust-causation factors. Many of the other trust-causation factors (e.g., robustness, familiarity, understandability) support the perceived ease of use. What is missing from Sheridan's model is consideration of the user's pre-existing condition or attitude as an element of success. We can assume that if a DSS has been developed to support a user's decision, the decision is both critical and complex. The breadth of capability that may be considered a DSS, as described by Zachary (1988), may challenge some of these trust-causation factors. While some DSS functions are fairly deterministic in nature (managing data, producing visualizations), others are either more probabilistic or at least require several intermediary steps from input to output, with transformations of such magnitude that it is hard to relate one to the other. The explainability of artificial intelligence has become its own niche discipline for precisely this reason. A Congressional Research Service (2018) report on artificial intelligence and national security quoted Dawn Meyerriecks, Deputy Director for Science and Technology at the U.S. Central Intelligence Agency, stating "Until AI can show me its homework, it's not a decision quality product." Wolfberg (2017) found similar concerns in his study of 21 senior landpower generals. Explainability has also been identified as a feature in the complex, critical decisions of medical professionals (Holzinger et al., 2017). The calibration of trust may also be necessary, both with respect to the underlying data and with respect to the availability of the DSS. The certainty of data as presented is a concern independent of the level of automation—a system that appears to present a recommendation holistically and conclusively may either be trusted too much (automation bias) (St. John et al., 2000), or not enough (understandability). Rasch et al. (2003) discussed the perceived risks of using automation, with a focus on dependence and the balancing act of increasing the reliance on a system that could be destroyed or spoofed, adding "a map with a bullet hole in it is still a map, but a computer with a bullet hole in it is a doorstop." In fact, trust and data certainty are such important factors in the use of automated decision support that they can even be used in an adversarial capacity (Llinas et al., 1998). Trust in the system may start to look more like faith, where a user is asked to take for granted that the machine knows what it is doing, or at least that whoever designed it did.
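
To illustrate how the seven trust-causation factors could be turned into such a rubric, the sketch below aggregates per-factor ratings and flags the weakest ones. The 1-5 rating scale, the acceptance threshold, and the equal weighting of factors are simplifying assumptions made only for this sketch.

```python
# Illustrative readiness rubric built on Sheridan's (1992) seven
# trust-causation factors. The 1-5 scale, the threshold, and the equal
# weighting are simplifying assumptions, not part of Sheridan's model.

SHERIDAN_FACTORS = (
    "reliability", "robustness", "familiarity", "understandability",
    "explication of intention", "usefulness", "dependence",
)


def readiness_report(ratings: dict, threshold: float = 3.5) -> dict:
    """Summarize per-factor ratings (1-5) into an overall readiness verdict."""
    missing = set(SHERIDAN_FACTORS) - ratings.keys()
    if missing:
        raise ValueError(f"unrated factors: {sorted(missing)}")
    mean_score = sum(ratings.values()) / len(ratings)
    weak = [f for f in SHERIDAN_FACTORS if ratings[f] < threshold]
    return {
        "mean_score": round(mean_score, 2),
        "weak_factors": weak,
        "ready_for_release": mean_score >= threshold and not weak,
    }


if __name__ == "__main__":
    ratings = {
        "reliability": 4, "robustness": 3, "familiarity": 2,
        "understandability": 3, "explication of intention": 4,
        "usefulness": 5, "dependence": 4,
    }
    # A reasonable mean can still hide weak factors (here: robustness,
    # familiarity, understandability), which is why they are reported separately.
    print(readiness_report(ratings))
```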

Ethics and Culpability The very notion of automation encroaching on traditionally human territory carries with it a raft of ethical and legal implications. For example, it is unclear whether the human operator is held responsible for an action taken when he or she did not veto an automated suggestion within the given time constraint. Culpability may represent an additional psychological barrier to adoption beyond the trust causation outlined by Sheridan. "One expert asserts that although some civilian AI algorithms will affect decision-making in substantial fields like health care or criminal justice, AI in the defense environment will generally be more consequential, with human lives routinely held at risk" (Congressional Research Service, 2018). The fear of autonomous policing or military systems is driven by the delegation of decision-making authority to a data model or algorithm. US Department of Defense Directive 3000.09, Autonomy in Weapon Systems (2017), requires that autonomous systems "allow commanders and operators to exercise appropriate levels of human judgment over the use of force." While Fitts (1951) did not specifically address moral or ethical rules or judgment, he did credit humans with "the ability to reason inductively" and "the ability to exercise judgment," both of which could encompass consideration of ethical precepts. Murphy and Woods (2009) note that "the reduction of ethics to a set of fixed rules is fraught with difficulty and likely to fail." Ensuring a human is in the loop for decisions that require moral judgment may assuage fears of robot uprisings, but operators could still remove themselves from specific decisions by allowing system discretion within defined parameters (for example, by authorizing a target in advance through facial recognition or location). The issue of culpability in the event of an incident involving an autonomous vehicle is still in its nascent phase (McManus & Rutchick, 2019; Westbrook, 2017), both from a moral and a legal standpoint. Moynihan and Lavertu (2012) note that the "attachment to previous decisions may be driven by a fear of regret and a need to justify previous behavior." Understandability comes into play again here: for the user to grapple with the decision from an ethical standpoint, they must be able to explain how the decision was reached.
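
The distinction between explicitly approving an action and merely failing to veto it can be made explicit in interface logic, and the culpability question differs between the two patterns. The sketch below contrasts them in a deliberately simplified, non-domain-specific form; the time window, prompts, and example recommendation are invented, and real safety-critical systems are governed by far more elaborate doctrine and engineering controls.

```python
# Simplified contrast between consent-based and veto-based handling of an
# automated recommendation. The veto window and example are invented; the
# point is only that "silence counts as approval" is a design decision with
# consequences for responsibility.
import time
from typing import Callable, Optional


def consent_based(recommendation: str, operator_decision: Optional[str]) -> bool:
    """Nothing is executed unless the operator explicitly approves."""
    return operator_decision == "approve"


def veto_based(recommendation: str, veto_window_s: float,
               poll_for_veto: Callable[[], bool]) -> bool:
    """The recommendation is executed unless the operator vetoes it in time."""
    deadline = time.monotonic() + veto_window_s
    while time.monotonic() < deadline:
        if poll_for_veto():
            return False  # operator intervened within the window
        time.sleep(0.05)
    return True  # silence is treated as acceptance


if __name__ == "__main__":
    print(consent_based("reroute the shipment", operator_decision=None))  # False
    print(veto_based("reroute the shipment", veto_window_s=0.2,
                     poll_for_veto=lambda: False))                        # True
```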

Mitigating Factors Bridging what Norman and Draper (1986) call “the gulfs of execution and evaluation” is necessary to mitigate the limits to system adoption. Opportunities to counter issues with trust and bias in the adoption of decision support exist across the development process, from development of system requirements through user training, release, and support.

Requirements As noted by Davis (1989), "perceived usefulness" is perhaps the paragon feature driving technology adoption. Therefore, clearly identifying a need for decision support must be the first step toward a successful adoption. That need—a user's goals—is expressed in cognitive terms while a system is designed according to physical terms (Norman & Draper, 1986). To ensure that gap is closed rather than widened, the documented needs and the developed DSS must be evaluated frequently with respect to one another and, if possible, developers should return to the source of the requirements to ensure that the needs were communicated as intended by the end users. The question of what to automate must be user-driven, based on the demand signal from the users, to include distinguishing between the decision-making models employed. For example, naturalistic decision-making is a widely accepted model for how a domain expert makes decisions, and decision support must reflect that model to be adopted by that segment of users. A rational choice method based on a quantitative algorithm may force specificity without allowing the user to hedge for confidence levels. This is related to automation bias, where the user nominally "over-trusts" the computer—but in this case, the system may, through inattentive design, communicate a response with more implied confidence than warranted, eliminating potential alternatives. This may be exacerbated without an explanation of how the result was calculated (understandability). Further, analyzing the decision process in detail and understanding which steps have the heaviest cognitive load will help to identify the areas where automation may best aid human judgment through decision support. High cognitive load is also an indicator of where humans are likely to make the most errors in judgment (e.g., Palinko et al., 2010). Process analysis can also be used to determine which steps the user may feel most comfortable relinquishing to an automated system.
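
A simple way to operationalize that analysis is to score each step of the decision process and rank candidates for automation support. The step names, scores, and weighting below are invented placeholders; in practice, such values would come from task analysis and workload and error measurement (e.g., the eye-tracking workload measures cited above).

```python
# Illustrative ranking of decision-process steps as candidates for automation
# support. Step names, scores, and the weighting are invented placeholders.

process_steps = [
    # (step, cognitive_load 1-5, observed_error_rate, operator_willing_to_delegate)
    ("gather sensor and report data",        4, 0.10, True),
    ("fuse data into a situation picture",   5, 0.15, True),
    ("generate candidate courses of action", 3, 0.08, False),
    ("select a course of action",            4, 0.05, False),
    ("document and communicate decision",    2, 0.12, True),
]


def automation_priority(load: int, error_rate: float, willing: bool) -> float:
    """Higher load and error rate raise priority; unwillingness suppresses it."""
    score = 0.6 * (load / 5) + 0.4 * min(error_rate / 0.2, 1.0)
    return score if willing else 0.0


if __name__ == "__main__":
    ranked = sorted(process_steps,
                    key=lambda s: automation_priority(s[1], s[2], s[3]),
                    reverse=True)
    for step, load, err, willing in ranked:
        print(f"{automation_priority(load, err, willing):.2f}  {step}")
```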

Design As the system is designed to meet the expressed requirement, the input of user juries on proposed design features will reduce the gap in translation of the operational problem into a physical solution. Use of Sheridan’s (1992) trust-causation factors as evaluation criteria will guide design; for example, is the system using display language familiar to the user? How does the user know whether the system is operating within expected parameters? Are any system responses surprising to the user? With time constraints as a factor in an environment (e.g., medical, rescue, military), the system design must incorporate shortcuts for the user to skip or combine steps to reach the outcome more quickly.

Training Trust in a system can be built over time (familiarity) with training under a variety of circumstances (robustness). If a user has the opportunity to defer to a solution that is more readily available or more trusted, they will do so [i.e., status quo bias (Kahneman et al., 1991)]; therefore, the acquired solution must demonstrate an advantage in time or trust to overcome that deference. Furthermore, trust may not be a strictly psychological factor, but also a sociological one, with adoption relying not solely on the user who interfaces directly with the tool but also on the relationship between the decision support's inputs and outputs and the wider organization affected by the decisions [i.e., "trust transference" (Kang et al., 2011)]. Training must encompass more than how to use a DSS; decision support must be ingrained in the organizational unit's processes.

Release How a system is introduced to the user may determine its ultimate adoption. Countering egocentric bias [including the subcategory of “not invented here” bias, where a user rejects external innovations (Antons & Piller, 2015)] through an introduction that relates capability to user requirements and even to specific comments made by users expressing the need will provide the context of how the capability relates to the concept of operations. Due to anchoring bias, the first impression could derail the adoption process if the circumstances of the introduction are not strictly controlled.

Support Feedback on how and under what circumstances a DSS is used must be collected to refine requirements and to guide the development or refinement of future decision support tools. If a decision aid or particular feature is not being used, determine the root cause and respond accordingly (including removing the aid from further support). In situations where the user is not regularly employing the DSS, it may be necessary to build familiarity and trust through practice under a variety of circumstances (Aldas-Manzano et al., 2011) and, when time allows, to explore the factors used by the DSS to determine the outcome (explainability), including, especially in the case of machine learning, the error boundaries or confidence in the computer's output (Zhang et al., 2020).
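
One concrete support practice in that spirit is to expose the DSS's confidence alongside its recommendation and to fall back to the human when confidence is low. The sketch below is a generic illustration; the threshold and message wording are invented, and it does not reproduce the specific manipulation studied by Zhang et al. (2020).

```python
# Generic illustration of surfacing model confidence with a recommendation,
# in the spirit of the trust-calibration work cited above. The threshold and
# wording are invented; this does not reproduce any specific study design.

def present_recommendation(label: str, confidence: float,
                           abstain_below: float = 0.6) -> str:
    """Return a display string that makes the DSS's certainty explicit."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence < abstain_below:
        return (f"No recommendation: confidence {confidence:.0%} is below "
                f"{abstain_below:.0%}. Please assess manually.")
    return f"Recommended: {label} (model confidence {confidence:.0%})"


if __name__ == "__main__":
    print(present_recommendation("Route A", 0.92))
    print(present_recommendation("Route A", 0.55))
```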

Conclusion The problem of adoption is complex, involving many variables in building the psychological relationship between a piece of equipment or capability and its user. The development of a DSS invokes several polarities that influence adoption. For example:

• Whether the tools support a naturalistic decision-making methodology aimed at users with significant domain expertise, or a more quantitative rational choice method to be used by those with less domain knowledge.


• Balancing the complexity of interfaces (and the additional training and resources entailed) with the ability to customize for user preference and provide dynamic, learning interfaces that can be tailored in accordance with time constraints.
• The trade-off of shifting the cognitive burden from short-term (analysis subsumed into the tool) to long-term (recalling training on how to use the tool). According to Norman and Draper (1986), "knowledge (or information) in the world and in the head are both essential in our daily functioning. But to some extent we can choose to lean more heavily on one than the other. That choice requires a trade-off—gaining the advantage of knowledge in the world means losing the advantages of knowledge in the head."
• Increasing reliance on a given capability while acknowledging that the reliance could be a liability if the capability becomes unavailable. In this situation, the user or the organization could become incapacitated in terms of their ability to decide, given their operation's reliance on methods that may not be manually replicable.
• Critical circumstances may increase the need to rely on a DSS, but occur so rarely that the user's trust in the DSS has not been appropriately calibrated for use in that circumstance.

While system usability scales (Brooke, 1996) give usefulness short shrift, Sheridan's trust-causation factors, explicitly citing usefulness, could be used to develop a rubric for assessing readiness for adoption on an individual basis. Further, understanding how user goals are expressed and fulfilled on an individual (i.e., psychological) level will inform not only the development but also the introduction of a DSS. While many of the factors in this paper could be generalized to the adoption of information systems or innovations holistically, the role of decision support is, in a sense, close to home in that exercising judgment falls firmly on the human side of Fitts' list (Fitts, 1951). Formulation of a DSS brings the computer's unique skills to bear on a distinctly human role. The more support provided, the more the DSS encroaches on our territory. In this construct, the perception of the DSS's role has the potential to stray further from the DSS as programmed.

References Aldas-Manzano, J., Ruiz-Mafe, C., Sanz-Blas, S., & Lassala-Navarre, C. (2011). Internet banking loyalty: Evaluating the role of trust, satisfaction, perceived risk and frequency of use. The Service Industries Journal, 31(7), 1165–1190. Alon-Barkat, S., & Busuioc, M. (2021). Decision-makers processing of AI algorithmic advice: Automation bias versus selective adherence. arXiv:2103.02381. Antons, D., & Piller, F. T. (2015). Opening the black box of “not invented here”: Attitudes, decision biases, and behavioral consequences. Academy of Management Perspectives, 29(2), 193–217. Bandura, A. (1982). Self-efficacy mechanism in human agency. American Psychologist, 37(2), 122. Brooke, J. (1996). SUS: A ‘quick and dirty’ usability scale. In I. McClelland I (Ed.), Usability evaluation in industry. Taylor & Francis Ltd.


Congressional Research Service. (2018). Artificial intelligence and national security. Washington, DC Davis, P. K., Kulick, J., & Egner, M. (2005). Implications of modern decision science for military decision-support systems. Rand Corporation. Davis, F. D. (1985). A technology acceptance model for empirically testing new end-user information systems: Theory and results. Doctoral dissertation, Massachusetts Institute of Technology. Davis, F. D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly, 319–340. Department of Defense. (2017). Directive 3000.09, Autonomy in weapon systems. https://www.esd. whs.mil/portals/54/documents/dd/issuances/dodd/300009p.pdf. Festinger, L. (1957). A theory of cognitive dissonance (vol. 2). Stanford University Press. Fitts, P. M. (1951). Human engineering for an effective air-navigation and traffic-control system. Furnham, A., & Boo, H. C. (2011). A literature review of the anchoring effect. The Journal of Socio-Economics, 40(1), 35–42. Gorry, G. A., & Scott Morton, M. S. (1971). A framework for management information systems. Hawkins, T. (2019). Naturalistic decision-making analogs for the combat environment. Springer. Hayashi, Y., Takii, S., Nakae, R., & Ogawa, H. (2012). Exploring egocentric biases in human cognition: An analysis using multiple conversational agents. In 2012 IEEE 11th International Conference on Cognitive Informatics and Cognitive Computing. Holzinger, A., Biemann, C., Pattichis, C. S., & Kell, D. B. (2017). What do we need to build explainable AI systems for the medical domain?. arXiv:1712.09923. Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrics, 47(2), 263–292. Kahneman, D., Knetsch, J. L., & Thaler, R. H. (1991). Anomalies: The endowment effect, loss aversion, and status quo bias. Journal of Economic Perspectives, 5(1), 193–206. Kang, I., Lee, K. C., Kim, S. M., & Lee, J. (2011). The effect of trust transference in multi-banking channels; offline, online and mobile. International Journal of Mobile Communications, 9(2), 103–123. Llinas, J., Bisantz, A., Drury, C., Seong, Y., & Jian, J-.Y. (1998). Studies and analysis of aided adversarial decision making; phase 2: Research on human trust in automation. State University of New York at Buffalo Center of Multisource Information Fusion. Mayer, J. H., Winter, R., & Mohr, T. (2012). Situational management support systems. Business & Information Systems Engineering, 4(6), 331–345. McManus, R. M., & Rutchick, A. M. (2019). Autonomous vehicles and the attribution of moral responsibility. Social Psychological and Personality Science, 10(3), 345–352. Moynihan, D. P., & Lavertu, S. (2012). Cognitive biases in governing: Technology preferences in election administration. Public Administration Review, 72(1), 68–77. Murphy, R., & Woods, D. D. (2009). Beyond Asimov: The three laws of responsible robotics. IEEE Intelligent Systems, 24(4), 14–20. Negash, S., Gray, P. (2008). Business intelligence. In Handbook on decision support systems 2. Springer. Norman, D. A., & Draper, S. W. (1986). User centered system design: New perspectives on humancomputer interaction. Lawrence Erlbaum. Palinko, O., Kun, A. L., Shyrokov, A., & Heeman, P. (2010). Estimating cognitive load using remote eye tracking in a driving simulator. In Proceedings of the 2010 Symposium on Eye-tracking Research & Applications. Plato. (1981).Five dialogues: Euthyphro, apology, crito, meno, phaedo. Hackett Publishing Company. 
Potter, S. S., Elm, W. C., Roth, E. M., Gualtieri, J., & Easter, J. (2002). Bridging the gap between cognitive analysis and effective decision aiding: State of the art report (SOAR): Cognitive systems engineering in military aviation environments: Avoiding cogminutia fragmentosa.


Rasch, R., Kott, A., & Forbus, K. D. (2003). Incorporating AI into military decision making: An experiment. IEEE Intelligent Systems, 18(4), 18–26. Rogers, E. M. (2010). Diffusion of innovations (5th ed.). Simon & Schuster. Rosette, A. S., Mueller, J. S., & Lebel, R. D. (2015). Are male leaders penalized for seeking help? The influence of gender and asking behaviors on competence perceptions. The Leadership Quarterly, 26(5), 749–762. Sheridan, T. B. (1992). Telerobotics, automation, and human supervisory control. MIT Press. St. John, M., Callan, J., Proctor, S., & Holste, S. T. (2000). Tactical decision-making under uncertainty: Experiments I and II. United States Navy. Straub, E. T. (2009). Understanding technology adoption: Theory and future directions for informal learning. Review of Educational Research, 79(2), 625–649. Turban, E., Aranson, J. E., & Liang, T.-P. (2001). Decision support systems and intelligent systems (7th ed.). Prentice-Hall. Tversky, A., & Kahneman, D. (2004). Judgment under uncertainty: Heuristics and biases. In Preference, belief, and similarity: Selected writings (pp. 203–220). Westbrook, C. W. (2017). The Google made me do it: The complexity of criminal liability in the age of autonomous vehicles. Michigan State Law Review. Wixom, B., Ariyachandra, T., Douglas, D., Goul, M., Gupta, B., Iyer, L., Kulkarni, U., Mooney, J. G., Phillips-Wren, G., & Turetken, O. (2014). The current state of business intelligence in academia: The arrival of big data. Communications of the Association for Information Systems, 34(1). Wolfberg, A. (2017). When generals consume intelligence: The problems that arise and how to solve them. Intelligence and National Security, 32(4), 460–478. Zachary, W. W. (1988). Decision support systems: Designing to extend the cognitive limits. In M. G. Hollander (Ed.), Handbook of human-computer interaction. North Holland. Zhang, Y., Liao, Q. V., & Bellamy, R. K. (2020). Effect of confidence and explanation on accuracy and trust calibration in AI-assisted decision making. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 295–305).

Thom Hawkins is a project officer (civilian) for artificial intelligence and data strategy with the US Army’s Project Manager Mission Command. He holds a Master of Library and Information Science degree from Drexel University in Philadelphia, PA, and a Bachelor of Arts degree from Washington College in Chestertown, MD. He also writes on artificial intelligence and data topics for Army AL&T magazine.

Chapter 5

Model-Based Operator Assistance: How to Match Engineering Models with Humans’ Cognitive Representations of Their Actions? Romy Müller

and Leon Urbas

Abstract The operation of process plants can be supported by assistance systems that rely on engineering models to answer various questions. However, suitable answers can only be computed when the questions posed by the operators using the assistance are compatible with those considered by the engineers developing the models. Such compatibility cannot be taken for granted, because operator cognition and action rarely play a role during plant engineering. How can we determine whether a model is useful for answering particular operator questions? A promising strategy is to apply compatible frameworks for describing the domain concepts captured by the models and the cognitive action representations of operators. To this end, we introduce the concept of abstraction hierarchies that are used both by engineers to model the technical system and by humans to identify their actions. Thus, they provide an opportunity for matching the two. We argue that such matching cannot be performed algorithmically. If humans are to be responsible for the matching, this presupposes that models are equipped with understandable descriptions of their capabilities. Accordingly, we argue that model descriptions are a cornerstone of successful model-based operator assistance. We discuss the challenges of this approach and identify directions for future work. Keywords Model-based assistance · Engineering models · Abstraction hierarchies · Action identification theory · Situation awareness · Process industries

R. Müller (B) Faculty of Psychology, Chair of Engineering Psychology and Applied Cognitive Research, Technische Universität Dresden, Helmholtzstraße 10, 01069 Dresden, Germany e-mail: [email protected]
L. Urbas School of Engineering, Chair of Process Control Systems & Process Systems Engineering Group, Technische Universität Dresden, Dresden, Germany


Introduction

Complex Problem Solving and Situation Awareness in Process Plants

If you have access to high-quality information about the inner workings of a technical system, does this guarantee that you can successfully interact with it? Based on this assumption, model-based operator assistance is considered a promising candidate for advancing human–technology cooperation in the process industries. Dealing with changes in process plants—be it planned in case of adaptations to requirements, or unplanned in case of faults or disturbances—can be a cognitively challenging task (Müller and Oehm, 2019) that requires knowledge-based behavior (Rasmussen, 1983). This work mode is likely to become predominant in highly flexible cyber-physical production systems. In these systems, more and more routine tasks can be automated, while humans are expected to take over the tasks of dynamic decision-making and planning (Hirsch-Kreinsen, 2014).

Throughout this chapter, the resulting challenges and potentials for operator assistance will be illustrated with the following example: In an industrial wastewater treatment plant, a waste stream of varying alkalinity is neutralized with the goal of achieving an acceptable target acidity. This chemical neutralization is implemented in a continuous, fully automated process that mitigates disturbances by controlling temperatures, liquid levels, and flows. Operators supervise the automated system and intervene if necessary (Sheridan, 2011). One circumstance that requires such intervention and tends to occur unpredictably is foaming. Foaming only becomes problematic under certain circumstances (e.g., when the top of the foam is too close to the lid of the vessel), and it requires situation-specific actions that cannot be captured in standard operating procedures. Operators need to understand that foaming is the problem at hand, evaluate the most probable causes, select and implement suitable actions, monitor their success, and iterate through this procedure until they have found a safe, robust, and economical solution.

Situations like this require operators to engage in complex problem-solving (Dörner and Funke, 2017; Fischer et al., 2012; Funke, 2010). First, a large number of heterogeneous variables need to be considered. These variables include the feed state (e.g., temperature, amount of dissolved CO2, acidity), the state and health of the plant equipment (e.g., fouling, corrosion), and the environment (e.g., availability of waste buffers, cooling water reserve). Second, these variables are interconnected, and changing one of them inevitably changes the others. Foaming correlates with the amount of the CO2 gas stream in the neutralizer, which in turn is influenced by parameters of the feed stream (e.g., alkalinity, dissolved CO2) as well as by the current acidity target. Third, the process states and relations between variables are partly intransparent. Foaming is influenced by both the amount of dissolved CO2 in the feed stream and the surface tension of the feed stream, but neither of them can be measured directly. Fourth, the situation is dynamic and changes nonlinearly, even in the absence of operator intervention. Salt may stick to the surface of the equipment (i.e., fouling) and change the heat transport characteristics, which


in turn influences the dynamics of the core process. Finally, multiple goals must be considered that can be vague or in conflict with each other, such as throughput (i.e., neutralize as much feed as possible), availability (e.g., prevent fouling), or robustness (e.g., mitigate foaming while keeping product acidity within acceptable limits).

Complex, dynamic situations like this require operators to perceive the relevant elements, interpret this information, and draw inferences about future system behavior (Endsley, 1995, 2017). Gaining situation awareness can be conceptualized as a recursive process, as exemplified in Neisser's perceptual cycle (Neisser, 1976). Although this conceptual model was developed half a century ago, it is still widely applied today and has recently been validated in the context of dealing with critical human–machine incidents (Plant and Stanton, 2015). It assumes a cyclical relationship between humans and the external world (see Fig. 5.1, right): humans build up internal mental schemata based on the external conditions they observe, and these schemata, in turn, direct their perception and action. The resulting changes in the external world lead to a revision of the mental schemata. In our example, operators might have a first idea about the variables that influence foaming, derive suitable countermeasures, and refine their understanding by observing how foaming changes with these interventions. Thus, information processing is not linear and passive, but cyclical and interactive, with operators' conceptions of the external world playing an essential role. There is no guarantee that the perceptual cycle converges on an objectively valid situation assessment.

Taken together, operating a process plant requires complex problem-solving in a continuous cycle of internal schema updating and interaction with the external world. Given the challenging nature of these cognitive activities, how can they be supported by means of assistance systems? A promising support strategy is to apply formal models from the plant engineering phase (cf. Müller et al., 2021). In this chapter, we will discuss the potentials and challenges of this approach. We first describe how models are used during plant engineering, and claim that it cannot be taken for

Fig. 5.1 Matching formal representations of domain constraints with cognitive representations of actions by integrating the Abstraction Hierarchy (AH) and Action Identification Theory (AIT)


granted that they are equally useful for operator assistance. To justify this claim, we explain how engineers develop their models based on hierarchical descriptions of the technical system on multiple levels of abstraction. This provides a valid account of the technical system, but does not adequately consider human cognition and action (Urbas and Doherr, 2011). To provide a complementary account of how humans cognitively represent their actions, we introduce a psychological theory that also relies on abstraction hierarchies. We integrate the accounts of domain-centered and cognition-centered abstraction hierarchies to derive implications for model-based operator assistance. Specifically, we argue that a matching of abstraction hierarchies is needed but cannot be performed algorithmically. Thus, the matching must be performed by humans, which presupposes that they have access to understandable model descriptions. Finally, we discuss the challenges of this approach and provide an outlook for future research.

Formal Models in the Engineering Phase of a Plant

Engineers have access to knowledge about various relations between the variables involved in complex chemical processes and plant design. This knowledge is explicitly formulated in formal models that are interpretable by computers and thus can be used for further processing: by performing algorithmic operations on the models, it is possible to generate classifications, predictions, and many other kinds of answers. The entirety of these models, sometimes called the digital twin, is heavily utilized in designing chemical plants to be safe, robust, and economical. In our example, the digital twin can be helpful to understand why in some cases there is extensive foaming, while in other cases, it does not occur at all. However, this deep knowledge is not readily available to operators. This is due to (a) the internal complexity of the knowledge, (b) the sometimes complicated mathematical abstractions inherent in the models, and (c) the implicit background information necessary for evaluating the answers derived from the models (i.e., with regard to their accuracy, limitations, and side effects).

When asking how formal engineering models can be used for operator assistance, we first need to consider what functions they fulfill during plant engineering. Engineers use models as tools for thought in various activities (e.g., process planning, mechanical and automation design, and purchase of components). These activities call for three kinds of models. The first kind describes the mereology and makeup of a complex plant out of subsystems, components, and their attributes, either as requirement formulations or as-built descriptions. The second kind describes the individual functions and functional relations between different parts of the system, including connectivity and communication. The third kind simulates plant behavior based on mathematical algorithms, and thus makes it possible to ask questions about the behavior of the system and its subsystems in different contexts. Thus, engineering models describe the structure, effects, and behavior of the system. In that way, they enable engineers to build mental schemata, infer suitable actions, predict the effects


of these actions on the system, and evaluate these effects to update their mental schemata. That is, models support all activities within the perceptual cycle (Neisser, 1976) and thus are a valuable tool for the engineering team to build up situation awareness. But can they also foster the situation awareness of operators who are running the plant?

Using Engineering Models for Operator Assistance

A common, optimistic, and perhaps somewhat naïve assumption is that engineering models are certainly beneficial for operator assistance. After all, when they provide information that is useful for an engineer to plan and build a plant, they should also be useful for an operator who at first glance has a much narrower task. However, the usefulness of engineering models for operator assistance is called into question by three fundamental characteristics of models (Stachowiak, 1973). First, models provide abstractions of the real world and thus go along with a reduction of information. They only retain information that is relevant in a given context, while leaving out other information. The question of which information is relevant directly relates to the second characteristic of models: their quality can only be evaluated in relation to a particular goal. This means that whether an engineering model is useful for operator assistance will depend on whether the original goals of the model developer are compatible with those of the operator. Third, model development is a creative process based on various subjective decisions, and the process, as well as the reasons behind these decisions, is often not well documented. Accordingly, a key problem is that it can be difficult to infer whether a model provides the information needed in a particular operating context.

A factor that complicates the use of engineering models for operator assistance is that we do not have (and do not want) a world model for any complex technical system. That is, no singular model can describe the entire plant with its processes, products, resources, and interactions that underlie the product-related transformations and the plant-related capacities, which would, in turn, provide a basis for modeling operation (Bamberg et al., 2020). Instead, each model only describes a particular aspect of the system. In consequence, we need to select suitable models that are able to answer a particular operator question. Some models may at first glance look like they can generate the required information, but in fact, they are often insufficient because they are incomplete, not accurate enough, or require data that is not readily available. Accordingly, it must be determined whether a model has the right content and quality to generate valid answers. To use an everyday analogy, if you just say that you want to go shopping, it is enough to be told that there is a shop. However, if you want to buy bread, the shop must be a bakery—or any other valid alternative (e.g., a supermarket) but not an invalid alternative (e.g., a bookstore). What is a valid alternative may further depend on the type of bread you want to buy, which might rule out supermarkets and perhaps even most bakeries. The same logic applies when using models from plant engineering for operator assistance.


This leads to a central question: How can we match the contents and capabilities of models with the information requirements of humans in a specific operating context? Such matching presupposes that model features and operator requirements are represented in a common framework. To this end, the following section will introduce the concept of abstraction hierarchies, which can describe both the constraints of a technical system, as reflected in the models, and the cognitive representations of actions used by human operators.

Abstraction Hierarchies in Systems Engineering and Human Cognition

Abstraction Hierarchies in Systems Engineering

When modeling a complex work domain, it is necessary to structure the problem space that results from various domain constraints. To this end, each technical system can be described on different levels of abstraction. The most prominent framework is Rasmussen's Abstraction Hierarchy (AH) (Naikar et al., 2005; Rasmussen, 1986), which is frequently applied for system analysis and design (for overviews see Bennett, 2017; Naikar, 2017). The AH relies on a means-end continuum which relates system goals to physical implementations via the functions that specific components must fulfill to achieve the goals. That is, higher levels describe why a function exists, while lower levels describe how it is implemented. The AH typically relies on five levels of abstraction, although the exact number depends on the respective system (Vicente and Rasmussen, 1992): (1) functional purpose (i.e., intended function of the system), (2) abstract function (i.e., flow of mass and energy), (3) generalized function (i.e., functions independent of concrete physical implementation), (4) physical function (i.e., physical processes and components), and (5) physical form (i.e., physical appearance of components).

Different disciplines that contribute to the design of a plant use different versions of the AH. For instance, it is not feasible to represent product transformations and automatic control in one and the same AH (Lind, 1994). Instead, multiple representations are needed, and thus the AHs used by mechanical engineers and automation engineers differ considerably (Schmidt and Müller, in preparation). In consequence, different models used during plant engineering also rely on different versions of the AH, depending on their respective purpose. For instance, from a process systems perspective, a control loop might be considered a physical function, whereas from an automation perspective, it is a functional purpose that needs further refinement to be understood and implemented. This concept of different but connected AHs used by different disciplines is well elaborated in the IEC 62424 model for the piping and instrumentation diagram, which serves as a standardized means of communication to connect the different AHs (Drath, 2010). Another case is AHs that take different perspectives (e.g., steady state vs. dynamic control). These AHs might match at


the levels of functional purpose and abstract function (i.e., achieve target acidity by setting defined mass flows), but start to diverge at the level of generalized function (e.g., define the proper mass flow of acid with respect to alkalinity vs. control the mass flow of acid with respect to target acidity). As we will see below, the multitude of AHs applied in engineering models has direct implications for the issue of matching models to operator questions.

Despite these differences in the AHs used by different model developers, one thing is common to all of them: abstraction is chosen to design a chemical plant as efficiently as possible. In contrast, it usually does not consider the cognitions and actions of operators that are needed to successfully run the plant. In principle, Cognitive Work Analysis (Naikar, 2017; Vicente, 1999) provides tools to also model operator tasks, strategies, and competencies, but these analyses are rarely performed (McIlroy and Stanton, 2011). The goals, tasks, and individual actions of operators, as well as the questions they might ask during operation, are simply not part of the model developer's goal system. This is problematic because the usefulness of a model can only be evaluated in relation to a particular goal (Stachowiak, 1973). In consequence, it is unclear whether an engineering model provides the information that operators need to make good decisions.

To understand this problem of uncertain fit, consider our example of getting a neutralization process to a particular target acidity. The simple engineering model that describes the chemistry of the neutralization can inform operators about the proper setpoint for the acid flow given the alkalinity of the feed and the acidity target. In contrast, this model provides no information about the transient behavior of the process. It cannot say how much time is needed, whether parameter adjustments can be performed all at once or only gradually over a certain period, what risks there are, and so on. However, in case of a foaming event, this information would be valuable for operators to understand what sequence of actions can best mitigate the problem. Accordingly, if an operator's question is only about what the target state looks like or what parameter values are needed to reach it, the model can provide this information. However, if the question is how to get there, the model remains silent. Only by specifying the task of moving to a new target state can the complete activity and the respective information requirements be inferred. In other words, an abstraction hierarchy is needed to describe operator actions.
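As a concrete, deliberately simplified illustration of the kind of hierarchy described in this subsection, the sketch below encodes an abstraction hierarchy for the neutralization example as a plain data structure. The level names follow Rasmussen's AH; the entries at each level are hypothetical placeholders rather than excerpts from an actual engineering model.

```python
# A minimal, illustrative encoding of an abstraction hierarchy (AH) for the
# wastewater neutralization example. Level names follow Rasmussen's AH; the
# entries are hypothetical and only serve to show the means-end structure.
neutralization_ah = {
    "functional_purpose":   ["neutralize wastewater to an acceptable target acidity"],
    "abstract_function":    ["mass balance of acid and alkaline feed",
                             "energy balance of the neutralizer"],
    "generalized_function": ["dosing control", "level control", "temperature control"],
    "physical_function":    ["acid dosing pump", "neutralization vessel", "pH sensor"],
    "physical_form":        ["pump and piping layout", "vessel geometry and materials"],
}

def levels(ah):
    """Return the ordered list of abstraction levels present in an AH."""
    return list(ah.keys())

# An automation engineer's AH might instead treat "dosing control" as a
# functional purpose to be refined further, illustrating why different
# disciplines end up with different (but connectable) hierarchies.
print(levels(neutralization_ah))
```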

Abstraction Hierarchies in the Cognitive Representation of Actions

A hierarchical framework similar to the AH can be used to describe how humans think about what they are doing. Action Identification Theory (AIT) (Vallacher and Wegner, 1985) states that actions are represented on different levels of abstraction. Just like in Rasmussen's AH, these levels reflect a means-end continuum, ranging from abstract goals to specific implementation details: high levels provide a comprehensive understanding of why an action is performed (e.g., neutralizing wastewater),


whereas lower levels specify how this action is put into practice (e.g., achieving a particular target acidity while preventing the negative consequences of foaming). The actions can be described on lower and lower levels, adding more and more implementation details (e.g., making sure that foam does not enter the venting system vs. increasing the gap between the top of the foam and the lid of the vessel vs. reducing liquid level and throughput vs. carefully changing setpoints of the automated control functions in small steps while considering dynamic system behaviors such as overshooting and settling time). According to AIT, people can identify their actions on any level of abstraction, and there are no fixed limits to the number of levels. At the same time, the level selection is not arbitrary but reflects an inherent tradeoff between comprehensive action understanding (high levels) and effective action maintenance (low levels). AIT assumes that this selection follows three principles. First, actions have a prepotent identity, which corresponds to people's spontaneous answer to the question "what are you doing?" Second, people tend to adopt higher levels of action identification, which indicates that they are sensitive to the larger meaning of an action in its context. Third, people switch to lower levels when it becomes difficult to maintain the action while representing it on a high level. Overall, the selected level reflects the upper limit of the acting person's capacity to maintain an action: representations are as abstract as possible, but still specific enough. Moreover, level selection is affected by several factors such as action context, difficulty, experience, and interindividual differences. Since its original formulation, AIT has been used in various contexts such as ecological behavior, education, team performance, and videogame playing (Cruz and Pinto, 2019; Ewell et al., 2018; Moussaoui and Desrichard, 2016; Şengür and Turhan, 2018).

AIT is compatible with the perceptual cycle (Neisser, 1976) as it also assumes cyclical relationships between cognitive representations and overt behavior. Cognitive representations generate actions, and these actions in turn generate revised representations of what one is doing. This link between AIT and the perceptual cycle has important implications for the use of engineering models to help operators gain situation awareness. On the one hand, information needs to be provided on levels of abstraction that are low enough to guide perceptual exploration and the selection of specific actions. On the other hand, abstraction levels need to be high enough to enable the formation of comprehensive, generalizable mental schemata. These complementary requirements cannot always be met by one and the same kind of information. Instead, information may need to be presented on different levels of abstraction, and the relations between these levels should be made explicit. In that sense, AIT is highly relevant for human–machine interaction, as it can be used as a framework to understand operators' cognitive activities of navigating within the AH (Meineri and Morineau, 2014). On the one hand, successful problem solvers and experts do not just focus on implementation details, but also consider high-level information about relevant processes and system functions (Hajdukiewicz and Vicente, 2002; Janzen and Vicente, 1998; Vicente et al., 1995). On the other hand, they do not stick only to those high abstraction levels, but also pay attention to the specific implementation details, and thus flexibly switch between levels (Hall et al., 2006; Rasmussen, 1985; Vicente, 2002). Therefore, it seems promising to combine the two frameworks of


AH and AIT, relating model contents about the constraints of a technical system to operators’ cognitive representations of their actions (see Fig. 5.1).

Implications for the Use of Engineering Models in Operator Assistance Systems

Models Need to Provide Information on the Right Levels of Abstraction

People gravitate toward levels of action identification that are as abstract as possible (Vallacher and Wegner, 1985). This is important with regard to the matching between models and operator questions because it suggests that operators might query an assistance system in an overly abstract manner. Take our example of reaching a target acidity while considering foaming and transition processes. When operators ask questions that are too abstract (e.g., what parameter values to use for obtaining the target acidity, how to prevent foaming), the assistance system will select a model that matches this request. However, this model might lack the low-level information necessary to select and plan suitable actions and prevent unwanted side effects (e.g., timing of parameter adjustments to perform the transition). Conversely, when acting gets difficult, people tend to describe their actions on low levels of abstraction (Vallacher and Wegner, 1985). In this case, the assistance system might select a model that does provide specific implementation details. However, this model might not support the formation of appropriate mental schemata as it fails to provide the information needed to develop a comprehensive understanding of the action in its current context. For instance, operators might learn about the necessary parameter adjustment steps, but be unaware of the consequences of this action pattern and the effects of deviating from it. Thus, due to the complementary risks of posing questions on overly high and overly low levels of abstraction, models must be selected that meet operators' actual information requirements.

Selecting Suitable Models by Matching the Levels of Abstraction

The previous discussion indicates that model selection requires a matching between model contents and capabilities on the one hand, and operator activities and questions on the other. It must become clear on which levels of abstraction information from a model is needed. But how can this matching be performed? Doing it algorithmically would require assistance systems to deduce the appropriate levels of abstraction from operator questions. Such algorithms would not only have to translate the


question to extract the requested level, but also perform an automated specification of high-level intentions and abstraction of low-level actions. This is because gaining situation awareness within the perceptual cycle requires both low- and high-level information to support specific exploration and abstract schema generation, respectively. To date, neither an automated translation nor an automated specification and abstraction seem feasible, for at least two reasons. First, given the complex, dynamic, and context-specific nature of process control, extensive task modeling would be needed, which currently is unrealistic. Second, human action identification is highly individual (Meineri and Morineau, 2014; Vallacher and Wegner, 1985), and thus different operators are likely to generate vastly different requests.

Alternatively, assistance systems could provide strong guidance and force operators to standardize their requests, making them comply with the AHs of the engineering models. However, this approach is problematic as well. For one thing, it would often keep operators from asking the questions they actually want to ask. For another, different models use different AHs, depending on the model developer's disciplinary tasks and purposes. In consequence, the definition of mapping rules between AHs and operator requests would have to be performed individually for each model or class of models. Obviously, this is not scalable.

Taken together, any attempts to automatically match the abstraction hierarchies of models and operator questions seem futile. Thus, it follows that the matching process must be carried out by the operators themselves. However, this is only possible if operators are able to evaluate the AHs that are applied in the models. That is, they must be able to understand what a model can do, because only then can they decide whether this fits their own information requirements. Therefore, operators should be supplied with appropriate descriptions of model contents and capabilities.
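To make this requirement more tangible, the following sketch shows, under strong simplifying assumptions, what machine-readable capability descriptions and an operator-driven selection step could look like. The model names, covered abstraction levels, and example queries are hypothetical; the point is only that a description exposes which levels of abstraction a model covers, so that the operator, not an algorithm, judges the fit.

```python
# Hypothetical capability descriptions for two engineering models of the
# neutralization process. Field names and values are illustrative assumptions.
MODEL_DESCRIPTIONS = [
    {
        "name": "steady_state_chemistry",
        "answers": "which setpoints reach a given target acidity",
        "inputs": ["feed alkalinity", "target acidity"],
        "levels": {"abstract_function", "generalized_function"},
    },
    {
        "name": "dynamic_transition",
        "answers": "how to move between operating points over time",
        "inputs": ["current state", "target acidity", "allowed ramp rates"],
        "levels": {"generalized_function", "physical_function"},
    },
]

def candidate_models(required_levels, descriptions=MODEL_DESCRIPTIONS):
    """List models whose descriptions cover all abstraction levels that the
    operator states they need; the final choice stays with the operator."""
    return [d["name"] for d in descriptions if required_levels <= d["levels"]]

# A purely high-level question vs. a question that also needs implementation detail:
print(candidate_models({"abstract_function"}))                          # ['steady_state_chemistry']
print(candidate_models({"generalized_function", "physical_function"}))  # ['dynamic_transition']
```

In practice, such descriptions would of course need far richer vocabularies (see the following section and Table 5.1), but even this toy version shows that the selection step can reduce to comparing levels of abstraction rather than inspecting model internals.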

Describing the Contents and Capabilities of Models

A provocative conclusion we reach in this chapter is that models are only useful for operator assistance if they are equipped with understandable descriptions of their purpose, limitations, and capabilities. Such descriptions should ideally convey the underlying assumptions and goals of the model, the levels of abstraction the model can cover, the questions it can answer as a result, and the input it requires. Only in this way is it possible for operators to select suitable models from the abundance of models available in the digital twin of a plant. In other words, transparency is essential. From this perspective, a key question is how to describe what a model can do without the need to refer to model internals such as content, structure, and algorithms. One possibility is for such capability descriptions to make it transparent to operators what information a model provides and what activities it supports. More specifically, a model could inform operators what inputs it needs and what outputs it can generate based on these inputs. These descriptions should be embedded in an AH representation that allows operators to move up and down between the levels of


abstraction in case the answer provided by the model is too abstract or too specific, or in case they simply want to explore different levels to gain a better understanding. Given such descriptions, operators could decide whether the model provides the required information, whether another model is needed, or whether no suitable model is available. Different models might be suitable for different steps of a single activity such as fault handling. This activity could be split up into fault identification (where a model trained on observations to evaluate deviations from the nominal setpoint might come in handy), and fault location (where a model that allows asking for the propagation of fault symptoms along connections that convey mass, energy, or information would be most suitable).

The need for assistance systems to describe model capabilities comes with a number of questions and challenges (for an overview see Table 5.1). One implication is that someone needs to provide the descriptions. Given that models heavily depend on the people developing them in a creative process based on a particular goal and by choosing particular abstractions (Stachowiak, 1973), it follows that the descriptions should ideally be supplied by the model developers themselves. However, as their goals and actions differ substantially from those of operators, it is questionable whether they are able to provide descriptions that are useful in the context of operation. Addressing this issue makes it necessary to consider what knowledge and competencies a person needs in order to describe model capabilities. A question that directly follows from that is to what degree it is possible to know this in advance, because in complex systems even engineers cannot anticipate all future situations and requirements (Hollnagel, 2012; Perrow, 1984).

Another implication is that it should be considered whether and when a re-engineering of model descriptions is possible. That is, under what conditions can these descriptions be generated from existing models? Is this only possible for trivial models, or are there classes of models that allow for it, whereas others do not? The feasibility of re-engineering might depend on the internal structure of models and on whether the modeling process has followed particular rules and procedures. This corresponds to the issue of self-descriptiveness. In an idealized world, for each model, we can say on which levels of abstraction it provides information. This is because, in contrast to human action identification, the AH is precisely defined, within a given domain the number of levels is set (cf. Meineri and Morineau, 2014), and the models have a clear relation to these levels. However, in reality, the abstraction levels of models are not all standardized. One might ask whether at least for particular classes of models the assignment to a particular AH is generically possible, or whether it is necessary to determine it separately for each model instance. The latter seems to be the case, because different model developers have different conceptualizations of the technical system, resulting in the use of different AHs. In practice, this often seems to be based on known cases and other forms of prior knowledge that is not made explicit, and partly not even consciously accessible to the model developers themselves.

The description of models can be supported by technologies such as ontologies. However, a problem with this is that their structure, content, purposes, and underlying assumptions (i.e., their how, what, and why) are often hard to understand by people other than their developers, which can make their application difficult. Thus, it is a challenging but promising endeavor to use such technologies in ways that allow for a generation of understandable and helpful model descriptions.

Table 5.1 Questions and challenges resulting from the need to provide model descriptions that are suitable for operation. Note that this list merely represents an initial collection of ideas, and as indicated by the empty cells, there is plenty of work remaining for future interdisciplinary research

| Questions and challenges | Specification examples | Examples for related technical and/or organizational means |
|---|---|---|
| What information should model descriptions include? | Capabilities, supported activities, levels of abstraction, goals, assumptions, evidence, and design justifications of the model developer | Goal Structuring Notation (Spriggs, 2012), Safety Assurance Cases (Kelly and Weaver, 2004) |
| In what way should descriptions be provided? | Required inputs and resulting outputs, within an AH representation | Competency and Performance Questions (Mizoguchi and Ikeda, 1998; Uschold and Gruninger, 1996) |
| Who has the required knowledge for generating descriptions? | About model contents, about operators' information requirements | – |
| What do model developers have to learn in order to generate useful descriptions? How can they be supported? | FAIR (Findable, Accessible, Interoperable, Re-Usable) meta data engineering, curated vocabularies for abstraction identification hierarchies | FAIR guiding principles (Wilkinson et al., 2016) and applications (Goble et al., 2020; Lamprecht et al., 2020; Murillo, 2020) |
| What pitfalls are there in interpreting and using descriptions? | Not having the required background knowledge | – |
| What background knowledge is needed? | Limits of applicability, context-specificity, knowledge about the elements (e.g., inputs) | – |
| Is re-engineering possible? Under what conditions can descriptions be generated from existing models? | Only for trivial models, only for particular classes of models? | – |
| What features of models need to be known to determine whether re-engineering is possible? | Internal structure, whether the modeling process has followed particular rules and procedures | – |


Conclusion

Having access to high-quality information about the inner workings of a technical system does not guarantee that you can successfully interact with it. Despite the potentials of using engineering models for operator assistance to enhance situation awareness, it is not trivial to determine which models are useful in a particular operational context. This is because the domain-centered abstractions underlying the models are hard to match with the cognition-centered abstractions used by humans to represent their actions. Due to the variable and individual nature of the latter, such matching cannot be performed algorithmically by inferring the required levels of abstraction from operator questions but must be accomplished by operators. However, an essential precondition for enabling operators to select suitable models is that these models are equipped with descriptions of their contents and capabilities. These descriptions should enable operators to understand whether a model fits their present goals and tasks. The need for understandable model descriptions raises a number of questions and challenges for future interdisciplinary work.

Acknowledgements Parts of this work were funded by the German Research Foundation (DFG) grants for project HyTec (PA 1232/12-3) and Research Training Group CD-CPPS (GRK 2323/1).

References Bamberg, A., Urbas, L., Bröcker, S., Kockmann, N., & Bortz, M. (2020). What makes the digital twin an ingenious companion? Chemie Ingenieur Technik, 92(3), 192–198. https://doi.org/10. 1002/cite.201900168 Bennett, K. B. (2017). Ecological interface design and system safety: One facet of Rasmussen’s legacy. Applied Ergonomics, 59(Part B), 625–636. https://doi.org/10.1016/j.apergo.2015. 08.001. Cruz, K. S., & Pinto, J. (2019). Team focus in focus: Its implications for real teams and their members. Journal of Work and Organizational Psychology, 35(2), 123–133. https://doi.org/10. 5093/jwop2019a14 Dörner, D., & Funke, J. (2017). Complex problem solving: What it is and what it is not. Frontiers in Psychology, 8(1153), 1–11. https://doi.org/10.3389/fpsyg.2017.01153 Drath, R. (2010). Datenaustausch in der Anlagenplanung mit AutomationML. Springer. Endsley, M. R. (1995). Toward a theory of situation awareness in dynamic systems. Human Factors, 37(1), 32–64. Endsley, M. R. (2017). From here to autonomy: Lessons learned from human–automation research. Human Factors, 59(1), 5–27. https://doi.org/10.1177/0018720816681350 Ewell, P. J., Hamilton, J. C., & Guadagno, R. E. (2018). How do videogame players identify their actions? Integrating Action Identification Theory and videogame play via the Behavior Identification Form-Gamer. Computers in Human Behavior, 81, 189–197. https://doi.org/10. 1016/j.chb.2017.12.019 Fischer, A., Greiff, S., & Funke, J. (2012). The process of solving complex problems. Journal of Problem Solving, 4(1), 9–42. https://doi.org/10.7771/1932-6246.1118 Funke, J. (2010). Complex problem solving: A case for complex cognition? Cognitive Processing, 11(2), 133–142. https://doi.org/10.1007/s10339-009-0345-0


Goble, C., Cohen-Boulakia, S., Soiland-Reyes, S., Garijo, D., Gil, Y., Crusoe, M. R., Peters, K., & Schober, D. (2020). FAIR computational workflows. Data Intelligence, 2(1–2), 108–121. https:/ /doi.org/10.1162/dint_a_00033 Hajdukiewicz, J. R., & Vicente, K. J. (2002). Designing for adaptation to novelty and change: Functional information, emergent feature graphics, and higher-level control. Human Factors, 44(4), 592–610. https://doi.org/10.1518/0018720024496980 Hall, T. J., Rudolph, J. W., & Cao, C. G. L. (2006). Fixation and attention allocation in anesthesiology crisis management: An abstraction hierarchy perspective. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (pp. 1064–1067), San Fransisco, CA. Hirsch-Kreinsen, H. (2014). Wandel von Produktionsarbeit - Industrie 4.0. WSI Mitteilungen, 6, 421–429. Hollnagel, E. (2012). Coping with complexity: Past, present and future. Cognition, Technology & Work, 14(3), 199–205. https://doi.org/10.1007/s10111-011-0202-7 Janzen, M. E., & Vicente, K. J. (1998). Attention allocation within the abstraction hierarchy. International Journal of Human-Computer Studies, 48(4), 521–545. https://doi.org/10.1006/ijhc.1997. 0172 Kelly, T., & Weaver, R. (2004). The goal structuring notation–a safety argument notation. In Proceedings of the Dependable Systems and Networks 2004 Workshop on Assurance Cases (pp. 1–6). Lamprecht, A.-L., Garcia, L., Kuzak, M., Martinez, C., Arcila, R., Martin Del Pico, E., CapellaGutierrez, S., et al. (2020). Towards FAIR principles for research software. Data Science, 3(1), 37–59. https://doi.org/10.3233/DS-190026 Lind, M. (1994). Modeling goals and functions of complex industrial plants. Applied Artificial Intelligence, 8(2), 259–283. https://doi.org/10.1080/08839519408945442 McIlroy, R. C., & Stanton, N. A. (2011). Getting past first base: Going all the way with Cognitive Work Analysis. Applied Ergonomics, 42(2), 358–370. https://doi.org/10.1016/j.apergo.2010. 08.006 Meineri, S., & Morineau, T. (2014). How the psychological theory of action identification can offer new advances for research in cognitive engineering. Theoretical Issues in Ergonomics Science, 15(5), 451–463. https://doi.org/10.1080/1463922X.2013.815286 Mizoguchi, R., & Ikeda, M. (1998). Towards ontology engineering. Journal-Japanese Society for Artificial Intelligence, 13, 1–10. Moussaoui, L. S., & Desrichard, O. (2016). Act local but don’t think too global: The impact of ecological goal level on behavior. The Journal of Social Psychology, 156(5), 536–552. https:// doi.org/10.1080/00224545.2015.1135780 Müller, R., Kessler, F., Humphrey, D. W., & Rahm, J. (2021). Data in context: How digital transformation can support human reasoning in cyber-physical production systems. Future Internet, 13(6), 156. https://doi.org/10.3390/fi13060156 Müller, R., & Oehm, L. (2019). Process industries versus discrete processing: How system characteristics affect operator tasks. Cognition, Technology & Work, 21(2), 337–356. https://doi.org/ 10.1007/s10111-018-0511-1 Murillo, A. P. (2020). An examination of scientific data repositories, data reusability, and the incorporation of FAIR. Proceedings of the Association for Information Science and Technology, 57(1), e386. https://doi.org/10.1002/pra2.386 Naikar, N. (2017). Cognitive work analysis: An influential legacy extending beyond human factors and engineering. Applied Ergonomics, 59(Part B), 528–540. https://doi.org/10.1016/j.apergo. 2016.06.001. Naikar, N., Hopcroft, R., & Moylan, A. (2005). 
Work domain analysis: Theoretical concepts and methodology (Technical Report, Issue). Neisser, U. (1976). Cognition and reality: Principles and implications of cognitive psychology. Freeman. Perrow, C. (1984). Normal accidents: Living with high-risk technologies. Basic Books.


Plant, K. L., & Stanton, N. A. (2015). The process of processing: Exploring the validity of Neisser’s perceptual cycle model with accounts from critical decision-making in the cockpit. Ergonomics, 58(6), 909–923. https://doi.org/10.1080/00140139.2014.991765 Rasmussen, J. (1983). Skills, rules, and knowledge; signals, signs, and symbols, and other distinctions in human performance models. IEEE Transactions on Systems, Man and Cybernetics, SMC-13(3), 257–266. https://doi.org/10.1109/TSMC.1983.6313160. Rasmussen, J. (1985). The role of hierarchical knowledge representation in decisionmaking and system management. IEEE Transactions on Systems, Man, and Cybernetics, SMC-15(2), 234– 243. https://doi.org/10.1109/TSMC.1985.6313353. Rasmussen, J. (1986). Information processing and human machine interaction: An approach to cognitive engineering. North-Holland. Schmidt, J., & Müller, R. (in preparation). Disciplinary differences in mental models: How mechanical engineers and automation engineers evaluate machine processes. Sengür, ¸ D., & Turhan, M. (2018). Prediction of the action identification levels of teachers based on organizational commitment and job satisfaction by using k-nearest neighbors method. Turkish Journal of Science and Technology, 13(2), 61–68. Sheridan, T. B. (2011). Adaptive automation, level of automation, allocation authority, supervisory control, and adaptive control: Distinctions and modes of adaptation. IEEE Transactions on Systems, Man, and Cybernetics-Part a: Systems and Humans, 41(4), 662–667. https://doi.org/ 10.1109/TSMCA.2010.2093888 Spriggs, J. (2012). GSN-the goal structuring notation: A structured approach to presenting arguments. Springer Science & Business Media. Stachowiak, H. (1973). Allgemeine modelltheorie. Springer. Urbas, L., & Doherr, F. (2011). AutoHMI: A model driven software engineering approach for HMIs in process industries. In 2011 IEEE International Conference on Computer Science and Automation Engineering (pp. 627–631). IEEE. Uschold, M., & Gruninger, M. (1996). Ontologies: Principles, methods and applications. Knowledge Engineering Review, 11(2), 93–136. https://doi.org/10.1017/S0269888900007797 Vallacher, R. R., & Wegner, D. M. (1985). A theory of action identification. Lawrence Erlbaum Associates. Vicente, K. J. (1999). Cognitive work analysis: Towards safe, productive, and healthy computerbased work. Lawrence Erlbaum Associates. Vicente, K. J. (2002). Ecological interface design: Process and challenges. Human Factors, 44, 62–78. Vicente, K. J., Christoffersen, K., & Pereklita, A. (1995). Supporting operator problem solving through ecological interface design. IEEE Transactions on Systems, Man, and Cybernetics, 25(4), 529–545. https://doi.org/10.1109/21.370186 Vicente, K. J., & Rasmussen, J. (1992). Ecological interface design: Theoretical foundations. IEEE Transactions on Systems, Man, and Cybernetics, 22, 1–18. Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Mons, B., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18

Romy Müller is a Postdoc at the Chair of Engineering Psychology and Applied Cognitive Research of the Technische Universität Dresden. Her research focuses on the psychological mechanisms underlying human–machine interaction in complex industrial systems. In her basic research on decision-making, she investigates how people balance stability and flexibility in action selection. In her applied research, she studies fault diagnosis and the role assistance systems can play in it. As these topics require a genuinely interdisciplinary approach, she cooperates with researchers in computer science, engineering, and vocational education, as well as with industrial partners.


Leon Urbas (IEEE, NAMUR, GMA, GI, processNet, DKE) directs the Chair of Process Control Systems and the Process Systems Engineering Group at Technische Universität Dresden. He is a computational systems engineer by training with a Ph.D. in Operator Trainings Systems and several years of industrial experience. He investigates methods for and opportunities of digital transformations in the process industries, with a research focus on modular plants, the alignment of the life cycles of a real plant and its digital twin, and innovative ways to utilize the real-time integrated network of models, simulations, and data to interface humans with chemical plants.

Part III

Behavioral Cybersecurity

Chapter 6

Behavioral Game Theory in Cyber Security: The Influence of Interdependent Information's Availability on Cyber-Strike and Patching Processes

Zahid Maqbool, V. S. Chandrasekhar Pammi, and Varun Dutt

Abstract Cyber-strikes are on the rise, and the availability of information on the actions of hackers and analysts has the potential to shape decision-making in these situations. However, previous literature has not adequately explored this relationship. This research investigates the impact of the availability of information about the opponent's actions on decision-making in cyber-strike situations. One hundred participants, consisting of 50 pairs of hackers and analysts, were randomly assigned to two conditions (AV and Non-AV) in a repeated Markovian game. In AV, both players had complete information on the opponent's actions and payoffs, while in Non-AV, this information was missing. In the Markovian game, the hacker repeatedly chose between strike and not-strike actions, and the analyst repeatedly chose between patch and not-patch actions. In both the AV and Non-AV conditions, after the analyst's patch actions, the network was randomly in a non-susceptible (ns) state or a susceptible (s) state, with each state occurring 50% of the time. The strike proportions were similar in the ns and s states and similar in the AV and Non-AV conditions. However, the patch proportions were lower in the Non-AV condition than in the AV condition and greater in the ns state than in the s state. Keywords Strike · Patch · Decision-making · Game theory · Opponent's action information · Markovian games · Vulnerabilities

Z. Maqbool Government Degree College Dooru, Jammu & Kashmir, Anantnag 192211, India
V. S. C. Pammi Centre of Behavioral and Cognitive Sciences, University of Allahabad, Allahabad, India e-mail: [email protected]
V. Dutt (B) Applied Cognitive Science Laboratory, Indian Institute of Technology, Mandi 175005, India e-mail: [email protected]


Summary

Cyber-strikes are increasing, and the availability of information about the actions of hackers (people who wage cyber-strikes) and analysts (people who patch vulnerabilities) is likely to influence decision-making in cyber-strike situations. However, the literature has given little attention to how information about an opponent's actions in cyber-strike situations affects decision-making. This research's objective is to address existing gaps in the literature by investigating the effect of the availability of information about the opponent's actions on adversary (hacker) and patcher (analyst) decisions in Markovian security games. One hundred participants (i.e., 50 pairs of hackers and analysts) were assigned randomly to two between-subjects conditions in a repeated Markovian game: AV and Non-AV. In AV, each player had complete information regarding the opponent's actions and payoffs; this information was missing in the Non-AV condition. In the Markovian game, the hacker repeatedly chose between strike and not-strike actions, and the analyst repeatedly chose between patch and not-patch actions. In both the AV and Non-AV conditions, after the analyst's patch actions, the network was in a non-susceptible (ns) state on 50% of the rounds and in a susceptible (s) state on 50% of the rounds. The strike proportions were similar in the ns and s states and similar in the AV and Non-AV conditions. However, the patch proportions were lower in the Non-AV condition than in the AV condition and greater in the ns state than in the s state. Our findings have significant implications in the real world. First, we expect analysts to overly patch computer systems in the real world, regardless of whether information about opponents is available or not. Second, hackers do seem to care whether computer systems are susceptible or not when striking networks, at least when interdependent information is available. As a result, a hacker's perception of susceptibility or non-susceptibility is likely to influence his or her decision to launch a cyber-strike.

Introduction

Cyber-strikes and unauthorized access to computer networks, programs, and data have become a concern, especially with the exponential rise in the number of connected devices and their widespread use in all sectors (Roy et al., 2010). Hackers adopt different strategies to penetrate an organization or a network, and for this they may need information about the different security protocols in place inside the organization (Chai, 2021). Decision-making related to the security of a system is connected to information, and studies of information have a variety of applications in the fields of cyber-security, economics, and artificial intelligence (Feltham, 1968). However, a decision maker is frequently faced with incomplete or inaccurate information (McEneaney & Singh). A typical cyber-security environment involves hackers who have malicious intent towards the infrastructure of a business or an organization and security analysts involved in making the organization's infrastructure


non-susceptible to any security threat (Alpcan & Basar, 2006). In such a scenario, hackers typically do not have any information about the target organization's security mechanisms, although they can use technology and social-engineering techniques to obtain it. Likewise, analysts usually do not have any information on the modus operandi of hackers, although they can use technological interventions like intrusion detection systems or honeypots to gain relevant information (Aggarwal & Dutt, 2020; Maqbool et al., 2016). However, very little is known about the impact of such information on the decision-making of hackers and analysts while they interact.

The literature has used game theory to study and model various aspects of the interaction between hackers and an organization's security mechanism (security analysts). More specifically, these studies have modeled the interaction between hackers and analysts using non-cooperative 2 × 2 game designs. Most of these studies have assumed complete information about possible actions and the resulting outcomes to be available to both players (Alpcan & Basar, 2006). Interdependent information has been regarded as a critical aspect in the emergence of cooperation in these games (Alpcan & Basar, 2006; De Dreu et al., 2008; Grimes, 2019; Kloosterman, 2015). However, to the best of our knowledge, no study has investigated the impact of the availability or non-availability of interdependent information on hackers' and analysts' strike-and-patch decisions in computer networks. In the current study, we varied interdependent information across two between-subjects conditions: in one condition, all interdependent information was available; in the other condition, it was absent.

The purpose of this study was to test whether hackers' and analysts' strike-and-patch decisions are affected by the availability or non-availability of interdependent information. To test our hypothesis, we used experiments involving Markovian games (Arora & Dutt, 2013; Dutt et al., 2013; Gonzalez & Dutt, 2011; Maqbool et al., 2016, 2017) with assumptions determined by behavioral and other cognitive theories in games (Arora & Dutt, 2013; Camerer, 2011; Dutt et al., 2013; Gonzalez & Dutt, 2011; Gonzalez et al., 2003). Markovian games (Alpcan & Basar, 2006) can be used to investigate the impact of the presence or absence of interdependent information on hackers' and analysts' strike-and-patch decisions. Markovian games allow human participants performing as hackers and analysts to simulate their interaction (strike or not-strike actions for hackers; patch or not-patch actions for analysts) and obtain payoffs (rewards as well as penalties) for their actions. This interaction between hackers and analysts, mediated through each other's actions, could be repeated multiple times. According to the Markovian assumption, the analyst's most recent patch operation may impact the network's vulnerability to cyber-strikes in its current state. Thus, in a Markovian game, the system can result in a network that is either immune to cyber-strikes or susceptible to them.

Kloosterman (2015) has investigated the impact of varying interdependent information and has concluded that the equilibrium payoffs for the set of perfectly symmetric or complete information sub-games are contained in the equilibrium


payoffs for the set of asymmetric or partial information Markovian games. Furthermore, humans tend to have cognitive and motivational biases in paying attention to information and making decisions based on the available information (De Dreu et al., 2008; Ocasio, 2011; Tversky & Kahneman, 1974). Xiaolin et al. (2008) used Markovian games to investigate the information available for risk assessment in network information systems. According to Xiaolin et al. (2008), if vulnerabilities in the network are not repaired well in time, cyber-strikes will cause damage that becomes more severe as the strike spreads. Conversely, when analysts repaired network vulnerabilities quickly, network damages were smaller. These findings agree with Markovian game dynamics, which assume that the network reacts based on the analyst's most recent actions. Xiaolin et al. (2008) used a mathematical simulation technique to derive predictions regarding Nash equilibria; however, the literature does not yet include studies evaluating human decisions against the Nash predictions. In this chapter, we address this gap in the literature and study how the presence or absence of interdependent information in Markovian games affects analysts' and hackers' decisions. Ben-Asher and Gonzalez (2015), in their study on intrusion detection systems, concluded that more knowledge about cyber security allowed analysts to identify malicious events more accurately and to misclassify fewer benign events as malicious. Likewise, more knowledge on different kinds of strikes and different vulnerabilities might help the hacker go unnoticed while striking systems or organizations.

In the literature on cognition, a decision-making theory, Instance-based Learning Theory or IBLT (Arora & Dutt, 2013; Dutt et al., 2013; Gonzalez & Dutt, 2011; Gonzalez et al., 2003), is effective in capturing the decisions made by parties who act as hackers and analysts (Dutt et al., 2013; Gonzalez & Dutt, 2011; Gonzalez et al., 2003; Lejarraga et al., 2012). IBLT suggests that individuals learn by accumulating, recognizing, and refining instances that contain information about a particular decision-making situation. IBLT defines five learning mechanisms that are central to dynamic decision making: (1) instance-based knowledge: knowledge from past experiences is accumulated in instances; (2) recognition-based retrieval: instances are recalled and retrieved from memory depending on their similarity to the current decision-making situation; (3) adaptive heuristics: if no similar instances exist, new instances are created in memory; (4) necessity: keeping an eye on alternatives if the best option does not fit the situation; and (5) feedback: updating the utility of instances in light of new outcomes. In this chapter, the IBLT framework is used to derive expectations for participants' decisions in Markovian games, where participants perform as hackers and analysts. As per IBLT, human hackers are cognitively limited in recalling and remembering information. They make decisions based on the recency and frequency of the available information. Applying IBLT to analysts' and hackers' experience in Markovian games will help us better understand how the presence or absence of interdependent information impacts decisions related to the patching process. This may also help us explain divergences of human decisions from Nash predictions in terms of cognitive mechanisms of memory and recall.
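To make the IBLT mechanisms sketched above more concrete, the following toy implementation stores instances of (round, state, action, payoff) and chooses the action with the higher recency-weighted average payoff. The decay value, the payoffs, and the omission of blending and noise from the full IBLT formulation are simplifying assumptions made for illustration only; this is not the model used in this chapter.

```python
import random

# A heavily simplified instance-based chooser for the hacker role.
# Past instances are (round, state, action, payoff); actions are scored by a
# recency-weighted average of past payoffs, so recent and frequent outcomes
# dominate. The decay constant and payoffs below are illustrative assumptions.
DECAY = 0.9

class SimpleIBL:
    def __init__(self, actions=("strike", "not_strike")):
        self.actions = actions
        self.memory = []  # accumulated instances

    def _score(self, state, action, current_round):
        weights, payoffs = [], []
        for rnd, st, act, pay in self.memory:
            if st == state and act == action:
                weights.append(DECAY ** (current_round - rnd))  # recency weighting
                payoffs.append(pay)
        if not weights:
            return 0.0  # unexplored actions start from a neutral score
        return sum(w * p for w, p in zip(weights, payoffs)) / sum(weights)

    def choose(self, state, current_round):
        scores = {a: self._score(state, a, current_round) for a in self.actions}
        best = max(scores.values())
        return random.choice([a for a, s in scores.items() if s == best])

    def learn(self, rnd, state, action, payoff):
        self.memory.append((rnd, state, action, payoff))

hacker = SimpleIBL()
hacker.learn(1, "s", "strike", 5)    # an early successful strike
hacker.learn(2, "s", "strike", -5)   # a recent strike that was caught
print(hacker.choose("s", 3))         # -> 'not_strike': the recent penalty dominates
```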

6 Behavioral Game Theory in Cyber Security: The Influence …

95

The following section presents the Markovian game and the Nash predictions for the patch and strike proportions. Next, the hypotheses based on IBLT and their testing in a Markovian game are detailed. Lastly, results and their potential implications are discussed for real-world scenarios.

The Markovian Game Figure 6.1 illustrates the Markovian game (Alpcan & Ba¸sar, 2010; Xiaolin et al., 2008). In this game, two people have involved: a hacker and an analyst. The game consists of several consecutive rounds, and both players aim to maximize their rewards. Each player has two actions available: the hacker may strike (str) or notstrike (nstr), while the security analyst may patch (p) or not-patch (np). The strike action corresponds to a hacker gaining access to a network and striking it; the patch action corresponds to an analyst patching a machine to protect it. A participant is randomly assigned the hacker or analyst role when playing the game. Figure 6.1a demonstrates two network states for an action set that a hacker or analyst can take, susceptible (s) and not-susceptible (ns). When a network is in a susceptible state (s), a hacker will likely access it. However, that probability is extremely low when a network is in a non-susceptible state (ns). The analyst’s last action (p or np) determines the transition to the s and ns states. Suppose, for example, that the analyst begins patching computers on the network at time t. In that case, this patching may increase the network’s likelihood of being in the ns state in round t + 1. In contrast, the failure to patch increases the probability of the network in the s state in round t + 1. The transitions from state s to state ns or vice versa depend on the transition probability from the state ns to s (= 0.5) and the transition probability from state s to ns (= 0.5). The following Markovian process determines the probability value of each state in round t: P(t) = K ( p or np) ∗ P(t − 1)

(6.1)

where K( p or np) refers to the state-transition (st) matrix (Fig. 6.1a shows the st matrices for analyst actions), P(t) and P(t − 1) refer to state probabilities in state s and ns in rounds t and t − 1, respectively. At the games’ start, the state probability of s or ns is equally likely (= 0.5):  P(1) =

0.5 0.5

 (6.2)

The 0.5 values correspond to the state probability of the s state (first row) and the ns state (second row), respectively. Because of the hacker’s and analyst’s action combination, the payoffs for each state s and ns are different (as shown in Fig. 6.1b).

96

Z. Maqbool et al.

Fig. 6.1 A Markovian game’s payoff and st matrices. a The st matrices depict transitions from non-susceptible (ns) to susceptible (s) states. b The payoffs for the ns and s states for analyst and hackers

These payoffs constitute a sum-to-zero game in which the gains for hackers (analysts) are more significant when the state is s (ns). For example, in the state s matrix, an a and p action combination results in a 5-point reward and a 5-point penalty for the analyst and hacker, respectively (the analyst catches the hacker striking). Similarly, the payoff for other combinations of actions in Fig. 6.1b could be derived. When the matrices in states s and ns are compared, we observe that state ns has lower (higher) penalties and more significant (lower) benefits for the analysts (hackers) compared to the state s. The mixed strategy Nash equilibria in the Markovian game was computed. Let x denote the strike proportion, and 1 − x denote the not-strike proportion. Similarly, let y denote the patch proportion and 1 − y denote the not-patch proportion. At equilibrium, the hackers and analysts will become uninterested in payoffs from different actions. As a result, we obtain the Nash action proportions as shown below:

6 Behavioral Game Theory in Cyber Security: The Influence …

97

For the s state: − 2 ∗ (1 − x) + 3 ∗ x = +0 − 11 ∗ x and + 11 ∗ (1 − y) − 3 ∗ y = +0 + 2 ∗ y 1 11 ⇒ x = (= 0.125) and y = (6.3) (= 0.687) 8 16 For the ns state: 5 ∗ x − 1 ∗ (1 − x) = −10 ∗ x 1 ⇒x= (= 0.062) and y = 16

+ 0 and − 5 ∗ y + 10 ∗ (1 − y) = 1 ∗ y+ 5 (6.4) (= 0.625) 8

This study compared these Nash actions to participant actions (reported ahead).

Expectations in the Markovian Game In the Markovian game described above, the rewards and penalties in the AV (where the interdependent information is available) and Non-AV (where the interdependent information is not available) patching cases are the same (see Fig. 6.1b). The payoffs in these matrices are of similar magnitudes. As a result, the payoff for hackers and analysts is comparable across both matrices with similar valence. As per IBLT, people tend to maximize their experienced payoffs in terms of blended values across actions (Arora & Dutt, 2013; Dutt et al., 2013; Gonzalez & Dutt, 2011; Gonzalez et al., 2003). Since hacker and analyst participants would experience comparable payoffs across AV and Non-AV scenarios, they would likely possess comparable blended values across both situations. Consequently, we expect a comparable proportion of strike and patch actions across both AV and Non-AV information conditions. Furthermore, IBLT suggests that due to memory and recall limitations, human decisions are unlikely to follow Nash proportions. Humans tend to make decisions based on the outcome of the most recent situation and the frequency of these outcomes. This tendency to base decisions on recent and frequent outcomes would make it difficult for participants to estimate the optimal blended values based on their actions. The Nash proportions will reveal the optimal strike and patch proportions. These assumptions will be tested using a lab-based experiment described ahead.

98

Z. Maqbool et al.

Experiment This section reports a laboratory experiment in which participants performed as hackers and analysts (Fig. 6.1). The experiments aimed to determine how availability (AV) and non-availability (Non-AV) of interdependent information affects the hacker and analyst decisions.

Experimental Design A total of 100 people participated in this study. Participants were assigned randomly to one of the two between-subjects conditions. One condition was AV (N = 50) and the other was Non-AV (N = 50). In each condition, twenty-five participants performed as hackers; whereas, twenty-five participants performed as analysts. We ran 50 rounds of each condition, simulating real-time hacker-analyst interactions. To put our expectations to the test, we compared the strike and patch proportions of the participants in different conditions. Furthermore, we compared the average strike and patch proportions in five-blocks of ten-rounds each across conditions and states. In addition to this, participant proportions were compared to corresponding Nash proportions (derived from Eqs. 6.3 and 6.4). To test our expectations, we used mixed-factorial ANOVAs. Post-hoc tests were performed to compare the participant and Nash proportions across different conditions and states. For all comparisons, a 0.05 alpha level and a 0.8 power level were utilized.

Respondents Eighty-two percent of those who took part in this experiment were males. Participants’ age ranged between 18 and 30 (Mean = 22 years, SD = 1.9 years). The majority of the participants were undergraduate students (70%). All the participants were from STE < background. Forty-eight percent of participants possessed computer engineering degrees; 36% percent had a background in electrical engineering; 12% studied mechanical engineering; and, 4% majored in basic sciences. Participants in this Markovian game were asked to get as much payoff as possible and they were rewarded based on payoff obtained in the game. To calculate the monetary reward, a participant’s final game score was converted to real money in the following ratio: INR 1.0 equaled 55 points. On average, the game took 20 min to complete.

6 Behavioral Game Theory in Cyber Security: The Influence …

99

Procedure An email advertisement was circulated to recruit participants. Participation was not compulsory, and participants agreed via written approval before beginning the experiment. The ethics committee approved the study. Participants were given instructions about the task at hand and its goals (to maximize their total payoff). Payoff matrices and the range of possible actions were explained to the participants as part of instructions. Before the study began, all queries in the instructions were answered. Participants had complete knowledge about the actions and payoffs for them and their partners in the AV condition (payoff matrixes were revealed to both players). At the same time, this information was absent in the Non-AV condition. In addition to this in the AV condition, both participants received feedback about their partner’s actions and payoffs; however, this feedback was absent in the Non-AV condition. Participants were paid their monetary compensation after the study ended.

Results The Proportion of Strike and Patch Actions in AV and Non-AV Conditions The strike and patch proportions were computed in the AV and Non-AV conditions (see Fig. 6.2). The strike action proportions were no different between the AV and Non-AV conditions (0.364~0.36; F (1, 96) = 0.28, p = 0.59, ï2 = 0.003). However, the proportion of patch actions between the AV and Non-AV conditions were significantly different (0.67 > 0.61; F (1, 96) = 12.55, p < 0.05, ï2 = 0.12). These results, overall, are in line with our hypotheses.

The Proportion of Strike and Patch Actions in s and ns States The strike and patch proportions across network states was analyzed. Figure 6.3 shows no significant difference in the strike action proportions in the s state versus the ns state (0.36~0.32; F (1, 96) = 1.02, p = 0.32, ï2 = 0.011). Similarly, the difference in the patch action proportions between the s and ns states was also not statistically significant (0.59~0.62; F (1, 96) = 0.64, p = 0.43, ï2 = 0.007). Following that, we compared the participants’ strike and patch proportions to the Nash action proportions in the two s and ns states. The strike action proportions differed significantly from their Nash action proportions in the s and ns states (state s: F (1, 98) = 71.22, p < 0.05, ï2 = 0.421; state ns: F (1, 98) = 73.13, p < 0.05, ï2 = 0.427). Furthermore, in the s state, patch proportions differed from their Nash proportions (F (1, 98) = 10.47, p < 0.05, ï2 = 0.097). However, in the ns state, the patch proportions were

100

Z. Maqbool et al.

Fig. 6.2 The strike and patch proportions across the AV and Non-AV conditions

no different from their Nash proportion (F (1, 98) = 0.15, p = 0.70, ï2 = 0.002). Overall, these results are in line with our hypotheses.

Fig. 6.3 The strike and patch proportions in the s and ns states

6 Behavioral Game Theory in Cyber Security: The Influence …

101

Strike and Patch Proportions in AV and Non-AV Conditions and s and ns States In addition, the strike and patch proportions were computed in the AV and Non-AV conditions and s and ns states (see Fig. 6.4). As shown in Fig. 6.4, the interaction effect between conditions and states had no effect on the strike proportions performed by hackers (F (1, 98) = 2.48, p = 0.12, ï2 = 0.025). However, the interaction between conditions and states has a significant effect on the patch proportions (F (1, 98) = 4.58, p < 0.05, ï2 = 0.046). Overall, the results are in line with our hypotheses. Following that, we compared the strike and patch proportions to the Nash proportions. Across all conditions and states, the strike proportions performed by hackers differed significantly from their Nash proportions (AV and state s: F (1, 48) = 42.46, p < 0.05, ï2 = 0.47; AV and state ns: F (1, 48) = 24.70, p < 0.05, ï2 = 0.34); Non-AV and state s: F (1, 48) = 30.93, p < 0.05, ï2 = 0.39; and, Non-AV and state ns: F (1, 98) = 55.28, p < 0.05, ï2 = 0.54). Thus, for hackers, these results agree with the hypotheses. For analysts, in case of AV condition the patch proportions were not significantly different from their Nash proportions across both s and ns states (AV and state s: F (1, 48) = 0.0, p = 0.98, ï2 = 0.000; AV and state ns: F (1, 48) = 0.22, p < 0.64, ï2 = 0.005). Similarly in case of Non-AV condition, the patch proportions were not significantly different from their Nash action proportions in the ns state (Non-AV and state ns: F (1, 48) = 1.43, p = 0.23, ï2 = 0.029); however, this difference was significant in the s state (Non-AV and state s: F (1, 48) = 23.91, p < 0.05, ïp2 = 0.333). Overall, these results are consistent with the hypotheses in the ns states, but not in the s state.

Fig. 6.4 The strike/patch proportions across the information conditions and network states

102

Z. Maqbool et al.

Strike and Patch Proportions Over Blocks Next, we calculated the average strike and patch proportions across five blocks of ten rounds each across states (s and ns) and conditions. Figures 6.5a and b show the strike action proportions across 5-blocks in AV and Non-AV conditions. We compared the strike proportions across blocks between s and ns states. No significant difference was found in the strike proportions across blocks in the s and ns states in the AV condition (F (4, 192) = 0.19, p = 0.95, ï2 = 0.004). Again, there was no significant difference in the strike proportions across blocks in the s and ns states in the Non-AV condition (F (4, 192) = 0.37, p = 0.83, ï2 = 0.04). Overall, these results are also consistent with our hypotheses.

Fig. 6.5 Strike action proportions across blocks. a AV condition and b Non-AV condition

6 Behavioral Game Theory in Cyber Security: The Influence …

103

Figure 6.6 shows the patch proportions across 5-blocks in AV condition (Fig. 6.6a) and Non-AV condition (Fig. 6.6b). We compared patch proportions across blocks between the s and ns states. There was no significant difference in the patch proportions across blocks in the s and ns states in the AV condition (F (4, 192) = 0.59, p = 0.67, ï2 = 0.012). Again, there was no significant difference in the patch proportion across blocks in the s and ns states in the Non-AV condition (F (4, 192) = 1.25, p = 0.29, ï2 = 0.024). Overall, these results are also consistent with our hypotheses.

Fig. 6.6 Proportion of patch actions across blocks. a AV condition and b Non-AV condition

104

Z. Maqbool et al.

Discussion and Conclusion With technological advancements, cyberspace has become an essential part of people’s lives, making lives easy from online reservations to drone technology (Humayed et al., 2017). At the same time, this cyberspace is prone to many strikes and can bring a lot of misery to life. Thus, there is an immediate need to secure our digital infrastructure. Information about hackers’ previous decisions and the outcomes of those decisions might help organizations secure their infrastructure (Abramson et al., 2005). Similarly, any information regarding the in-place security protocols of a target organization might help the hackers wage an appropriate cyber-strike type (Abramson et al., 2005). However, in real world, attackers may not possess any information regarding an organization’s security protocols and analysts may not possess any information regarding the actions of hackers and their outcomes (Aggarwal & Dutt, 2020). Via a human experiment, this paper found the impact of interdependent information on hackers’ and analysts’ decision-making. Results showed that the strike and patch proportions differed in the AV and Non-AV conditions. Hackers deviated from Nash equilibrium across conditions and states, however, in some instances analysts did deviate from their optimal behavior. These results are explained based upon assumptions of cognitive theories in games (Arora & Dutt, 2013; Camerer, 2011; Dutt et al., 2013; Gonzalez & Dutt, 2011; Gonzalez et al., 2003). First, we observed that the strike and patch proportions were no different across the two information conditions. A possible explanation for this result could be that the payoffs used in this experiment possessed similar magnitude and valance across both the s and ns states. As previously stated, according to IBLT, people tend to make those actions that lead to maximization of their perceived payoffs in any decision-making situations. This is done by calculating the blended value for each available action and selecting one with the highest blended value (Arora & Dutt, 2013; Dutt et al., 2013; Gonzalez & Dutt, 2011; Gonzalez et al., 2003). Hacker and analyst participants likely perceived the payoffs to be similar across both information conditions because they faced comparable payoffs in both these conditions. In future research, it would be interesting to consider making payoffs more deviant between information conditions. Second, the strike and patch proportions differed from their Nash proportions. This difference is also explainable by IBLT. IBLT argues that humans are limited by memory and recall processes and make repeated decisions based on recency and frequency of outcomes (Dutt et al., 2013; Gonzalez & Dutt, 2011). As it turns out, this excessive reliance on recency and frequency by the participants did not allow them to estimate optimal Nash actions. It caused them to differ from their Nash proportions. Third, results showed the strike and patch proportions to be no different between susceptible and non-susceptible states. However, we also observe that patch actions are higher in both AV and Non-AV information conditions in the non-susceptible state than the susceptible state. It seems that higher payoffs associated with the ns state are causing the analyst to patch more and increase their proportion of patch

6 Behavioral Game Theory in Cyber Security: The Influence …

105

actions towards their optimal Nash proportions. A probable reason could be the resemblance in strike proportions between the s and ns states. When patching is done of computer systems, there is a good chance that the network becomes nonsusceptible, and hackers are penalized for their network strikes. Furthermore, in the non-susceptible state, the penalty for being caught while striking is much higher than in the susceptible state. Since, participants were asked to get as much payoff as possible; according to BGT, they would tend to decrease actions that would result in lower payoffs. In conclusion, due to excessive losses in the susceptible state, hackers are likely to reduce their strike proportions in the susceptible state compared to the non-susceptible state. Our findings also revealed that in the non-susceptible state, analysts had similar patching to their Nash proportions, despite our expectations that they would differ. One possible explanation for this result is that the payoffs in the non-susceptible state are higher than the susceptible state. Excessive patching causes the system to be in a non-susceptible state, thus causing maximization of payoffs. As analysts exhibited high patching proportions across both states, their action proportions seem to agree with the Nash action proportions in the non-susceptible state. A human experiment involving simple Markovian games was conducted in this chapter. Despite the differences between laboratory and real-world conditions, our findings could have significant implications in the real world. The analysis of personal behavior and profiles of hackers and analysts can be applied to AI-human system interaction by utilizing psychological and sociological insights to better understand how humans interact with technology and how AI systems can be designed to be more intuitive and effective. For example, analyzing the profiles of successful hackers can reveal common traits such as high levels of creativity, problem-solving skills, and attention to detail, which can inform the design of AI systems that are better able to identify and mitigate security threats. Similarly, analyzing the profiles of successful analysts can reveal key traits such as strong communication and collaboration skills, which can inform the design of AI systems that are better able to facilitate humanto-human communication and collaboration. Ultimately, this type of analysis can lead to the development of AI systems that are more human-centered and better suited to the needs and abilities of their users. Based on our findings, we expect analysts to continue to overly patch computers in the real world, regardless of the availability or non-availability of information, because interdependent information may not be available in real-world scenarios. Second, it appears that hackers do seem to care whether computer systems are susceptible or not when striking networks when interdependent information is available. As a result, a hacker’s perception of susceptibility or non-susceptibility is likely to influence his or her decision to launch a cyber-strike. It might be necessary to release less data into the public domain and portray computer networks as less susceptible to cyber-strikes in the real world.

106

Z. Maqbool et al.

Authors’ Note This work was supported by the Department of Science and Technology (DST), Government of India, under the grant to Dr. Varun Dutt and Dr. V. S. Chandrasekhar Pammi (IITM/DST-ICPS/VD/251). Besides, we are grateful to the Indian Institute of Technology Mandi for providing the required resource for this project.

References Abramson, C., Currim, I. S., & Sarin, R. (2005). An experimental investigation of the impact of information on competitive decision making. Management Science, 51(2), 195–207. https://doi. org/10.1287/mnsc.1040.0318 Aggarwal, P., & Dutt, V. (2020). The role of information about opponent’s actions and intrusiondetection alerts on cyber decisions in cyber security games. Cyber Security: A Peer-Reviewed Journal, 3(4), 363–378. https://www.ingentaconnect.com/content/hsp/jcs/2020/00000003/000 00004/art00008. Alpcan, T., & Basar, T. (2006). An intrusion detection game with limited observations. In 12th International Symposium on Dynamic Games and Applications, Sophia Antipolis, France. Alpcan, T., & Ba¸sar, T. (2010). Network security: A decision and game-theoretic approach. Cambridge University Press. Arora, A., & Dutt, V. (2013). Cyber security: evaluating the effects of attack strategy and base rate through instance based learning. In 12th International Conference on Cognitive Modeling, Ottawa, Canada. Ben-Asher, N., & Gonzalez, C. (2015). Effects of cyber security knowledge on attack detection. Computers in Human Behavior, 48, 51–61. Camerer, C. F. (2011). Behavioral game theory: Experiments in strategic interaction. Princeton university press. Chai, W. (2021). Hacker. https://www.techtarget.com/searchsecurity/definition/hacker. De Dreu, C. K. W., Nijstad, B. A., & van Knippenberg, D. (2008). Motivated information processing in group judgment and decision making. Personality and Social Psychology Review, 12(1), 22–49. https://doi.org/10.1177/1088868307304092 Dutt, V., Ahn, Y.-S., & Gonzalez, C. (2013). Cyber situation awareness: Modeling detection of cyber attacks with instance-based learning theory. Human Factors, 55(3), 605–618. https://doi. org/10.1177/0018720812464045 Feltham, G. A. (1968). The value of information. The Accounting Review, 43(4), 684–696. http:// www.jstor.org/stable/243630. Gonzalez, C., & Dutt, V. (2011). Instance-based learning: Integrating sampling and repeated decisions from experience. Psychological Review, 118(4), 523–551. https://doi.org/10.1037/a00 24558 Gonzalez, C., Lerch, J. F., & Lebiere, C. (2003). Instance-based learning in dynamic decision making. Cognitive Science, 27(4), 591–635. https://doi.org/10.1207/s15516709cog2704_2. Grimes, R. A. (2019). 8 ways your patch management policy is broken (and how to fix it). https:// www.csoonline.com/article/3025807/why-patching-is-still-a-problem-and-how-to-fix-it.html. Humayed, A., Lin, J., Li, F., & Luo, B. (2017). Cyber-physical systems security—A survey. IEEE Internet of Things Journal, 4(6), 1802–1831. https://doi.org/10.1109/JIOT.2017.2703172 Kloosterman, A. (2015). Public information in Markov games. Journal of Economic Theory, 157, 28–48. https://doi.org/10.1016/j.jet.2014.11.018. Lejarraga, T., Dutt, V., & Gonzalez, C. (2012). Instance-based learning: A general model of repeated binary choice. Journal of Behavioral Decision Making, 25(2), 143–153.

6 Behavioral Game Theory in Cyber Security: The Influence …

107

Maqbool, Z., Pammi, V. S. C., & Dutt, V. (2016). Cybersecurity: Effect of information availability in security games. In 2016 International Conference on Cyber Situational Awareness, Data Analytics And Assessment (CyberSA). Maqbool, Z., Makhijani, N., Pammi, V. S. C., & Dutt, V. (2017). Effects of motivation: Rewarding hackers for undetected attacks cause analysts to perform poorly. Human Factors, 59(3), 420–431. https://doi.org/10.1177/0018720816681888 McEneaney, W., & Singh, R. Unmanned vehicle decision making under imperfect information in an adversarial environment. In AIAA Guidance, Navigation, and Control Conference and Exhibit. https://doi.org/10.2514/6.2004-5249. Ocasio, W. (2011). Attention to attention. Organization Science, 22(5), 1286–1296. https://doi.org/ 10.1287/orsc.1100.0602 Roy, S., Ellis, C., Shiva, S., Dasgupta, D., Shandilya, V., & Wu, Q. (2010). A survey of game theory as applied to network security. In 2010 43rd Hawaii International Conference on System Sciences. Tversky, A., & Kahneman, D. (1974). Judgment under Uncertainty: Heuristics and Biases. Science, 185(4157), 1124–1131. https://doi.org/10.1126/science.185.4157.1124. Xiaolin, C., Xiaobin, T., Yong, Z., & Hongsheng, X. (2008). A Markov game theory-based risk assessment model for network information system. In 2008 International Conference on Computer Science and Software Engineering.

Zahid Maqbool is an Assistant Professor at the Department of Computer Sciences, Government Degree College Dooru, UT of J&K, India and a former doctoral student at Applied Cognitive Science Laboratory, School of Computing and Electrical Engineering, Indian Institute of Technology Mandi. His research interests involve cognitive modelling, decision making and the use of behavioral game theory in cyber-security. V. S. Chandrasekhar Pammi was a Professor at the Centre of Behavioral and Cognitive Sciences, University of Allahabad, India. He was working on cognitive and computational neuroscience studies of decision making, sequential skill learning, cross-modal integration, and spatial navigation in built environments. Prof. Pammi was a member of the editorial board of various journals, including Frontiers in Decision Neuroscience (Neuroscience and Psychology), Frontiers in Cognitive Science (Psychology), Frontiers in Movement Science and Sport Psychology (Psychology). Varun Dutt works as an Associate Professor in the School of Computing and Electrical Engineering and School of Humanities and Social Sciences at Indian Institute of Technology Mandi. Dr. Dutt has applied his knowledge and skills in the fields of psychology, public policy, and computer science to explore how humans make decisions on social, managerial, and environmental issues. Dr. Dutt has used lab-based methods involving experiments with human participants and cognitive models to investigate his research questions. He serves as an Associate Editor of Frontiers in Cognitive Science (Psychology) journal.

Chapter 7

Exploring Cybercriminal Activities, Behaviors, and Profiles Maria Bada and Jason R. C. Nurse

Abstract While modern society benefits from a range of technological advancements, it also is exposed to an ever-increasing set of cybersecurity threats. These affect all areas of life including business, government, and individuals. To complement technology solutions to this problem, it is crucial to understand more about cybercriminal perpetrators themselves, their use of technology, psychological aspects, and profiles. This is a topic that has received little socio-technical research emphasis in the technology community, has few concrete research findings, and is thus a prime area for development. The aim of this article is to explore cybercriminal activities and behavior from a psychology and human aspects perspective, through a series of notable case studies. We examine motivations, psychological and other interdisciplinary concepts as they may impact/influence cybercriminal activities. We expect this paper to be of value and particularly insightful for those studying technology, psychology, and criminology, with a focus on cybersecurity and cybercrime. Keywords Cybersecurity · Cyberpsychology · Cognition · Human aspects · Cybercrime · Cybercriminal · Online offender · Behavior

Introduction Cybercrime has grown substantially in the last 18 months, and has impacted businesses, members of the public, and governments alike. While the trajectory of cyberattacks has been on the rise for a number of years, the increased digitization that has emerged as a result of COVID-19 (SARS-CoV-2), the stress and uncertainty caused M. Bada (B) School of Biological and Behavioural Sciences, Queen Mary University of London, London E1 4NS, UK e-mail: [email protected] J. R. C. Nurse School of Computing, University of Kent, Canterbury, Kent CT2 7NF, UK e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Mukherjee et al. (eds.), Applied Cognitive Science and Technology, https://doi.org/10.1007/978-981-99-3966-4_7

109

110

M. Bada and J. R. C. Nurse

in the population because of the pandemic, and the general challenges to securing remote workforces, has led to significant issues of online crime (Lallie et al., 2021; Nurse et al., 2021). One study has reported that cybercrime has increased 600% due to the COVID-19 pandemic (PurpleSec, 2021) and in some countries (e.g., the UK) this rise has led to record numbers of attacks faced by the society (Sabbagh, 2020). International and regional policing organizations (e.g., Interpol and Europol) have thus warned businesses and individuals about these attacks, and released guidance on staying safe and boosting threat response and cyber hygiene. To understand the nature of cybercrime, it is imperative to examine the threat actors or offenders behind the crimes, what motivates them, their behaviors, and profiles. This area of research has often been referred to as cybercriminal (or online offender) understanding or profiling, and tends to mirror the offline, and more traditional action of criminal profiling (Turvey, 2011). In this regard, and extending upon prior works (Bada & Nurse, 2021; Jahankhani & Al-Nemrat, 2012; Warikoo, 2014), we consider cybercriminal profiling/understanding to generally be an educated attempt to define information about the person who committed a cybercrime, which considers their characteristics, patterns, or other factors of uniqueness. While there has been an increasing amount of research in the cybercriminal space, this topic that has received little socio-technical research emphasis in the technology community has few concrete research findings, and is thus a prime area for development. Bada and Nurse (2021) summarized the outstanding challenges with specific mention of the need to explore the actions and personality traits apparent from certain online criminal behaviors, a factor also driven by the lack of studies drawing on actual data linked to behavioral profiles. The aim of this article therefore is to investigate cybercriminal activities and behavior from a socio-technical (psychology and human aspects) perspective, by reflecting on the state of the art as well as a series of notable cybercriminal case studies. This work considers the motivations of online offenders, and psychological and other interdisciplinary concepts as they may impact/influence cybercriminal actions. The remainder of this contribution is as follows. Section “The Threat of Cybercrime: Actions and Actors” reflects on the threat of cybercrime more broadly, and outlines the main types of attack and the traditional threat actors commonly discussed in research and practice. Section “Cybercriminal Case Studies” examines several cybercriminal cases in detail drawing on cyberpsychology, cognition, human aspects, and cybersecurity research, to identify characteristics and profiles of offenders. Finally, section “Discussion and Conclusion” concludes this article and highlights key aspects when exploring cybercriminal activities, behaviors, and profiles.

The Threat of Cybercrime: Actions and Actors Cybercrime is often used in the media and in research to refer to a range of crimes conducted in the online (or cyber) space. The reality, however, is that these crimes are extremely varied. The UK’s Crown Prosecution Service (CPS) deconstructs cyber-

7 Exploring Cybercriminal Activities, Behaviors, and Profiles

111

crimes in two primary types: Cyber-dependent crimes and Cyber-enabled crimes. Cyber-dependent crimes are “crimes that can be committed only through the use of Information and Communications Technology (‘ICT’) devices, where the devices are both the tool for committing the crime, and the target of the crime (e.g., developing and propagating malware for financial gain, hacking to steal, damage, distort or destroy data and/or network or activity)” (The Crown Prosecution Service, 2019). These include hacking, causing disruption due to malware, and the use of botnets for service disruption. Alternately, Cyber-enabled crimes are “traditional crimes which can be increased in scale or reach by the use of computers, computer networks or other forms of ICT” (The Crown Prosecution Service, 2019). Examples of these crimes include online fraud, data theft, cyber harassment, and child sexual offenses. The characterization above has evolved since early work on cybercrime (e.g., Gordon & Ford, 2006) but there are still various similarities, particularly the focus on technology versus other aspects (Gordon and Ford, for instance, refer to a continuum with technology crime on one side and people crime on the other Gordon & Ford, 2006). Other research overlooks high-level categorizations and concentrates on the specific actions/crimes. Relevant examples include Stabek et al. (2010) who examine specific scam types, Nurse (2019) that explores crimes against individuals, and Chiew et al. (2018) who assess the nature (types, vectors, approaches) of phishing attacks. Behind criminal actions (be them referred to as crimes or cyber-attacks) are perpetrators who are responsible for planning, orchestration, or execution. Initial characterizations of these individuals centered on high-level groupings, such as script kiddies, hackers, fraudsters, insider threats, hacktivists, and nationstates. In that context, script kiddies were typically viewed as the lowest skilled and resourced, while nationstates were at the other end of the spectrum. Today, online offenders and attack perpetrators share some similarities with the groupings above but their profiles are also often much more nuanced. For instance, research has examined the psyche of cybercriminals (Barnor et al., 2020; Kirwan & Power, 2013; Rogers, 2011) and the theories behind why cybercrime occurs (Palmieri et al., 2021; Stalans & Donner, 2018), and other work has investigated attackers in depth—be it on the presence of the hacktivist group Anonymous online (Jones et al., 2020) or nationstate Advanced Persistent Threats (APTs) (Nikkel, 2021). Considering the psychology of perpetrators themselves, online criminal behavior has been related to psychopathy and other antisocial behaviors (Seigfried-Spellar et al., 2017), persons high on Machiavellianism (one of the three Dark Triad personality traits) have been shown as more likely to engage in criminal behavior (Selzer & Oelrich, 2021), and we have found relationships cited between cybercriminal actions and conditions such as autism (Lim et al., 2021). All these point to the importance of exploring perpetrators as a part of understanding cybercrime. Theories of crime developed by the field of cyberpsychology such as the online disinhibition effect (Suler, 2004) can also be considered relevant to understanding why an individual may engage in online criminal acts, however, its usefulness depends on the type of cybercrime considered. 
Neutralizations (Sykes & Matza, 1957) from offenders offering explanations for crimes that they would normally consider to be morally unacceptable are common in different types of crime including cybercrime.

112

M. Bada and J. R. C. Nurse

Such excuses can include denying responsibility for their actions or denial of injury to the victim. In summary, the reality is that developing a better understanding of the persons behind cybercrimes is key to research and practice.

Cybercriminal Case Studies Overview and Method of Analysis In this study, our method of analysis draws on the different factors and abilities described in models such as the Deductive Cybercriminal Profile Model (Nykodym et al., 2005) and the Theoretical Model of Profiling a Hacker (Lickiewicz, 2011). These models guide the collection of information required in order to create a holistic profile. In general, they propose that in order to form a psychological profile of an offender, different factors need to be considered: (a) biological factors and the external environment which influences an individual; (b) intelligence; (c) personality; (d) social abilities; and (e) technical abilities. The theoretical model of profiling a hacker (Lickiewicz, 2011) also includes factors such as (f) motivation for offending; (g) the method of the attack; and (h) the effectiveness of the attack. Below, we will present cases of persons identified in the literature (at one point or another) as real cyber offenders, and describe their characteristics, traits, motivations, and behaviors. This approach will also allow for a reflection on the similarities and differences among the different cases. When analyzing the cases, theories of the Dark Triad/Tetrad (Paulhus & Williams, 2002), the HEXACO model of personality (Ashton & Lee, 2007), and theories of crime will be utilized as well. Readers should note that we intentionally do not directly name the persons that we present given the sensitivity of the topic. Moreover, we present point in time analyses based on the literature and existing reports. This is worth noting because people change (e.g., some oncefamous hackers are now well-respected security professionals), and secondly, we rely on reports for our reflection (thus, rely on the accuracy of the reports we draw on).

Case 1 Case 1 was known as the first cybercriminal in the US, releasing the first “worm” on the Internet in 1988 while attending Cornell University (FBI News, 2018). Utilizing the Unix Sendmail program, he reportedly altered it to replicate itself, and it caused computers to crash (with as many as 6,000 computers impacted). Skills: Case 1 studied computer science and graduated from Harvard. At Harvard, Case 1 was reportedly known for not only his technological skills but also his social skills (FBI News, 2018). After graduating, he continued his studies at Cornell; he later developed a malicious program which was released via a hacked MIT computer (FBI News, 2018).

7 Exploring Cybercriminal Activities, Behaviors, and Profiles

113

Characteristics and Psychological Traits: Case 1’s father was an early innovator at a technology lab, so he grew up immersed in computers (FBI News, 2018). Case 1 reportedly was the type of student who found homework boring and therefore focused his energy in programming; he also preferred to work alone (Lee, 2013). This rather agrees with findings indicating that personality traits such as introversion are associated with online criminal behavior (Seigfried-Spellar et al., 2017). Motivation: According to reports, Case 1 claimed that his actions did not have a malicious intent but rather his aim was to point out the safety issues and vulnerabilities of systems (OKTA, 2023). The worm did not damage or destroy any files, but it slowed down University functions causing substantial economic losses (FBI News, 2018). The network community tried several techniques in order to understand the worm and to remove it from their systems. Some of the affected institutions disconnected their computers, while others had to reset their systems. Case 1, however, was not imprisoned, but he was sentenced to probation for three years and also community service (OKTA, 2023).

Case 2 Case 2 was a teen hacker, a computer programmer, and the founder of a non-profit organization that publishes leaks. As covered by Leigh and Harding (2011), during his studies in Australia he lived in a student house where he spent much of his time dreaming of setting up a new way to disseminate classified information. By 1991, Case 2 was reportedly one of the most accomplished hackers in Australia (Leigh & Harding, 2011). Skills: He was characterized by high, analytical intelligence. In 1991, he reportedly formed a hacking group called the International Subversives (Atlantic Council, 2011). During this time, he hacked into Military Institutions, such as MILNET, the US military’s secret defense data network, and Universities (Leigh & Harding, 2011). According to reports, his father had a highly logical intellect which Case 2 is said to have inherited (Goodin, 2011). Characteristics and Psychological Traits: As a student, articles (e.g., Leigh & Harding, 2011) note that Case 2 was not interested much in the school system. In terms of his personality, resources state that he lacked social skills, had a dry sense of humor, and at times also often forgot basic hygiene behaviors (Leigh & Harding, 2011. Case 2 reportedly disregarded those he disapproved of, he could easily get angry, and had instant mood changes (Leigh & Harding, 2011). Eysenck’s Theory of Crime proposes that personality traits such as Psychoticism (being anti-social, aggressive, and uncaring), Extraversion (seeking sensation), and Neuroticism (being unstable in behavioral patterns) indicate a personality susceptible to criminal behavior (Eysenck, 1964). However, in Case 2 we may also see a similar pattern as in Case 1, a sense of superiority seen in narcissistic personalities (Paulhus & Williams, 2002). Motivation: In terms of the motive behind Case 2, according to the prosecution during his trial at the Victoria County Court in Melbourne, it was “simply an

114

M. Bada and J. R. C. Nurse

arrogance and a desire to show of his computer skills” (Leigh & Harding, 2011). Case 2 pleaded guilty to 24 counts of hacking (Leigh & Harding, 2011).

Case 3 Case 3 was a known ex-member of the group Anonymous. This group is referred to as hacktivists, who utilize sometimes criminal acts as a way to pursue particular motives. Case 3 was found to be a member when he gave his identity willingly to the police during an attempt to pursue justice in a rape case (Abad-Santos, 2013). The backstory involves a football team in the US that was accused of raping a 16-yearold girl, but were not prosecuted, despite evidence (see Kushner, 2013). This led to Anonymous’ hacking of the football website and email of someone affiliated with the team, revealing indecent images of young women. Case 3 was noted to be one of the main activists behind these events (Kushner, 2013). Skills: While Case 3 reportedly dropped out of school; he showed a keen interest in computers/technology, teaching himself how to code for instance (Kushner, 2013). Characteristics and Psychological Traits: Case 3 was reported being shy and a frequent target of bullying at school, experiencing violent episodes during adulthood (Kushner, 2013). Research Wolke and Lereya (2015) has suggested that bullying is linked to altered cognitive responses to stressful and threatening situations. Further, O’Riordan and O’Connell (2014) noted that the presence of school problems during adolescence may contribute to criminal behavior. Case 3 was reportedly unstable during his teenage years, he formed a gang to bully the bullies, had drinking issues, and spent some time homeless (Esquire, 2013). These behaviors could potentially indicate personality traits such as neuroticism and psychoticism, as defined by Eysenck’s theory (Eysenck, 1964). In addition, as the Five Factor Model (Costa & McCrae, 2002) and the HEXACO Model (Ashton & Lee, 2007) describe, an individual low in agreeableness may tend to be critical, hostile, and aggressive. In this case, these traits may be portrayed by being critical to others and speaking of injustice. Motivation: Case 3 claimed his motives were for justice, defending the victims being targeted. He spoke of a few cases of hacking he conducted under the signature Anonymous mask (Kushner, 2013). He claimed he would target bullies, who also used technology to harm others. Reflecting on this case, there is again a possible implication that he was better suited than law enforcement to manage such a situation. This self-justification of labeled criminal acts potentially suggests narcissistic personality traits (Paulhus & Williams, 2002). It is likely that this individual found a sense of power through hacking, something he may have not had as a child when he himself was the victim. Reports Esquire (2013) note that Case 3 optimized the overall Anonymous group persona, hiding his face, creating a false name, and posting videos online with distortions to protect his identity. It is such a persona that can facilitate such behavior online (Suler, 2004).

7 Exploring Cybercriminal Activities, Behaviors, and Profiles

115

Case 4 Case 4 was a hacktivist reported to be responsible for hacking into a large number of government computer systems, such as the FBI, stealing large amounts of data (Parkin, 2017). Skills: Activism and hacking were a noteworthy theme in Case 4’s life. According to resources, by the age of 8, he had enough skills to rewrite the computer code of applications (Goodwin & Ladefoged, 2016). Case 4 and his sibling enjoyed playing video games, and this led them into finding techniques to cheat the technology so that they would always win (McGoogan, 2016). Early on during his education, he appears to have become bored, and in lower school he was assigned a dedicated tutor because, as he stated, “there was nothing left to teach me on the curriculum” (Goodwin & Ladefoged, 2016). Case 4 studied computer science at A-level and at university. As he stated, “One of the things that attracted me to computers is that they are consistent and make sense. If it doesn’t do what you think it should do, you can eventually figure out why and it’s perfectly rational and reasonable” (Goodwin & Ladefoged, 2016). Characteristics and Psychological Traits: His professional development has been impacted by his symptoms of depression which, from reports, appears to have played some part in him leaving university twice ((Parkin, 2017). When he was 29, he was diagnosed with Asperger’s syndrome ((Parkin, 2017). As he stated, “It’s a bit morbid to count the number of times you’ve had suicidal thoughts, but it was getting to be six to 12 times a day at a peak last winter” (Keating, 2017). Reportedly, for him hacking was a form of problem-solving exercise which could have an impact and affect change, just like activism. Research has posited that “increased risk of committing cyber-dependent crime is associated with higher autistic-like traits”; however, a diagnosis of autism is not necessarily associated with an increased risk of committing such crime (Payne et al., 2019). Motivation: Regarding motivation, it is useful to consider some of the key quotes related to this Case. In Parkin (2017) for instance, Case 1 is reported as saying that a hacktivist’s ideology “began to shape his philosophy deeply”. It continued, “I started to see the power of the internet to make good things happen in the world”. Once again, we see a potential sense of a push to use skills for a purpose. In addition, in a sense one may note a tendency for neutralization in terms of the potential consequences of his actions (Sykes & Matza, 1957).

Case 5 Case 5 was a systems administrator and hacker. He was reportedly accused in 2002 of hacking into a large number of military and NASA computers during a period of 13 months (Wikipedia, 2023). He became famous in the UK after a protracted attempt by the USA government to have him extradited ultimately ended in failure (BBC News, 2012).

116

M. Bada and J. R. C. Nurse

Skills: Case 5 got a computer and practised his technical skills from 14 years old (BBC News, 2012). After he finished school, he went on to become a hairdresser. However, reports BBC News (2012) note that his friends later persuaded him to study computers. Following this advice, he completed a computing course and subsequently started work as a contractor in computing. He continued his training in programming, and it was these programming skills that he is assumed to have later utilized to hack into government computer systems (BBC News, 2012). Characteristics and Psychological Traits: Case 5 was diagnosed with Asperger’s syndrome during his trial (BBC News, 2012). This diagnosis lends some explanation to his personality. Reports suggest that Case 1 was introverted and hated leaving his flat (Kushner, 2011; Marsh, 2013). Like many people with Asperger’s, there is often a development of highly focused interests. His mother described him as “obsessive, naïve, intelligent, … highly introverted, prone to obsessions and meltdowns and fearful of confrontation” according to one article (Marsh, 2013). As covered by Kushner (2011), his diagnosis may explain his behavior which seemed unreasonable to others. Case 5 did not see himself as a hacker and was acting alone. Obsessed with UFOs since childhood, reports note that he was convinced that the US was suppressing alien technology and evidence of UFOs (Kushner, 2011). As he said, “I’d stopped washing at one point. I wasn’t looking after myself. I wasn’t eating properly. I was sitting around the house in my dressing gown, doing this all night” and to continue, “I almost wanted to be caught, because it was ruining me. I had this classic thing of wanting to be caught so there would be an end to it” (BBC News, 2012). Overall, once again there may be a push or entitlement to use skills for an important purpose as seen in other Cases above (with entitlement linked to other psychological factors (Paulhus & Williams, 2002)). Personality traits such as neuroticism, as defined by Eysenck’s theory (Eysenck, 1964) are associated with traits such as depression, anxiety, low self-esteem, shyness, moodiness, and emotionality. Personality traits such as introversion and neuroticism have also been associated with online criminal behavior (Seigfried-Spellar et al., 2017). Motivation: In terms of the motive, Case 5 may have committed his acts due to his tendency to form obsessions. He was noted to be obsessed with space and UFOs, and as said above became convinced that the American government was hiding their existence. Allegedly therefore, he hacked into USA military and NASA systems ultimately to prove to himself that UFOs existed (BBC News, 2012). He admitted hacking into US computers but says he had been on a “moral crusade” to find classified documents about UFOs (BBC News, 2012). Noting his comments: “I found out that the US military use Windows and having realized this, I assumed it would probably be an easy hack if they hadn’t secured it properly” (BBC News, 2012).

Discussion and Conclusion In exploring how cybercrime occurs, a key component is understanding the nature of attacks and the individuals/actors who have conducted them. This chapter advanced the discussion on cybercriminals (online offenders) with reflection on pertinent

7 Exploring Cybercriminal Activities, Behaviors, and Profiles

117

literature and an analysis of five prominent cases. From this work, we identified a number of key technology skills that individuals attained throughout their lifetimes, especially in younger years (e.g., Cases 3 and 4). This is by no means definitive but does pose some interesting questions regarding pathways to cybercrime, some of which have been explored before (Aiken et al., 2016; National Crime Agency, 2017). There were a range of characteristics and psychological traits covered in the cases including boredom and challenges at school, lower social skills, instability in teenage years, and conditions such as Asperger’s syndrome. Some research (e.g., Blue, 2012; Payne et al., 2019) has sought to investigate the links between these factors and online offenders, but clearly more is needed to understand the area given the increasing number and variety of online attacks. To consider the motivation of the cases, in a number of situations there is a push or feeling of entitlement present. This is notable for numerous reasons, but one of the most intriguing is the desire to find the truth or to prevent injustice. These motivations—as studied here—are quite different to those of several cybercriminal gangs (e.g., those involved in ransomware or fraud) for instance, who are more motivated by finances. There are various avenues in the area of cybercriminal profiling and understanding where more research is needed. One of the most important of these is a natural extension of this research and involves a critical examination of a larger set of offender cases. In this work, we concentrated on a number of notorious cases to demonstrate what can be done with openly available reports and data. However, other work could engage with individuals firsthand to understand their profiles and experiences. This may be more representative and not limited (or unduly biased) by cases that feature in the media. Embedding cognitive science and technology into these analyses would provide value for researchers from both fields, and contribute significantly to a more nuanced understanding of cybercrime and its prevention.

References Abad-Santos, A. (2013). Inside the anonymous hacking file on the Steubenville ‘rape crew’. Retrieved from https://www.theatlantic.com/national/archive/2013/01/inside-anonymoushacking-file-steubenville-rape-crew/317301/ Aiken, M., Davidson, J., & Amann, P. (2016). Youth pathways into cybercrime. Retrieved from https://www.europol.europa.eu/publications-documents/youth-pathways-cybercrime Ashton, M. C., & Lee, K. (2007). Empirical, theoretical, and practical advantages of the HEXACO model of personality structure. Personality and Social Psychology Review: An Official Journal of the Society for Personality and Social Psychology, 11(2), 150–166. Atlantic Council. (2011). International subversives. Retrieved from https://www.atlanticcouncil. org/blogs/new-atlanticist/international-subversives/ Bada, M., & Nurse, J. R. C. (2021). Profiling the cybercriminal: A systematic review of research. In 2021 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA) (pp. 1–8). https://doi.org/10.1109/CyberSA52016.2021.9478246 Barnor, J. N. B., Boateng, R., Kolog, E. A., & Afful-Dadzie, A. (2020). Rationalizing online romance fraud: In the eyes of the offender. In AMCIS Proceedings (Vol. 21). BBC News. (2012). Profile: Gary McKinnon. Retrieved from www.bbc.co.uk/news/uk-19946902

118


Maria Bada is a Lecturer in Psychology at Queen Mary University of London and a RISCS Fellow in cybercrime. Her research focuses on the human aspects of cybercrime and cybersecurity, such as profiling online offenders, studying their psychologies and pathways towards online deviance, as well as ways to combat cybercrime through tools and capacity building. She is a member of the National Risk Assessment (NRA) Behavioural Science Expert Group in the UK, working on the social and psychological impact of cyber-attacks on members of the public. She has a background in cyberpsychology, and she is a member of the British Psychological Society and the National Counselling Society.

Jason R. C. Nurse is an Associate Professor in Cyber Security in the School of Computing at the University of Kent, UK and the Institute of Cyber Security for Society (iCSS), UK. He also holds the roles of Visiting Academic at the University of Oxford, UK and Associate Fellow at the Royal United Services Institute for Defence and Security Studies (RUSI). His research interests include security risk management, corporate communications and cyber security, secure and trustworthy Internet of Things, insider threat and cybercrime. He has published over 100 peer-reviewed articles in internationally recognized security journals and conferences.

Part IV

Neural Networks and Machine Learning

Chapter 8

Computer Vision Technology: Do Deep Neural Networks Model Nonlinear Compositionality in the Brain's Representation of Human–Object Interactions?

Aditi Jha and Sumeet Agarwal

Abstract In recent years, Deep Neural Networks (DNNs) have become a dominant modeling framework for computer vision. This has naturally raised the question of the extent to which these models also capture relevant aspects of biological vision. It is notable that while the most prominent computer vision applications have involved recognizing individual objects in images, ecological visual scene understanding often requires the holistic processing of multiple components of an image, such as in human–object interactions. Here we seek to explore if and how well a typical DNN model captures features similar to the brain's representation of humans, objects, and their interactions. We investigate fMRI data from particular regions of the visual cortex which are believed to process human-, object-, or interaction-specific information, and establish correspondences between the neural activity in these regions and DNN features. Our results suggest that we can infer the selectivity of these regions for particular kinds of visual stimuli using DNN representations of the same stimuli as an intermediary. We also map features from the DNN to individual voxels in the brain regions, thus linking the DNN representations to those found in specific parts of the visual cortex. In particular, our results indicate that a typical DNN representation contains encoding of compositional information for human–object interactions which goes beyond a linear combination of the encodings for the separated human and object components, thus suggesting that DNNs may indeed be able to model this important property of biological vision. A better understanding of this correspondence opens up the possibility of being able to develop future machine vision technology which more closely builds on human visual cognition.

A. Jha
Department of Electrical and Computer Engineering, Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
e-mail: [email protected]

S. Agarwal (B)
Department of Electrical Engineering and Yardi School of Artificial Intelligence, IIT Delhi, New Delhi, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Mukherjee et al. (eds.), Applied Cognitive Science and Technology, https://doi.org/10.1007/978-981-99-3966-4_8


Keywords Visual compositionality · Computer vision · Deep neural networks · Human–object interactions · fMRI · Cognitive neuroscience

Introduction

Visual representations formed by the human brain have been of interest, particularly for studying invariance in object representations (DiCarlo et al., 2012; Isik et al., 2014). It is known that downstream regions of the visual cortex process high-level visual information and that there are specialized regions for processing object- and human-specific information. The perception of human–object interactions by the brain, though, had received little attention until recent work by Baldassano et al. (2016) revealed that the neural representations of interactions are not a linear sum of the human and object representations. In fact, there appear to be certain areas in the brain, like the pSTS (posterior Superior Temporal Sulcus), which are highly sensitive specifically to human–object interactions (Baldassano et al., 2016; Isik et al., 2017); and more generally, the STS has recently been proposed to be part of a 'third visual pathway' in the brain specialized for social perception (Pitcher & Ungerleider, 2021). The representation of interaction-specific information in the brain might also be thought of as a kind of visual compositionality: analogous to compositionality in language, one might say that the meaning of complex visual scenes emerges from the meanings of the individual components plus certain rules of composition.

Deep Neural Networks (DNNs) have been widely used in recent years for a variety of computer vision tasks like object and action recognition (Chatfield et al., 2014; Simonyan & Zisserman, 2014). While they have reached human-like accuracy in certain settings, in general there is little explicit effort to model the biological visual system in these networks. Architectural innovations in these models have primarily aimed at improving accuracy for a given task, without there necessarily being any correspondence with the working of biological vision. However, there has been a lot of work in the past few years attempting to compare DNN representations with those of the human brain using several techniques (Barrett et al., 2018; Bonner & Epstein, 2018; Cichy & Kaiser, 2019; Greene & Hansen, 2018; Peterson et al., 2018). Some recent work has also looked at trying to develop DNNs with explicit compositionality (Stone et al., 2017).

In this work, we examine whether typical DNNs represent humans, objects, and in particular their interactions similarly to the brain. We also evaluate the correspondence between DNN representations and those of the brain for human- and object-specific information. We analyze three brain regions involved in high-level visual processing: the LOC (Lateral Occipital Cortex), which processes object-related information; the EBA (Extrastriate Body Area), which is involved in human pose identification; and the pSTS, which also processes human information and is known to be specifically sensitive to human–object interactions. We seek to predict the BOLD (fMRI) responses of individual voxels in these regions to human/object/interaction image stimuli, from the final-layer DNN representations of the same images. Such an approach has been
previously used to evaluate the correspondence of DNN layers to brain regions (Güçlü & van Gerven, 2015) and to model visual representations in the brain (Agrawal et al., 2014). We first seek to identify, independently of fMRI data, features in the DNN model which are interaction-specific, and which might hence represent compositional information, analogous to what the pSTS appears to do in the actual visual cortex. We then look at how well the DNN representations predict the selectivity of the three regions for object, human pose, or interaction stimuli. We probe human–object interaction images, in particular, to see if our approach also infers the sensitivity of the pSTS to such interactions, and, if so, what the nature of the interaction-specific information it encodes is. Additionally, we identify features from the DNN which are responsible for predicting the response of a certain voxel and attempt to examine the plausibility of establishing a direct correspondence between the DNN's features and encodings in certain brain areas. The rest of this chapter is organized as follows. In section "Materials and Methods", we describe the data and the methodology used for the above analyses; in section "Results", we present and interpret our results; and in section "Discussion and Conclusions", we discuss the overall findings and takeaways from this study.

Materials and Methods

Experimental Data

The data used have been taken from Experiment 2 of Baldassano et al. (2016). fMRI data were collected from 12 subjects while they viewed the visual stimuli summarized below; please refer to the original paper for further details.

Stimuli

The stimuli consisted of three kinds of images: humans, objects, and images containing interactions between humans and objects (Fig. 8.1 depicts an example of each kind). The interaction category contained 40 images from each of the 4 action categories (person working on a computer, person pulling the luggage, person pushing a cart, and person typing on a typewriter). The human and object images were extracted from these interaction images by cropping out the relevant part and resizing it. Hence, there were 160 images for each of the three settings—humans, objects, and interactions. All the images were used in 2 orientations—original and mirrored from left to right—forming a total of 960 images (40 images × 4 action categories × 2 orientations × 3 settings/tasks).


Fig. 8.1 Example images from the data set of Baldassano et al. (2016); images of humans or objects alone are obtained by segmenting out the respective portions of the human–object interaction images

Experimental Design

Each of the 12 subjects viewed blocks of these images and performed a 1-back task. Every block contained 8 images, and each image was shown for 160 ms, with a 590 ms blank interval between consecutive images. Every subject performed 14 runs, with the first 10 runs containing 8 blocks of images each and the last 4 runs containing 20 blocks each.

Direct Classification Using a Deep Neural Network

For the computer vision model in this study, we used the pre-trained VGG-16 architecture (Chatfield et al., 2014), trained on 1.3 million images from the ImageNet database (Deng et al., 2009). This particular DNN architecture was chosen as it is widely used and its representations have been previously shown to correspond to fMRI recordings from the ventral stream (Güçlü & van Gerven, 2015). Features from the last convolutional layer of this network were extracted (7 × 7 × 512) and three linear SVMs were trained on top of these features to perform 4-way classification in each of the three scenarios: humans only, objects only, and human–object interactions, with a 3:1 train-test split of the data used. This is analogous to the Multi-Voxel Pattern Analysis (MVPA) carried out by Baldassano et al. (2016), except for the fact that the SVMs are trained not on voxel activity, but on DNN representations. The aim was to judge the goodness of the latter for discriminating between action categories in each of the three tasks (Fig. 8.2).

Fig. 8.2 Overview of Methodology: Stimulus images were presented to human subjects and voxel responses were recorded via fMRI by Baldassano et al. (2016). We pass the same images through a DNN. A Baldassano et al. (2016) perform MVPA on the voxel responses to classify the response pattern into one of the four classes. B Direct SVM classification is performed over the final-layer DNN representations of the images. C Linear regression models are trained on the DNN representations to predict voxel responses

We also looked at whether substructure or specialization could be identified within the DNN representations, analogous to what we see in subregions of the visual cortex, by quantifying the overlap between the top DNN feature sets picked out by our 3 SVMs. For this, we first used forward feature selection to identify the important features of each SVM. Subsequently, to obtain features that are specifically useful for one task and not the others, we define a 'task-specific' feature set for each of the 3 classification tasks by taking the features selected by the corresponding SVM classifier and removing those also selected by either of the other two classifiers. In particular, to study interaction-specific information we looked at those DNN features which were picked using forward selection only by the SVM for the interaction task, and not by those for the object or human pose tasks. We sought to examine whether these features capture some nonlinear representation of the interaction, not obtainable by just adding the representations for the object and human constituents separately. This was done analogously to how voxels in the pSTS were analyzed by Baldassano et al. (2016).
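To make the feature-extraction and classification step concrete, the following Python sketch shows one way it could be implemented, assuming PyTorch/torchvision and scikit-learn; the weight identifier string, the image-loading helper, and the 3:1 split are illustrative assumptions rather than the authors' original code.

import torch
import numpy as np
from torchvision import models, transforms
from PIL import Image
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

# Pre-trained VGG-16; `vgg.features` is the convolutional part of the network,
# producing a 512 x 7 x 7 map for a 224 x 224 input image.
vgg = models.vgg16(weights="IMAGENET1K_V1").eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def last_conv_features(image_paths):
    """Return a (n_images, 7*7*512) array of last-conv-layer features."""
    feats = []
    with torch.no_grad():
        for path in image_paths:
            x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
            f = vgg.features(x)          # shape: (1, 512, 7, 7)
            feats.append(f.flatten().numpy())
    return np.stack(feats)

def classify_task(image_paths, labels):
    """Train a 4-way linear SVM for one task (humans, objects, or interactions)."""
    X = last_conv_features(image_paths)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.25, stratify=labels, random_state=0)  # 3:1 split
    svm = LinearSVC(C=1.0, max_iter=10000).fit(X_tr, y_tr)
    return svm, svm.score(X_te, y_te)

Forward feature selection over the feature maps, as used above to define the task-specific feature sets, could then be layered on top of such a classifier, for example with scikit-learn's SequentialFeatureSelector, although selecting greedily over many features is computationally heavy.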

DNN Representations to Predict Voxel Responses

To compare DNN representations with those of the brain, we predict the BOLD response of every voxel to an image using VGG-16's representation for the same image. We forward pass the stimulus images through pre-trained VGG-16 and then average the last convolutional layer representation of 2–3 images in the sequence that the images were shown during the fMRI recording. This has to be done because the fMRI recordings have been taken only at intervals of every 2–3 images (the volume repetition time of functional images being 2 s). We then train linear regression models on these averaged representations to predict the fMRI (BOLD) response of a voxel to the corresponding images. Three separate models are trained for each voxel since we have three tasks corresponding to the different classes of images: humans, objects, and interactions. Essentially, every voxel's response to a particular class of images, y, is modeled as y = βᵀφ(x), where β is the vector of regression coefficients and φ(x) stands for the DNN's last convolutional layer representation for input image x (in practice an averaged representation over a sequence of 2–3 images, as mentioned). The final-layer representations are used because the regions we are looking at process high-level visual information, and correspondence between such areas and the last convolutional layer of CNNs has been established (Güçlü & van Gerven, 2015). The inputs to the regression model (DNN representations) are scaled to unit norm, and outputs (BOLD responses) are normalized by taking z-scores. The linear regression models are trained with L2 regularization and their performance is quantified using Pearson's correlation (r) between predicted and actual outputs. 10-fold cross-validation with grid search on a training set is used to find the best hyperparameter value for the L2 regularization strength of each model, which is then used to estimate the final test Pearson's correlation (r) for that model on a held-out test set. Each voxel has three regression models for the three tasks and the test correlation of each is calculated separately, signifying the predictability of that voxel's responses from DNN representations for that particular task.
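A minimal sketch of this voxel-wise encoding step is given below, assuming scikit-learn and SciPy; the variable names, the 25% held-out split, and the grid of regularization strengths are illustrative assumptions, not the authors' original code.

import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import normalize

def fit_voxel_model(dnn_feats, voxel_bold, alphas=(0.1, 1.0, 10.0, 100.0)):
    """Predict one voxel's BOLD response from averaged DNN features.

    dnn_feats:  (n_timepoints, n_features) averaged last-conv-layer features
    voxel_bold: (n_timepoints,) BOLD response of a single voxel
    """
    X = normalize(dnn_feats)                                  # inputs to unit norm
    y = (voxel_bold - voxel_bold.mean()) / voxel_bold.std()   # z-score outputs

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=0)

    # 10-fold cross-validated grid search over the L2 regularization strength.
    search = GridSearchCV(Ridge(), {"alpha": list(alphas)}, cv=10)
    search.fit(X_tr, y_tr)

    # Held-out Pearson's r quantifies this voxel's predictability for the task.
    r, _ = pearsonr(search.predict(X_te), y_te)
    return search.best_estimator_, r

In practice this would be repeated independently for every voxel and for each of the three stimulus classes, giving the three models per voxel described above.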

Region-Wise Analysis

We separately analyze each of the three tasks (human, object, and interaction) for the three brain regions: LOC, EBA, and pSTS. Significantly predictable voxels are selected in each region for a particular task based on their Bonferroni-corrected p-values for the correlation between predicted and actual responses (computed via cross-validation on a training set). For a particular task, we then average the correlations (computed on a held-out test set) for all selected voxels for that region to obtain an average predictability measure for that region on that task. To further ensure that the obtained correlations are not spurious, we also trained linear regression models to predict voxel responses from DNN representations of mismatched stimuli, giving us a baseline correlation level. The number of voxels which cross the p-value threshold for each task in a region was also noted and can be seen as a measure of sensitivity of that region to a particular task. We also sought to link the region-wise voxel regression analysis to the task-specific DNN feature sets mentioned above in section "Direct Classification Using a Deep Neural Network". The idea was to check whether DNN features specifically informative for a particular classification task (say, object recognition) are also specifically more predictive of voxel responses on that task in a particular brain region (say, LOC). If this were true, it would suggest a correspondence between the substructure or modularity of representation in the DNN model and that in the visual cortex. We trained linear regression models to predict voxel responses for each of the three tasks from only their corresponding task-specific DNN feature sets. This allows us to see whether, for instance, the DNN features specific to object recognition also tend to be more predictive of voxel responses (for the same kind of stimuli) in the LOC than in the EBA or pSTS.
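The region-level summary described here could be computed along the following lines; the per-voxel training-set p-values and the 0.05 significance level are assumptions made for illustration.

import numpy as np

def region_summary(test_r, train_pvals, alpha=0.05):
    """Average held-out correlation over significantly predictable voxels.

    test_r:      (n_voxels,) held-out Pearson's r per voxel for one task
    train_pvals: (n_voxels,) training-set p-values for the same voxels
    """
    test_r = np.asarray(test_r)
    train_pvals = np.asarray(train_pvals)

    threshold = alpha / len(train_pvals)     # Bonferroni-corrected threshold
    significant = train_pvals < threshold

    mean_r = test_r[significant].mean() if significant.any() else np.nan
    # The count of significant voxels serves as a sensitivity measure
    # of the region for this task.
    return mean_r, int(significant.sum())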


Cross-Decoding Analysis

Analogous to the MVPA of Baldassano et al. (2016), where the classifier trained on interaction images is tested on objects and humans in isolation, we also do a cross-decoding analysis here. Every voxel has a linear regression model trained to predict its fMRI responses from DNN representations specifically for human–object interaction stimuli. This model is tested on isolated object and human image representations (from a held-out test set) to see how well it predicts the voxel's response in those cases. For each region, we obtain the average cross-decoding correlation for those voxels which were selected as significantly predictable for the interaction task as above.
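A sketch of the cross-decoding step, reusing an interaction-trained voxel model like the one fit in the earlier encoding sketch; the input arrays are assumed to be held-out DNN features and the matching voxel's BOLD responses.

from scipy.stats import pearsonr

def cross_decode(interaction_model, feats_object, bold_object,
                 feats_human, bold_human):
    """Test a voxel model trained on interaction stimuli on isolated stimuli.

    interaction_model: regression model fit on interaction-image features
    feats_*:           held-out DNN features for object-only / human-only images
    bold_*:            the same voxel's BOLD responses to those images
    """
    r_object, _ = pearsonr(interaction_model.predict(feats_object), bold_object)
    r_human, _ = pearsonr(interaction_model.predict(feats_human), bold_human)
    return r_object, r_human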

Feature Interpretation: Mapping DNN Features to Individual Voxels

A direct correspondence can be established between the DNN's representations and those of the human brain by mapping DNN features to individual voxels. The DNN's representation in the last convolutional layer has 512 feature maps. For each voxel, we choose the top 5% or 10% of corresponding feature maps by the magnitudes of their coefficients in the voxel's linear regression models. This feature mapping is done independently for the three tasks. Based on this, we seek to interpret what kinds of information the different voxels in a given region of interest are capturing, and the coherence of this across the region as a whole.
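The per-voxel ranking of feature maps might look like the following sketch; pooling the 7 × 7 spatial coefficients of each map by their summed magnitude is our own assumption, since the chapter does not spell out how coefficients are aggregated within a feature map.

import numpy as np

def top_feature_maps(voxel_coefs, top_frac=0.10, n_maps=512, map_size=49):
    """Return the indices of the top feature maps for one voxel.

    voxel_coefs: (n_maps * map_size,) regression coefficients for one voxel,
                 i.e. one weight per unit of the 7 x 7 x 512 representation.
    """
    per_map = np.abs(voxel_coefs).reshape(n_maps, map_size).sum(axis=1)
    k = int(np.ceil(top_frac * n_maps))
    return set(np.argsort(per_map)[::-1][:k])    # indices of the top-k maps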

Clustering of Voxels

Once the top 10% of DNN features corresponding to each voxel have been estimated as above, voxels are clustered in a region based on the number of common features in their top 10% list. Such clustering in a region could either reveal common features across the entire region, highlighting the region's sensitivity to a particular kind of feature; or, if there are multiple smaller but highly coherent clusters within a region, this could be indicative of sub-modularities present in that region, i.e., a certain region could have multiple subregions which are independently sensitive to a particular kind of feature or features.
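One way to realize this clustering, assuming SciPy's hierarchical clustering and the per-voxel feature sets from the previous sketch; the choice of average linkage and the way the dendrogram is cut are illustrative assumptions, as the chapter does not specify the exact clustering procedure.

import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_voxels(feature_sets, min_shared=15):
    """Group voxels whose top-10% DNN feature sets overlap heavily.

    feature_sets: list of sets, one set of top feature-map indices per voxel
    """
    n = len(feature_sets)
    shared = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            shared[i, j] = len(feature_sets[i] & feature_sets[j])

    # Turn shared-feature counts into distances (more overlap = closer).
    dist = shared.max() - shared
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")

    # Cut the dendrogram at a distance corresponding to roughly
    # `min_shared` shared features between cluster members.
    return fcluster(Z, t=shared.max() - min_shared, criterion="distance")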

Results

Comparing MVPA Classification with Direct DNN-Based Classification

Baldassano et al. (2016) perform MVPA in the LOC, EBA, and pSTS to classify voxel response patterns into the four categories of stimulus images. Separate classifiers are trained for object, human pose, and interaction images; Baldassano et al. (2016) observe that in all three regions (LOC, EBA, and pSTS), the highest classification accuracy is obtained for the interaction classifier. Relatively high classification accuracy is also observed in the LOC for the object classifier and in the EBA for the human pose classifier, which is consistent with the known functions of these two regions. The pSTS shows a comparable accuracy to the LOC and EBA only in the case of interactions, which indicates the high sensitivity of the pSTS to human–object interactions. Interestingly, the accuracy of the interaction classifier is not replicated when the same classifier is tested on isolated humans or objects, or even their pattern averages, in all regions. This implies that the high accuracy of the interaction classifier is not attributable only to the object or human in the image.

To compare the discriminative power of the fMRI recordings with that of DNN features extracted on the same stimuli, we trained three separate SVMs on the final-layer features from VGG-16 to classify human, object, and interaction images into the 4 categories. Table 8.1 gives the observed validation accuracies. Note that, in each case, the accuracy is far above the chance level (0.25), which signifies that the representations formed by VGG-16 discriminate well between the 4 categories.

Table 8.1 Direct (SVM) classification performance of VGG-16 representations on the Baldassano et al. (2016) stimuli, using all final-layer features (second column) and after the removal of 8 features which were found to contribute substantively only to classifying the human–object interaction images (third column)

Task                          Accuracy    Accuracy without interaction-specific features
Object classification         0.90        0.88
Human pose classification     0.74        0.72
Interaction classification    0.86        0.71

Do the DNN Representations Also Exhibit Nonlinear Compositionality for Human–Object Interactions?

To try and establish a correspondence between the discriminative ability of the DNN final-layer representation of the stimuli and those recorded from the three brain regions of interest (LOC, EBA, and pSTS), we began by applying forward feature selection to the task-specific SVMs trained directly on the DNN features, in order to identify the relevant feature sets for each task. To further obtain subsets of these features which contained information specific to just one of the three tasks, we then compared the selected features for each task and observed substantial overlaps between these feature sets. Since all three tasks being performed here are high-level visual tasks, the fact that they make use of at least some similar features is not surprising. However, we also obtained a set of 8 features that are specific only to human–object interaction classification and do not contribute substantively to isolated object or human pose classification as per this feature selection analysis. Further, the removal of just these 8 features from the all-feature SVMs (Table 8.1) substantially decreases interaction classification accuracy (from 0.86 to 0.71) while having a much smaller impact on object classification accuracy (from 0.90 to 0.88) and human pose classification accuracy (from 0.74 to 0.72). Thus, these analyses do suggest that the representations of interaction images encode more than just object or human pose-related information.

These 8 apparently interaction-specific features are the ones of greatest interest from the point of view of nonlinear compositionality, as they appear to be informative only for the interaction images, and not for classifying just objects or just human poses. Baldassano et al. (2016) claimed exactly the same property for the neural encodings recorded from voxels in the pSTS, based on their MVPA analysis: the pSTS shows a comparable accuracy to the LOC and the EBA only in the case of human–object interactions, indicating the high sensitivity of the pSTS to such stimuli. Interestingly, they find that the accuracy of the interaction classifier is significantly reduced when tested on isolated humans or objects or even their pattern averages, implying that the relatively high accuracy of the interaction classifier is not attributable only to the object or human in the image.

To examine if our interaction-specific DNN features indeed exhibit the same behavior as pSTS neural encodings, we carried out an MVPA-like decoding and cross-decoding analysis on the DNN features and compared it with the same analysis reported for the pSTS voxels by Baldassano et al. (2016). The results are depicted in Fig. 8.3. The interaction-specific features classify the interaction images with high accuracy (~85% on the test set; third bar of Fig. 8.3). However, classifiers trained on interactions using these features performed much less well when tested on just objects, just human poses, or their pattern averages (last 3 bars). Hence, the high accuracy on interactions is not explained solely by human- or object-specific information. Furthermore, even classifiers trained specifically on the object or human pose images using these features (first two bars) perform less well on their respective tasks than what the interaction classifier achieves. These results align well with those of Baldassano et al. (2016) for MVPA on the pSTS; hence these interaction-specific DNN features appear to be analogous to the pSTS in the brain in terms of picking up information beyond just the isolated object or human in a human–object interaction.

Fig. 8.3 SVM decoding and cross-decoding for interaction-specific DNN features. O: objects; H: human poses; I: human–object interactions; PA: pattern averages of humans and objects. X→Y indicates a classifier trained on images of type X and tested on images of type Y. Figure inspired by Fig. 6 of Baldassano et al. (2016), but using DNN model features rather than fMRI voxel responses. The I→I classifier is significantly more accurate than all others (p < 10⁻¹³ for all pairwise tests, 100 train-test splits)

DNN Representations to Predict Voxel Responses

Region-Wise Analysis

The average correlation (r) was calculated in each brain region for each task, over voxels where the predicted and actual responses for that task were significantly correlated (p-value less than the Bonferroni-corrected threshold for that region). The results are shown in Fig. 8.4. We observe that the average correlation across all subjects is highest for objects in the LOC and for human poses in the EBA. This is consistent with the literature, since the EBA is known to process human pose-related information, while the LOC processes object information. For human–object interaction images, we see relatively low predictability of fMRI responses from model features in all 3 regions. This may partly reflect the fact that our model representations come from a DNN pre-trained on a standard data set primarily consisting of images of individual objects. Hence, while learning its representations, the model has not really seen instances of human–object interactions, and may not be fully capturing information specific to such images. The results for the pSTS, however, are notable in that while predictability for interaction responses there is similar to the LOC and EBA, responses to human or object stimuli are predicted much less well in the pSTS than in the other regions. This is consistent with Baldassano et al. (2016), who find that the pSTS is less representative of isolated human or object information than the other regions, but similar to them for human–object interactions; and hence is likely capturing complementary, interaction-specific information for the latter type of stimulus.


Fig. 8.4 Average Pearson's r and standard deviation across 12 subjects for voxels within each brain region where the DNN-based regression model prediction displays significant correlation with the actual fMRI signal; and for cross-decoding, we take voxels that are significantly predictable for interaction images and use the same regression model to predict their responses for object-only or human-only images instead. The baseline correlation on mismatched stimuli did not exceed 0.06 for any region

Cross-Decoding

Cross-decoding was done by taking the voxel regression models trained on human–object interaction stimuli (using all DNN features) and testing them on the separate presentation of only the object or only the human pose. Figure 8.4 shows the average cross-decoding correlation on objects and humans for the same voxels where the interaction responses were significantly predictable as per our p-value threshold (i.e., the last 3 bars for each region in the figure are over the same set of voxels). We note that the cross-decoding correlations in all three regions are much lower than those for same-task decoding. Notably, the relative cross-decoding performance is substantially worse for the pSTS than for the other two regions. This is exactly what we would expect if the pSTS voxels had a specific tendency to encode human–object interaction information which is not obtainable from a linear combination of the constituent human and object segments.

Another way to get a sense of this is that, for cross-decoding in the LOC or EBA, if we add up the fraction of variance explained by DNN representations of the object alone and the human alone, it is not much less than the fraction explained by the full interaction image (in Fig. 8.4, the fourth and fifth bars stacked are nearly as high as the third bar). But for the pSTS, even the sum of the object and human cross-decoding correlations falls far short of what we get when predicting from interactions. Thus, the linear models trained on DNN features for the pSTS voxels appear to be learning a different kind of mapping from the other two regions: a mapping that lends itself much less to predicting the responses of the same voxels for just objects or just human poses. This reinforces the hypothesis that the pSTS encodes interaction-specific information which is complementary to the object or human presented in isolation, and indicates (consistently with Table 8.1) that the DNN representations also contain at least some such information.

Mapping of Task-Specific DNN Features onto Brain Regions

For each of the three brain regions of interest, we sought to see the extent to which the task-specific DNN features identified via direct classification (section "Direct Classification Using a Deep Neural Network") were also predictive of voxel responses in that region to stimuli of the corresponding task. The results are depicted in Fig. 8.5. We see that the object-specific features are considerably more predictive of fMRI responses to object images in the LOC than in the other two regions (and a fairly large fraction of LOC voxels, 19%, were found to be significantly predictable in this case). Similarly, the human-specific features are much more predictive of responses to human-pose images in the EBA than in the other regions (and 16% of EBA voxels were significantly predictable in this case). These observations point toward a correspondence between the task-specific features from the DNN and the respective regions in the visual cortex which are believed to specialize in object and human pose processing. The pSTS, however, is in general harder to predict once again: the fraction of significantly predictable voxels here does not exceed 6% for any task, and even over these, the correlation values with actual fMRI responses are relatively low across the board. On the assumption that the pSTS may be specialized to represent interaction-specific information, this is not entirely unexpected, given the lack of exposure of the DNN model to interaction images during training as discussed earlier. However, we again see that while object and human response predictability for the pSTS is substantively less than for the LOC and EBA, respectively, the interaction-specific DNN features are able to predict interaction responses similarly well in all three regions, reinforcing the observations with the all-feature regression models.

Fig. 8.5 Average Pearson's r and standard deviation across 12 subjects for voxels in each region where predictive regression models (trained on only task-specific DNN features) display significant correlation with the actual fMRI response

Mapping Features from the DNN to Voxels

After mapping the top 5% of features from the DNN's representation to every voxel based on the ranking of features by their regression weights for a given voxel, we look at features that are present in the top 5% subset for every significantly predictable voxel in a given region. Averaging across subjects, the LOC has 6 features (σ = 2.5) which are common across all voxels that are significant for the object response task (based on the Bonferroni-corrected p-value). The pSTS has an average of 7 features (σ = 3) which are common across voxels that are significant for interactions, and the EBA has an average of 7 features (σ = 2.1) which are common across voxels significant for human poses. This commonality of features among voxels within each region signifies that the common features for one region can be thought of as being indicative of the key visual information represented in that region. We visualized the two images from our data that maximally activate the common DNN features for each region (Fig. 8.6), and they are consistent with the posited object-, human-, and interaction-specificity of the LOC, EBA, and pSTS, respectively.

Clustering over Voxels

We find that in every region, there exist smaller clusters of voxels that all share more than 15 features in common (out of the top 10% of features for each voxel). In the LOC and the pSTS, we observe 4 clusters with more than 15 features in common. Similarly, in the EBA we observe 3 such clusters. This kind of phenomenon hints toward a degree of sub-modularity in these regions based on their sensitivity to specific kinds of visual stimuli.


Fig. 8.6 Stimuli images that maximally activate features representative of the three regions (ROI visualization reproduced from Baldassano et al. (2016), with permission)

Discussion and Conclusions

In this study, we sought to compare DNN representations of human–object interactions with those of the human visual cortex, as a means of modeling the latter computationally. Our results open up the possibility of establishing a correspondence between brain regions and DNN features. In particular, the final-layer DNN features which are found to be useful for action categorization specifically on the interaction images are also found to display similar properties to encodings from the pSTS region of the brain, in terms of capturing information beyond that contained in the object or human sub-images.

The region-wise analysis of voxel regression models indicates that DNN representations are predictive of the human brain's responses to visual stimuli, hence implying that the former may model certain aspects of the latter. The cross-decoding analysis reveals that linear models trained to predict pSTS responses on interactions appear to learn a rather different mapping, compared to similar models trained for the LOC or EBA. Consistent with Baldassano et al. (2016), this suggests that the pSTS voxels encode some kind of nonlinear compositionality, and furthermore that our DNN model also contains such information, which the linear models can pick up when they have been trained to predict pSTS responses. Thus, we observe multiple lines of evidence indicative of compositional specialization of some kind in the DNN representations, analogous to what the pSTS shows for actual neural encodings.

Further extensions of this work could include training the DNN architecture on larger data sets made up of visual stimuli more closely corresponding to what we are looking at: for instance, training DNNs specifically for human poses, objects, and human–object interaction images. This would presumably improve the specificity of the representations extracted from the DNN to these tasks. It may also make the feature mapping more interpretable in terms of understanding what particular features best represent a given brain region. The notion of 'interaction' could also be made fuzzier, by taking into account variation in the distance between the human and the object in a given interaction image (in the images used here, all interactions involved the human touching/holding/grasping the given object in some way). It would be of interest to see if there is some kind of distance threshold at which the interaction-specific representation posited here begins to show up, both for the DNN models and in actual fMRI data.

On the whole, this study provides evidence for the supposition that generic final-layer DNN representations of visual stimuli have a substructure similar to that found in the visual cortex; in particular, this seems to include explicit representation of human–object interaction information which goes beyond the individual components. These observations can hopefully motivate an additional direction of study seeking to model biological vision via DNNs, by suggesting that the latter do possess the ability to compositionally represent complex visual scenes as 'more than the sum of their parts'.

Acknowledgements We are grateful to Chris Baldassano for sharing the stimuli and fMRI data, to Anirban Sengupta for inputs on MRI data analysis, and to the anonymous reviewers of earlier submissions to CogSci 2019 and CCN 2019 for helping improve the paper.

References

Agrawal, P., Stansbury, D., Malik, J., & Gallant, J. (2014). Pixels to voxels: Modeling visual representation in the human brain. arXiv:1407.5104
Baldassano, C., Beck, D., & Fei-Fei, L. (2016). Human-object interactions are more than the sum of their parts. Cerebral Cortex, 27(3), 2276–2288.
Barrett, D. G. T., Morcos, A. S., & Macke, J. H. (2018). Analyzing biological and artificial neural networks: Challenges with opportunities for synergy? arXiv:1810.13373
Bonner, M. F., & Epstein, R. A. (2018). Computational mechanisms underlying cortical responses to the affordance properties of visual scenes. PLOS Computational Biology, 14(4), 1–31.
Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. In British Machine Vision Conference (BMVC).
Cichy, R. M., & Kaiser, D. (2019). Deep neural networks as scientific models. Trends in Cognitive Sciences, 23(4), 305–317.
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
DiCarlo, J. J., Zoccolan, D., & Rust, N. C. (2012). How does the brain solve visual object recognition? Neuron, 73, 415–434.
Greene, M. R., & Hansen, B. C. (2018). Shared spatiotemporal category representations in biological and artificial deep neural networks. PLOS Computational Biology, 14(7), 1–17.
Güçlü, U., & van Gerven, M. A. J. (2015). Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience, 35(27), 10005–10014. http://www.jneurosci.org/content/35/27/10005.full.pdf
Isik, L., Meyers, E. M., Leibo, J. Z., & Poggio, T. (2014). The dynamics of invariant object recognition in the human visual system. Journal of Neurophysiology, 111(1), 91–102. https://doi.org/10.1152/jn.00394.2013
Isik, L., Koldewyn, K., Beeler, D., & Kanwisher, N. (2017). Perceiving social interactions in the posterior superior temporal sulcus. Proceedings of the National Academy of Sciences, 114(43), E9145–E9152. https://www.pnas.org/content/114/43/E9145.full.pdf
Peterson, J. C., Abbott, J. T., & Griffiths, T. L. (2018). Evaluating the correspondence between deep neural networks and human representations. Cognitive Science, 42(8), 2648–2669.
Pitcher, D., & Ungerleider, L. G. (2021). Evidence for a third visual pathway specialized for social perception. Trends in Cognitive Sciences, 25(2), 100–110. https://doi.org/10.1016/j.tics.2020.11.006
Simonyan, K., & Zisserman, A. (2014). Two-stream convolutional networks for action recognition in videos. In Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS'14, Cambridge, MA, USA (Vol. 1, pp. 568–576). MIT Press.
Stone, A., Wang, H., Stark, M., Liu, Y., Phoenix, D. S., & George, D. (2017). Teaching compositionality to CNNs. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 732–741).

Aditi Jha is a graduate student at the Electrical and Computer Engineering department in Princeton. Her research interests include developing statistical approaches to further our understanding of decision-making and visual perception in humans and animals. Before this, she studied electrical engineering as an undergraduate at Indian Institute of Technology, Delhi.

Sumeet Agarwal teaches in the areas of Electrical Engineering, Artificial Intelligence, and Cognitive Science at IIT Delhi. His research interests are focused around the use of machine learning and statistical modelling techniques to better understand the structure, function, and evolution of complex systems, in both the biological and the social sciences.

Chapter 9

Assessment of Various Deep Reinforcement Learning Techniques in Complex Virtual Search-and-Retrieve Environments Compared to Human Performance

Shashank Uttrani, Akash K. Rao, Bhavik Kanekar, Ishita Vohra, and Varun Dutt

Abstract Recently, the area of deep reinforcement learning (DRL) has seen remarkable advances in fields like medicine, robotics, and automation. Nonetheless, there remains a dearth of understanding about how cutting-edge DRL algorithms, such as Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), match up against human proficiency in demanding search-and-retrieve operations. Furthermore, there is a scarcity of structured assessments of the efficacy of these algorithms in intricate and dynamic surroundings after hyperparameter adjustment. To address this gap, the present research evaluates and contrasts the impact of the ratio of targets to distractors on human and machine agents' performance in a complex search simulation built in a professional gaming engine. Moreover, the influence of the number of neurons and layers in the DRL algorithms' networks is scrutinized in connection with the search-and-retrieve missions. The task requires an agent (whether human or model) to traverse an environment and gather target objects while sidestepping distractor objects. The results of the study show that humans performed better in training scenarios, whereas model agents performed better in test scenarios. Moreover, SAC emerged as a superior performer compared to PPO across all test conditions. Furthermore, increasing the number of units and layers was found to enhance the performance of the DRL algorithms. These conclusions imply that similar hyperparameter configurations can be utilized when contrasting models are generated using DRL algorithms. The study also delves into the implications of utilizing AI models to direct human decisions.

Keywords Deep reinforcement learning · Proximal policy optimization · Soft actor-critic · Human performance modeling · Unity 3D · Search-and-retrieve tasks

Author's Note This research was funded by a grant from the Center for Artificial Intelligence and Robotics, Defence Research and Development Organization, under the title "Replicating human cognitive behavior on robots' models using ACT-R cognitive architecture for search-and-retrieve missions in virtual environments" (Project number: IITM/DRDO/VD/324) and was awarded to Professor Varun Dutt.

S. Uttrani · A. K. Rao · B. Kanekar · V. Dutt (B)
Applied Cognitive Science Lab, Indian Institute of Technology Mandi, Mandi, Himachal Pradesh, India
e-mail: [email protected]

I. Vohra
International Institute of Information Technology Hyderabad, Hyderabad, Telangana, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Mukherjee et al. (eds.), Applied Cognitive Science and Technology, https://doi.org/10.1007/978-981-99-3966-4_9

Summary

The focus of this chapter is to measure and compare the proficiency of human participants with the newest deep reinforcement learning (DRL) techniques, such as PPO and SAC, in intricate search-and-retrieve tasks. The capabilities of these algorithms were evaluated by modifying elements of the neural network such as the number of nodes and layers. Participants and model agents were trained to gather target items and avoid distractor items in a simulation known as the food collector environment, which was designed using the Unity 3D game engine. Upon completion of the training phase, the performance of both human and model agents was evaluated across eight distinct test environments which varied in the ratio of available targets to distractors. The outcomes demonstrate that humans adapted to navigating the environment and collecting items faster than any of the DRL algorithms. Nevertheless, in the testing scenarios, PPO and SAC outperformed humans in most of the test environments.

Introduction

In recent times, there has been a surge of interest in combining computational reinforcement learning and deep learning techniques to navigate complex virtual environments (Bellemare et al., 2013). Researchers have been working to design new algorithms or improve existing ones with the goal of demonstrating exceptional performance in demanding situations such as the games of Go, Shogi, Atari, and others (Zahavy et al., 2018; Zhang et al., 2018). The advancements in deep reinforcement learning (DRL) algorithms in virtual environments have drawn considerable attention over the past five to six years and have faced numerous challenges, including the management of large state spaces, learning policies, and handling random transitions (Zhang et al., 2018). To evaluate the effectiveness and robustness of these algorithms, platforms such as the Arcade Learning Environment, OpenAI Gym, and Unity ML-Agents have been established (Kristensen & Burelli, 2020). For instance, the classic first-person shooter (FPS) game Doom is embodied in VizDoom, which enables the creation and development of agents that can accomplish objectives within the game by using a screen buffer (Kempka et al., 2016). Another example is the work of Gao et al. (2021), who leveraged contemporary reinforcement learning algorithms, such as Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), to train an agent in the classic game of ping-pong using Unity ML-Agents. Moreover, in Peng et al. (2017), BicNet, an actor-critic-based reinforcement learning agent using a bi-directional neural network, was employed within the Unity ML-Agents platform to learn how to solve complex problems and collaborate in the popular StarCraft game.

In recent years, researchers in the area of deep reinforcement learning (DRL) have focused on developing algorithms that can outperform human experts in complex tasks like playing Go (Zahavy et al., 2018; Zhang et al., 2018). Despite these advancements, there is a scarcity of literature that explores the performance of humans in such challenges. Moreover, while researchers are continuously working to enhance DRL algorithms, few studies have been conducted to examine the impact of hyperparameters, like the number of layers and nodes in the underlying neural networks, on these algorithms' performance. Additionally, the available research has not compared the performance of human beings with that of machine agents trained with DRL algorithms in the same environment, taking into account varying target-to-distractor ratios.

The purpose of this chapter is to evaluate the capabilities of human participants, assess the effect of neural network layers and nodes in DRL algorithms such as SAC and PPO, and compare the results of both human and machine agents in a difficult and intricate search-and-retrieve task. We used the "food collector environment" simulation, built within the Unity ML-Agents platform (Nandy & Biswas, 2018), to test the proficiency of the SAC and PPO algorithms. The food collector environment, a pre-existing simulation made with the Unity 3D game engine, challenges individuals to collect targets while avoiding distractors. The simulation was altered to include various targets and distractors, with differing ratios of targets to distractors. We then compare the results of the SAC and PPO algorithms to human results obtained in the same task.

The present study thus merges computational reinforcement learning techniques with deep learning to handle intricate virtual setups. A human performance assessment is conducted in a challenging search-and-retrieve task, alongside evaluating the proficiency of two DRL algorithms, PPO and SAC, using machine agents in the food collector environment (Nandy & Biswas, 2018). The impact of changing the number of layers and nodes in the neural networks of the algorithms is also explored. The findings are analyzed and the significance of these results for the use of DRL algorithms in complex decision-making tasks is emphasized.

Background

Recent research focusing on the analysis of human performance in complex search-and-retrieve tasks has only been presented from a psychological or neuroscientific perspective (Firestone, 2020; Hogg et al., 1995). For example, Hogg et al. (1995) presented a human-system interaction index for evaluating human performance in a similar complex environment using a pilot/operator simulation. However, no machine agents were trained on the same simulation using DRL algorithms to compare with human performance.

The study conducted by Dwibedi and Vemula (2020) evaluated the feasibility and success of using cutting-edge deep learning approaches such as Convolutional Neural Networks (CNNs) to identify visual patterns from data and learn the optimal value functions in an arcade learning environment. According to their research, the learning phase was performed without the inclusion of any prior simulation information in the design. Their results showed that the deep learning-based visual feature extraction design outperformed other reinforcement learning designs (Dwibedi & Vemula, 2020). In another study, Kurzer et al. (2021) investigated the viability of a deep Q-learning model that utilizes functional approximation through artificial neural networks to determine the value function. The outcome showed that deep neural networks trained on limited scenarios could be applied to many simulations and still produce efficient results (Kurzer et al., 2021).

In 2020, Schrittwieser et al. investigated the combination of reinforcement learning algorithms and deep learning methods for learning game environments, including challenging games like Atari and Doom. Their study assessed the efficiency of the MuZero algorithm in teaching agents to play complicated games such as Atari and Chess (Schrittwieser et al., 2020). Despite the algorithm's successful training of the agents, the research did not look into how to cope with stochastic transitions (Schrittwieser et al., 2020).

Recent research by Kamaldinov and Makarov (2019) assessed the efficiency of various DRL techniques in a ping-pong game simulation. The study pitted DQN, PPO, and Asynchronous Advantage Actor-Critic (A3C) algorithms against each other and found that the PPO approach yielded greater cumulative rewards than the A3C and DQN methods (Kamaldinov & Makarov, 2019). Likewise, Teng (2019) evaluated the proficiency of DRL techniques, such as SAC and imitation learning, in training an agent to play a game known as Snoopy Pop. The results of this research revealed that the cumulative rewards produced by the SAC method and imitation learning were alike (Teng, 2019). Haarnoja et al. (2018a, b) compared the efficacy of PPO and SAC in simple simulations like Lily's Garden. Although SAC was found to be more effective than PPO, no comprehensive analysis has been carried out on the influence of varying the layers and nodes of an artificial neural network on these algorithms' performance.

The purpose of the present examination is to address this gap by comparing the performance of different RL algorithms (SAC and PPO) across diverse network designs and by evaluating these algorithms against human performance in the same task. Furthermore, this study assesses the efficiency of these algorithms in comparison to human performance in four distinctive testing circumstances.


The Food Collector Experiment

Participants

A total of 20 human participants were recruited from the Indian Institute of Technology Mandi for the study. Participants' ages ranged between 21 and 30 years, with a mean age of 25.5 years and a standard deviation of 3.41 years. Among the recruited participants, 16 were male and the rest were female. Participants came from different education levels: 20% were from undergraduate programs, 50% were from postgraduate programs, and the rest were from doctoral programs. About 90% of the participants had an engineering background, whereas the rest were from humanities and social sciences backgrounds. The participants were paid a remuneration of INR 50 for their participation in the study. The top five performers based on the test score obtained entered a lucky draw for an Amazon gift card of INR 500.

Experiment Design

An experiment was designed to evaluate the performance of human participants using the virtual food collector simulation (Nandy and Biswas, 2018; Fig. 9.1). This simulation, referred to as a search-and-retrieve task, was created using Unity 3D and consisted of different items to collect (called targets) and different items to avoid (called distractors) scattered throughout the simulated environment. Specific colors were assigned to targets, such as pink, white, and green, while distractors were assigned colors such as red, purple, and yellow. Participants were awarded +1 point for every target they collected and −1 point for every distractor they collected. The experiment included various environmental conditions in which the numbers of targets and distractors differed. Participants went through training in a setting that had 60 targets and 60 distractors, where objects reappeared in new spots within the environment after they were collected. Participants were free to take as many moves as they pleased to examine the environment and collect the items. In the test environments, however, the ratio of targets to distractors changed and the objects did not reappear once they were collected, allowing for an evaluation of the human participants' performance in different situations.

Procedure

The experiment used the food collector environment to train human participants and machine agents (using the SAC and PPO algorithms). Once the human participants and machine agents were trained, test environments with different target and distractor configurations were presented to evaluate their performance.


Fig. 9.1. An image of the food collector environment in the Unity 3D game engine

Participants were recruited from the Indian Institute of Technology Mandi to perform the study and were thanked and paid for their participation after completing the experiment.

Models

This study explored the impact of neural network specifications on the performance of PPO and SAC in the food collector environment. The ML-Agents toolkit developed by Juliani et al. (2018) was used to implement the two algorithms and apply them to the environment, which presented the same targets and distractors as in the human evaluation, with a reward of +1 for each target and −1 for each distractor. The model designed for the food collector simulation was equipped with three continuous (non-discrete) actions for exploring the simulated environment and collecting items. We varied the neural network configurations of the PPO and SAC algorithms by altering the number of hidden layers and nodes. Four different configurations were tested: one hidden layer with 32 nodes, one hidden layer with 512 nodes, three hidden layers with 32 nodes per layer, and three hidden layers with 512 nodes per layer. The PPO and SAC agents were trained in environments with differing numbers of targets and distractors and then tested in novel conditions.
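For concreteness, the four network configurations can be enumerated as in the following illustrative Python sketch; the dictionary keys are ours and do not reproduce the exact Unity ML-Agents configuration schema.

from itertools import product

# Four neural network configurations tested for both PPO and SAC:
# one or three hidden layers, with 32 or 512 nodes per layer.
CONFIGS = [
    {"num_layers": layers, "hidden_units": units}
    for layers, units in product([1, 3], [32, 512])
]

for algo in ["ppo", "sac"]:
    for cfg in CONFIGS:
        # Placeholder for launching a training run with the given trainer
        # and network settings (e.g., via the ML-Agents command line).
        print(algo, cfg)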


Fig. 9.2. Block diagram explaining the architecture and working of the SAC algorithm

In the training phase, the model agents' behavior parameters were kept unchanged across both conditions (SAC and PPO). Behavior parameters, such as the agent behavior type (inference or heuristic), are the set of agent parameter values that help the agent make informed decisions during a simulation run.

Soft Actor-Critic (SAC)

A prior study (Haarnoja et al., 2018a, b) evaluated the performance of the PPO and SAC algorithms in minimal simulations and found that SAC performed better than PPO. PPO uses an on-policy approach and can be hindered by limited sample availability, while off-policy algorithms like Deep Deterministic Policy Gradient (DDPG) can require extensive tuning. SAC strikes a balance between PPO and DDPG, leveraging a stochastic policy in an off-policy manner. Although SAC is believed to be superior, factors such as network design, batch size, and step size can significantly impact its performance. The architecture of SAC is depicted in Fig. 9.2.
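For reference, the standard maximum-entropy objective that SAC optimizes can be written as below; this formulation is added here for context (it is not reproduced from the chapter), with $\alpha$ denoting the temperature parameter that weights the entropy term $\mathcal{H}$.

$$J(\pi) = \sum_{t} \mathbb{E}_{(s_t,\,a_t)\sim\rho_\pi}\!\left[\, r(s_t, a_t) + \alpha\,\mathcal{H}\!\left(\pi(\cdot\mid s_t)\right) \right]$$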

Proximal Policy Optimization (PPO)

The field of DRL has been the subject of much research, and several methods have been proposed (Rao et al., 2018, 2020). However, many of these methods come with their own limitations. For instance, DQN is not suitable for tasks that have a continuous action space (Rao et al., 2020), while policy gradient methods struggle with data efficiency and stability (Rao et al., 2018). Another approach, Trust Region Policy Optimization (TRPO), is difficult to employ with networks that have multiple outputs and has a poor track record on tasks that demand CNNs or RNNs (Schulman et al., 2017).


Fig. 9.3. Flowchart describing the working of the PPO algorithm

Thus, PPO was developed to overcome these drawbacks. PPO is a reinforcement learning (RL) approach that uses policy gradients (Wang et al., 2020). It can handle both discrete and continuous action spaces and is based on the actor-critic framework. The method trains a stochastic policy that depends on the current state, and the policy is optimized by generating multiple trajectories sampled from the existing policy (Wang et al., 2020). These trajectories are then used to revise the policy and update the value function with the aid of advantage estimates and rewards (Wang et al., 2020). Lastly, the policy is updated through gradient ascent, while the value function is adjusted through gradient descent (Wang et al., 2020). A flowchart describing the working of PPO is shown in Fig. 9.3.
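As a minimal sketch of the clipped surrogate objective that PPO maximizes (Schulman et al., 2017), the following NumPy function is illustrative; the function and array names are ours and not taken from the authors' code.

import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate objective of PPO (to be maximized).

    logp_new, logp_old: log-probabilities of the taken actions under the
    updated and the data-collecting policy; advantages: estimated advantages.
    """
    ratio = np.exp(logp_new - logp_old)                 # probability ratio r_t
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))      # surrogate objective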

Evaluation Metric for Human and Model Performance

To assess human performance, we calculated the average number of rewards (including both targets and distractors) collected by the 20 human participants in both the training and testing environments. For evaluating the performance of the models, we used the individual values of rewards (including both targets and distractors) collected by each model in both the training and testing environments. We plotted the progression of targets over a series of episodes and calculated the rate of increase or decrease of the targets (the "target slope"). In a similar manner, we plotted the progression of distractors over a series of episodes and found the rate of change of the distractors (the "distractor slope") during the test phase. Human and model performance was calculated using Eq. 9.1.


Fig. 9.4. Evaluation of the gradient for (a) targets and (b) distractors using Eq. 9.1

Performance = normsinv(target slope) − normsinv(distractor slope)    (9.1)

The function "normsinv" returns the inverse of the standard normal cumulative distribution; that is, it gives the value of the standard normal variable that corresponds to a given probability. When calculating the performance of agents in the simulation task, collecting more target items than distractor items yields a positive score (Fig. 9.4).
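A small sketch of Eq. 9.1 in Python follows, assuming the two slopes are expressed as proportions in (0, 1), which the inverse normal CDF requires; the helper names and the use of a least-squares line fit for the slopes are our illustrative choices, not the authors' implementation.

import numpy as np
from scipy.stats import norm

def performance(targets_per_episode, distractors_per_episode):
    """Performance in the spirit of Eq. 9.1 (slopes assumed to lie in (0, 1))."""
    episodes = np.arange(len(targets_per_episode))
    target_slope = np.polyfit(episodes, targets_per_episode, deg=1)[0]
    distractor_slope = np.polyfit(episodes, distractors_per_episode, deg=1)[0]
    # norm.ppf is the inverse of the standard normal CDF ("normsinv")
    return norm.ppf(target_slope) - norm.ppf(distractor_slope)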

Results

Human Experiment Results

Training and Test Results

The performance of human participants was evaluated in the training and different test conditions by plotting the collected rewards, averaged across the 20 participants, against the average number of steps taken. In the training condition, human participants took fewer than 900 steps to explore the environment and collect a maximum reward of 79.25, averaged across the 20 participants. Figure 9.5 presents the human participants' performance in the training phase. Among the 20 human participants, 8 achieved a median score (cumulative reward) of 56 in 435 steps, whereas toward the end of the curve, 762 steps were taken by 3 participants, 803 steps by 2 participants, and only one participant took more than 843 steps to achieve the maximum cumulative reward (see Fig. 9.5). After training, human participants were transferred to test conditions in which feedback on collecting an object was absent. Human performance was tested across eight different test environments, which varied in the number of distractor and target items as shown in Table 9.1. We evaluated human performance in the test conditions using Eq. 9.1 across all the environment configurations.


Fig. 9.5. Graph of human performance in the training condition, averaged across 20 participants

Table 9.1. Average human scores (with standard deviations) and human performance across different test environment configurations in the food collector simulation

S. No. | Number of targets | Number of distractors | Average human scores (Std. Dev.) | Human performance
1      | 12                | 108                   | −20.50 (15.47)                   | −0.13
2      | 24                | 96                    | 0.05 (13.58)                     | 0.00
3      | 36                | 84                    | 5.45 (12.49)                     | 0.05
4      | 48                | 72                    | 27.00 (12.38)                    | 0.19
5      | 108               | 12                    | 34.80 (33.74)                    | 0.67
6      | 96                | 24                    | 43.35 (28.29)                    | 0.46
7      | 84                | 36                    | 19.10 (29.46)                    | 0.33
8      | 72                | 48                    | 15.45 (16.63)                    | 0.35

Influence of Number of Targets and Distractors on Human Performance

Table 9.1 shows that human performance increased with the number of target items available in the test phase: with an increase (or decrease) in the total number of targets present in the test environment, human performance increased (or decreased). This result met our expectations.


Model Results

Training and Test Results

The ability of the model agents was evaluated in the simulated setting by plotting the rewards received against the number of steps taken by each model. The models trained with the PPO algorithm took roughly 6 million steps to earn a maximum cumulative reward of 90, while those trained with the SAC algorithm took only about 2 million steps to reach a peak cumulative reward of 98. The training results for the four models trained with the PPO method are depicted in Fig. 9.6a, and those trained with the SAC algorithm are shown in Fig. 9.6b. The fourth model, which had a neural network with three hidden layers and 512 units, demonstrated the highest performance with the PPO method, as illustrated in Fig. 9.6a. It is also evident from Fig. 9.6b that the fourth model performed best during the early stages of training when using the SAC algorithm. After the training process was complete, the model agents were evaluated under eight different test conditions that mirrored the trials of the human participants. A comparison of their performance was conducted, and the results are presented in Table 9.2. The table reports the outcomes of the model agents under varying neural network structures and environmental setups, with differing numbers of target and distractor items, for both the SAC and PPO algorithms.

Impact of Number of Intermediate Layers and Nodes on Model Performance

Table 9.2 shows that, for most environment configurations, model performance improved with the number of intermediate (hidden) layers for both PPO and SAC: the three-layer networks were the best-performing configurations in the majority of test environments (marked with an asterisk in Table 9.2). Increasing the number of nodes per layer was also advantageous in some configurations, particularly for SAC.

Discussion

Earlier studies have employed DRL algorithms such as DQN that rely on neural networks to estimate the Q-value. In contrast, policy optimization techniques use policy gradients and deep neural networks to approximate the policy; algorithms of this sort include SAC, PPO, and DDPG. Prior research reports that SAC outperforms PPO (Haarnoja et al., 2018a, b). Despite these advancements, the effectiveness of these algorithms in game environments with diverse neural network specifications, and the comparison between human performance and these cutting-edge deep learning algorithms, remain open questions.


Fig. 9.6. Performance of the model agents trained with different DRL algorithms in a scenario containing 75 targets and 75 distractors


Table 9.2. Performance of model agents across different neural network and test environment configurations in the food collector simulation

S. No. | Hidden layers | Nodes per layer | Number of targets | Number of distractors | Performance (PPO) | Performance (SAC)
1  | 1 | 32  | 12  | 108 | −0.42  | −11.99
2  | 3 | 32  | 12  | 108 | −0.26* | 0.20*
3  | 1 | 512 | 12  | 108 | −0.48  | −0.21
4  | 3 | 512 | 12  | 108 | −0.44  | −0.09
5  | 1 | 32  | 24  | 96  | 0.44   | 0.49
6  | 3 | 32  | 24  | 96  | 1.35*  | 1.66
7  | 1 | 512 | 24  | 96  | 1.12   | 0.91
8  | 3 | 512 | 24  | 96  | 1.24   | 1.80*
9  | 1 | 32  | 36  | 84  | 0.36   | 0.29
10 | 3 | 32  | 36  | 84  | 0.78*  | 1.19
11 | 1 | 512 | 36  | 84  | 0.77   | 0.75
12 | 3 | 512 | 36  | 84  | 0.75   | 1.39*
13 | 1 | 32  | 48  | 72  | 0.20   | 0.12
14 | 3 | 32  | 48  | 72  | 0.85*  | 1.20*
15 | 1 | 512 | 48  | 72  | 0.61   | 0.67
16 | 3 | 512 | 48  | 72  | 0.65   | 1.09
17 | 1 | 32  | 108 | 12  | 0.73   | −11.99
18 | 3 | 32  | 108 | 12  | 0.89   | 1.37*
19 | 1 | 512 | 108 | 12  | 1.23   | 1.26
20 | 3 | 512 | 108 | 12  | 1.49*  | 1.33
21 | 1 | 32  | 96  | 24  | −0.51  | −0.43
22 | 3 | 32  | 96  | 24  | −0.06* | 0.70*
23 | 1 | 512 | 96  | 24  | 0.49   | 0.04
24 | 3 | 512 | 96  | 24  | 0.11   | −0.07
25 | 1 | 32  | 84  | 36  | −0.28  | −0.26
26 | 3 | 32  | 84  | 36  | 0.19   | 0.34
27 | 1 | 512 | 84  | 36  | 0.00   | 0.34
28 | 3 | 512 | 84  | 36  | 0.39*  | 0.34
29 | 1 | 32  | 72  | 48  | −0.05  | −0.11
30 | 3 | 32  | 72  | 48  | 0.48*  | 1.18*
31 | 1 | 512 | 72  | 48  | −0.07  | 0.15
32 | 3 | 512 | 72  | 48  | −0.01  | 0.55

* denotes the best performing model agents in their respective environment configurations


The purpose of our research was to investigate the influence of the number of targets and distractors on human performance, as well as the impact of neural network characteristics, such as the number of intermediate layers and neurons (units), on the performance of models that use DRL algorithms such as SAC and PPO. The aim was also to contrast the outcomes of human participants with the performance of model agents trained and tested in comparable environments using these advanced DRL algorithms.

The findings showed that human participants improved at a faster pace than advanced deep reinforcement learning methods like PPO and SAC. At first, the SAC approach produced better results in terms of the maximum cumulative rewards collected compared to PPO, but eventually the performance of the most complex PPO model, with three intermediate layers and 512 neuronal units, surpassed it. Nevertheless, despite the intricate design of the PPO and SAC models, human participants still managed to outperform them in the training scenario. The results of the testing phase showed that model agents equipped with the PPO and SAC algorithms outperformed human participants in the majority of simulated scenarios. The human participants' haste to gather targets as fast as possible caused their performance to decrease. The research also found that PPO and SAC improved with an increase in the number of intermediate layers and that, for SAC, increasing the number of nodes was advantageous in some situations.

The outcomes of this study indicate that there is room for additional exploration into the optimization and evaluation of various deep reinforcement learning algorithms. This could involve creating a more intricate scenario with various obstacle and adversary elements, creating cognitive models such as instance-based learning models to imitate human exploration and exploitation strategies, and constructing multiplayer simulations to assess the influence of model agents using SAC and PPO on human performance in target–distractor assignments. Furthermore, additional comparisons with other reinforcement learning algorithms such as DQN and SARSA can also be made.

Conclusion

In this investigation, the influence of the presence of distractors and targets on human performance was examined, and the capabilities of modern DRL algorithms were compared to those of human participants in a search simulation (Haarnoja et al., 2018a, b). The outcomes revealed that human subjects learned faster than the most advanced artificial intelligence algorithms, particularly PPO and SAC. The research also looked into the influence of network structure on RL algorithms and discovered that in some cases PPO can perform better than SAC, which contradicts prior literature stating that SAC is superior to PPO (Haarnoja et al., 2018a, b). This conclusion highlights the significance of reporting hyperparameters when evaluating algorithms, as it provides impartial information on baseline results. The study offers fresh perspectives on how to investigate the performance of algorithms in diverse test conditions, which can be used as a reference point for testing DRL agents (Haarnoja et al., 2018a, b).


Further study can expand on this research to delve deeper into the functioning of various DRL algorithms. This can involve constructing a simulation task with varying degrees of difficulty to assess the performance of agents employing SAC and PPO in challenging scenarios. It is also possible to create instance-based learning cognitive models that simulate human-like exploration and exploitation strategies in search-and-collection situations (Lejarraga et al., 2012). Multiplayer simulations can be developed to examine the influence of SAC and PPO agents on human performance in tasks that involve target and distractor items. The study's methodology can also be extended to make a thorough comparison with other RL algorithms, like SARSA and DQN (Mnih et al., 2013; Zhao et al., 2016). These considerations will be taken into account in future studies.

References

Bellemare, M. G., Naddaf, Y., Veness, J., & Bowling, M. (2013). The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47. https://doi.org/10.1613/jair.3912
Dwibedi, D., & Vemula, A. (2020). Playing games with deep reinforcement learning.
Firestone, C. (2020). Performance vs. competence in human–machine comparisons. Proceedings of the National Academy of Sciences, 117(43), 26562. https://doi.org/10.1073/pnas.1905334117
Gao, Y., Tebbe, J., & Zell, A. (2021). Optimal stroke learning with policy gradient approach for robotic table tennis. arXiv:2109.03100.
Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018a). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research. https://proceedings.mlr.press/v80/haarnoja18b.html
Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., & Abbeel, P. (2018b). Soft actor-critic algorithms and applications. arXiv:1812.05905.
Hogg, D. N., Follesø, K., Strand-Volden, F., & Torralba, B. (1995, November 01). Development of a situation awareness measure to evaluate advanced alarm systems in nuclear power plant control rooms. Ergonomics, 38(11), 2394–2413. https://doi.org/10.1080/00140139508925275
Juliani, A., Berges, V.-P., Teng, E., Cohen, A., Harper, J., Elion, C., Goy, C., Gao, Y., Henry, H., & Mattar, M. (2018). Unity: A general platform for intelligent agents. arXiv:1809.02627.
Kamaldinov, I., & Makarov, I. (2019, August 20–23). Deep reinforcement learning in match-3 game. In 2019 IEEE Conference on Games (CoG).
Kempka, M., Wydmuch, M., Runc, G., Toczek, J., & Jaśkowski, W. (2016, September 20–23). ViZDoom: A Doom-based AI research platform for visual reinforcement learning. In 2016 IEEE Conference on Computational Intelligence and Games (CIG).
Kristensen, J. T., & Burelli, P. (2020). Strategies for using proximal policy optimization in mobile puzzle games. In International Conference on the Foundations of Digital Games, Bugibba, Malta. https://doi.org/10.1145/3402942.3402944
Kurzer, K., Schörner, P., Albers, A., Thomsen, H., Daaboul, K., & Zöllner, J. M. (2021). Generalizing decision making for automated driving with an invariant environment representation using deep reinforcement learning. arXiv:2102.06765.
Lejarraga, T., Dutt, V., & Gonzalez, C. (2012). Instance-based learning: A general model of repeated binary choice. Journal of Behavioral Decision Making, 25(2), 143–153.


Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv:1312.5602.
Nandy, A., & Biswas, M. (2018). Unity ML-Agents. In Neural networks in Unity (pp. 27–67). Springer.
Peng, P., Wen, Y., Yang, Y., Yuan, Q., Tang, Z., Long, H., & Wang, J. (2017). Multiagent bidirectionally-coordinated nets: Emergence of human-level coordination in learning to play StarCraft combat games. arXiv:1703.10069.
Rao, A. K., Chandra, S., & Dutt, V. (2020). Desktop and virtual-reality training under varying degrees of task difficulty in a complex search-and-shoot scenario. In International Conference on Human-Computer Interaction.
Rao, A. K., Satyarthi, C., Dhankar, U., Chandra, S., & Dutt, V. (2018, October 7–10). Indirect visual displays: Influence of field-of-views and target-distractor base-rates on decision-making in a search-and-shoot task. In 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC).
Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., Guez, A., Lockhart, E., Hassabis, D., Graepel, T., Lillicrap, T., & Silver, D. (2020, December 01). Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 588(7839), 604–609. https://doi.org/10.1038/s41586-020-03051-4
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv:1707.06347.
Teng, E. (2019). Training your agents 7 times faster with ML-Agents. https://blogs.unity3d.com/2019/11/11/training-your-agents-7-times-faster-with-ml-agents
Wang, Y., He, H., & Tan, X. (2020). Truly proximal policy optimization. In Proceedings of the 35th Uncertainty in Artificial Intelligence Conference, Proceedings of Machine Learning Research. https://proceedings.mlr.press/v115/wang20b.html
Zahavy, T., Haroush, M., Merlis, N., Mankowitz, D. J., & Mannor, S. (2018). Learn what not to learn: Action elimination with deep reinforcement learning. arXiv:1809.02121.
Zhang, A., Satija, H., & Pineau, J. (2018). Decoupling dynamics and reward for transfer learning. arXiv:1804.10689.
Zhao, D., Wang, H., Shao, K., & Zhu, Y. (2016). Deep reinforcement learning with experience replay based on SARSA. In 2016 IEEE Symposium Series on Computational Intelligence (SSCI).

Shashank Uttrani is a research scholar at the Applied Cognitive Science Lab at the Indian Institute of Technology Mandi. He is currently pursuing his MS (by research) degree in the School of Computing and Electrical Engineering at the Indian Institute of Technology Mandi under the guidance of Prof. Varun Dutt. He completed his B.E. in Computer Science from Birla Institute of Technology Mesra in 2016. Mr. Uttrani has been working in the computational cognitive science domain to model human decisions using theories of cognition and machine learning approaches. His interests include cognitive modeling, human preference reversal, reinforcement learning, and behavioral cybersecurity.

Akash K Rao is a Ph.D. scholar at the Applied Cognitive Science Laboratory at the Indian Institute of Technology Mandi under Prof. Varun Dutt. He completed his M.S. (by research) in the School of Computing and Electrical Engineering at the Indian Institute of Technology Mandi in 2020. His research interests include cognitive neuroscience, brain–computer interfaces, and extended reality.

Bhavik Kanekar is currently working as a project associate in the Applied Cognitive Science Lab, Indian Institute of Technology Mandi, in collaboration with a Centre for Artificial Intelligence & Robotics, Defence Research and Development Organisation project. He received an MTech degree in Computer Engineering from Sardar Patel Institute of Technology, Mumbai University, in 2020,


and a B.E. degree in Computer Engineering from Ramrao Adik Institute of Technology, Mumbai University, in 2017. His current research concentrates on reinforcement learning.

Ishita Vohra is a recent graduate from IIIT Hyderabad. In her 4th year, fascinated by the field of HCI, she interned at IIT Mandi. Besides this, at IIIT, her work involved exploring the bias in voice assistants and measuring awareness and emotion of tweets related to climate change in India. Intrigued by the application of machine learning in the financial sector, she pursued an internship at Goldman Sachs in the 3rd year of her studies. She is currently working as a full-time employee in the Compliance Division of Goldman Sachs.

Varun Dutt works as an Associate Professor in the School of Computing and Electrical Engineering and the School of Humanities and Social Sciences at the Indian Institute of Technology Mandi. Dr. Dutt has applied his knowledge and skills in the fields of psychology, public policy, and computer science to explore how humans make decisions on social, managerial, and environmental issues. Dr. Dutt has used lab-based methods involving experiments with human participants and cognitive models to investigate his research questions.

Chapter 10

Cognate Identification to Augment Lexical Resources for NLP

Shantanu Kumar, Ashwini Vaidya, and Sumeet Agarwal

Abstract Cognates are words across different languages that are known to have a common ancestral origin. For example, the English word night and the German Nacht, both meaning night, are cognates with a common ancestral (Proto-Germanic) origin. Cognates are not always revealingly similar and can change substantially over time such that they do not share form similarity. Automatic cognate identification determines whether a given word pair is cognate or not. A cognate pair may have diverged at the surface level over time, but it shares a common ancestor and is likely to have similar meanings. This is especially true in languages that are typologically closer to each other. Our system uses a character-level model with a recurrent neural network architecture and attention. We test its performance on datasets drawn from three different language families. Our results show an improvement in performance as compared to existing models and highlight the usefulness of phonetic and conceptual features. Our model finds similar word pairs with high accuracy from a pair of closely related languages (Hindi and Marathi). One of the applications of our work is to project linguistic annotations from a high-resource language to a (typologically related) low-resource language. This projection can be used to bootstrap lexical resource creation, e.g., predicate frame information, word sense annotations, etc. The bootstrapping of lexical resources is particularly relevant for languages in South Asia, which are diverse but share areal and typological properties. Apart from this application, cognate identification helps improve the performance of tasks like sentence alignment for machine translation.

Keywords Computational linguistics · Neural networks · Lexical similarity · Language change

At the time of writing this paper, Shantanu Kumar was a student at Indian Institute of Technology, Delhi. He is now working at Sizmek, USA.


Introduction

Cognates are words across different languages that are known to have originated from the same word in a common ancestral language. For example, the English word Night and the German word Nacht, both meaning Night, and English Hound and German Hund, meaning "Dog", are cognates whose origin can be traced back to Proto-Germanic. Cognate words are not simply the translations of each other in any two languages, but are historically known to have a common origin. For example, the English word Hound and the Spanish word Perro both mean "Dog" but are not cognates.

Traditionally, the identification of cognates was carried out by historical linguists, using word lists and establishing sound correspondences between words. These are useful in determining linguistic distance within a language family and also in understanding the process of language change. Cognate information has also been used in several downstream NLP tasks, like sentence alignment in bi-texts (Simard et al., 1993) and improving statistical machine translation models (Kondrak et al., 2003). Additionally, it has been proposed that cognates can be used to share lexical resources among languages that are closely related (Singh & Surana, 2007).

For some time now, there has been a growing interest in automatic cognate identification techniques. Most approaches for this task focus on finding similarity measures between a pair of words, such as orthographic or phonetic similarity (Hauer & Kondrak, 2011; Inkpen et al., 2005; List et al., 2016). These are used as features for a classifier to identify cognacy between a given word pair. For instance, Rama (2015) attempts to identify cognates by looking at the common subsequences present in the candidate word pair. For a cognate pair like the English Wheel and the Sanskrit Chakra, such an approach fails as they have nothing in common with each other orthographically. In fact, even for a pair like English Father and Latin Pater, a common subsequence approach completely ignores the similarity between the Fa and Pa phonemes, which is a possible indication of cognacy between the pair. Such surface similarity measures miss out on capturing generalizations beyond string similarity, as cognate words are not always revealingly similar. Thus, there is a need for information about phonological similarity that goes beyond surface similarity, such as the sound correspondences that are used in historical linguistics to narrow down candidate pairs as cognates.

By using DL-based models, the need for external feature engineering is circumvented, as the system learns to find hidden representations of the input depending on the task at hand. We make use of an end-to-end character-level recurrent neural network (RNN) based model that is adapted from a model used on a word-level entailment task (Rocktäschel et al., 2016). Our model is able to outperform both the common subsequence model (Rama, 2015) and a recent CNN-based model (Rama, 2016) on the task. LSTM (Long Short Term Memory) networks are being used in an extensive range of NLP tasks to build end-to-end systems.


LSTMs have been successfully applied to machine translation (Bahdanau et al., 2014), language modeling (Mikolov et al., 2010), information retrieval (Sordoni et al., 2015), and RTE (Bowman et al., 2015). In the subsequent sections, we describe our LSTM-based Siamese-style architecture, which uses character-by-character attention to enrich the representations of the input word pairs. We compare our model against existing supervised approaches. We also demonstrate the importance of the attention layer and the role of word semantics in the task, and examine how the properties of the datasets, specifically size and transcription, affect the models. The task of discovering cognates can be particularly useful among the languages of South Asia, which are not rich in lexical resources. Information about cognates can become an important source for assisting the creation and sharing of lexical resources between languages. Therefore, another contribution of this work is to apply our cognate detection model to a real language pair. We apply our model to the domain of Hindi–Marathi, using a large unlabelled corpus of aligned texts to find cognate pairs.

Problem Statement

The task of cognate identification makes use of word lists from different language families taken from the basic vocabulary, e.g., kinship terms, body parts, numbers, etc. Usually, this vocabulary represents concepts from the language itself and not borrowed items (although borrowing is also possible at times). Table 10.1 shows a part of a word list that is used for the task. Each cell in the table contains a lexical item belonging to a particular language and a particular concept, along with its cognate class ID. If two words have the same cognate class ID, then they are identified as cognates. The task of pairwise cognate prediction can thus be defined more formally as follows: given two words from a word list, belonging to different languages but to the same concept, predict whether the words in the pair are cognates. A model for this task takes as input a candidate pair of words and produces as output a single value classifying the pair as cognate or non-cognate.

Table 10.1 Sample word list from the Indo-European dataset (each cell gives the lexical item followed by its cognate class ID)

Language | ALL         | BIG         | ANIMAL
ENGLISH  | all (001)   | big (009)   | animal (015)
FRENCH   | tut (002)   | grand (010) | animal (015)
MARATHI  | serve (006) | motha (011) | jenaver (017)
HINDI    | seb (006)   | bara (012)  | janver (017)
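A minimal sketch of how labeled training pairs can be generated from such a word list follows; the data structure and field names are illustrative rather than taken from the authors' code, and the sample entries come from the ALL column of Table 10.1.

from itertools import combinations

# Each entry: (language, concept, word, cognate_class_id)
word_list = [
    ("ENGLISH", "ALL", "all", "001"),
    ("FRENCH",  "ALL", "tut", "002"),
    ("MARATHI", "ALL", "serve", "006"),
    ("HINDI",   "ALL", "seb", "006"),
]

def make_pairs(entries):
    """Yield ((word_a, word_b), label) for words of the same concept from
    different languages; label is 1 if the cognate class IDs match."""
    by_concept = {}
    for lang, concept, word, cid in entries:
        by_concept.setdefault(concept, []).append((lang, word, cid))
    for concept, items in by_concept.items():
        for (la, wa, ca), (lb, wb, cb) in combinations(items, 2):
            if la != lb:
                yield (wa, wb), int(ca == cb)

print(list(make_pairs(word_list)))   # only (serve, seb) is labeled 1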


Model

The overall model used in our system is called the Recurrent Co-Attention Model (CoAtt). It is adapted from the word-by-word attention model used by Rocktäschel et al. (2016) for the task of recognizing textual entailment (RTE) in natural language sentences. Just as the RTE task involves understanding the semantics of a sentence that is hidden behind the sequence of words, the cognate identification task also requires information beyond surface character similarity, which was the motivation to adapt this particular model for our task. The network is illustrated in Fig. 10.1. We have converted the RTE model into a Siamese-style network that encodes a word pair in parallel and then makes a discriminative judgment in the final layer. The input words are first encoded into character-level embeddings, followed by a bidirectional LSTM network, and finally a character-by-character attention layer, as described in the subsections that follow. The encodings of both words are merged and passed through a 2-layer neural network classifier with tanh and sigmoid activations to make a final binary prediction. Additionally, we also add a Language features vector or a Concept features vector to the model by concatenating it with the merged attention vector before passing it to the 2-layer neural network.

Fig. 10.1 Recurrent co-attention network for cognate discovery


Character Embeddings

The input words are first encoded into character-level embeddings. Character embeddings are a form of distributional representation, where every character of the vocabulary is expressed as a vector in a vector space. This is done using a character-level embedding matrix $E \in \mathbb{R}^{n_e \times |C|}$, where $n_e$ is the dimensionality of the embeddings and $C$ is the vocabulary of all characters. Thus, an input word $x$, which can be represented as a sequence of characters $x = \{c_{i_1}, c_{i_2}, \ldots, c_{i_n}\}$, is transformed into a sequence of vectors $y = \{e_{i_1}, e_{i_2}, \ldots, e_{i_n}\}$, where $e_j$ is the $j$th column of the matrix $E$. This embedding matrix is learnt during training, and each column in the matrix represents the embedding vector of the respective token in the vocabulary. There are two ways of initializing the character embedding matrix for training. The matrix can be randomly initialized by sampling values from a uniform distribution. In such a case, the embeddings depend heavily on the training, and the random vectors assigned to each character can change during training to values that are optimal for the task. The other method of initializing the embedding matrix is by using phonetic feature vectors (PV). These phonetic vectors are manually defined binary vectors that are based on various linguistic properties of phonemes, such as place of articulation (Dental, Nasal) and manner of articulation (Fricative, Voiced, Lateral). We adapted these feature vectors from Rama (2016) after some minor corrections.
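A small NumPy sketch of this embedding lookup follows; the vocabulary and dimensionality are chosen arbitrarily for illustration and do not reflect the actual datasets.

import numpy as np

chars = sorted(set("abcdefghijklmnopqrstuvwxyz"))      # character vocabulary C
char_to_idx = {c: i for i, c in enumerate(chars)}
n_e = 16                                               # embedding dimensionality
rng = np.random.default_rng(0)
E = rng.uniform(-0.05, 0.05, size=(n_e, len(chars)))   # embedding matrix, learnt during training

def embed(word):
    """Map a word to the sequence of embedding columns for its characters."""
    return np.stack([E[:, char_to_idx[c]] for c in word], axis=1)   # shape (n_e, len(word))

print(embed("night").shape)   # (16, 5)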

LSTM

After the input words to the network are transformed using the character embedding matrix, we encode them with an LSTM. Given an input word $y = \{e_1, e_2, \ldots, e_n\}$, at every time step $t$ the LSTM of hidden unit size $n_h$ uses the next input $e_t$, the previous output $h_{t-1}$, and the previous cell state $c_{t-1}$ to compute the next output $h_t$ and the next cell state $c_t$ as follows:

$$H = [e_t \; h_{t-1}] \qquad (1)$$
$$i_t = \sigma(W^i H + b^i) \qquad (2)$$
$$o_t = \sigma(W^o H + b^o) \qquad (3)$$
$$f_t = \sigma(W^f H + b^f) \qquad (4)$$
$$c_t = i_t * \tanh(W^c H + b^c) + f_t * c_{t-1} \qquad (5)$$
$$h_t = o_t * \tanh(c_t) \qquad (6)$$

Here $W^i, W^o, W^f, W^c \in \mathbb{R}^{(n_e + n_h) \times n_h}$ and $b^i, b^o, b^f, b^c \in \mathbb{R}^{n_h}$ are trained weights of the LSTM, $[\;]$ is the concatenation operator, and $\sigma$ is the element-wise sigmoid operator. The final output of the LSTM gives us a sequence $\{h_1, h_2, \ldots, h_n\}$ for each word, where $h_j \in \mathbb{R}^{n_h}$.


Attention Layer

Attention neural networks have been used extensively in tasks like machine translation (Luong et al., 2015), image captioning (Xu et al., 2015), and visual question answering (Yang et al., 2016). At a high level, attention can be considered a soft selection procedure: given a sequence of inputs, one would like to focus or attend on the important parts of the sequence with respect to a context. This procedure is used to enhance the representation of the character sequence of a word coming out of the LSTM, by giving it as context the second word. One can compare the attention mechanism with the common subsequence model. In a common subsequence model, one makes a hard selection by using as features only the common subsequences of both words. In the attention model, the network makes a soft selection while focusing on those parts of the sequence which are important with respect to the other word. Given a character vector $h \in \mathbb{R}^{n_h}$ with which we would like to attend on a sequence of character vectors $Y = \{c_1, c_2, \ldots, c_L\} \in \mathbb{R}^{n_h \times L}$, we generate a set of attention weights $\alpha$ and an attention-weighted representation $r \in \mathbb{R}^{n_h}$ of $Y$ as

$$M = \tanh(W^y Y + W^h h \otimes e_L) \qquad (7)$$
$$\alpha = \mathrm{softmax}(w^\top M) \qquad (8)$$
$$r = Y \alpha^\top \qquad (9)$$

The outer product $W^h h \otimes e_L$ repeats the linearly transformed $h$ as many times as there are characters in $Y$ ($L$ times). Following the mechanism used by Rocktäschel et al. (2016) for word-by-word attention, we employ a character-by-character attention model, wherein we find an attention-weighted representation of the first word $Y = \{c_1, c_2, \ldots, c_L\} \in \mathbb{R}^{n_h \times L}$ at every character of the second word $H = \{h_1, h_2, \ldots, h_N\} \in \mathbb{R}^{n_h \times N}$:

$$M_t = \tanh(W^y Y + (W^h h_t + W^r r_{t-1}) \otimes e_L) \qquad (10)$$
$$\alpha_t = \mathrm{softmax}(w^\top M_t) \qquad (11)$$
$$r_t = Y \alpha_t^\top + \tanh(W^t r_{t-1}) \qquad (12)$$

Here $W^y, W^h, W^r, W^t \in \mathbb{R}^{n_h \times n_h}$ and $w \in \mathbb{R}^{n_h}$ are trained weights of the attention layer. The final output gives us $r_N = r_{YH}$, which is the attention-weighted representation of $Y$ with respect to $H$. Similarly, we also obtain $r_{HY}$. The final feature vector $r^*$ that is passed to the multi-layer perceptron for classification is the concatenation of the $r_{HY}$ and $r_{YH}$ vectors. This method of making both character sequences attend over each other is called the co-attention mechanism.
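A NumPy sketch of the basic attention step in Eqs. (7)–(9) follows; the variable names mirror the text, and the softmax helper is ours.

import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def attend(Y, h, W_y, W_h, w):
    """Attention over Y (shape n_h x L) given a context vector h (shape n_h,).
    Returns the attention weights alpha and the weighted representation r."""
    L = Y.shape[1]
    M = np.tanh(W_y @ Y + np.outer(W_h @ h, np.ones(L)))   # Eq. (7)
    alpha = softmax(w @ M)                                 # Eq. (8), shape (L,)
    r = Y @ alpha                                          # Eq. (9), shape (n_h,)
    return alpha, r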


Language and Concept Features

It is known that some languages are more closely related to each other than others. For example, from Table 10.1, one can see that Hindi is more related to Marathi than to French. That is, a candidate word pair with words from Hindi and Marathi is more likely to be a cognate pair than a word pair with words from Hindi and French. This information about language relatedness can be exploited by using as features a 2-hot encoding vector that represents the respective languages of the two input words. During training, the network can use these features to learn automatically from the data which language pairs are closely related. Similar to language information, we hypothesize that information about the semantics of the input words can also provide useful features for the classifier. The word semantics inherently contain information like the part-of-speech (POS) category of the word. Our initial exploration of the data showed that certain POS categories, such as question words or prepositions, tend to have greater divergence in their cognate classes as compared to nouns or adjectives. We use the GloVe word embedding (Pennington et al., 2014) for the English concept of the input word pair as the concept feature vector. Word embeddings are distributional representations of words in a low-dimensional space compared to the vocabulary size, and they have been shown to inherently capture semantic information about the words.
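A small sketch of the 2-hot language feature vector described above is given below; the language inventory is an illustrative subset.

import numpy as np

LANGUAGES = ["ENGLISH", "FRENCH", "MARATHI", "HINDI"]   # illustrative subset

def language_features(lang_a, lang_b):
    """2-hot encoding marking the languages of the two input words."""
    vec = np.zeros(len(LANGUAGES))
    vec[LANGUAGES.index(lang_a)] = 1.0
    vec[LANGUAGES.index(lang_b)] = 1.0
    return vec

print(language_features("HINDI", "MARATHI"))   # [0. 0. 1. 1.]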

Experiments

In the subsections below, we describe the datasets that we used and the comparisons we made with our models. This is followed by the three experiments we conducted with the datasets.

Datasets

We make use of three datasets in our work, which come from three language families. These families make a good test set as they vary widely in terms of the number of languages, concepts, and cognate classes. The first and primary dataset that we use is the IELex Database, which contains cognacy judgments from the Indo-European language family.


Table 10.2 Statistics about the different language families

Language family | Languages | Concepts | Unique lexical items | Cognate classes
Indo-European   | 52        | 208      | 8622                 | 2528
Austronesian    | 100       | 210      | 10079                | 4863
Mayan           | 30        | 100      | 1629                 | 858

Table 10.3 Number of word pairs obtained for both modes of evaluation from the different language families

Evaluation mode           | Split            | Indo-European Total | Indo-European Positive | Austronesian Total | Austronesian Positive | Mayan Total | Mayan Positive
Cross-language evaluation | Training samples | 218,429             | 56,678                 | 333,626            | 96,356                | 25,473      | 9,614
Cross-language evaluation | Testing samples  | 9,894               | 2,188                  | 20,799             | 5,296                 | 1,458       | 441
Cross-concept evaluation  | Training samples | 223,666             | 61,856                 | 375,693            | 126,081               | 28,222      | 10,482
Cross-concept evaluation  | Testing samples  | 103,092             | 21,547                 | 150,248            | 41,595                | 12,344      | 4,297

The dataset is curated by Michael Dunn.1 Second, we include a dataset taken from the Austronesian Basic Vocabulary project (Greenhill et al., 2008), and a third dataset from the Mayan family (Wichmann & Holman, 2008). There are several differences in transcription across these datasets. While Indo-European is available in IPA, ASJP, and a coarse "Romanized" IPA encoding, the Mayan database is available in the ASJP format (similar to a Romanized IPA) (Brown et al., 2008), and the Austronesian data have been semi-automatically converted to ASJP (Rama, 2016). We use subsets of the original databases due to the lack of uniformly available transcription. The Indo-European database contains words from 52 languages for over 200 concepts, while the Austronesian contains words from 100 languages for over 200 concepts, as can be seen in Table 10.2. The Mayan dataset is comparatively very small, with only 100 concepts from 30 languages. The Austronesian dataset also contains the largest number of cognate classes as compared to the other two. The number of sample pairs obtained from each dataset is given in Table 10.3. The small size of the Mayan dataset especially poses a challenge for training the deep learning models, which is addressed in later sections.

1 http://ielex.mpi.nl/.


Evaluation

There are two methods of evaluation that we follow in our experiments, namely cross-language evaluation and cross-concept evaluation. In cross-language evaluation, the training and testing sample pairs are created using exclusive sets of languages, whereas in cross-concept they come from an exclusive set of concepts. This can be done by dividing the word list in Table 10.1 on the basis of rows or columns, respectively for cross-language or cross-concept, into training and testing sets and then forming the sample pairs from them. Both words in a sample pair always belong to the same concept or meaning. A sample pair is assigned a positive cognate label if their cognate class IDs match. We report the F-score and the area under the PR curve (AUC) as measures of performance for all the models. F-score is computed as the harmonic mean of the precision and recall.2 Since the dataset is heavily biased and contains a majority of negative cognate sample pairs, we do not use accuracy as a measure of performance.
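The two reported measures can be sketched with scikit-learn as below; the labels and scores are placeholders for illustration.

import numpy as np
from sklearn.metrics import f1_score, precision_recall_curve, auc

y_true = np.array([1, 0, 0, 1, 0, 1])                 # gold cognate labels (placeholder)
y_prob = np.array([0.9, 0.4, 0.2, 0.7, 0.6, 0.3])     # model scores (placeholder)

# F-score on positive labels at a 0.5 threshold
f1 = f1_score(y_true, (y_prob >= 0.5).astype(int))

# Area under the precision-recall curve
precision, recall, _ = precision_recall_curve(y_true, y_prob)
pr_auc = auc(recall, precision)

print(f1, pr_auc)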

Baseline Models

We compare our model against a model based on surface similarity (the subsequence model) and two CNN-based DL models, as described below:

Gap-weighted Subsequences: This model refers to the common subsequence model (Rama, 2015) mentioned earlier. The author uses a string kernel-based approach, defining a vector for a word pair using all common subsequences between the two words and weighting each subsequence by its gaps in the strings. The results reported for the subsequence model were obtained by implementing the model from the paper, as the original code was not available.

Phonetic CNN and Character CNN: These models are variations of the Siamese-style CNN-based models (Rama, 2016). The models are inspired by convolutional networks used for image-similarity tasks. The Phonetic CNN model uses the manually defined phonetic feature vectors as character embeddings in the network (but they are fixed during training), whereas the Character CNN model uses a 1-hot encoding to represent the different characters. The results reported for these models were obtained by rerunning the original code from the author on the prepared datasets.3

LSTM + No Attention and LSTM + Uniform Attention: We also introduced two sanity-check baseline models to test the attention layer of the CoAtt model.

2 Precision and recall are computed on positive labels at a 0.5 threshold. Precision = TP/(TP + FP), Recall = TP/(TP + FN); TP: true positives, FP: false positives, FN: false negatives.
3 It can be noted that there is a difference in the reported F-score of the CNN models compared to the original paper (Rama, 2016). This is because we report the F-score with respect to the positive labels only, whereas the original paper reported the average F-scores of positive and negative labels (observed from the implementation in the author's code).


The LSTM + No Attention model removes the attention layer from the CoAtt model, while the LSTM + Uniform Attention model does a simple average rather than a weighted average in the attention layer.

Experiment 1: Cross-Language Evaluation

As can be observed in Table 10.4, the CoAtt model performs significantly better than the baseline models (both CNN- and subsequence-based). The LSTM + No Attention and LSTM + Uniform Attention models reflect the importance of the attention layer. The soft selection procedure of attention is able to highlight the important features of either word automatically for the classification task. The additional features added to the CoAtt model help to improve its performance further. Initializing the character embeddings with the manually defined phonetic vectors (+ PV models) increases the AUC by around 3%. Further, the addition of the Concept features discussed earlier is also found to be useful (+ CF model). The concept features can intuitively help the model to use different thresholds depending on the kind of variation in cognates observed for that concept.

The Mayan language family is linguistically and geographically less diverse than the other datasets (see also Table 10.2). The addition of Concept features significantly improves the CoAtt model on the Mayan dataset. This shows the usefulness of semantic information for cognacy detection when less data is available. For Indo-European and Austronesian, on the other hand, the CoAtt model performs more effectively than the subsequence model or the Phonetic and Character CNNs, even without the concept features.

Table 10.4 shows that the CoAtt model does not train well on the Mayan dataset directly. It is found that the loss does not decrease much during training as compared to the other models.

Table 10.4 Cross-language evaluation results

Model                    | Indo-European F-score | Indo-European AUC | Austronesian F-score | Austronesian AUC | Mayan F-score | Mayan AUC
Gap-weighted subsequence | 59.0 | 75.5 | 58.8 | 68.9 | 71.8 | 81.8
Phonetic CNN             | 73.7 | 86.1 | 54.6 | 68.0 | 72.8 | 85.0
Character CNN            | 75.3 | 85.3 | 62.2 | 71.6 | 75.9 | 85.7
LSTM + No attention      | 56.7 | 59.0 | 51.2 | 55.2 | 60.6 | 67.1
LSTM + Uniform attention | 52.8 | 59.4 | 49.8 | 52.7 | 60.8 | 66.1
Co-attention model       | 83.8 | 89.2 | 69.0 | 77.5 | 67.1 | 67.7
+ PV                     | 85.1 | 92.4 | 70.2 | 79.3 | 63.6 | 71.3
+ PV + CF                | 86.2 | 93.0 | 70.5 | 79.7 | 81.5 | 89.0

PV = phonetic feature vectors; CF = concept features


Table 10.5 Cross-language evaluation results for the Mayan dataset with pre-training

Model                        | F-score | AUC
Gap-weighted subsequence     | 71.8    | 81.8
Phonetic CNN                 | 72.8    | 85.0
Character CNN                | 75.9    | 85.7
Co-attention model           | 67.1    | 67.7
+ PV                         | 63.6    | 71.3
+ PV + PreT (Indo-European)  | 82.5    | 90.6
+ PV + PreT (Austronesian)   | 83.5    | 91.2

PV = phonetic feature vectors; PreT = pre-training on another dataset

This poor performance on the Mayan dataset is associated with its small size and relatively few languages, which do not prove sufficient for training the CoAtt network. We justify this hypothesis in the following section with the cross-family pre-training experiment.

Experiment 2: Cross-Family Pre-training

The three language families with which we work have completely different origins and are located in different geographical regions. We test whether any notion of language evolution might be shared among these independently evolved families. This is done through the joint learning of models. The network is instantiated with the combined character vocabulary of two datasets. The model is then trained on one dataset until the loss saturates. This is followed by training on a second dataset, starting from the weights learned during pre-training. It is found that such a joint-training procedure helps the CoAtt model on the Mayan dataset significantly. The pre-training procedure is able to provide a good initialization point for training on the Mayan dataset. The pre-trained models perform better than the baseline models (PreT models in Table 10.5). This shows that pre-training CoAtt is helpful and also points to the fact that some regular sound changes could potentially be shared across language families (keeping in mind that the datasets are coarsely transcribed, and a few similarities may be coincidental).

Experiment 3: Cross Concept Evaluation

The cross-concept evaluation can be thought of as a more rigorous test, as the models have not seen any similar word structures during training. The testing sample words are from completely different concepts. Words coming from different concepts have different sequence structures altogether.


Table 10.6 Cross-concept evaluation results for Indo-European

Model                    | F-score | AUC
Gap-weighted subsequence | 51.6    | 62.0
Phonetic CNN + LF        | 66.4    | 73.2
Character CNN + LF       | 63.5    | 70.5
Co-attention model       | 64.8    | 69.8
+ CF                     | 64.1    | 70.6
+ LF                     | 65.6    | 70.8
+ PV + CF                | 69.0    | 74.9
+ PV + LF                | 69.1    | 75.0

PV = phonetic feature vectors; CF = concept features; LF = language features

A model that predicts cognate similarity in such a case would have to exploit phonetic similarity information in the context of cognates. The results for the cross-concept evaluation tests are listed in Table 10.6. It is observed that the CoAtt model is able to come close to the performance of the CNN-based models. With the phonetic feature vectors and extra Language features, the model performs slightly better than the baselines. We note that embeddings initialized with the phonetic feature vectors are useful for this task as compared to the randomly initialized ones. The lower performance values on the cross-concept task as compared to the cross-language task can be attributed to the nature of this experiment. Even for a human historical linguist, this task is unnatural, as traditionally a new word is assigned to a cognate class after it is compared against words of the same meaning that are known to be cognates. Therefore, sufficient information about that concept is available to a human being performing this task.

Hindi–Marathi Domain Experiment

We also applied our CoAtt model to the domain of Hindi–Marathi. The model used was trained on the Indo-European dataset with IPA transcription. It should be noted that the Indo-European database contains instances from Marathi, but it does not directly contain instances from Hindi. However, it does contain words from Urdu and Bhojpuri (Bihari), which are languages closely related to Hindi that share many vocabulary items with it. We used a Hindi–Marathi parallel corpus downloaded from TDIL. This dataset provides a large part of the vocabulary of both languages in which to search for cognates. The corpus contains Hindi–Marathi sentences that are POS tagged and transcribed in Devanagari. We specifically extracted word pairs from each sentence with the NOUN and VERB tags.


Since the sentences are not word aligned, we extracted candidate word pairs for testing by choosing the first word with the same tag in either sentence as the candidate pair. The words were converted from Devanagari to IPA using a rule-based system and finally fed into the model. We extracted 16K pairs from nouns and 9K pairs from verbs. Our model does a fair job of aligning similar word pairs that are possibly cognates. We tested the performance of the model by randomly sampling 50 word pairs each from NOUNs and VERBs and manually annotating them. We found that our model gives 80% accuracy on verbs and 74% accuracy on nouns. The model is able to find word pairs with a common stem without the need for lemmatization. In the case of verbs, it can be observed that the model is able to see through the inflections on the verbs to predict pairs with similar stems as cognates. This implies that the model can be used to find cognate pairs across closely related languages in order to share resources that are missing for one of the languages in the pair. As an example, lexical resource sharing between Urdu and Hindi was carried out for verb subcategorization frames (Bhat et al., 2014).
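A sketch of this candidate-pair extraction heuristic is shown below; the sentence representation and function names are illustrative and not taken from the authors' code.

def first_word_with_tag(tagged_sentence, tag):
    """Return the first word carrying the given POS tag, or None."""
    for word, pos in tagged_sentence:
        if pos == tag:
            return word
    return None

def candidate_pairs(hindi_sent, marathi_sent, tags=("NOUN", "VERB")):
    """For each tag, pair the first Hindi word and the first Marathi word
    that carry it (the sentences are parallel but not word-aligned)."""
    pairs = []
    for tag in tags:
        hi = first_word_with_tag(hindi_sent, tag)
        mr = first_word_with_tag(marathi_sent, tag)
        if hi and mr:
            pairs.append((hi, mr, tag))
    return pairs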

Analysis

Concept Wise Performance

We examined the performance of the models over individual concepts in the test set samples for the Indo-European dataset. In Table 10.7, we compare the performance of our model with the CNN and subsequence models across various part-of-speech categories. We find that the performance of CoAtt is more consistent throughout the categories as compared to the more varied distribution of the other models. Table 10.8 also examines particular concepts which consist of words that are lexically divergent. For example, Hvad in Danish and Que in Spanish are cognates which both belong to the concept WHAT. These concepts are also more challenging because they contain a large number of cognate classes with only a handful of positive cognate pair examples. This results in a bias in the training data toward negative samples. For these concepts, we find that the CoAtt model is able to achieve high scores, but in comparison, both the subsequence model and the CNN model perform poorly. For the most challenging concepts, e.g., AT, IF, BECAUSE, GIVE, the CoAtt model is still able to perform almost as well as the CNN model, whereas the subsequence model is worse due to the almost zero overlap of subsequences.


Table 10.7 F-scores for various models on different POS categories

POS           CoAtt   CNN    Subseq
Noun          0.80    0.74   0.66
Pronoun       0.73    0.52   0.22
Verb          0.82    0.76   0.62
Adverb        0.76    0.63   0.50
Adjective     0.80    0.65   0.60
Preposition   0.63    0.59   0.30
Determiner    0.85    0.74   0.38

CoAtt Co-attention model, CNN Phonetic CNN model, Subseq Gap-weighted subsequence model

Table 10.8 F-scores for various models on different concepts

Concept    CoAtt   CNN    Subseq
WHAT       0.97    0.39   0.20
THERE      0.86    0.78   0.17
HOW        0.83    0.27   0.16
WHERE      0.80    0.43   0.05
WHO        0.78    0.45   0.04
IN         0.70    0.64   0.17
GIVE       0.56    0.56   0.31
AT         0.50    0.50   0.29
IF         0.46    0.33   0.00
BECAUSE    0.33    0.00   0.00

CoAtt Co-attention model, CNN Phonetic CNN model, Subseq Gap-weighted subsequence model

Transcription Tests

The Indo-European dataset, as mentioned earlier, is available in two different transcriptions. It is transcribed in ASJP, like the Austronesian and Mayan datasets, and also in IPA, which is a much finer phonetic representation than ASJP. Table 10.9 shows the results of the CoAtt model on the Indo-European dataset in each transcription. Note that neither of these models includes the phonetic feature vectors, since we only have these defined for the ASJP character vocabulary and not for IPA. We find that the overall performance is not really affected by the difference in transcription; the finer IPA transcription does not give any immediate advantage over the ASJP transcription. We do, however, note differences in the model's behavior across transcriptions on particular concepts. As an example, Table 10.10 shows test word pairs from the concept SWIM for different languages in both transcriptions. All of these pairs are true cognates. For all of these samples, the IPA model falsely predicts negatives, whereas the ASJP model correctly predicts them all as cognates.


Table 10.9 Cross-language evaluation transcription tests

                 Indo-European (ASJP)     Indo-European (IPA)
Model            F-score    AUC           F-score    AUC
CoAtt            83.8       89.2          82.2       89.1
CoAtt + CF       83.5       90.5          82.1       90.7

CF Concept features

Table 10.10 Sample word pairs from the concept SWIM from the Indo-European dataset. All samples are cognate pairs

ASJP                   IPA
Word A   Word B        Word A   Word B
swim     sinda         swIm     ˈsIndḁ
swim     zwem3n        swIm     zwEm@n
swim     svim3n        swIm     SvIm@n
swim     swem3         swIm     ˈsVFm:@
swim     sima          swIm     ˈsim:a

Here, the finer representation in IPA interferes with the model's judgment, and it is not able to pick up the correspondences correctly. It is also interesting to note that, for all of these samples, adding the concept feature of SWIM in the CoAtt + CF model makes it predict all of the samples as cognates. Perhaps the concept feature signals the model to relax the required degree of phoneme overlap for this concept and become less strict in predicting cognates.

Discussion

There are several interesting insights that we have observed from our experiments. Primarily, we find that the co-attention model is effective at the character level for the cognate classification task. The model was successfully adapted from the word-level model used for RTE and worked well on this character-level task. The attention layer is especially important for the model, as the LSTM by itself is not able to extract enough information from the words for cognate identification. We found that the size and properties of the dataset also affect the training and performance of the models. For a small dataset like Mayan, joint learning of models proves beneficial, as information drawn across datasets compensates for the lack of data. We find that additional features that provide information about word semantics are also useful for improving the models. Concepts belonging to the class of prepositions or determiners tend to have a greater frequency of use and have evolved to become more lexically divergent than nouns and adjectives. The models can exploit this variation in lexical information when concept features are introduced in the form of word embeddings. Transcribing the data into a finer character vocabulary does not improve overall performance, but there are cases where the coarser representation helps the model to recognize correspondences.

Conclusion

The task of cognate discovery benefits from finding rich hidden representations for words. Simple surface similarity measures, such as common-subsequence-based features, fail to capture the range of phonological evolution and sound correspondences; when there is greater divergence in word structure and character sequences, these methods fail to capture any similarity between the words. Deep learning models like LSTMs are able to exploit such features to make better judgments on the prediction task. In comparison to the CNN models that we explored, LSTMs with co-attention also seem to be better at discovering correspondences for more lexically divergent concepts. Character-level models, in particular, seem to be a good choice for tasks that contain "noise", e.g., words that contain similar sequences that may not be contiguous. At the same time, a task like cognate discovery also involves knowledge of sound and meaning change. When these models are enhanced with phonetic and lexical semantic information, we find that they perform even better. This suggests that such models may be useful for detecting other types of linguistic divergences, e.g., morphophonemic changes such as vowel harmony, or spelling variations in web text. Cognate discovery itself has been leveraged for other NLP applications such as machine translation (Kondrak et al., 2003) and has also been used in other areas such as bioinformatics to identify confusable drug names (Kondrak & Dorr, 2004). Going further, it could also be used for finding similarities between language pairs that are diachronic variants, e.g., Old Marathi and Modern Marathi, in order to share lexical resources or tools.

References

Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. In ICLR 2015.
Bhat, R., Jain, N., Vaidya, A., Palmer, M., Khan, T., Sharma, D., & Babani, J. (2014). Adapting predicate frames for Urdu PropBanking. In Proceedings of the EMNLP 2014 Workshop on Language Technology for Closely Related Languages and Language Variants (pp. 47–55). http://www.aclweb.org/anthology/W14-4206
Bowman, S. R., Angeli, G., Potts, C., & Manning, C. D. (2015). A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on EMNLP. ACL.


Brown, C., Holman, E., Wichmann, S., & Velupillai, V. (2008). Automated classification of the world's languages: A description of the method and preliminary results. Language Typology and Universals.
Greenhill, S., Blust, R., & Gray, R. (2008). The Austronesian basic vocabulary database: From bioinformatics to lexomics. Evolutionary Bioinformatics, 4, 271–283.
Hauer, B., & Kondrak, G. (2011). Clustering semantically equivalent words into cognate sets in multilingual lists. In IJCNLP (pp. 865–873). Citeseer.
Inkpen, D., Frunza, O., & Kondrak, G. (2005). Automatic identification of cognates and false friends in French and English. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2005) (pp. 251–257).
Kondrak, G., & Dorr, B. (2004). Identification of confusable drug names: A new approach and evaluation methodology. In Proceedings of the 20th International Conference on Computational Linguistics (p. 952). Association for Computational Linguistics.
Kondrak, G., Marcu, D., & Knight, K. (2003). Cognates can improve statistical translation models. In Proceedings of HLT-NAACL 2003 (Short papers, Vol. 2, NAACL-Short'03, pp. 46–48). ACL. https://doi.org/10.3115/1073483.1073499
List, J. M., Lopez, P., & Bapteste, E. (2016). Using sequence similarity networks to identify partial cognates in multilingual wordlists. In Proceedings of the ACL 2016 (Vol. 2: Short Papers, pp. 599–605). Berlin. http://anthology.aclweb.org/P16-2097
Luong, M., Pham, H., & Manning, C. (2015). Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (pp. 1412–1421). Lisbon.
Mikolov, T., Karafiát, M., Burget, L., Černocký, J., & Khudanpur, S. (2010). Recurrent neural network based language model. In Interspeech (Vol. 2, p. 3).
Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532–1543). http://www.aclweb.org/anthology/D14-1162
Rama, T. (2015). Automatic cognate identification with gap-weighted string subsequences. In Proceedings of the 2015 Conference of NAACL: Human Language Technologies (pp. 1227–1231).
Rama, T. (2016). Siamese convolutional networks for cognate identification. In Proceedings of COLING 2016 (pp. 1018–1027).
Rocktäschel, T., Grefenstette, E., Hermann, K. M., Kocisky, T., & Blunsom, P. (2016). Reasoning about entailment with neural attention. In ICLR.
Simard, M., Foster, G. F., & Isabelle, P. (1993). Using cognates to align sentences in bilingual corpora. In Proceedings of the 1993 Conference of CASCON (pp. 1071–1082). IBM Press.
Singh, A. K., & Surana, H. (2007). Study of cognates among south Asian languages for the purpose of building lexical resources. Journal of Language Technology.
Sordoni, A., Bengio, Y., Vahabi, H., Lioma, C., Grue Simonsen, J., & Nie, J. Y. (2015). A hierarchical recurrent encoder-decoder for generative context-aware query suggestion. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management (pp. 553–562). ACM.
Wichmann, S., & Holman, E. (2008). Languages with longer words have more lexical change. Approaches to Measuring Linguistic Differences.
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., & Bengio, Y. (2015). Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning (pp. 2048–2057).
Yang, Z., He, X., Gao, J., Deng, L., & Smola, A. (2016). Stacked attention networks for image question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 21–29).


Shantanu Kumar studied electrical engineering as an undergraduate student at IIT Delhi. He presently works at Productiv.

Ashwini Vaidya is a faculty member in the Department of Humanities and Social Sciences. She teaches courses in Computational Linguistics and Cognitive Science. Her research interests are in computational lexical semantics using methods from NLP and psycholinguistics.

Sumeet Agarwal teaches in the areas of Electrical Engineering, Artificial Intelligence, and Cognitive Science at IIT Delhi. His research interests are focused on the use of machine learning and statistical modelling techniques to better understand the structure, function, and evolution of complex systems, in both the biological and the social sciences.

Part V

Human Factors

Chapter 11

Psychophysiological Monitoring to Improve Human–Computer Collaborative Tasks

Daniel N. Cassenti and Chou P. Hung

Abstract In today's world, humans increasingly use technology to perform tasks. The field of human factors uses principles of human sciences to determine how to ensure that technology works with human capabilities rather than against them. We argue that the next step in human factors is adaptive automation, where computers monitor the human (1) to determine when extra measures are required to adapt to the user and enable better performance and (2) to leverage human signals to enhance continuous learning and adaptation of the automation. Cassenti et al. (2016) described four ways that a computer can trigger adaptive aids from user monitoring: user-initiated action, concurrent performance, physiological variables, and cognitive modeling. Here, we focus on physiological monitoring as a key to successful human–computer collaboration, because physiological variables are continuously measurable and are outside of the user's error-prone subjective judgment. We discuss which physiological variables can help determine user states and give some examples of adaptive aids that can intervene in real-world tasks. We also review the concept of cognitive modeling based on physiological monitoring and describe how recent advances in signal processing and machine learning algorithms increase the relevance of cognitive modeling for the evolution of core artificial intelligence.

Keywords Adaptive automation · Physiological signals · Human factors · Cognitive modeling

Summary

We use computerized devices so routinely that it is becoming increasingly difficult to think of tasks that do not require them. In this state of constant digital-device interaction, it would benefit humanity to focus research on human performance in collaboration with digital devices. We focus here on adaptive automation and human–Artificial Intelligence (AI) interaction, where AI intervenes with digital aids to help a user who is struggling to perform well. Though there are many ways by which digital devices can measure struggle with performance, we focus on physiological measures, which provide continuous measurement and do not depend on subjective judgment or on modeling and simulation steps that do not capture concurrent performance. We review technology requirements that ought to be considered when building adaptive automation, then discuss different physiological measures, including pupil diameter, heart rate variability, electrodermal activity, and electroencephalography. We conclude by recommending a course of empirical investigations to develop a new field of adaptive automation.

Introduction

In the modern world, when we think of typical work tasks, we often come up with activities that require some type of digital computational device. Whether that activity involves a computerized automobile, a cell phone, a laptop computer, or any number of other devices, we as a society have largely shifted from pure manual performance to human–computer collaboration. As such, establishing productive communication of the user's intention to the device, and of the results of digital activity back to the user, is of utmost importance.

Multi-level Cognitive Cybernetics and Adaptive Automation

The reciprocal relationship between humans and digital computational devices is essential for understanding how the two sides collaborate to achieve tasks. A useful model for this is Multi-Level Cognitive Cybernetics (MLCC; Cassenti et al., 2016). Cybernetics, in the human sciences, is the study of how people interact through closed feedback loops with systems and the environment to accomplish goals. Cassenti et al. (2016) described an approach to extend cybernetics research through MLCC, which can lead to future improvements in human–technology interactions. The MLCC approach builds on previous theories of cybernetics by specifying levels of analysis (i.e., performance, physiological measures, and computational modeling) to allow for a robust and systematic investigation of human–technology adaptation. The basis of cybernetics is closed feedback loops among systems; however, this is a broad research direction. We narrow it down by aiming to study how human–technical system performance changes with adaptive automation (i.e., technology that adjusts its behavior based on user characteristics and behavior). In our view, the systems in question are human users and technology, which work symbiotically to achieve a common goal. In order to investigate performance optimization, human behavioral responses and individual differences should be used as an input that helps technology adapt to better support human–technology performance (see Parasuraman et al., 2000; Steinhauser et al., 2008), which will, in turn, facilitate mutual adaptations between humans and technology that improve overall human–machine teaming performance over time. This adaptive closed feedback loop can be further broken down into two smaller open loops. Figure 11.1 depicts this theoretical stance. In this conceptualization, the human is represented by cognition with the following stages: (1) perception, to process the sensations from the technology; (2) information processing, to mentally transform perception into goal-oriented options; (3) decision-making, to select among these options; and (4) motor control, to implement the chosen solution (see Kaber et al., 2005). Similarly, technology has four stages: (1) an input stream designed to incorporate the behavior of the user; (2) digital computation to transform the input into meaningful units; (3) assessment of the input (whether subjective, performance, or physiological data) to determine whether adaptive automation must be triggered; and (4) triggering or not triggering adaptive automation algorithms. In the present chapter, we focus on the information-processing and decision-making stages of the MLCC model; thus, the tasks selected for the following experiments reflect these cognitive stages and the adaptive aids that target them.

Adaptive automation occurs in a feedback loop with the user, where there is a reciprocal relationship between human and technology, as each acts upon and in response to the other. That is, each one uses input received from the other to adjust its behavior, and thus its output back to the other system reflects the input received. Autocorrect in spelling is an example of adaptive automation: the technology offers output, spelling suggestions, to a human, who will in turn provide feedback about what output will be more useful for improved future performance, and the technology will use the human's output to improve its next output. Autocorrect is a simplified example of adaptive automation. Word processing software can simply monitor words as the user types them, then indicate a misspelling when it occurs; the program has a concrete marker of when to and when not to intervene (a minimal sketch of such a marker appears at the end of this section). As a rule, we cannot expect that every time human–computer collaboration occurs there will be an easy way to decipher when performance is poor or good, both of which are important to assess.

Fig. 11.1 Theoretical formulation of the stages and interaction between humans and technology with adaptive automation from the Multi-Level Cognitive Cybernetics approach described in Cassenti et al. (2016)


A computerized aid that is deployed too late or is poorly designed is an obvious problem, as poor user performance will continue to cause errors. However, intervening when the user does not need it will also lead to problems. A poorly designed "aid" could maladaptively act as an interruption or distraction, reallocating the user's limited attentional resources to something that will not help with the task (Larkin et al., 2020; Parasuraman & Riley, 1997). For applications such as aided target recognition (Hung et al., 2021), careful consideration of cognitive, computational, and physical constraints is necessary to ensure that the technological aid itself does not become a burden.
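To make the contrast concrete, the following is a minimal sketch (our illustration, not an existing word processor's implementation) of the autocorrect-style case above, where a simple dictionary lookup gives the system an unambiguous marker of when to intervene. The physiological triggers discussed later in the chapter are needed precisely because most tasks offer no such marker.

```python
# Toy "concrete marker" for intervention: a word either is or is not in the
# dictionary, so the decision to flag it is unambiguous. The tiny word list
# is placeholder data for illustration only.
DICTIONARY = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}

def words_to_flag(typed_text):
    """Return the typed words that are not found in the dictionary."""
    return [w for w in typed_text.lower().split() if w not in DICTIONARY]

print(words_to_flag("The quik brown fox jmups over the lazy dog"))
# ['quik', 'jmups']  -> intervene; an empty list -> do not intervene
```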

Means of Communicating Human Performance Factors

There are multiple ways in which the human user's behavior may be conveyed to a computerized device. These include manual manipulation, speech recognition, and even gestural commands, which are the typical modes of communication. Discussing all the ways that human user behavior may be interpreted by a computer is beyond the scope of this chapter. Here, we focus on the concept of computerized aids for increasing human performance. Cassenti et al. (2016) labeled four types of communication that could be used to convey difficulty with a computer-enabled task and thereby lead to the activation of an aid. First, the user could simply let the computer know that she or he is having difficulty by pressing a button or communicating in some other way when help is needed and when it is no longer needed. A second way is for the computer to track some ongoing measure of performance, then calculate when that performance drops below a preset threshold for some given interval; once the threshold is crossed, the computer activates the aid. A third way is to collect measures on the user for different sub-tasks ahead of the task, so that computer analysis of these measures can trigger the aid when those difficult sub-tasks must occur. The fourth way is to record physiological measures that can be tied to task difficulty (e.g., indicators of stress or fatigue).

Unfortunately, each of these measures is flawed in some way. First, humans must rely on metacognition to know when their performance has fallen, and metacognition is not always reliable: we often believe we are doing well when we are not, and vice versa. A user-initiated call for help, or to stop helping, would be unreliable at best. Second, measures of performance would work well; however, there are few situations where a running indicator of performance is feasible. Often, our tasks do not reveal poor or good performance until the final response is made. Third, modeling sub-task performance is also limited at best. It is not always clear how to divide an overall task into components, so the sub-tasks may not be optimal representations, nor is it always clear when a given sub-task is necessary at any given moment in time.

Of the factors that could be used to trigger adaptive automation, only physiological indicators remain as potentially optimal. Generally, physiological variables operate constantly, and therefore have the same benefit as running performance measures. They also do not have the same drawback of being dependent on the task, as they constantly produce outputs no matter what the situation. The problem with physiological measures is two-fold. First, there is limited consensus on what cognitive variables are associated with given physiological variables. Second, it is difficult to measure certain physiological variables. For example, though pupil size may be a reliable indicator of mental workload (Batmaz & Ozturk, 2008), luminance differences in the ambient light can cause pupil contraction or expansion. These diameter changes would then be assumed to reflect task variables when no such association is justified. Fortunately, there are ways to overcome these limitations. First, the limited consensus on what physiological measures indicate about cognitive state does not prevent them from being useful in specific ways. For example, whether heart rate variability indicates stress or arousal may not matter much in the end, as too little or too much of either can still cause decreased performance (van der Lei et al., 2016); it does not matter which one the measure indicates as long as low or high thresholds lead to the same interpretation that a computerized aid is necessary. For pupil size and luminance variability, there may be ways to reduce or correct luminance artifacts, e.g., via head-mounted devices and eye tracking. In the next section, we will discuss what it would take for the technology to receive and process user physiological indicators.

Technology Requirements

Artificial Intelligence (AI)-enabled systems are often used to reduce mental workload, and they are increasingly being used to augment human capabilities. Technological capabilities to sense and process a user's cognitive state and intervene with adaptive automation aids could improve both bottom- and top-line human–computer collaborative performance. Acting in an adaptive way requires more than traditional computational processing, so we will focus on AI through the rest of this chapter, with the intention of introducing new technology for aid triggering that is not presently known to exist. A developer would need to outline a plan for how technological innovations can be used to (1) sense pertinent physiological data from the user; (2) transform these data into meaningful digital signatures; (3) detect and set cutoffs to determine whether the physiological states indicate a need for cognitive intervention, in the form of adaptive automation, to aid goal completion; and (4) provide solutions for how AI can aid a relevant task (e.g., convoy route-planning, threat assessment from military intelligence, command and control) when the user is struggling, as indicated by poor performance or physiological indicators. Physiological measures are often associated with different cognitive processes and therefore may act as triggers for adaptive automation meant to aid the user. Physiological measures are also objective (i.e., not involving the subjective opinions of the user) and therefore plausible options for AI triggers. The goal of this technology would be to develop a system that can (1) sense output from the user on any number of physiological variables; (2) assess distinct cognitive variable states for each; (3) set quantitative thresholds for each of the variables (via physiological assessment) at which the system engages or disengages adaptive automation; and (4) demonstrate improved task performance via empirical testing. To incorporate physiological measures, the technology must be designed with the capabilities to measure, transfer, and interpret specific types of data. For example, measures of heart rate variability would require a heart monitor that can transmit data in real time and technology that can analyze and interpret these data as soon as they are received. Heart monitors can detect heart rate in real time, but technology that uses those readings for real-time adaptive automation, e.g., attending to the phase of the heartbeat in addition to the rate, does not appear to exist at this point. Technology that includes wearable physiological sensors, such as a heart rate monitor, could be adapted into the design of a developer's software–hardware systems. It is essential that the human and machine jointly adapt to a common understanding of the situation and optimize for joint human–AI capability. In cases of faster-than-human operational tempo, it is even more important that the AI adapts quickly to the human, perhaps before the human can express a response.
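As a rough illustration of steps (1)–(3) above, the sketch below shows one way a system could turn a streaming physiological variable into an engage/disengage decision. It is our own minimal example, not an existing system; the smoothing window and the two thresholds are placeholders that would have to be calibrated empirically for each measure and each user.

```python
# Minimal sketch of threshold-based aid triggering: smooth a physiological
# stream, compare it against engage/disengage thresholds, and toggle the aid
# with hysteresis so noise does not rapidly switch it on and off.
from collections import deque

class AdaptiveAidTrigger:
    def __init__(self, engage_above, disengage_below, window=30):
        self.engage_above = engage_above      # e.g., workload proxy too high
        self.disengage_below = disengage_below
        self.samples = deque(maxlen=window)   # short smoothing buffer
        self.aid_active = False

    def update(self, sample):
        """Feed one physiological sample; return True while the aid should run."""
        self.samples.append(sample)
        smoothed = sum(self.samples) / len(self.samples)
        if not self.aid_active and smoothed > self.engage_above:
            self.aid_active = True            # user appears to need help
        elif self.aid_active and smoothed < self.disengage_below:
            self.aid_active = False           # user appears to have recovered
        return self.aid_active

# Example with an arbitrary workload-proxy stream
trigger = AdaptiveAidTrigger(engage_above=0.7, disengage_below=0.5, window=3)
for value in [0.4, 0.6, 0.8, 0.9, 0.75, 0.5, 0.4]:
    print(value, trigger.update(value))
```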

Physiological Indicators of Performance

To meet the goal of creating adaptive automation around physiological measures, we need to isolate physiological measures that can be sensed, processed, and used as triggers that aid the user only when the user needs help. The research question is: "How can physiological indicators from a user be used to improve AI–user collaborative performance?" We will review four physiological variables, attempting to answer this question by indicating what cognitive state is related to each variable and what would need to be done to use it.

Pupil Diameter

We start with pupil diameter. The diameter of an individual's pupil has long been established as a non-verbal cue reflecting the amount of interest the individual has in a situation. Cognitive science research into pupil diameter (see Naicker et al., 2016) has refined this point of view by documenting evidence of a relationship between pupil diameter and mental workload (i.e., the amount of mental effort that must be expended in completing a task; see Batmaz & Ozturk, 2008; Pfleging et al., 2016; Marinescu et al., 2018). When the pupil is wide, the individual is experiencing high mental workload, and when the load is lower, the pupil is narrower (Cohen Hoffing et al., 2020; Touryan et al., 2017). A pupillometer measures pupil diameter and is an inexpensive tool. Theoretically, adaptive automation using a pupillometer could trigger an automated aid when the pupil diameter increases past a threshold, on the assumption that workload is too high and the user requires help; when the diameter decreases past the same threshold, the aid could be withdrawn.

The difficulty with measuring pupil diameter is that workload is not the most common reason why pupil diameter changes. It is most responsive to variations in environmental light intensity (Pfleging et al., 2016), and those light levels change all the time, with human-made light sources cycling in intensity with the electric current and with driving under intermittent shading from trees. The way around this experimentally is to control the intensity of the light in the lab when testing adaptive automation involving pupil diameter, e.g., by seating the human at a console. However, this is not a full solution, especially for the use of adaptive automation outside of light-controlled environments, as changes in gaze can also lead to large changes in retinal illumination and pupil size (Hung et al., 2020). The effect of such gaze-dependent changes in retinal light intensity would have to be measured close to the user's eyes, e.g., via a head-mounted augmented reality device with eye tracking, so that the corresponding changes in pupil diameter can be corrected by the system. It remains an empirical question whether approaches like this will be sufficient to make pupil diameter a reliable measure for adaptive automation.
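One simple way to operationalize the luminance correction described above, offered here only as an illustrative assumption rather than a validated method, is to regress pupil diameter on a co-registered luminance signal and treat the residual as the candidate workload-related component that a trigger could threshold.

```python
# Hedged sketch: remove a linear (log-luminance) fit from pupil diameter and
# keep the residual as a rough workload-related signal. The data are toy
# values; real use would need calibrated, eye-proximal luminance measurement.
import numpy as np

def luminance_corrected_pupil(pupil_mm, luminance_cd_m2):
    """Return residual pupil diameter after removing a linear luminance fit."""
    log_lum = np.log10(np.asarray(luminance_cd_m2, dtype=float) + 1e-6)
    slope, intercept = np.polyfit(log_lum, pupil_mm, deg=1)
    predicted = slope * log_lum + intercept
    return np.asarray(pupil_mm) - predicted   # positive residual ~ extra dilation

# Toy data: diameter shrinks as luminance rises, plus one task-related bump
lum = [10, 20, 40, 80, 160, 320]
pupil = [6.1, 5.7, 5.2, 5.0, 4.4, 4.6]        # last sample dilated despite bright light
print(luminance_corrected_pupil(pupil, lum).round(2))
```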

Heart Rate Variability

Heart rate variability is the next physiological measurement under consideration. This measure goes beyond heart rate, which by itself does not indicate a cognitive state. Instead, changes in heart rate (acceleration or deceleration) tell us when there are changes in cognitive state (Goldstein et al., 2011). There has been some debate about what changes in heart rate indicate. The most compelling case was made by Thayer and Lane (2009), who argue that heart rate variability relates to emotional regulation. High heart rate variability (i.e., heart rate that changes often) indicates an engaged parasympathetic nervous system, or well-regulated emotions, while low heart rate variability indicates an engaged sympathetic nervous system and poorly regulated emotions. This explanation could be confusing, as one might think that a lot of heart rate variability would indicate instability, and therefore a lack of regulation. However, this framing is misleading. Driving a task toward fulfillment requires constant adjustment, as some steps require more blood flow to the brain and some less. An individual with high emotional regulation is able to regulate their emotions to respond with the right amount of blood flow, and therefore shows high heart rate variability. This also indicates resourceful attention allocation, as the individual can attend well to the requirements of the task (see Mehler et al., 2012). We propose that low heart rate variability could trigger adaptive aids and high heart rate variability could stop any engaged aids.

Heart rate variability suffers from two problems. First, just like pupil diameter, there is a second factor that is more influential on the measure than emotional regulation. The brain is not the only organ that requires more or less blood flow as tasks proceed; voluntary muscle movement also requires more blood flow. An adaptive automation user may become fidgety and keep adjusting in their seat, which would be interpreted by the automation as good heart rate variability when the user could in fact be struggling with the task. During testing of the adaptive automation, the researcher could require that test participants minimize bodily movement, but this could not be enforced outside of the lab. This would need to be mitigated by body-worn sensors that indicate when the user is moving, so that those periods do not count in the aid-trigger calculations, or by careful construction of the algorithm, e.g., to detect changes in the phase of the heartbeat, and thereby avoid such artifacts. The second problem is the timescale. Because we are measuring change in a measure rather than the measure itself, it is unclear how long the measurement intervals have to be to get accurate data about when emotional regulation has changed enough to justify engaging or disengaging an aid. How long is enough? When heart rate variability is low, does the time lag induced by temporal filtering limit the timeliness of the aid? A lot more empirical testing should be done before we can assume that heart rate variability will be a good indicator for the engagement of adaptive aids.
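To make the measurement concrete, the sketch below computes RMSSD, one common time-domain HRV index, over a short window of inter-beat (R-R) intervals and compares it against a placeholder "low HRV" threshold. The index choice, window length, and threshold are our illustrative assumptions, not values proposed by the authors.

```python
# Minimal sketch: windowed RMSSD from R-R intervals (in milliseconds) and a
# placeholder low-HRV check that could feed the kind of trigger sketched
# earlier. Threshold and window length are illustrative only.
import math

def rmssd(rr_intervals_ms):
    """Root mean square of successive differences of R-R intervals (ms)."""
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def needs_aid(rr_window_ms, low_hrv_threshold_ms=20.0):
    """True when windowed HRV falls below the (hypothetical) low threshold."""
    return rmssd(rr_window_ms) < low_hrv_threshold_ms

print(round(rmssd([812, 805, 790, 823, 810]), 1))   # toy 5-beat window
print(needs_aid([800, 801, 799, 800, 800]))         # very low variability -> True
```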

Electro-dermal Activity

Electrodermal activity (EDA) is the third physiological measure we will consider. As the name implies, it is the amount of electrical activity on the skin. "Sweaty palms" may be the most frequent response to the request, "Name one thing associated with feeling stressed." However, increases in EDA are not just a response to stress; we may get the same response to feelings of excitement and enthusiasm. EDA is therefore better associated with a cognitive state of arousal (Critchley, 2002; Picard et al., 2016). For this measure, it may be better to set two thresholds instead of just one. Arousal plotted against task performance often shows that the best performance occurs at moderate levels of arousal. If there is too much arousal, it is difficult to stay focused on a task, as the arousal itself overwhelms attentional resources. At the other end of the spectrum, too little arousal indicates that focus on the task is waning, as the user no longer feels engaged by it. In either case, an adaptive aid would likely improve performance, and it should therefore be triggered by both a low and a high EDA threshold. As with the first two measures, there are also issues with EDA. Environmental temperature and humidity, as well as body temperature and physical exertion, often change EDA independent of cognitive state. Again, laboratory conditions can be kept relatively consistent, but any fluctuations outside of the laboratory would need to be measured, and the system would need to factor those variables, possibly including differential measurements across the body, into its thresholds. In addition, the EDA response lags the events that change arousal by one to three seconds; in that delay, the need for intervention may wane, but the automation would lack the knowledge to adjust. As with heart rate variability, EDA would need to undergo more empirical testing with adaptive automation to determine its viability as an adaptive automation physiological measure.
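The two-threshold idea above amounts to keeping a smoothed skin-conductance level inside a moderate-arousal band; a compact sketch follows, with band limits that are arbitrary placeholders rather than recommended values.

```python
# Sketch of the dual-threshold EDA check: engage the aid when arousal appears
# too low (disengagement) or too high (overload). Band limits in microsiemens
# are placeholders for illustration only.
def eda_aid_needed(scl_microsiemens, low=2.0, high=12.0):
    """True when the smoothed skin-conductance level leaves the moderate band."""
    return scl_microsiemens < low or scl_microsiemens > high

for scl in (1.2, 6.5, 14.8):
    print(scl, eda_aid_needed(scl))
```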


Electroencephalography

Scalp recording of brain activity (i.e., EEG) is promising for tasks in which the human is in a controlled environment (e.g., at a console), but may also be effective in less controlled environments (Bradford et al., 2019) thanks to the development of dry electrodes. Diverse signals have been linked to different types of brain activity, and one of the strongest and most reliable is the P300 signal indicating recognition of a searched-for target (see Cassenti et al., 2011). It is thought to be driven by signals propagating back from the frontal cortex to the sensory cortex, matching a working memory trace to the sensory input. Another commonly measured signal is power spectral frequency, which can correspond to alertness versus drowsiness or to divided attention (Wang et al., 2018). EEG coherence has also been used to measure whether students in a class share a common processing of the lesson, and sensorimotor response potentials are also reliable measures in controlled laboratory settings. Recently, the power of EEG signals has been boosted by ongoing advances in machine learning. A shallow neural network, together with signal preprocessing, has been shown to enable reliable extraction of multiple types of EEG signals across people (Lawhern et al., 2018). EEG signals have also been used to directly train neural networks, capturing the human's tacit understanding of target versus non-target appearance (Lee & Huang, 2018; Solon et al., 2017). Under more controlled settings, and with additional processing of EEG dynamics, it may be possible to extract signals related to reward expectation and surprise, which could be helpful for the development of reinforcement learning algorithms for more complex decision tasks that are not possible with animals. This shows that an integrative approach, combining weak and somewhat unreliable EEG signals with machine learning and with ongoing advances in animal neuroscience, is leading to increasingly powerful methods to tie real-world cognitive function to the underlying brain processes. The problem with this approach is that we are not quite there yet, and eventual outcomes are still uncertain. EEG is susceptible to artifacts whenever ambient electrical energy reaches the electrodes on the scalp; we need this hybrid approach because EEG alone is too unreliable.
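As one concrete example of the spectral features mentioned above, the sketch below estimates band power from a single channel's Welch power spectral density; a rising theta/alpha ratio is often used as a drowsiness or fatigue marker. The channel choice, band edges, and any decision threshold are assumptions for illustration, and a real system would also need artifact rejection and per-user calibration.

```python
# Hedged sketch: theta/alpha band-power ratio from one EEG channel using a
# Welch PSD. All numeric choices (sampling rate, band edges, window length,
# the synthetic test trace) are illustrative, not recommendations.
import numpy as np
from scipy.signal import welch

def band_power(signal, fs, f_lo, f_hi):
    """Approximate power of a single channel between f_lo and f_hi (Hz)."""
    freqs, psd = welch(signal, fs=fs, nperseg=fs * 2)   # 2-s segments
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return psd[mask].sum() * (freqs[1] - freqs[0])      # rectangle-rule integral

fs = 256                                    # Hz, a common EEG sampling rate
t = np.arange(0, 10, 1 / fs)                # 10 s of data
rng = np.random.default_rng(0)
eeg = np.sin(2 * np.pi * 10 * t) + 0.3 * rng.standard_normal(t.size)  # alpha-dominated toy trace

theta = band_power(eeg, fs, 4, 7)
alpha = band_power(eeg, fs, 8, 12)
print(round(theta / alpha, 3))              # low ratio here: the toy trace looks "alert"
```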

Adaptive Automation Empirical Research

A challenge facing developers attracted to the idea of building adaptive automation is that the concept is largely theoretical at this point. There is no established practice of adaptive automation in human factors. Even at a more global scale, there is little formality or robustness in human–computer collaboration studies (though there are exceptions, such as Kaber & Endsley, 2004; Kaber & Riley, 1999). A user test often consists of asking a few expert users the developer knows to try out the software and give opinions about what may be wrong or right with it. Yet expert users are not generally going to perform poorly, given their expert status. In addition, experts are so well trained in their craft that they have difficulty thinking from the vantage point of a novice. Certainly, utility can be derived from expert opinions in human factors research, but in cases where adaptive automation is meant to help those who are not adept at a human–computer collaboration task, focusing solely on expert feedback would not be a very useful development method. The overall goal for the field of adaptive automation is to conduct basic research studies of the conditions under which adaptive automation supports optimal performance, a goal that requires both preventing overreliance on automation and developing targeted uses of it. Three aspects of adaptive automation should be targeted:

Aim 1: Investigate the timing and threshold requirements of automated aids. Automation is meant to aid performance, yet, if improperly deployed, it can act as a performance impediment, ultimately causing more stress and workload (see Minotra & McNeese, 2017; Bolia, 2004). Four possibilities of automation timing should be investigated: (1.1) always active, (1.2) never active, (1.3) triggered by user behavior, and (1.4) triggered by high task demands (for an investigation of adaptive automation timing, see Kaber & Riley, 1999).

Aim 2: Research user behaviors and signals that may act as optimal triggers for automated aids (i.e., 1.3). Following the framework for research in adaptive automation proposed by Feigh et al. (2012), we propose to use three types of behavioral triggers for adaptive automation: (2.1) performance-threshold triggers (i.e., automation triggered when user performance drops below a specified threshold), (2.2) a physiological trigger (i.e., automation triggered when heart rate variability, an indicator of stress and possibly emotion regulation (Quigley & Feldman Barrett, 1999), predicts a performance decrease), and (2.3) another physiological trigger, e.g., pupil diameter, an indicator of stress and potentially workload (Naicker et al., 2016). Aim 2 is complementary to Aim 1, as the focus is narrowed from the set of activation timing conditions to solely behavioral or physiological triggers. Should the category of behavioral or physiological triggers (i.e., performance, heart rate variability, and pupil diameter) be found not to optimize performance, Aim 2 would remain an important point for investigation, because the specific aids or tasks used in the experiments may be responsible for less-than-optimal performance results, whereas the same may not hold for alternate tasks or aids. Thus, an investigation of Aim 2 is necessary regardless of the findings for Aim 1.

Aim 3: Determine appropriate stages of cognitive processing for adaptive automation. We propose tasks with simplified perceptual and motor control processing, allowing investigation of the effect of adaptive automation on (3.1) information processing and (3.2) decision-making. While Kaber et al. (2005) found that the motor control stage of cognition benefited the most from adaptive automation, this is not always the case (for a review, see Steinhauser et al., 2008). Parasuraman et al. (2000) indicate that adaptive automation, when implemented properly within the technology, can aid sensory processing tasks, perceptual and working memory-based tasks, decision-making tasks, and response selection.
Because the adaptive automation literature concentrates on the perceptual and motor control stages, with less research showing how aids can help with the cognitive stages between perception and motor control, we will focus on the less-studied information-processing and decision-making stages.

Conclusions

With human–computer collaboration becoming the standard for how tasks are performed in the modern world, research goals in the social sciences should be increasingly aligned with studying humans as symbiotic users of technology. Throughout this chapter, we have maintained that adaptive automation, in which computers adapt to a user's cognitive state, can improve performance across a wide range of tasks and baseline performance levels. Although there are four options for communicating performance, the most promising appears to be physiological indicators. We reviewed four of these and made it clear that each has its problems. We hope the takeaway from this chapter is that we have a long way to go, but that empirical testing and solution-oriented research aimed at developing reliable communication of physiological indicators and responsive activation of digital aids will lead to smoother human–AI collaboration.

References

Batmaz, I., & Ozturk, M. (2008). Using pupil diameter changes for measuring mental workload under mental processing. Journal of Applied Sciences, 8, 68–76.
Bolia, R. S. (2004). Overreliance on technology in warfare: The Yom Kippur War as a case study. Parameters: US Army War College Quarterly, 34, 46–55.
Bradford, J. C., Lukos, J. R., Passaro, A., Ries, A., & Ferris, D. P. (2019). Effect of locomotor demands on cognitive processing. Science and Reports, 9, 1–12.
Cassenti, D. N., Gamble, K. R., & Bakdash, J. Z. (2016). Multi-level cognitive cybernetics in human factors. In K. Hale & K. Stanney (Eds.), Advances in neuroergonomics and cognitive computing (pp. 315–326). Springer.
Cassenti, D. N., Kerick, S. E., & McDowell, K. (2011). Observing and modeling cognitive events through event related potentials and ACT-R. Cognitive Systems Research, 12, 56–65.
Cohen Hoffing, R. A., Lauharatanahirun, N., Forster, D. E., Garcia, J. O., Vettel, J. M., & Thurman, S. M. (2020). Dissociable mappings of tonic and phasic pupillary features onto cognitive processes involved in mental arithmetic. PLoS ONE, 15, e0230517.
Critchley, H. D. (2002). Book review: Electrodermal responses: What happens in the brain. The Neuroscientist, 8, 132–142.
Feigh, K. M., Dorneich, M. C., & Hayes, C. C. (2012). Toward a characterization of adaptive systems: A framework for researchers and system designers. Human Factors, 54, 1008–1024.
Goldstein, D. S., Bentho, O., Park, M. Y., & Sharabi, Y. (2011). Low-frequency power of heart rate variability is not a measure of cardiac sympathetic tone but may be a measure of modulation of cardiac autonomic outflows by baroreflexes. Experimental Physiology, 96, 1255–1261.
Hung, C. P., Callahan-Flintoft, C., Fedele, P. D., et al. (2021). Low contrast acuity under strong luminance dynamics and potential benefits of divisive display augmented reality (ddAR). Journal of Perceptual Imaging, 4, 010501–010511.


Kaber, D. B., & Endsley, M. R. (2004). The effects of level of automation and adaptive automation on human performance, situation awareness and workload in a dynamic control task. Theoretical Issues in Ergonomics Science, 5, 113–153.
Kaber, D. B., & Riley, J. M. (1999). Adaptive automation of a dynamic control task based on secondary task workload measurement. International Journal of Cognitive Ergonomics, 3, 169–187.
Kaber, D. B., Wright, M. C., Prinzel, L. J., & Clamann, M. P. (2005). Adaptive automation of human-machine system information-processing functions. Human Factors, 47, 730–741.
Larkin, G. B., Geuss, M., Yu, A., et al. (2020). Augmented target recognition display recommendations. DSIAC Journal, 7, 28–34.
Lee, Y., & Huang, Y. (2018, March). Generating target/non-target images of an RSVP experiment from brain signals by conditional generative adversarial network. In Proceedings of 2018 IEEE EMBS International Conference on Biomedical & Health Informatics (pp. 182–185). IEEE.
Marinescu, A. C., Sharples, S., Ritchie, A. C., Sanchez Lopez, T., McDowell, M., & Morvan, H. P. (2018). Physiological parameter response to variation of mental workload. Human Factors, 60, 31–56.
Mehler, B., Reimer, B., & Coughlin, J. F. (2012). Sensitivity of physiological measures for detecting systematic variations in cognitive demand from a working memory task: An on-road study across three age groups. Human Factors, 54, 396–412.
Minotra, D., & McNeese, M. D. (2017). Predictive aids can lead to sustained attention decrements in the detection of non-routine critical events in event monitoring. Cognition, Technology & Work, 19, 161–177.
Naicker, P., Anoopkumar-Dukie, S., Grant, G. D., Neumann, D. L., & Kavanagh, J. J. (2016). Central cholinergic pathway involvement in the regulation of pupil diameter, blink rate and cognitive function. Neuroscience, 334, 180–190.
Parasuraman, R., & Riley, V. (1997). Humans and automation: Use, misuse, disuse, abuse. Human Factors, 39, 230–253.
Parasuraman, R., Sheridan, T. B., & Wickens, C. D. (2000). A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 30, 286–297.
Pfleging, B., Fekety, D. K., Schmidt, A., & Kun, A. L. (2016). A model relating pupil diameter to mental workload and lighting conditions. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (pp. 5776–5788). Association for Computing Machinery.
Picard, R. W., Fedor, S., & Ayzenberg, Y. (2016). Multiple arousal theory and daily-life electrodermal activity asymmetry. Emotion Review, 8, 62–75.
Quigley, K. S., & Feldman Barrett, L. (1999). Emotional learning and mechanisms of intentional psychological change. In K. Brandstadter & R. M. Lerner (Eds.), Action and development: Origins and functions of intentional self-development (pp. 435–464). Sage.
Solon, A. J., Gordon, S. M., Lance, B. J., & Lawhern, V. J. (2017, December). Deep learning approaches for P300 classification in image triage: Applications to the NAILS task. In Proceedings of the 13th NTCIR Conference on Evaluation of Information Access Technologies (pp. 5–8). NTCIR.
Steinhauser, N. B., Pavlas, D., & Hancock, P. A. (2008). Design principles for adaptive automation and aiding. Ergonomics in Design, 17, 6–10.
Thayer, J. F., & Lane, R. D. (2009). Claude Bernard and the heart-brain connection: Further elaboration of a model of neurovisceral integration. Neuroscience & Biobehavioral Reviews, 33, 81–88.
Touryan, J., Lawhern, V. J., Connolly, P. M., Bigdely-Shamlo, N., & Ries, A. J. (2017). Isolating discriminant neural activity in the presence of eye movements and concurrent task demands. Frontiers in Human Neuroscience, 11, 357.
van der Lei, H., Tenenbaum, G., & Land, W. M. (2016). Individual arousal-related performance zones effect on temporal and behavioral patterns in golf routines. Psychology of Sport and Exercise, 26, 52–60.


Wang, Y. K., Jung, T. P., & Lin, C. T. (2018). Theta and alpha oscillations in attentional interaction during distracted driving. Frontiers in Behavioral Neuroscience, 12, 3.

Daniel N. Cassenti earned his Ph.D. in Cognitive Psychology from Penn State University in 2004. After one year as a post-doc at the Army Research Laboratory, he became a civilian employee and has been working at ARL ever since. Dan has filled many roles, including BRIMS Conference Chair, Senior Co-Chair of the ARL IRB, and Technical Assistant to the ARL Director. He is currently the Cooperative Agreement Manager for the Army AI Innovation Institute.

Chou P. Hung is a neuroscientist at ARL's Humans in Complex Systems Directorate. His research topics include developing novel AIs for complex human–machine teaming and decision-making, visual perception, and neuromorphic computing. He received his Ph.D. from Yale University in 2002 and is an adjunct professor at Georgetown University.

Chapter 12

Human–Technology Interfaces: Did ‘I’ do it? Agency, Control, and why it matters

Devpriya Kumar

Abstract Our interactions with the environment are often accompanied by instances of experience where we feel in control of our actions. This agentive experience of initiating and controlling our actions is known as the "Sense of Agency" (SoA), and it plays an important role in shaping our self-experiences. This experience is not necessarily a true reflection of an underlying causal link between our actions and their outcomes; rather, our brains can easily be fooled into having an agentive experience for actions we did not perform. Based on findings from various researchers investigating SoA, this chapter looks at how our understanding of the sense of agency can be applied in real life. I start by summarizing what SoA is and our understanding of its mechanism, focusing on the role of control in influencing SoA. The second part of the chapter presents case studies where SoA has been used to address applied problems, including human–computer interfaces, automation technology, and virtual reality immersion therapy. Toward the end, we speculate on the possible areas in which our understanding of the sense of agency can be applied.

Keywords Agency · Control · Human–computer interaction · Assistive technology · Virtual reality

What is a Sense of Agency?

Agency is the capacity to engage in actions associated with the conscious intentions and goals of individuals. This capacity to perform actions is closely tied to the experience of being the agent of the action ("the one in control"), often referred to as the sense of agency (SoA). To have a SoA, we need to have an intention to act; actions that are involuntary or lack conscious intention have minimal sense of agency associated with them. Furthermore, the intentional action should be directed toward a change in the external environment, such that the individual achieves the intended outcome.


These outcomes that we associate with our actions can vary a great deal in their distality from the intention to act. They can be as concrete as moving a finger to depress a key on a keyboard, or as complex and distal as technological innovations, agriculture, the implementation of societal norms, and moral responsibility. One can argue that the sense of agency might have been crucial for many of the achievements that set humans apart from other organisms. The reader might ask why we need the sense of agency: would it not be possible to have the same outcomes without an accompanying experience of agency? There are two ways to answer that question. The first involves looking at the criticality of a sense of agency at various levels of analysis, including the cognitive, interpersonal, and societal levels. The other involves looking at situations where deviations in the sense of agency are observed (disorders of the sense of agency) and how these deviations impact individuals. Before we go ahead and talk about the role that the sense of agency might play, we need to understand what the SoA is and what causes this sense of agency to occur in the first place.

Even though at first glance SoA appears simply to be the phenomenology that we associate with intended actions, recent work has suggested that SoA might be much more complex. The phenomenology itself seems to be multifaceted; there seems to be a dissociation between the experience of having planned an action and the experience of having performed the action. A distinction has been made between the Feeling of Agency (FoA) and the Judgment of Agency (JoA) (Synofzik et al., 2008). While the first is a low-level, non-conceptual feeling of being an agent, the latter arises in situations where we make explicit judgments about agency. Studies have suggested that the two can be dissociated from each other (Gallagher, 2012) and measured separately (Ebert & Wegner, 2010; Kumar & Srinivasan, 2013). While the FoA is thought to be automatic and to depend more on sensory-motor contingencies, the JoA might actually reflect a more general-purpose causal attribution process not necessarily dependent on sensory-motor contingencies. Furthermore, the process of judging the causal efficacy of actions in influencing environmental outcomes is not infallible, resulting in lapses in SoA (Moore, 2016).

Additionally, the journey from an individual's goal to the actual implementation of the action, and how it relates to the sense of agency, seems to be non-trivial. Experiments have suggested that it is possible to have a sense of agency for actions that were not actually performed by the participants. Similarly, it is also possible not to have a sense of agency for an action that has been intentionally performed. This finding seems counter-intuitive to our basic understanding of causal agency. A simple way to determine whether the agent caused an action would be to follow the causal chain of events in the environment, starting from the intention to act up to the outcome in the environment. The problem is that very often these causal chains of events are not readily accessible to us, for example, how pressing a switch results in an increase in the luminance of the room. Even when these causal chains are available for investigation, following such a chain of events and attributing causality in a multi-agent scenario would be computationally challenging.
In real life, we seem to do this causal attribution automatically, without much effort. Hence, it becomes important to understand how we effectively utilize knowledge about the environment for the proper attribution of agency.

Mechanism Underlying Sense of Agency

Computational accounts of SoA have suggested that a "comparator" lies at the heart of the sense of agency (Synofzik et al., 2008). The basic idea involves comparing unconscious predictions about the outcome of actions with the actual outcomes and linking the prediction error to the experience of agency associated with the action. The model suggests that if and only if a motor command is sent and the predicted sensory event occurs (in other words, we initiate an action and produce the intended outcome), we will feel a sense of agency. Even without going into the details of how the comparator is cognitively implemented, it is easy to see that such a simplistic model fails to completely capture the computation that maps the interaction with the environment onto the experience of agency. One problem with assuming a simple comparator model is that it allows only for a unidimensional signal capturing the success of the action (whether the predicted sensory outcome is achieved or not). In real life, our actions are multiscale: most of our actions have more than one outcome associated with them (Kumar and Srinivasan, 2014), and it is not clear how the system would resolve the experience across these outcomes. Another criticism relates to the experience itself (the phenomenology of action). A comparator-based model would suggest that the sense of agency should be retrospective, i.e., that it occurs after the action has been completed. Many studies have suggested that the experiences associated with intended actions might be anticipatory in nature: we feel in control of our actions even before the outcomes are presented. Also, such models cannot explain how it is possible to have an experience of agency in the absence of intended actions.

From an application point of view, the fact that a simple causal theory of action cannot fully explain how the sense of agency emerges implies that the experience can be manipulated by several factors. This flexibility or plasticity in the sense of agency can be exploited to analyze and optimize the experience that individuals have while interacting with the environment, especially in technologically mediated interactions such as remote interactions with other agents, interactions with interfaces, etc. Keeping in mind the richness of the experience of agency and how it might be linked to other behavioral and cognitive aspects can help in different domains. Later in this chapter, we will discuss some fields in which applications of SoA have been explored.

Another account of the sense of agency tries to understand it as the result of a cognitive computation that tracks all the steps of a chain of events up to the outcome. In an experiment investigating this tracking-based account of agency (Caspar et al., 2016), in which the action and its outcome were mediated by a robotic hand, the authors found that a sense of agency was observed only when there was congruency not just between action and outcome but also with the intermediate movement of the robotic hand. This account of agency resolves the problem of retrospective comparison that a simple comparator model faces. However, other studies show that the causal chain of events is not always followed: congruency between action and outcome might be sufficient to elicit a sense of agency even when intermediate control is absent (Desantis et al., 2011).

A second issue, which even the comparator model of agency cannot handle, is that an action might not have a single outcome associated with it. Rather, an action can result in a hierarchy of outcomes that differ in the spatiotemporal distality between action and outcome. The event-control framework provides a way to reconcile the idea of multiscale effects with the sense of agency. According to this framework, the sense of agency, rather than being a fixed signal, is plastic in nature and depends on the amount of control exercised by the agent at a given point in time. The relationship between exercised or perceived control and the sense of agency is not a novel one (Pacherie; Dewey). Given that there are multiple effects that can occur as a result of an action, the system exercises control simultaneously at multiple levels differing in the spatiotemporal distality of the effect. Within this framework, the sense of agency is conceptualized as plastic and as attaching to different levels of control at different points in time: at any point in time, the sense of agency will be attached to the highest level at which control is exercised (Jordan, 2003). There is some empirical evidence to support this idea: experiments affording control at multiple levels show that participants' sense of agency depends on lower, more proximal levels of control only when control at higher levels is not exercised. In conditions where higher-level control is exercised, the lower-level control does not matter (Kumar and Srinivasan, 2014). The effect is found for both explicit (Kumar and Srinivasan, 2014) and implicit (Kumar and Srinivasan, 2017) measures of the sense of agency. The significance of the framework is that it allows the sense of agency to be understood as situated within, and modified by, the physical environment as well as the other agents with whom we interact. The sense of agency is then not limited to a retrospective experience of having performed an action but serves the important purpose of maintaining and achieving abstract distal goals while performing actions that are constrained by this larger goal.
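To make the contrast between a single comparator and the event-control view more concrete, here is a minimal toy sketch in Python (not taken from the chapter; the function names, tolerance value, and example numbers are illustrative assumptions). A single comparator reports only whether one predicted outcome matched, whereas the hierarchical version attaches the agency signal to the most distal level at which prediction and outcome still agree, in the spirit of the framework described above.

    # Toy sketch: a single comparator and a hierarchy of comparators.
    def comparator(predicted, observed, tolerance=0.1):
        """Return True (an agency signal) when the prediction error is small."""
        return abs(predicted - observed) <= tolerance

    def hierarchical_agency(levels):
        """levels: (name, predicted, observed) tuples ordered from the most
        proximal effect (e.g., finger movement) to the most distal one
        (e.g., the room lighting up). Returns the most distal level at
        which prediction and outcome still match."""
        highest = None
        for name, predicted, observed in levels:
            if comparator(predicted, observed):
                highest = name
        return highest

    # Hypothetical example: proximal control succeeds, distal control fails.
    levels = [
        ("finger moves", 1.0, 1.0),
        ("key goes down", 1.0, 0.95),
        ("light turns on", 1.0, 0.0),  # the bulb is broken
    ]
    print(hierarchical_agency(levels))  # -> "key goes down"

On this toy reading, the reported experience attaches to "key goes down" rather than "light turns on", mirroring the claim that the sense of agency settles on the highest level at which control is actually exercised.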

Importance of Sense of Agency

Apart from providing the knowledge that one is the causal agent underlying a consequence in the environment, SoA has also been linked with several other cognitive and psychological processes. The sense of agency has been strongly associated with one's sense of identity and sense of control. It has also been linked to higher aspects of the sense of self, such as self-image. It has been argued that the sense of agency not only provides the phenomenological experience of our actions but is also linked with the motivation for action. Having a goal does not automatically explain why an organism might plan and persist toward that goal, especially when the goal is abstract and distal. The sense of agency seems to be bidirectionally linked to motivation. Individuals tend to overestimate their agency over outcomes that are positive while showing a reduction in agency for negative outcomes. Others have suggested that a hierarchy of goals and intentions might provide a scaffolding that affects the sense of agency. Some researchers have even argued that the sense of agency is an emergent property of control over the hierarchy of intentions that vary in the distality of the outcomes associated with the actions.

A central aspect of the sense of agency is the self-other distinction that it provides in a multi-agent scenario. Although the vast majority of work has involved participants acting as a single agent, studies have shown that introducing other agents into the environment reduces one's sense of agency. One reason for this might be a tendency to account for the possibility that outcomes can be attributed to other agents. These results are in line with the idea that we tend to attribute negative outcomes to others due to a self-serving bias (Sidarus et al., 2018). The reduction in the sense of agency when other agents are present could be due to a higher-level attributional bias, an increase in the complexity of decision-making, or the mere presence of the other agents. Sidarus et al. (2018) investigated these plausible causes by presenting participants with negative or positive outcomes, with either a mechanistic failure or a social agent as a potential cause of the outcome in addition to the participant himself/herself. Interestingly, the authors report a reduction in the sense of agency only when a social agent was present, and this reduction was seen for both negative and positive outcomes. The authors conclude that the modulations in the sense of agency result from an increased cognitive load due to the need to incorporate the potential actions of others while making decisions. They argue that the reduction in the sense of agency in a social context might be important in many scenarios involving peer influence; specifically, they focus on the field of education and feedback-driven learning. The results are also relevant to individual variations in the processing of social cues (see the section on autism and sense of agency), as the increased cognitive load required to process and incorporate social cues might be important for social feedback and social development.

The nature of outcomes also seems to be closely related to the sense of agency. When participants take a decision freely, as opposed to a decision being taken by others, there is differential activation in the anterior frontal-median cortex and rostral cingulate zone (Forstmann et al., 2008) as well as activation of the reward mechanisms of the brain (Blain and Sharot, 2021). The behavioral consequences of freely taken decisions can be seen both in related perceptual effects (Barlas and Kopp, 2018) and in the regret reported by participants for decisions resulting in a negative outcome.

Measures of Sense of Agency

One of the most important requirements for scientifically investigating a subjective experience like the sense of agency is the need to naturalize the construct. A major issue with measurement is that it often involves minimal awareness, that is, the agent has a very fleeting experience of agency. Compare it to the experience of seeing a tree or hearing the singing of a bird, where the experience is strong and stable. This often translates into SoA being a very weak effect that is difficult to detect and shows considerable intersubject variability.

A common way of measuring the sense of agency is to ask participants to report their agentive experience in the form of ratings on questions involving feelings of control, ownership, and authorship of action (Kumar and Srinivasan, 2014). Another approach is to look at the meta-awareness of participants when they perform an action and receive feedback; this can include action awareness as well as perceptual awareness. One way to measure SoA is to look at the sensitivity to distortions between the expected action outcome and the actual feedback from the environment. An example of such a measure is provided by a study in which the authors asked participants to perform an action while the visual feedback was presented on screen; the authors manipulated the relationship between the actual movement and the change in visual feedback and asked participants to report the direction of the discrepancy between the two, another way in which explicit judgments of agency can be captured indirectly. Certain other studies have asked participants to make self/other distinctions in terms of action attribution in multi-agent scenarios and used that as a measure of agency (Kumar and Srinivasan, 2014). SoA has also been measured indirectly by asking participants to make judgments about the causal efficacy of their actions (Chambon et al., 2015; Moore, 2016). These types of methods, where participants make explicit judgments related to agency, are called explicit measures of agency.

Some studies have reported that, under certain circumstances, implicit and explicit measures of agency might end up giving us different answers (Dewey and Knoblich, 2014). Ebert and Wegner (2010) showed that there is a dissociation between the two measures of agency when the produced action is directed outwards but not when it is directed inwards. Similarly, Kumar and Srinivasan (2013) show that, depending on whether or not higher-level goals are achieved, the measures of the sense of agency might dissociate. Previous work has suggested that the implicit and explicit measures of agency do not always correlate. A common demonstration comes from the Perruchet paradigm (Moore et al., 2012; Perruchet et al., 2006), where researchers compared intentional binding for self-generated actions with explicit judgments of the likelihood that the actions caused the outcome; the authors suggest that partly independent systems process agency. Dewey and Knoblich (2014) further investigated whether a common attribution mechanism underlies implicit and explicit measures of agency. They report a lack of correlation between intentional binding, sensory attenuation, and the explicit sense of agency, indicating differences in the underlying mechanisms. To resolve some of these problems, researchers have recently started looking at brain-based measures of agency: in recent work (Wen et al., 2019), changes in steady-state visually evoked potentials (SSVEPs) were used as a measure of SoA. A second type of measure involves asking participants to perform an action and then inferring the agentive experience indirectly from the voluntary action.
One of the commonly used implicit measures of agency is intentional binding (Haggard et al., 2002), which refers to the temporal compression between an action and its outcome observed for self-generated actions compared to involuntary or cued actions. Another implicit measure of agency, especially used for shorter durations, is sensory attenuation (Moore, 2016), which is based on the idea that the perceived intensity of the sensory consequences of self-generated actions is reduced compared to those of passive movements (a common example is our inability to tickle ourselves, due to the reduced perceived intensity of the tickle). It is important to note that both these measures have been criticized as not being valid measures of SoA. For example, it has been suggested that intentional binding, instead of measuring SoA, reflects the temporal binding observed between any two events that are causally associated with each other (Buehner, 2012).
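As a purely illustrative sketch of how an intentional binding score is usually derived (the interval values below are hypothetical and not taken from any cited study), one compares how much the judged action-outcome interval is compressed in voluntary versus involuntary (cued or passive) conditions; greater compression for voluntary actions is read as an implicit marker of agency.

    # Hypothetical interval-estimation data in milliseconds.
    ACTUAL_INTERVAL = 250  # real delay between the key press and the tone

    def compression(judged_intervals, actual=ACTUAL_INTERVAL):
        """Mean temporal compression: positive values mean action and outcome
        are judged closer together in time than they really are."""
        mean_judged = sum(judged_intervals) / len(judged_intervals)
        return actual - mean_judged

    voluntary = [180, 200, 190, 210]    # self-generated key presses
    involuntary = [240, 260, 250, 255]  # cued / passive presses

    # Intentional binding = extra compression for voluntary action.
    binding = compression(voluntary) - compression(involuntary)
    print(binding)  # -> 56.25 ms of additional compression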

Applications of Sense of Agency

Health and Sense of Agency

A domain that has received a lot of attention from cognitive scientists is the relationship between SoA and health. Not only does SoA vary with age in the normal population, but individuals suffering from certain disorders also show a loss of agency as a prominent symptom. Next, we discuss some of the areas in which an understanding of the sense of agency has played a critical role.

Sense of Agency and Aging

A number of studies have shown a loss of agency among the older population (Cioffi et al., 2017). This decrease in agency might be due to an inability to integrate internal and environmental cues during the process of agency attribution. It has also been suggested that there might be an actual decrease in the ability to control due to physical impairments, and these changes can lead to a decrease in well-being and quality of life in aging adults (Moore, 2016). With aging populations in India and across the globe, the sense of agency might prove to be an important marker for studying and improving quality of life. The sense of agency might also help us understand how aging influences the ability to control and effectively interact with complex devices such as mobile phones, ATMs, etc.

Sense of Agency and Schizophrenia

Schizophrenia is a disorder that includes distortions in thinking and perception along with problems in motor control, communication, and affective expression (Garbarini et al., 2016). Schizophrenia has also been associated with a loss of agency, where patients incorrectly attribute their actions to external agents (Garbarini et al., 2016). The decreased ability to make correct agency attributions is associated with the severity of symptoms (Kozáková et al., 2020), and SoA-related problems are seen to a greater extent in patients showing negative symptoms (Maeda et al., 2013). A major problem that might underlie some of the symptoms associated with schizophrenia is the inability to integrate informational cues and make accurate predictions about the environment. There is strong evidence suggesting a negative relationship between aging and the sense of control, both cross-sectionally and longitudinally, and individuals with a lower sense of agency are less likely to take responsibility for their health (Wolinsky et al., 2003). By understanding the sense of agency-related problems in schizophrenia, greater insight into the disorder can be obtained, with implications for possible intervention techniques.

Sense of Agency and Autism

Many children with autism experience considerable anxiety as well as extreme sensitivities to light and sound. A new study suggests that fear of the unknown drives both features and hints that helping children who have autism cope with uncertainty could ease some of their symptoms. Autistic children try to make their environment more predictable and controllable (Crane et al., 2021). Helping children learn to draw on past experiences to better predict the outcomes of future situations may quell their anxiety along with their sensory sensitivities. Researchers are only beginning to explore how unpredictability relates to anxiety and sensory sensitivity in people with autism; Pellicano's work provides much-anticipated evidence to support the idea that unpredictability can exacerbate both of those symptoms (Cannon et al., 2021). Issues like traveling and going to a new environment have troubled parents of autistic individuals. A common way parents deal with these issues is by creating a mental image through storytelling or by recreating an artificial environment at home (later, while discussing meditation, we talk about the relationship between mentalizing and the sense of agency). Traveling on flights and trains is very difficult for many autistic individuals, and recent techniques like flight simulation have been found useful in familiarizing them with the situation beforehand. A sense of control over the environment is necessary for an autistic individual to function in daily life. In such instances, technological interventions like virtual reality gear that provide a sense of the environment a priori have been found helpful in dealing with real-life uncertainties.

Sense of Agency and Immersive Therapy

Phantom pain refers to the chronic and persistent sensation, often painful, that is perceived in a limb after it has been lost. Such pain is common in amputees, with a poor understanding of its causes and a lack of effective drug or surgical therapy. A therapy that is commonly used to alleviate the pain is mirror box therapy, which involves motor imagery of the normal hand that results in vivid kinesthetic sensations for the amputated arm (Kim and Kim, 2012; Ramachandran and Rogers-Ramachandran, 1996). Later studies addressed some of the challenges with mirror box therapy by using a VR system and showed the effectiveness of a sensation of control and agency over a virtual hand in alleviating phantom pain (Cole et al., 2009).

Sense of Agency and Meditation

Research on meditation has intricately linked the practice with one's feeling of self. Research on mindfulness meditation reports that meditating leads to a weakening of the sense of self and a dissolution of self-boundaries (Chiarella et al., 2020). Using intentional binding as a measure of agency, that study showed that long-term practitioners of Vipassana meditation exhibited reduced intentional binding, indicating a weaker sense of agency. In another study investigating the relationship between Buddhist meditation and the sense of self (Lindahl and Britton, 2019), participants reported a loss of the sense of agency along with a loss of the sense of embodiment, as well as changes in the self-other-world distinction. Interestingly, the changes in the sense of self with meditation were not always beneficial: more changes in the global sense of self were associated with higher levels of impairment (Lindahl and Britton, 2019). A clever way to understand how the sense of agency relates to meditation is to look at hypnosis, which is the opposite of meditation in terms of metacognitive state: while hypnosis results from inaccurate metacognition of higher-order states of intention, meditation involves developing more accurate metacognition. Lush and Dienes (2019) show that the metacognition of intentions is intricately linked with implicit measures of agency. In their study, the highly hypnotizable group reported later awareness of motor intentions and control compared to the group more resistant to hypnosis. Additionally, meditators showed more outcome binding than non-meditators, suggesting changes in postdictive processing.

Sense of Agency and Prosthetics

A construct that has been intricately linked with the experience of agency is the experience of ownership. While SoA refers to the experience of being in control of our actions and their outcomes, the sense of ownership (SoO) refers to the experience of mineness towards our own body (Braun et al., 2018). It has been found that the sense of ownership is plastic and can be manipulated to include rubber limbs (Botvinick and Cohen, 1998), virtual hands (Suzuki et al., 2013), and the faces of others (Porciello et al., 2018). This is generally done either by using interoceptive signals correlated with visual feedback of the targeted body part or by using synchronous sensory stimulation. The sense of ownership can also be modulated to produce out-of-body illusions (Metzinger, 2005). It has been found that SoA and SoO promote each other, that is, an enhancement in one results in an enhancement in the other (Braun et al., 2018). From an application perspective, this is important because, in order to extend ownership toward an object and integrate it as a part of one's body (for example, a cane or a prosthesis), we need to enhance the sense of agency, that is, make the patient feel in control of the actions being produced through that object. This relationship has been observed in empirical studies: in a study with amputees, it was found that moving an artificial robotic hand in coordination with the amputated stump resulted in an enhancement of SoA (Sato et al., 2018).

Sense of Agency and Technology

The experience of agency provides a window into intentional actions and their outcomes when interacting not just with other social agents but also with complex machines and interfaces. Being able to effectively monitor goal-oriented actions can be crucial for the optimal utilization of such systems, especially in situations where there can be safety concerns, for example, cars, flight control, etc. The sense of agency might also provide clues to the integration of artificial sensory systems and actuators into one's body image. We discuss some of these applications below, describing how an understanding of SoA can be used to effectively integrate man and machine.

Sense of Agency and HCI

The field of HCI focuses on understanding the factors that mediate the interaction between man and machine and the design principles that can help enrich the user experience while interacting with an interface (Limerick et al., 2014). Because HCI depends strongly on the modality of input (how a user can act on a system) and on the feedback provided by the system, understanding the experiences associated with action and feedback becomes quite crucial. The sense of agency, especially neural measures of agency, might be utilized as a reliable metric for the experience of control.

A critical issue in optimizing user experience is the gap between user intentions and the state of the system. This gap is known as the gulf of execution, and decreasing it involves the input modality of the system. This becomes especially relevant when talking about newer input techniques such as speech and gesture control. As SoA is intricately linked to the ability to control, it may provide a good way to investigate the effectiveness of an input modality (Limerick et al., 2014). A good example of such an application is provided by Coyle and colleagues (Coyle et al., 2012). The authors asked participants to perform an action either by pressing a button or with the help of a skin-based input device. They report that the implicit sense of agency, measured using intentional binding between the action and an auditory tone, was stronger for skin-based input compared to keyboard input. How the sense of agency varies with noise in the environment is also of significance to the gulf of execution, and examining variations in the sense of agency with the accuracy of the input device can help design better input devices.

Another factor that influences the sense of control is the feedback we receive for our actions. There can be a gap between what the user expects the feedback to be, based on their intentions, and how the system actually behaves. Given the hierarchical nature of control and the plasticity of agentive experience, it is possible to have a mismatch between the actual causal interaction with an agent and how participants attribute agency (Desantis et al., 2011; Kumar and Srinivasan, 2017). Recent work has suggested that sensorimotor conflicts, even on unrelated tasks, can disrupt higher-order monitoring of actions (Faivre et al., 2020). Hence, it becomes important to avoid sensorimotor conflicts that may disrupt the experience of the user.

Sense of Agency, the Gaming Industry, and VR

One of the key aspects of virtual reality technology as an interface is the realistic experience that it provides the user. A VR setup consists of a range of technologies involving visual, auditory, and haptic interfaces that together provide a feeling of immersion in the artificial environment. Innovation in VR technology often involves designing a richer virtual environment; however, it is also important to consider the user's viewpoint while designing these techniques. For novel VR tasks, it has been argued that the user experiences the environment and learns skills by performing the task (acting on the environment) rather than through the avatar itself. The general idea is that the experience of the environment is intricately linked to and inseparable from the VR setup, and that the experience of the action is much more important than the nature of the virtual avatar itself. The experience of agency is conceptually linked to the sense of immersion, and if we assume a version of the comparator model as underlying the sense of agency, the immersion experienced by an individual might depend more on action-related authorship (congruence between actual and predicted action) than on the richness of the environment itself (which has been the major focus while building more effective VR setups). A lack of action-outcome congruence might result in a feeling of not being the author of the actions as well as a lack of belief about the reason for the change in perception. Evidence for the importance of the experience and belief of being a causal agent of change in the environment comes from a study in which participants performed the task of grasping, orienting, and moving a rod through a slit in a real or virtual environment. The authors report poorer performance in the virtual environment compared to the real environment, which they link to a decrease in visual feedforward cues and a lower SoA in the virtual environment (Kobayashi and Shinya, 2018). This relationship between performance and agency is not observed by all researchers: in another study, which investigated how the sense of presence in a virtual environment influences the learning of motor skills in a VR golf game, the authors found that when participants actively played the game (versus watching a game of golf), there was an increase in the sense of agency although no improvement in skill was observed (Piccione et al., 2019). From a user experience perspective, it is important to note that the active nature of the task enhances SoA and the feeling of presence in the VR environment.

The sense of agency can be modulated by manipulating discrepancies or temporal delays between self-generated actions and their feedback. Hence, the nature and timing of feedback are crucial for enhancing SoA. From a virtual reality perspective, this means that, in order to have a stronger presence in VR, real-time monitoring of participants' actions by tracking or motion capture and mapping the movements to a virtual avatar is much more crucial than the richness of the VR experience (Kilteni et al., 2012). Although it might seem that an accurate mapping between action and feedback is important and that any disturbance might lead to a loss of agency, there are studies that show an enhancement in agency with modified sensory feedback (Aoyagi et al., 2021). In that study, healthy participants moved a virtual object to trace a trajectory; during this movement, the visual feedback was disturbed either by introducing a temporal delay or by attaching a weight to the wrist of the hand used for the movement, and both manipulations led to a reduced sense of agency. However, when this disturbance was accompanied by an offset between the position of the participants' actual hand and the virtual on-screen hand, the authors observed an enhanced sense of agency. Further studies on the importance of feedback for SoA show that both implicit and explicit measures of agency are not influenced by the display modality (HMD or monitor screen) but are influenced by the modality of action as well as by temporal delays. Interestingly, the authors report a differential effect on the implicit sense of agency (often linked with the feeling of agency) and the explicit sense of agency (often linked with the judgment of agency) (Winkler et al., 2020). The close relationship between VR setups and SoA indicates that SoA can also be used in therapies. In a recent study (Nagamine et al., 2016), the authors report that watching a virtual hand in a VR setup led to an enhanced sense of agency and sense of ownership, indicating the usefulness of motor imagery-based BCIs for neurorehabilitation.

Sense of Agency and Automation

For increasingly complex tasks (for example, flying an aircraft or operating complex machinery), the use of assistive computer systems is becoming more common. Such assistive systems are now seen in day-to-day usage in the form of recommendation systems or automatic text correction. However, users do not always find these assistive systems useful; an example is the Microsoft Office Assistant in early iterations of the package, which was not used by many users and was finally removed from newer versions. Also, there can be intrasubject variability in the level of assistance required. Too much assistance can lead to a feeling of being overwhelmed and a loss of control. An insight into how the problem of assistance can be solved is given by studies on the joint sense of agency, where participants are asked to perform a task jointly (Obhi and Hall, 2011). The sense of agency when humans have other humans as co-actors is much stronger than when they have computers as co-actors, suggesting that the joint "we" agency is an important metric to consider when designing assistive agents. In a study looking at automation and agency, participants were placed in a flight simulator and performed a complex task under varying levels of automation (Berberian et al., 2012). An intentional binding task was embedded in the flight simulator, with the action being a maneuver to avoid a conflict with another plane, followed by feedback about its success. The authors found that as the level of automation increased, intentional binding and the sense of agency decreased. The study shows an interesting way in which measures of agency can be embedded ecologically in other tasks: without interfering with the task, a measure of agentive experience can be obtained, giving an idea of the optimal level of automation (indicated by an enhanced sense of agency) beyond which a feeling of loss of control occurs. In a review paper on driving automation and the sense of agency (Wen et al., 2019), the authors suggest that the maintenance of a sense of agency is crucial for both ethical and safety purposes. One of the important functions that the sense of agency might serve is to allow for self-other distinctions and the decoding of the intentions of other drivers. They also suggest that the sense of agency is important to consider in other scenarios as well, including robotics, joint control, etc.

Sense of Agency and Education

Another domain that has received a lot of attention due to technological advancement is that of education and learning among school students. The recent lockdown saw a boost in software agents supporting remote teaching and learning. A study examined the relationship between concepts of agency attribution and learning from agent-based computer systems, and reported that students who made more agentive attributions of human behavior learned more effectively from the system (Jaeger et al., 2019). It seems that students who can effectively make agency attributions are better at interacting with software agents. To promote a sense of agency, certain core practices have been suggested (Vaughn, 2018). It is suggested that teachers should be sensitive to students' readiness and choices, and should allow them to guide their own learning. This ensures a greater sense of control over the learning process. As the sense of agency depends on the ability to control the environment, classrooms should be organized and structured to enable greater control; this would include things like discussions, activities, and projects (Vaughn, 2018). Being able to integrate these components effectively with self-paced learning will provide students with a greater sense of control over the process of learning and can be a highlight of technology-assisted learning over traditional classroom teaching.


Conclusion

This chapter discussed the basic idea of what the sense of agency is and the theoretical enquiry that led to empirical investigations into the nature of agentive experience. We also looked at the possible mechanism that might underlie the emergence of the sense of agency, linking it to the multiscale control exercised by an agent in its environment. We described the various measures that have been used to operationalize and empirically investigate the sense of agency. The second part of the chapter discussed the various domains in which an understanding of the sense of agency can be applied in real life. We note that although the concept of the sense of agency and its importance has been discussed in the fields of philosophy and psychology, it might also provide a tangible way to explore the human experience of action control and integrate it with technology. This kind of integration will truly help us use technology to improve human life and experience and to integrate it in a much more intuitive fashion.

References

Aoyagi, K., Wen, W., An, Q., Hamasaki, S., Yamakawa, H., Tamura, Y., Yamashita, A., & Asama, H. (2021). Modified sensory feedback enhances the sense of agency during continuous body movements in virtual reality. Scientific Reports, 11(1), 2553. https://doi.org/10.1038/s41598-021-82154-y
Barlas, Z., & Kopp, S. (2018). Action choice and outcome congruency independently affect intentional binding and feeling of control judgments. Frontiers in Human Neuroscience, 12, 137. https://doi.org/10.3389/fnhum.2018.00137
Berberian, B., Sarrazin, J.-C., Blaye, P. L., & Haggard, P. (2012). Automation technology and sense of control: A window on human agency. PLoS ONE, 7(3), e34075. https://doi.org/10.1371/journal.pone.0034075
Blain, B., & Sharot, T. (2021). Intrinsic reward: Potential cognitive and neural mechanisms. Current Opinion in Behavioral Sciences, 39, 113–118. https://doi.org/10.1016/j.cobeha.2021.03.008
Botvinick, M., & Cohen, J. (1998). Rubber hands ‘feel’ touch that eyes see. Nature, 391(6669), 756–756. https://doi.org/10.1038/35784
Braun, N., Debener, S., Spychala, N., Bongartz, E., Sörös, P., Müller, H. H. O., & Philipsen, A. (2018). The senses of agency and ownership: A review. Frontiers in Psychology, 9, 535. https://doi.org/10.3389/fpsyg.2018.00535
Buehner, M. J. (2012). Understanding the past, predicting the future: Causation, not intentional action, is the root of temporal binding. Psychological Science, 23(12), 1490–1497.
Cannon, J., O’Brien, A. M., Bungert, L., & Sinha, P. (2021). Prediction in autism spectrum disorder: A systematic review of empirical evidence. Autism Research: Official Journal of the International Society for Autism Research, 14(4), 604–630. https://doi.org/10.1002/aur.2482
Caspar, E. A., Desantis, A., Dienes, Z., Cleeremans, A., & Haggard, P. (2016). The sense of agency as tracking control. PLoS ONE, 11(10), e0163892. https://doi.org/10.1371/journal.pone.0163892
Chambon, V., Moore, J. W., & Haggard, P. (2015). TMS stimulation over the inferior parietal cortex disrupts prospective sense of agency. Brain Structure & Function, 220(6), 3627–3639. https://doi.org/10.1007/s00429-014-0878-6
Chiarella, S., Makwana, M., Simione, L., Hartkamp, M., Calabrese, L., Raffone, A., & Srinivasan, N. (2020). Mindfulness meditation weakens attachment to self: Evidence from a self vs other binding task. Mindfulness, 11, 2411–2422. https://doi.org/10.1007/s12671-020-01457-9


Cioffi, M. C., Cocchini, G., Banissy, M. J., & Moore, J. W. (2017). Ageing and agency: Age-related changes in susceptibility to illusory experiences of control. Royal Society Open Science, 4(5), 161065. https://doi.org/10.1098/rsos.161065
Cole, J., Crowle, S., Austwick, G., & Henderson Slater, D. (2009). Exploratory findings with virtual reality for phantom limb pain; from stump motion to agency and analgesia. Disability and Rehabilitation, 31(10), 846–854. https://doi.org/10.1080/09638280802355197
Coyle, D., Moore, J., Kristensson, P. O., Fletcher, P., & Blackwell, A. (2012). I did that! Measuring users’ experience of agency in their own actions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 2025–2034). Association for Computing Machinery. https://doi.org/10.1145/2207676.2208350
Crane, L., Hearst, C., Ashworth, M., Davies, J., & Hill, E. L. (2021). Supporting newly identified or diagnosed autistic adults: An initial evaluation of an autistic-led programme. Journal of Autism and Developmental Disorders, 51(3), 892–905. https://doi.org/10.1007/s10803-020-04486-4
Desantis, A., Roussel, C., & Waszak, F. (2011). On the influence of causal beliefs on the feeling of agency. Consciousness and Cognition, 20(4), 1211–1220. https://doi.org/10.1016/j.concog.2011.02.012
Dewey, J. A., & Knoblich, G. (2014). Do implicit and explicit measures of the sense of agency measure the same thing? PLoS ONE, 9(10), e110118. https://doi.org/10.1371/journal.pone.0110118
Ebert, J. P., & Wegner, D. M. (2010). Time warp: Authorship shapes the perceived timing of actions and events. Consciousness and Cognition, 19(1), 481–489. https://doi.org/10.1016/j.concog.2009.10.002
Faivre, N., Vuillaume, L., Bernasconi, F., Salomon, R., Blanke, O., & Cleeremans, A. (2020). Sensorimotor conflicts alter metacognitive and action monitoring. Cortex, 124, 224–234. https://doi.org/10.1016/j.cortex.2019.12.001
Forstmann, B. U., Wolfensteller, U., Derrfuss, J., Neumann, J., Brass, M., Ridderinkhof, K. R., & von Cramon, D. Y. (2008). When the choice is ours: Context and agency modulate the neural bases of decision-making. PLoS ONE, 3(4), e1899. https://doi.org/10.1371/journal.pone.0001899
Gallagher, S. (2012). Multiple aspects in the sense of agency. New Ideas in Psychology, 30, 15–31.
Garbarini, F., Mastropasqua, A., Sigaudo, M., Rabuffetti, M., Piedimonte, A., Pia, L., & Rocca, P. (2016). Abnormal sense of agency in patients with schizophrenia: Evidence from bimanual coupling paradigm. Frontiers in Behavioral Neuroscience, 10, 43. https://doi.org/10.3389/fnbeh.2016.00043
Haggard, P., Clark, S., & Kalogeras, J. (2002). Voluntary action and conscious awareness. Nature Neuroscience, 5(4), 382–385. https://doi.org/10.1038/nn827
Jaeger, C. B., Hymel, A. M., Levin, D. T., Biswas, G., Paul, N., & Kinnebrew, J. (2019). The interrelationship between concepts about agency and students’ use of teachable-agent learning technology. Cognitive Research: Principles and Implications, 4(1), 14. https://doi.org/10.1186/s41235-019-0163-6
Jordan, J. S. (2003). Emergence of self and other in perception and action: An event-control approach. Consciousness and Cognition, 12(4), 633–646. https://doi.org/10.1016/S1053-8100(03)00075-8
Kilteni, K., Groten, R., & Slater, M. (2012). The sense of embodiment in virtual reality. Presence: Teleoperators and Virtual Environments, 21(4), 373–387. https://doi.org/10.1162/PRES_a_00124
Kim, S. Y., & Kim, Y. Y. (2012). Mirror therapy for phantom limb pain. The Korean Journal of Pain, 25(4), 272–274. https://doi.org/10.3344/kjp.2012.25.4.272
Kobayashi, D., & Shinya, Y. (2018). Study of virtual reality performance based on sense of agency. In S. Yamamoto & H. Mori (Eds.), Human Interface and the Management of Information. Interaction, Visualization, and Analytics (pp. 381–394). Springer International Publishing. https://doi.org/10.1007/978-3-319-92043-6_32


Kozáková, E., Bakštein, E., Havlíček, O., Bečev, O., Knytl, P., Zaytseva, Y., & Španiel, F. (2020). Disrupted sense of agency as a state marker of first-episode schizophrenia: A large-scale follow-up study. Frontiers in Psychiatry, 11, 1489. https://doi.org/10.3389/fpsyt.2020.570570
Kumar, D., & Srinivasan, N. (2013, August 2). Hierarchical control and sense of agency: Differential effects of control on implicit and explicit measures of agency. https://doi.org/10.13140/2.1.4720.1283
Kumar, D., & Srinivasan, N. (2014). Naturalizing sense of agency with a hierarchical event-control approach. PLoS ONE, 9(3), e92431. https://doi.org/10.1371/journal.pone.0092431
Kumar, D., & Srinivasan, N. (2017). Multi-scale control influences sense of agency: Investigating intentional binding using event-control approach. Consciousness and Cognition, 49, 1–14. https://doi.org/10.1016/j.concog.2016.12.014
Limerick, H., Coyle, D., & Moore, J. W. (2014). The experience of agency in human-computer interactions: A review. Frontiers in Human Neuroscience, 8, 643. https://doi.org/10.3389/fnhum.2014.00643
Lindahl, J. R., & Britton, W. B. (2019). “I have this feeling of not really being here”: Buddhist meditation and changes in sense of self. Journal of Consciousness Studies, 26(7–8), 157–183.
Lush, P., & Dienes, Z. (2019). Time perception and the experience of agency in meditation and hypnosis. PsyCh Journal, 8(1), 36–50. https://doi.org/10.1002/pchj.276
Maeda, T., Takahata, K., Muramatsu, T., Okimura, T., Koreki, A., Iwashita, S., Mimura, M., & Kato, M. (2013). Reduced sense of agency in chronic schizophrenia with predominant negative symptoms. Psychiatry Research, 209(3), 386–392. https://doi.org/10.1016/j.psychres.2013.04.017
Metzinger, T. (2005). Out-of-body experiences as the origin of the concept of a ‘soul.’ Mind and Matter, 3, 57–84.
Moore, J. W. (2016). What is the sense of agency and why does it matter? Frontiers in Psychology, 7, 1272. https://doi.org/10.3389/fpsyg.2016.01272
Moore, J. W., Middleton, D., Haggard, P., & Fletcher, P. C. (2012). Exploring implicit and explicit aspects of sense of agency. Consciousness and Cognition, 21(4), 1748–1753. https://doi.org/10.1016/j.concog.2012.10.005
Nagamine, S., Hayashi, Y., Yano, S., & Kondo, T. (2016). An immersive virtual reality system for investigating human bodily self-consciousness. In 2016 Fifth ICT International Student Project Conference (ICT-ISPC) (pp. 97–100). https://doi.org/10.1109/ICT-ISPC.2016.7519245
Obhi, S. S., & Hall, P. (2011). Sense of agency and intentional binding in joint action. Experimental Brain Research, 211(3), 655. https://doi.org/10.1007/s00221-011-2675-2
Perruchet, P., Cleeremans, A., & Destrebecqz, A. (2006). Dissociating the effects of automatic activation and explicit expectancy on reaction times in a simple associative learning task. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(5), 955–965. https://doi.org/10.1037/0278-7393.32.5.955
Piccione, J., Collett, J., & De Foe, A. (2019). Virtual skills training: The role of presence and agency. Heliyon, 5(11), e02583. https://doi.org/10.1016/j.heliyon.2019.e02583
Porciello, G., Bufalari, I., Minio-Paluello, I., Di Pace, E., & Aglioti, S. M. (2018). The ‘enfacement’ illusion: A window on the plasticity of the self. Cortex, 104, 261–275. https://doi.org/10.1016/j.cortex.2018.01.007
Ramachandran, V. S., & Rogers-Ramachandran, R. (1996). Synaesthesia in phantom limbs induced with mirrors. Proceedings of the Royal Society of London. Series B: Biological Sciences, 263, 377–386. https://doi.org/10.1098/rspb.1996.0058
Sato, Y., Kawase, T., Takano, K., Spence, C., & Kansaku, K. (2018). Body ownership and agency altered by an electromyographically controlled robotic arm. Royal Society Open Science, 5(5), 172170. https://doi.org/10.1098/rsos.172170
Sidarus, N., Haggard, P., & Beyer, F. (2018). How social contexts affect cognition: Mentalizing interferes with sense of agency during voluntary action. PsyArXiv. https://doi.org/10.31234/osf.io/wj3ep


Suzuki, K., Garfinkel, S. N., Critchley, H. D., & Seth, A. K. (2013). Multisensory integration across exteroceptive and interoceptive domains modulates self-experience in the rubber-hand illusion. Neuropsychologia, 51(13), 2909–2917. https://doi.org/10.1016/j.neuropsychologia.2013.08.014
Synofzik, M., Vosgerau, G., & Newen, A. (2008). Beyond the comparator model: A multifactorial two-step account of agency. Consciousness and Cognition, 17(1), 219–239. https://doi.org/10.1016/j.concog.2007.03.010
Vaughn, M. (2018). Making sense of student agency in the early grades. The Phi Delta Kappan, 99(7), 62–66.
Wen, W., Kuroki, Y., & Asama, H. (2019). The sense of agency in driving automation. Frontiers in Psychology, 10, 2691. https://doi.org/10.3389/fpsyg.2019.02691
Winkler, P., Stiens, P., Rauh, N., Franke, T., & Krems, J. (2020). How latency, action modality and display modality influence the sense of agency: A virtual reality study. Virtual Reality, 24(3), 411–422. https://doi.org/10.1007/s10055-019-00403-y
Wolinsky, F. D., Wyrwich, K. W., Babu, A. N., Kroenke, K., & Tierney, W. M. (2003). Age, aging, and the sense of control among older adults: A longitudinal reconsideration. The Journals of Gerontology: Series B, 58(4), S212–S220. https://doi.org/10.1093/geronb/58.4.S212

Devpriya Kumar is an associate professor of Cognitive Science and Psychology at IIT Kanpur. He heads the perception, action, and cognition lab, which primarily investigates how we meaningfully interact with the world around us, how this interaction influences higher-order experiences of consciousness and self, and how this understanding can be translated to real-life applications.

Part VI

Engineering Design

Chapter 13

Do Analogies and Analogical Distance Influence Ideation Outcomes in Engineering Design?

V. Srinivasan, Binyang Song, Jianxi Luo, Karupppasamy Subburaj, Mohan Rajesh Elara, Lucienne Blessing, and Kristin Wood

Abstract The efficacy of using patents for stimulation to support creativity during the concept phase in engineering design is investigated through understanding: (a) the effects of patents for stimulation on generated concepts’ quantity, novelty, and quality and (b) the consequences of stimulating with patents from various distances of analogy on the generated concepts’ novelty and quality. A design experiment is devised in a design course, in which 105 students ideate without and with various patents to generate concepts of spherical robots. The principal observations are as follows: (a) stimulation with as compared to without patents yields more concepts, (b) stimulation with patents, other resources, or their combination as compared to no stimulation yields concepts of higher novelty, (c) stimulation with patents as compared to no patents yields concepts of higher quality, (d) stimulation with patents, other resources, or their combination as compared to no stimulation yields concepts of higher quality, and (e) when the analogical distance between patents and problem domains decreases, quality increases but novelty decreases.

Keywords Patents · Stimuli · Design creativity · Novelty · Quality

V. Srinivasan (B)
Department of Design, Indian Institute of Technology Delhi (IIT Delhi), New Delhi, India
e-mail: [email protected]

B. Song
Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
e-mail: [email protected]

J. Luo · M. R. Elara · L. Blessing
Engineering Product Development Pillar, Singapore University of Technology and Design, Singapore, Singapore
e-mail: [email protected]

M. R. Elara
e-mail: [email protected]

L. Blessing
e-mail: [email protected]

K. Subburaj
Department of Mechanical and Production Engineering - Design and Manufacturing, Aarhus University, Aarhus, Denmark
e-mail: [email protected]

K. Wood
College of Engineering, Design and Computing, University of Colorado Denver, Anschutz Medical Campus, Denver, CO, USA
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Mukherjee et al. (eds.), Applied Cognitive Science and Technology, https://doi.org/10.1007/978-981-99-3966-4_13

Introduction

Engineering design spans from the identification of needs to the development of instructions with which solutions for those needs can be manufactured, and it comprises the phases of task clarification, conceptual design, embodiment design, and detail design (Pahl et al., 2007). In conceptual design, solution principles or concepts are generated, evaluated, and modified before the most promising ones are selected for further detailing. Among the phases, this one is significant for multiple reasons: for example, a successful solution is more likely to be a consequence of exploring multiple solution principles than of attending to details (Pahl et al., 2007), and it is less cumbersome to effect changes in this phase, where they are less costly but have greater repercussions than in the later phases (French, 1988). Design creativity is the ability to generate outcomes that are both novel and valuable (Sarkar & Chakrabarti, 2011). Analogical design, or design-by-analogy, is a design method to generate creative solutions; it involves using stimuli to develop analogies and using these for various tasks (Chakrabarti et al., 2011; Chan et al., 2011; Srinivasan et al., 2015; Jiang et al., 2021). Analogies are similarities in some features between seemingly unrelated objects; for example, the analogy in structure between Rutherford’s model of the atom (nucleus at the center and electrons revolving around it in different energy orbits) and the model of the solar system (sun at the center and planets revolving around it in different orbits). Analogical design has been used for the identification, interpretation, and reformulation of problems; for the generation, evaluation, and explanation of solutions and the anticipation of issues in them (Goel, 1997; Hey et al., 2008; Srinivasan et al., 2015); and for aiding the generation of creative solutions, enhancing novelty, inhibiting fixation, etc. (Linsey et al., 2010; Chan et al., 2011; Fu et al., 2014; Murphy et al., 2014).

Conventionally, patents have been used as legal documents to protect inventions from being copied. In design, they have been used for various applications: checking infringements, studying current technologies and forecasting future ones, generating solutions, representing and modeling technologies, etc. (Fantoni et al., 2013; Koh, 2013; Fu et al., 2014; Murphy et al., 2014; Song et al., 2019). Patents contain information on novel and functional products and processes that span multiple and diverse domains. So, given this span, patents can potentially be used as stimuli in analogical design. However, more information in patents is hidden than disclosed, and the language is legal jargon (Fantoni et al., 2013; Koh, 2013). So, patents may not be conducive to helping comprehend the functioning of products and processes. Consequently, their potency as stimuli for sourcing analogies to support creative ideation is questionable. The overarching goal of this study is therefore to assess the efficacy of using patents for stimulating the generation of concepts in engineering design.

Literature Review

Patents in Design

Apart from the applications of patents mentioned earlier, the Theory of Inventive Problem-Solving (TIPS) (better known by its Russian acronym TRIZ, for Teoriya Resheniya Izobretatelskikh Zadatch) is based on patents and their patterns of evolution. It comprises a set of design methods (such as Ideality, the Contradiction matrix, Su-Field analysis, etc.) which can be used for problem-finding, problem-solving, failure analysis, and forecasting (Altshuller, 1999). In addition, several researchers (such as Chan et al., 2011; Fu et al., 2013a; Murphy et al., 2014; Song et al., 2017) have used patents as stimuli to test the effects of various parameters on ideation. These pieces of work are reviewed in detail in the Section “Effects of Patent-Based Analogies on Ideation”.

In design, patents have primarily been used to check infringements by the designs developed. Koh (2013) discussed the pros and cons of checking patent infringements before, during, and after ideation. Reviewing patents before ideation can cause fixation and create difficulty in identifying relevant patents, but can reduce the chances of infringements and the consequent redesigning necessary to overcome them. Reviewing patents after ideation can help eliminate fixation and reduce the burden of identifying relevant patents to check for infringements; however, if infringements are found after ideation, the effort of redesigning is high. Therefore, reviewing patents after ideation is the most efficient way to identify infringements, and reviewing patents before ideation is the most effective way to minimize additional design work (Koh, 2020a, b). An experiment was conducted to study whether reviewing a patent before ideation causes design fixation and design disturbance, and it was reported that: (a) some design features from patents were included in the conceptual sketches from ideation, thus revealing fixation, and (b) some design features that would otherwise have been included were excluded, thus revealing disturbance (Koh & Lessio, 2018). Another experiment investigated whether design fixation and distraction can be reduced when the amount of content reviewed in a patent is reduced before ideation, and it was found that: (a) more fixation and disturbance were observed when more information from patents was presented and (b) less fixation and no change in disturbance were reported when less information was shown (Koh, 2020a, b).

Alstott et al. (2016) represented the space of technologies of US patents issued during 1976–2010 as a network (Fig. 13.1). This network comprised 121 nodes, where each node represented a class of technology in the International Patent Classification system and so contained the patents relevant to that technology class. The size of a node was proportional to the number of patents in it. The position of a node in the network was decided based on its knowledge similarity or proximity to the other nodes. This similarity or proximity was calculated using citations of patents, the Jaccard index (Jaccard, 1901), cosine similarity (Jaffe, 1986), co-occurrence (Teece et al., 1994), etc. In Fig. 13.1, the Jaccard index is used, as it has been recommended as the representative among these metrics (Yan & Luo, 2016). The Jaccard index is defined as the ratio of the number of common references of patents between two classes of technologies to the total number of distinct references of patents in the two classes. When there are more common references of patents in two classes of technologies, the index is high, and this indicates high knowledge similarity or proximity between the classes. In the technology map shown in Fig. 13.1, the knowledge proximity or similarity between two technology classes is proportional to the thickness of the lines connecting them, and only the strongest 120 links that join all the nodes are shown.

Fig. 13.1 Screenshot of 121 technology classes
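For clarity, the Jaccard index described above can be written compactly as follows; the symbols are introduced here purely for illustration, with R_i denoting the set of prior patents referenced by the patents in technology class c_i:

    J(c_i, c_j) = \frac{|R_i \cap R_j|}{|R_i \cup R_j|}

Two classes whose patents cite many of the same references therefore obtain an index close to 1 (high knowledge proximity), while classes with no shared references obtain 0.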

Effects of Patent-Based Analogies on Ideation In design research, several aspects of analogical design have been researched and explored. Further, multiple researchers have investigated the use of design-byanalogy during ideation. However, in this section, only literature that uses patents for stimulation in analogical design is reviewed. A computer-based tool for assessing the similarities of functional and surface contents of analogies was developed by Fu et al. (2013a). This can be used for identifying patents located at different analogical distances and these patents can then be used for stimulation during ideation. Based on the similarities of functional and surface contents, the tool developed networks of patents. Within these networks, based on functional and surface similarities, multiple sub-clusters were also created manually. An experiment was conducted to investigate the effects of stimulating from various analogical distances on the conceptualization performance (Fu et al., 2013b). 45 US patents were used as stimuli in this experiment and were categorized into “near” or “far” analogical distance with the assistance of a Bayesian-based algorithm and latent semantic analysis. The researchers reported that stimulation using far analogies helped develop concepts of lower novelty and quality in comparison to concepts generated by stimulation using near or no analogies. To systematically search and identify function-based analogies from the database of US patents, Murphy et al. (2014) developed and tested a methodology, which comprised of (a) identify functions’ vocabulary by processing patents, (b) define a function-set using 8 primary, 74 secondary, and 1618 correspondent functions, respectively, (c) index patents with the function-set to create a representation of vector of patents database, (d) develop tools to query and estimate patents’ relevance, and (e) retrieve and display relevant patents to a query. The methodology was evaluated to check whether it can help identify analogies that are similar function-wise. A methodology for assisting engineering designers in searching and identifying analogies that are functionally relevant from the database of US patents was proposed and tested (Fu et al., 2014). The methodology was tested for its effectiveness to help develop solutions that are more novel and higher in quantity than solutions developed without any given external analogies. The purpose of the methodology was to assist the search and identification of function-based analogies, and then, to use them for developing creative concepts. To test the effectiveness of the methodology an experiment was conducted. Within this experiment, the control group was instructed to develop solutions without any external analogies, and the experimental groups used analogies to support varying numbers of functions. The experimental group (where all the functions were supported) developed solutions that had higher novelty than the solutions developed by the control group. Song et al. (2017) investigated from which field among home, near or far in patent databases do designers identify analogies for ideation. The knowledge similarity of a source domain to a problem domain was used to decide whether the source domain belonged to the home, near, or far fields. They also studied the implications


Designers identified most patents from the near field, and consequently, these patents stimulated the most concepts. Patents from the near field stimulated concepts of higher novelty than those from the home or far field, and patents from a combination of the home and far fields stimulated concepts of higher quality than those from the other fields. Luo et al. (2021) developed a knowledge-based expert system that provides design stimuli based on knowledge distance, drawn from a database of patents spanning multiple fields of engineering and technology. The expert system is built on a digitalized, interactive total technology map: a network of technology classes that together encapsulate the patent classification system. The system was validated using two case studies and was found to be effective in identifying different forms of stimuli, supporting various ideation processes (combination, analogy, etc.), and answering various design innovation questions (open-ended questions, specific problems, etc.). Sarica et al. (2021) developed a methodology based on the Technology Semantic Network, a semantic network database, to stimulate idea generation in design. The methodology guides designers to "white spaces" around a domain, based on semantic distance in the Technology Semantic Network, to create new solutions for a design problem. Its effectiveness was demonstrated using a case study of idea generation for flying-car concepts.

Effects of Analogical Distance on Ideation

According to the Conceptual Leap hypothesis, stimuli in domains far from the domain containing the problem provide the best opportunities for novelty because of their surface dissimilarities (Gentner & Markman, 1997; Ward, 1998). Far domains therefore become the principal source for developing novel outcomes. Notwithstanding some anecdotal evidence supporting the hypothesis, empirical validations have not revealed consistent results across studies. Chan et al. (2011) reported that stimuli from far sources helped generate concepts of higher novelty and quality than stimuli from near sources. However, they also argued that the most creative solutions are likely to be developed from stimuli located near rather than far from a problem, owing to their better relevance and easier interpretation in relation to the problem. Wilson et al. (2010) reported no differential benefits between far and near sources. Stimuli from near sources, or the "middle ground," helped generate solutions of higher "maximum novelty" than stimuli from far sources (Fu et al., 2013b). Fu et al. also compared the "average novelty" of solutions stimulated by near and far sources and found no significant differences, and they found that the "mean quality" and "maximum quality" of concepts are higher when concepts are generated with stimulation from near rather than far sources. Based on these findings, Fu et al. concluded that (a) stimuli from the "middle ground" are more beneficial for developing creative solutions and (b) stimuli from near domains and the "middle ground" were perceived to be more relevant to a design problem than stimuli from far domains.


Further, Fu et al. concluded the following: (a) it is difficult to compare the effects of analogical distance across studies because different metrics are used to assess distance, and (b) it is difficult to characterize stimuli and their analogical distances across studies because the terms 'near' and 'far' are relative in these studies.

Metrics for Assessing the Performance of Ideation

Multiple researchers have developed and used various metrics (such as novelty, similarity, usefulness, feasibility, fluency, quality, quantity, and variety) to assess the goodness of ideation (McAdams & Wood, 2002; Oman et al., 2013; Sarkar & Chakrabarti, 2011; Shah et al., 2003; Srinivasan & Chakrabarti, 2010, 2011). To assess ideation effectiveness, Shah et al. (2003) proposed novelty, quantity, variety, and quality. Quantity is the total number of generated ideas; quality is a measure of the feasibility of ideas and how well the identified needs and requirements are fulfilled; variety is the extent of exploration of the solution space; and novelty is a measure of unexpectedness or unusualness in comparison with other ideas that solve the same problem. McAdams and Wood (2002) measured the similarity between two functions in terms of the angle between the vectors representing those functions. Sarkar and Chakrabarti (2011) proposed design creativity as a function of novelty and usefulness, with novelty defined in terms of the newness of designs and usefulness through a product's use and value. Quantity, variety, fluency, and novelty were used as metrics to evaluate a framework for design for variety and novelty (Srinivasan & Chakrabarti, 2011). To assess the creativity of concepts, Oman et al. (2013) proposed the comparative creative assessment (CCA) and the multi-point creative assessment (MPCA). CCA rates the uniqueness of an idea within a solution with respect to the pool of ideas in alternative solutions, while MPCA is rated by a jury using adjective pairs.
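The angle-based similarity idea can be illustrated with a simple cosine computation; the sketch below is only an illustration of the general principle, with hypothetical function-occurrence vectors rather than McAdams and Wood's (2002) actual formulation or data.

```python
# Hedged illustration: similarity of two functions as the cosine of the angle
# between their vectors (here, hypothetical occurrence counts of each function
# against a common set of reference dimensions).
import math

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| |v|); 1.0 means identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

func_a = [3, 0, 2, 5]  # hypothetical vector for function A
func_b = [1, 0, 2, 4]  # hypothetical vector for function B

sim = cosine_similarity(func_a, func_b)
print(round(sim, 3))                            # similarity in [0, 1] for non-negative vectors
print(round(math.degrees(math.acos(sim)), 1))   # the corresponding angle in degrees
```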

Research Gaps and Questions

Patents are used as stimuli to develop analogies and, consequently, to support ideation. Some gaps in the existing work are described below.
• Notwithstanding the benefits of using patents as stimuli, most studies, except Song et al. (2017) and Luo et al. (2021), use only a small sample of patents. These small samples may not be indicative of the size and diversity of the patents that span the space of technologies, and therefore offer limited support to ideation.
• Existing studies that use patents as stimuli for creating analogies do not investigate the comparative benefits and limitations of scenarios with and without patents. Such a comparison is expected to reveal the pros and cons of patents as stimuli in supporting ideation.


• The reported consequences of using patent stimuli located at various analogical distances on conceptualization performance have been mixed, and no clear consensus has emerged. This could be due to the use of different metrics to estimate both analogical distance and ideation performance, different stimuli with no commonality across studies, different domains of stimuli, and so on.
• Prior work on analogical design used patents that are analogous to the given problem in terms of function, behavior, or structure. None of the researchers used knowledge similarity or proximity between domains as a metric of analogical distance.
• Although multiple ideation metrics have been used in analogical design, quality has seldom been used, notwithstanding its value, to assess the effects of analogies or analogical distance when patents are used as stimuli.
Based on these gaps, the following research questions are formulated:
a. What is the effect of stimulation using patents on the (i) quantity, (ii) novelty, and (iii) quality of the concepts generated?
b. What is the effect of stimulation using patents at various analogical distances from the problem domain on the (i) novelty and (ii) quality of concepts?

Research Methodology

This section describes the research methodology used to investigate the research questions under focus. The section is sub-divided into Experiment and Data Analysis.

Experiment

A design experiment is devised to collect data for investigating the research questions. Data are taken from a conceptualization exercise in the 30.007 Engineering Design Innovation course offered at the Pillar of Engineering Product Development, which is attended by the second- and third-year students of the pillar. For this study, the conceptualization part of a design project within this course is used. The 105 students in the course are organized into 21 teams of 4–6 members each. In the design project, every team must design, prototype, and demonstrate a prototype for its chosen problem. The students are taught a stage-gate design process that involves defining problems, identifying requirements, generating concepts, evaluating concepts, modifying concepts, selecting a concept, and developing and demonstrating a functional prototype.


During the concept generation phase, the students generate multiple alternative concepts, each comprising several sub-functions and sub-systems. All phases except concept generation are carried out in teams. In this study, the data consist of the generated concepts and the various forms of stimuli used to generate them.

Two sets of patents are created to give to the designers for generating concepts. The first set, called "Most Cited," contains 121 patents, the most cited patent from each of the 121 technology classes; because the value of a patent depends on its forward citation count, the Most Cited set is significant (Hall et al., 2005; Trajtenberg, 1990). The second set, called "Random," contains 121 patents, one selected at random from each technology class. The patents in both sets lie at different analogical distances from the spherical-robot domains. A63 and B62, which respectively span the domains of Sports & Amusement and Land Vehicles, are the two technology classes that constitute the home domains of spherical robots within the technology space of patents; these classes contain the most patents relevant to spherical robots. A63 contains patents on amusement devices and toys in the form of spherical robots, and B62 contains patents on technologies pertaining to the functioning of robots. Since the broad problem centers on designing spherical robots, A63 and B62 are treated as the design problem domains, and the patents in the two sets (Most Cited and Random) are located at various distances from these domains.

Within each design team, one-third of the members are provided no stimuli, one-third are given the 121 patents of the Most Cited set, and the remaining one-third are given the 121 patents of the Random set. Initially, the students are given only the title, abstract, and images of the patents; if they find this information relevant and inspiring for their problem, they are expected to peruse the patents in more detail. All students are allowed to access the Internet, literature, and other sources for additional stimulation, and they are instructed to generate concepts that are novel and functional. During concept generation, the designers are asked to sketch concepts, annotate them, and describe how the concepts work. They are also instructed to record the patents and other resources used for stimulation and how the stimuli were transformed into solutions. All teams are given one week to generate concepts.

All the concepts generated in this ideation exercise are analyzed to investigate the research questions. For the first research question, all generated concepts and the associated information on the stimuli used during concept generation are analyzed. For the second research question, only the concepts generated with external stimulation and the associated stimulus information are analyzed.

Data Analysis The concepts’ novelty and quality are evaluated as follows. As an expert on robots, the fifth author here assessed the concepts’ novelty using a 0–3 scale, where the


The expert was not informed of the aim of the study prior to the novelty rating. For the rating, he was provided the conceptual sketches with their annotations and descriptions of how the concepts work, without any information on whether or not the concepts were generated with patents.

A solution's quality is a measure of how well it satisfies the requirements. The functions, working principles, and structures of the generated concepts are considered in the assessment of quality as follows:

Q = 0.5 × f + 0.3 × w + 0.2 × s    (13.1)

where Q represents the quality of a concept, f the degree to which the functions in the concept satisfy the requirements, w the degree to which the working principles satisfy the identified functions, and s the degree to which the structure satisfies the working principles. The weightings 0.5, 0.3, and 0.2 are used for function, working principle, and structure, respectively. The first author rated f, w, and s on a 0–2 scale denoting no, partial, and complete satisfaction. An inter-rater reliability test is conducted with the second and third authors for 20 of the concepts; after analyzing and reconciling differences over two rounds of iteration, a Cohen's kappa of 0.86 is obtained. The quality of the remaining concepts is assessed using the learning from these iterations. The concepts are categorized as low, medium, or high quality depending on their quality values (Q < 1.2, 1.2 < Q < 1.7, Q > 1.7).

A patent can be tagged with more than one technology class in the IPC system. For instance, US3009235A, a patent on a separable fastening device, is tagged with the following technology classes: A44 (Haberdashery & Jewellery), B65 (Filamentary Material Handling), and D03 (Weaving). The analogical distance of a technology class to the problem domains is the weighted average of its analogical distances to A63 and to B62. Weightings of 0.75 and 0.25 are used for A63 and B62, respectively; these weightings are derived from the proportions of patents relevant to spherical robots in A63 and B62. The technology classes tagged in the patents used for stimulation are analyzed, and the frequency distributions of these classes across concepts of various grades of novelty and quality are studied. The classes tagged in these patents are located at different analogical distances from the problem domains.
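As an illustration of how Eq. (13.1) and the domain weighting might be computed, the following sketch applies the stated weights to hypothetical ratings; it is not the authors' analysis code, and the example values of f, w, s and the class-to-domain distances are invented. The handling of the grade boundaries (Q exactly 1.2 or 1.7) is also an assumption, since the text does not specify it.

```python
# Hedged sketch of the scoring described above, using hypothetical inputs.

def concept_quality(f: int, w: int, s: int) -> float:
    """Eq. (13.1): f, w, s are ratings on a 0-2 scale."""
    return 0.5 * f + 0.3 * w + 0.2 * s

def quality_grade(q: float) -> str:
    """Bin the quality value into the low/medium/high grades used in the study.
    Boundary values are assigned to 'medium' here; this is an assumption."""
    if q < 1.2:
        return "low"
    if q > 1.7:
        return "high"
    return "medium"

def weighted_distance(dist_to_a63: float, dist_to_b62: float) -> float:
    """Analogical distance of a class to the problem domains (weights 0.75/0.25)."""
    return 0.75 * dist_to_a63 + 0.25 * dist_to_b62

q = concept_quality(f=1, w=2, s=1)                 # hypothetical ratings
print(round(q, 2), quality_grade(q))               # 1.3 medium
print(round(weighted_distance(0.4, 0.8), 2))       # 0.5 (hypothetical distances)
```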

Results

In total, 226 distinct concepts are generated by the 105 students, with each student generating at least one concept. These concepts are at the system level and therefore consist of multiple sub-functions and sub-systems. For example, a concept for mobile surveillance is mobile (i.e., it can accelerate, decelerate, and steer) and can also record, store, receive, and transmit data.

Effects of Stimulation on Quantity, Novelty, and Quality of Concepts

Of the 226 concepts, 138 are generated with stimulation by patents and the remaining 88 without any patent. Since all students have access to other resources, four kinds of stimulation are identified: (a) without any stimulus, (b) with patents only, (c) with other resources only, and (d) with both patents and other resources. The generation of 23 concepts is stimulated by patents only; 10 concepts are generated without any external stimulus; more than half of the concepts, i.e., 115, are stimulated by both patents and other resources; and 78 concepts are stimulated by other resources only. Apart from the patents in the Most Cited and Random sets, the students also identify additional patents for further stimulation; these constitute the Own set. Eighty-seven concepts (~63% of the patent-stimulated concepts) are generated with patents from the Own set, either alone (67 concepts) or together with patents from the other sets (20 concepts). Only 27 and 24 concepts are generated with patents from the Most Cited and Random sets, respectively.

The average novelty of concepts generated with and without patent stimulation is shown in Fig. 13.2; no significant difference in average novelty is observed (two-tailed t-test: t = −1.03, p = 0.31). The average novelty of concepts generated with the various kinds of stimulation is shown in Fig. 13.3. The average novelty of concepts generated without any stimulation is lower than that of concepts generated with patents and other resources (two-tailed t-test: t = −2.12, p = 0.04), with other resources only (two-tailed t-test: t = −2.08, p = 0.04), and with patents only (one-tailed t-test: t = −1.42, p = 0.08).

The average quality of concepts generated with and without patent stimulation is shown in Fig. 13.4. Stimulation with patents, compared with stimulation without patents, helps improve the quality of concepts (two-tailed t-test: t = −4.61, p < 0.00001). The average quality of concepts generated with the various kinds of stimulation is shown in Fig. 13.5; the kinds of stimulation also yield significant differences in quality (ANOVA: F-ratio = 9.00, p < 0.01).
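For readers who wish to run comparable tests on their own ratings, a minimal sketch using SciPy is shown below; the rating arrays are purely hypothetical placeholders rather than the study's data, and the availability of NumPy and SciPy is assumed.

```python
# Hedged sketch: the kinds of comparisons reported above (two-sample t-test and
# one-way ANOVA) on hypothetical 0-3 novelty ratings. Not the authors' analysis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
novelty_with_patents = rng.integers(0, 4, size=138)     # hypothetical ratings
novelty_without_patents = rng.integers(0, 4, size=88)   # hypothetical ratings

# Two-tailed independent-samples t-test (with vs. without patent stimulation).
t, p = stats.ttest_ind(novelty_with_patents, novelty_without_patents)
print(f"t = {t:.2f}, p = {p:.3f}")

# One-way ANOVA across the four kinds of stimulation (hypothetical groups).
no_stimulus = rng.integers(0, 4, size=10)
patents_only = rng.integers(0, 4, size=23)
other_only = rng.integers(0, 4, size=78)
patents_and_other = rng.integers(0, 4, size=115)
f_ratio, p_anova = stats.f_oneway(no_stimulus, patents_only, other_only, patents_and_other)
print(f"F = {f_ratio:.2f}, p = {p_anova:.3f}")
```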


Fig. 13.2 Effect of stimulation without and with patents on novelty

Fig. 13.3 Effect of various kinds of stimulation on novelty



Fig. 13.4 Effect of stimulation without and with patents on quality

Fig. 13.5 Effect of various kinds of stimulation on quality

Effects of Analogical Distance on Novelty and Quality of Concepts

For the analyses in this section, only the concepts generated with patent stimulation, and the associated patents, are considered.


Figure 13.6 shows the number of patents used as stimuli to generate concepts of the various grades of novelty and quality. More patents are used to stimulate concepts of medium novelty and medium quality than concepts in the other categories. The distribution across the 121 technology classes of the patents used to stimulate the 138 concepts is shown in Figs. 13.7 and 13.8; the 121 classes are split across two figures for clarity and are arranged in ascending order of analogical distance to the problem domains (A63 and B62). These figures reveal that more stimulus patents come from technology classes located closer to the problem domains than from those farther away.

Figure 13.9 shows how the knowledge proximity to A63 and B62 of the technology classes of the patents used varies with the novelty of the concepts, for each grade of quality. For low- and high-quality concepts, concepts of increasing novelty are generated with stimulation from technology classes of decreasing proximity. For medium-quality concepts, the proximity of the technology classes to the problem domains first increases and then decreases. For low-quality concepts, the differences in proximity are significant (one-way ANOVA: F = 5.662, p = 0.002, df = 3). Overall, patents from domains that are farther rather than nearer in analogical distance help stimulate concepts of higher novelty.

Figure 13.10 shows how the knowledge proximity to A63 and B62 of the technology classes of the patents used varies with the quality of the concepts, for each grade of novelty. For medium- and high-novelty concepts, concepts of increasing quality are generated using patents from technology classes of increasing proximity. For no-novelty concepts, the proximity first decreases and then increases as the quality of the concepts increases.

Fig. 13.6 Distribution of patents for stimulating concepts of various scales of novelty and quality


Fig. 13.7 Distribution of classes of technology of patents used for stimulation—1/2

Fig. 13.8 Distribution of classes of technology of patents used for stimulation—2/2

Fig. 13.9 Variation in average proximity (similarity) with novelty of concepts for various grades of quality


Fig. 13.10 Variation in average proximity (similarity) with quality of concepts for various grades of novelty

In low-novelty concepts, by contrast, the proximity first increases and then decreases as the quality increases. For no-novelty concepts, the differences in proximity are significant (one-way ANOVA: F = 6.086, p = 0.003, df = 2). In summary, patents from domains that are nearer rather than farther in analogical distance help stimulate concepts of higher quality.

Discussion

This research investigates two questions (see Section "Research Gaps and Questions"), and the major observations are as follows:
(a) more concepts are developed with patent stimulation than without it,
(b) among the various kinds of stimulation, most concepts are generated when stimulation combines patents and other resources,
(c) the novelty of concepts developed with patents, other resources, or their combination is higher than the novelty of concepts developed without any stimulation,
(d) the quality of concepts generated with patent stimulation is higher than the quality of concepts generated without patents,
(e) the quality of concepts developed with patents, other resources, or their combination is higher than the quality of concepts developed without any stimulation,
(f) patents located nearer rather than farther in analogical distance help stimulate concepts of higher quality, and
(g) patents located farther rather than nearer in analogical distance help stimulate concepts of higher novelty.

These findings indicate the efficacy of using patents to stimulate the creative generation of concepts. An interesting observation is the positive effect of stimulation with both patents and other resources on the concepts' novelty, quantity, and quality, in comparison with stimulation using either patents or other resources alone. Information in other resources supplements the information in patents; this probably allows better assimilation of the abstruse content of patents, leading to better stimulation. Further, among the concepts generated with patents or with patents and other resources, it is found that (a) concept quality increases as analogical distance decreases and (b) concept novelty increases as analogical distance increases. Design creativity is a combination of both novelty and quality (Sarkar & Chakrabarti, 2011). Therefore, to generate creative concepts with patents as stimuli, designers should choose a combination of patents located both close to and far from the problem domain.

This research is distinct from existing research in the following ways. Several researchers (Chan et al., 2011; Fu et al., 2014; Murphy et al., 2014, etc.) studied the consequences of using patents to stimulate ideation and reported some benefits. However, those studies did not undertake any comparative investigation of stimulation (a) with and without patents or (b) across various kinds of stimulation; the design of the present study permits these comparisons and reports several benefits from them. Further, existing studies use only small samples of patents for stimulation, which may not be representative of the diversity of the technology space and its potential for aiding creativity, whereas the experimental setup in this study uses a larger sample of patents and is closer to the real-world scenario in which designers have access to multiple sources of stimulation. In addition, earlier studies (except Song et al., 2017) used controlled experiments with a limited number of designers, whereas the setting in this research is closer to the real world: the designers were instructed to choose relevant patents for stimulation from any domain of the technology space and were allowed to access other resources. The time allotted to generate concepts is another differentiating parameter: the earlier studies involved controlled conditions, which allowed easier observation and tracking of findings, whereas the setup here was semi-controlled and carried out over a longer time. Finally, existing research uses small samples of stimuli to investigate the effects of analogical distance on conceptualization performance, whereas in this study 121 patents from the technology classes encompassing the technology space are provided in two distinct sets, and the subjects are also given the freedom to search for and identify patents on their own.


Conclusions

The efficacy of using patents as stimuli in the creative generation of concepts was studied by investigating (a) the effects of patent stimulation on the generated concepts' novelty, quantity, and quality and (b) the effects of analogical distance on the generated concepts' novelty and quality. Both investigations revealed positive findings for the use of patents, with or without other resources (such as Wikipedia, YouTube, etc.), as stimulation for generating concepts. The findings have implications for ideation methods and tools that use patents as stimuli to improve the quantity, novelty, and quality of concepts.

References

Alstott, J., Triulzi, G., Yan, B., & Luo, J. (2016). Mapping technology space by normalizing patent networks. Scientometrics, 1–37. https://doi.org/10.1007/s11192-016-2107-y
Altshuller, G. S. (1999). The innovation algorithm: TRIZ, systematic innovation and technical creativity. Technical Innovation Center, Inc.
Chan, J., Dow, S. P., & Schunn, C. D. (2015). Do the best design ideas (really) come from conceptually distant sources of inspiration? Design Studies, 36, 31–58. https://doi.org/10.1016/j.destud.2014.08.001
Chan, J., Fu, K., Schunn, C., Cagan, J., Wood, K., & Kotovsky, K. (2011). On the benefits and pitfalls of analogies for innovative design: Ideation performance based on analogical distance, commonness, and modality of examples. Journal of Mechanical Design, 133(8), 81004.
Fantoni, G., Apreda, R., Dell'Orletta, F., & Monge, M. (2013). Automatic extraction of function-behaviour-state information from patents. Advanced Engineering Informatics, 27(3), 317–334.
French, M. (1988). Conceptual design for engineers (3rd ed.). Springer.
Fu, K., Cagan, J., Kotovsky, K., & Wood, K. (2013a). Discovering structure in design databases through functional and surface based mapping. Journal of Mechanical Design, 135(3), 31006.
Fu, K., Chan, J., Cagan, J., Kotovsky, K., Schunn, C., & Wood, K. (2013b). The meaning of 'near' and 'far': The impact of structuring design databases and the effect of distance of analogy on design output. Journal of Mechanical Design, 135(2), 21007.
Fu, K., Murphy, J., Yang, M., Otto, K., Jensen, D., & Wood, K. (2014). Design-by-analogy: Experimental evaluation of a functional analogy search methodology for concept generation improvement. Research in Engineering Design, 26(1), 77–95.
Gentner, D., & Markman, A. B. (1997). Structure mapping in analogy and similarity. American Psychologist, 52(1), 45–56.
Hall, B. H., Jaffe, A., & Trajtenberg, M. (2005). Market value and patent citations. The RAND Journal of Economics, 36(1), 16–38. http://www.jstor.org/stable/1593752
Jaccard, P. (1901). Distribution de la flore alpine dans le bassin des Dranses et dans quelques régions voisines. Bulletin de la Société Vaudoise des Sciences Naturelles, 37, 241–272.
Jaffe, A. B. (1986). Technological opportunity and spillovers of R&D: Evidence from firms' patents, profits, and market value. The American Economic Review, 76(5), 984–1001.
Koh, E. C. Y. (2013). Engineering design and intellectual property: Where do they meet? Research in Engineering Design, 24(4), 325–329. https://doi.org/10.1007/s00163-013-0153-5
Koh, E. C. Y. (2020a). Read the full patent or just the claims? Mitigating design fixation and design distraction when reviewing patent documents. Design Studies, 68, 34–57. https://doi.org/10.1016/j.destud.2020.02.001
Koh, E. C. Y. (2020b). Read the full patent or just the claims? Mitigating design fixation and design distraction when reviewing patent documents. Design Studies, 68, 34–57. https://doi.org/10.1016/j.destud.2020.02.001
Koh, E. C. Y., & De Lessio, M. P. (2018). Fixation and distraction in creative design: The repercussions of reviewing patent documents to avoid infringement. Research in Engineering Design, 29(3), 351–366. https://doi.org/10.1007/s00163-018-0290-y
Luo, J., Sarica, S., & Wood, K. L. (2021). Guiding data-driven design ideation by knowledge distance. Knowledge-Based Systems, 218, 106873. https://doi.org/10.1016/j.knosys.2021.106873
McAdams, D. A., & Wood, K. L. (2002). A quantitative similarity metric for design-by-analogy. Journal of Mechanical Design, 124(2), 173–182. https://doi.org/10.1115/1.1475317
Murphy, J., Fu, K., Otto, K., Yang, M., Jensen, D., & Wood, K. (2014). Function based design-by-analogy: A functional vector approach to analogical search. Journal of Mechanical Design, 136(10), 1–16.
Oman, S. K., Tumer, I. Y., Wood, K., & Seepersad, C. (2013). A comparison of creativity and innovation metrics and sample validation through in-class design projects. Research in Engineering Design, 24(1), 65–92. https://doi.org/10.1007/s00163-012-0138-9
Pahl, G., Beitz, W., Feldhusen, J., & Grote, K. H. (2007). Engineering design: A systematic approach. Springer.
Sarica, S., Song, B., Luo, J., & Wood, K. L. (2021). Idea generation with Technology Semantic Network. AIEDAM, 1–19. https://doi.org/10.1017/S0890060421000020
Sarkar, P., & Chakrabarti, A. (2011). Assessing design creativity. Design Studies, 32(4), 348–383. https://doi.org/10.1016/j.destud.2011.01.002
Shah, J., Vargas-Hernandez, N., & Smith, S. (2003). Metrics for measuring ideation effectiveness. Design Studies, 24(2), 111–134. https://doi.org/10.1016/S0142-694X(02)00034-0
Song, B., Srinivasan, V., & Luo, J. (2017). Patent stimuli search and its influence on ideation outcomes. Design Science, 3, 1–25. https://doi.org/10.1017/dsj.2017.27
Srinivasan, V., & Chakrabarti, A. (2011). An empirical evaluation of a framework for design for variety and novelty. In S. J. Culley, B. J. Hicks, T. C. McAloone, T. J. Howard, & P. J. Clarkson (Eds.), Proceedings of the 18th International Conference on Engineering Design (ICED11)—Impacting Society Through Engineering Design, Copenhagen (pp. 334–343).
Srinivasan, V., & Chakrabarti, A. (2010). Investigating novelty-outcome relationships in engineering design. AI EDAM, 24(2), 161–178. https://doi.org/10.1017/S089006041000003X
Srinivasan, V., Chakrabarti, A., & Lindemann, U. (2015). An empirical understanding of use of internal analogies in conceptual design. AI EDAM, 29(2), 147–160. https://doi.org/10.1017/S0890060415000037
Teece, D. J., Rumelt, R., Dosi, G., & Winter, S. (1994). Understanding corporate coherence: Theory and evidence. Journal of Economic Behavior and Organization, 23(1), 1–30.
Trajtenberg, M. (1990). A penny for your quotes: Patent citations and the value of innovations. The Rand Journal of Economics, 21(1), 172–187. https://doi.org/10.2307/2555502
Ward, T. B. (1998). Analogical distance and purpose in creative thought: Mental leaps versus mental hops. In K. J. Holyoak, D. Gentner, & B. N. Kokinov (Eds.), Advances in analogy research: Integration of theory and data from the cognitive, computational, and neural sciences (pp. 221–230). New Bulgarian University.
Wilson, J. O., Rosen, D., Nelson, B. A., & Yen, J. (2010). The effects of biological examples in idea generation. Design Studies, 31(2), 169–186. https://doi.org/10.1016/j.destud.2009.10.003
Yan, B., & Luo, J. (2016). Measuring technological distance for patent mapping. Journal of the Association for Information Science and Technology, 68(2). https://doi.org/10.1002/asi.23664

V. Srinivasan is an Assistant Professor in the Department of Design at the Indian Institute of Technology Delhi (IIT Delhi). His academic interests are in the areas of Design Creativity and Innovation, Design Theory and Methodology, AI in Design, Virtual Reality, and New Product Development. He has a Ph.D. from the Indian Institute of Science in Bangalore and a Bachelor's in Mechanical Engineering from the University of Madras. He is one of the Associate Editors of the International Journal of Design Creativity and Innovation and a Co-Chair of the Special Interest Group on Design Creativity of the Design Society, based in the UK.

Part VII

Critical Considerations

Chapter 14

Humiliation and Technology: Dilemmas and Challenges for State, Civil Society, and Industry

Yashpal Jogdand

Abstract Despite its relevance to understanding toxic human behavior, the phenomenon of humiliation remains poorly understood. Humiliation constitutes an attack on human dignity. It is an important psychological construct rooted in complex power relations involving individuals and social groups. Attending to interactions between humiliation and technology should allow us to develop policies that protect human dignity in today's technology-mediated world. This chapter reviews scientific research on humiliation and proposes a distinctive victim-centered, agentic, multi-level conceptualization. This conceptualization is applied to examine three issues: fraping, caste atrocities, and militant Islamic terrorism. The analysis highlights the complex role of technology in creating and expanding humiliation. Technology empowers perpetrators by enlarging the scope and impact of humiliation. Paradoxically, technology also empowers victims and bystanders by creating awareness and facilitating government and civic intervention. Finally, the chapter highlights the crucial role of the state, civil society, and industry. Recommendations are made for eliminating technology-facilitated humiliation, including acknowledgment of victimhood, appropriate control and deletion of digital records, and platform governance informed by the dynamics of humiliation.

Keywords Humiliation · Emotion · Social identity · Social media · Human–technology interaction · Toxicity

Author Note The author gratefully acknowledges the support of the Indian Council of Social Science Research (ICSSR Grant No. 02/94/SC/2019-2020/RP/Major). Y. Jogdand (B) Department of Humanities and Social Sciences, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Mukherjee et al. (eds.), Applied Cognitive Science and Technology, https://doi.org/10.1007/978-981-99-3966-4_14


Humiliation in Human–Technology Relations

The Netflix series Black Mirror (2011–present) is a powerful attempt to capture the impact of digital media technologies on the human condition. The series is disturbing and revealing because it emphasizes humiliation as a central dynamic in human–technology relations. In the episode "The National Anthem," the British Prime Minister is forced to have sexual intercourse with a pig in front of the whole population in a live broadcast to save the kidnapped Princess Susannah. While the prime minister is undergoing his humiliation, it is revealed that the princess had actually been released before the broadcast; still, nobody noticed, as people were indoors watching the live broadcast. The episodes "Nosedive," "Shut Up and Dance," and "Hated in the Nation" further showcase a consuming interest in online humiliation and how the Internet has enabled and accelerated public judgment and derision. Black Mirror elegantly crafts a society where humiliation is seen less as a disgusting spectacle and more as a tool of social control and entertainment.

Do we live in a society very different from the one depicted in Black Mirror? One might argue that many differences exist between fiction and reality and between Black Mirror and contemporary human society. However, if Black Mirror is indeed crazy fiction, why are our timelines filled with roast jokes, memes, and videos depicting and/or dissecting someone's devaluation and embarrassment? While we are aware of intentional abuse and victimization on the Internet, our curiosity about others' social faux pas is routinely seen as an acceptable form of online engagement. As a result, anyone can become an object of joke or derision on the Internet, and the Internet and reality TV can make an event of public humiliation go viral in a moment. Recall "The Fappening," also known as the Celebgate of 2014, a systematic attack centered on illegally acquired nude images of celebrities (most prominently Jennifer Lawrence) distributed and discussed via the anonymous image board 4chan and Reddit.com. Such events are not outliers. The humiliation of women on platforms such as Reddit.com and 4chan is possible because their "assemblage of design, policies, and norms" encourages "toxic technocultures" and rewards anti-social behavior with more visibility (Massanari, 2015, p. 336).

Digital media technologies and platforms now make it easy to bully, harass, or humiliate others. Twitter, for instance, provides a high level of anonymity: a user can create multiple accounts and abuse the platform's targeted advertising feature to harass and humiliate vulnerable people (Mortensen, 2016; Quodling, 2016). Dragiewicz et al. (2018) point out that platforms such as Facebook and Twitter are deeply rooted in American cultural ideals of freedom of expression, openness, and free-market capitalism, and these libertarian ideals often clash with the challenges of including and protecting vulnerable people. Notwithstanding its toxic implications for individuals and societies, there has been an increasing demand for humiliating content. The contemporary media industry has started producing what is called "humiliatainment," where others' humiliation is our entertainment, our "guilty pleasure" (Kohm, 2009; Mendible, 2004). In addition, the coronavirus pandemic has led to a significant increase in online abuse and harassment (Bags, 2021).


Ignoring these issues could normalize patterns of toxic behavior in today's technologically mediated society, thereby fostering apathy toward victimized individuals and groups. Such apathy and victimization could have disastrous consequences for a community and a nation. Contemporary "toxic technocultures" have created many dilemmas and challenges for the state, civil society, and industry, which bear the responsibility of protecting people's dignity and well-being.

There has been increasing attention toward technology-facilitated toxic behavior such as online harassment, cyberbullying, and various other forms of victimization (Davidson et al., 2011). Despite its relevance to understanding technology-facilitated toxicity, the phenomenon of humiliation remains poorly understood. Respect for fundamental human dignity has been a cornerstone of modern institutions, practices, and policies, and humiliation constitutes an attack on human dignity (Guru, 2009). It is an important psychological construct rooted in complex forms of power relations involving both individuals and social groups (Klein, 1991; McCauley, 2017). Attending to interactions between humiliation and technology allows us to develop mechanisms to protect human dignity in today's technology-mediated world.

The aim of this chapter, therefore, is to introduce the phenomenon of humiliation and shed light on the various dilemmas and challenges created by interactions between humiliation and technology. We will first review the scientific research on humiliation and outline a distinctive conceptualization that can account for both individual and group-based humiliation in different cultural contexts. We will then use this conceptualization to understand the complex role of technology in humiliating events from different cultural contexts. In the final section, we will consider the principles and policies that the state, civil society, and industry could adopt to address various technology-facilitated forms of humiliation.

What is Humiliation?

Self, Social Interactions, and Culture

The self, or identity, is a cognitive representation of how one views oneself. However, it is not an isolated phenomenon; rather, it depends on how others relate to us and treat us: "There is no sense of 'I'… without its correlative sense of you, or he, or they" (Cooley, 1922, p. 150). People negotiate and validate self-conceptions in face-to-face interactions and establish "frames" within which to evaluate the meaning of an interaction (Goffman, 1959). Social interactions thus make it possible for individuals to become conscious of who they are and to develop their private (self) and public (social) images. Since we depend on others for validation and recognition, discrepancies between how we think of ourselves (self-image) and how others think of us (social image) lead to various cognitive, affective, and behavioral outcomes (Barreto & Ellemers, 2003). Humiliation is an experience arising out of such discrepancies between self-image and social image.


Historically, humiliation (or at least the threat of it) has been an inescapable part of social interactions in human societies (Frevert, 2020). Discrepancies between self-image and social image are often built into the social systems and institutions of various societies and reflect transitions and conflicts within them. In the Indian caste system, untouchability is perceived as humiliating because it is discordant with the self-image one would expect simply by virtue of being human. On an interpersonal level, such discrepancies explain discord and conflict between spouses, romantic partners, and friends. Gabriel Garcia Marquez's Love in the Time of Cholera, a beautiful tale of unrequited love, is also a tale of the humiliation of a lover whose love is not reciprocated by a potential romantic partner.

Most of our social interactions are culture-bound. Culture denotes a societal-level pattern of shared values and practices that shapes our beliefs, attitudes, and behavior in a given context. To use Goffman's term, culture provides a "frame" that determines the meaning people derive from social interactions. As Berry (2013) argued, basic psychological processes such as attention, perception, learning, and categorization may be common to our species, but culture shapes their development and expression. In the same vein, humiliation may be found in all human interactions, but culture shapes what exactly people see as humiliating and how humiliation is expressed. For instance, Miller (1995) shows how humiliation was an artifact of the honor-based cultures in the Icelandic sagas of the medieval period, where the preservation of one's reputation and social image was more valuable than life. In one of the sagas, Eigil, a Viking warrior, seeks to kill a man for offering him an excessively valuable gift! While this may sound absurd, Miller goes on to argue that we do not live in a very different world. Miller is not arguing that we are as violent in our ways as the Vikings were; rather, his main attempt is to capture how concern for social image, honor, and dignity figures in the social discomforts of everyday life. These social discomforts are not universal but culture-bound. The nature and texture of social interactions might change with time, but sensitivity to one's self, social image, honor, and dignity remains an important feature of human societies. If Miller's assertion also holds true in the contemporary technology-driven society that we inhabit, then humiliation is more relevant now than ever.

Definition and Consequences

The word humiliation comes from the Latin "humus," meaning earth, and evokes being put down with one's face in the dust. In Hindi, the words "beizzat" and "avmanana" are used to refer to humiliation; in Marathi, "maan khanadana" and "maan hani" are used. If we deconstruct these words, we find "maan" or "izzat," i.e., dignity, honor, and self-respect, as the common denominator, together with a certain damage or threat to it. Etymological deconstruction thus tells us that, at the most generic level, humiliation involves damage, denial, or threat to dignity, honor, and self-respect.


Philosophers examine humiliation through the normative concepts of dignity, honor, and self-respect (Guru, 2009; Statman, 2000). Psychologists, on the other hand, primarily conceptualize humiliation as a self-conscious emotion that is distinct from, and particularly intense compared with, shame and anger (Elshout et al., 2017). Humiliation is defined as "a deep dysphoric feeling associated with being or perceiving oneself as being unjustly degraded, ridiculed, or put down—in particular, one's identity has been demeaned or devalued" (Hartling & Luchetta, 1999, p. 264). Fernandez et al. (2015) refined the conceptualization of humiliation using a cognitive appraisal approach (Moors et al., 2013). They show that humiliation arises from two core cognitive appraisals: (1) seeing oneself as the target of unjust treatment inflicted by others and (2) simultaneously internalizing a devaluation of the self.

Humiliation has been associated with various deleterious consequences for individuals and groups. At the individual level, humiliation is known to contribute to psychological disorders such as anxiety, social phobias (Beck et al., 1985), and depression (Brown et al., 1995). At the interpersonal level, humiliation has been identified as an important factor in marital discord and domestic violence (Vogel & Lazare, 1990), violent responses to school bullying (Elison & Harter, 2007), and interpersonal rejection (Leary et al., 2006). At the group level, humiliation has been linked to genocide and mass killings in various parts of the world: critical events such as the Rwandan genocide (Lindner, 2001), the 9/11 terrorist attacks (Saurette, 2005), and the war in Gaza (Fattah & Fierke, 2009) have humiliation as a common element that fuelled conflict and violence. Paradoxically, humiliation has also been linked to the loss of power, resulting in what Ginges and Atran (2008) call the inertia effect, a tendency toward inaction that suppresses violent action. In Nazi concentration camps and the Indian caste system, humiliation was used as a tool of social control and subjugation (Geetha, 2009; Silver et al., 1986).

Neural Basis

Psychological research has consistently identified a fundamental human need to belong and has highlighted the negative implications of social rejection and other forms of victimization (Baumeister & Leary, 1995; James, 1890). A growing interest in neural systems and the neurobiology of social behavior has paved the way for a better understanding of the link between neurobiology and social psychology, and researchers have started investigating the neural basis of phenomena such as social exclusion, rejection, and humiliation. Electrophysiological evidence indicates that humiliation is an intrinsically intense experience that mobilizes far more attention and cognitive resources than emotions such as shame, anger, and happiness (Otten & Jonas, 2014). In two separate studies, Otten and Jonas asked participants to read scenarios involving different emotions and to imagine how they would feel in the described situations while their brain activity was recorded with EEG. Brain activity was indexed by two measures: a larger positive spike, known as the "late positive potential" (LPP), and "event-related desynchronization," a marker of reduced activity in the alpha range.


Both measures are signs of greater cognitive processing and cortical activation. Imagining being humiliated resulted in higher LPPs and more event-related desynchronization than any other emotion. Based on these results, Otten and Jonas concluded that "humiliation is a particularly intense and cognitively demanding negative emotional experience that has far-reaching consequences for individuals and groups alike" (p. 11).

Eisenberger's (2012) research on the neural bases of social pain can help us understand the dysphoric aspect of humiliation. Social pain is an emotional reaction to a humiliating situation and has the same neurobiological basis as physical pain. Physical pain has two components: sensory and affective. The former, comprising intensity, duration, and localization, is processed by the primary somatosensory cortex (S1), the secondary somatosensory cortex (S2), and the posterior insula. The affective component, the suffering and anxiety provoked by pain, is processed by the dorsal part of the anterior cingulate cortex (dACC) and the anterior insula. An initial study on social pain examined participants' reactions to an experimental task involving social exclusion. Participants were scanned with functional magnetic resonance imaging while playing a virtual ball-tossing game (cyberball; Williams & Jarvis, 2006). They were led to believe that they were playing with other people via the Internet but were, in reality, playing with the computer. Participants first played a round in which the other "virtual participants" included them in the game and subsequently a round in which the "virtual participants" excluded them. Compared with the inclusion round, the exclusion round was characterized by increased activity in the dACC and anterior insula, results similar to those observed in studies of physical pain. Based on this research, Eisenberger (2012, p. 133) has argued for paying greater attention to social pain, such as humiliation, because it can be as damaging as physical pain:

Although physical pain is typically regarded as more serious or objectively distressing because it has a clear biologic basis, … social pain could be argued to be just as distressing because it activates the same underlying neural machinery. These findings encourage us to think more carefully about the consequences of social rejection.

The research on humiliation is still in its infancy and focuses mainly on individual and interpersonal relations. However, humiliation involving social groups and nations is a salient concern around the world. The research on social cognition and group processes provides a useful lens to understand humiliation in intergroup relations.

Social Identity and Group-Based Humiliation

The social identity approach (SIA) in social psychology (Reicher et al., 2010) has come to dominate the study of social cognition and group processes in recent years and has provided insights into many cognitive and affective phenomena.


The key argument of the SIA is that humans can define themselves at different levels of inclusiveness. One can define oneself in terms of a personal identity, "I" versus "you," but it is also possible to define oneself as a member of a relevant social group, "us" versus "them," in terms of a social identity. Both personal and social identifications are valid and meaningful ways of defining oneself and relating to others. Individuals often have access to a plethora of social identities, and different social identities become salient in different situations depending on the interaction between the individual and the social context. Social identities work as a conduit through which individuals make sense of their environment.

Applying the SIA to humiliation, there is a need to distinguish between personal and group-based humiliation. People humiliate, and are humiliated, both as individuals and as group members. It is also important to note that humiliation studied at the individual level may not explain humiliation at the group level. Adherence to individual and intra-individual processes in understanding group-level phenomena has been widely criticized for inherent psychological reductionism (Fanon, 1967; Israel & Tajfel, 1972). In addition, decades of research on social identity processes have confirmed that people act not only as individuals but also as group members with common perceptions, identity, and goals (Brown, 2019; Hornsey, 2008). There is a distinctive social level of psychological processes that cannot be reduced to the individual (Oakes et al., 1994).

These theoretical insights have led to a distinct conceptualization of emotions. The assumption that emotions are internal and personal reactions has been challenged, and the social and cultural basis of emotions is now widely recognized (Parkinson, 1996; Barrett, 2006). However, we need to turn toward intergroup emotion theory (IET; Mackie et al., 2009) to understand how people experience emotions in social groups. Based on a combination of self-categorization theory (Turner et al., 1987) and cognitive appraisal theories of emotion (Moors et al., 2013), IET proposes that self-categorization, i.e., shifting from seeing oneself as a unique individual to seeing oneself in terms of a salient group membership, has implications for emotional experience. Once people shift to a group-level self-category, they interpret and evaluate events and outcomes through the lens of that self-category. Sometimes emotions can also be experienced as a direct consequence of identifying with one's group. Empirical research deriving predictions from IET in the context of intergroup relations has shown that when group membership is salient, people can experience emotions on behalf of their group's position or treatment, even if they have little or no direct experience of the intergroup situation themselves.

Various experimental studies have shown that humiliation can be experienced vicariously by witnessing an ingroup member being humiliated by an outgroup member. In two vignette-based experiments with university students in the U.K. and Dalits in India, the present author found that participants report humiliation when their social identity is devalued, even when they are witnesses and personally unaffected in the situation (Jogdand, 2015; Jogdand & Reicher, 2013). Similar results have been found in studies that manipulated social rejection using the cyberball paradigm (Williams & Jarvis, 2006) to induce feelings of humiliation (Veldhuis et al., 2014).


In a more recent confirmation of the IET hypothesis of humiliation, researchers adopted a different mode of group devaluation, focusing on an ingroup member being negatively stereotyped by an outgroup member, and replicated the effect in three samples with different national and gender-based social identities (Vorster, Dumont, & Waldzus, 2021). As we will see, this has implications for technology-facilitated group-based humiliation in various contexts.

A Victim-Centered, Agentic, and Multi-level Approach

Drawing on a combination of the social identity approach (Reicher et al., 2010; Tajfel & Turner, 1979; Turner et al., 1987) and emerging scholarship on caste and humiliation in South Asia that foregrounds the Dalit (ex-untouchable) experience in Hindu society (Guru, 2009; Guru & Sarukkai, 2019), Jogdand et al. (2020) developed a distinct conceptualization of humiliation that is victim-centered, dynamic, and focused on agency and power relations. In this conceptualization, humiliation takes multiple forms, ranging from the individual and interpersonal to the group-based and institutional levels. Humiliation needs to be understood from the victim's perspective, as it is the victim who holds the power to define humiliation, and attempts at humiliation are distinguished from complete humiliation. Humiliation is conceptualized as a claim, which involves the cognitive appraisal of certain acts of victimization as humiliating and the political act of communicating resentment to the perpetrator. The cognitive appraisal of humiliation cannot be taken for granted, as not all victims possess a sense of value and rights or appraise their loss as humiliating. This is especially relevant to non-Western contexts where hierarchy and dominance are valued and people's relational universe is shaped by the social structure (Liu, 2015). There may be little awareness of value and rights in these cultural contexts, and degradation and exclusion may become normalized and institutionalized in society. The same applies to the political use of humiliation: people will express humiliation in contexts where equal rights and dignity are accorded to all and where such expressions can be used to achieve goals within certain power relations. Interestingly, claims of humiliation can also be made by high-power groups who perceive a threat to their group's status. Overall, it is emphasized that humiliation is a relational, dynamic, culture-bound, context-sensitive, and multi-faceted construct that requires multiple levels of analysis.

We now apply these insights to understand three cases of technology-facilitated humiliation: fraping, caste atrocities, and militant Islamic terrorism.


Fraping and Online Identity

In the contemporary era, the boundaries between online and offline contexts are fading. Digital media technologies are rapidly being integrated into everyday life and are drastically shaping our social interactions and relationships (Baym, 2015). We care about our online social interactions as much as we care about face-to-face interactions, and we value our online social image because it can build our reputation and consolidate our social status in the long run. Entities like social image and reputation are inherently fragile.

Consider the phenomenon of "fraping," which involves "the unauthorized alteration of information on an individual's (the victim's) online social network site (SNS) profile by a third party (the 'frapist')." Moncur et al.'s (2016) qualitative exploration of fraping among young adults in the U.K. revealed that fraping disrupts victims' online identity and normal posting behavior. They note that fraping among this age group in the U.K. context is often done for amusement and is regarded as benign rather than malign. However, if we take a victim's perspective, fraping, or any other form of interaction that disrupts one's online identity, could be potentially humiliating. There are, of course, significant cultural differences that might shape the cognitive appraisals underpinning humiliation. Fraping may be seen as mild teasing in individualistic cultural contexts, but it could be extremely humiliating and lead to deleterious consequences in collectivistic cultural contexts. This is because people in collectivistic cultural contexts value interdependence over independence (Markus & Kitayama, 1991) and are deeply affected by concerns about social image (Rodriguez Mosquera, 2018).

There are more insidious and far-reaching implications of disrupting someone's identity. Identities, as we discussed, can be personal or social. When social identities that signify one's membership in a particular race, gender, caste, religion, or nation are targeted, the concern becomes more serious: the humiliation of a single group member could lead to the humiliation of an entire group. Technology could play a critical role in widening the scope and impact of humiliation. Let us consider the contexts of the caste system and international terrorism to understand this more clearly.

Technology and Persistence of Caste in India

The caste system in India is one of the most oppressive social orders existing in the world today. It is a system of graded inequality that divides people into pure and impure endogamous social groups called castes (Ambedkar, 1987). People born into so-called impure castes are regarded as Untouchables and relegated to the lowest position in a complex hierarchy, with several arbitrary restrictions on their movement, choices, and opportunities. Despite several changes due to democratization and progressive social movements in India, the caste system still persists and dictates many cognitive, affective, and behavioral outcomes (Jogdand et al., 2016; Mahalingam, 2003; Natrajan, 2012). Caste atrocities against erstwhile untouchables, who now call themselves Dalits, have increased at an alarming rate in post-independence India, with perpetrators adapting new modalities of enforcing the caste order (Teltumbde, 2011). Although there are crucial differences, caste atrocities bear a resemblance to the spectacles of public lynching of African Americans in the Jim Crow South (Wood, 2011).

Recent atrocities against Dalits highlight an increasing trend toward videographing humiliating episodes and disseminating them through social media and platforms like WhatsApp. Consider the recent Una and Nagaur atrocities. In the Una incident of 2016, four half-naked men from a Dalit leatherworking family were tied to a car, abused, and beaten with sticks and iron pipes while being paraded through a town. These men had been skinning a dead cow when a cow protection vigilante group apprehended them for alleged cow slaughter. The vigilante group then stripped and brutally flogged the leather workers while parading them through the town of Una in Gujarat, India. The whole episode was recorded on mobile cameras and went viral on social media (ANI, 2016). In Rajasthan's Nagaur District, a group of Rajput men allegedly stripped a Dalit youth after accusing him of stealing money, violated him with a screwdriver, and recorded the act on mobile phones. The video was then shared and went viral (Shekhawat, 2021).

The role of digital technology and social media is critical in both the Una and Nagaur atrocities. Speaking to a journalist, a person accused in such a case said, "We had to make a video because these dedh chamars should always remember what it means to cross a line" (Shekhawat, 2021). It is important to note the social categorization underpinning this statement. The pronoun "we" used by the accused perpetrator is not an arbitrary choice but reflects social identities (we versus "dedh chamars"), indicating the boundaries between groups and the meaning of group belonging (Reicher & Hopkins, 2001). The use of the term "dedh chamar" for the otherization of Dalits performs a particular function: it is a legally prohibited caste slur that can remind Dalits of their historical humiliation and traditionally low status in the social order. Digital technology and social media, in this sense, empower upper-caste perpetrators and help create a public spectacle of Dalit humiliation. As Wood (2011) has shown through a historical analysis of Jim Crow lynching in the United States, such spectacles work as a powerful means of establishing and affirming the social order for perpetrators, victims, and bystanders alike.

From the perspective of victims, caste atrocities create social terror. The digital records of these events make it difficult to escape their impact, and they become part of the collective consciousness of Dalits. Even when a Dalit individual is not personally affected by an atrocity, its spectacle reinforces their caste identity and leads to an inescapable, vicarious feeling of humiliation. However, the positive role of technology should also be noted. Despite stringent laws like the SC/ST (Prevention of Atrocities) Act, there has been little success in prosecuting upper-caste perpetrators in cases of atrocities (Chakraborty et al., 2006). The perpetrators seem to act without fear of being brought to book. Digital technology and social media help create a record of these events and inform authorities (government, police), activists, and civil organizations.
In the case of both Una and Nagaur, several authorities, activists, and civil organizations became aware of the atrocities through social media and made timely interventions, including lodging FIRs (First Information Reports) against the perpetrators.

Abu Ghraib and Militant Islamic Terrorism: The Role of Technology

The American military scandal of Abu Ghraib during the Iraq war is an important case of the interaction between humiliation and technology. In Abu Ghraib prison, U.S. soldiers humiliated Iraqi detainees to "soften them up" before interrogation. A series of shocking photographs depicted American soldiers abusing, deliberately humiliating, and torturing scores of Iraqi men held in detention and under interrogation. In these images, the American soldiers appear to be smiling broadly and taking enormous pleasure in various acts of humiliation. The photo of a prisoner, Abdou Hussain Saad Faleh, hooded and made to stand on a box while electric wires were attached to his fingers, toes, and genitals, was one of the most memorable images to emerge from the Abu Ghraib scandal, eventually making it to the cover of The Economist. In another image, an American woman soldier smiles at the camera as she leans on top of a pile of naked Iraqi men forced to form themselves into a human pyramid.

Without digital technology and networked media, the world would never have known of the gross human rights violations at Abu Ghraib. These digital records of the humiliation of Iraqi (and Muslim) men had serious implications for international relations and global peace. The U.S. global war on terror lost the goodwill and sympathy that had flowed after the 9/11 attacks by Al-Qaeda. This led to an increase in global resistance to U.S. interests and policies and energized Islamic radicalism in different parts of the world (Saurette, 2006).

What is important to note is the vicious cycle of humiliation and counter-humiliation between Islamic radicals and the United States. The main motive for the 9/11 attacks by Al-Qaeda was not to create fear and terror but to humiliate the United States. Bin Laden emphasized that the attackers "rubbed America's nose in the dirt and dragged its pride through the mud….People discovered that it was possible to strike at America, that oppressive power, and that it was possible to humiliate it, to bring it to contempt, to defeat it" (quoted in Saurette, 2005, p. 50). Humiliating narratives, such as the one created by the Abu Ghraib scandal, implicate broader categories of race, religion, and nation, and provide an important mobilization tool for non-state actors. Hafez's (2007) study of Al-Qaeda's mobilization of suicide terrorism revealed a strategic use of humiliation alongside numerous other instrumental, ideological, and religious arguments: "At the heart of the mobilizing narratives of insurgents is the theme of humiliation at the hands of callous and arrogant powers" (p. 99). We can find a similar use of humiliation in the narratives of militant Islamists in the Middle East to instigate support for restoring a transnational Muslim Ummah (nation/community) damaged in the past by Western "crusaders" (Fattah & Fierke, 2009).


The decentralized online communication system and the ease and accessibility of image-making tools and techniques empower non-state actors to use humiliation to amplify their recruitment appeals. The so-called freedom fighters affiliated with the Islamic State of Iraq and Syria (ISIS) militant group are known for their efficient use of digital images and violent viral videos streamed through social media sites such as Twitter, Facebook, and YouTube to recruit would-be members (Awan, 2017). While digital technology has the potential to highlight human rights violations, the digital records of an individual's or group's humiliation can also serve as a tool for hate mobilization, eventually leading to conflict and violence.

Across the issues of fraping, caste atrocities, and militant Islamic terrorism, we have seen that technology plays a complex role in creating and expanding humiliation. Technology empowers perpetrators by enlarging the scope and impact of humiliation. Paradoxically, technology also empowers victims and bystanders by creating awareness and enabling them to seek government and civic intervention. As bystanders to episodes of humiliation, the state, civil society, and industry bear the responsibility to protect human dignity. Their interventions are crucial for supporting victims and developing ways to eliminate humiliation from society. Let us consider some of the steps that these agencies can undertake.

Interventions by State, Civil Society, and Industry

Given its deep roots in human social interactions shaped by culture, social structure, and power relations, humiliation is unlikely to be eliminated automatically any time soon. The state, civil society, and industry must therefore improve their ability to understand and anticipate humiliation. At the very least, they should make sure that they do not blindly reproduce humiliation. Three main recommendations can be made in this regard.

Acknowledge Victimhood

Humiliation is not visible unless there are institutional indicators and criteria to identify it. Often, victims do not attempt to communicate humiliation, fearing adverse consequences. It is therefore vital for the state, civil society, and industry to give victims a voice and to create policies and mechanisms that acknowledge humiliation in both offline and online interactions. Failure to recognize humiliation among groups can have deleterious consequences. A substantial body of social psychological research suggests that when victimhood confers benefits, it can motivate a struggle for recognition among groups (De Guissmé & Licata, 2017; Honneth, 1996). The struggle for victimhood recognition (also called competitive victimhood) can induce prejudice and hostility against other individuals and groups (Shnabel et al., 2013). Acknowledging that an individual or social group was humiliated can be seen as a way of recognizing its victimhood status. Several studies across multiple contexts and samples have shown that victimhood acknowledgment leads to less desire for revenge (David & Choi, 2009), improves attitudes toward other victims (Alarcón-Henríquez et al., 2010), and fosters victims' well-being and willingness for reconciliation (Vollhardt et al., 2014).

Ensure Proper Control and Deletion of Digital Records of Humiliation

We have discussed both the positive and negative implications of digital records of humiliation in the contexts of caste atrocities and the Abu Ghraib scandal. The positive implications of digital records are limited if access to past records cannot be controlled. Given the nature of the Internet, a more significant concern is that digital records of humiliation do not get erased but become part of a collective digital memory. The Internet consists of a neuron-like interconnected network of billions of computers. Once uploaded to the Internet, images, videos, and text are copied onto servers and cached in web search-engine databases, making it possible for future users to download and repost them. Networked media thus allow multiple and perpetual points of access to humiliating digital content. Even when specific servers deny the uploading or downloading of humiliating content, there are many other ways to track the content down on other servers and databases. The control and deletion of humiliating content on the web is almost impossible without strong policies and civic intervention (Abelson et al., 2008).

Humiliated individuals and groups could thus remain susceptible to stigma, shame, and trauma indefinitely. Collective memories of humiliation can become a basis for "victim beliefs" or "victim consciousness," leading to radicalization within the ingroup, and can be used to legitimize violent action against outgroups (Vollhardt, 2012). The state, civil society, and industry together will need to ensure proper control and deletion of digital records of humiliation. Of course, any such action should be taken in consultation with the victimized parties.

Develop Humiliation-Dynamic-Informed Platform Governance

It is vital that efforts at platform governance incorporate knowledge of humiliation. Platform governance involves shaping and regulating information and social environments. Dragiewicz et al. (2018, pp. 614–615) highlight that digital platforms have the ability to self-govern their information and social settings:


The design of each platform’s front-end, user-facing features and affordances, as well as its back-end architectures and algorithms, shapes the possibilities and constraints of private and public communication on that platform, including the ability to spread and share abuse, to engage in creative counter-abuse tactics, or to report and block abusers.

Platform governance by industry and other stakeholders should include an explicit commitment to identify humiliating interactions and make timely interventions. As Dragiewicz et al. point out, such a commitment is lacking among platforms, as they are unable to anticipate and successfully resolve the value conflicts between a libertarian vision of free speech and the need for the inclusion and safety of vulnerable individuals and groups. Massanari (2017) has attributed platforms' lack of commitment to their business models, which welcome online humiliation because it can drive traffic and interaction among users. It may, however, be possible to harness A.I. (Artificial Intelligence) programs to interpret and detect humiliation. This would require accurately identifying humiliating content and the victims in humiliating interactions; Natural Language Processing (NLP) and Machine Learning (ML) techniques could support both tasks. Still, the detection of humiliation is not purely a computer science or A.I. problem. Any such effort requires incorporating multiple levels of analysis that account for contextual, cultural, and situational specificities when defining the meaning of humiliation. As Sheth et al. (2021) have recently shown, an interdisciplinary, empirical, model-based, context- and culture-sensitive approach to the detection of online toxicity is a way forward.
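To make the preceding point concrete, the following is a minimal, hypothetical sketch of how an NLP/ML pipeline might flag potentially humiliating posts for human review. It is not the authors' method or any platform's actual system; the example posts, labels, and review threshold are invented for illustration, and a real deployment would require the context- and culture-sensitive annotation argued for above.

```python
# Minimal illustrative sketch: a TF-IDF + logistic regression classifier that
# scores posts for potentially humiliating content and routes high-scoring
# items to human moderators. All data below are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical annotated examples: 1 = humiliating toward a person or group, 0 = not.
posts = [
    "Look at the video, he got what he deserved, people like him should know their place",
    "They will always be beneath us and this clip will remind them of it",
    "Congratulations on the new job, we are so proud of you",
    "Great match yesterday, what a comeback by the team",
]
labels = [1, 1, 0, 0]

# A deliberately simple baseline: word and bigram TF-IDF features feeding a
# linear classifier. Group-directed, vicarious humiliation (the chapter's
# focus) would additionally need identity cues and conversational context.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(posts, labels)

# Score incoming content; anything above a review threshold goes to a human.
incoming = ["They stripped him, filmed it, and shared the video to shame his whole community"]
scores = model.predict_proba(incoming)[:, 1]
REVIEW_THRESHOLD = 0.5  # placeholder value, chosen only for illustration
for text, score in zip(incoming, scores):
    if score >= REVIEW_THRESHOLD:
        print(f"flag for human review (score={score:.2f}): {text}")
```

Such a classifier can only triage. As the chapter stresses, deciding whether an interaction is humiliating ultimately depends on the victim's appraisal and the surrounding social context, which is why the output here is framed as a prompt for human review rather than an automated verdict.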

Conclusion

We are yet to fully comprehend the implications of human–technology interactions that disrupt people's social image, honor, and dignity. Humiliation could be a crucial conceptual lens for understanding and intervening in the toxicity that causes these disruptions in online and offline settings. However, our conceptualization of humiliation needs to move beyond its intra-psychic and interpersonal dimensions and include a focus on the interaction between social groups and culture on the one hand and the human mind on the other. As we saw in the contexts of fraping, caste atrocities, and militant Islamic terrorism, a victim-centered, agentic, and multi-level approach can be useful for understanding human–technology interactions across different societies and cultures. When considering practical interventions for eliminating humiliation, the role of the state, civil society, and industry is absolutely crucial. If these agencies remain passive, humiliation could easily take an institutionalized form and normalize patterns of toxic behavior in a community or nation. As we come to depend ever more on digital technology and social media in our lives, humiliation is more relevant than ever. Black Mirror may not be our reality yet; it is certainly our nightmare.


References

Abelson, H., Ledeen, K., & Lewis, H. (2008). Blown to bits: Your life, liberty, and happiness after the digital explosion. Addison-Wesley.
Alarcón-Henríquez, A., Licata, L., Leys, C., Van der Linden, N., Klein, O., & Mercy, A. (2010). Recognition of shared past sufferings, trust, and improving intergroup attitudes in Belgium. Revista De Psicología, 28(1), 81–110.
ANI. (2016, July 12). #WATCH Suspected cow leather smugglers thrashed by cow protection vigilantes in Somnath (Gujarat) (11.7.16) [Twitter post]. https://twitter.com/ANI/status/752773597333118976
Awan, I. (2017). Cyber-extremism: Isis and the power of social media. Society, 54(2), 138–149.
Bags, M. (2021, November 15). Online hate speech rose 20% during pandemic: ’We’ve normalised it’. BBC News. https://www.bbc.com/news/newsbeat-59292509
Barreto, M., & Ellemers, N. (2003). The effects of being categorised: The interplay between internal and external social identities. European Review of Social Psychology, 14(1), 139–170.
Baumeister, R. F., & Leary, M. R. (1995). The need to belong: Desire for interpersonal attachments as a fundamental human motivation. Psychological Bulletin, 117(3), 497.
Baym, N. K. (2015). Personal connections in the digital age (2nd ed.). Polity Press.
Beck, A., Emery, G., & Greenberg, R. (1985). Anxiety disorders and phobias: A cognitive perspective. Basic Books.
Berry, J. W. (2013). Achieving a global psychology. Canadian Psychology/Psychologie Canadienne, 54(1), 55.
Brown, G. W., Harris, T. O., & Hepworth, C. (1995). Loss, humiliation and entrapment among women developing depression: A patient and non-patient comparison. Psychological Medicine, 25(01), 7–21.
Brown, R. (2019). Henri Tajfel: Explorer of identity and difference. Routledge.
Chakraborty, D., Babu, D. S., & Chakravorty, M. (2006). Atrocities on dalits: What the district level data say on society-state complicity. Economic and Political Weekly, 2478–2481.
Cooley, C. H. (1922). Human nature and the social order (Rev. ed.). Scribner.
David, R., & Choi, S. Y. (2009). Getting even or getting equal? Retributive desires and transitional justice. Political Psychology, 30(2), 161–192.
Davidson, J., Grove-Hills, J., Bifulco, A., Gottschalk, P., Caretti, V., Pham, T., & Webster, S. (2011). Online abuse: Literature review and policy context. Project report, European Online Grooming Project.
De Guissmé, L., & Licata, L. (2017). Competition over collective victimhood recognition: When perceived lack of recognition for past victimisation is associated with negative attitudes towards another victimised group. European Journal of Social Psychology, 47(2), 148–166.
Dragiewicz, M., Burgess, J., Matamoros-Fernández, A., Salter, M., Suzor, N. P., Woodlock, D., & Harris, B. (2018). Technology facilitated coercive control: Domestic violence and the competing roles of digital media platforms. Feminist Media Studies, 18(4), 609–625. https://doi.org/10.1080/14680777.2018.1447341
Eisenberger, N. I. (2012). The neural bases of social pain: Evidence for shared representations with physical pain. Psychosomatic Medicine, 74(2), 126–135. https://doi.org/10.1097/PSY.0b013e3182464dd1
Elison, J., & Harter, S. (2007). Humiliation: Causes, correlates, and consequences. In J. L. Tracy, R. W. Robins, & J. P. Tangney (Eds.), The self-conscious emotions: Theory and research (pp. 310–329). Guilford.
Elshout, M., Nelissen, R. M. A., & van Beest, I. (2017). Conceptualising humiliation. Cognition and Emotion, 31(8), 1581–1594. https://doi.org/10.1080/02699931.2016.1249462
Fanon, F. (1967). Black skin, white masks. Pluto Press.
Fattah, K., & Fierke, K. M. (2009). A clash of emotions: The politics of humiliation and political violence in the Middle East. European Journal of International Relations, 15(1), 67–93. https://doi.org/10.1177/1354066108100053


Fernandez, S., Saguy, T., & Halperin, E. (2015). The paradox of humiliation: The acceptance of an unjust devaluation of the self. Personality and Social Psychology Bulletin, 41(7), 976–988. https://doi.org/10.1177/0146167215586195
Frevert, U. (2020). The politics of humiliation: A modern history. Oxford University Press.
Geetha, V. (2009). Bereft of being: The humiliations of untouchability. In G. Guru (Ed.), Humiliation: Claims and context (pp. 95–107). New Delhi.
Ginges, J., & Atran, S. (2008). Humiliation and the inertia effect: Implications for understanding violence and compromise in intractable intergroup conflicts. Journal of Cognition and Culture, 8(3), 281–294. https://doi.org/10.1163/156853708x358182
Goffman, E. (1959). The presentation of self in everyday life. Doubleday.
Guru, G. (Ed.). (2009). Humiliation: Claims and context. New Delhi.
Guru, G., & Sarukkai, S. (2019). Experience, caste, and the everyday social. Oxford University Press.
Hafez, M. M. (2007). Martyrdom mythology in Iraq: How jihadists frame suicide terrorism in videos and biographies. Terrorism and Political Violence, 19(1), 95–115. https://doi.org/10.1080/09546550601054873
Hartling, L. M., & Luchetta, T. (1999). Humiliation: Assessing the impact of derision, degradation, and debasement. Journal of Primary Prevention, 19(4), 259–278.
Honneth, A. (1996). The struggle for recognition: The moral grammar of social conflicts. MIT Press.
Hornsey, M. J. (2008). Social identity theory and self-categorization theory: A historical review. Social and Personality Psychology Compass, 2(1), 204–222.
Israel, J., & Tajfel, H. (Eds.). (1972). The context of social psychology: A critical assessment. Academic Press.
James, W. (1890). Principles of psychology. Holt.
Jogdand, Y. (2015). Humiliation: Understanding its nature, experience and consequences. Unpublished Ph.D. thesis. St Andrews, UK: University of St Andrews.
Jogdand, Y., Khan, S., & Reicher, S. (2020). The context, content, and claims of humiliation in response to collective victimhood. In J. Vollhardt (Ed.), The social psychology of collective victimhood (pp. 77–99). Oxford University Press.
Jogdand, Y., & Reicher, S. (2013). Humiliation is a group based emotion: Experimental evidence from UK and India. Paper presented at the British Psychological Society (BPS) Annual Social Psychology Section Conference, University of Exeter, UK.
Jogdand, Y. A., Khan, S. S., & Mishra, A. K. (2016). Understanding the persistence of caste: A commentary on Cotterill, Sidanius, Bhardwaj and Kumar (2014). Journal of Social and Political Psychology, 4(2), 554–570. https://doi.org/10.5964/jspp.v4i2.603
Klein, D. C. (1991). The humiliation dynamic: An overview. Journal of Primary Prevention, 12(2), 93–121.
Kohm, S. A. (2009). Naming, shaming and criminal justice: Mass-mediated humiliation as entertainment and punishment. Crime, Media, Culture: An International Journal, 5(2), 188–205. https://doi.org/10.1177/1741659009335724
Leary, M. R., Twenge, J. M., & Quinlivan, E. (2006). Interpersonal rejection as a determinant of anger and aggression. Personality and Social Psychology Review, 10(2), 111–132.
Lindner, E. G. (2001). Humiliation-trauma that has been overlooked: An analysis based on fieldwork in Germany, Rwanda/Burundi, and Somalia. Traumatology, 7(1), 43–68. https://doi.org/10.1177/153476560100700104
Liu, J. (2015). Globalizing indigenous psychology: An East Asian form of hierarchical relationalism with worldwide implications. Journal for the Theory of Social Behaviour, 45(1), 82–94.
Mackie, D. M., Maitner, A. T., & Smith, E. R. (2009). Intergroup emotions theory. In T. D. Nelson (Ed.), Handbook of prejudice, stereotyping, and discrimination (pp. 285–307). Lawrence Erlbaum Associates.
Mahalingam, R. (2003). Essentialism, culture, and power: Representations of social class. Journal of Social Issues, 59(4), 733–749.


Markus, H. R., & Kitayama, S. (1991). Culture and the self: Implications for cognition, emotion, and motivation. Psychological Review, 98(2), 224.
Massanari, A. (2017). #Gamergate and The Fappening: How Reddit’s algorithm, governance, and culture support toxic technocultures. New Media & Society, 19(3), 329–346.
McCauley, C. (2017). Toward a psychology of humiliation in asymmetric conflict. American Psychologist, 72(3), 255.
Mendible, M. (2004). Humiliation, subjectivity, and reality TV. Feminist Media Studies, 4(3), 335–338.
Miller, W. I. (1995). Humiliation: And other essays on honor, social discomfort, and violence. Cornell University Press.
Moncur, W., Orzech, K. M., & Neville, F. G. (2016). Fraping, social norms and online representations of self. Computers in Human Behavior, 63, 125–131. https://doi.org/10.1016/j.chb.2016.05.042
Moors, A., Ellsworth, P. C., Scherer, K. R., & Frijda, N. H. (2013). Appraisal theories of emotion: State of the art and future development. Emotion Review, 5(2), 119–124. https://doi.org/10.1177/1754073912468165
Natrajan, B. (2012). The culturalization of caste in India: Identity and inequality in a multicultural age. Routledge.
Oakes, P. J., Haslam, S. A., & Turner, J. C. (1994). Stereotyping and social reality. Blackwell.
Otten, M., & Jonas, K. J. (2014). Humiliation as an intense emotional experience: Evidence from the electro-encephalogram. Social Neuroscience, 9(1), 23–35.
Parkinson, B. (1996). Emotions are social. British Journal of Psychology, 87(4), 663–683.
Reicher, S., & Hopkins, N. (2001). Self and nation. Sage.
Reicher, S., Spears, R., & Haslam, S. A. (2010). The social identity approach in social psychology. In M. S. Wetherell & C. T. Mohanty (Eds.), Sage identities handbook (pp. 45–62). Sage.
Rodriguez Mosquera, P. M. (2018). Cultural concerns: How valuing social-image shapes social emotion. European Review of Social Psychology, 29(1), 1–37.
Saurette, P. (2005). Humiliation and the global war on terror. Peace Review, 17(1), 47–54. https://doi.org/10.1080/14631370500292078
Saurette, P. (2006). You dissin me? Humiliation and post 9/11 global politics. Review of International Studies, 32(03), 495–522.
Shekhawat, D. P. S. (2021, December 3). In Rajasthan, the viral video is used as a tool of violence against dalits. https://article-14.com/post/in-rajasthan-the-viral-video-is-used-as-a-tool-of-violence-against-dalits-61a986de439b5
Sheth, A., Shalin, V. L., & Kursuncu, U. (2021). Defining and detecting toxicity on social media: Context and knowledge are key. arXiv:2104.10788.
Shnabel, N., Halabi, S., & Noor, M. (2013). Overcoming competitive victimhood and facilitating forgiveness through re-categorization into a common victim or perpetrator identity. Journal of Experimental Social Psychology, 49(5), 867–877.
Silver, M., Conte, R., Miceli, M., & Poggi, I. (1986). Humiliation—Feeling, social-control and the construction of identity. Journal for the Theory of Social Behaviour, 16(3), 269–283.
Statman, D. (2000). Humiliation, dignity, and self-respect. Philosophical Psychology, 13(4), 523–540. https://doi.org/10.1080/09515080020007643
Tajfel, H., & Turner, J. (1979). An integrative theory of intergroup conflict. In W. G. Austin & S. Worchel (Eds.), The social psychology of intergroup relations (pp. 33–97). Brooks-Cole.
Teltumbde, A. (2011). The persistence of caste: The Khairlanji murders and India’s hidden apartheid. Zed Books.
Turner, J. C., Hogg, M. A., Oakes, P. J., Reicher, S., & Wetherell, M. (1987). Rediscovering the social group: A self-categorization theory. Basil Blackwell.
Veldhuis, T. M., Gordijn, E. H., Veenstra, R., & Lindenberg, S. (2014). Vicarious group-based rejection: Creating a potentially dangerous mix of humiliation, powerlessness, and anger. PLoS One, 9(4), e95421. http://www.ncbi.nlm.nih.gov/pubmed/24759901


Vogel, W., & Lazare, A. (1990). The unforgivable humiliation: A dilemma in couples’ treatment. Contemporary Family Therapy, 12(2), 139–151.
Vollhardt, J. R. (2012). Collective victimization. In The Oxford handbook of intergroup conflict (pp. 136–157). Oxford University Press.
Vollhardt, J. R., Mazur, L. B., & Lemahieu, M. (2014). Acknowledgment after mass violence: Effects on psychological wellbeing and intergroup relations. Group Processes & Intergroup Relations, 17(3), 306–323.
Vorster, A., Dumont, K. B., & Waldzus, S. (2021). Just hearing about it makes me feel so humiliated: Emotional and motivational responses to vicarious group-based humiliation. International Review of Social Psychology, 34(1).
Williams, K. D., & Jarvis, B. (2006). Cyberball: A program for use in research on interpersonal ostracism and acceptance. Behavior Research Methods, 38(1), 174–180.
Wood, A. L. (2011). Lynching and spectacle: Witnessing racial violence in America, 1890–1940. University of North Carolina Press.

Yashpal Jogdand is Assistant Professor of Psychology in the Department of Humanities and Social Sciences, IIT Delhi. He completed his Ph.D. at the School of Psychology & Neuroscience, University of St. Andrews, Scotland, UK, under the supervision of Prof. Stephen Reicher. His research examines the issues of prejudice, stereotyping, humiliation, leadership, caste/casteism, and coping/resistance. He serves on the editorial boards of the British Journal of Social Psychology, Asian Journal of Social Psychology, and Journal of Social & Political Psychology, and as Book Review Editor of the journal Psychological Studies. He is a recipient of the Young Psychologist Award from the National Academy of Psychology (India).

Chapter 15

Technology: Does It Help or Harm Intelligence—or Both?

Robert J. Sternberg and Sareh Karami

Abstract During the twentieth century, James Flynn found that IQs around the world increased by 30 points. The average IQ remained 100 only because test publishers kept renorming the tests. Inevitably, scholars sought to understand the explanation for this spectacular rise. Although there have been many explanations, one popular explanation, accepted by Flynn himself, is that the increasing complexity of living in the world has increased the cognitive demands of the environment, resulting in people, over secular time, "matching" their environments by stretching their cognitive abilities. For example, even operating a hotel alarm clock is a far more complex operation today than it was a couple of decades ago. But is it possible that increases in IQ have been accompanied by decreases in other cognitive skills, in particular, wisdom? Many countries of the world (including the authors') are rapidly degenerating into autocracies, the worldwide response to the COVID-19 pandemic has been unreflective, at best, and nations are allowing global climate change to become worse while not taking sufficient action to abate it. Is it possible that the very same forces that lead to an increase in one aspect of intelligence can simultaneously lead to a decrease in another aspect? We will argue that this is the case—that as IQ has increased, wisdom has decreased—and provide an explanation.

Keywords Technology · Intelligence · Flynn effect · Wisdom · IQ

R. J. Sternberg (B) Department of Psychology, College of Human Development, MVR Hall, Cornell University, Ithaca, NY 14853, USA e-mail: [email protected] S. Karami Department of Counseling, Higher Education, Educational Psychology, and Foundation, mailstop:9727, Mississippi State MS 39762, USA © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Mukherjee et al. (eds.), Applied Cognitive Science and Technology, https://doi.org/10.1007/978-981-99-3966-4_15


Introduction

It might seem like a cause for rejoicing that, during the twentieth century, IQs in every country studied throughout the world increased by an average of 30 points (Flynn, 2007, 2012, 2016). The average IQ stayed at 100, but only because test publishers knew that, to retain the appearance of psychometric integrity, they had to keep the average IQ at 100. So the publishers kept renorming the tests so that the average score would be 100 and the standard deviation as close to 15 as possible. Flynn realized that the stable average IQ scores failed to reflect rising raw scores, which revealed increasing levels of IQ across secular time.

Predictably, scholars of intelligence tried to unearth the explanation for this spectacular rise in IQ—a full two standard deviations. That magnitude is comparable to the difference between someone who is borderline intellectually challenged and someone who is average (IQ of 70 versus 100), or between someone who is average and someone who is borderline intellectually gifted (IQ of 100 versus 130). It is hard to exaggerate this difference. It was as though a whole class of individuals who were once intellectually average now could barely get by in everyday life.

There are multiple explanations for the rise in IQ (Neisser, 1998). The most plausible is that as the world becomes more complex, the cognitive demands of the environment increase, and over secular time, people "match" their environments by stretching their cognitive abilities to fit them. The principle is little different from that of people sometimes needing to increase their physical strength to fit the demands of their environments. As an example, it used to be easy to operate a hotel alarm clock—a button here, a knob there, would do the trick. Today, it is a far more complex operation.

It is possible, however, that increases in IQ have been accompanied by stagnation or even decreases in other cognitive skills that are of value. It is also possible that the skills that IQ tests measure are ones that, during the twentieth century, were important, or at least were viewed as important, for adaptive success, but that are less important today. In that case, we may be in an ironic situation in which focusing on skills that once were important leaves us developing in students the knowledge and skills that represent adaptation to the last century, not the current one.

Just how beneficial are the 30 points of IQ that nations around the world gained during the twentieth century? Consider several examples (see also Sternberg, 2019a, 2021). One example is government. Many countries around the world (including the authors') are in a process of change, and not for the better. These countries are rapidly degenerating into autocracies. Hungary, for example, is almost unrecognizable as a democracy today. China and Russia are essentially gone as democracies, with only the thinnest veneer of popular representation remaining in Russia. Politicians and their followers in the United States who never would have pined for autocracy before are now openly doing so. One "conservative" TV celebrity spent a number of days in Hungary extolling its emerging dictatorship and obviously hoping for the same in the United States.
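To make the renorming described at the start of this section concrete, here is a minimal sketch, assuming a simple linear (deviation-IQ) standardization; the raw-score figures are invented purely for illustration. Raw performance rises from one norming cohort to the next, yet the published IQ is rescaled against each new cohort so that its mean stays at 100 and its standard deviation near 15.

```python
# Illustrative only: how the same raw score maps to different IQs under old
# versus new norms. All raw-score values below are hypothetical.
import statistics

def to_iq(raw_score, norm_sample):
    """Convert a raw test score to a deviation IQ (mean 100, SD 15) against a norming sample."""
    mean = statistics.mean(norm_sample)
    sd = statistics.stdev(norm_sample)
    return 100 + 15 * (raw_score - mean) / sd

norms_1950 = [38, 42, 45, 47, 50, 52, 55, 58, 63]  # hypothetical raw scores, earlier cohort
norms_2000 = [48, 52, 55, 57, 60, 62, 65, 68, 73]  # same test, later cohort scoring higher

raw = 60
print(round(to_iq(raw, norms_1950)))  # about 119: well above average on the older norms
print(round(to_iq(raw, norms_2000)))  # 100: exactly average once the test is renormed
```

The point of the sketch is simply that a renormed IQ of 100 can conceal a substantial rise in underlying raw performance, which is what Flynn's analysis of raw scores revealed.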


The worldwide response to the COVID-19 pandemic has been ill-thought-through, at best, and a total mess, at worst. In the United States, much of a political party has become a sort of death cult, with politicians and their followers alike embracing poor public-health practices that have led to overwhelming numbers of hospitalizations and deaths in their states. It used to be thought—we used to think—that survival was the ultimate goal for which evolution had prepared us. We were wrong. People are dying—including young children—and still the people who decry the decay of their imagined "freedom" continue practices that will increase the sickness and death. These people seem to have no problem with other vaccines, just the cynically politicized COVID-19 one. Apparently, mindless groupthink takes precedence over survival.

Meanwhile, to show that problems with the adaptive value of IQ are quite broad, nations are demonstrating through inaction on global climate change that more IQ points are not the royal road to saving humanity (and a million other species) from itself. Baking hot temperatures, increasing flooding, more severe hurricanes, and widespread drought do not seem like sufficient stimuli to encourage people to act in ways that will preserve the world not only for themselves, but also for future generations.

People today are able to handle technology that, in the past, might well have befuddled even bright people. One can see generational effects. The senior author has two sets of children, all of whom are quite a bit younger than he is. They likely have benefited from the Flynn effect and its interaction with technology in a way he has not. He has trouble with hotel alarm clocks. And there is a whole cottage industry of computers and cell phones designed especially for older people that are simpler to use than the computers and cell phones easily used by younger people. Those who have entered this niche market realize that many older people are befuddled by technology that, for younger people who grew up with it, is second nature. The complexity of modern-day technology is probably one of the "blessings" of modern society that has encouraged people to develop their "mental muscles" for the complex learning and thinking that fit a technological era. Those who cannot take advantage of this technology risk being left behind. Indeed, young people with high expectations of occupational or financial success often leave environments where jobs that require complex thinking are harder to find.

There is a problem, though. The problem is that learning how to use technology does not guarantee wise, or even reflective, use of that technology. The same problem applies to all forms of technical expertise. For example, scientists who use highly complex and technical laboratory equipment may become very adept at using that equipment, but their technical expertise does not guarantee wise use of it. Studying viruses, for instance, requires a tremendous amount of technical expertise and extremely expensive laboratory equipment and safety precautions. But will all scientists be wise in their work? Will they always take the precautions? Will gain-of-function research lead to better means of combating viruses, or will the viruses be allowed to escape from the laboratory? Or, perhaps worse, will the research be used for biological warfare?


The risk in education today is that abstract-analytical skills—the kinds used to exploit technology—are not taught hand-in-hand with the skills to use the technology wisely. What exactly does it mean to use technology wisely? There is no real consensus on the nature of wisdom, although various models have been proposed seeking to represent some kind of consensus. For example, Grossmann and colleagues (2020) have suggested that wisdom comprises the following elements: balance of viewpoints, epistemic humility, context adaptability, and multiple perspectives. In the so-called "Berlin" paradigm for understanding wisdom, wisdom involves five elements: factual knowledge, procedural knowledge, lifespan contextualism, relativism of values, and awareness and management of uncertainty (Baltes & Smith, 2008; Baltes & Staudinger, 1993, 2000). In a three-dimensional model, Ardelt (2003, 2004) has proposed that wisdom draws on three basic kinds of mental processes—cognitive, reflective, and affective. In the MORE model of Glück and Bluck (2013), wisdom is hypothesized to require mastery, openness, reflectivity, and emotional regulation and empathy (see also Kunzmann & Glück, 2019). Finally, a Polyhedron Model of Wisdom (PMW) suggests that wisdom has as its components knowledge; reflectivity and self-regulation; pro-social behaviors and moral maturity; openness and tolerance; sound judgment and creativity; and dynamic balance and synthesis (Karami et al., 2020).

Although the elements differ somewhat from one model to another, there is enough consensus among the accounts to make one thing clear: whichever model of wisdom one adopts, simply increasing the abstract-thinking skills required to understand and operate complex machinery, or to operate analytically within a complex knowledge-based environment, will not suffice to provide wisdom. No model of wisdom would view the teaching of knowledge and abstract-analytical skills as sufficient to impart the wisdom to use technology wisely.

If there is a recent headline that captures the predicament of contemporary civilization, and especially the United States, it may be one from an allegedly non-partisan source on the morning on which we are writing this text: "GOP sees Biden vaccine mandates as energizing issue for midterms" (Manchester, 2021). The COVID-19 pandemic will go down in history as one of the world's deadliest—perhaps the deadliest. On the day we are writing (9/13/21), there have been 219 million cases of COVID-19 worldwide, resulting in 4.55 million deaths. There have been 41 million cases of COVID-19 in the United States, with 660,000 deaths. And yet, some politicians see COVID-19 vaccine matters—which are serious matters of public health—as mere fodder for devising pitches to get re-elected and increase the representation of their political party in government. Vaccines, which all states require of children for entrance to school, are being politicized, at the cost of truly uncountable illnesses, deaths, and overflowing hospitals, to favor the power of a political party. The political pitches may be creative; they may be smart; they cannot be wise. The reason they cannot be wise is that, in the end, wisdom must be about achieving some kind of common good, one that balances the interests of diverse individuals, groups, and large organized entities, such as states or provinces and countries (Sternberg, 1998, 2019b). Those counties and states in the US that are fighting mask and vaccine mandates are having the worst outbreaks.
In particular, as of November 5, 2020, of the 376 US counties with the highest number of new COVID-19 cases on a per capita basis, 93% had voted for Donald Trump in the 2020 election (Johnson et al., 2020). This is a staggering result, even more so because Trump lost the popular vote. It indicates that the pandemic has been very seriously politicized, with illnesses and deaths becoming a byproduct of ideological affiliation. The question is why, at this point in history, politicians, their followers, and their sympathizers in the media and among celebrities would promote what has literally amounted to a death cult. Does IQ really buy us that little? Why is survival, which would seem to be at the top of the scale of Darwinian importance, being subordinated to the ambitions of cynical politicians and media personalities, at the expense of people who, whatever their IQs, are ready and eager to believe in evidence-free conspiracy theories, such as that vaccines against COVID-19 actually cause COVID-19, sterility, or mind control through the implantation of microchips (e.g., Douglas, 2021)? Why are the elevated IQs achieved through the Flynn effect not improving people's skills in detecting and debunking obvious falsehoods? How can people be able to use complex technologies and yet fall for hoaxes that would, by comparison, make even P. T. Barnum look credible?

There are multiple reasons why people who have reasonably high IQs might nevertheless be credulous in the face of fact-free conspiracy theories or simple misinformation. We will discuss four.

Foolishness

Sternberg (2004, 2005, 2019b) has argued that whereas foolishness is the opposite of wisdom, it is largely orthogonal to intelligence. Indeed, intelligent people might be even more susceptible to foolishness precisely because they think they are immune to it. It is like the person who believes themselves immune to a disease and thus does not take precautions against contracting it, only to make themselves more likely to come down with it. Sternberg has suggested that foolishness comes in seven major kinds, although doubtless there are many more:

1. Egocentrism. Foolish people believe the world revolves around them. They do not care about what other people think or how what they do affects others.
2. Unrealistic Optimism. They believe that if an idea is theirs, it will work out. They do not question their own ideas or motives.
3. Omniscience Fallacy. They believe they know everything and do not question their knowledge, however fallacious it may be.
4. Omnipotence Fallacy. They think they are all-powerful and so can do whatever they want.
5. Invulnerability Fallacy. They believe that they can get away with whatever they do, no matter if it is illegal, immoral, or unfairly infringing on the rights of others.


6. Sunk-Cost Fallacy. They metaphorically throw good money after bad. Having made a mistake, they neither correct nor learn from it, but rather dig themselves in deeper or keep making the same mistake again and again.
7. Myside Bias. They see things only from their own point of view. They believe other points of view are not merely different, but wrong. Thus, the world as they see it is the world as everyone should see it.

Obedience to Authority

Some people seek to submit themselves to authority. They are uncomfortable when they have to think for themselves. They decide whom they regard as authorities and then are credulous of whatever those authorities say. For example, when Donald Trump said the election was stolen, they believed it without evidence. Scores of court cases questioning the election were thrown out, but they believed the authoritarian anyway. More than two-thirds of Republicans believe the 2020 election was stolen from Donald Trump (Black, 2021), despite the absence of empirical evidence and the dismissal of court cases all the way up to the US Supreme Court. Many of these same people have believed in the utility of drugs that are worthless for treating COVID-19, such as hydroxychloroquine and ivermectin. At some point, they do engage in thinking, but it is thinking programmed by cynical authority figures with agendas of their own.

Groupthink

Janis (1972) proposed the concept of groupthink, according to which people who operate in groups, even very "intelligent" people, often come to believe things that are far removed from the truth. The groups are not necessarily working groups; they might be political, ideological, or religious groups, and the people in the group are not necessarily physically together. Janis identified eight characteristics of groupthink, which can be summarized as follows:

1. Denial of Individual and Group Vulnerability. The members of the group do not acknowledge that they might be erroneous in their beliefs.
2. Rationalization and Minimization of Objections to Group Decisions to Create an Illusion of Unanimity. Group members rationalize their decisions and strongly discourage objections to group decisions.
3. Peer Pressure to Create the Illusion of Unanimity. Members are pressed to agree with group processes and decisions, thereby creating an illusion of unanimity.
4. Belief in the Righteousness of the Group. They believe their group is right and righteous and that other groups are flawed, ignorant, or have malign intentions.
5. Negative Stereotyping of Outgroups. They believe that members of outgroups are inferior in their decision-making and likely in many other ways as well.


6. Development of One or More Self-Appointed Mindguards. The role of the mindguard is to enforce adherence to group norms and decisions.

Groupthink, according to Janis, is very common, and it is found in groups of people without regard to their level of intelligence as it is commonly defined.

Self-Imposed Bias and Limitations in Information-Seeking and Information-Interpretation

The amount of information available in the world, especially through the Internet, is practically mind-boggling. But people often do not seek out much of this information; instead, they seek only the information they want to find. That is, they seek information that represents their own point of view (see "myside bias" above) and then interpret that information to fit their own preconceptions, whatever those preconceptions may be.

Conclusion

Schools place great emphasis on the transmission of knowledge. Such transmission is important, especially in a world in which there is so much information to transmit and in which technology requires greater and greater knowledge bases for people to adapt successfully. But much more emphasis needs to be placed on the wise utilization and deployment of information. Schools have been lacking in this regard, with the result that students can finish schooling, even in prestigious schools, and end up smart and knowledgeable, on the one hand, but unwise and susceptible to seriously biased thinking, on the other. We then end up with cynical leaders who manipulate people to believe what they wish, and believers who believe what the cynical leaders want them to believe. We thus acquire countries with advanced technologies, on the one hand, and mindless acceptance of the self-serving beliefs of autocrats, on the other. We can do better. We must do better!

References

Ardelt, M. (2003). Empirical assessment of a three-dimensional wisdom scale. Research on Aging, 25, 275–324.
Ardelt, M. (2004). Wisdom as expert knowledge system: A critical review of a contemporary operationalization of an ancient concept. Human Development, 47, 257–285. https://doi.org/10.1159/000079154
Baltes, P. B., & Smith, J. (2008). The fascination of wisdom: Its nature, ontogeny, and function. Perspectives on Psychological Science, 3, 56–64. https://doi.org/10.1111/j.1745-6916.2008.00062.x


Baltes, P. B., & Staudinger, U. M. (1993). The search for a psychology of wisdom. Current Directions in Psychological Science, 2(3), 75–80.
Baltes, P. B., & Staudinger, U. M. (2000). A metaheuristic (pragmatic) to orchestrate mind and virtue toward excellence. American Psychologist, 55, 122–136. https://doi.org/10.1037/0003-066X.55.1.122
Black, E. (2021, May 5). Poll: Most republicans haven’t given up thinking 2020 election was stolen. MinnPost. https://www.minnpost.com/eric-black-ink/2021/05/poll-most-republicanshavent-given-up-thinking-2020-election-was-stolen/
Douglas, K. M. (2021). COVID-19 conspiracy theories. Group Processes & Intergroup Relations, 24(2). https://doi.org/10.1177/1368430220982068
Flynn, J. R. (2007). What is intelligence? Beyond the Flynn effect. Cambridge University Press.
Flynn, J. R. (2012). Are we getting smarter? Cambridge University Press. https://doi.org/10.1017/CBO9781139235679
Flynn, J. R. (2016). Does your family make you smarter? Nature, nurture, and human autonomy. Cambridge University Press.
Glück, J., & Bluck, S. (2013). The MORE life experience model: A theory of the development of personal wisdom. In M. Ferrari & N. Weststrate (Eds.), The scientific study of personal wisdom (pp. 75–97). Springer.
Grossmann, I., Weststrate, N. M., Ardelt, M., Brienza, J. P., Dong, M., Ferrari, M., Fournier, M. A., Hu, C. S., Nusbaum, H. C., & Vervaeke, J. (2020). The science of wisdom in a polarized world: Knowns and unknowns. Psychological Inquiry, 31(2), 1–31. https://doi.org/10.1080/1047840X.2020.1750917
Janis, I. L. (1972). Victims of groupthink. Houghton-Mifflin.
Johnson, C. K., Fingerhut, H., & Deshpande, P. (2020, November 5). Counties with worst virus surges overwhelmingly voted Trump. AP. https://apnews.com/article/counties-worst-virus-surges-voted-trump-d671a483534024b5486715da6edb6ebf
Karami, S., Ghahremani, M., Parra-Martinez, F., & Gentry, M. (2020). A polyhedron model of wisdom: A systematic review of the wisdom studies in three different disciplines. Roeper Review, 42(4), 241–257.
Kunzmann, U., & Glück, J. (2019). Wisdom and emotion. In R. J. Sternberg & J. Glück (Eds.), Cambridge handbook of wisdom (pp. 575–601). Cambridge University Press.
Manchester, J. (2021, September 13). GOP sees Biden vaccine mandates as energizing issue for midterms. The Hill. https://thehill.com/homenews/campaign/571758-gop-sees-biden-vaccinemandates-as-energizing-midterm-issue?rl=1
Neisser, U. (Ed.). (1998). The rising curve: Long-term gains in IQ and related measures (pp. 81–123). American Psychological Association.
Sternberg, R. J. (1998). A balance theory of wisdom. Review of General Psychology, 2, 347–365.
Sternberg, R. J. (2004). Why smart people can be so foolish. European Psychologist, 9(3), 145–150.
Sternberg, R. J. (2005). Foolishness. In R. J. Sternberg & J. Jordan (Eds.), Handbook of wisdom: Psychological perspectives (pp. 331–352). Cambridge University Press.
Sternberg, R. J. (2019a). A theory of adaptive intelligence and its relation to general intelligence. Journal of Intelligence. https://doi.org/10.3390/jintelligence7040023
Sternberg, R. J. (2019b). Why people often prefer wise guys to guys who are wise: An augmented balance theory of the production and reception of wisdom. In R. J. Sternberg & J. Glück (Eds.), Cambridge handbook of wisdom (pp. 162–181). Cambridge University Press.
Sternberg, R. J. (2021). Adaptive intelligence: Surviving and thriving in a world of uncertainty. Cambridge University Press.

Robert J. Sternberg is Professor of Psychology in the College of Human Ecology at Cornell University and Honorary Professor of Psychology at Heidelberg University, Germany. Sternberg is a Past President of the American Psychological Association and the Federation of Associations in Behavioral and Brain Sciences. Sternberg's PhD is from Stanford University, and he holds 13 honorary doctorates. Sternberg has won the Cattell Award and the James Award from the Association for Psychological Science and the Grawemeyer Award in Psychology. He was cited recently by research.com as the #7 top psychological scientist in the US and #15 in the world.

Sareh Karami is an assistant professor of Educational Psychology at Mississippi State University. Karami earned her doctorate in Educational Studies from Purdue University. She received her bachelor's and first master's degrees in clinical psychology from the University of Tehran. She served as the head of the Iranian gifted school's research and extracurricular programs department for more than ten years. She left her job to do more graduate work in education at the University of British Columbia, Canada, where she received her second master's degree in education. She has developed two theories of wisdom and published several articles on wisdom, creativity, and intelligence.