Socially Responsible AI: Theories and Practices
Lu Cheng University of Illinois at Chicago, USA
Huan Liu Arizona State University, USA
World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING
Published by World Scientific Publishing Co. Pte. Ltd. 5 Toh Tuck Link, Singapore 596224 USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
Library of Congress Cataloging-in-Publication Data Names: Lu, Cheng (Professor of computer science), author. | Liu, Huan, 1958– author. Title: Socially responsible AI : theories and practices / Cheng Lu, University of Illinois at Chicago, USA, Huan Liu, Arizona State University, USA. Description: New Jersey : World Scientific, [2023] | Includes bibliographical references and index. Identifiers: LCCN 2022050651 | ISBN 9789811266621 (hardcover) | ISBN 9780000991188 (paperback) | ISBN 9789811266638 (ebook for institutions) | ISBN 9789811266645 (ebook for individuals) Subjects: LCSH: Artificial intelligence--Social aspects. | Artificial intelligence--Moral and ethical aspects. Classification: LCC Q334.7 .L83 2023 | DDC 006.301--dc23/eng20230112 LC record available at https://lccn.loc.gov/2022050651
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
Copyright © 2023 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
For any available supplementary material, please visit https://www.worldscientific.com/worldscibooks/10.1142/13150#t=suppl

Desk Editors: Logeshwaran Arumugam/Amanda Yun

Typeset by Stallion Press
Printed in Singapore
To my mother Limin Yuan and my father Liangming Cheng.
Lu Cheng

To my parents, wife, and sons.
Huan Liu
Preface
Artificial intelligence (AI) has shown great potential and promise in many areas: it can drive our cars, help doctors diagnose disease and biologists discover new drugs, assist judges and judicial systems in providing streamlined access to justice, and so on. However, AI can also be vulnerable, unfair, and incomprehensible, and it may harm our society. History is never short of stories and discussions about the relations between AI and humans. Shortly after World War II, Professor Norbert Wiener coined the term cybernetics and wrote (Wiener, 1948): “. . . we are already in a position to construct artificial machines of almost any degree of elaborateness of performance. Long before Nagasaki and the public awareness of the atomic bomb, it had occurred to me that we were here in the presence of another social potentiality of unheard-of importance for good and for evil”. But what are the “good” and the “evil”? In the context of this book, the “good” refers to quality AI systems: safe, reliable, and trustworthy AI systems that benefit society as a whole; the “evil” refers to discrimination, lack of transparency, privacy leakage, and any other harm a socially indifferent AI system can do to our society in the long run. Critically, all the potential “evil” is our responsibility, not the responsibility of AI applications; therefore, it is also our responsibility to ensure that the use of AI aligns with our values, principles, and priorities. Each of us plays a different role and has different responsibilities in achieving socially responsible AI.
This book discusses the responsibilities of AI researchers and practitioners, especially those who design and develop AI systems. These responsibilities should include not just the research ethics of individuals and the conduct of good research, but also the responsibilities embedded in the development and deployment of AI (i.e., fairness, transparency, and reliability) and the responsibilities to protect and inform users and to prevent or mitigate the evils or harms AI could do. This book serves as a convenient entry point for researchers, practitioners, and students to discuss these emerging issues and to identify how their areas of expertise can contribute to making AI socially responsible. We hope you find it useful in your work and life.1
1 Please refer to the book website for slides and other related information: https://sites.google.com/view/srai/home
About the Authors
Lu Cheng is an Assistant Professor of Computer Science at the University of Illinois Chicago (UIC), USA. Her research focuses on developing algorithmic solutions for socially responsible AI using both statistical and causal methods. Lu’s work has appeared in, and been invited to, top venues for AI (e.g., AAAI and IJCAI), data mining (e.g., KDD, WWW, and WSDM), and NLP (e.g., ACL and COLING). She is the web chair of WSDM’22 and a senior program committee member of AAAI’22–23. Lu’s awards include the 2022 CS Outstanding Doctoral Student award, 2021 ASU Engineering Dean’s Dissertation Award, 2020 ASU Graduate Outstanding Research Award, 2021–2022 ASU CIDSE Doctoral Fellowship, 2019 ASU Grace Hopper Celebration Scholarship, IBM Ph.D. Social Good Fellowship, and Visa Research Scholarship.

Huan Liu is a Professor of Computer Science and Engineering at Arizona State University (ASU), USA. His research interests are in data mining, machine learning, social computing, and artificial intelligence. He is a co-author of the textbook Social Media Mining: An Introduction (Cambridge University Press), Field Chief Editor of Frontiers in Big Data, and its Specialty Chief Editor of Data Mining and Management.
He is a Fellow of the Association for Computing Machinery (ACM), Association for the Advancement of Artificial Intelligence (AAAI), American Association for the Advancement of Science (AAAS), and Institute of Electrical and Electronics Engineers (IEEE).
Acknowledgments
This book is a synergistic product of many minds. It would not have been possible without many discussions with our colleagues and friends, especially the members of the Data Mining and Machine Learning Lab (DMML) at Arizona State University: Ruocheng Guo, Jundong Li, Kai Shu, Ghazaleh Beigi, Isaac Jones, Tahora Hossein Nazer, Suhas Ranganath, Suhang Wang, Liang Wu, Nur Shazwani Kamrudin, Kaize Ding, Raha Moraffah, Bing Hu, Mansooreh Karami, David Ahmadreza Mosallanezhad, Walaa Abdulaziz M Alnasser, Weidong Zhang, Faisal Alatawi, Amrita Bhattacharjee, Tharindu Kumarage, Paras Sheth, Anique Tahir, Ujun Jeong, Zhen Tan, Nayoung Kim, and Bohan Jiang. We are truly grateful for the stimulating and enlightening research opportunities with Drs. Yasin Silva, Deborah Hall, and K. Selcuk Candan. We would like to say a special thank you to Dr. Kush R. Varshney for inspiring and guiding us in this exciting research field. We would also like to thank Drs. H. V. Jagadish, Virginia Dignum, Lise Getoor, Toby Walsh, Fred Morstatter, and Jiliang Tang for their invaluable feedback. We would like to acknowledge World Scientific, particularly the Executive Editor Rochelle Kronzek and the Senior Editor Amanda Yun, for their patience, help, and encouragement throughout the development of this book. This work stems in part from research efforts sponsored by grants from the National Science Foundation (NSF #2036127 and #1909555), the Army Research Office
(ARO #W911NF2110030), and the Office of Naval Research (ONR #N00014-21-1-4002). Last but not least, we are deeply indebted to our families for their support throughout this entire project. We dedicate this book to them, with love. All errors, omissions, and misrepresentations are ours.
Contents
Preface
About the Authors
Acknowledgments
1. Defining Socially Responsible AI
   1.1 Why NOW
   1.2 What is Socially Responsible AI
   1.3 The AI Responsibility Pyramid
   1.4 Socially Responsible AI Algorithms
   1.5 What Could Go Wrong?
      1.5.1 Formalization
      1.5.2 Measuring Errors
      1.5.3 Bias
      1.5.4 Data Misuse
      1.5.5 Dependence versus Causality
   1.6 Concluding Remarks
      1.6.1 Summary
      1.6.2 Additional Readings
2. Theories in Socially Responsible AI
   2.1 Fairness
      2.1.1 Different Fairness Notions
      2.1.2 Mitigating Unwanted Bias
      2.1.3 Discussion
   2.2 Interpretability
      2.2.1 Different Forms of Explanations
      2.2.2 Taxonomy of AI Interpretability
      2.2.3 Techniques for AI Interpretability
      2.2.4 Discussion
   2.3 Privacy
      2.3.1 Traditional Privacy Models
      2.3.2 Privacy for Social Graphs
      2.3.3 Graph Anonymization
      2.3.4 Discussion
   2.4 Distribution Shift
      2.4.1 Different Types of Distribution Shifts
      2.4.2 Mitigating Distribution Shift via Domain Adaptation
      2.4.3 Mitigating Distribution Shift via Domain Generalization
      2.4.4 Discussion
   2.5 Concluding Remarks
      2.5.1 Summary
      2.5.2 Additional Readings
3. Practices of Socially Responsible AI
   3.1 Protecting
      3.1.1 A Multi-Modal Approach for Cyberbullying Detection
      3.1.2 A Deep Learning Approach for Social Bot Detection
      3.1.3 A Privacy-Preserving Graph Convolutional Network with Partially Observed Sensitive Attributes
   3.2 Informing
      3.2.1 An Approach for Explainable Fake News Detection
      3.2.2 Causal Understanding of Fake News Dissemination on Social Media
   3.3 Preventing
      3.3.1 Mitigating Gender Bias in Word Embeddings
      3.3.2 Debiasing Cyberbullying Detection
   3.4 Concluding Remarks
      3.4.1 Summary
      3.4.2 Additional Readings
4. Challenges of Socially Responsible AI
   4.1 Causality and Socially Responsible AI
      4.1.1 Causal Inference 101
      4.1.2 Causality-based Fairness Notions and Bias Mitigation
      4.1.3 Causality and Interpretability
   4.2 How Context Can Help
      4.2.1 A Sequential Bias Mitigation Approach
      4.2.2 A Multidisciplinary Approach for Context-Specific Interpretability
   4.3 The Trade-offs: Can’t We have Them All?
      4.3.1 The Fairness–Utility Trade-off
      4.3.2 The Interpretability–Utility Trade-off
      4.3.3 The Privacy–Utility Trade-off
      4.3.4 Trade-offs among Fairness, Interpretability, and Privacy
   4.4 Concluding Remarks
      4.4.1 Summary
      4.4.2 Additional Readings
Bibliography
Index
Chapter 1
Defining Socially Responsible AI
1.1. Why NOW
Artificial intelligence (AI) is omnipresent in our daily lives, from smartphones, self-driving cars, and speech recognition to credit decisions, eCommerce, smart personal assistants, and healthcare, and we do not always realize it. In fact, most people are unfamiliar with the concept of AI. According to a Deloitte survey1 published in 2017, only 17 percent of 1,500 senior business leaders in the US said they were familiar with it. Despite this lack of familiarity, AI has had and will continue to have a central role in countless aspects of life, livelihood, and liberty. It is bringing forth a sea change that is not limited to technical domains; it is a truly sociotechnical phenomenon affecting healthcare, education, commerce, finance, criminal justice, and many other sectors. Just a few years ago, many of today’s applications would have been unheard of: voice assistants, vaccine development, autonomous vehicles, and so on. It has been said that “artificial intelligence is a systematic, general-purpose technology not unlike electricity, and it will therefore ultimately scale across and invade every aspect of our economy and society” (Ford, 2015).
1 https://www2.deloitte.com/us/en/pages/deloitte-analytics/articles/cognitivetechnology-adoption-survey.html
Despite AI’s promise to transform the world, we are not yet ready to embrace it in its entirety. On the one hand, it is difficult for many businesses to collect the data needed to properly train a quality AI algorithm; on the other hand, people do not trust AI in many applications because of its lack of transparency, its potential harm, and the unspecified social responsibilities when it goes wrong. Like any new intervention, AI offers both promise and perils. As an illustration, a report published by Martha Lane Fox’s Doteveryone think tank (Miller, 2019) reveals that 59% of tech workers have worked on AI products they felt were harmful to society, and more than 25% of workers who had such an experience quit their jobs as a result. This relates to “amplification”, a popular line of reasoning in today’s platform regulation discussions: platforms (e.g., social media platforms) amplify disinformation, hate speech, and other online misbehavior. Media companies have always curated the public sphere of the political communities where they operate. With the help of their embedded recommender systems, these media (and tech) companies now determine what to amplify, what to reduce, and what to remove. They shape what we may learn or see, and whom we relate to. The rise of activism against the negative social impacts of Big Tech, regarded as one of the few current mechanisms for keeping Big Tech companies in check (Schwab, 2021), has brought the Social Responsibility of AI into the spotlight for the media, the general public, and AI practitioners and researchers (Abdalla and Abdalla, 2021). In response, unprecedented efforts have focused on developing fair, transparent, accountable, and trustworthy AI algorithms.
The interest in responsible AI, AI ethics, and trustworthy AI/machine learning is attested by emerging research conferences such as the AAAI/ACM Conference on AI, Ethics, and Society (AIES) and the ACM Conference on Fairness, Accountability, and Transparency (FAccT); by books such as Dignum (2019) and Varshney (2022); and by departments and teams dedicated to ethical AI, trustworthy AI, and responsible AI in Big Tech companies such as Google, Meta, Microsoft, and IBM. To identify potential higher-order effects on safety, privacy, and society at large, academia and industry are gradually converging on a common set of key characteristics that a responsible and trustworthy AI system needs to exhibit: Fairness, Transparency, Robustness, Privacy, and Accountability. See, e.g.,
IBM’s Trustworthy AI,2 Microsoft’s Responsible AI,3 and recent surveys such as Cheng et al. (2021d) and Liu et al. (2021). These characteristics are developed under the guidance of five principles that commonly occur in ethics guidelines from different organizations: Privacy, Fairness and Justice, Safety and Reliability, Transparency, and Social Responsibility and Beneficence. By comparison, a topic routinely omitted from discussions of responsible and trustworthy AI is the principle of beneficence, the application of AI for good purposes. The need for an appropriate narrative and language to discuss these emerging issues, and to bridge the gap between principles and practices, has motivated us to write a book on socially responsible AI. We specifically focus on the definitions and algorithmic solutions for socially responsible AI: algorithms that operationalize fundamental principles and that materialize AI for good. Our goal is to present the relations among AI, humans, and society, and to improve these relations by developing socially responsible AI algorithms. Ultimately, it is we, everyone directly or indirectly affected by AI, who should be responsible for the “perils” and for ensuring that what is being developed aligns with our values, principles, and priorities. This book focuses on the responsibilities of AI researchers and practitioners. In the rest of this chapter, we introduce an inclusive definition of socially responsible AI (Section 1.2) and then present an AI responsibility pyramid that outlines four specific AI responsibilities to the society of which AI is a part (Section 1.3). In Section 1.4, we define socially responsible AI algorithms. Lastly, we discuss the potential factors that may cause AI to go wrong (Section 1.5).
1.2. What is Socially Responsible AI
Socially responsible AI includes efforts devoted to addressing both technical and societal issues. It is defined next in terms of principle, means, and purpose.
2 https://www.ibm.com/watson/trustworthy-ai
3 https://www.microsoft.com/en-us/ai/responsible-ai
Definition 1.1 (Socially Responsible AI). Socially responsible AI refers to a human value-driven process where values such as Fairness and Justice, Transparency, Reliability and Safety, Privacy and Security, and Beneficence are the principles; designing socially responsible AI algorithms is the means; and addressing the social expectations of generating shared value (enhancing both AI’s ability and its benefits to society) is the main purpose.

Here, we identify three dimensions of socially responsible AI: the five coarse-level principles lay the foundation for AI ethics guidelines; the means is to develop responsible AI algorithms that are fair, transparent, privacy-preserving, and robust; and the purpose is to improve both AI’s capability and humanity, with the latter being the proactive goal.

By including the term “socially”, we highlight the societal view, as opposed to the individual view, of AI responsibility. Wikiquote.org4 defines personal responsibility (or individual responsibility) as “the idea that human beings choose, instigate, or otherwise cause their own actions”, whereas Wikipedia5 defines social responsibility as “an ethical framework” and suggests that an entity, be it an organization or an individual, has an obligation to act for the benefit of society at large. One could argue that AI, unlike humans, can neither “choose” nor “instigate”, and is therefore free of individual responsibility. This naturally confines the “responsibility” of AI to the broader societal side. However, the individual side of AI responsibility still holds if we take into account the individual responsibility of the AI designers, researchers, and practitioners who deploy AI, and of the general public that actively uses AI techniques and products. This book mainly focuses on AI’s responsibilities to “act for the benefit of society at large”, that is, on the increasingly sociotechnical nature of AI and its interactions with society.

1.3. The AI Responsibility Pyramid
Social Responsibility of AI should be framed in such a way that the entire range of AI responsibilities is embraced. Adapting Carroll’s Pyramid of Corporate Social Responsibility (CSR) (Carroll et al., 1991) to the AI context, we suggest four kinds of AI responsibilities: functional, legal, ethical, and philanthropic responsibilities, as shown in Figure 1.1. By modularizing AI responsibilities, we hope to help AI practitioners and researchers reconcile these obligations and simultaneously fulfill all the components in the pyramid. All of these responsibilities have always existed, but until recently functional responsibilities have been the main consideration. Each type of responsibility requires close consideration.

PHILANTHROPIC Responsibilities: Be a good AI citizen. Build the AI ecosystem to address societal challenges.
ETHICAL Responsibilities: Be ethical. Obligation to do what is right, fair, and just. Prevent harm.
LEGAL Responsibilities: Obey the law. Act for a certain sort of reason provided by the law. Play by the rules of the game.
FUNCTIONAL Responsibilities: Be functional. Create technology that allows computers and machines to function in an intelligent manner.

Fig. 1.1: The AI responsibility pyramid, adapted from the Pyramid of CSR proposed by Carroll et al. (1991).

4 https://www.wikiquote.org/
5 https://en.wikipedia.org/wiki

The pyramid portrays the four kinds of AI responsibilities, beginning with the basic building-block notion that the functional competence of AI undergirds all else. Functional responsibilities require AI systems to perform in a manner consistent with profit maximization, operating efficiency, and other key performance indicators. Meanwhile, AI is expected to obey the law, which codifies acceptable and unacceptable behaviors in our society. That is, legal responsibilities require AI systems to perform in a manner consistent with the expectations of governments and the law. All AI systems should at least meet the minimal legal requirements.
At the most fundamental level, ethical responsibility is the obligation to do what is right, just, and fair, and to prevent or mitigate negative impact on stakeholders (e.g., users, the environment). To fulfill their ethical responsibilities, AI systems need to perform in a manner consistent with societal expectations and ethical norms, which cannot be compromised in order to achieve AI’s functional responsibilities. Finally, under philanthropic responsibilities, AI systems are expected to be good AI citizens and to contribute to tackling societal challenges such as cancer and climate change. In particular, it is important for AI systems to perform in a manner consistent with the philanthropic and charitable expectations of society to enhance people’s quality of life. What distinguishes ethical from philanthropic responsibilities is that the latter are not expected in an ethical sense. For example, while communities desire AI systems to be applied to humanitarian projects or purposes, they do not regard AI systems as unethical if they do not provide such services. We explore the nature of the Social Responsibility of AI by focusing on its components to help AI practitioners reconcile these obligations. Though these four components are depicted as separate concepts, they are not mutually exclusive. AI practitioners and researchers need to recognize that these obligations are in constant, dynamic tension with one another.

How does socially responsible AI differ from its peers? Based on Definition 1.1 and the AI responsibility pyramid, we compare socially responsible AI with other similar concepts and present the results in Table 1.1. In comparison, socially responsible AI holds a societal view of AI. It subsumes existing concepts and considers both the fundamental responsibilities of AI systems (to be functional, legal, and ethical) and their philanthropic responsibilities (to benefit society).
In particular, existing concepts (e.g., ethical AI) tend to define what an AI system needs to follow so that it can be used harmlessly, whereas socially responsible AI seeks to ensure that the norms of a society are followed in the right manner throughout the entire AI life cycle. Socially responsible AI therefore encourages us to keep both the morals of a society and environmental targets in mind. Even though AI Ethics, trustworthy AI, and socially responsible AI are closely intertwined, our definition of socially responsible AI focuses more on an AI’s obligation to society. Ethics is typically perceived as a broader concept that encompasses obligations of AI researchers, engineers, shareholders, customers, and other stakeholders. “Trust” refers to a relationship from a trustor (the subject that trusts a target entity) to a trustee (the entity that is trusted) (Tang and Liu, 2015). Trustworthy AI therefore describes a one-directional relationship from users to AI. It is “we” who are responsible for socially responsible AI.

Table 1.1: Definitions of concepts similar to socially responsible AI.

Concept | Definition
Robust AI | AI systems with the ability “to cope with errors during execution and cope with erroneous input” (Wikipedia, 2021a).
Ethical AI | AI systems that do what is right, fair, and just, and that prevent harm.
Trustworthy AI | AI systems that are lawful, ethically adherent, and technically robust. Trust needs to be established in their development, deployment, and use (Thiebes et al., 2020).
Fair AI | AI systems free from “any prejudice or favoritism toward an individual or a group based on their inherent or acquired characteristics” (Mehrabi et al., 2021).
Safe AI | AI systems deployed in ways that do not harm humanity (Feige, 2019).
Dependable AI | AI systems that focus on reliability, verifiability, explainability, and security (Singh et al., 2021).
Human-centered AI | AI systems that are “continuously improving because of human input while providing an effective experience between human and robot”.6

1.4. Socially Responsible AI Algorithms
AI practitioners and researchers play a role that carries a number of responsibilities. The most obvious responsibility is developing accurate, reliable, and trustworthy algorithms that users can depend on. Yet this is not a trivial task. For example, because of various types of explicit and implicit human biases, e.g., confirmation bias, gender bias, and anchoring bias, AI practitioners and researchers often inadvertently inject these same kinds of biases into the algorithms they develop, especially when using machine learning techniques. Consider supervised machine learning, a common technique for learning and validating algorithms through manually annotated data, loss functions, and related evaluation metrics (Zafarani et al., 2014). A number of uncertainties (e.g., imbalanced data, ill-defined criteria for data annotation, oversimplified loss functions, and unexplainable results) potentially lurk in this “beautiful” pipeline and can eventually lead to negative consequences such as biases and discrimination. With the growing reliance on AI in almost every field of our society, we must bring to the fore the vital question of how to develop socially responsible AI algorithms. In this regard, we define socially responsible AI algorithms as follows:

Definition 1.2 (Socially Responsible AI Algorithms). Socially responsible AI algorithms are intelligent algorithms that treat the needs of all stakeholders as the highest priority, especially those of minoritized and disadvantaged users, in order to make just and trustworthy decisions. These obligations include protecting and informing users; preventing and mitigating negative impact; and maximizing long-term beneficial impact. Socially responsible AI algorithms constantly receive feedback from users to continually realize the expected social values.

In this definition, we highlight that the functional (e.g., maximizing profits) and societal (e.g., transparency) objectives are integral parts of socially responsible AI algorithms.

6 https://www.cognizant.com/glossary/human-centered-ai
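Imbalanced data combined with an oversimplified evaluation metric is one of the easiest of these pitfalls to see. The minimal Python sketch below assumes a hypothetical data set of 95 negative and 5 positive examples and a deliberately naive "model" that always predicts the majority label. The function names are illustrative only and do not come from the book or from any particular library.

```python
from typing import List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """Fraction of truly positive examples that the model labels positive."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total if total else 0.0


def majority_baseline(y_train: List[int], n: int) -> List[int]:
    """Hypothetical naive model: always predict the most common training label."""
    majority = max(set(y_train), key=y_train.count)
    return [majority] * n


# Hypothetical imbalanced data: 95 negatives (0) and 5 positives (1).
y = [0] * 95 + [1] * 5
y_hat = majority_baseline(y, len(y))

print(f"accuracy = {accuracy(y, y_hat):.2f}")  # prints "accuracy = 0.95"
print(f"recall   = {recall(y, y_hat):.2f}")    # prints "recall   = 0.00"
```

The headline accuracy of 95% hides the fact that every positive (minority) case is missed. Reporting per-class or per-group metrics such as recall alongside accuracy is one simple guard; the broader point is that the choice of metric is itself part of the formalization.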
1.5. What Could Go Wrong?
Socially responsible AI cannot be achieved automatically. Without conscientious effort, AI algorithms can be socially indifferent or irresponsible. Here, we walk through major factors that may lead AI down an undesired road: formalization, measuring errors, biases, privacy, and dependence versus causality (Getoor, 2019; Mehrabi et al., 2021).

1.5.1. Formalization
AI algorithms encompass data, labels, loss functions, and evaluation metrics. We unconsciously commit to some frame of reference in each of these formalizations. First, the social and historical contexts are often left out when transforming raw data into numerical feature vectors, so AI algorithms are trained on pre-processed data with important contextual information missing. Second, data annotation can be problematic for a number of reasons. For example, what are the annotation criteria? Who defines them? Who are the annotators? How can we ensure that they all follow the criteria? What we have for model training are only proxies of the true labels (Getoor, 2019). Third, ill-formulated loss functions can also result in socially irresponsible AI algorithms. Many task loss functions are oversimplified to focus solely on maximizing profits and minimizing losses. Concerns about unethical optimization have recently been discussed by Beale et al. (2019): unknown to AI systems, certain strategies in the optimization space that stakeholders consider unethical may be selected to satisfy the simplified task requirements. Lastly, the use of inappropriate benchmarks for evaluation may push algorithms away from the overarching goal of the task and fuel injustice.

1.5.2. Measuring Errors
Another factor is the way errors are used to measure algorithm performance. When reporting results, researchers typically claim that the proposed algorithms achieve a certain accuracy or F1 score. However, such claims assume that the training and test samples are representative of the target population and that their distributions are sufficiently similar. How often does this assumption hold in practice? As illustrated in Figure 1.2, with non-representative samples, the learned model can achieve zero training error and perform well on the test data at the initial stage. However, as more data are tested later, model performance deteriorates because the learned model does not represent the true model.
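The effect can be reproduced with a small, self-contained Python sketch under hypothetical assumptions: the "true model" is taken to be y = x², and both the training data and the first batch of test data are drawn only from x in [0, 1]. A straight line fitted by ordinary least squares looks accurate on both, but its error grows sharply once later data arrive from x in [2, 3]. The function names and data ranges are illustrative choices, not taken from Figure 1.2.

```python
from typing import List, Tuple


def true_model(x: float) -> float:
    """Hypothetical 'true' relationship between input and target."""
    return x * x


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def mse(model: Tuple[float, float], xs: List[float]) -> float:
    """Mean squared error of the fitted line against the true model on xs."""
    a, b = model
    return sum((a * x + b - true_model(x)) ** 2 for x in xs) / len(xs)


# Training data and the first test batch cover only a narrow slice of inputs...
train_x = [i / 10 for i in range(11)]              # x in [0, 1]
early_test_x = [0.05 + i / 10 for i in range(10)]  # also inside [0, 1]
# ...while data arriving later come from a region the sample never represented.
later_test_x = [2 + i / 10 for i in range(11)]     # x in [2, 3]

model = fit_line(train_x, [true_model(x) for x in train_x])
print(f"train MSE      = {mse(model, train_x):.3f}")
print(f"early test MSE = {mse(model, early_test_x):.3f}")
print(f"later test MSE = {mse(model, later_test_x):.3f}")
```

Unlike Figure 1.2, the training error here is small rather than exactly zero, but the lesson is the same: a low error measured on a non-representative sample says little about performance on the target population.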
Fig. 1.2: An example of measuring errors. The green line denotes the learned model and the blue one is the true model. “+” and “−” represent training data belonging to different classes; “X” represents testing data. Image taken from Getoor’s slides for the 2019 IEEE Big Data keynote (Getoor, 2019) with permission.
1.5.3. Biases
Bias is one of the most discussed topics in responsible AI. We focus here on data bias, automation bias, and algorithmic bias (Getoor, 2019).
Data Bias. Data, especially big data, are often heterogeneous, with high variability of types and formats, e.g., text, image, and video. The availability of multiple data sources brings unprecedented opportunities as well as undeniable challenges (Li et al., 2017). For instance, high-dimensional data such as text are notoriously prone to overfitting and the curse of dimensionality. It is also challenging to find a subset of features that are predictive but uncorrelated, and the number of samples required for generalization grows with the feature dimension. One example is how the US National Security Agency tried to use AI algorithms to identify potential terrorists. The Skynet project collected cellular network traffic in Pakistan and extracted 80 features for each cell phone user, with only seven known terrorists (Gershgorn, 2019). The algorithm ended up identifying an Al Jazeera reporter covering Al Qaeda as a potential terrorist. Data heterogeneity also violates the well-known i.i.d. assumption made by most learning algorithms (Li et al., 2017), so training these algorithms on heterogeneous data can produce undesired results. Imbalanced subgroups are another source of data bias. As illustrated in Mehrabi et al. (2021), a regression analysis based on the subgroups with balanced fitness
Defining Socially Responsible AI
levels suggests a positive correlation between BMI and daily pasta calorie intake, whereas one based on less balanced data shows almost no relationship.
Automation Bias. This type of bias refers to our preference for results suggested by automated decision-making systems while ignoring contradictory information. If we rely too heavily on automated systems and give the final decision no further thought, we may end up abdicating decision responsibility to AI algorithms.
Algorithmic Bias. Algorithmic bias refers to bias added purely by the algorithm itself (Baeza-Yates, 2018). Some algorithms are inadvertently taught prejudices and unethical biases by societal patterns hidden in the data. Typically, models fit better to features that appear frequently in the data. For example, an automatic AI recruiting tool will learn to make decisions about an applicant for a software engineer position using observed patterns such as “experience”, “programming skills”, “degree”, and “past projects”. For a position where the gender disparity is large, the algorithm mistakenly interprets this collective imbalance as a useful pattern in the data rather than as undesirable noise that should have been discarded. Algorithmic bias is a systematic and repeatable error in an AI system that creates discriminatory outcomes, e.g., privileging wealthy users over others. It can amplify, operationalize, and even legitimize institutional bias (Getoor, 2019).
1.5.4. Data Misuse
Data are the fuel and new currency that have empowered tremendous progress in AI research. Search engines rely on data to craft precisely personalized recommendations that improve consumers' online experience, including online shopping, book recommendations, entertainment, and so on. However, users' data are frequently misused without their consent or awareness. One example is the Meta–Cambridge Analytica scandal (Wikipedia, 2021b), in which the personal data of millions of Meta users were collected by Cambridge Analytica without their consent. In a recent study (Cabañas et al., 2020), researchers show that Meta allows advertisers to exploit its users' sensitive information for tailored ad campaigns. To make
Fig. 1.3 (diagram: Weather, Electric Bill, Ice Cream Sales): Confounders are common reasons for spurious dependence between two variables that are not causally connected.
things worse, users often have no clue about where, how, why, and by whom their data are being used. This lack of knowledge and choice over their data causes users to undervalue their personal data and further creates issues such as privacy violations and distrust.
1.5.5. Dependence versus Causality
AI algorithms can become socially indifferent when statistical dependence is misinterpreted as causality. For example, in the diagram in Figure 1.3, we observe a strong dependence between the electric bill of an ice cream shop and its ice cream sales. Clearly, a high electric bill does not cause ice cream sales to increase. Rather, weather is the common cause of both: high temperatures lead to a high electric bill and to increased ice cream sales. Weather, the confounder, creates a spurious dependence between the electric bill and ice cream sales. Causality is a generic relationship between the cause and the outcome (Guo et al., 2020). While statistical dependence helps with prediction, causality is important for decision-making. A typical example is Simpson's Paradox (Blyth, 1972), a phenomenon in which a trend or association observed in subgroups reverses when these subgroups are aggregated. For instance, in the study of sex bias in graduate admissions at UC Berkeley (Bickel et al., 1975), the admission rate was found to be higher for male applicants when the entire data set was used. However, when the admission data were separated and analyzed by department, female candidates had equal or even higher admission rates than male candidates.
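The reversal behind Simpson's Paradox can be reproduced with a few lines of Python. The counts below are hypothetical, chosen only to exhibit the paradox; they are not the Berkeley data analyzed by Bickel et al. (1975), and the helper names are our own.

```python
# Hypothetical admission counts as (applicants, admitted); NOT the real Berkeley data.
DATA = {
    "Dept A": {"male": (80, 60), "female": (20, 16)},
    "Dept B": {"male": (20, 4), "female": (80, 20)},
}


def per_department_rates(data):
    """Admission rate of each group within each department."""
    return {
        dept: {g: admitted / applied for g, (applied, admitted) in groups.items()}
        for dept, groups in data.items()
    }


def aggregate_rates(data):
    """Admission rate of each group after pooling all departments."""
    totals = {}
    for groups in data.values():
        for g, (applied, admitted) in groups.items():
            prev_applied, prev_admitted = totals.get(g, (0, 0))
            totals[g] = (prev_applied + applied, prev_admitted + admitted)
    return {g: admitted / applied for g, (applied, admitted) in totals.items()}


print(per_department_rates(DATA))
# {'Dept A': {'male': 0.75, 'female': 0.8}, 'Dept B': {'male': 0.2, 'female': 0.25}}
print(aggregate_rates(DATA))
# {'male': 0.64, 'female': 0.36}
```

Women have the higher rate in every department, yet men have the higher pooled rate, because in these made-up counts most women applied to the department with the lower admission rate.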
1.6. Concluding Remarks
1.6.1. Summary
In this chapter, we defined Socially Responsible AI and its algorithmic solutions driven by human values such as fairness, transparency, and reliability, and discussed AI's responsibilities from four aspects: function, legalism, ethics, and philanthropy. Given the recent burst of attention to several related topics such as ethical AI and trustworthy AI, we also compared these similar concepts.7 Because they are critical to designing algorithmic solutions for socially responsible AI, we also discussed the major factors that can make AI behave “badly”. The main takeaway message is that socially responsible AI is a complex and multi-faceted concept that no single person, discipline, or field is equipped to understand and represent. With the growing interest in “AI”, a necessary first step is to provide a language that enables discussion of socially responsible AI and demystifies AI's capabilities, responsibilities, and social implications. There surely remain other significant issues and trends that will not be discussed in this book, for example, deciding who should have a seat at the table when AI systems are being designed, and examining ethical issues in the deployment of AI systems. Answering these questions requires knowledge from various disciplines and requires that all of us, from developers to policy-makers and from end-users to bystanders, participate in the discussion and contribute to socially responsible AI.
1.6.2. Additional Readings
For further reading related to socially responsible AI in general, we recommend the following.
Surveys:
• Cheng, L., Varshney, K. R., & Liu, H. (2021). Socially responsible AI algorithms: Issues, purposes, and challenges. Journal of Artificial Intelligence Research, 71, 1137–1181.
7 Like socially responsible AI, these related concepts are not well defined. The comparisons are based on only one of their many definitions; therefore, the results may contain our own implicit biases.
• Kaur, D., Uslu, S., Rittichier, K. J., & Durresi, A. (2022). Trustworthy artificial intelligence: A review. ACM Computing Surveys (CSUR), 55(2), 1–38.
Keynotes:
• Responsible Data Science by Lisa Getoor, 2019 IEEE Big Data. https://users.soe.ucsc.edu/~getoor/Talks/IEEE-Big-DataKeynote-2019.pdf
• Ethics Challenges in AI by Ricardo Baeza-Yates, 2022 WSDM.
Tutorials:
• Responsible AI in Industry: Practical Challenges and Lessons Learned by Kenthapadi et al. 2021, ICML. https://sites.google.com/view/ResponsibleAITutorial
• Socially Responsible AI for Data Mining: Theories and Practice, Cheng et al. 2022, SDM. https://docs.google.com/presentation/d/1oGA51wwiOkN2FP0sMQexZ2d 3LXk6y4T902-tnQXOvc/edit?usp=sharing
Articles:
• Bird, S. J. (2014). Socially responsible science is more than “good science”. Journal of Microbiology & Biology Education, 15(2), 169–172.
• Borenstein, J., Grodzinsky, F. S., Howard, A., Miller, K. W., & Wolf, M. J. (2021). AI ethics: A long history and a recent burst of attention. Computer, 54(1), 96–102.
• Shneiderman, B. (2020). Bridging the gap between ethics and practice: guidelines for reliable, safe, and trustworthy human-centered AI systems. ACM Transactions on Interactive Intelligent Systems (TiiS), 10(4), 1–31.
Books:
• O'Neil, C. (2016). Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Broadway Books.
• Christian, B. (2020). The Alignment Problem: Machine Learning and Human Values. W. W. Norton & Company.
• Varshney, K. R. (2022). Trustworthy Machine Learning. Chappaqua, NY, USA: Independently Published.
Chapter 2
Theories in Socially Responsible AI
Theories in socially responsible AI closely follow the five principles we mentioned earlier. These principles often appear in ethics guidelines from different organizations:
• privacy,
• fairness and justice,
• safety and reliability,
• transparency, and
• beneficence.
Note that we do not consider these principles perfect or applicable to all possible organizations and sectors (e.g., governments and private sectors). Over the last several years, numerous ethics principles for AI and machine learning have been proposed by different groups from different sectors and different parts of the world. However, organizations in more economically developed countries have been more active than those in less economically developed countries. Therefore, these five commonly adopted principles may be framed entirely on the basis of Western philosophy. This book serves as an example of socially responsible AI that simply adopts these mainstream principles. Interested readers can refer to Varshney (2022) for a detailed discussion of the differences and similarities of ethics principles across various sectors. Regardless, the lack of tangible ways to measure these principles may prevent these initiatives from achieving their potential. In this chapter, we therefore look into ways of operationalizing some of these AI ethics principles, including fairness, privacy, reliability
(generalizability in particular), and transparency (interpretability in particular). For more in-depth discussions on any of these topics, readers can refer to existing comprehensive surveys such as those listed in Section 2.5.2.
2.1. Fairness
Fairness in AI has gained substantial attention in both research and industry since 2010. For decades, researchers have found it challenging to present a unified definition of fairness, in part because fairness is a societal and ethical concept. Fairness can be subjective; it changes with social context and evolves over time, making it a challenging goal to achieve in practice. Despite its ubiquity and importance, there is no globally accepted definition of fairness. Because socially responsible AI intensively involves decision-making processes commensurate with social values, we adopt the following definition of fairness in the context of decision-making:
Definition 2.1 (Fairness). “Fairness is the absence of any prejudice or favoritism toward an individual or a group based on their inherent or acquired characteristics” (Mehrabi et al., 2021).
Note that even an AI system that is ideally “fair” in a specific context might still lead to biased decisions, as the decision-making process can involve various other elements. While the concept of fairness is difficult to pin down, unfairness, bias, or discrimination might be easier to identify. There are six types of discrimination (Mehrabi et al., 2021). Direct discrimination results from protected attributes of individuals, while indirect discrimination arises from seemingly neutral and non-protected attributes. Systemic discrimination relates to policies that may discriminate against subgroups of the overall population. Statistical discrimination occurs when decision makers use average statistics to represent individuals. Depending on whether the differences among groups can be justified, we further have explainable and unexplainable discrimination.
2.1.1. Different Fairness Notions
There are two main types of fairness notions in AI: (1) group fairness and (2) individual fairness. Group fairness, simply put, is to
treat different groups equally (Dwork et al., 2012): the performance of an algorithm should be the same across different groups. Individual fairness requires similar predictions for similar individuals (Dwork et al., 2012): individuals with similar features should receive similar model predictions. We next revisit some of the commonly used definitions in each category.
2.1.1.1. Group fairness
We first define the basic notation: $A$ denotes a set of protected attributes; the protected group is represented by $S$ and the other group by $T$. $X$ denotes the rest of the observable attributes, $Y$ the outcome to be predicted, and $\hat{Y}$ the predictor, which depends on $A$ and $X$. Group fairness therefore compares members of $T$ and members of $S$ on average.
Demographic/Statistical Parity. This fairness notion is closely related to the key concept of disparate impact in unwanted discrimination: practices that adversely affect $S$ more than $T$, irrespective of the decision maker's intent and the decision-making procedure. Formally, a predictor $\hat{Y}$ satisfies demographic/statistical parity if the likelihood of a positive outcome is the same regardless of whether the individual is from $S$ or $T$, that is, $\hat{Y} \perp\!\!\!\perp A$ (independence):
$$P(\hat{Y} \mid A = S) = P(\hat{Y} \mid A = T). \tag{2.1}$$
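As a sketch, the selection rates in Eq. (2.1) can be estimated from binary predictions. The arrays below reproduce the counts of the recruitment example in Figure 2.1 (4 of 18 women and 6 of 27 men recruited); NumPy is assumed, and the function names are hypothetical.

```python
import numpy as np


def selection_rate(y_pred, group, g):
    """Estimate P(Y_hat = 1 | A = g) from binary predictions."""
    return y_pred[group == g].mean()


def statistical_parity_difference(y_pred, group, s, t):
    """P(Y_hat = 1 | A = S) - P(Y_hat = 1 | A = T); 0 means parity holds."""
    return selection_rate(y_pred, group, s) - selection_rate(y_pred, group, t)


# Counts from Figure 2.1: 18 women (S), 4 recruited; 27 men (T), 6 recruited.
group = np.array(["female"] * 18 + ["male"] * 27)
y_pred = np.array([1] * 4 + [0] * 14 + [1] * 6 + [0] * 21)

print(selection_rate(y_pred, group, "female"))  # estimates 4/18 = 2/9
print(selection_rate(y_pred, group, "male"))    # estimates 6/27 = 2/9
print(statistical_parity_difference(y_pred, group, "female", "male"))
```

A difference of zero says only that the selection rates match; it says nothing about whether the recruited people are qualified.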
In recruitment, this means that the fraction of people in $S$ being recruited (i.e., the selection rate) is the same as in $T$. Suppose that men are the majority ($T$) and women are the minority ($S$) in a hiring process. Figure 2.1 shows an example calculation of statistical parity: $P(\text{recruit} = 1 \mid S) = 4/18$ and $P(\text{recruit} = 1 \mid T) = 6/27$, so $P(\text{recruit} = 1 \mid S) = P(\text{recruit} = 1 \mid T) = 2/9$. You might have already noticed that in Figure 2.1, although statistical parity is satisfied, most of the recruited people in the minority group are actually unqualified, which discriminates against the qualified women. In the long run, this can exacerbate the disparity between majority and minority groups, as the people recruited from $T$ outperform those recruited from $S$. This suggests that statistical parity is not a strong fairness notion and can be undesirable in certain settings.
Equalized Odds. A stronger fairness notion is equalized odds, defined as follows: a predictor $\hat{Y}$ satisfies equalized odds if $\hat{Y}$ and $A$
Fig. 2.1: Statistical parity in recruitment. Here, we assume men are the majority and women are the minority.
Fig. 2.2: Illustration of separation $\hat{Y} \perp\!\!\!\perp A \mid Y$ in a set of various Bayesian networks.
are independent conditional on $Y$, i.e., $\hat{Y} \perp\!\!\!\perp A \mid Y$. It measures the separation of the prediction $\hat{Y}$ and the protected attribute $A$ by the true label $Y$, which can be captured by any of the three Bayesian networks in Figure 2.2. Formally, equalized odds is defined as
$$P(\hat{Y} = 1 \mid A = S, Y = y) = P(\hat{Y} = 1 \mid A = T, Y = y), \quad y \in \{0, 1\}. \tag{2.2}$$
When $Y$ is binary (i.e., positive and negative classes), Eq. (2.2) describes a fairness metric based on model performance rather than simply on the selection rate. In particular, it involves the two quantities that define the Receiver Operating Characteristic (ROC) curve: the true positive rate (TPR, i.e., the probability of an individual in the positive class being correctly assigned a positive outcome) and the false positive rate (FPR, i.e., the probability
Fig. 2.3: Equalized odds in the recruitment example.
of a person in the negative class being incorrectly assigned a positive outcome). Equalized odds states that individuals in $S$ and $T$ should have equal true positive rates and equal false positive rates. This also means that the prediction provides no information about $A$ beyond what $Y$ already does. Using the same recruitment problem, we show an example calculation of equalized odds in Figure 2.3.
Equal Opportunity. One can also relax equalized odds and consider only the TPR. That is, we treat $Y = 1$ as the “preferred” outcome, e.g., people who are qualified ought to have an equal opportunity of being recruited in the first place. Formally,
$$P(\hat{Y} = 1 \mid A = S, Y = 1) = P(\hat{Y} = 1 \mid A = T, Y = 1). \tag{2.3}$$
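Both notions reduce to comparing per-group error rates. The sketch below estimates the FPR and TPR within each group and reports the gaps used by Eq. (2.2) and Eq. (2.3). The toy labels and predictions are hypothetical (they are not the data behind Figures 2.3 and 2.4), and the function names are our own.

```python
import numpy as np


def group_rates(y_true, y_pred, group, g):
    """Return (FPR, TPR) of a binary predictor within group g."""
    m = group == g
    yt, yp = y_true[m], y_pred[m]
    fpr = yp[yt == 0].mean()  # P(Y_hat = 1 | A = g, Y = 0)
    tpr = yp[yt == 1].mean()  # P(Y_hat = 1 | A = g, Y = 1)
    return fpr, tpr


def equalized_odds_gaps(y_true, y_pred, group, s, t):
    """Absolute FPR and TPR differences between groups s and t (Eq. 2.2)."""
    fpr_s, tpr_s = group_rates(y_true, y_pred, group, s)
    fpr_t, tpr_t = group_rates(y_true, y_pred, group, t)
    return float(abs(fpr_s - fpr_t)), float(abs(tpr_s - tpr_t))


def equal_opportunity_gap(y_true, y_pred, group, s, t):
    """Absolute TPR difference only (Eq. 2.3)."""
    return equalized_odds_gaps(y_true, y_pred, group, s, t)[1]


# Hypothetical toy data: 8 applicants per group, 4 qualified (Y = 1) in each.
group = np.array(["S"] * 8 + ["T"] * 8)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0] * 2)
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0,    # S: TPR = 0.5,  FPR = 0.25
                   1, 1, 1, 0, 1, 0, 0, 0])   # T: TPR = 0.75, FPR = 0.25

print(equalized_odds_gaps(y_true, y_pred, group, "S", "T"))    # (0.0, 0.25)
print(equal_opportunity_gap(y_true, y_pred, group, "S", "T"))  # 0.25
```

Here the FPRs match but the TPRs differ, so this toy predictor violates both equalized odds and equal opportunity.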
An example calculation of equal opportunity is shown in Figure 2.4.
Calibration. When the predicted output is a continuous risk score $R = r(\hat{Y})$, e.g., the probability of defaulting in loan applications, one may use the fairness notion of calibration by group, or sufficiency. It requires that outcomes be independent of $A$ after controlling for the estimated risk, and that calibration hold in both $S$ and $T$. For example, in bank loan applications, applicants who receive an estimated 10% chance of default should actually default at a rate of about 10% in both the white and the Black groups. Formally,
$$P(Y = 1 \mid R = r, A = S) = P(Y = 1 \mid R = r, A = T) = r. \tag{2.4}$$
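A simple empirical check of Eq. (2.4) bins the risk scores and compares, within each group, the average score in a bin with the observed fraction of positive outcomes. The sketch below uses synthetic scores and outcomes drawn so that $P(Y = 1 \mid R = r) = r$ in both groups; the equal-width binning, the data, and the function name are assumptions for illustration.

```python
import numpy as np


def calibration_by_group(scores, y_true, group, bins=5):
    """For each group, list (mean score, observed positive rate) per score bin."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    result = {}
    for g in np.unique(group):
        m = group == g
        s, y = scores[m], y_true[m]
        # Bin index in 0..bins-1; clip keeps a score of exactly 1.0 in the last bin.
        idx = np.clip(np.digitize(s, edges) - 1, 0, bins - 1)
        rows = []
        for b in range(bins):
            in_bin = idx == b
            if in_bin.any():
                rows.append((float(s[in_bin].mean()), float(y[in_bin].mean())))
        result[str(g)] = rows
    return result


rng = np.random.default_rng(0)
n = 20_000
group = rng.choice(["S", "T"], size=n)
scores = rng.uniform(0.0, 1.0, size=n)
# Synthetic outcomes with P(Y = 1 | R = r) = r, so both groups are calibrated.
y_true = (rng.uniform(0.0, 1.0, size=n) < scores).astype(int)

table = calibration_by_group(scores, y_true, group)
for g, rows in table.items():
    print(g, [(round(r, 2), round(o, 2)) for r, o in rows])
```

For a miscalibrated group, the two numbers in a bin would diverge; a gap that appears in one group but not in the other signals a violation of calibration by group.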
Fig. 2.4: Equal opportunity in the recruitment example.
Fig. 2.5: Illustration of sufficiency or calibration by group $Y \perp\!\!\!\perp A \mid \hat{Y}$ in a set of various Bayesian networks.
Similarly, we can use a set of Bayesian networks to describe calibration, as shown in Figure 2.5. Comparing Figure 2.2 with Figure 2.5, one may observe that separation and sufficiency mirror each other, with $Y$ and $\hat{Y}$ swapped. So, does this imply that these two fairness notions are mutually exclusive? When the base rates of $Y$ differ between $S$ and $T$, the answer is “yes”, except for a perfect classifier.
2.1.1.2. Individual fairness
Whereas group fairness considers one protected attribute, one can think of individual fairness as adding more protected attributes until all the “groups” become individuals who share all of their feature values. Individual fairness (also referred to as consistency) requires that individuals with the same feature values receive the same predicted labels. Let $N_k(x_j)$ denote the $k$ nearest neighbors of individual $j$, described by features $x_j$; consistency is then quantified for $n$ individuals
as follows:
$$\text{consistency} = 1 - \frac{1}{n} \sum_{j=1}^{n} \left| \hat{y}_j - \frac{1}{k} \sum_{i \in N_k(x_j)} \hat{y}_i \right|. \tag{2.5}$$
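Eq. (2.5) can be computed by brute force for small datasets. The sketch assumes Euclidean distance on the observed features and excludes each individual from its own neighborhood. The text leaves both choices open, and choosing the distance is exactly the defining challenge of individual fairness. The function name and toy data are hypothetical.

```python
import numpy as np


def consistency(X, y_pred, k=3):
    """Eq. (2.5): 1 - mean_j |y_hat_j - mean of y_hat over the k nearest neighbours of j|."""
    X = np.asarray(X, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Pairwise Euclidean distances (O(n^2) memory; fine for small n).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude j from its own neighbourhood
    neighbours = np.argsort(dist, axis=1)[:, :k]
    neighbour_mean = y_pred[neighbours].mean(axis=1)
    return 1.0 - np.abs(y_pred - neighbour_mean).mean()


# Two well-separated clusters of four individuals each (hypothetical data).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
same_within_cluster = [0, 0, 0, 0, 1, 1, 1, 1]
one_flipped = [1, 0, 0, 0, 1, 1, 1, 1]

print(consistency(X, same_within_cluster))  # 1.0
print(consistency(X, one_flipped))          # about 0.75
```

Flipping a single prediction lowers consistency for that individual and for each of its three neighbors, which is why the score drops to roughly 0.75.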
Larger consistency is better: the larger it is, the more similar the predicted outcome of $j$ is to those of its neighbors. The second term in Eq. (2.5) measures the average difference between the predicted output of $j$ and the average prediction of its $k$ nearest neighbors. If individual fairness is satisfied, the second term becomes 0 and consistency $= 1$; otherwise, consistency $< 1$. The defining challenge in individual fairness is the distance metric used to quantify the “similarity” between two individuals. What kind of distance should be used? Should we include protected attributes? Can we directly use the observed features regardless of potential measurement bias? In the following, we give three individual fairness notions commonly seen in the field.
Fairness through unawareness. This naïve method directly removes the protected attributes. The hypothesis is that an algorithm is fair as long as no protected attributes are explicitly used in the decision-making process (Grgic-Hlaca et al., 2016). Clearly, this is a very weak fairness notion, as it ignores features that are correlated with $A$. For instance, zip code is highly correlated with race in some cases.
Fairness through awareness. This notion defines fairness as “giving similar predictions to similar individuals” (Dwork et al., 2012). Therefore, any two individuals who are similar w.r.t. a similarity metric (inverse distance) defined for a certain task should receive a similar outcome.
Counterfactual fairness. One special case of individual fairness is when two individuals have exactly the same set of features except for the protected features. In the previous recruitment problem, this means a female applicant and a male applicant who have identical application materials (e.g., resume and age) and differ only in gender.
For counterfactual fairness (Kusner et al., 2017), the two applicants should receive the same predicted label: either both get the job or neither does. Of course, such a pair exists only in an imagined world, since it is impossible to intervene to change an applicant's gender. In this causal perspective of fairness, protected attributes are
the treatment and the predicted label is the outcome. Counterfactual fairness implies that a decision is fair if it is the same in both “the actual world” and “a counterfactual world” in which, e.g., the individual belongs to a different group. Since some background in causal inference helps in understanding the details, we will revisit this concept in Chapter 4.
2.1.2. Mitigating Unwanted Bias
“Fairness through unawareness” suggests simply excluding protected attributes from the features (out of sight, out of mind). This suppression is clearly problematic, as other features may be statistically dependent on the protected attributes. Approaches to bias mitigation either design fair AI algorithms or theorize about the social and ethical aspects of machine learning discrimination (Caton and Haas, 2020). The remainder of this section discusses three types of interventions for bias mitigation: pre-processing (prior to modeling), in-processing (at the point of modeling), and post-processing (after modeling). All three types of interventions assume that the protected attributes are predefined. If these attributes are not available (e.g., due to privacy), the fairness problem can be approached from a robustness perspective. One condition for using pre-processing approaches is that the algorithm is allowed to modify the statistics of the training data; the unwanted bias is then removed through data transformation. In-processing approaches eliminate bias by modifying algorithms during the training process, often by either incorporating a fairness notion into the objective function or imposing a fairness constraint. When neither the training data nor the model can be modified, we can use post-processing approaches to reassign the predicted labels based on a defined function and a holdout set that is not used in the model training phase.
2.1.2.1. Pre-processing approaches
Pre-processing mitigates biases in the data, which are a result of design decisions made by the data curator. There are, in general, three common pre-processing approaches: (1) data augmentation, (2) sample reweighting, and (3) label altering. Most of these approaches hold a
“we’re all equal” worldview1, as they have no access to model training. Data augmentation simply augments the original data with additional or synthetic data generated from it, for example, by replacing the original protected attribute value with another value (as in counterfactual fairness in Section 2.1.1.2). To maintain the fidelity of the data distribution, the augmented data are added sequentially, and samples close to the modes of the original dataset have the highest priority. Another way to pre-process is to assign weights to the training samples to improve statistical parity. Reweighting compares the joint probability expected if $Y$ and $A$ were independent, $P_Y(y_j)P_A(a_j)$, with the observed joint probability $P_{Y,A}(y_j, a_j)$, where $P_Y(y_j)$ and $P_A(a_j)$ denote the marginal probabilities of $Y$ and $A$, and $P_{Y,A}(y_j, a_j)$ denotes their joint probability. The weight is therefore defined as
$$w_j = \frac{P_Y(y_j)\,P_A(a_j)}{P_{Y,A}(y_j, a_j)}. \tag{2.6}$$
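Eq. (2.6) can be estimated from simple counts. The sketch below computes the weights and then checks that, in the reweighted data, the fraction of positive labels is the same in both groups. The toy labels and the function names are hypothetical; how the weights are passed to a learner (e.g., as per-sample weights) depends on the library and is not shown.

```python
from collections import Counter

import numpy as np


def reweighing_weights(y, a):
    """Eq. (2.6): w_j = P_Y(y_j) P_A(a_j) / P_{Y,A}(y_j, a_j), estimated by counts."""
    n = len(y)
    count_y, count_a = Counter(y), Counter(a)
    count_ya = Counter(zip(y, a))
    return np.array([
        (count_y[yj] / n) * (count_a[aj] / n) / (count_ya[(yj, aj)] / n)
        for yj, aj in zip(y, a)
    ])


def weighted_positive_rate(y, a, w, g):
    """Weighted fraction of positive labels among samples with A = g."""
    y, a, w = np.asarray(y), np.asarray(a), np.asarray(w)
    m = a == g
    return (w[m] * y[m]).sum() / w[m].sum()


# Hypothetical labels: group T has 3/4 positives, group S has 1/4.
y = [1, 1, 1, 0, 1, 0, 0, 0]
a = ["T", "T", "T", "T", "S", "S", "S", "S"]
w = reweighing_weights(y, a)
print(np.round(w, 3))  # positives in T and negatives in S are down-weighted
print(weighted_positive_rate(y, a, w, "T"), weighted_positive_rate(y, a, w, "S"))
```

Over-represented (label, group) combinations receive weights below 1 and under-represented ones weights above 1, so the weighted label distribution no longer depends on $A$.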
Note that $w_j = 1$ for all samples if $A$ and $Y$ are already independent. Two distinct propositions are available for label altering, or massaging the data: (1) relabel the outcome, correcting the identified unfair decisions by changing the outcome to what ought to have happened, and (2) relabel the protected attribute value. The second proposition is often not recommended, because considering such protected attributes may be illegal, or because they can provide critical information that would help those affected by the algorithm (Nielsen, 2020). As an example of relabeling the outcome, suppose that students from racial group $S$ have historically been disfavored in university admissions. We then find the admitted students from racial group $T$ who are most similar to the students from group $S$ who had been denied entry, and give them the same outcome as those students. In practice, the chosen samples are those closest to the decision boundary, i.e., those with low confidence. These three approaches assume that we already have the data in hand. To mitigate data bias at the source, one might adopt better practices while collecting the data, such as improving the transparency of the collection process, increasing the diversity of the data annotators, and providing proper training. When using the data, we can create datasheets, supporting documents that record the dataset creation method, its motivations, and so on (Gebru et al., 2021).
1 “Within a given construct space, all groups are essentially the same” (Friedler et al., 2021).
2.1.2.2. In-processing approaches
In-processing focuses on fairness interventions during model training, that is, on modifying the loss function of a traditional AI algorithm. One common technique adds a term to the overall loss function as a fairness regularization or constraint. The goal of regularization is to make a model more regular and generalizable than it would be if trained with a simpler loss function; the idea is not to allow any particular inputs to become unduly important. Given a loss function $L$ in supervised learning (e.g., cross-entropy loss), the regularization/constraint $C$ (e.g., statistical parity difference), and the algorithm $f \in \mathcal{F}$ (e.g., logistic regression), the final objective is defined as follows:
$$\hat{y}(\cdot) = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{j=1}^{n} L\big(y_j, f(x_j)\big) + \lambda C(f). \tag{2.7}$$
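The following sketch instantiates Eq. (2.7) with logistic regression and mean cross-entropy as $L$. As our own assumption, it uses the squared difference in average predicted probability between $S$ and $T$ as a differentiable surrogate for the statistical parity difference $C(f)$. It runs plain gradient descent on synthetic data in which one feature is a proxy correlated with group membership; the data, step size, and function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def parity_gap(w, X, s_mask):
    """Soft statistical parity difference: mean p over S minus mean p over T."""
    p = sigmoid(X @ w)
    return p[s_mask].mean() - p[~s_mask].mean()


def fit_fair_logreg(X, y, s_mask, lam=0.0, lr=0.5, steps=3000):
    """Minimise mean cross-entropy + lam * parity_gap**2 by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad_loss = X.T @ (p - y) / len(y)   # gradient of mean cross-entropy
        dp = (p * (1.0 - p))[:, None] * X     # d p_j / d w for every sample
        gap = p[s_mask].mean() - p[~s_mask].mean()
        grad_gap = dp[s_mask].mean(axis=0) - dp[~s_mask].mean(axis=0)
        w -= lr * (grad_loss + lam * 2.0 * gap * grad_gap)
    return w


# Synthetic data: x1 is a proxy correlated with membership in S.
rng = np.random.default_rng(0)
n = 2000
s_mask = rng.integers(0, 2, size=n) == 1
x1 = rng.normal(loc=s_mask.astype(float), scale=1.0)
x2 = rng.normal(size=n)
y = (x1 + x2 + rng.normal(scale=0.5, size=n) > 0.5).astype(float)
X = np.column_stack([np.ones(n), x1, x2])  # intercept, proxy, other feature

w_plain = fit_fair_logreg(X, y, s_mask, lam=0.0)
w_fair = fit_fair_logreg(X, y, s_mask, lam=5.0)
print("gap without penalty:", parity_gap(w_plain, X, s_mask))
print("gap with penalty:   ", parity_gap(w_fair, X, s_mask))
```

Raising $\lambda$ trades some predictive fit for a smaller gap between groups; with $\lambda = 0$ the procedure reduces to ordinary logistic regression.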
In Eq. (2.7), $\lambda$ balances the prediction loss against the fairness penalty, which can come from the fairness notions introduced in Section 2.1.1. We next introduce one of the first works using fairness regularization, followed by another genre of in-processing approaches.
Prejudice Remover. Fairness regularization was first discussed in Fairness Aware Classifier with Prejudice Remover Regularizer by Kamishima et al. (2012). Slightly different from Eq. (2.7), the loss function includes both a traditional regularizer, which penalizes model complexity (i.e., it grows as the coefficients become larger), and a fairness regularizer:
$$-\sum_{(y_j, x_j, a_j) \in D} \ln M[y_j \mid x_j, a_j; \Theta] + \lambda C_{PR}(D, \Theta) + \frac{\eta}{2} \sum_{a \in A} \lVert w_a \rVert_2^2, \tag{2.8}$$
where $D$ denotes the sample set, $\Theta$ denotes the model parameters, and $w_a$ denotes the weights of the protected attributes. In a classification problem, the first term is the traditional logistic regression loss (the negative log-likelihood), with
$$M[y \mid x, a; \Theta] = y\,\sigma(x^\top w) + (1 - y)\big(1 - \sigma(x^\top w)\big), \tag{2.9}$$
where $\sigma(\cdot)$ is the sigmoid function. The second term is the fairness regularizer aimed at removing prejudice; it is defined as follows:
$$C_{PR}(D, \Theta) = \sum_{(x_j, y_j) \in D} \sum_{y \in \{0, 1\}} M[y \mid x_j, a_j; \Theta] \ln \frac{\hat{P}[y \mid a_j]}{\hat{P}[y]}, \tag{2.10}$$
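For a binary outcome, the regularizer in Eq. (2.10) can be evaluated from a model's predicted probabilities. Following the simplification the text describes, the sketch replaces $\hat{P}[y \mid a]$ and $\hat{P}[y]$ with sample means of the model outputs. The function name and toy inputs are hypothetical.

```python
import numpy as np


def prejudice_index(p1, a):
    """C_PR of Eq. (2.10) for binary y.

    p1[j] is M[y = 1 | x_j, a_j; Theta], the model's probability of y = 1.
    P_hat[y | a] and P_hat[y] are estimated as sample means of the model outputs.
    """
    p1 = np.asarray(p1, dtype=float)
    a = np.asarray(a)
    probs = np.column_stack([1.0 - p1, p1])  # column y holds M[y | x_j, a_j]
    p_y = probs.mean(axis=0)                 # P_hat[y]
    total = 0.0
    for g in np.unique(a):
        m = a == g
        p_y_given_a = probs[m].mean(axis=0)  # P_hat[y | a = g]
        total += float((probs[m] * np.log(p_y_given_a / p_y)).sum())
    return total


a = np.array([0] * 50 + [1] * 50)
independent = np.full(100, 0.5)                # outputs ignore the protected attribute
dependent = np.array([0.2] * 50 + [0.8] * 50)  # outputs track the protected attribute
print(prejudice_index(independent, a))  # 0.0
print(prejudice_index(dependent, a))    # positive
```

Dividing the result by the number of samples gives a plug-in estimate of the mutual information between $Y$ and $A$, so the regularizer grows as the outputs become more predictable from the protected attribute.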
Here, $\hat{P}$ denotes the estimated probability. Equation (2.10) measures the mutual information2 between the outcome $Y$ and the protected attribute $A$. The value of this regularizer increases when a class is mainly determined by the protected attribute. When the domain of $X$ is large, computing $\hat{P}[y \mid a_j]$ and $\hat{P}[y]$ can be costly, so they are replaced with sample means. The third term, weighted by the parameter $\eta$, has both fairness and regularization implications, as it also encourages the weights of the protected attributes to be small. The above method shows that regularization might actually suggest a potential attribute of fairness: it is suspicious if any single indicator has a strong influence on an outcome in an otherwise high-dimensional dataset (Nielsen, 2020). For example, compared to the single criterion (e.g., a single college entrance exam) used for university admission in regions such as Asia and Europe, university admission decisions in the US purport to be more complex and holistic (e.g., considering GPA and SAT score). Under this notion, fairness depends not only on the context but also on culture and geography.
Adversarial Debiasing. The idea of adversarial debiasing comes from generative adversarial networks (GANs) (Goodfellow et al., 2014): given a generator that generates new examples and a discriminator that tries to distinguish real examples from fake ones, the goal of GANs is to train these two modules together in
2 Let $(X, Y)$ be a pair of random variables. The mutual information is defined as $I(X, Y) = D_{KL}(P_{X,Y} \,\|\, P_X P_Y)$, where $D_{KL}$ is the Kullback–Leibler divergence. When $X$ and $Y$ are discrete variables, $I(X, Y) = \sum_{y} \sum_{x} P_{X,Y}(x, y) \log \frac{P_{X,Y}(x, y)}{P_X(x) P_Y(y)}$.
a zero-sum game (thus adversarial) until the discriminator is fooled about 50% of the time, that is, until the generator produces plausible examples. Simply put, if one can train one model to perform a task, one can train another model to outperform it on some measure related to the task; both models eventually improve through this competition. Similarly, in adversarial debiasing, the goal is to train a predictor with high accuracy while fooling the adversary so that it cannot properly predict the protected attribute values from the predictor's output. Ideally, the predicted output should not give any information about the protected attribute values. A visualization of this process can be seen in Figure 2.6. To make it more effective, some tricks of the trade are necessary, as introduced by Zhang et al. (2018a). The challenge of adversarial debiasing is designing the predictor $W$ and the adversary $U$ such that they are in fact trained adversarially, pursuing conflicting goals. For the predictor, with loss $L_P(y, \hat{y})$, we can use a simple logistic regression or more advanced deep neural networks (DNNs). The output layer of the predictor (e.g., the output of the softmax layer, not the discrete predictions) is then fed into the adversary model, which has loss $L_A(\hat{a}, a)$. Depending on the fairness notion, the adversary takes the following inputs:
• For demographic parity, the inputs include the predicted label $\hat{Y}$.
• For equalized odds, the inputs include $\hat{Y}$ and $Y$.
• For equal opportunity on a given class $y$, the inputs include $\hat{Y}$ of the training examples with $Y = y$.
Fig. 2.6: Adversarial debiasing procedure: two neural networks representing the predictor and the adversary, respectively. We minimize the prediction loss while maximizing the adversary detection loss.
The adversary $U$ is updated by minimizing $L_A$ using the gradient $\nabla_U L_A$. The trick is to compute a proper gradient for the predictor $W$. Zhang et al. (2018a) update $W$ along the direction
$$\nabla_W L_P - \operatorname{proj}_{\nabla_W L_A} \nabla_W L_P - \alpha \nabla_W L_A, \tag{2.11}$$
where $\alpha$ is a tunable hyperparameter.
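To make Eq. (2.11) concrete, the sketch below computes the modified predictor gradient from two already-computed gradient vectors. It assumes both gradients have been flattened into 1-D NumPy arrays over all predictor parameters $W$; the function name `debiased_predictor_grad` and the toy values are hypothetical, and a real implementation would obtain the gradients from an autodiff framework.

```python
import numpy as np


def debiased_predictor_grad(grad_p, grad_a, alpha):
    """Direction of Eq. (2.11) for updating the predictor weights W.

    grad_p: flattened gradient of the predictor loss L_P w.r.t. W.
    grad_a: flattened gradient of the adversary loss L_A w.r.t. W.
    alpha:  weight of the term that pushes L_A upward.
    """
    grad_p = np.asarray(grad_p, dtype=float)
    grad_a = np.asarray(grad_a, dtype=float)
    norm_sq = float(grad_a @ grad_a)
    if norm_sq == 0.0:
        # No adversary signal: there is nothing to project out.
        proj = np.zeros_like(grad_p)
    else:
        # Component of grad_p that lies along grad_a.
        proj = (grad_p @ grad_a) / norm_sq * grad_a
    return grad_p - proj - alpha * grad_a


# Toy gradients (hypothetical values) and one gradient-descent step.
g_p = np.array([1.0, 1.0])
g_a = np.array([1.0, 0.0])
direction = debiased_predictor_grad(g_p, g_a, alpha=0.5)

learning_rate = 0.1
W = np.zeros(2)
W = W - learning_rate * direction  # descend along the modified direction
print(direction, W)
```

Because the step is $W - \eta\,\cdot$ direction, the $-\alpha \nabla_W L_A$ term moves $W$ up the adversary's gradient, while removing the projection leaves a prediction-loss component orthogonal to $\nabla_W L_A$, so to first order that part of the step neither helps nor hurts the adversary.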
The projection term $\operatorname{proj}_{\nabla_W L_A} \nabla_W L_P$ prevents the predictor from moving in a direction that would help the adversary decrease $L_A$. The last term, $-\alpha \nabla_W L_A$, has the opposite sign of the adversary's gradient and therefore pushes the predictor to increase $L_A$, hurting the adversary. The overall goal is thus to minimize the predictor's loss while maximizing the adversary's loss.
2.1.2.3. Post-processing approaches
Post-processing is used at the last stage of the data modeling pipeline, i.e., after the data have been collected and pre-processed and the model has been trained. Most likely, the model is a black box, i.e., an algorithm that produces useful information without revealing anything about its internal workings. For example, the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) algorithm, used in many US states to predict criminal recidivism, has remained a black box to outsiders. It was only in 2016 that a group of people³ working at ProPublica found racial bias perpetuated by this black box algorithm. One of the most important post-processing works is by Hardt et al. (2016), the same work that proposed equalized odds and equal opportunity. We now explain how to “correct” a binary predictor $\hat{Y}$ (i.e., alter its predicted outcomes) to obtain a nondiscriminating predictor w.r.t. equalized odds and equal opportunity. This predictor is referred to as the derived predictor $\tilde{Y}$, which is derived from $\hat{Y}$ and $A$: $\tilde{Y}$ is a possibly randomized function of the random variables $(\hat{Y}, A)$ alone, so it is independent of $X$ conditional on $(\hat{Y}, A)$. Therefore, the derived predictor can only depend on
³ Julia Angwin is a senior reporter at ProPublica. Jeff Larson is ProPublica’s data editor. Surya Mattu is a contributing researcher at ProPublica.
Socially Responsible AI: Theories and Practices
four conditional probabilities $p_{y,a} = P\{\tilde{Y} = 1 \mid \hat{Y} = y, A = a\}$, collected as $p = (p_{0,0}, p_{0,1}, p_{1,0}, p_{1,1})$. To achieve equalized odds for the binary predictor $\hat{Y}$, we first define $\gamma_a(\hat{Y})$:
$$\gamma_a(\hat{Y}) = \big(P\{\hat{Y} = 1 \mid A = a, Y = 0\},\; P\{\hat{Y} = 1 \mid A = a, Y = 1\}\big). \tag{2.12}$$
The first component is the false positive rate of $\hat{Y}$ within the group $A = a$, and the second is the true positive rate of $\hat{Y}$ within that group. The fairness notions then translate into the following lemma:
Lemma 2.1. A predictor $\hat{Y}$ satisfies
• equalized odds iff $\gamma_1(\hat{Y}) = \gamma_0(\hat{Y})$,
• equal opportunity iff $\gamma_1(\hat{Y})_2 = \gamma_0(\hat{Y})_2$, i.e., the second components of $\gamma_1(\hat{Y})$ and $\gamma_0(\hat{Y})$ in Eq. (2.12) are equal.
The idea is that each group $A = a$ has a quadrilateral in the (false positive rate, true positive rate) plane, defined by the point $\gamma_a(\hat{Y})$ of the discriminating predictor, the point $\gamma_a(1 - \hat{Y})$ of its flipped version, and the two trivial points $(0, 0)$ and $(1, 1)$, which correspond to always predicting 0 and always predicting 1. Any derived predictor for group $a$ lies in this quadrilateral, so the optimal equalized odds and equal opportunity predictors lie in the overlapping region of the two quadrilaterals. For binary $A$, the quadrilaterals are
$$\mathcal{P}_a(\hat{Y}) = \operatorname{conv}\{(0, 0),\, \gamma_a(\hat{Y}),\, \gamma_a(1 - \hat{Y}),\, (1, 1)\}, \quad a \in \{0, 1\}. \tag{2.13}$$
Therefore, a derived predictor $\tilde{Y}$ should satisfy $\gamma_a(\tilde{Y}) \in \mathcal{P}_a(\hat{Y})$. The optimal equalized odds predictor among derived predictors can be obtained by solving the following optimization problem:
$$\begin{aligned} \min_{\tilde{Y}} \quad & \mathbb{E}[\ell(\tilde{Y}, Y)] \\ \text{s.t.} \quad & \forall a \in \{0, 1\}: \gamma_a(\tilde{Y}) \in \mathcal{P}_a(\hat{Y}) && \text{(derived)} \\ & \gamma_1(\tilde{Y}) = \gamma_0(\tilde{Y}) && \text{(equalized odds)}, \end{aligned} \tag{2.14}$$
where $\ell: \{0, 1\}^2 \to \mathbb{R}$ is a loss function that takes $\tilde{Y}$ and $Y$ as inputs and returns a real number. The solution of Eq. (2.14) is the derived predictor $\tilde{Y}$ that minimizes the expected loss $\mathbb{E}[\ell(\tilde{Y}, Y)]$.
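The sketch below computes $\gamma_a(\hat{Y})$ of Eq. (2.12) from labelled predictions and shows why the feasible region in Eq. (2.13) is the quadrilateral spanned by $(0, 0)$, $\gamma_a(\hat{Y})$, $\gamma_a(1 - \hat{Y})$, and $(1, 1)$: a derived predictor that outputs 1 with probability $p_{\hat{y},a}$ has $\gamma_a(\tilde{Y}) = p_{1,a}\gamma_a(\hat{Y}) + p_{0,a}(1 - \gamma_a(\hat{Y}))$. The data, the mixing probabilities, and the helper names are made up for illustration, and the sketch does not solve Eq. (2.14); it only checks one feasible equalized odds choice (the full problem is a small linear program).

```python
import numpy as np


def gamma(y_true, y_pred, group, a):
    """gamma_a(Y_hat) = (false positive rate, true positive rate) within group A = a (Eq. 2.12)."""
    in_group = group == a
    yt, yp = y_true[in_group], y_pred[in_group]
    fpr = yp[yt == 0].mean()  # P(Y_hat = 1 | A = a, Y = 0)
    tpr = yp[yt == 1].mean()  # P(Y_hat = 1 | A = a, Y = 1)
    return np.array([fpr, tpr])


def derived_gamma(gamma_hat, p0, p1):
    """gamma_a(Y_tilde) for Y_tilde = 1 w.p. p1 when Y_hat = 1 and w.p. p0 when Y_hat = 0.

    Since gamma_a(1 - Y_hat) = 1 - gamma_a(Y_hat), every (p0, p1) in [0, 1]^2
    yields a point inside the quadrilateral of Eq. (2.13).
    """
    return p1 * gamma_hat + p0 * (1.0 - gamma_hat)


def sample_derived(y_pred, group, p, rng):
    """Randomized derived predictor: p[(y_hat, a)] = P(Y_tilde = 1 | Y_hat = y_hat, A = a)."""
    probs = np.array([p[(int(yh), int(a))] for yh, a in zip(y_pred, group)])
    return (rng.random(len(y_pred)) < probs).astype(int)


y_true = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0, 0, 1])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])

g0 = gamma(y_true, y_pred, group, 0)  # (0.5, 1.0): group 0 receives more positives
g1 = gamma(y_true, y_pred, group, 1)  # (0.0, 0.5)

# One feasible (not necessarily optimal) choice that moves both groups to (0.25, 0.5).
print(derived_gamma(g0, p0=0.0, p1=0.5), derived_gamma(g1, p0=0.25, p1=0.75))

p = {(0, 0): 0.0, (1, 0): 0.5, (0, 1): 0.25, (1, 1): 0.75}
y_tilde = sample_derived(y_pred, group, p, np.random.default_rng(0))
```

Setting $(p_0, p_1)$ to $(0, 0)$, $(1, 1)$, $(0, 1)$, or $(1, 0)$ recovers the four corners of the quadrilateral, which is why mixing probabilities in between cover exactly that region.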
Another post-processing approach is based on calibration, which is well suited to risk assessment problems such as COMPAS. Interested readers can refer to Pleiss et al. (2017).
2.1.3. Discussion
Fairness involves various stakeholders, organizations, and sectors, and it is a concept that fuses vastly different disciplines such as philosophy, law, and policymaking. Fairness measures are important for quantifying fairness when developing fairness approaches. However, creating generalized notions for quantifying fairness is a challenging task (Caton and Haas, 2020). One fact about fairness, which is complex by nature, is that it is impossible to satisfy all notions of fairness simultaneously. As we mentioned earlier, there is an inherent trade-off between calibration and equalized odds (Kleinberg et al., 2016). Therefore, it is often suggested that discussions of fairness be placed in specific contexts and domain applications (Cheng et al., 2021d; Selbst et al., 2019). Another important issue in fair AI is the standard independent and identically distributed (i.i.d.) assumption. What we (and the mainstream research) have covered so far is fairness for i.i.d. data. But non-i.i.d. data abound in the real world, such as graphs, social networks, spatial data, and temporal data, to name a few. What new challenges arise when handling these more complicated data? Do the assumptions for i.i.d. data still hold? One common concern in the existing literature on fair AI is the “accuracy–fairness trade-off”, which is almost taken as a given. In fact, the trade-off issue is not unique to fairness but a challenge shared by all of the responsible AI principles, as we will see in Chapter 4. Fairness also evolves over time along with constant changes in human values and social dynamics. This raises concerns about the commitments these fairness notions need to fulfill in the long term. Despite the variety of existing fairness notions, once the dimension of time is introduced, their number may explode. In addition, current fairness criteria may be considered unfair in the future.
Fairness criteria are essentially designed to promote long-term wellbeing. However, even a static fairness notion can fail to protect the target groups when there is a feedback loop in the overall system (Liu et al., 2018).
What we covered earlier in this section are a few specific methods within each of the three categories of bias mitigation approaches (pre-processing, in-processing, and post-processing). Due to the ubiquity of bias and discrimination, these approaches have been used in a variety of domains such as natural language processing (NLP) and computer vision. As in prediction tasks, no single algorithm outperforms all others on every dataset and fairness metric. Each of the three approaches has its own domains of competence, which again suggests that fairness depends heavily on the sociotechnical context.
2.2. Interpretability
Imagine you were applying for a bank loan to buy your first house, and though you were very sure it would be approved, it was rejected. You would be disappointed or even outraged, but deep in your heart, you would really like to know “WHY”. So, you reach out to the bank manager, and she simply answers that it was the “magical” AI algorithm that told the bank you were not qualified. This is surely not the answer you were looking for, because it does not provide any useful information. “Is it because of my age, education level, or even RACE?” “It would be more helpful if the manager could tell me why this algorithm made such a decision.” At the very least, you need to know whether you were treated fairly. Interpretability is the key to increasing the transparency of AI algorithms. It is an important attribute to consider when applying AI to high-stakes applications (e.g., bank loan applications) that deeply impact people’s lives. Unlike fairness, which has a variety of mathematical formulations, interpretability is difficult to define mathematically. Some popular non-mathematical definitions are “Interpretability is the degree to which a human can understand the cause of a decision” by Miller (2019) and “Interpretability is the degree to which a human can consistently predict the model’s result” by Kim et al. (2016). The higher the interpretability of an AI algorithm, the easier it is for people to comprehend why certain decisions have been made, and the more trust is established between AI and humans. There have been discussions about the difference between interpretability and explainability. For example, interpretability is
the extent to which an outcome can be predicted, given a change in input or algorithmic parameters. Explainability, meanwhile, is the extent to which the internal mechanics of an AI system can be explained in human terms.⁴ “Interpretation” and “explanation” are the actual statements used to interpret and explain the model.
2.2.1. Different Forms of Explanations
Before we proceed to any technical aspects, let us first clarify that different users require different forms of explanation in different contexts. An AI developer might need to know the technical details of how an AI system functions; regulators might require assurance about how data are processed; decision makers (e.g., bank managers) need enough information about the machine predictions to help them make final decisions; and those subject to a decision (e.g., a bank loan rejection) might want to know what factors led to it. In general, there are four types of explanations, reflecting the needs of different stakeholders and the issues at play in different situations:
• Data scientist: understanding the inner working mechanisms of AI to improve the model’s performance.
• Regulator: ensuring that the procedures in a machine learning life cycle are legal, safe, and compliant.
• Decision maker: having a high-level understanding of how the AI algorithm functions to assist their decision-making.
• Affected user: identifying the factors that led to a certain decision and what can be changed to reach the desired outcome.
To identify which types of explanations are needed, both active stakeholder engagement and careful system design are necessary.
2.2.2. Taxonomy of AI Interpretability
Depending on the viewpoint, existing interpretability techniques can generally be examined from four perspectives: the purposes of interpretability, local versus global interpretability, model agnostic versus model specific, and data types, as described in Figure 2.7.
⁴ https://www.kdnuggets.com/2018/12/machine-learning-explainability-interpretability-ai.html
[Figure 2.7: Interpretability techniques organized along four branches. Purposes of interpretability: create interpretable models (intrinsic), explain black-box models (post hoc), enhance fairness of a model, test sensitivity of predictions. Local vs. global: local explains an individual prediction; global explains the overall model. Model specific vs. model agnostic: model specific works with a single model or a group of models; model agnostic works with any model. Data types: tabular, text, image, graph.]
Fig. 2.7: Taxonomy of AI interpretability. Figure is adapted from (Linardatos et al., 2020).
From the purpose perspective, we can further group these techniques into four categories. Interpretability for explaining black box models consists of methods that try to make sense of pre-trained black box models such as DNNs. Such methods do not create an interpretable model but explain an existing one; this is referred to as post hoc interpretability, also known as extrinsic interpretability. Conversely, we can also create an interpretable (i.e., white box) model that is inherently understandable to humans, such as linear regression; this is known as intrinsic interpretability. As we mentioned in the bank loan application example, interpretability increases transparency and can be used to combat discrimination and enhance fairness. Therefore, another purpose of AI interpretability is to help enhance the fairness of AI systems from societal and ethical perspectives. The last category uses interpretability methods to analyze the sensitivity of an AI algorithm to subtle yet intentional changes, as a tool to establish or enhance trust and reliability. Another perspective is whether the interpretability is for an individual prediction or for the entire model. In global interpretability, users can understand how the model works globally by inspecting its inner working mechanisms, such as the model structure and parameters. Local interpretability, on the other hand, examines why the
model makes a certain decision for an individual. If an interpretability technique can be applied to any model, it is model agnostic; otherwise, it is model specific. Model-agnostic interpretability does not inspect model parameters. The last perspective is the data perspective, that is, which type of data is being explained. Based on the data types, interpretability techniques can be applied to tabular, text, image, or graph data.
2.2.3. Techniques for AI Interpretability
This section summarizes the current progress of AI interpretability along the first three perspectives of the taxonomy: intrinsic versus post hoc, local versus global, and model agnostic versus model specific. Depending on when the interpretability methods are applied (before, during, or after building the machine learning model), we have pre-model (before), in-model (during, i.e., intrinsic), and post-model (after, i.e., post hoc) interpretability. Both model-agnostic and model-specific interpretability belong to post hoc interpretability. We therefore discuss four lines of AI interpretability in general: pre-model interpretability, intrinsic interpretability (both global and local), post hoc global interpretability, and post hoc local interpretability.
2.2.3.1. Pre-model interpretability
Pre-model interpretability is applicable only to the data itself. It requires an in-depth understanding of the data before building the model, e.g., the sparsity and dimensionality of the input data. It is therefore closely related to data interpretability (Carvalho et al., 2019), in which classic descriptive statistics and data visualization methods are often used, including Principal Component Analysis (Wold et al., 1987) and t-SNE (Maaten and Hinton, 2008), as well as clustering methods such as k-means (Hartigan and Wong, 1979).
2.2.3.2. In-model interpretability
In-model interpretability, also referred to as intrinsic interpretability, relies on intrinsically interpretable AI algorithms. These models are self-explanatory and simple enough for users to understand how they make decisions based on their mathematical
forms. Such interpretability can be achieved by imposing constraints on the model, such as causality, sparsity, structure (e.g., semantic monotonicity (Freitas, 2014)), or physical conditions from domain knowledge (Rudin, 2019). For example, for structured data, sparsity is useful for interpretability because humans can handle at most 7 ± 2 cognitive entities at once (Miller, 1956). There are two kinds of intrinsic interpretability: intrinsic local and intrinsic global interpretability. Decision trees, rule-based models, linear regression, attention networks, Generalized Linear Models (GLMs), Generalized Additive Models (GAMs), and disentangled representation learning are popular in-model interpretability techniques. Intrinsic Local Interpretability. These locally interpretable models provide explanations for a specific prediction. One representative scheme is the attention mechanism, which has been used extensively to improve and explain the performance of sequential models such as recurrent neural networks (RNNs) on long and/or complex sequences. Attention enables model interpretability because it can identify the information that the model found important to the prediction. For example, in a cyberbullying detection task, given a comment, an attention-based interpretable model (e.g., a hierarchical attention network (Cheng et al., 2019a, 2021b)) outputs both the prediction result and the terms most relevant to the prediction, ranked by their attention weights. The attention mechanism has been used in a variety of tasks to improve both model performance and interpretability, including machine translation (Luong et al., 2015), reading comprehension (Hermann et al., 2015), and language modeling (Liu and Lapata, 2018) in NLP and image caption generation (Xu et al., 2015) in computer vision, to name a few.
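The weighting step behind such attention-based explanations can be sketched in a few lines. The sketch assumes dot-product alignment scores between encoder states and a query vector (the multiplicative variant, substituted here for the feed-forward alignment model discussed in the text, for brevity); `attend` and `rank_tokens` are hypothetical helper names, and the hidden states are toy values rather than the output of a trained encoder.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)  # shift for numerical stability; does not change the result
    e = np.exp(z)
    return e / e.sum()


def attend(hidden_states, query):
    """hidden_states: (T, d) array of encoder states h_1..h_T; query: (d,) vector.

    Returns the attention weights (non-negative, summing to 1) and the context vector.
    """
    scores = hidden_states @ query     # alignment scores, dot-product variant
    weights = softmax(scores)          # attention weights
    context = weights @ hidden_states  # weighted sum of the hidden states
    return weights, context


def rank_tokens(tokens, weights):
    """Tokens sorted by attention weight, highest first (the ranked 'relevant terms')."""
    order = np.argsort(-weights)
    return [(tokens[i], float(weights[i])) for i in order]


tokens = ["Hui", "is", "a", "nice", "girl"]
H = np.eye(5)                            # toy one-hot "hidden states"
q = np.array([2.0, 1.0, 0.0, 0.0, 0.0])  # toy query favoring the first two tokens
weights, context = attend(H, q)
print(rank_tokens(tokens, weights)[:2])
```

Ranking tokens by weight is what produces the list of “terms highly relevant to the prediction”; as noted later in this section, such weights should be read with caution because they can be noisy.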
The high-level goal of the attention mechanism is to first calculate a non-negative weight for each input component (e.g., a word or a pixel), with the weights summing to 1, then multiply the weights by their corresponding representations, and lastly sum the resulting vectors into a single fixed-length representation. Here we describe one of the most important attention mechanisms, introduced by Bahdanau et al. (2014). Attention was first introduced to tackle a weakness of traditional sequence-to-sequence (seq2seq) models, which are normally composed of an encoder–decoder architecture. In the context of
[Figure 2.8: The input “Hui is a nice girl” is encoded into hidden states [h1, …, h5]. Step 1: a feed-forward neural network computes alignment scores [s1, …, s5]. Step 2: softmax turns them into attention weights [e1, …, e5]. Step 3: the context vector c1 = e1·h1 + ⋯ + e5·h5 is calculated. Steps 4–5: the context vector is concatenated with the decoder's hidden state from the previous time step and passed to the decoder, which produces the output and hidden state at t = 1.]
Fig. 2.8: Simple illustration of attention mechanism in neural machine translation.
neural machine translation, the encoder takes the input sequence and encodes (compresses) its information into a fixed-length context vector, also known as the final state of the encoder. The decoder then starts generating output, initialized with this context vector. As you might have noted, a single fixed-length context vector struggles to retain information from long sequences. Attention resolves this issue by using all the intermediate encoder states to construct the context vector. There are five steps for the seq2seq model to predict the next word, as described in Figure 2.8. The attention mechanism itself consists of three of them: alignment scores, attention weights, and the context vector.
(1) Alignment scores: The alignment model (e.g., a feed-forward neural network) takes all the intermediate hidden states of the encoder as input and outputs the alignment scores, which align the input sequence with the current output at time step $t$. For example, for the first word “Hui”, $s_1$ and $s_2$ should be larger than $s_3$, $s_4$, and $s_5$.
(2) Attention weights: A softmax function is applied to the alignment scores to generate attention weights $e_{t,i}$, which sum to 1.
(3) Context vector: The context vector $c_t = \sum_{i=1}^{T} e_{t,i} h_i$ is the weighted sum of all $T$ encoder hidden states. $c_t$ is then concatenated with the previous output of the decoder (⟨START⟩ at $t = 1$), and the result is fed into the decoder.
To enhance interpretability, we can further visualize these attention weights (e.g., using different shades of color). They help users understand how words in one language depend on words in another language in neural machine translation. However, recent findings
have shown that attention weights can be very noisy and are not a fail-safe indicator (Serrano and Smith, 2019). Intrinsic Global Interpretability. In contrast to local interpretability, this type of interpretability provides global explanations of the structure and parameters of the model studied. Many simple machine learning models, such as linear regression, logistic regression, decision trees, and decision rules, are white boxes, i.e., inherently interpretable. Here, we discuss three of them to give a sense of inherently global explanations.
(1) Logistic Regression: In classification tasks, logistic regression models use the logistic function to transform the output of a linear equation into a value between 0 and 1. Given the weights $w_1, \ldots, w_d$ and the features $x = (x_1, \ldots, x_d)$, where $d$ denotes the feature dimension, the logistic function in a binary classification task (e.g., the bank loan application) is defined as
$$P(\hat{Y} = 1 \mid X = x) = \frac{1}{1 + e^{-\eta}}, \tag{2.15}$$
where $\eta = w_1 x_1 + w_2 x_2 + \cdots + w_d x_d$ and $\hat{Y}$ is the predicted label. Logistic regression is a simple transformation of linear regression, since in classification we prefer outputs that are probabilities between 0 and 1.
(2) Decision Trees: Tree-based models split the data multiple times based on cut-off values of features. By following these IF ... THEN ... rules, the input data are divided into different subsets (i.e., the internal nodes or split nodes) and eventually fall into the final subsets, also referred to as leaf nodes or terminal nodes. The prediction for a sample in a leaf node is the average outcome of the training data in the same leaf node. Interpretability is enabled by computing the overall feature importance in a decision tree: we trace all the splits for which a feature was used and measure how much each split reduced the variance compared to its parent node. The larger the total variance reduction, the more important the feature. The feature importances are scaled to sum to 100, so that each one describes that feature's share of the overall model importance. In the end, the interpretation is a series of IF ... THEN ... rules connected by AND.
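Both global explanations above can be reproduced numerically. The sketch below, with made-up weights and data, evaluates Eq. (2.15) and shows the usual reading of a logistic regression weight: increasing $x_j$ by one unit multiplies the odds $P/(1-P)$ by $e^{w_j}$. It also computes the variance reduction of a single tree split, weighting each child by its share of samples (one common convention); summing such reductions per feature over all splits and rescaling gives the feature importances described above. The function names are illustrative, not a library API.

```python
import numpy as np


def logistic_prob(w, x):
    """P(Y_hat = 1 | X = x) = 1 / (1 + exp(-eta)) with eta = w . x (Eq. 2.15)."""
    eta = np.dot(w, x)
    return 1.0 / (1.0 + np.exp(-eta))


def odds(p):
    return p / (1.0 - p)


def variance_reduction(y, goes_left):
    """Parent variance minus the sample-weighted variance of the two children.

    y: targets at the node; goes_left: boolean mask from the split rule
    (both children are assumed to be non-empty).
    """
    left, right = y[goes_left], y[~goes_left]
    weighted_child = (len(left) * left.var() + len(right) * right.var()) / len(y)
    return y.var() - weighted_child


w = np.array([0.8, -0.3])
x = np.array([1.0, 2.0])
ratio = odds(logistic_prob(w, x + np.array([1.0, 0.0]))) / odds(logistic_prob(w, x))
print(ratio, np.exp(0.8))  # the two numbers agree up to rounding

age = np.array([25, 30, 45, 50])
y = np.array([1.0, 1.0, 5.0, 5.0])
print(variance_reduction(y, age < 40))  # 4.0: the split separates y perfectly
```

Because the odds equal $e^{\eta}$, the odds ratio does not depend on the starting point $x$, which is what makes a logistic regression weight a global explanation.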
We mentioned earlier that we can improve models’ interpretability by imposing constraints such as sparsity and semantic monotonicity. Sparsity forces the model to use relatively few features for prediction, and monotonicity imposes monotonic relations between the features and the prediction. To enhance the interpretability of decision trees, we can prune the trees by replacing subtrees with leaves to encourage long and deep trees instead of wide and more balanced trees (Quinlan, 1987).
(3) Disentangled Representation: What we have discussed so far are features in the input space, such as the age of the applicant. However, in many scenarios, these features might not make much sense to users, e.g., individual pixels in the identity image submitted with your bank loan application; they lack semantics. Alternatively, we can add semantically meaningful constraints to generate higher-level and more understandable explanations. One solution is to learn a representation in which the dimensions are independent of one another and might provide information not captured in other input data. For example, the representation can capture the face contour of the applicant, which would otherwise be left out. Such representations are referred to as disentangled representations. Another example is an interpretable convolutional neural network (CNN) that adds a regularization loss to higher convolutional layers of the CNN to identify knowledge representations in these layers (Zhang et al., 2018b). Low convolutional layers (i.e., layers close to the input layer) are more likely to capture low-level textures, while high convolutional layers (i.e., layers close to the output layer) capture more semantically meaningful features such as object parts.
2.2.3.3. Post hoc local interpretability
Inherently interpretable models are white boxes. What we often use in practice, however, are black box models that achieve very good performance but are not themselves interpretable. This may stem from the belief that black box models can capture ‘hidden patterns’ in the data that humans would otherwise miss. To explain individual predictions of these black box models, post hoc local interpretability identifies the contribution of each input feature to an individual prediction. It is also
referred to as the attribution method. Below, we introduce both model-agnostic and model-specific methods. Model-agnostic Interpretability. This type of technique can interpret any black box model without looking into the model’s internal mechanisms. Despite its wide applicability, it may generate explanations that do not faithfully reflect the decision-making process of the model. An explanation is faithful if it accurately represents the true reasoning behind the model’s final decision. We describe two techniques in the following. Local Surrogate Interpretability. The high-level goal of this technique is to generate samples around the target sample and use them to fit an inherently interpretable model, e.g., linear regression, a GLM, or a decision tree. The prediction of the black box model can then be explained by the parameters of the white box model. The interpretable model does not need to work well globally, but it should approximate the black box model well in a small neighborhood of the target sample. One representative example is local interpretable model-agnostic explanations (LIME) (Ribeiro et al., 2016), in which the authors proposed to use linear models as the local surrogate models. Let $g$ denote the local surrogate model (e.g., a logistic regression model), $G$ the family of all possible surrogate models, and $f$ the black box model (e.g., XGBoost). The proximity measure $\pi_x$ defines how large the neighborhood around the target sample $x$ is. The local surrogate model is then defined as follows:
$$\text{explanation}(x) = \arg\min_{g \in G} \; L(f, g, \pi_x) + \Omega(g), \tag{2.16}$$
where $L$ is the loss function that measures how accurately $g$ approximates the prediction of the original model $f$ in the neighborhood defined by $\pi_x$, and $\Omega(g)$ is the model complexity, which is kept small because we prefer explanations with fewer features. When linear surrogate models cannot approximate the black box model well, one needs a nonlinear local surrogate model such as if-then rules (Ribeiro et al., 2018). These rules are called anchor explanations because they “anchor” the prediction locally; that is, changes to the rest of the feature values of the target example do not change the prediction. For example, “not bad” is an anchor in sentiment classification that (almost) always leads to the same prediction.
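A minimal LIME-style sketch of Eq. (2.16) for tabular data is shown below. It assumes Gaussian perturbations around $x$, an exponential kernel $\pi_x(z) = \exp(-\lVert z - x \rVert^2 / \sigma^2)$ as the proximity measure, and a weighted least-squares linear surrogate $g$; the complexity term $\Omega(g)$ is omitted here (the original LIME instead restricts $g$ to a small number of features). The function `lime_linear`, its parameters, and the toy black box are hypothetical and are not the API of the `lime` package.

```python
import numpy as np


def lime_linear(f, x, num_samples=1000, scale=1.0, sigma=1.0, seed=0):
    """Fit a locally weighted linear surrogate g around x.

    f: black box mapping an (n, d) array to n real-valued scores.
    Returns (intercept, weights); the weights serve as the local explanation.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    Z = x + scale * rng.normal(size=(num_samples, d))  # perturbed neighbors of x
    scores = f(Z)                                       # black-box predictions
    dist_sq = np.sum((Z - x) ** 2, axis=1)
    pi = np.exp(-dist_sq / sigma**2)                    # proximity weights pi_x
    A = np.hstack([np.ones((num_samples, 1)), Z])       # intercept column + features
    sqrt_pi = np.sqrt(pi)
    # Weighted least squares: minimize sum_k pi_k * (scores_k - A_k . beta)^2.
    beta, *_ = np.linalg.lstsq(A * sqrt_pi[:, None], scores * sqrt_pi, rcond=None)
    return beta[0], beta[1:]


def black_box(Z):
    """A hypothetical nonlinear model: logistic function of a linear score."""
    return 1.0 / (1.0 + np.exp(-(Z @ np.array([3.0, -2.0, 0.0]))))


intercept, weights = lime_linear(black_box, np.zeros(3))
print(weights)  # signs follow (3, -2, 0) near x = 0
```

The kernel width $\sigma$ plays the role of $\pi_x$ in Eq. (2.16): shrinking it makes the surrogate more local, at the cost of fewer effectively weighted samples.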
Perturbation-based Interpretability. The core idea of this line of work is to find features that lead to large changes in prediction scores when perturbed. One representative example is the counterfactual explanation, which answers the question: “What would $Y$ have been if we had changed the feature $x_i$ to another value?” While counterfactual explanations are often believed to imply a “causal relation” between the features and the predicted outcome, the boundary between counterfactual explanations and adversarial examples is not very clear. A counterfactual explanation of a prediction describes the smallest change to the feature values that leads to a different prediction (Molnar, 2020). We will revisit it when we detail causality and interpretability in Section 4.1.3.2. Model-specific Interpretability. Some interpretability techniques are designed for specific types of models. Here, we focus on DNN-specific methods. Back-propagation-based Interpretability. As an effective way to interpret an individual prediction of a DNN, back-propagation-based techniques calculate the gradient (or one of its variants) w.r.t. the input using back-propagation to identify the contributions of features/attributes. When back-propagating the gradient, i.e., the partial derivative of the model’s prediction score $P(\hat{Y} = 1 \mid X)$ w.r.t. each feature dimension $x_i$, a larger gradient magnitude indicates that the feature needs to be changed the least to affect the class score the most. Therefore, this feature is more relevant to the final prediction (Simonyan et al., 2013). For images, the gradient can be visualized as another image, referred to as the saliency map. There are other forms of signals that we can back-propagate, such as discarding negative gradient values during back-propagation (Springenberg et al., 2014), or back-propagating the relevance score $R_i$ of the final prediction score to the input layer w.r.t. feature $x_i$ (Bach et al., 2015).
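For a model whose gradient is known in closed form, the gradient-based saliency computation described above reduces to a few lines. The sketch assumes a logistic model $P(\hat{Y} = 1 \mid X = x) = \sigma(w \cdot x + b)$, whose gradient w.r.t. $x_i$ is $P(1 - P)\,w_i$, and checks it against a central finite-difference estimate; for a real DNN, the same gradient would be obtained by back-propagation in an autodiff framework. The weights, the input, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def saliency_logistic(w, b, x):
    """Gradient of P(Y_hat = 1 | X = x) = sigmoid(w . x + b) with respect to each x_i."""
    p = sigmoid(w @ x + b)
    return p * (1.0 - p) * w  # chain rule: d sigmoid(eta) / d x_i = p (1 - p) w_i


def finite_difference_grad(prob_fn, x, eps=1e-6):
    """Central-difference estimate of the same gradient, usable when no closed form exists."""
    grad = np.zeros_like(x)
    for i in range(x.shape[0]):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (prob_fn(x + step) - prob_fn(x - step)) / (2 * eps)
    return grad


w = np.array([2.0, -1.0, 0.0])
b = 0.0
x = np.array([0.5, 1.0, 3.0])
sal = saliency_logistic(w, b, x)
ranking = np.argsort(-np.abs(sal))  # most relevant feature first
print(sal, ranking)
```

Ranking by gradient magnitude, not sign, follows the reading above: the features whose small changes move the class score the most are deemed the most relevant.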
The sum of all relevance scores should be equal to the prediction:
$$P(\hat{Y} = 1 \mid X = x) \approx \sum_{i=1}^{d} R_i. \tag{2.17}$$
$R_i > 0$ is interpreted as positive evidence for the presence of a structure, while $R_i < 0$ denotes evidence against its presence. Back-propagation-based techniques are often implementation-friendly,
simple, and efficient. However, the generated explanations might be noisy (Du et al., 2019). Mask-perturbation-based Interpretability. The perturbation-based techniques in model-agnostic interpretability can be computationally expensive, especially for inputs with high-dimensional features. For DNN-specific interpretability, a more efficient approach (Fong and Vedaldi, 2017) is to use mask perturbation optimized by gradient descent. Take images as an example, and let $m: \Lambda \to [0, 1]$ be a mask, a function that assigns a scalar value $m(u)$ to each pixel $u \in \Lambda$ (here $m(u) = 1$ keeps pixel $u$ and $m(u) = 0$ fully perturbs it). The high-level idea of mask perturbation is that if we mask out certain regions of an image $x$ with $f(x) = +1$ to “remove” them and the prediction for the perturbed image $x'$ is still $f(x') = +1$, then the removed regions are not expected to explain $f(x) = +1$. In implementation, the perturbation can replace the original pixel with a constant, noise, or a blur. Therefore, given the mask operator $[\phi(x; m)](u)$, the goal is to find the smallest deletion region such that $f(\phi(x; m)) \ll f(x)$. Formally,
$$m^* = \arg\min_{m \in [0, 1]^{|\Lambda|}} \; \lambda \lVert \mathbf{1} - m \rVert_1 + f(\phi(x; m)), \tag{2.18}$$
where $|\Lambda|$ denotes the number of pixels and $\lambda$ weights the penalty that keeps the deleted region small. $m^*$ then represents a highly informative region for the prediction of the DNN. In general, various regularizations need to be added to the mask to ensure semantically meaningful explanations.
2.2.3.4. Post hoc global interpretability
To recall, in contrast to local interpretability, global interpretability seeks to provide a global understanding of machine learning models, specifically their model structures and the parameters learned from the training data. Because DNN models differ markedly from traditional machine learning models in many respects, as shown in Figure 2.9, we introduce techniques for post hoc global interpretability from these two lines of research separately. Interpretability for Traditional Machine Learning. Traditional machine learning models typically rely on feature engineering to extract features from the raw data in the input space $\mathcal{X}$, which are then projected to the output space $\mathcal{Y}$ via the mapping function $f(\cdot)$.
[Figure 2.9: Traditional machine learning: raw input → feature engineering → features → traditional ML models → output. Deep learning: raw input → DNN → output.]
Fig. 2.9: Difference between the pipelines of traditional machine learning (ML) models and deep neural networks (DNNs). Adapted from (Du et al., 2019).
Earlier in this chapter, we mentioned several traditional machine learning models that provide inherent explanations. Many of them are linear models and thus cannot work properly in nonlinear cases. Even nonlinear inherently interpretable models such as decision trees are prone to overfitting, resulting in undesirable prediction performance. Post hoc interpretability approaches are therefore desirable. The mainstream technique is feature importance: the importance score of each feature indicates how much effect that feature has on the model used to predict the output. Most of these techniques have been implemented in popular Python packages such as scikit-learn⁵; interested readers can try them out as we go through these techniques. Model-agnostic Interpretability. Model-agnostic techniques aim to explain any black box model without examining its internal mechanisms. For traditional machine learning with tabular data, an important method perturbs features and records to what extent each feature can change the overall performance (e.g., accuracy) of the black box model (Breiman, 2001). The permutation feature
5 https://scikit-learn.org/
Socially Responsible AI: Theories and Practices
importance score is defined as the decrease in a model score when the values of a single feature are randomly shuffled. Shuffling breaks the relationship between the feature and the output variable; therefore, the resulting drop in performance indicates how much the model depends on that feature. Building on this idea, a more advanced permutation feature importance algorithm (Fisher et al., 2019) can be summarized as follows:
Input: trained model $\hat{f}$, feature matrix $X$, target variable $y$, and error measure $L(y, \hat{f})$ (e.g., squared error).
(1) Calculate the error of the original model: $e_{\text{orig}} = L(y, \hat{f}(X))$.
(2) For each feature $x_i$, $i \in \{1, \ldots, d\}$, generate $X_{\text{perm}}$ by permuting $x_i$ and estimate the error after permutation: $e_{\text{perm}} = L(y, \hat{f}(X_{\text{perm}}))$.
(3) Calculate the permutation feature importance (FI) of $x_i$ either as the quotient
$$\mathrm{FI}_i = \frac{e_{\text{perm}}}{e_{\text{orig}}} \tag{2.19}$$
or as the difference
$$\mathrm{FI}_i = e_{\text{perm}} - e_{\text{orig}}. \tag{2.20}$$
(4) Sort the features by their FI scores.
The permutation feature importance can be calculated on either the training set or a holdout set (i.e., a validation set). Computing it on the holdout set highlights which features contribute most to the generalization power of the black box model, whereas features that appear important on the training set may merely reflect overfitting. Model-specific Interpretability. Some techniques explain a specific class of traditional machine learning models, such as tree ensembles. Tree ensemble models combine several decision trees to overcome the overfitting issue of a single decision tree, thereby generating better prediction results. However, ensemble models are black boxes. To preserve both performance and interpretability, one alternative is interpretable model extraction, also referred to as mimic learning (Vandewiele et al., 2016). The basic idea of mimic learning is to approximate a complex model using an inherently interpretable
model such as a decision tree, linear regression, or a rule-based model. For instance, a genetic algorithm can be applied to an ensemble of decision trees to merge them into a single decision tree (Vandewiele et al., 2016), which is then used to explain the original black box model. Alternatively, one can calculate the performance improvement gained when a feature is used to split tree branches. Similar to permutation, the rationale is that a feature is important if splitting the data on this feature leads to more accurate predictions. Another approach measures feature coverage, the percentage of observations that have a specific feature x_i. Because not all samples may have valid or reliable values for a given feature (e.g., due to data corruption or noise), feature coverage can indicate how reliable a feature is. In the simplest case, we can count the number of times a feature is used to split the data in the tree ensemble. Interpretability of Deep Neural Networks. As shown in Figure 2.9, DNNs are capable of learning representations from the raw data; therefore, they reduce the need for the feature engineering required in traditional machine learning. However, the latent representations captured by the neurons in the intermediate layers of DNNs are rarely understandable to humans. Here, we discuss interpretability techniques for two major kinds of DNNs: CNNs and RNNs. Interpretability of CNN. For images, a CNN learns high-level features and concepts from raw image pixels: as shown in Figure 2.10, the first convolutional layer(s) learn features such as edges and textures; the intermediate convolutional layers capture more complex textures
[Figure 2.10 panels, from early to late layers: Edges, Textures, Patterns, Parts, Objects.]
Fig. 2.10: Feature visualization of a CNN trained on the ImageNet (Deng et al., 2009) dataset, across many layers. Image from (Olah et al., 2017).
and patterns; the last convolutional layer learns objects or parts of objects, which are more understandable to humans. The approach for visualizing the units in CNNs is referred to as feature visualization (Olah et al., 2017). The "unit" can be a convolution neuron, a convolution channel, a convolution layer, a neuron, a hidden layer, or the probability of the predicted class in classification. Feature visualization visualizes a unit through activation maximization, that is, by identifying the input that maximizes the activation of that unit. Given a trained CNN (i.e., all of its weights are fixed), we can formulate the problem of explaining a single neuron as finding a new image (img) that maximizes the activation of this neuron:
$$\mathrm{img}^* = \arg\max_{\mathrm{img}} h_{n,x,y,z}(\mathrm{img}), \tag{2.21}$$
where h(·) is the activation of the neuron, n specifies the layer, x and y describe the spatial position of the neuron, and z is the channel index. We can also maximize the mean activation of the entire channel z in layer n, which, for a fixed number of spatial positions, is equivalent to maximizing the sum of its activations:
$$\mathrm{img}^* = \arg\max_{\mathrm{img}} \sum_{x,y} h_{n,x,y,z}(\mathrm{img}). \tag{2.22}$$
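As a minimal sketch of activation maximization for Eq. (2.21), assume a toy "neuron" whose activation is the linear response of a single hypothetical 3×3 filter at one spatial position. The code runs plain gradient ascent from random noise on the activation minus an L2 penalty, one simple regularizer; the hand-derived gradient stands in for the automatic differentiation a real CNN would require. The function names, the edge filter, and the parameters `lam`, `lr`, and `steps` are illustrative assumptions.

```python
import numpy as np


def neuron_activation(img, kernel, x, y):
    """Linear response of one hypothetical conv unit whose receptive field starts at (x, y)."""
    kh, kw = kernel.shape
    return float(np.sum(img[x:x + kh, y:y + kw] * kernel))


def neuron_activation_grad(img, kernel, x, y):
    """Gradient of neuron_activation w.r.t. the image: the kernel placed at (x, y), zero elsewhere."""
    kh, kw = kernel.shape
    grad = np.zeros_like(img)
    grad[x:x + kh, y:y + kw] = kernel
    return grad


def activation_maximization(kernel, x, y, shape=(8, 8), lam=0.1, lr=0.1, steps=500, seed=0):
    """Gradient ascent on h(img) - (lam / 2) * ||img||^2, starting from random noise."""
    rng = np.random.default_rng(seed)
    img = rng.normal(scale=0.1, size=shape)  # random-noise initialization
    for _ in range(steps):
        # The L2 term keeps the image bounded; without it a linear unit has no maximizer.
        grad = neuron_activation_grad(img, kernel, x, y) - lam * img
        img = img + lr * grad
    return img


# Hypothetical vertical-edge filter standing in for a learned first-layer kernel.
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])
img_star = activation_maximization(kernel, x=2, y=3)

# The optimized patch should look like the filter itself (cosine similarity near 1).
patch = img_star[2:5, 3:6]
cosine = np.sum(patch * kernel) / (np.linalg.norm(patch) * np.linalg.norm(kernel))
print(f"activation: {neuron_activation(img_star, kernel, 2, 3):.2f}, cosine to filter: {cosine:.3f}")
```

Because this toy unit is linear, the penalized optimum is simply the filter scaled by 1/lam, and the noise outside the receptive field decays toward zero. Real feature visualization instead backpropagates through the whole trained network and uses stronger image priors.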
There are several ways to solve these optimization problems. For example, we can start from random noise and iteratively update the image to increase the activation. By further adding regularizers that allow only small changes and that reduce the variance between neighboring pixels, we can generate more meaningful visualizations. Another approach is to search the training set for images that maximize the activation instead of generating new images. The problem with this approach is that elements of an image can be correlated, so it is unclear which region of the image the CNN is actually responding to. For instance, when the training image that maximizes the activation of a channel contains both a dog and a ball, we cannot tell whether the dog or the ball causes the activation. Interpretability of RNN. Beyond images, researchers have also been interested in explaining DNNs widely used in NLP, particularly RNNs. RNNs have been one of the most successful classes of neural networks designed for sequential data (e.g., language) and time series data (e.g., weather records). To demystify RNNs, one common task in NLP is to explain RNNs for language modeling, which
aims to predict the next token given its preceding tokens. Multiple studies have shown that RNNs can indeed learn useful representations, e.g., (Kádár et al., 2017; Karpathy et al., 2015; Peters et al., 1802). By looking for the inputs that maximally activate the units in the last layer of an RNN, these studies found that some RNN units are able to learn complex language characteristics such as syntax, semantics, and long-term dependencies. In a study of character-level language models (Karpathy et al., 2015), for example, the researchers reveal that although most RNN units are not understandable to humans, there exist interpretable units responsible for keeping track of long-range dependencies such as quotes, line lengths, and brackets. A work focused on word-level language models (Kádár et al., 2017) further shows that some units pay selective attention to lexical categories and grammatical functions, some units are relatively more sensitive to words with a syntactic function, and others can even pass activation values on to subsequent time steps. These findings indicate that RNNs are able to capture long-term dependencies in language as well as linguistic characteristics.
2.2.4. Discussion
Interpretability is a multi-faceted concept, and there is no formal, technical, agreed-upon definition. While we mostly discussed techniques to enhance model interpretability, interpretability itself goes beyond machine learning models and relates to data collection, stakeholders, tasks, performance metrics, and so on. It is therefore critical to work with domain experts, users, and other stakeholders to determine what kind of interpretability is desired. Before using any interpretable machine learning model, one question to ask is: "Do we really need interpretability?" In low-risk tasks such as movie recommendations, unacceptable results may have no serious consequences, so the need for interpretability in such tasks is low. Appropriately justifying the need for interpretable machine learning can help avoid an unnecessary sacrifice of model performance, as interpretability often comes at a cost. An equally (if not more) challenging issue in interpretable machine learning might be evaluation. There is no direct measurement; instead, we have to rely on measurable outcomes: Are the explanations useful to individuals? Can users predict the outcome based on
the explanations? How well can people detect a mistake? Simple human-independent metrics such as size (e.g., the number of nodes in a decision tree or the size of a local explanation) cannot capture the semantics of the model. Human-based evaluation based on usefulness, relevance, coherence with prior beliefs, and so on, is therefore essential. Research efforts to develop an interpretability benchmark platform are still underway.
2.3. Privacy
On May 26th, 2022, the social media giant Twitter agreed to a settlement with the Department of Justice and the Federal Trade Commission (FTC) that included a $150 million civil penalty.6 This was in response to a complaint filed two days earlier by the US government stating that Twitter had used private information for targeted ads. So what happened? It turned out that from May 2013 to September 2019, Twitter had been collecting users' private information, such as their telephone numbers and email addresses, officially to address account security issues, while also using it to help companies target potential consumers. History is certainly not short of similar stories, but only recently has data privacy received significant public attention. The next time you sign up for an app, you might want to ask some important questions: Does the app store users' data in a centralized database? Who has access to the data? Who gets to know users' identities and other private information? How will the company use the data? Will it inform users about the possible uses of the data and ask for their consent before using it? "Data privacy" typically refers to the handling of critical personal information, such as "personally identifiable information" (PII) and "personal health information" (PHI). In the business context, data privacy also concerns the information that helps the company operate. In Section 1.5.3, we discussed data bias and its potential harm to society during data usage. This section asks whether it is even right to collect and use certain data in the context of training
6 https://www.justice.gov/opa/pr/twitter-agrees-doj-and-ftc-pay-150-million-civil-penalty-and-implement-comprehensive
AI systems. The capability of deep learning models has been greatly improved by powerful emerging infrastructures for model training, such as cloud computing and collaborative learning. The fuel of this power, however, comes from data, which might contain users' sensitive information. This has raised growing privacy concerns regarding issues such as the illegitimate use of private data and the disclosure of sensitive data (Boulemtafes et al., 2020). This section discusses techniques for privacy preservation. We start with traditional privacy models designed for tabular and micro data and then look into privacy-preserving techniques applied to rich user-generated data on social media.
2.3.1. Traditional Privacy Models
Two main use cases in preserving privacy are data publishing and data mining. Privacy-preserving data publishing (PPDP) publishes the full dataset, anonymized so that privacy is not violated; privacy-preserving data mining (PPDM) allows the data to be queried while controlling privacy at the individual level. Accordingly, the major techniques used in the two cases are syntactic anonymity for PPDP and differential privacy for PPDM. In both cases, the basic elements include (1) an explicit identifier, (2) a quasi-identifier, and (3) sensitive attributes. An explicit identifier is a set of attributes that reveal the identity of a person, e.g., name and social security number (SSN). A quasi-identifier is a set of attributes that do not uniquely identify a person on their own but can reveal her identity when linked together through a re-identification process. Examples are zip code, gender, and group membership. Sensitive attributes are information people do not want to reveal, such as salary, medical records, and disability status. These three sets of attributes do not overlap. The general pipelines of the two use cases are illustrated in Figure 2.11.
2.3.1.1. PPDP via syntactic anonymity
Syntactic anonymity works by modifying the quasi-identifier to protect data privacy via suppression, generalization, and shuffling. In this section, we introduce several major techniques. k-anonymity is the simplest form and also one of the first techniques for protecting data privacy (Samarati and Sweeney, 1998; Sweeney, 2002). By suppressing (e.g., replacing the values with null
[Figure 2.11 panels: Privacy preservation → Data publishing (syntactic anonymity: suppress, generalize, or shuffle quasi-identifiers) and Data mining (differential privacy: add noise); both pipelines operate on quasi-identifiers and sensitive attributes.]
Fig. 2.11: The pipelines of the two traditional privacy-preserving use cases: Privacy-preserving data publishing (PPDP) and privacy-preserving data mining (PPDM). Figure adapted from (Varshney, 2022).
value) or generalizing the quasi-identifier (e.g., {Engineer, Artist} → Professional), k-anonymity aims to anonymize each sample in the dataset such that it is indistinguishable from at least k − 1 other samples w.r.t. the quasi-identifier. It thus protects data against identity disclosure, which occurs when an individual is linked to a particular record in the released data. Each group contains at least k records, so the probability of linking a victim to a specific record is at most 1/k. With a total of N samples in the data, there are therefore at most N/k groups in the anonymized dataset after applying k-anonymity. Each group of k-anonymous records that share the same value for the quasi-identifier attributes is referred to as an equivalence class. To preserve as much information as possible, k-anonymity performs a minimum number of generalizations and suppressions.
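The definition can be checked mechanically. The following sketch uses a small hypothetical table; the zip codes, occupations, and generalization hierarchy are invented for illustration, apart from the {Engineer, Artist} → Professional step taken from the text. It generalizes the quasi-identifier (zip code, occupation), groups records into equivalence classes, and reports the largest k for which the table is k-anonymous, together with the resulting worst-case linking probability 1/k.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("zip", "job")

# Hypothetical raw records; "disease" is the sensitive attribute.
raw = [
    {"zip": "85281", "job": "Engineer", "disease": "Flu"},
    {"zip": "85282", "job": "Artist", "disease": "Cancer"},
    {"zip": "85287", "job": "Engineer", "disease": "Flu"},
    {"zip": "60607", "job": "Nurse", "disease": "Asthma"},
    {"zip": "60608", "job": "Teacher", "disease": "Flu"},
    {"zip": "60601", "job": "Nurse", "disease": "Cancer"},
]

# Hypothetical generalization hierarchy for occupations.
JOB_HIERARCHY = {"Engineer": "Professional", "Artist": "Professional",
                 "Nurse": "Service", "Teacher": "Service"}


def generalize(record):
    """Generalize the quasi-identifier: keep 3 zip digits, map job to its parent category."""
    out = dict(record)
    out["zip"] = record["zip"][:3] + "**"
    out["job"] = JOB_HIERARCHY[record["job"]]
    return out


def equivalence_classes(records, qi=QUASI_IDENTIFIERS):
    """Count the records sharing each quasi-identifier value."""
    return Counter(tuple(r[a] for a in qi) for r in records)


def anonymity_level(records, qi=QUASI_IDENTIFIERS):
    """Largest k such that the table is k-anonymous (size of the smallest class)."""
    return min(equivalence_classes(records, qi).values())


anonymized = [generalize(r) for r in raw]
for name, table in (("raw", raw), ("anonymized", anonymized)):
    k = anonymity_level(table)
    print(f"{name}: {k}-anonymous, max linking probability = {1 / k:.2f}")
```

With N = 6 records and k = 3, the generalized table has exactly N/k = 2 equivalence classes. A real anonymizer would search for the minimal generalization instead of applying a fixed one as this sketch does.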
k-anonymity is vulnerable to two simple attacks: the homogeneity attack and the background knowledge attack (Machanavajjhala et al., 2007). In the first attack, the adversary can infer a person's sensitive attributes if the sensitive values in an equivalence class lack diversity. The second attack takes advantage of additional information about subgroups having specific distributions of sensitive attributes (e.g., it is known that the Japanese have an extremely low incidence of heart disease) to infer possible sensitive attribute values of individuals. In summary, k-anonymity cannot protect against attribute disclosure, which occurs when new information about some individuals is revealed after the data are released. l-diversity (Machanavajjhala et al., 2007) aims to protect data against homogeneity attacks and background knowledge attacks. To ensure that the sensitive attribute values in each equivalence class are diverse, it requires at least l well-represented values for the sensitive attributes in each class. We can use the following two instantiations to check the l-diversity principle: entropy l-diversity and recursive (c, l)-diversity. Entropy l-diversity requires that (1) each equivalence class have a sufficient number of different sensitive attribute values and (2) the sensitive attribute values be distributed uniformly enough; formally, the entropy of the sensitive values in each class must be at least log l. Recursive (c, l)-diversity requires that the most frequent sensitive value in each equivalence class not appear too frequently; formally, r_1 < c(r_l + r_{l+1} + ... + r_m), where r_i is the count of the i-th most frequent value. The weakness of l-diversity is that it cannot protect the privacy of data when the distribution of sensitive attributes in an equivalence class differs from the distribution in the entire dataset (Li et al., 2007). For example, when the sensitive attributes have skewed distributions, l-diversity is vulnerable to the skewness attack. Another attack is the similarity attack, which happens when the sensitive attributes in an equivalence class are distinct but semantically similar.
For example, given a 3-diverse equivalence class in which the sensitive attribute Disease takes three different values, {lung cancer, stomach cancer, skin cancer}, an attacker who links an individual to this class can easily infer that the individual has cancer. t-closeness (Li et al., 2007) protects against attribute disclosure and also addresses the limitations of l-diversity. It ensures closeness between the distribution of a sensitive attribute in each equivalence class and the distribution of the same attribute in the overall dataset. "Closeness" is defined
as the distance between the two distributions being no more than a threshold t. A dataset is said to have t-closeness if all equivalence classes satisfy the t-closeness principle. However, t-closeness is computationally expensive, and it protects data against attribute disclosure but not identity disclosure. The principles of k-anonymity, l-diversity, and t-closeness can be interpreted in terms of mutual information. Let $X$ be the random variable of the quasi-identifiers in the original dataset, $\tilde{X}$ the random variable of the quasi-identifiers in the anonymized dataset, and $A$ the random variable of the sensitive attributes. The three principles can then be specified as follows:
(1) k-anonymity: $I(X, \tilde{X}) \le \log \frac{N}{k}$,
(2) l-diversity: $I(A, \tilde{X}) \le H(A) - \log l$,
(3) t-closeness: $I(A, \tilde{X}) \le t$.
Here, $I(\cdot)$ denotes the mutual information of two random variables and $H(\cdot)$ denotes entropy. By formulating these principles in the common statistical language of information theory, they can be studied alongside other problems in socially responsible AI within a broader context.
2.3.1.2. PPDM via differential privacy
Whereas PPDP releases an anonymized dataset that can be used freely, PPDM retains control of the dataset but allows others to query it. Example queries include returning a specific row/column, the mean of a row/column, or even a machine learning classifier trained on the dataset. Differential privacy (Dwork, 2008) is a powerful technique in PPDM because it provides a strong privacy guarantee. It has been widely used in both industry (e.g., Google and Apple) and academia. The core idea is to add statistical noise to sensitive attributes before sharing the data, as shown in Figure 2.11. The intuition behind differential privacy is that the risk of disclosing a person's private information should not increase as a result of his/her participation in a database (Dwork, 2008). For example, suppose that we have a dataset D1 with N social media users. Later, a new user joins and is added to D1, generating a new dataset D2 with N + 1
users. Now, a query function f(·) asks for the number of users older than 50. To protect users' sensitive information such as age, a differential privacy system returns a noisy version of f(D) by adding a random value to it: f(D) + noise. Formally, differential privacy is defined as follows: Definition 2.2 (Differential Privacy). Given two datasets D1 and D2 differing in at most one element, a query function f(·), and a mechanism M(·) with output range $\mathcal{R}$, M satisfies ε-differential privacy for D1 and D2 iff, for every $R \in \mathcal{R}$,
$$\frac{P\big(M(f(D_1)) = R\big)}{P\big(M(f(D_2)) = R\big)} \le e^{\epsilon}. \tag{2.23}$$
Here, ε > 0 is a small positive parameter specifying how much privacy we need, i.e., the privacy budget; a smaller ε gives stronger privacy. This is why the definition is also referred to as ε-differential privacy. f(·) can be the following question: "How many social media users are older than 50?" M(·) is a random function that returns the result of a query, e.g., an algorithm that gives the answer "1000 users older than 50". $\mathcal{R}$ in this example would be {0, 1, . . . , N}. When ε = 0, e^ε = 1, so the output distributions of M on D1 and D2 are required to be identical. This means that the two datasets are indistinguishable, i.e., we cannot tell the difference in the query result after the new user was added, which is exactly the goal of differential privacy. Differential privacy can be either interactive or non-interactive. In an interactive setting, customers query the dataset D and the data publisher responds to the customer with M(f(D)). By contrast, non-interactive models transform D into a new anonymized dataset D' = M(f(D)). D' is returned to the customer, who can then make arbitrary statistical queries. The major challenge of differential privacy is figuring out what kind of noise, and how much of it, we should add to f(D). Common choices include the Laplace mechanism, which adds Laplacian noise to a numeric query answer, and the exponential mechanism. The "how much" question depends on how sensitive the query is, which can be quantified using global sensitivity: the maximum amount by which a single element of the dataset can change the query value. A query with smaller global sensitivity needs less noise to achieve ε-differential privacy. For instance, the amount of added Laplacian noise depends only on the global sensitivity (GS) of the query function and on ε. Global sensitivity is defined as
$$GS_f = \Delta(f) = \max_{D_1, D_2 : d(D_1, D_2) \le 1} \lVert f(D_1) - f(D_2) \rVert_1 \tag{2.24}$$
for any two datasets D1 and D2 that differ in at most one element. $\lVert \cdot \rVert_1$ represents the $\ell_1$ norm and d(·) is the distance between D1 and D2. The added Laplacian noise is then drawn from the following Laplace distribution, where x denotes the noise value:
$$\text{noise} \sim \mathrm{Lap}\Big(\frac{\Delta(f)}{\epsilon}\Big) \propto e^{-\frac{\epsilon |x|}{\Delta(f)}}. \tag{2.25}$$
The random function M(·) is defined as
$$M(f(D)) = f(D) + \text{noise}. \tag{2.26}$$
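A minimal sketch of the Laplace mechanism in Eqs. (2.24)–(2.26) follows, using the running counting query "How many users are older than 50?". Its global sensitivity is 1 because adding or removing one user changes the count by at most one. The ages, ε = 0.5, and the function names are assumptions for illustration. The last lines check Eq. (2.23) numerically: for every output value on a grid, the ratio of the output densities under the neighboring datasets D1 and D2 stays at or below e^ε.

```python
import numpy as np


def count_over_50(ages):
    """Query f(D): number of users older than 50 (global sensitivity = 1)."""
    return int(np.sum(np.asarray(ages) > 50))


def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """M(f(D)) = f(D) + noise, with noise ~ Lap(sensitivity / epsilon)."""
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)


def laplace_density(x, scale):
    """Density of Lap(scale) centered at 0."""
    return np.exp(-np.abs(x) / scale) / (2.0 * scale)


epsilon, sensitivity = 0.5, 1.0
d1 = [23, 35, 51, 62, 47, 70, 29, 55]   # hypothetical ages in D1
d2 = d1 + [67]                           # D2: one additional user older than 50
f1, f2 = count_over_50(d1), count_over_50(d2)

rng = np.random.default_rng(42)
print("noisy answer on D1:", round(laplace_mechanism(f1, sensitivity, epsilon, rng), 2))

# Check the epsilon-DP bound: the density ratio of outputs under D1 vs. D2 never exceeds e^epsilon.
scale = sensitivity / epsilon
grid = np.linspace(-20, 30, 1001)
ratio = laplace_density(grid - f1, scale) / laplace_density(grid - f2, scale)
print("max ratio:", ratio.max(), "<= e^eps:", np.exp(epsilon))
```

The bound is attained (the ratio equals e^ε) for outputs at or below f(D1). This is why the noise scale must grow as Δ(f)/ε: halving ε doubles the scale of the added noise.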
M(·) is most effective when Δ(f) is small, i.e., when removing any single instance from the dataset would barely change the output of the query. A large Δ(f) implies that a single element can change the answer substantially, so the data publisher must add a large amount of noise to hide that difference. The differential privacy guarantee may not hold if there are correlations or dependencies among different data samples (Kifer and Machanavajjhala, 2011; Liu et al., 2016). For example, it is possible to infer the location of a social media user from differentially private responses by examining his/her social relationships. The reason is that, when data are correlated, standard differential privacy underestimates the amount of noise needed to guarantee privacy (Liu et al., 2016). To address this limitation, one may explicitly consider correlations in the data via dependent differential privacy (DDP) (Liu et al., 2016). First, let us define dependent neighboring datasets. Definition 2.3 (Dependent Neighboring Datasets). Two datasets $D_1(dep_{size}, dep_{prob})$ and $D_2(dep_{size}, dep_{prob})$ are dependent neighboring datasets if changing one sample in $D_1(dep_{size}, dep_{prob})$ causes changes in at most $dep_{size} - 1$ other samples in $D_2(dep_{size}, dep_{prob})$ under the probabilistic dependence relationship $dep_{prob}$ among the samples. Let us break this down: $dep_{size}$ is the dependence size, denoting the number of samples in D that are dependent on at most $dep_{size} - 1$ other samples. $dep_{prob}$ is the probabilistic dependence relationship among the $dep_{size}$ dependent samples. This dependence can be caused by social, behavioral, or genetic relationships; an example of $dep_{prob}$ is a probabilistic dependence in a social network. A generalized version of differential privacy is then defined as follows: Definition 2.4 (ε-Dependent Differential Privacy). Given the query function f(·), a mechanism M guarantees ε-dependent differential privacy for any two dependent neighboring datasets $D_1(dep_{size}, dep_{prob})$ and $D_2(dep_{size}, dep_{prob})$ with output range $\mathcal{R}$ iff, for every $R \in \mathcal{R}$,
$$\frac{\Pr\big(M(f(D_1(dep_{size}, dep_{prob}))) = R\big)}{\Pr\big(M(f(D_2(dep_{size}, dep_{prob}))) = R\big)} \le e^{\epsilon}. \tag{2.27}$$
We can see that the difference from Definition 2.2 lies in accounting for data correlation, quantified by $dep_{size}$ and $dep_{prob}$. Accordingly, the conventional Laplace mechanism can be extended to incorporate data correlation, parameterized by a dependence coefficient between pairs of samples (Liu et al., 2016). Similarly, we can also interpret differential privacy from an information-theoretic perspective. Let $\tilde{Y}$ be the output of M(·), i.e., the noisy query result. The goal of differential privacy is to make the two probabilities $P(\tilde{Y} \mid D_1)$ and $P(\tilde{Y} \mid D_2)$ as close as possible. This corresponds to minimizing the mutual information $I(D, \tilde{Y})$ between the dataset D and the noisy output $\tilde{Y}$. Zero mutual information indicates that the query output reveals nothing about the dataset.
2.3.2. Privacy for Social Graphs
The ubiquity of social graphs, e.g., friendships, follower–followee relations, mobility traces, and spatio-temporal data, demands attention to the privacy issues of graph data. The graph structure itself can be used as a quasi-identifier for graph data, which makes preserving privacy for social graphs more complex than for tabular data. In this section, we first study how to de-anonymize graphs (i.e., attacks) and then discuss potential solutions for anonymizing graph data (i.e., defense). Graph De-anonymization. Graph de-anonymization refers to techniques for attacking an anonymized graph to infer users' private
information. But why do we need an anonymized graph in the first place? Although much graph data comes from online social media platforms where users have explicitly chosen to publish their social links, there are domains where users expect strong privacy, such as email and messaging networks and "members-only" online communities. These domains are social networks in their "purest form": no user attributes, no timestamps, no text, only plain nodes and edges. What we are often interested in is the graph structure: the links (e.g., who corresponded with whom), connectivity, node-to-node distances, and so on. To preserve the research value of such graph data while protecting users' private "names" (e.g., email addresses and phone numbers), one common solution is to replace each "name" with a random user ID. But does it work? Unfortunately, even with a single copy of an anonymized social graph G, adversaries can de-anonymize individuals by combining contextual information with G (Backstrom et al., 2007). Depending on whether the attack occurs before or after the anonymized graph is released, there are two kinds of attacks: active attacks and passive attacks. In active attacks, the attacker creates k new user accounts (k is a small integer), which are connected to form a subgraph H. The goal is to find out whether an edge exists between each pair of users (i, j) from a set of targeted users {1, 2, . . . , U}. The attacker uses these k accounts to create connections with {1, 2, . . . , U} before the anonymized graph is produced. By finding the copy of H that it planted in G, the attacker can locate the targeted users and infer the edges among them. Typically, H is generated with a pattern that stands out in G. In passive attacks, the attacker tries to de-anonymize an already anonymized (often undirected) graph. In this case, the attackers themselves are among the anonymized users in G.
They identify themselves in the released graph and then infer the relations among the users to whom they are connected. The motivation behind a passive attack is that if a user can collude with a coalition of k − 1 friends after G is released, s/he will be able to find other nodes connected to the coalition and learn the edge information among them. The weaknesses of active attacks are that identifying H in G efficiently is difficult and that they assume, often wrongly, that attackers can always access the graph before its release. Passive attacks are susceptible to some defense
approaches (e.g., Al-Qurishi et al., 2017). Therefore, an improved attack assumes that the attacker has access to another, unanonymized graph (also known as background or auxiliary graph knowledge) that shares some users with the anonymized graph (Narayanan and Shmatikov, 2009). It further assumes that, among these shared users, the attacker knows the identities of a small set of users, i.e., the seed users. This type of attack is referred to as a social graph de-anonymization attack and is defined as follows (Narayanan and Shmatikov, 2009): Definition 2.5 (Social Graph De-anonymization Attack). Given an auxiliary graph Gu = (Vu, Eu) and a target anonymized graph Ga = (Va, Ea), a de-anonymization scheme seeks a 1-1 mapping function σ: Va → Vu. Under σ, each i ∈ Va is mapped to σ(i) ∈ Vu. An identity is revealed under σ if σ(i) = i, i.e., i is mapped to its true identity. The background knowledge can be obtained through, e.g., data aggregation, data mining, collaborative information systems, and knowledge/data brokers. Unlike attacks that need seed users, seed-free de-anonymization does not require any seed users. Let us take a look at how these two lines of approach work. Seed-based De-anonymization. Suppose that a service provider would like to use human mobility patterns for traffic forecasting. One solution is to use a distributed mobile sensor computing system that collects location information from GPS sensors in cars to infer traffic conditions. To protect car owners' identities, the provider also uses a simple anonymization method that removes personally identifiable information (PII), such as their name, zip code, and gender. Later, however, it was shown to the provider that, by exploiting social network information from Meta, the identities of these car owners can easily be inferred.
In this example, the auxiliary graph Gu with vertex set Vu is the social network on Meta, and the target anonymized graph Ga with vertex set Va is the network recording the vehicular mobility traces. The seed users can be social influencers or celebrities whose identities are easy to disclose. There are two major steps in seed-based de-anonymization: (1) a set of seed users is mapped from Ga to Gu and thus re-identified; (2) the mapping and de-anonymization are propagated from the seed users to the remaining unidentified users. Let us take a look at these steps in more detail with examples.
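The two steps can be sketched as follows. This is a deliberately simplified illustration, not the Narayanan–Shmatikov or Srivatsa–Hicks algorithm. Given seed mappings stored from anonymized to auxiliary nodes (σ(i) ∈ Vu for i ∈ Va, as in Definition 2.5), each unmapped node is described by the set of already identified users it is adjacent to, its discriminative feature. A node is mapped only when exactly one anonymized node and exactly one auxiliary node share the same non-empty signature. The graphs are a hypothetical reconstruction of the Figure 2.12 example, with anonymized nodes marked by a prime and the edges v1–v3, v2–v3, and v1–v4; the function names are illustrative.

```python
def propagate(aux, anon, seeds):
    """Extend seed mappings (anonymized node -> auxiliary node) by matching unique
    neighbor-set signatures; repeat until no new node can be identified."""
    mapping = dict(seeds)
    changed = True
    while changed:
        changed = False
        mapped_aux = set(mapping.values())
        # Signature of an unmapped anonymized node: identities of its mapped neighbors.
        anon_sigs = {}
        for a, nbrs in anon.items():
            if a not in mapping:
                sig = frozenset(mapping[n] for n in nbrs if n in mapping)
                if sig:
                    anon_sigs.setdefault(sig, []).append(a)
        # Signature of an unmapped auxiliary node: its neighbors that are already identified.
        aux_sigs = {}
        for u, nbrs in aux.items():
            if u not in mapped_aux:
                sig = frozenset(n for n in nbrs if n in mapped_aux)
                if sig:
                    aux_sigs.setdefault(sig, []).append(u)
        # Map only when the signature is unique on both sides (a discriminative feature).
        for sig, anon_nodes in anon_sigs.items():
            aux_nodes = aux_sigs.get(sig, [])
            if len(anon_nodes) == 1 and len(aux_nodes) == 1:
                mapping[anon_nodes[0]] = aux_nodes[0]
                changed = True
    return mapping


def undirected(edges):
    """Build an adjacency dict (node -> set of neighbors) from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


seeds = {"v1'": "v1", "v2'": "v2"}
social = undirected([("v1", "v3"), ("v2", "v3"), ("v1", "v4")])
mobility = undirected([("v1'", "v3'"), ("v2'", "v3'"), ("v1'", "v4'")])
print(propagate(social, mobility, seeds))

# Ambiguous case: v3 and v4 are both neighbors of v1 and v2, so neither can be mapped.
social_amb = undirected([("v1", "v3"), ("v2", "v3"), ("v1", "v4"), ("v2", "v4")])
mobility_amb = undirected([("v1'", "v3'"), ("v2'", "v3'"), ("v1'", "v4'"), ("v2'", "v4'")])
print(propagate(social_amb, mobility_amb, seeds))
```

Requiring an exact, unique match on both sides favors precision over recall. Published approaches such as Narayanan and Shmatikov (2009) instead score candidate matches so that propagation can tolerate noisy, partially overlapping graphs.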
A popular de-anonymization approach uses only graph topology (Narayanan and Shmatikov, 2009), e.g., node degree. It begins by re-identifying seed mappings using specific graph properties, such as high degree and special structures. In the second step, both the seed users and graph topological information are used to expand the mappings. This propagation process is iterated until no more users in Ga can be identified. The same idea can also be applied to de-anonymizing the network of mobility traces using discriminative features (Srivatsa and Hicks, 2012). To see what we mean here, let us take a look at the simplified example shown in Figure 2.12. The orange nodes represent a public social network on Meta with four users {v1, v2, v3, v4}. Their corresponding anonymized versions in the mobility trace network are denoted by the blue nodes {v1', v2', v3', v4'}. Suppose v1 and v2 are two landmark nodes that can be easily de-anonymized (i.e., they are seed users); that is, we successfully find the mappings v1 → v1' and v2 → v2'. Then, we can use the following discriminative features to de-anonymize v3' and v4': v3 is a neighbor of both v1 and v2, while v4 is a neighbor of v1 only. Let N(v) denote the neighbors of user v. Then, the following constraints hold: v3 ∈ N(v1) ∩ N(v2) and v4 ∈ N(v1) \ N(v2). Now, we can transfer these constraints from the social network to v3' and v4' in the mobility trace network: v3' ∈ N(v1') ∩ N(v2') and v4' ∈ N(v1') \ N(v2'). With these constraints, we uniquely map the identities of v3 and v4 to v3' and v4', respectively. However, if v3 and v4 are both friends of v1 and v2,
Fig. 2.12: De-anonymizing a simplified mobility trace network using discriminative features. Blue nodes represent the mobility trace network. Orange nodes represent the social network.
Theories in Socially Responsible AI
57
then we will not be able to differentiate v3 from v4 because there are no discriminative features between v3 and v4 . Seed-free De-anonymization. The weakness of seed-based de-anonymization is finding the right seed users because its effectiveness depends on the size of seed set. Now, the question is can we deanonymize without the seed users? To answer this question, we can start from a social network with one of its simplest forms: the Erd¨ os– R´ enyi (ER) random graph, which assumes that every edge exists with identical probability p. In this example (Pedarsani and Grossglauser, 2011), we assume that there are two unlabeled random graphs over the same vertex set. The goal of the attacker is to match the vertices of these two graphs whose edge sets are correlated but not necessarily equal. The attacker may not observe the complete random graphs and only has access to the structure of these graphs for the re-identification of nodes. The broad result of this approximate graph matching problem is an interesting finding that the mean node degree needs only grow slightly faster than log n with network size n for nodes to be identifiable (Pedarsani and Grossglauser, 2011). So, the answer is “Yes”. However, how to generalize the findings above into real-world scenarios is a challenging question to answer. After all, graphs in the real world are quite different from the ER random graph with the node degree following the Poisson distribution. A more realistic case is that it may follow any distribution, such as the power-law distribution, or exponential distribution, etc. Now, suppose G is a more general graph characterized by a generalized graph model, i.e., the configuration model (Newman, 2018). This means that Ga and Gu are specified by an arbitrary degree sequence that follows an arbitrary distribution. 
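The seed-based propagation step illustrated in Figure 2.12 can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions, not the exact algorithm of Narayanan and Shmatikov (2009) or Srivatsa and Hicks (2012): graphs are plain adjacency dictionaries, the edge lists of the two toy graphs are hypothetical (Figure 2.12 does not fix them), and the discriminative feature of an unmapped node is simply the set of its already de-anonymized neighbors. A node is mapped only when its feature is unique in both graphs, which reproduces the ambiguity discussed for the case where $v_3'$ and $v_4'$ share the same seed neighbors.

```python
from collections import Counter


def undirected(edges):
    """Build an adjacency dict {node: set_of_neighbors} from an edge list."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    return adj


def propagate(adj_a, adj_u, seeds):
    """Extend a seed mapping (anonymized node -> auxiliary identity).

    The feature of an unmapped node is the set of identities of its
    already-mapped neighbors; a pair is added only when that feature is
    unique among the unmapped nodes of BOTH graphs.
    """
    mapping = dict(seeds)
    while True:
        mapped_u = set(mapping.values())
        sig_a = {a: frozenset(mapping[x] for x in adj_a[a] if x in mapping)
                 for a in adj_a if a not in mapping}
        sig_u = {u: frozenset(x for x in adj_u[u] if x in mapped_u)
                 for u in adj_u if u not in mapped_u}
        count_a, count_u = Counter(sig_a.values()), Counter(sig_u.values())
        new_pairs = {}
        for a, s in sig_a.items():
            if s and count_a[s] == 1 and count_u[s] == 1:
                new_pairs[a] = next(u for u, t in sig_u.items() if t == s)
        if not new_pairs:          # no further user can be identified
            return mapping
        mapping.update(new_pairs)  # iterate with the enlarged mapping


# Hypothetical toy graphs mirroring Figure 2.12: v3 ~ {v1, v2}, v4 ~ {v1}.
G_u = undirected([("v1", "v3"), ("v2", "v3"), ("v1", "v4")])
G_a = undirected([("v1'", "v3'"), ("v2'", "v3'"), ("v1'", "v4'")])
seeds = {"v1'": "v1", "v2'": "v2"}
print(propagate(G_a, G_u, seeds))  # maps v3' -> v3 and v4' -> v4
```

If $v_4$ (and $v_4'$) were also connected to $v_2$ (and $v_2'$), both unmapped anonymized nodes would carry the feature $\{v_1, v_2\}$, and the sketch would return only the seeds.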
For graphs generated by the configuration model, Ji et al. (2014) describe the following de-anonymization approach. In each iteration of the attack, two sets of nodes are selected, one from the anonymized graph $G_a$ and one from the auxiliary graph $G_u$. We need an error function to quantify how well these two sets of nodes match each other. Given any de-anonymization scheme $\sigma = \{(v_i, v_i') : v_i \in V_u,\ v_i' \in V_a\} \subseteq V_u \times V_a$, we define the error on a user mapping $(v_i, v_i') \in \sigma$ as

$$\mathrm{Err}_{v_i, v_i'} = |N(v_i) \setminus N(v_i')| + |N(v_i') \setminus N(v_i)|, \quad \forall v_i \in V_u,\ v_i' \in V_a, \tag{2.28}$$

where $|\cdot|$ denotes cardinality. Equation (2.28) measures the difference between the neighborhoods of $v_i \in V_u$ and $v_i' \in V_a$ under the particular $\sigma$, i.e., neighbors are compared through the mapping. The overall error over $G_a$ and $G_u$ is obtained by summing over all mapped pairs:

$$\mathrm{Err}_{G_a, G_u} = \sum_{(v_i, v_i') \in \sigma} \mathrm{Err}_{v_i, v_i'}, \quad v_i \in V_u,\ v_i' \in V_a. \tag{2.29}$$
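Equations (2.28) and (2.29) can be computed directly for a candidate mapping. The sketch below makes one interpretive assumption explicit: because $N(v_i)$ and $N(v_i')$ live in different graphs, the anonymized neighborhood is first translated back into auxiliary identities through $\sigma$ before the set differences are taken. The toy graphs are hypothetical, chosen to mirror Figure 2.12, and `sigma` is a dict from auxiliary nodes to anonymized nodes.

```python
def undirected(edges):
    """Build an adjacency dict {node: set_of_neighbors} from an edge list."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    return adj


def pair_error(v, v_anon, adj_u, adj_a, sigma):
    """Eq. (2.28) for the pair (v, v_anon) under the mapping sigma.

    Anonymized neighbors are translated to auxiliary identities via the
    inverse of sigma; neighbors outside the mapping are ignored.
    """
    inverse = {anon: aux for aux, anon in sigma.items()}
    translated = {inverse[x] for x in adj_a[v_anon] if x in inverse}
    n_u = adj_u[v]
    return len(n_u - translated) + len(translated - n_u)


def total_error(adj_u, adj_a, sigma):
    """Eq. (2.29): sum of the pair errors over all pairs in sigma."""
    return sum(pair_error(v, va, adj_u, adj_a, sigma) for v, va in sigma.items())


G_u = undirected([("v1", "v3"), ("v2", "v3"), ("v1", "v4")])
G_a = undirected([("v1'", "v3'"), ("v2'", "v3'"), ("v1'", "v4'")])

correct = {"v1": "v1'", "v2": "v2'", "v3": "v3'", "v4": "v4'"}
swapped = {"v1": "v1'", "v2": "v2'", "v3": "v4'", "v4": "v3'"}
print(total_error(G_u, G_a, correct))  # 0
print(total_error(G_u, G_a, swapped))  # 4
```

Swapping $v_3'$ and $v_4'$ costs 2 at $v_2$ and 1 each at $v_3$ and $v_4$, so a search that minimizes Eq. (2.29) prefers the correct mapping.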
We can now map the set of anonymized nodes to nodes in the auxiliary graph by minimizing this error function. What about de-anonymizing in a more heterogeneous environment, such as social media? Can we still successfully de-anonymize an anonymized social graph? Social media data are heterogeneous, i.e., they include various types of data such as structural, textual, and location information. They are rich in content and relationships and contain sensitive information. This has both pros and cons. On the bright side, we can leverage this rich user-generated information to de-anonymize a target network more effectively; on the other hand, we need to consider the different ways in which a graph may be anonymized. For example, given only two types of information, text (e.g., posts) and structure (e.g., friendships), there are four cases of anonymization: anonymizing neither type, anonymizing only one of the two types (two cases), or anonymizing both. To de-anonymize a user $u$, a traditional approach is to find a list of target users in the social network and gather background knowledge $B(v)$ for each user $v$ before initiating the attack. This process can be time-consuming. Instead, the adversary can query the social media API at any time during the attack, so that neither a predefined list of target users nor background knowledge is needed to start.

Definition 2.6 (Social Media Adversarial Attack (Beigi et al., 2018)). Given an anonymized social media network $D$, the goal of an adversarial attack is to find a one-to-one mapping between each user in $D$ and a real identity in the targeted network $J$.

The difference between this social media adversarial attack and a traditional de-anonymization attack is illustrated in Figure 2.13. We can de-anonymize a graph in the following three steps (Beigi et al., 2018): (1) extract the most revealing information about $u$ via the social media API; (2) search for this information with the search engine of the targeted social media, which returns a list of candidates whose posts include the query; and (3) identify the candidate profiles most similar to user $u$. The key is the third step, in which the hidden relations between different aspects of social media data can be exploited to define the similarity metric.

Fig. 2.13: De-anonymization via adversarial attacks versus traditional de-anonymization. Image source: (Beigi et al., 2018).

2.3.3. Graph Anonymization
Research on graph de-anonymization attacks is essentially motivated by the need to understand private data leaks and to strengthen graph anonymization techniques. From now on, we focus on the “defender” side: how can we preserve the privacy of users in a social network? Returning to the mobility trace network, where we previously assumed access to the anonymized graph, we now look at techniques for anonymizing this network while preserving the information that is useful for downstream applications. Continuing the discussion of traditional privacy-preserving approaches in Section 2.3.1, we discuss how $k$-anonymity-based and differential privacy-based approaches extend to graph anonymization.
2.3.3.1. k-anonymity-based graph anonymization
$k$-anonymity has been one of the most important anonymization approaches for tabular data. It divides the data into anonymity groups of at least $k$ members each. Recall that the goal of $k$-anonymity is to make the target user indistinguishable from at least $k-1$ other users w.r.t. the quasi-identifiers. Similarly, to define $k$-anonymity for graphs, we first need a property $P$ that groups together users sharing similar values of $P$. $P$ can be the vertex degree, the local neighborhood structure around a vertex, or structural properties in general. We then have the following definitions:

Definition 2.7 ($k$-anonymity for Graphs (in terms of neighborhoods)). A graph is $k$-anonymous if every node shares the same value of $P$ with at least $k-1$ other nodes.

As a graph can also be represented as an adjacency matrix, an alternative definition is as follows:

Definition 2.8 ($k$-anonymity for Graphs (in terms of records)). A graph is $k$-anonymous if every row (a record) of the adjacency matrix is repeated at least $k$ times.

For example, if $P$ is the node degree, then a $k$-anonymous graph requires that for each user there are at least $k-1$ other users with the same degree (Liu and Terzi, 2008); if $P$ is the neighborhood (i.e., the immediate neighbors of the target user), then $k$-neighborhood anonymity is achieved when every user has at least $k-1$ other users with isomorphic neighborhoods (Zhou and Pei, 2008). $k$-neighborhood anonymity may be less intuitive than $k$-degree anonymity, so let us illustrate it with the simplified social network shown in Figure 2.14(a). Alice has two 1-hop friends who know each other and another two 1-hop friends who do not know each other. If the attacker knows this information, Alice can be uniquely identified in the network, because no other user has the same 1-hop neighborhood graph (Figure 2.14(c)). The same applies to John. Therefore, simply removing the identifiers of all users, as shown in Figure 2.14(b), is insufficient.
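For the degree-based instance of Definition 2.7 ($P$ = node degree), checking $k$-anonymity amounts to counting how many nodes share each degree. The sketch below is a minimal illustration on hypothetical toy graphs (not the network of Figure 2.14); the optional `prop` argument is our own device for plugging in other properties $P$. It also shows, in miniature, how one noisy edge can raise the anonymity level, the same idea as the noisy edge discussed next.

```python
from collections import Counter


def degree(adj, v):
    """Default property P: the degree of node v."""
    return len(adj[v])


def anonymity_level(adj, prop=degree):
    """Largest k such that every node shares prop's value with >= k - 1 others."""
    counts = Counter(prop(adj, v) for v in adj)
    return min(counts.values())


def is_k_anonymous(adj, k, prop=degree):
    """Definition 2.7 for the chosen property P."""
    return anonymity_level(adj, prop) >= k


def undirected(edges):
    """Build an adjacency dict {node: set_of_neighbors} from an edge list."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    return adj


# Hypothetical path a-b-c-d: degrees 1, 2, 2, 1, so it is 2-degree anonymous.
path = undirected([("a", "b"), ("b", "c"), ("c", "d")])
# Star centered at "hub": the hub's degree (3) is unique, so only 1-anonymous.
star = undirected([("hub", "x"), ("hub", "y"), ("hub", "z")])
# One noisy edge a-d turns the path into a cycle in which every degree is 2.
cycle = undirected([("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")])

print(anonymity_level(path), anonymity_level(star), anonymity_level(cycle))  # 2 1 4
```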
One solution is to add noise to the graph, e.g., a noisy edge between Yelena and Serena (Figure 2.14(d)). Then Alice and John have the same 1-neighborhood graph, and in fact this holds for every user in this anonymized graph. Therefore, $k = 2$: an attacker who knows the 1-neighborhood graph of a user cannot identify any individual in this anonymized graph with confidence higher than $1/2$. Another solution is to generalize the labels of some nodes toward a common label.

Fig. 2.14: Neighborhood attacks in a social network: (a) A social network; (b) Network with anonymous nodes; (c) The 1-hop neighborhood graph of Alice; (d) Privacy-preserved anonymous network. Figure adapted from Zhou and Pei (2008).

The anonymization approaches discussed so far assume that all users require the same privacy level. But does every individual always have the same privacy preference? The answer is a firm “No”. People have different privacy preferences: e.g., some like to share their locations on social media while others do not, and females are typically more sensitive about their age information than males. In fact, social media platforms (e.g., those run by Meta) often let users select which information they want other people to see. This suggests that we may want to support different privacy levels. Yuan et al. (2010) define three levels of the attacker’s knowledge about the target user: (1) Level 1: only the user’s attribute information (e.g., Bob’s age is 26); (2) Level 2: both attribute and degree information (e.g., Bob’s age is 26 and his degree is 3); and (3) Level 3: a combination of attribute, node degree, and neighborhood information (e.g., Bob’s age is 26, his degree is 3, and Bob’s three connections are a researcher, a businessman, and a reporter). Accordingly, for Level 1 protection, one can use node label generalization for graph anonymization. For Level 2 protection, one can add noisy nodes/edges on top of the Level 1 protection. For Level 3 protection, edge label generalization may be added on top of the Level 2 protection.

2.3.3.2. Differential privacy-based graph anonymization
Differential privacy is another technique for releasing accurate graph statistics while preserving a rigorous notion of privacy. Continuing our discussion of differential privacy on traditional (e.g., tabular) data in Section 2.3.1.2, in this section we learn how to apply differential privacy to graph data. First, recall the intuition behind differential privacy: for every record, the output distribution of the algorithm is similar whether or not this record is present in the dataset. We can then extend the definition of differential privacy to graph data as follows:

Definition 2.9 (Differential Privacy for Graph). Given a privacy budget $\epsilon > 0$, an algorithm $A$ is $\epsilon$-differentially private if, for all pairs of neighbor graphs $(G, G')$ and all sets $S$ of possible outputs of $A$, the following condition holds:

$$P[A(G) \in S] \le e^{\epsilon} \cdot P[A(G') \in S]. \tag{2.30}$$
Two graphs $G$ and $G'$ are neighbors if one can be obtained from the other by a minimal change, and the choice of that change gives two variants of differential privacy for graphs. Node-differential privacy defines neighbor graphs by deleting/adding a node together with its adjacent edges, while edge-differential privacy defines them by deleting/adding a single edge. Node-differential privacy is stronger than edge-differential privacy because it protects a node together with all its adjacent edges, that is, all information pertaining to this node. It is also more challenging to achieve. We want to design differentially private algorithms $A$ (privacy) that are also accurate on realistic graphs (utility). Suppose the analyst needs to evaluate a real-valued function $f$ on the private input graph $G$; our goal is then to release a good approximation of the true value of $f$. We can achieve this by minimizing the expected error

$$\mathrm{Err}_A(G) = \mathbb{E}\,|A(G) - f(G)|, \tag{2.31}$$

where the expectation is taken over the randomness of $A$.
Accuracy is typically evaluated on graph statistics such as the number of edges, counts of small subgraphs (e.g., triangles), and the degree distribution. Utility and privacy are often conflicting goals: in the worst case, it is impossible to achieve both. Therefore, a second-best solution is to ensure differential privacy for all graphs, but accuracy only for a subclass of graphs.

Edge-differential Privacy. The first approach to edge-differential privacy was proposed by Nissim et al. (2007), who first defined and exploited local sensitivity (as opposed to global sensitivity, introduced in Eq. (2.24)). Local sensitivity (LS) leads to instance-specific additive noise; that is, the amount of noise depends not only on the query function $f$ but also on the data itself:

$$LS_f(G) = \max_{G'} \|f(G) - f(G')\|_1. \tag{2.32}$$
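To make Eq. (2.32) concrete, consider the triangle count under edge-differential privacy. Toggling a single edge $(u, v)$ changes the number of triangles by exactly $|N(u) \cap N(v)|$, so the local sensitivity equals the largest number of common neighbors over all node pairs. The sketch below (standard library only, hypothetical toy graph) computes this closed form and cross-checks it by brute force over all single-edge changes. Note that adding noise calibrated directly to the local sensitivity is not, by itself, differentially private, because the noise level can reveal information about the data; Nissim et al. (2007) address this with a smooth upper bound on it.

```python
from itertools import combinations


def undirected(edges):
    """Build an adjacency dict {node: set_of_neighbors} from an edge list."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    return adj


def triangle_count(adj):
    """Number of triangles in an undirected simple graph."""
    return sum(1 for a, b, c in combinations(sorted(adj), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])


def local_sensitivity_triangles(adj):
    """Closed form: maximum number of common neighbors over all node pairs."""
    return max(len(adj[u] & adj[v]) for u, v in combinations(sorted(adj), 2))


def local_sensitivity_brute_force(adj):
    """Eq. (2.32) by definition: toggle each possible edge and re-count."""
    base = triangle_count(adj)
    best = 0
    for u, v in combinations(sorted(adj), 2):
        g = {x: set(nbrs) for x, nbrs in adj.items()}  # copy of the graph
        if v in g[u]:          # delete the existing edge (u, v)
            g[u].discard(v)
            g[v].discard(u)
        else:                  # add the missing edge (u, v)
            g[u].add(v)
            g[v].add(u)
        best = max(best, abs(triangle_count(g) - base))
    return best


# Hypothetical "diamond" graph: two triangles sharing the edge 1-2.
G = undirected([(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)])
print(triangle_count(G), local_sensitivity_triangles(G),
      local_sensitivity_brute_force(G))  # 2 2 2
```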
In Eq. (2.32), the maximum is taken over all neighbors $G'$ of $G$ (for edge-differential privacy, graphs that differ from $G$ in a single edge). Based on this notion, Nissim et al. showed how to estimate, with edge-differential privacy, the cost of a minimum spanning tree and the number of triangles in a graph. These techniques and results were further extended to a variety of tasks such as subgraph counts (Karwa et al., 2011), degree distributions (Hay et al., 2009), and parameters of generative statistical models (Mir and Wright, 2012). Instead of perturbing directly in the data domain, one can also project the data into other domains (e.g., the graph spectral domain) or into the space of a parametric model that characterizes the observed graph. For example, perturbation can be applied to the eigenvalues and eigenvectors of the corresponding adjacency matrix (Wang et al., 2013). The weakness of these approaches is that they require adding a large amount of noise or incur high computational cost. An alternative solution uses node connection probabilities rather than the presence or absence of the observed edges (Xiao et al., 2014). The intuition is that node connection probabilities capture understandable and statistically meaningful properties of the graph while significantly reducing the magnitude of noise needed to hide the change of a single edge. But how do we calculate the connection probabilities? One approach uses the statistical hierarchical random graph (HRG) model (Clauset et al., 2008), which maps nodes into a hierarchical structure (referred to as a dendrogram) and records connection probabilities between any pair of nodes in the graph. This approach has three steps: (1) sample a good dendrogram $T_{\text{sample}}$ in a differentially private way; (2) compute the connection probabilities associated with $T_{\text{sample}}$; and (3) generate the anonymized graph from the resulting HRG. The total privacy budget $\epsilon$ is divided into $\epsilon_1$, used in step (1), and $\epsilon_2$, used in step (2). As expected, the noise magnitude is significantly reduced compared with directly using the observed edges.

Node-differential Privacy. More recent research has started tackling the more challenging case of node-differential privacy. The goal of node-differentially private algorithms is to ensure that their output distribution does not change significantly when a node and its adjacent edges are deleted or added. The major difficulty is that real-world graphs are typically sparse: inserting a single well-connected node can alter the properties of a sparse graph tremendously. For example, for basic statistics such as the number of edges or the frequency of a particular subgraph, the change caused by one node can swamp the very statistics one wants to release, and the utility of private algorithms can plummet. To reduce this sensitivity, one solution is to “project” the input graph onto the set of graphs with bounded degree, i.e., graphs whose maximum degree is below a certain threshold (Kasiviswanathan et al., 2013). The benefits are two-fold: (1) node privacy is easier to achieve in bounded-degree graphs, since adding or removing a node affects only a small subgraph (at most as many edges as the degree bound); and (2) relatively little information is lost when the degree threshold is selected carefully.
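The intuition behind the degree-bounded projection can be illustrated with the simplest possible projection, naive truncation: delete every node whose degree exceeds a threshold $D$. In the projected graph, deleting any single node changes the edge count by at most $D$, instead of by up to $n$. This sketch uses a hypothetical graph and threshold and only illustrates benefit (1); it is a simplified stand-in for the projections studied by Kasiviswanathan et al. (2013), and naive truncation is not node-private on its own, because the truncation step itself can react strongly to the presence of a single node.

```python
def undirected(edges):
    """Build an adjacency dict {node: set_of_neighbors} from an edge list."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    return adj


def truncate(adj, D):
    """Naive projection: drop every node with degree > D (and its edges)."""
    keep = {v for v, nbrs in adj.items() if len(nbrs) <= D}
    return {v: adj[v] & keep for v in keep}


def edge_count(adj):
    """Number of undirected edges."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def max_single_node_effect(adj):
    """Largest change in the edge count caused by deleting one node."""
    return max((len(nbrs) for nbrs in adj.values()), default=0)


# Hypothetical sparse graph: a ring of 20 nodes plus one hub linked to all of them.
ring = [(i, (i + 1) % 20) for i in range(20)]
hub = [("hub", i) for i in range(20)]
G = undirected(ring + hub)

D = 3
H = truncate(G, D)
print(edge_count(G), max_single_node_effect(G))  # 40 20
print(edge_count(H), max_single_node_effect(H))  # 20 2
```

The projection lowers the worst-case effect of one node from 20 edges to 2, at the price of discarding the hub's 20 edges; choosing $D$ balances these two effects, which is benefit (2).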
Let us get to some details. Consider graphs on $n$ nodes, and call two graphs $G$ and $G'$ node neighbors if one can be obtained from the other by deleting or adding a single node together with its adjacent edges. The $\ell_1$ global node sensitivity of a function $f$ that maps graphs to $\mathbb{R}^p$ is defined as follows (Dwork et al., 2006):

$$\Delta f = \max_{G, G' \text{ node neighbors}} \|f(G) - f(G')\|_1. \tag{2.33}$$
For example, the number of edges in $G$ has node sensitivity $n$, since adding or deleting a single node can add or remove up to $n$ edges. In contrast, the number of nodes has node sensitivity 1, because adding or deleting a single node changes it by exactly one. An $\epsilon$-node-private algorithm is given by the Laplace mechanism, $A(G) = f(G) + \mathrm{Lap}(\Delta f/\epsilon)^p$, i.e., i.i.d. Laplacian noise with scale $\Delta f/\epsilon$ is added to each entry of $f$. Given a set $\mathcal{S}$ of “preferred” graphs (e.g., the set of graphs with a bounded maximum degree) that may contain the input graph $G$, the Lipschitz constant (also referred to as restricted sensitivity) of $f$ on $\mathcal{S}$ can be defined as

$$\Delta f(\mathcal{S}) = \max_{G, G' \in \mathcal{S},\ G \neq G'} \frac{\|f(G') - f(G)\|_1}{d_{\text{node}}(G, G')}, \tag{2.34}$$

where $d_{\text{node}}$ is the node distance between two graphs, i.e., the number of node insertions and deletions needed to go from $G$ to $G'$. Then, there exists an $\epsilon$-differentially private algorithm $A_{\mathcal{S}}$ such that for all $G \in \mathcal{S}$,

$$\mathbb{E}\,\|A_{\mathcal{S}}(G) - f(G)\|_1 = O\!\left(\Delta f(\mathcal{S})/\epsilon^2\right). \tag{2.35}$$
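The effect of the sensitivity on accuracy can be seen by running the Laplace mechanism on the edge count, the running example above, with different sensitivity values. The sketch below (standard library only; hypothetical ring graph, $\epsilon = 1$) compares the edge sensitivity 1, the global node sensitivity $n$ of Eq. (2.33), and the restricted sensitivity of the edge count on graphs of maximum degree $D = 2$, and estimates the expected error of Eq. (2.31) by simulation. Caveat: noise calibrated to the restricted sensitivity alone is private only when all neighboring graphs stay inside $\mathcal{S}$; guaranteeing privacy for arbitrary inputs is what the construction behind Eq. (2.35) is for.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale): difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release f(G) + Lap(sensitivity / epsilon)."""
    return true_value + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(true_value, sensitivity, epsilon, trials=20000, seed=0):
    """Monte Carlo estimate of the expected error E|A(G) - f(G)| of Eq. (2.31)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        released = laplace_mechanism(true_value, sensitivity, epsilon, rng)
        total += abs(released - true_value)
    return total / trials


# Hypothetical input: a ring on n nodes, so every node has degree 2.
n = 200
edges = [(i, (i + 1) % n) for i in range(n)]
f_G = len(edges)   # f(G) = number of edges
epsilon = 1.0

for name, sensitivity in [("edge sensitivity", 1),
                          ("global node sensitivity (n)", n),
                          ("restricted sensitivity, max degree D = 2", 2)]:
    err = mean_abs_error(f_G, sensitivity, epsilon)
    # Since E|Lap(b)| = b, the mean error is close to the noise scale.
    print(f"{name:42s} noise scale {sensitivity / epsilon:6.1f}, mean |error| {err:6.2f}")
```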
In Eq. (2.35), Laplacian noise is added in proportion to the Lipschitz constant of $f$ on $\mathcal{S}$. Instead of finding one “nice” subset, another approach adds noise proportional to a quantity related to, but often much smaller than, the local sensitivity (Chen and Zhou, 2013), known as down sensitivity. The down sensitivity of $f$ at a graph $G$ is the Lipschitz constant of $f$ restricted to the set of induced subgraphs of $G$:

$$DS_f(G) = \max_{H, H' \text{ neighbors},\ H \preceq H' \preceq G} |f(H') - f(H)|, \tag{2.36}$$

where $H \preceq G$ means that $H$ is an induced subgraph of $G$, i.e., $H$ can be obtained by deleting a set of nodes from $G$.

2.3.4. Discussion
Despite the variety of privacy-preserving machine learning techniques and laws protecting users’ private data, non-privacy-aware machine learning algorithms are still being developed and used, and users’ private data are still being collected without consent. Several issues need to be considered before deploying privacy-preserving machine learning techniques (Al-Rubaie and Chang, 2019): (1) Flexibility. Many existing privacy-preserving algorithms are closely tied to specific machine learning algorithms. Given that the AI and machine learning fields constantly produce new ideas and advances, these techniques may need to be repurposed regularly to adapt to new AI algorithms. (2) Scalability. Since some privacy-preserving machine learning techniques impose additional processing and communication costs, their practical use can be limited, especially for large-scale data. (3) Policy. It is unclear whether privacy policies that help protect users’ private information (e.g., by specifying what may be shared) have the same binding force on the client side. It is important to translate such policies into mechanisms that restrict any other uses of the data that pose potential privacy threats.

2.4. Distribution Shift
Epic is one of the largest providers of health information technology, and its software is used primarily by large US hospitals and healthcare systems. Epic has developed many widely used clinical AI systems to help clinicians, e.g., by identifying hard-to-spot cases. Among these is a sepsis-alerting model that calculates and indicates “the probability of a likelihood of sepsis”. This alerting model had to be deactivated in April 2020 because of malfunctions reported by researchers at the University of Michigan Medical School in Ann Arbor.7 The tool, they wrote, “identifies only 7% of patients with sepsis who were missed by a clinician . . . highlighting the low sensitivity of the [Epic Sepsis Model] in comparison with contemporary clinical practice. The [Epic Sepsis Model] also did not identify 67% of patients with sepsis despite generating alerts on 18% of all hospitalized patients, thus creating a large burden of alert fatigue”. What happened during 2019–2020 that could cause such a dramatic performance degradation? The answer is probably not hard to guess: the COVID-19 pandemic. When large numbers of people with different demographic characteristics were hospitalized because of the coronavirus, the relationship between fevers and bacterial sepsis changed fundamentally, resulting in many spurious alerts. The hospital’s clinical AI governing committee then discontinued the use of this sepsis-prediction model. From this clinical AI example, we can summarize some common causes of distribution shift (Finlayson et al., 2021):

(1) changes in technology, such as adopting a new module or routine updates to an existing platform; (2) changes in population and environment, such as demographic characteristics; for example, a model trained on predominantly White populations may underperform on patients from underrepresented groups; and (3) changes in behavior, such as over-reliance on the AI system, which dulls a clinician’s sensitivity to disease, i.e., automation bias. The solutions may not be as obvious as the causes. We need to develop more reliable AI systems that are robust against distribution shift, and to prepare clinicians to identify circumstances in which AI systems fail to perform their intended function reliably.

7 https://www.fiercehealthcare.com/tech/epic-s-widely-used-sepsis-predictionmodel-falls-short-among-michigan-medicine-patients

2.4.1. Different Types of Distribution Shifts
Distribution shift describes situations in which the distributions of the training data and the test data are not identical:

$$P^{(\text{train})}_{X,Y}(x, y) \neq P^{(\text{test})}_{X,Y}(x, y).$$

There are different types of distribution shifts. To better understand their differences, we briefly discuss the concept of causality (more is covered in Chapter 4). Causality is not the same as correlation, predictive ability, or statistical dependence. Causality goes beyond prediction by modeling the outcome of interventions and formalizing the act of imagination (i.e., counterfactual reasoning) (Pearl, 2009). In a causal graph, a node $X_i$ is a cause of node $X_j$ if there exists a causal path $X_i \to \cdots \to X_j$ such that intervening on $X_i$ affects $X_j$. Because causal graphs are Directed Acyclic Graphs (DAGs), causal paths cannot be reversed: if $X_i$ is a cause of $X_j$, then $X_j$ cannot affect $X_i$. The key operation in causality is the do-operator (Pearl, 2009): $do(X_i)$ means that we intervene on $X_i$ by removing all the edges in the causal graph that point into $X_i$. Coming back to the distribution shift problem, there are primarily four ways in which $P^{(\text{train})}_{X,Y}(x, y)$ can differ from $P^{(\text{test})}_{X,Y}(x, y)$, plus their combinations:

(1) Label shift, also known as prior probability shift or target shift, occurs when the label distributions of the training and test data differ but the distribution of features given labels is the same:

$$P^{(\text{train})}_{Y}(y) \neq P^{(\text{test})}_{Y}(y) \quad \text{and} \quad P^{(\text{train})}_{X|Y}(x|y) = P^{(\text{test})}_{X|Y}(x|y). \tag{2.37}$$

This comes from sampling differences.
(2) Covariate shift occurs when the distribution of labels given features is the same between the training and test data, but the feature distributions differ:

$$P^{(\text{train})}_{X}(x) \neq P^{(\text{test})}_{X}(x) \quad \text{and} \quad P^{(\text{train})}_{Y|X}(y|x) = P^{(\text{test})}_{Y|X}(y|x). \tag{2.38}$$
This comes from sampling differences.

(3) Concept shift occurs when the distribution of labels given features differs, but the feature distributions are the same between the training and test data:

$$P^{(\text{train})}_{X}(x) = P^{(\text{test})}_{X}(x) \quad \text{and} \quad P^{(\text{train})}_{Y|X}(y|x) \neq P^{(\text{test})}_{Y|X}(y|x). \tag{2.39}$$
This comes from measurement differences.

(4) Conditional shift occurs when the distribution of features given labels differs, but the label distributions are the same between the training and test data:

$$P^{(\text{train})}_{Y}(y) = P^{(\text{test})}_{Y}(y) \quad \text{and} \quad P^{(\text{train})}_{X|Y}(x|y) \neq P^{(\text{test})}_{X|Y}(x|y). \tag{2.40}$$
This comes from measurement differences.

(5) Compound shift refers to any combination of two or more types of distribution shift.

Let us use the sepsis-alerting example to illustrate each type of distribution shift. There is label shift if the proportions of people with sepsis before and after the COVID-19 pandemic are different. Covariate shift occurs if the feature distributions of the patients are different, for example, if people diagnosed with COVID-19 and sepsis are older or had been working in poor environments. There is concept shift or conditional shift if the actual mechanism connecting the features and sepsis changes after the pandemic. An example of concept shift is that sepsis may have been more common in very young people before the pandemic, whereas after the pandemic it may be more common in the elderly because of other health problems. If the symptoms caused by sepsis also differ before and after the pandemic, then we have conditional shift.

Alternatively, we can use causal graphs to describe the different types of distribution shift. Given an (unobserved) environment variable $E$ that might change the features and labels, the corresponding causal graphs for covariate shift, label shift, concept shift, conditional shift, and examples of compound shifts are presented in Figure 2.15. As these graphs show, label shift and conditional shift arise in anti-causal learning problems (i.e., $Y \to X$, predicting causes), while covariate shift and concept shift arise in causal learning problems (i.e., $X \to Y$, predicting effects).
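A small simulation can make definitions (2.37) and (2.38) tangible. In the sketch below, all distributions are hypothetical choices: under label shift, the class prior changes from 0.5 to 0.2 while $X \mid Y \sim \mathcal{N}(2Y, 1)$ is kept fixed; under covariate shift, the input mean moves from 0 to 1.5 while $P(Y = 1 \mid X = x) = 1/(1 + e^{-2x})$ is kept fixed. Simple empirical summaries then show which part of the joint distribution moved and which stayed put.

```python
import math
import random


def sample_label_shift(n, prior, rng):
    """Y ~ Bernoulli(prior); X | Y ~ N(2Y, 1), the same mechanism in both domains."""
    data = []
    for _ in range(n):
        y = 1 if rng.random() < prior else 0
        data.append((rng.gauss(2.0 * y, 1.0), y))
    return data


def sample_covariate_shift(n, mean_x, rng):
    """X ~ N(mean_x, 1); Y | X ~ Bernoulli(sigmoid(2x)), the same in both domains."""
    data = []
    for _ in range(n):
        x = rng.gauss(mean_x, 1.0)
        y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-2.0 * x)) else 0
        data.append((x, y))
    return data


def p_y1(data):
    """Empirical P(Y = 1)."""
    return sum(y for _, y in data) / len(data)


def mean_x(data):
    """Empirical E[X]."""
    return sum(x for x, _ in data) / len(data)


def mean_x_given_y1(data):
    """Empirical E[X | Y = 1], a summary of P(X | Y)."""
    return mean_x([(x, y) for x, y in data if y == 1])


def p_y1_given_bin(data, lo, hi):
    """Empirical P(Y = 1 | lo < X < hi), a summary of P(Y | X)."""
    return p_y1([(x, y) for x, y in data if lo < x < hi])


rng = random.Random(0)
src, tgt = sample_label_shift(20000, 0.5, rng), sample_label_shift(20000, 0.2, rng)
print("label shift:     P(Y=1)", round(p_y1(src), 2), round(p_y1(tgt), 2),
      "| E[X|Y=1]", round(mean_x_given_y1(src), 2), round(mean_x_given_y1(tgt), 2))

src, tgt = sample_covariate_shift(100000, 0.0, rng), sample_covariate_shift(100000, 1.5, rng)
print("covariate shift: E[X]", round(mean_x(src), 2), round(mean_x(tgt), 2),
      "| P(Y=1 | 0.5<X<1)", round(p_y1_given_bin(src, 0.5, 1.0), 2),
      round(p_y1_given_bin(tgt, 0.5, 1.0), 2))
```

In the first line, $P(Y)$ differs while the class-conditional summary stays about the same; in the second, $P(X)$ differs while the label rate within a fixed input range stays about the same.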
(Figure 2.15 panels: (a) covariate shift and label shift; (b) concept shift and conditional shift; (c) compound shifts: covariate & concept shift, and label & conditional shift.)

Fig. 2.15: Summary of different types of shifts, and their inherent causal relationships. The unobserved variables are drawn with dashed lines. The red circle is used to emphasize the causal relations.

2.4.2. Mitigating Distribution Shift via Domain Adaptation
There are two scenarios in distribution shift mitigation, depending on whether unlabeled test data are available during training. When you have access to the unlabeled test data $X^{(\text{test})}$, the goal is to learn a concept from the labeled training data that can be adapted well to the test data. This problem is referred to as domain adaptation; the training data are called the source domain and the test data the target domain. In the other scenario, you have no information about the test data and need to make the model generalizable and robust to any unseen distribution. This is the problem of domain generalization, which is typically more challenging. In this section, we walk through some popular domain adaptation methods for label shift, covariate shift, concept shift, and conditional shift.

2.4.2.1. Label shift
The label shift problem typically surfaces in diagnosis (e.g., diseases cause symptoms) and recognition (e.g., objects cause sensory observations) tasks. For example, during the COVID-19 outbreak, $P(Y|X)$ (e.g., the probability of pneumonia given a cough) might rise while $P(X|Y)$ (e.g., the probability of a cough given pneumonia) stays the same. To simplify the notation, in the following $P$ denotes the source distribution and $Q$ denotes the target distribution; $p$ and $q$ denote the probability density functions (pdfs) or probability mass functions (pmfs) associated with $P$ and $Q$, respectively. Under label shift, the joint distribution in the target domain can be factorized as follows:

$$q(y, x) = q(y)\,q(x|y) = q(y)\,p(x|y). \tag{2.41}$$
The second equality follows from Eq. (2.37). Compared with covariate shift, label shift is curiously under-investigated. Because the relationship between features and labels, $p(x|y)$, does not change, adaptation only needs to account for the difference between $p(y)$ and $q(y)$. One simple and effective solution for label shift is weighting, i.e., estimating the ratio $w_l = q(y_l)/p(y_l)$ for each label $l$, given training data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, unlabeled test data $[x_1'; \ldots; x_m']$, and a black-box predictor $f: \mathcal{X} \to \mathcal{Y}$. The difficulty lies in approximating $q(y)$ in the target distribution.
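The confusion-matrix estimator of Lipton et al. (2018), whose assumptions and steps are spelled out below, reduces in the binary case to solving a 2×2 linear system. The sketch below is a standard-library illustration under hypothetical choices: the “black box” $f$ is a fixed threshold rule $f(x) = \mathbb{1}[x > 1]$ standing in for a classifier trained on one split of the source data, $X \mid Y \sim \mathcal{N}(2Y, 1)$ in both domains, and the class prior moves from 0.5 (source) to 0.2 (target), so the true weights are $w_1 = 0.2/0.5 = 0.4$ and $w_0 = 0.8/0.5 = 1.6$.

```python
import random


def bbse_weights(pred_holdout, y_holdout, pred_test):
    """Estimate w_l = q(y=l)/p(y=l) for binary labels by solving C w = b.

    C[i][j] = fraction of held-out source points with f(x) = i and y = j,
    b[i]    = fraction of target points with f(x) = i; classes ordered [1, 0].
    """
    order = (1, 0)
    n, m = len(y_holdout), len(pred_test)
    C = [[sum(1 for p, y in zip(pred_holdout, y_holdout) if p == i and y == j) / n
          for j in order] for i in order]
    b = [sum(1 for p in pred_test if p == i) / m for i in order]
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    if abs(det) < 1e-12:
        raise ValueError("confusion matrix is singular: assumption A.3 fails")
    # Closed-form inverse of the 2x2 matrix C applied to b.
    w1 = (C[1][1] * b[0] - C[0][1] * b[1]) / det
    w0 = (C[0][0] * b[1] - C[1][0] * b[0]) / det
    return {1: w1, 0: w0}


def sample(n, prior, rng):
    """Hypothetical data: Y ~ Bernoulli(prior), X | Y ~ N(2Y, 1)."""
    ys = [1 if rng.random() < prior else 0 for _ in range(n)]
    xs = [rng.gauss(2.0 * y, 1.0) for y in ys]
    return xs, ys


def f(x):
    """Hypothetical black-box classifier."""
    return 1 if x > 1.0 else 0


rng = random.Random(0)
x_hold, y_hold = sample(20000, 0.5, rng)   # held-out source split
x_test, _ = sample(20000, 0.2, rng)        # target labels are never used
w = bbse_weights([f(x) for x in x_hold], y_hold, [f(x) for x in x_test])
print({k: round(v, 2) for k, v in w.items()})  # close to {1: 0.4, 0: 1.6}
```

Retraining in step (4) would then multiply the loss of each training example $(x_i, y_i)$ by `w[y_i]`.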
In addition to assumption A.1 implied by Eq. (2.37), namely $p(x|y) = q(x|y)$, we also need the following two assumptions:
A.2: For every $y$ with $q(y) > 0$, we require $p(y) > 0$.
A.3: The expected confusion matrix $C_p(f) := p(f(x), y)$ is invertible.
Assumption A.2 requires the support of the target label distribution to be a subset of the support of the source label distribution. For discrete $Y$, this means that the source domain should include examples of every class that appears in the target domain. Assumption A.3 states that the outputs of $f$ for the different classes are linearly independent. It often holds in the typical case in which, given an example whose true label is $y_i$, $f$ predicts class $y_i$ more often than any other class. Under these assumptions, $\hat{w}$ is obtained by solving the linear system $\hat{C}\hat{w} = \hat{b}$, where $\hat{C}$ is estimated on the training data and $\hat{b}$ is the average output of $f$ on the test data. In a binary classification problem, the algorithm (Lipton et al., 2018) works as follows:
(1) Train a black-box classifier $f$ on one random split of the training data, and compute the confusion matrix $C \in \mathbb{R}^{2\times 2}$ on the rest of the training data.
(2) Run $f$ on the unlabeled test data and compute the probabilities of predicted positives and negatives as a vector:

$$b = \begin{bmatrix} p(f(X^{\text{test}}) = 1) \\ p(f(X^{\text{test}}) = 0) \end{bmatrix}. \tag{2.42}$$

(3) Compute $\hat{w} = C^{-1} b$, $\hat{w} \in \mathbb{R}^{2\times 1}$.
(4) Reweight the training samples in the first random split by $\hat{w}$ and retrain $f$.

2.4.2.2. Covariate shift
Most works in the domain adaptation literature address covariate shift, also referred to as sample selection bias. Under the covariate shift assumption, we can factorize the target distribution as

$$q(y, x) = q(x)\,q(y|x) = q(x)\,p(y|x). \tag{2.43}$$
The second equality follows from Eq. (2.38). We next discuss both conventional methods and more advanced deep learning-based methods.

Importance Reweighting. As with label shift, a straightforward solution is importance reweighting with $w_i = q(x_i)/p(x_i)$. This is an easier problem than label shift, because we observe input samples from both the source and the target distributions. Basically, we can first estimate $\hat{p}(x)$ and $\hat{q}(x)$ from the training and test data, respectively. The loss function $L$ used in empirical risk minimization (e.g., log loss or squared loss) can then be reweighted as follows:

$$\hat{y}(\cdot) = \arg\min_{f} \frac{1}{n} \sum_{i=1}^{n} \frac{\hat{q}(x_i)}{\hat{p}(x_i)}\, L(y_i, f(x_i)). \tag{2.44}$$
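A minimal sketch of Eq. (2.44) with squared loss: fit $\hat{p}$ and $\hat{q}$ as one-dimensional Gaussians (a hypothetical modeling choice; any density estimator could be used), weight each training point by $\hat{q}(x_i)/\hat{p}(x_i)$, and solve the weighted least-squares problem for a linear model $f(x) = a + bx$ in closed form. The data are hypothetical: $y = x^2$, with source inputs from $\mathcal{N}(0, 1)$ and target inputs from $\mathcal{N}(0.5, 1)$. Because the linear model is misspecified, the best fit depends on where the inputs lie: for inputs from $\mathcal{N}(\mu, 1)$ the best slope is $\mathrm{Cov}(X, X^2) = 2\mu$, i.e., 0 under the source and 1 under the target, and reweighting moves the fit toward the latter.

```python
import math
import random


def fit_gaussian(xs):
    """Maximum-likelihood mean and variance of a 1-D Gaussian."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var


def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def weighted_linear_fit(xs, ys, ws):
    """argmin over (a, b) of sum_i w_i * (y_i - a - b * x_i)^2, in closed form."""
    W = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / W
    my = sum(w * y for w, y in zip(ws, ys)) / W
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    return my - b * mx, b


rng = random.Random(0)
x_train = [rng.gauss(0.0, 1.0) for _ in range(20000)]
y_train = [x * x for x in x_train]                      # labels exist only in the source
x_test = [rng.gauss(0.5, 1.0) for _ in range(20000)]    # unlabeled target inputs

p_hat, q_hat = fit_gaussian(x_train), fit_gaussian(x_test)
weights = [gaussian_pdf(x, *q_hat) / gaussian_pdf(x, *p_hat) for x in x_train]

_, slope_plain = weighted_linear_fit(x_train, y_train, [1.0] * len(x_train))
_, slope_iw = weighted_linear_fit(x_train, y_train, weights)
print(round(slope_plain, 2), round(slope_iw, 2))  # roughly 0 and roughly 1
```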
Instead of estimating the two densities separately, we can directly estimate the weight, which is substantially easier than density estimation. The idea is to treat the domain index as a label and train a classification model, such as logistic regression, to predict whether a sample comes from the source or the target distribution: a sample is labeled “1” if it comes from the target distribution and “0” otherwise. If the predicted probability of the classifier is $s(x_i)$, the importance weight can be computed as follows:

$$w_i = \frac{s(x_i)/n_{\text{test}}}{(1 - s(x_i))/n_{\text{train}}} = \frac{n_{\text{train}}\, s(x_i)}{n_{\text{test}}\,(1 - s(x_i))}. \tag{2.45}$$
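Eq. (2.45) can be sketched with a hand-rolled one-dimensional logistic regression as the domain classifier (standard library only; plain gradient descent whose learning rate and iteration count are untuned, hypothetical choices). In the hypothetical data below, the $n_{\text{train}} = 3000$ source inputs come from $\mathcal{N}(0, 1)$ and the $n_{\text{test}} = 1000$ target inputs from $\mathcal{N}(1, 1)$, so the true density ratio is $q(x)/p(x) = e^{x - 1/2}$. The unequal domain sizes are deliberate: they are exactly what the factor $n_{\text{train}}/n_{\text{test}}$ corrects for.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_domain_classifier(xs_source, xs_target, lr=1.0, iters=300):
    """Logistic regression s(x) = sigmoid(a + b x); label 1 = target domain."""
    data = [(x, 0.0) for x in xs_source] + [(x, 1.0) for x in xs_target]
    a = b = 0.0
    for _ in range(iters):
        grad_a = grad_b = 0.0
        for x, t in data:
            err = sigmoid(a + b * x) - t   # derivative of the log loss w.r.t. the logit
            grad_a += err
            grad_b += err * x
        a -= lr * grad_a / len(data)
        b -= lr * grad_b / len(data)
    return a, b


def importance_weight(x, a, b, n_train, n_test):
    """Eq. (2.45): w = n_train * s(x) / (n_test * (1 - s(x)))."""
    s = sigmoid(a + b * x)
    return (n_train * s) / (n_test * (1.0 - s))


rng = random.Random(0)
x_train = [rng.gauss(0.0, 1.0) for _ in range(3000)]
x_test = [rng.gauss(1.0, 1.0) for _ in range(1000)]
a, b = fit_domain_classifier(x_train, x_test)
for x in (-1.0, 0.0, 1.0):
    print(x, round(importance_weight(x, a, b, len(x_train), len(x_test)), 2),
          "true:", round(math.exp(x - 0.5), 2))
```

Without the $n_{\text{train}}/n_{\text{test}}$ factor, every weight in this example would be about three times too small, because the classifier's output also reflects the 3:1 imbalance between the two domains.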
where ntrain and ntest denote the number of samples in the source and target domains, respectively. Generally, the idea of importance reweighting is to upweight the training samples that are similar to the test samples. Eventually, the training samples look like they were drawn from the test distribution. Domain-Invariant Feature Learning. More recent efforts in domain adaptation try to align source and target domains by creating a domain-invariant feature representation, typically in the form of a feature extractor neural network. This is to avoid dealing with the features in the original space but to do so in a latent space that encodes features into representations. A feature representation is domain invariant if its distribution is the same in both the source and target domains. Of course, the implicit assumption here is that the feature representation exists. The goal of domain-invariant feature learning is to estimate a domain-invariant feature extractor as illustrated in Figure 2.16. Methods differ in how to align the source and target domains, i.e., the Alignment Module in Figure 2.16. One straightforward idea is
Theories in Socially Responsible AI
[Figure 2.16 diagram. Components: (Labeled) Source Data, (Unlabeled) Target Data, two Feature Extractors linked by Weights Sharing or Regularization, Task Classifier, Class Label, Alignment Component.]
Fig. 2.16: A general network architecture for domain adaptation methods learning domain-invariant features. The major differences of various methods are the Alignment Component, i.e., how the source and target domains are aligned during training, and whether the Feature Extractors share none, some, or all of the weights between domains.
to minimize the distance between the distributions, often measured by a divergence or another discrepancy measure. Common options include maximum mean discrepancy (MMD), correlation alignment, contrastive domain discrepancy, and the Wasserstein metric. We use MMD as an illustrative example. MMD is commonly used to test whether two given samples come from the same distribution. Given a feature map $\phi : \mathcal{X} \to \mathcal{H}$, where $\mathcal{H}$ denotes a reproducing kernel Hilbert space (RKHS), MMD is formulated as
$$\mathrm{MMD}(P, Q) = \Big\| \mathbb{E}_{X^{train} \sim P}\big[\phi(X^{train})\big] - \mathbb{E}_{X^{test} \sim Q}\big[\phi(X^{test})\big] \Big\|_{\mathcal{H}}. \tag{2.46}$$
Intuitively, the kernel maps each variable into a feature space rich enough to capture all of its moments, so MMD compares two distributions through the distance between their mean embeddings. Therefore, for a characteristic kernel such as the Gaussian kernel, MMD(P, Q) = 0 iff P = Q. Given the source dataset $\mathcal{D}^{train}$ and target dataset $\mathcal{D}^{test}$, an empirical estimate of MMD is
$$\widehat{\mathrm{MMD}}(\mathcal{D}^{train}, \mathcal{D}^{test}) = \Big\| \frac{1}{n^{train}} \sum_{i=1}^{n^{train}} \phi(x_i) - \frac{1}{n^{test}} \sum_{j=1}^{n^{test}} \phi(x_j) \Big\|_{\mathcal{H}}. \tag{2.47}$$
Here, $\phi(\cdot)$ denotes the feature map associated with the kernel $k(x^{train}, x^{test}) = \langle \phi(x^{train}), \phi(x^{test}) \rangle$. When applied to domain adaptation, a linear kernel is often used for simplicity:
$$k(x^{train}, x^{test}) = (x^{train})^{\top} x^{test} + c, \tag{2.48}$$
where c is an optional constant. Note that with a linear kernel, MMD only compares the means of the two distributions rather than all of their moments.
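A minimal NumPy sketch of Eq. (2.47). Expanding the squared RKHS norm turns it into averages of kernel values, so $\phi$ never has to be computed explicitly (the kernel trick). The Gaussian kernel and its bandwidth of 1.0 are our assumptions for illustration; the linear kernel follows Eq. (2.48).

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """k(x, x') = exp(-||x - x'||^2 / (2 * bandwidth^2)) for all pairs of rows."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def linear_kernel(a, b, c=0.0):
    """k(x, x') = x^T x' + c for all pairs of rows, as in Eq. (2.48)."""
    return a @ b.T + c


def empirical_mmd(source, target, kernel=gaussian_kernel):
    """Empirical MMD of Eq. (2.47) via the kernel trick.

    ||mean_i phi(x_i) - mean_j phi(x_j)||^2
        = mean k(x_i, x_i') + mean k(x_j, x_j') - 2 * mean k(x_i, x_j).
    """
    k_ss = kernel(source, source).mean()
    k_tt = kernel(target, target).mean()
    k_st = kernel(source, target).mean()
    return np.sqrt(max(k_ss + k_tt - 2.0 * k_st, 0.0))


rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(300, 2))
same = rng.normal(0.0, 1.0, size=(300, 2))     # same distribution as source
shifted = rng.normal(1.0, 1.0, size=(300, 2))  # mean shifted by 1 in each dimension

print(empirical_mmd(source, same), empirical_mmd(source, shifted))
```

Two samples from the same distribution should give a small value and the shifted sample a clearly larger one. With the linear kernel, the result equals the Euclidean distance between the two sample means, and the optional constant c cancels out of the expansion.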
Socially Responsible AI: Theories and Practices
Minimizing the divergence may be useful for simple domain adaptation problems but tends not to work well when the disparity between domains is large. Perhaps the most popular approaches for domain adaptation nowadays follow an adversarial training scheme. To provide some context, back in 2014, Ian Goodfellow proposed a revolutionary idea: making two neural networks compete with each other. One neural network (the generator) tries to generate realistic data such as an image of a dog, and the other (the discriminator) tries to discriminate between the real data and the data produced by the generator. The generator is trained using the discriminator as its loss function, and the discriminator updates its parameters by learning to pick out fake data from real data. Eventually, the game reaches an “equilibrium” in which the generator produces data that fool the discriminator. That is, the generator creates new data from the same distribution as the training data, and the discriminator can only make a random guess. This is a simplified view of the Generative Adversarial Network (GAN) (Goodfellow et al., 2014). We have seen how to use GANs to reduce biases in Section 2.1.2.2. The idea of GAN naturally applies to domain adaptation, in which the discriminator tries to distinguish the encoding of the training data from that of the test data. The goal is to eliminate the domain differences from the encodings while simultaneously learning to predict the labels. As shown in Figure 2.17, there are two encoders, one for the source domain and the other for the target domain. The classifier (i.e., a label predictor) is used to predict the labels of data from the source domain, and the inputs of the
[Figure 2.17 diagram. Components: Source Data, Source Encoder, Target Data, Target Encoder, Classifier, Discriminator.]
Fig. 2.17: Domain adaptation using adversarial methods. The two Encoders (feature extractors) can share the weights. The Classifier is a Label Predictor and the Discriminator is a domain classifier. The Encoders are trained by minimizing the label prediction loss and maximizing the domain classification loss.
discriminator (i.e., a domain classifier) include both the source and target feature encodings. The algorithm (Ganin et al., 2016) works as follows: (1) Samples from either the source or the target domain are fed into the feature extractor (encoder). (2) The extracted features are fed to (a) both the label predictor and the domain classifier if they come from the source domain, or (b) only the domain classifier if they come from the target domain. (3) The label predictor and the domain classifier are trained by optimizing their respective classification tasks, e.g., with a cross-entropy loss. The last step resembles the GAN objective in that we want the encoders to fool the discriminator by producing source and target encodings that are difficult to tell apart. But in domain adaptation, we also need the same classifier to be effective on both datasets. The optimization of the feature extractor therefore differs from conventional gradient descent: in this game, it minimizes the loss of the label predictor while maximizing the loss of the domain classifier, i.e., it tries to fool the discriminator. To achieve this with ordinary back-propagation, a gradient reversal layer is placed between the feature extractor and the domain classifier; it acts as the identity in the forward pass and flips the sign of the gradient coming from the domain classifier in the backward pass. When the “equilibrium” (a saddle point of the min–max problem) is reached, the feature extractor produces features that are both domain invariant and useful for label prediction.
2.4.2.3. Concept shift & conditional shift
Under concept shift or conditional shift, at least some labeled test data are required because the relationship between the features and the labels changes. Therefore, assuming that labeling is expensive, we need to choose carefully which subset of the test data to label. Concept shift or conditional shift typically occurs in an aging system, and the goal of the mitigation algorithms is to adapt dynamically to settings that evolve over time. The challenge of concept shift or conditional shift is distinguishing outliers of
[Figure 2.18 diagram. Components: Data, Memory, Learning algorithm, Prediction, Loss function, Change detection, Alarm.]
Fig. 2.18: A general framework of the adaptive systems for concept shift or conditional shift.
Y from a real shift in the relationship that binds X and Y. Some common solutions are:
• Periodically retrain the model with the freshly labeled test data.
• Periodically retrain the model with both the old labeled data and the freshly labeled test data.
• Weight the training data inversely proportional to their age. Intuitively, more recent data are more useful.
• Use an iterative scheme where the new model learns to correct the most recent model.
• Detect the shift and select a new model accordingly.
Generally, there are four major components in an adaptive algorithm for concept shift or conditional shift, as shown in Figure 2.18⁸: the Memory component decides which part of the data is used to train the model; the Learning Algorithm predicts the outcome of interest; the Loss estimation function keeps track of the performance of the Learning Algorithm; and the Change Detection component takes the loss estimates as input and updates the Learning Algorithm when necessary.
2.4.3. Mitigating Distribution Shift via Domain Generalization
When you do not have access to any information about the test data, which is often the case in the real world, all you can do is make the model more robust to different data distributions. Of course, the new distributions have to share some similarity with the training distributions for the model to work. Under domain generalization, we typically have data from multiple training domains. The test distribution is unknown; it differs from, but is similar to, the training distributions. Without any test data, we need to modify the learning objective and procedure.
⁸ https://towardsdatascience.com/a-primer-on-domain-adaptation-cf6abf7087a3
2.4.3.1. Label shift
Under label shift in domain generalization, one potential solution is to choose an optimal value for p(y) so that the worst-case performance under the unknown distribution q is as good as possible. We can formulate it as a min-max problem:
$$\arg\min_{p(y)} \max_{q(y)} R\big(p(y), q(y)\big), \tag{2.49}$$
where R(·), known as a Bayes risk function, is defined as follows:
$$R = (c_{1,0} - c_{0,0})\, q(y)\, p_{FP}\big(p(y)\big) + (c_{0,1} - c_{1,1})\, \big(1 - q(y)\big)\, p_{FN}\big(p(y)\big). \tag{2.50}$$
Bayes risk is a performance metric often used in finding optimal decision functions. Let us break down this equation. First, this is a binary classification setting, so we have y ∈ {0, 1}; ŷ is the predicted label, and the cost function c(ŷ, y) is denoted as c(0, 0) = $c_{0,0}$, c(0, 1) = $c_{0,1}$, c(1, 1) = $c_{1,1}$, and c(1, 0) = $c_{1,0}$. Here, q(y) refers to the probability of the negative class y = 0, since false positives can only occur on negative samples. $p_{FN}$ and $p_{FP}$ are the false negative rate and false positive rate, respectively, obtained when the decision threshold is set according to p(y). Recall that, given the numbers of false negative ($n_{FN}$), true positive ($n_{TP}$), false positive ($n_{FP}$), and true negative ($n_{TN}$) samples, we have $p_{FN} \approx n_{FN}/(n_{FN} + n_{TP})$ and $p_{FP} \approx n_{FP}/(n_{FP} + n_{TN})$. Since we usually do not penalize correct predictions, $c_{0,0} = c_{1,1} = 0$. For a fixed q(y), the Bayes risk achieves its optimal (minimal) value when p(y) = q(y). In this min-max problem, we first solve the inner max to find q(y) given a fixed p(y), and then solve the outer min to find p(y) given the value of q(y) from the inner max. Once we have p(y), we can use it to set the threshold of the decision function learned in the source domain to mitigate the label shift.
2.4.3.2. Covariate shift
Among all of the distribution shifts, covariate shift is the most studied one. In general, we can categorize the techniques for covariate shift in domain generalization into unsupervised representation
learning (training without labels), supervised model learning (training with labels), and optimization (training with labels).
Unsupervised Representation Learning. In this line of work, the goal is to learn representations that generalize to different distributions by embedding prior knowledge into the learning process. Disentangling the variations in data into distinct and informative representations is considered a potentially good solution, and this is mostly achieved with variational autoencoders (VAEs). We will spare readers the details of how VAEs work, but simply recall that an autoencoder uses two neural networks for dimensionality reduction: the encoder maps X to Z in the latent space, and the decoder tries to reconstruct X from Z. The final Z should contain only the main structured part of the input. VAEs differ from regular autoencoders in that their encoder outputs not a point but a distribution, parameterized by a mean μ(X) and a variance Σ(X). In VAEs, we maximize a quantity known as the evidence lower bound (ELBO). Suppose that P and Q⁹ are parameterized by θ and φ, respectively; the ELBO objective then becomes
$$\mathbb{E}_{Z \sim Q}\big[\log P_\theta(X \mid Z)\big] - \mathrm{KL}\big(Q_\phi(Z \mid X) \,\|\, P_\theta(Z)\big), \tag{2.51}$$
where the KL divergence measures the “closeness” between P and Q. By maximizing ELBO, we are
• maximizing $\log P_\theta(X \mid Z)$ given that Z is drawn from the guide distribution $Q_\phi(Z \mid X)$, and
• penalizing the distance between the approximate posterior $Q_\phi(Z \mid X)$ and the prior $P_\theta(Z)$.
ELBO allows us to perform approximate posterior inference of Z, which would otherwise be computationally intractable.¹⁰
⁹ To make it consistent with the literature, we use the same notations as for source and target distributions introduced earlier.
¹⁰ Otherwise, we have to compute and minimize the KL divergence between the approximate and exact posteriors.
Disentanglement then aims to decompose Z into several independent factors $z_j$; to this end, the KL term in the ELBO can be decomposed as follows (Chen et al., 2018):
$$\mathbb{E}_{p_\theta(x)}\Big[\mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p_\theta(z)\big)\Big] = \mathrm{KL}\big(q_\phi(z, x) \,\|\, q_\phi(z)\, p_\theta(x)\big) + \mathrm{KL}\Big(q_\phi(z) \,\Big\|\, \prod_j q_\phi(z_j)\Big) + \sum_j \mathrm{KL}\big(q_\phi(z_j) \,\|\, p_\theta(z_j)\big). \tag{2.52}$$
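To make the second term of Eq. (2.52) concrete, the sketch below computes the total correlation $\mathrm{KL}(q(z) \,\|\, \prod_j q(z_j))$ in the special case where q(z) is Gaussian with covariance Σ, for which it has the closed form $\frac{1}{2}\big(\sum_j \log \Sigma_{jj} - \log\det\Sigma\big)$. The Gaussian assumption is ours, for illustration only; β-TCVAE (Chen et al., 2018) estimates this term from minibatches of samples instead. The function name is hypothetical.

```python
import numpy as np


def gaussian_total_correlation(cov):
    """KL(N(m, cov) || prod_j N(m_j, cov_jj)) = 0.5 * (sum_j log cov_jj - log det cov).

    It is zero exactly when cov is diagonal, i.e., when the latent factors
    z_j are independent, and positive otherwise.
    """
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    return 0.5 * (np.sum(np.log(np.diag(cov))) - logdet)


rng = np.random.default_rng(0)
z1 = rng.standard_normal(10_000)
independent = np.column_stack([z1, rng.standard_normal(10_000)])
entangled = np.column_stack([z1, z1 + rng.standard_normal(10_000)])  # z2 depends on z1

tc_ind = gaussian_total_correlation(np.cov(independent, rowvar=False))
tc_ent = gaussian_total_correlation(np.cov(entangled, rowvar=False))
print(f"independent: {tc_ind:.4f}, entangled: {tc_ent:.4f}")
```

The independent latents should give a value near zero, while the entangled pair (correlation 1/√2) should give roughly ½·log 2 ≈ 0.35, which is what a penalty on this term would push down.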
The interesting part is the second term: if the $z_j$ are indeed independent, then $q(z) = \prod_j q(z_j)$ and the term vanishes; therefore, penalizing this term encourages the $z_j$ to be independent.
Supervised Model Learning. When you have labels for the training data, you can design various model architectures and learning strategies to improve generalization ability. The idea behind the domain-invariant kernel is similar to the domain-invariant feature learning we introduced earlier. The difference is that here we aim to learn an invariant transformation using training data from multiple different domains, and the test domain is not observed during training. In particular, the kernel-based algorithm (Muandet et al., 2013) finds an orthogonal transformation B of the data that (1) minimizes the difference between the p(x) of different domains as much as possible while (2) preserving the functional relationship p(y|x), i.e., Y ⊥⊥ X | B(X). The distribution difference is measured by the distributional variance. To preserve p(y|x), one way is to identify the central subspace, i.e., the minimal subspace that captures the functional relationship between X and Y.
You can also tweak the training strategy to achieve generalization. For example, given any model, you can first separate all of the source domains into a set of virtual training domains and virtual test domains. The objective function is then designed to minimize the loss on the virtual training domains while ensuring that the gradient step also reduces the loss on the virtual test domains. This learning paradigm is known as meta-learning. The algorithm (Li et al., 2018) works as follows: (1) Meta-Train. Let S denote all of the source domains and V the virtual test domains; then $\bar{S} = S - V$ is the virtual training data.
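The loss function on $\bar{S}$ is defined as
$$L(\cdot) = \frac{1}{|\bar{S}|} \sum_{i=1}^{|\bar{S}|} \frac{1}{N_i} \sum_{j=1}^{N_i} \ell_{\Theta}\big(\hat{y}_j^{(i)}, y_j^{(i)}\big). \tag{2.53}$$
Here, domain i has $N_i$ samples, each sample j has a label $y_j^{(i)}$, and the model is parameterized by Θ. Optimization updates Θ as $\Theta' = \Theta - \alpha \nabla_{\Theta}$, where α is the meta-train step size and $\nabla_{\Theta}$ is the gradient of L with respect to Θ. (2) Meta-Test. The model is then evaluated on the virtual test data. The loss for the updated parameter Θ′ on the meta-test domains is as follows:
$$G(\Theta') = \frac{1}{|V|} \sum_{i=1}^{|V|} \frac{1}{N_i} \sum_{j=1}^{N_i} \ell_{\Theta'}\big(\hat{y}_j^{(i)}, y_j^{(i)}\big). \tag{2.54}$$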
(3) Summary. The meta-train and meta-test losses are optimized simultaneously with the following objective:
$$\arg\min_{\Theta} \; L(\Theta) + \beta\, G\big(\Theta - \alpha \nabla_{\Theta}\big), \tag{2.55}$$
where β balances the meta-train and meta-test losses. (4) Final Test. After we obtain the final Θ, we can deploy the model on the real target domain.
The simplest way to combat covariate shift is probably data augmentation, which alters existing data to create more training data. The intuition is to increase the heterogeneity of the training data, which in turn improves the model's generalization ability. Data augmentation is common in image processing; simple techniques include image rotation, shifting, flipping, and so on.
Optimization. Optimization-based methods for domain generalization directly formulate the generalization objective and optimize it with theoretical guarantees. The basic idea is to formulate domain generalization as a min-max optimization problem. Here, generalization means finding a model that performs as well as possible in the worst case. By slightly modifying Eq. (2.44) in Importance Reweighting, we have
the following loss function:
$$\hat{y}(\cdot) = \arg\min_{f} \max_{w} \frac{1}{n} \sum_{i=1}^{n} w_i\, L\big(y_i, f(x_i)\big), \tag{2.56}$$
where w is a vector of non-negative per-sample weights that sum to 1, chosen by the inner maximization to produce the largest possible error. The learned classifier f is therefore expected to be more robust than standard ones. More advanced optimization methods follow distributionally robust optimization (DRO), in which the loss function is formulated as follows:
$$\arg\min_{f} \max_{Q \in \mathcal{P}(P_{tr})} \mathbb{E}_{X,Y \sim Q}\big[\ell\big(f(X), Y\big)\big], \tag{2.57}$$
where $\mathcal{P}(P_{tr})$ denotes a set of distributions close to the training distribution $P_{tr}$. So, the key here is how to formulate the distribution set $\mathcal{P}(P_{tr})$. Common formulations define it as a neighborhood of $P_{tr}$ measured by an f-divergence or the Wasserstein distance. Interested readers can refer to (Rahimian and Mehrotra, 2019) for more details.
2.4.3.3. Concept shift, conditional shift, and other distribution shifts
Under concept or conditional shift in domain generalization, a model has to extrapolate beyond what it observes in the training domains, because the relationship between features and labels in the training domains no longer holds in the test domain. This is surely one of the most challenging tasks in machine learning, and we need to make additional assumptions to enable models to extrapolate to unseen domains. One such assumption is that all of the features can be divided into causal features (often referred to as stable or invariant features) and spurious features. Which features are causal is, of course, unknown to us. We briefly touched on causality earlier; here, we continue our journey toward what Judea Pearl, the 2011 ACM Turing Award winner, regards as the guidance AI needs to achieve “human level intelligence” (Pearl, 2018). Causality is an essential part of socially responsible AI, and you will see more of it throughout the rest of the book.
Causal features are invariant/stable predictors of labels across different domains. The relationship between hot weather and the electric bill holds true in the U.S., France, or any other place with a market economy. By contrast, spurious features are predictive only in one or a few domains; e.g., “electric bill” is a spurious feature of “ice cream sale”, since both are driven by hot weather rather than one causing the other. What makes AI systems extremely challenging to generalize is that they are “lazy” and learn these spurious correlations from data. A robust and generalizable model should rely on causal features, whose relationship with the label is invariant across all of the domains, i.e., invariant prediction. To make this more concrete, the causal features are the direct parents of Y in the causal graph, denoted as $Pa_Y$. For instance, X is the cause of Y in the Covariate Shift shown in Figure 2.15. Formally, given a set of domains $\mathcal{E} = \{1, 2, \ldots, E\}$, $P(Y^e \mid Pa_Y^e)$ is the same for every $e \in \mathcal{E}$. In the following, we introduce two classic approaches that incorporate the invariance property into predictions.
Invariant Causal Prediction (ICP). In their pioneering work on ICP, Peters et al. (2016) first tried to exploit the invariance property of causal models for inference. The idea is that the conditional distribution of the target variable Y, given the complete set of its direct causes, should not change when we intervene on variables other than the target variable. The goal of ICP is then to discover the direct causal parents of the target variable without constructing the entire causal graph. We consider the setting where multiple domains $e \in \mathcal{E}$ exist, and in each domain e there is a predictor variable $X^e \in \mathbb{R}^D$ and a target variable $Y^e \in \mathbb{R}$. ICP relies on the following two assumptions:
Assumption 2.1 (Causal Assumption). The structural equation model (SEM)
$$Y^e \leftarrow f_Y\big(X^e_{Pa_Y}, \epsilon^e_Y\big), \qquad \epsilon^e_Y \perp\!\!\!\perp X^e_{Pa_Y} \tag{2.58}$$
remains the same across all of the domains $e \in \mathcal{E}$; that is, $\epsilon^e_Y$ follows the same distribution as $\epsilon_Y$ for all e.
Assumption 2.2 (Invariance Assumption). There exists a subset of features $X^e_{S^*} \subseteq X^e$ that satisfies the following condition:
$$P\big(Y^e \mid X^e_{S^*}\big) \text{ is the same for all } e \in \mathcal{E}. \tag{2.59}$$
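The subset search that ICP performs (iterating over feature subsets and intersecting the invariant ones, as explained below) can be sketched as follows. This is a deliberately simplified stand-in, assuming linear models: instead of the formal statistical tests of Peters et al. (2016), a subset is accepted when its per-domain least-squares coefficients and residual variances agree across domains up to hypothetical tolerances (`coef_tol`, `var_tol`). The synthetic data (X1 → Y → X2, with domain-dependent noise) and all function names are our own illustration.

```python
import itertools

import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (slopes, residual variance)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return beta[1:], residuals.var()


def simplified_icp(envs, coef_tol=0.1, var_tol=0.1):
    """Intersect all feature subsets whose per-domain fits look invariant.

    envs: list of (X, y) pairs, one per domain, where X has shape (n_e, d).
    coef_tol / var_tol are hypothetical tolerances standing in for the
    statistical tests used by the actual ICP method.
    """
    d = envs[0][0].shape[1]
    accepted = []
    for size in range(d + 1):
        for subset in itertools.combinations(range(d), size):
            cols = np.array(subset, dtype=int)
            fits = [fit_ols(X[:, cols], y) for X, y in envs]
            coefs = np.array([c for c, _ in fits])  # shape (n_domains, size)
            variances = np.array([v for _, v in fits])
            coef_ok = size == 0 or np.all(coefs.max(axis=0) - coefs.min(axis=0) < coef_tol)
            var_ok = (variances.max() - variances.min()) / variances.min() < var_tol
            if coef_ok and var_ok:
                accepted.append(set(subset))
    if not accepted:
        return set()  # no subset looks invariant, so no causal claim is made
    return set.intersection(*accepted)


# Synthetic domains (our own example): X1 -> Y -> X2.
rng = np.random.default_rng(0)
envs = []
for sigma, tau in [(1.0, 0.5), (2.0, 1.0), (3.0, 2.0)]:
    n = 20_000
    x1 = rng.normal(0.0, sigma, n)          # direct cause of Y, intervened on per domain
    y = 2.0 * x1 + rng.normal(0.0, 1.0, n)  # invariant mechanism
    x2 = y + rng.normal(0.0, tau, n)        # effect of Y with domain-dependent noise
    envs.append((np.column_stack([x1, x2]), y))

print(simplified_icp(envs))  # feature indices judged to be direct causes
```

Here only the subset {X1} should pass: the empty set and {X2} have residual variances that change across domains, and {X1, X2} has a coefficient on X2 that changes with the noise level, so the intersection should be feature 0 (X1).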
Now, let us see what these two assumptions tell us. First, SEM is a set of statistical techniques used to measure and analyze the relationships between observed and latent variables. Basically, it is how we turn a causal graph into a set of mathematical formulations. $f_Y$ can be as simple as a linear model, with $\epsilon_Y$ accounting for the measurement error of Y. Assumption 2.1 requires that the direct causes of Y, and the mechanism linking them to Y, be the same across domains. Consequently, Eq. (2.59) holds at least for $S^* = Pa_Y$, so the invariant feature set should include (at least a subset of) these direct causes. With training data from multiple domains, ICP then fits a linear (Gaussian) regression in each domain. The goal is to find a set of features that results in invariant predictions across domains. In particular, ICP iterates over subsets of features combinatorially and looks for subsets whose models are invariant across domains, i.e., have invariant coefficients or residuals. The intersection of these subsets of features is then a subset of the true direct causes. ICP has certain limitations. First, ICP treats the training domains as specific interventions on the causal graph. Second, ICP is constrained by the conventional unconfoundedness assumption (Pearl, 2009): no unobserved confounders exist between the input features and the target variable. However, unobserved/unmeasured confounders almost always exist in practice, which violates the invariance assumption in ICP.
Invariant Risk Minimization (IRM). In many cases, causal graphs are inaccessible, e.g., the causal relations between the pixels of an image and the predicted target. IRM (Arjovsky et al., 2019) extends the invariance idea of ICP to a more practical setting that does not require retrieving the direct causes of the target variable in a causal graph: IRM seeks to learn the causal mechanism in a latent space instead of the original feature space. First, let us take a look at IRM’s invariance assumption:
Assumption 2.3 (IRM’s Invariance Assumption).
There exists a data representation φ(X) meeting the following condition:
$$\mathbb{E}\big[Y \mid \phi(X^{e})\big] = \mathbb{E}\big[Y \mid \phi(X^{e'})\big], \qquad \forall e, e' \in \mathcal{E}_{tr}, \tag{2.60}$$
where $\mathcal{E}_{tr}$ denotes the set of all of the training domains. It generalizes ICP’s invariance assumption to the representation level.
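Then, the goal of IRM is to learn a data representation φ that predicts accurately and elicits an invariant predictor w across $\mathcal{E}_{tr}$. This can be formulated as the following constrained optimization problem:
$$\begin{aligned} \min_{\phi(X),\, w} \;& \sum_{e \in \mathcal{E}_{tr}} L^e\big(w \cdot \phi(X), Y\big) \\ \text{s.t.} \;& w \in \arg\min_{\bar{w}} L^e\big(\bar{w} \cdot \phi(X), Y\big), \quad \forall e \in \mathcal{E}_{tr}. \end{aligned} \tag{2.61}$$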
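Solving the bi-level problem in Eq. (2.61) exactly is hard, so Arjovsky et al. (2019) also propose a practical relaxation, IRMv1, which replaces the constraint with a penalty: the squared gradient of each domain's risk with respect to a fixed scalar “dummy” predictor w = 1.0 placed on top of φ. The sketch below is our illustration of that penalty, assuming a squared loss (so the gradient has the closed form 2·mean((φ(x) − y)·φ(x)) and no autodiff library is needed) and a linear φ(x) = x·β. The two synthetic domains and the function names are hypothetical.

```python
import numpy as np


def irmv1_penalty(phi, y):
    """Squared gradient of the squared-loss risk w.r.t. a scalar w at w = 1.0.

    R(w) = mean((w * phi - y) ** 2), hence dR/dw at w = 1 is 2 * mean((phi - y) * phi).
    """
    grad = 2.0 * np.mean((phi - y) * phi)
    return grad**2


def irmv1_objective(beta, envs, lam=10.0):
    """Sum over domains of risk + lam * IRMv1 penalty, for phi(x) = x @ beta."""
    total = 0.0
    for X, y in envs:
        phi = X @ beta
        total += np.mean((phi - y) ** 2) + lam * irmv1_penalty(phi, y)
    return total


# Hypothetical two-domain example: x1 causes y, x2 is a spurious effect of y.
rng = np.random.default_rng(0)
envs = []
for tau in [0.1, 1.0]:  # x2 is far noisier in the second domain
    n = 10_000
    x1 = rng.normal(0.0, 1.0, n)
    y = x1 + rng.normal(0.0, 1.0, n)
    x2 = y + rng.normal(0.0, tau, n)
    envs.append((np.column_stack([x1, x2]), y))

X_all = np.vstack([X for X, _ in envs])
y_all = np.concatenate([y for _, y in envs])
beta_erm, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)  # pooled ERM solution
beta_causal = np.array([1.0, 0.0])                         # relies on x1 only

for name, beta in [("ERM", beta_erm), ("causal", beta_causal)]:
    risk = irmv1_objective(beta, envs, lam=0.0)
    total = irmv1_objective(beta, envs)
    print(f"{name}: beta={np.round(beta, 2)}, risk={risk:.2f}, IRMv1 objective={total:.2f}")
```

The pooled ERM solution should attain the lower total risk by leaning on x2, but its per-domain gradients point in opposite directions, which the penalty exposes; with λ = 10, the predictor that uses only the causal feature x1 should have the smaller IRMv1 objective.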
Let us break down the equation to gain a better understanding. First, we have multiple training domains, which can be different locations or different hospitals where we collect patients’ electronic medical records. L is a loss function such as the cross-entropy loss. The top line requires φ and w to achieve the smallest total loss over all training domains. The second line further requires that the predictor w be optimal, given φ, in each individual domain e. In practice, the prediction performance in the training domains will likely drop because some spurious correlations are excluded, but IRM allows for better out-of-distribution generalization, especially when the training domains are very diverse. However, whether the constraint in the second line really works is a question the research community is still debating. For example, researchers found that, when carefully implemented, standard empirical risk minimization, i.e., minimizing the sum of losses across all domains, performs on par with state-of-the-art domain generalization methods across standard benchmarks (Gulrajani and Lopez-Paz, 2020).
2.4.4. Discussion
In addition to the distribution shift types introduced above, other important research challenges include: (1) continuous domain generalization, where a system consumes streaming data with non-stationary statistics; (2) generalizing the labels in the training domains to novel categories, i.e., both domain and task generalization; and (3) performance evaluation for domain generalization. In case (1), the key is to efficiently update domain generalization models so that they overcome the forgetting issue and adapt to new data. Case (2) is conceptually similar to the goals of meta-learning and zero-shot learning. In case (3), research (Gulrajani and Lopez-Paz, 2020)
has found that most domain generalization algorithms perform almost the same as Empirical Risk Minimization (ERM, the standard training scheme based on the i.i.d. assumption that training and test data are independently and identically distributed). Some (Wang et al., 2022) argue that this may be due to inappropriate evaluation schemes or insufficient gaps between the training and test domains. This calls for reasonable and realistic experimental settings, model selection procedures, and evaluation benchmarks.
2.5. Concluding Remarks
2.5.1. Summary
In this chapter, we presented some existing efforts seeking to materialize the four mainstream principles in socially responsible AI: fairness, interpretability, privacy, and reliability. Fairness has an overwhelmingly large number of different notions, and some of them are even incompatible with each other. Interpretability, by contrast, has few quantifiable definitions or proper metrics to measure whether an explanation is understandable to users. Privacy is a concept regarding individuals, and it is comparatively better defined. But privacy-preserving techniques designed for i.i.d. or tabular data can be outdated given the abundance of graph data and heterogeneous, multi-modal data. Among the four types of distribution shifts, the research focus in the field has been on covariate shift. While most existing research studies each of these principles independently, some of them clearly overlap. For example, both fairness and privacy target sensitive attributes, and interpretability can be used to detect biases and unfairness. We might even ask ourselves: Can AI ever achieve all of the properties guided by these principles? When these principles set conflicting goals, how should AI prioritize them? Keeping these questions in mind may help us focus on the significant issues in socially responsible AI in the long run.
2.5.2. Additional Readings
As we only cover the basics of each topic in the book, we encourage readers to take a look at more advanced machine learning techniques developed for these responsible AI principles. Some further readings
regarding each aspect discussed in this chapter are recommended in the following:
Fairness:
• Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR), 54(6), 1–35.
• Fairness in Machine Learning, Solon Barocas and Moritz Hardt, Tutorial at NeurIPS 2017. https://fairmlbook.org/tutorial1.html
• 21 fairness definitions and their politics, Arvind Narayanan, Tutorial at FAT 2018. https://www.youtube.com/watch?v=wqamrPkF5kk
• Friedman, B., & Nissenbaum, H. (2017). Bias in computer systems. In Computer Ethics (pp. 215–232). Routledge.
Interpretability:
• Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215.
• Moraffah, R., Karami, M., Guo, R., Raglin, A., & Liu, H. (2020). Causal interpretability for machine learning: problems, methods and evaluation. ACM SIGKDD Explorations Newsletter, 22(1), 18–33.
• Du, M., Liu, N., & Hu, X. (2019). Techniques for interpretable machine learning. Communications of the ACM, 63(1), 68–77.
• Molnar, C. (2020). Interpretable machine learning. Lulu.com.
• Lage, I., Chen, E., He, J., Narayanan, M., Kim, B., Gershman, S. J., & Doshi-Velez, F. (2019, October). Human evaluation of models built for interpretability. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing (Vol. 7, pp. 59–67).
Privacy:
• Dwork, C. (2008). Differential privacy: A survey of results. In International Conference on Theory and Applications of Models of Computation (pp. 1–19). Springer, Berlin, Heidelberg.
• Fung, B. C., Wang, K., Chen, R., & Yu, P. S. (2010). Privacy-preserving data publishing: A survey of recent developments. ACM Computing Surveys (CSUR), 42(4), 1–53.
• Beigi, G., & Liu, H. (2019). A survey on privacy in social media: Identification, mitigation, and applications. ACM Transactions on Data Science, 1(1), Article 7 (January 2020), 38.
Distribution Shift:
• Wang, J., Lan, C., Liu, C., Ouyang, Y., Qin, T., Lu, W., . . . & Yu, P. (2022). Generalizing to unseen domains: A survey on domain generalization. IEEE Transactions on Knowledge and Data Engineering.
• A comprehensive paper list: https://out-of-distribution-generalization.com/
Chapter 3
Practices of Socially Responsible AI
Chapter 2 went through the main principles (Fairness, Interpretability/Explainability, Privacy, and Generalizability) that guide AI to fulfill its ethical responsibilities, the third level in the AI responsibility pyramid (Figure 1.1). Special attention was paid to the fundamental machine learning techniques that help operationalize these principles. This chapter focuses on the top level of AI responsibility: its philanthropic responsibility. We look into what AI can do to contribute directly to addressing societal challenges for social good. We describe the three dimensions of AI’s philanthropic responsibility, as illustrated in Figure 3.1: to (1) Protect (e.g., protecting users’ personal information), (2) Inform (e.g., early detection of fake news), and (3) Prevent/mitigate (e.g., mitigating social bias). The protecting dimension aims to cover or shield humans from harm, injury, and negative impact; the informing dimension aims to deliver facts or information in a timely way; and the preventing dimension aims to prevent or mitigate the negative impact of AI algorithms. To show how AI for good can be materialized, we illustrate each dimension with specific examples related to current societal issues.
3.1. Protecting
The Protecting dimension aims to cover or shield humans (especially the most vulnerable or at-risk) from harm, injury, and negative impact of AI systems, and to intervene. This can be the protection
[Figure 3.1 diagram: Socially Responsible AI Practice. To Protect: cover or shield humans from harm, injury, and any negative impact of AI systems (example: privacy preserving). To Inform: deliver the facts or information to users, particularly the potential negative results, in a timely manner (example: fake news detection). To Prevent/Mitigate: prevent/mitigate the negative impact of socially indifferent AI algorithms (example: bias mitigation).]
Fig. 3.1: The three pillars of AI’s philanthropic responsibility.
of teens from cyberbullying, users from malicious social bots, and users’ personal data when interacting with AI systems. Three representative examples are described in the following.
3.1.1. A Multi-Modal Approach for Cyberbullying Detection
Cyberbullying, commonly defined as the electronic transmission of insulting or embarrassing comments, photos, or videos, has become increasingly prevalent on social networks. Reports from the American Psychological Association and the White House, for example, reveal that more than 40% of teenagers in the US indicate that they have been bullied on social media platforms (Dinakar et al., 2012). The growing prevalence and severity of cyberbullying on social media and the link between cyberbullying and such negative outcomes as depression, low self-esteem, and suicidal thoughts and behaviors have led to the identification of cyberbullying as a serious national health concern. It has also motivated a surge in research in psychology and computer science aimed at better understanding the nature and key characteristics of cyberbullying in social networks. Within the computer science literature, existing efforts toward detecting cyberbullying have primarily focused on text analysis. These works attempt to build a generic binary classifier by taking high-dimensional text features as the input and making predictions accordingly. Despite their satisfactory detection performance in
practice, these models inevitably ignore critical information included in the various social media modalities, such as image, video, user profile, time, and location. For example, Instagram¹ allows users to post and comment on any public image to express their opinions and preferences. In light of this, bullies can post humiliating images or insulting comments, captions, or hashtags, edit and then re-post someone else’s images, and even create fake profiles pretending to be other individuals altogether. Therefore, it is critical to exploit the rich user-generated content within a multi-modal context to gain greater insight into cyberbullying behaviors and generate more accurate predictions. Figure 3.2 illustrates the cyberbullying detection problem within a multi-modal context.
3.1.1.1. Challenges
Despite the potential benefits, performing cyberbullying detection within a multi-modal context presents multiple challenges. First, information from different modalities might be complementary, thereby facilitating better learning performance, especially when the data are sparse. However, heterogeneous information from different modalities might not be compatible and, in the worst case, some modalities may be entirely independent. Thus, a key problem that has not been sufficiently addressed in cyberbullying detection is how
Fig. 3.2: Illustration of cyberbullying detection within a multi-modal context: the left-hand side of the figure represents a social media session (e.g., a post) with rich user-generated information such as an image, video, user profile, time, location, and comments. In addition, different sessions are inherently connected with each other through user–user social relations. The goal is to predict if a particular session is bullying or not by leveraging its multi-modal context information.

¹ https://www.instagram.com
Socially Responsible AI: Theories and Practices
to effectively encode the cross-modal correlation among different types of modalities. Second, social media data are typically not i.i.d. but rather intrinsically correlated, which limits the applicability of conventional text analysis approaches. For example, if two social media sessions (e.g., posts) are from the same user or are posted by a pair of friends, their content similarity is expected to be high based on the homophily principle. Considering this, it is important to model structural dependencies among different social media sessions when performing cyberbullying detection. Third, although multi-modal social media data can be useful in understanding human behavior, they are difficult to use directly because different modalities are frequently associated with rather diverse feature types (e.g., nominal, ordinal, interval, and ratio), and some modalities that identify particular entities (e.g., users) cannot simply be represented as feature vectors.² Therefore, it is crucial that the solution framework represent modalities with diverse feature types in an expressive way.

3.1.1.2. The approach: XBully
Definition 3.1 (Cyberbullying Detection within a Multi-Modal Context). Given a corpus of social media sessions C (e.g., posts) with M modalities, cyberbullying detection within a multi-modal context aims at identifying instances of cyberbullying by leveraging multiple modalities, such as textual features, spatial locations, and visual cues, as well as the relations among sessions.

We now describe an approach (Cheng et al., 2019b) that uses multi-modal information to enhance cyberbullying detection. The definition of multi-modal cyberbullying detection builds on the concept of multi-modality learning in machine learning. Here, we emphasize the multi-modal context of social media sessions and use the following modalities extracted from an Instagram session:

• User: a typical type of nominal data; we use the relations among users to encode the dependencies between social media sessions.

² We refer to modalities with attributes (e.g., location) and without attributes (e.g., user index) as modes and nominals, respectively.
• Image: the meta-information associated with an image forms a tuple composed of the number of shares, the number of likes, and the labels describing the category of the image.
• Profile: the meta-information of a user forms a tuple with the number of followers, the number of accounts followed, the total number of comments, and the total number of likes received.
• Time: the timestamp of posting an image. We consider the time of day (a 24-hour range) and convert the raw time to the range [0, 86400) by calculating its offset in seconds from 12:00 am.
• Text: we perform psychometric analysis on the textual information of the session, i.e., the description of the image and the comments, and obtain psychological features through LIWC (Pennebaker et al., 2001).

XBully consists of three steps, as shown in Figure 3.3: attributed modality hotspot detection, network representation learning, and classification.

Attributed Modality Hotspot Detection. The diversity of feature types in different attributed modalities can result in an
Fig. 3.3: The proposed XBully framework. Given a corpus of social media sessions, we first attempt to discover hotspots for each attributed modality (Phase I), and then based on the detected hotspots and instances in non-attributed modalities, we leverage the co-existence and neighborhood relations to construct a heterogeneous network, which is later divided into several modality subnetworks (Phase II). Each subnetwork consists of two modalities. Nodes in these subnetworks are then mapped into the same latent space through network representation learning. Finally, we can concatenate embeddings of nodes in each session and apply off-the-shelf machine learning models for cyberbullying detection (Phase III).
extremely large feature space for each social media session. The high-dimensional feature representation not only suffers from data sparsity but also poses great challenges to downstream learning tasks due to the curse of dimensionality. To address this issue, we propose the concept of an attributed modality hotspot, which provides a succinct yet accurate summarization of similar feature values in the same attributed modality. Our definition of an attributed modality hotspot is based on kernel density estimation (KDE), a non-parametric method for estimating a density function from a collection of data samples. With KDE, we do not need any prior knowledge about the data distribution, as it automatically discovers arbitrary modes in complex data spaces.

Definition 3.2 (Attributed Modality Hotspot). Given a corpus of social media sessions C, the attributed modality hotspots for the attributed modality m (m ∈ {1, 2, ..., M}) are the set of local maxima of the kernel density function estimated from m. Then, given n sessions containing attributed modality m in a d-dimensional feature space, $X_m = (x_1^m, x_2^m, \ldots, x_n^m)$, the kernel density at any point x with attributed modality m is given by

$$f(x) = \frac{1}{n\,\delta_m^{d}} \sum_{i=1}^{n} K\!\left(\frac{x - x_i^m}{\delta_m}\right), \qquad (3.1)$$
where $K(\cdot)$ is a predefined kernel function and $\delta_m$ is the kernel bandwidth for the attributed modality m. The core idea of this definition is to discover the latent hotspots in each attributed modality with KDE.

Network Representation Learning. Based on the identified attributed modality hotspots and the nodes from non-attributed modalities, we investigate how to build a heterogeneous network by exploiting the co-existence and neighborhood relations, such that both the cross-modal correlations and the structural dependencies are properly captured. Specifically, a co-existence relation is established between two nodes when they co-exist in the same social media session, e.g., images, comments, and user profiles in the same Instagram post. The neighborhood relations for attributed modality hotspots are built upon the idea of modality continuity, which implies that nearby things are more related to each other than
distant things. We first define the node kernel, on which the neighborhood relations are based:

Definition 3.3 (Node Kernel). For two attributed modality hotspots $u_i$ and $u_j$ in attributed modality m with feature vectors $x_i$ and $x_j$, the kernel strength between them is

$$w(u_i, u_j) = \begin{cases} \dfrac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\delta_m^2\right)}{2\pi\delta_m^2}, & \text{if } \lVert x_i - x_j \rVert \le \delta_m, \\[4pt] 0, & \text{otherwise.} \end{cases}$$
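Definitions 3.2 and 3.3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the XBully implementation: it assumes a Gaussian kernel for K in Eq. (3.1), finds the local maxima (hotspots) by mean-shift hill climbing from every sample, and treats modes closer than half the bandwidth as the same hotspot. The helper names (`kde`, `mean_shift_mode`, `find_hotspots`, `node_kernel`), the merge rule, and the toy data are our own.

```python
import math


def kde(x, samples, bandwidth):
    """Eq. (3.1) with a Gaussian kernel: density of the sample set at point x."""
    n, d = len(samples), len(x)
    norm = n * bandwidth ** d * (2 * math.pi) ** (d / 2)
    total = sum(math.exp(-0.5 * math.dist(x, xi) ** 2 / bandwidth ** 2) for xi in samples)
    return total / norm


def mean_shift_mode(start, samples, bandwidth, tol=1e-7, max_iter=500):
    """Hill-climb from `start` to a local maximum of the Gaussian KDE (mean shift)."""
    x = list(start)
    for _ in range(max_iter):
        weights = [math.exp(-0.5 * math.dist(x, xi) ** 2 / bandwidth ** 2) for xi in samples]
        total = sum(weights)
        new_x = [sum(w * xi[k] for w, xi in zip(weights, samples)) / total for k in range(len(x))]
        if math.dist(new_x, x) < tol:
            return new_x
        x = new_x
    return x


def find_hotspots(samples, bandwidth, merge_radius=None):
    """Hotspots = distinct KDE local maxima reached from the samples.

    Modes closer than `merge_radius` (default: half the bandwidth, an assumption)
    are treated as the same hotspot.
    """
    radius = bandwidth / 2 if merge_radius is None else merge_radius
    hotspots = []
    for s in samples:
        mode = mean_shift_mode(s, samples, bandwidth)
        if all(math.dist(mode, h) > radius for h in hotspots):
            hotspots.append(mode)
    return hotspots


def node_kernel(xi, xj, bandwidth):
    """Definition 3.3: truncated Gaussian kernel strength between two hotspots."""
    dist = math.dist(xi, xj)
    if dist > bandwidth:
        return 0.0
    return math.exp(-dist ** 2 / (2 * bandwidth ** 2)) / (2 * math.pi * bandwidth ** 2)


# Toy 1-D "time" modality with two clusters of (made-up) posting offsets.
times = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
spots = find_hotspots(times, bandwidth=0.5)
print(spots)  # two hotspots, near 0.1 and 5.1
print(node_kernel(spots[0], spots[1], bandwidth=0.5))  # 0.0: too far apart to be neighbors
```

Two hotspots are neighbors in the heterogeneous network only when `node_kernel` returns a non-zero value, which is why the truncation at the bandwidth matters: it keeps the hotspot graph sparse.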
Accordingly, given an attributed modality m, the neighbors of an attributed modality hotspot v in the heterogeneous network are the attributed modality hotspots that produce a non-zero kernel strength with v. For non-attributed modality nodes, we define the neighborhood relations based on their structural information, making use of the dependencies (e.g., social relations) between different sessions. For example, an Instagram session could have five different modalities: user, image, profile, time, and comments (text). From the definition of co-existence relations, we construct 10 types of edges in the heterogeneous network (one for each pair of the five modalities), such as user–image hotspot and user–profile hotspot edges. With the above-defined edge types, we define the weight of an edge in three scenarios: (a) the normalized co-existence count (between 0 and 1), (b) the kernel strength (between 0 and 1), and (c) the dependencies between non-attributed modality nodes (0 or 1). We then build on Tang et al. (2015) to decompose the heterogeneous network into multiple modality subnetworks (each with two modalities) and learn embeddings within each subnetwork. In this way, the learned embeddings can capture node proximity across different types of edges. In what follows, we provide the details of the joint embedding model. First, let $G_S$ denote the set of all modality subnetworks; then, for any two different modalities $A, B \in \{1, 2, \ldots, M+N\}$, we can construct a modality subnetwork $G_{AB} \in G_S$. The probability of node j with modality B being generated from node i with modality A is defined by the following conditional probability:

$$p(j \mid i) = \frac{\exp(v_j^{\top} v_i)}{\sum_{k \in B} \exp(v_k^{\top} v_i)}, \qquad (3.2)$$
where $v_j$ denotes the embedding of node j with modality B and $v_i$ is the embedding of node i with modality A. Next, we learn embeddings by minimizing the distance between the conditional distribution of the context nodes given the center node and the empirical distribution. The empirical distribution of node i is defined as $\hat{p}(j \mid i) = w_{ij}/d_i$, where $w_{ij}$ is the weight of edge i–j and $d_i$ is the out-degree of node i, i.e., $d_i = \sum_{j \in B} w_{ij}$. Therefore, we define the loss function as

$$O_{AB} = \sum_{i \in A} d_i \, \mathrm{KL}\big(\hat{p}(\cdot \mid i) \,\|\, p(\cdot \mid i)\big). \qquad (3.3)$$

By omitting constants, the above loss function can be reformulated as

$$O_{AB} = -\sum_{i \in A,\, j \in B} w_{ij} \log p(v_j \mid v_i). \qquad (3.4)$$
As each modality subnetwork is composed of four different types of edges, A–A, A–B, B–A, and B–B, the overall loss function of a modality subnetwork $G_{AB}$ is

$$Z_{AB} = O_{AA} + O_{AB} + O_{BB} + O_{BA}. \qquad (3.5)$$
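To make Eqs. (3.2)–(3.5) concrete, the sketch below evaluates the loss of one modality subnetwork for fixed embeddings. It is an illustration under our own assumptions, not the authors' code: embeddings are plain Python lists keyed by node id, the edges of each type are a dict mapping (i, j) to $w_{ij}$, the softmax over all nodes of the target modality is computed exactly, and the gradient updates that would actually learn the embeddings are omitted. The function names are hypothetical.

```python
import math


def cond_prob(j, i, src_emb, tgt_emb):
    """Eq. (3.2): p(j | i), a softmax of v_k . v_i over all nodes k of the target modality."""
    vi = src_emb[i]
    scores = {k: sum(a * b for a, b in zip(vk, vi)) for k, vk in tgt_emb.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    denom = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[j] - top) / denom


def edge_type_loss(edges, src_emb, tgt_emb):
    """Eq. (3.4): O = -sum over edges (i, j) of w_ij * log p(j | i)."""
    return -sum(w * math.log(cond_prob(j, i, src_emb, tgt_emb)) for (i, j), w in edges.items())


def subnetwork_loss(edges_by_type, emb_a, emb_b):
    """Eq. (3.5): Z_AB = O_AA + O_AB + O_BB + O_BA for the subnetwork G_AB.

    `edges_by_type` maps a (source, target) pair in {"A", "B"} to that edge type's weights.
    """
    emb = {"A": emb_a, "B": emb_b}
    return sum(
        edge_type_loss(edges, emb[src], emb[tgt])
        for (src, tgt), edges in edges_by_type.items()
    )


# Toy subnetwork: one user node (modality A) and two time hotspots (modality B).
users = {"u1": [1.0, 0.0]}
hotspots = {"h1": [1.0, 0.0], "h2": [0.0, 1.0]}
edges = {("A", "B"): {("u1", "h1"): 1.0}, ("B", "A"): {("h1", "u1"): 1.0}}
print(cond_prob("h1", "u1", users, hotspots))  # e / (e + 1), about 0.731
print(subnetwork_loss(edges, users, hotspots))
```

The exact normalization in Eq. (3.2) costs time proportional to the size of modality B for every edge; embedding methods in this family commonly approximate it with negative sampling, which we leave out here for clarity.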
With the learned representation of each node, the session representation is then the concatenation of different types of node representations. For classification, we can feed the session representations into any supervised machine learning model for training and testing.

3.1.2. A Deep Learning Approach for Social Bot Detection
Social media has made massive-scale, real-time communication possible. Researchers have praised it for its power to democratize discussion, for example by enabling open debate on political issues. On the downside, a great number of bots, i.e., automated social media accounts disguised as human users, have been created for malicious purposes such as political propaganda and disinformation. Concerns about these malicious social bots are mounting.
One recent example³ of malicious bots on social media occurred in May 2022, when the Instagram accounts of UK-based Iranian women’s rights activists were attacked by fake accounts with harassing messages and a surge in follow requests. These bots targeted not only the prominent feminist accounts but also users with smaller followings who had engaged with content posted by these feminists. As Norouzi, one of the targeted feminists, asked: “Who has the money for this attack? More than 30 pages, and we are still receiving fake followers, each hour more than 100 followers. Who is paying for this?” There is a clear link between the social unrest in Iran and what is happening to them on Instagram.

Detecting social bots has been studied at two levels: the account level and the tweet level. Task 1 is account-level detection: given a record of an account’s activity (e.g., the historical tweets it posted), a machine learning model determines whether the account is a bot. This type of approach therefore needs to examine the overall activity of an account, such as tweeting behavior, the content and sentiment of its posts, and network structure, and it requires a large amount of data to work. By contrast, Task 2, tweet-level detection, directly predicts whether a single tweet comes from a bot or a human user. We describe solutions (Kudugunta and Ferrara, 2018) for both tasks in the following.

3.1.2.1. Account-level detection
Account-level detection on benchmark datasets such as the one presented in Cresci et al. (2017) has been shown to achieve nearly perfect performance without deep learning. A typical pipeline starts with feature extraction from user metadata such as Follower Count and Verified. As real-world bot detection datasets are imbalanced, i.e., there are more human users than social bots, the next step is to oversample the minority class or undersample the majority class to balance the dataset, for example with the synthetic minority oversampling technique (SMOTE). The last step applies out-of-the-box classical machine learning models such as random forests and logistic regression.
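The balancing step can be illustrated with a from-scratch version of SMOTE’s core idea: each synthetic minority sample lies on the line segment between a randomly chosen minority sample and one of its k nearest minority-class neighbors. This is a simplified sketch, not the exact procedure used by Kudugunta and Ferrara (2018); the function names and the two feature columns are hypothetical, and only continuous features should be interpolated this way (a binary field such as Verified would become fractional).

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment base -> nb
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def balance(features, labels, minority_label=1, k=5, seed=0):
    """Oversample the minority class (bots) until it matches the majority class (humans)."""
    minority = [x for x, y in zip(features, labels) if y == minority_label]
    n_majority = sum(1 for y in labels if y != minority_label)
    new = smote(minority, max(0, n_majority - len(minority)), k=k, seed=seed)
    return features + new, labels + [minority_label] * len(new)


# Hypothetical account features: [log(follower count), tweets per day]; 1 = bot, 0 = human.
X = [[1.0, 2.0], [1.5, 2.5], [2.0, 1.0], [1.2, 1.8], [0.8, 2.2], [1.1, 2.1],
     [9.0, 9.0], [10.0, 8.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = balance(X, y)
print(y_bal.count(0), y_bal.count(1))  # 6 6
```

In practice, the imbalanced-learn library provides `imblearn.over_sampling.SMOTE` with a `fit_resample(X, y)` method, and the balanced data can then be passed to a classifier such as scikit-learn’s `RandomForestClassifier` or `LogisticRegression`. Oversampling should be applied to the training split only, so that synthetic points do not leak into the evaluation data.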
³ https://www.codastory.com/newsletters/iran-activists-instagram-protests/
3.1.2.2. Tweet-level detection
Tweet-level bot detection turns out to be more challenging: the performance of the same classical machine learning models drops sharply (e.g., 98% → 78%), as shown in Kudugunta and Ferrara (2018). Deep learning approaches such as Long Short-Term Memory (LSTM) models are therefore used, since they are designed to learn dependencies in sequential data like text. Before an LSTM can be used, however, the data must be pre-processed to fit its input. This step is especially important for social media data because they are very noisy. A common workflow is as follows: first tokenize the text, then apply a “cleaning” procedure, such as replacing hashtags and URLs with the tags “hashtag” and “url”, replacing emojis with tags such as “smile” and “heart”, removing stop words (e.g., “the”), and converting all tokens to lower case. The cleaned tokens are then transformed into embeddings using a pre-trained model such as Global Vectors for Word Representation (GloVe). The output of the LSTM is the predicted probability that a piece of text was written by a social bot.

So far, we have only used textual data, but what about users’ metadata? In the account-level detection task, users’ metadata tends to be the best predictor, so adding user metadata to the LSTM model might improve its performance. One way to add this auxiliary information is to concatenate, before the output layer, the metadata with the output vector of the LSTM (which still takes the text as input); the resulting vector is then fed into another neural network (e.g., a two-layer fully connected network) to obtain the predicted probability. A further improvement is to add an auxiliary loss as a regularizer. The finding is that simply employing an LSTM on text can decrease the error rate by nearly 20%, whereas, when combined with text, user metadata appears to be a weak predictor of the nature of a social media account.
This might, however, have to do with the way the metadata are incorporated: direct concatenation may not be effective because the metadata and the text belong to different modalities.
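The cleaning workflow described above can be sketched as a small tokenizer. The regular expression, the tiny stop-word list, the emoji-to-tag map, and the decision to drop leftover punctuation are all illustrative assumptions rather than the exact preprocessing of Kudugunta and Ferrara (2018); a real pipeline would use a fuller stop-word list and emoji lexicon before looking tokens up in GloVe.

```python
import re

# Illustrative resources only; real pipelines use much larger lists.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on", "for"}
EMOJI_TAGS = {"\U0001F642": "smile", "\U0001F60A": "smile", "\u2764": "heart"}

# URLs, hashtags, words, or any other single non-space symbol (punctuation, emoji).
TOKEN_RE = re.compile(r"https?://\S+|#\w+|\w+|[^\w\s]")


def preprocess(tweet):
    """Tokenize a tweet, replace URLs/hashtags/emojis with tags, lowercase, drop stop words."""
    tokens = []
    for tok in TOKEN_RE.findall(tweet):
        if tok.startswith(("http://", "https://")):
            tokens.append("url")
        elif tok.startswith("#"):
            tokens.append("hashtag")
        elif tok in EMOJI_TAGS:
            tokens.append(EMOJI_TAGS[tok])
        elif not re.fullmatch(r"\w+", tok):
            continue  # other punctuation/symbols are dropped (an assumption)
        elif tok.lower() not in STOP_WORDS:
            tokens.append(tok.lower())
    return tokens


print(preprocess("Check the NEWS \U0001F642 #Election https://t.co/abc !"))
# ['check', 'news', 'smile', 'hashtag', 'url']
```

The resulting token list is what would be mapped to pre-trained word vectors and fed to the LSTM one token at a time.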
3.1.3. A Privacy-Preserving Graph Convolutional Network with Partially Observed Sensitive Attributes
Our last example (Hu et al., 2022) in the Protecting dimension concerns preserving users’ privacy on graphs. We focus on one type of neural network: the graph neural network (GNN). GNNs, a class of neural networks designed for representing and processing graph data, have recently attracted considerable research attention. Despite their remarkable performance in a wide range of applications, GNNs are vulnerable to attribute inference attacks, in which an attacker leverages a machine learning model to infer a target user’s sensitive attributes (e.g., location, sexual orientation, and political view) from the user’s public data. For example, in a social network, an adversary can infer the political view of a user from his/her behavioral record or from information about his/her friends. Existing privacy-preserving GNN models assume that the sensitive attributes of all users are known beforehand. In practice, however, users have different privacy preferences (e.g., male users are typically less sensitive about their age information than female users), so there is no guarantee that we have access to all sensitive information. Take the scenario in Figure 3.4 as an example. There are six users in this
Fig. 3.4: A privacy problem with partially observed sensitive attributes in GNNs. Three private users (User 2, User 4, and User 6) treat age as their sensitive information and do not reveal it. However, the other three non-private users are willing to share their age information. Potentially, the age information of private users is leaked due to the homophily property and message-passing mechanism of GNNs.
social network. Users 2, 4, and 6 are sensitive about their age and unwilling to reveal it. In contrast, users 1, 3, and 5 do not mind sharing their age information to make more friends. Here, we define the first type of users as private users (who are not willing to reveal their sensitive attributes) and the others as non-private users (who are willing to reveal their sensitive attributes). In this scenario, a GNN adversary can easily infer the age information of users 2, 4, and 6 from the observed age information of their neighbors. This is partly because the graph homophily property and the message-passing mechanism of GNNs can exacerbate privacy leakage, exposing the age information of private users. Generally, in homophilous graphs, nodes with similar sensitive attributes are more likely to connect to each other than nodes with different sensitive attributes. For example, young people tend to make friends with people of similar age on social networks. The leakage illustrated in Figure 3.4 severely violates individual privacy. It is therefore critical to investigate this important and practical problem: learning a privacy-preserving GNN with partially observed sensitive attributes. To achieve both effective privacy preservation and competitive performance in downstream tasks, we confront two major challenges. First, user dependency. Users in a graph are typically dependent on each other due to their high connectedness. With the graph homophily property and the message-passing mechanism of GNNs, a GNN adversary may infer a private user’s sensitive information through her/his non-private neighbors. Thus, we need to minimize the impact of the observed sensitive attributes of non-private neighbors on revealing the sensitive attributes of private users. Second, attribute dependency. Some non-sensitive attributes are naturally correlated with sensitive attributes.
For example, a user’s hobby may be related to her/his gender; zip code is often correlated with race. Previous studies (Yang et al., 2020; Zhang et al., 2020) have found that simply removing sensitive attributes still leads to privacy leakage due to the correlation between the non-sensitive and sensitive attributes. Therefore, we need to remove the hidden sensitive factors from non-sensitive attributes.
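The user-dependency problem can be illustrated with a deliberately naive adversary: it guesses each private user’s age as the average of the ages revealed by that user’s non-private neighbors, which amounts to one round of mean-aggregation message passing. The friendship edges and ages below are made up for illustration (the exact edges in Figure 3.4 are not recoverable here); only the private/non-private split follows the figure, and the function name is hypothetical.

```python
def infer_private_attribute(adjacency, observed):
    """Guess each private user's attribute as the mean of its neighbours' observed values.

    adjacency: user -> list of neighbouring users
    observed:  user -> revealed value (non-private users only)
    """
    guesses = {}
    for user, neighbours in adjacency.items():
        if user in observed:
            continue  # non-private user: nothing to infer
        revealed = [observed[n] for n in neighbours if n in observed]
        if revealed:
            guesses[user] = sum(revealed) / len(revealed)
    return guesses


# Hypothetical six-user graph: users 1, 3, 5 reveal their age; users 2, 4, 6 do not.
friends = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
ages = {1: 20, 3: 25, 5: 30}
print(infer_private_attribute(friends, ages))  # {2: 22.5, 4: 27.5, 6: 30.0}
```

Under homophily, such neighbor-based guesses tend to land close to the hidden values, which is precisely the leakage that a privacy-preserving GNN must limit.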
3.1.3.1. Problem definition
One of the most popular GNN architectures in the literature is the Graph Convolutional Network (GCN) (Kipf and Welling, 2016). Suppose there is a social network G = (V, E) in which some users have revealed their sensitive attributes $S_k$ because of their different privacy preferences. In addition, the labels of some users, $V_L$, are known while the rest are not. One downstream task is to learn the GCN parameter θ in order to predict the labels of all unlabeled users. The challenge is that, with partially observed sensitive attributes, an adversary may easily infer during training the unknown sensitive attributes that private users prefer not to reveal on social media. This is mainly because users in the social network G are not independent of each other. Therefore, to protect the sensitive attributes of private users, the authors study how to learn users’ latent representations in a privacy-preserving GCN with partially observed sensitive attributes:

Definition 3.4 (Learning Privacy-Preserving GCN with Partially Observed Sensitive Attributes). Given a network G = (V, E) with labels Y and partially observed sensitive attributes $S_k$ of non-private users, the goal is to learn users’ latent representation X, which excludes users’ sensitive information, and the GCN parameter θ to classify unlabeled users accurately using X.

3.1.3.2. The approach: DP-GCN
The framework DP-GCN is built upon Disentangled Representation Learning (DRL) to learn a privacy-preserving GNN with partially observed sensitive attributes. DP-GCN includes two modules: DRL and Node Classification based on Non-sensitive Latent Representation (NCL), as illustrated in Figure 3.5. DRL removes users’ sensitive factors by disentangling the original feature representations into sensitive and non-sensitive latent representations that are orthogonal to each other. NCL aims to execute downstream tasks based on users’ non-sensitive latent representations. Disentangled Representation Learning with Orthogonal Constraint. DRL seeks to remove the sensitive information from the original feature representation of each user for privacy preservation.
Fig. 3.5: The overall framework of the proposed model DP-GCN. It includes two modules: DRL module and NCL module. DRL module removes users’ sensitive information by learning two orthogonal subspaces. NCL module executes downstream tasks based on non-sensitive latent representations.

Fig. 3.6: Learning two orthogonal subspaces for a user’s sensitive and non-sensitive latent representations from the observed non-sensitive attributes.
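A minimal sketch of the orthogonality requirement behind Figure 3.6, under our own assumptions: each projection (called W1 and W2, the names the text uses below) is stored as a d × k matrix given as a list of rows, a non-sensitive feature vector x is mapped to the latent vector Wᵀx, and the penalty ‖W1ᵀW2‖²_F is zero exactly when every direction of the sensitive subspace is orthogonal to every direction of the non-sensitive one. The function names and toy values are hypothetical, and the full training objective of DP-GCN (Hu et al., 2022) contains terms not shown here.

```python
def project(W, x):
    """Latent representation W^T x for a d x k projection matrix W given as a list of rows."""
    return [sum(W[r][c] * x[r] for r in range(len(x))) for c in range(len(W[0]))]


def orthogonality_penalty(W1, W2):
    """Squared Frobenius norm of W1^T W2 (zero iff the two column spaces are orthogonal)."""
    total = 0.0
    for a in range(len(W1[0])):
        for b in range(len(W2[0])):
            dot = sum(W1[r][a] * W2[r][b] for r in range(len(W1)))
            total += dot ** 2
    return total


# Non-sensitive vector x = [x1, x2, x3] of one user (hypothetical values).
x = [1.0, 2.0, 3.0]
W1 = [[1.0], [0.0], [0.0]]                 # sensitive subspace: first axis
W2 = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # non-sensitive subspace: other two axes
print(project(W1, x), project(W2, x))  # [1.0] [2.0, 3.0]
print(orthogonality_penalty(W1, W2))   # 0.0
```

During training, such a penalty would be added with some weight to the other loss terms so that gradient descent pushes the two subspaces apart; a non-orthogonal choice such as W2 = [[1.0], [0.0], [0.0]] gives a penalty of 1.0.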
The main challenges in this task are the previously introduced user dependency and attribute dependency. To overcome these two challenges, we assume that (a) sensitive and non-sensitive attributes are linearly correlated and (b) the non-sensitive attributes can be decomposed into two linearly independent factors that correspond to sensitive and non-sensitive factors, respectively. To this end, we propose to learn two orthogonal subspaces $W_1$ and $W_2$ such that the sensitive (non-sensitive) information in the original non-sensitive attributes is projected into a sensitive (non-sensitive) latent space. The orthogonal constraint ensures that the non-sensitive latent representation has minimal linear dependency on the sensitive latent representation. Figure 3.6 illustrates how the two orthogonal subspaces are learned. Given a non-private user v ∈ G described by a feature vector with six attributes $[x_1, s_1, x_2, s_2, x_3, s_3]$, where $\{x_1, x_2, x_3\}$ denotes a set of non-sensitive attributes and $\{s_1, s_2, s_3\}$ denotes a set of sensitive
attributes, we first decompose the original feature vector into a non-sensitive vector $x = [x_1, x_2, x_3]$ and a sensitive vector $s = [s_1, s_2, s_3]$. Since non-sensitive attributes may be correlated with sensitive information, we then disentangle the non-sensitive vector x into a sensitive latent representation and a non-sensitive latent representation. The goal is to make v’s non-sensitive latent representation as linearly independent of its sensitive latent representation as possible. This can be done by learning two orthogonal projections $W_1$ and $W_2$ such that the user’s sensitive information and non-sensitive information in x are mapped to two orthogonal subspaces, respectively.

Node Classification based on Non-sensitive Latent Representation. The NCL module aims to learn the GCN parameter θ to classify the unlabeled users accurately based on the users’ non-sensitive latent representations learned by DRL. Since users’ sensitive information has been disentangled from the non-sensitive latent representations through the orthogonal constraint, i.e., the linear dependency between the sensitive and non-sensitive latent representations is removed, users’ privacy in terms of sensitive attributes is protected. Let $\tilde{X}$ be the non-sensitive latent feature matrix, with each row representing the latent feature vector of a user. We further update the parameter θ of the GCN classifier f by minimizing the following negative log-likelihood loss:

$$\min_{\theta} \; \mathcal{L}_{GCN}(\theta, A, \tilde{X}, y) = \sum_{i} l\big(f(\theta, \tilde{X}, A)_i, \, y_i\big), \qquad (3.6)$$

where A is the adjacency matrix of the graph, $y_i$ is the label of node i, and l is the classification loss.
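The NCL step can be sketched with a two-layer GCN in the form popularized by Kipf and Welling (2016), applied to the non-sensitive latent features X̃. The renormalized propagation rule, the ReLU/softmax choices, taking l to be cross-entropy, and summing Eq. (3.6) only over labeled nodes are our assumptions for illustration; θ = (W0, W1) is fixed here rather than trained, and all names and values are hypothetical.

```python
import math


def matmul(A, B):
    """Dense matrix product of two lists-of-rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def normalize_adjacency(A):
    """Renormalized adjacency D^-1/2 (A + I) D^-1/2 used by Kipf and Welling's GCN."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    return [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def softmax(row):
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def gcn_forward(A, X_tilde, W0, W1):
    """f(theta, X~, A) with theta = (W0, W1): class probabilities for every node."""
    A_norm = normalize_adjacency(A)
    hidden = [[max(0.0, v) for v in row] for row in matmul(matmul(A_norm, X_tilde), W0)]
    logits = matmul(matmul(A_norm, hidden), W1)
    return [softmax(row) for row in logits]


def nll_loss(probs, labels):
    """Eq. (3.6) with l = cross-entropy, summed over labeled nodes {node index: class}."""
    return -sum(math.log(probs[i][y]) for i, y in labels.items())


# Toy path graph 0 - 1 - 2 with 2-D non-sensitive latent features (hypothetical values).
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X_tilde = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
W0 = [[1.0, 0.0], [0.0, 1.0]]
W1 = [[1.0, -1.0], [-1.0, 1.0]]
probs = gcn_forward(A, X_tilde, W0, W1)
print(nll_loss(probs, {0: 0, 2: 1}))  # node 1 is unlabeled and does not contribute
```

Because the classifier only ever sees X̃, whatever sensitive signal was projected into the other subspace does not enter the message passing, which is the point of separating the DRL and NCL modules.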
3.2. Informing
The Informing dimension aims to deliver the facts or information to users, particularly the potential negative results, in a timely way. We illustrate it with two tasks: explainable fake news detection and causal understanding of fake news dissemination.
3.2.1. An Approach for Explainable Fake News Detection
Social media has ushered the world into an unprecedented era of “fake news”: false or misleading information disguised as news articles to mislead consumers. One report estimated that over 1 million tweets were related to the fake news story “Pizzagate” by the end of the 2016 US presidential election.⁴ The social impact of fake news can be detrimental. The immediate loss is public trust in governments and journalism. Fake news may also change people’s attitude toward legitimate news, for example by diminishing trust in mass media, regardless of age group or political party.⁵ If these two pieces of evidence still sound remote, consider how “online” fake news can lead to “offline” societal events. In 2013, a fake tweet about Barack Obama being injured in an explosion wiped out $130 billion in stock value in a matter of minutes.⁶ Given how much damage fake news has already caused, why don’t we simply develop a fake news detection model to filter disinformation and create a healthy online environment? Because it is hard. Fake news is intentionally created to mislead readers; therefore, it looks very similar to true news. In addition, social media data are large-scale, multi-modal, and often contain a great deal of noise. One potential solution to these challenges is to “look beyond” the news content and explore the “why” question: why is a piece of news detected as fake? This has at least two merits: (1) the derived explanations are an additional source of information we might previously have overlooked; (2) the identified interpretable features can help improve fake news detection performance. For example, the news content and user comments in Figure 3.7 contain verifiably false information as well as rich information from the crowd that is useful for detecting fake news.
⁴ https://en.wikipedia.org/wiki/Pizzagate_conspiracy_theory
⁵ https://news.gallup.com/poll/195542/americans-trust-mass-media-sinks-new-low.aspx
⁶ https://www.forbes.com/sites/kenrapoza/2017/02/26/can-fake-news-impact-the-stock-market/?sh=2168b4d62fac
Fig. 3.7: A piece of fake news on PolitiFact, associated with a series of user comments. Some comments can be used to explain the content of the news. Image source: (Shu et al., 2019).
3.2.1.1. Problem definition
Let $A = \{s_1, \ldots, s_N\}$ denote a piece of news comprising N sentences, where each sentence $s_i = \{w_1^i, \ldots, w_{M_i}^i\}$ further comprises $M_i$ words. $C = \{c_1, \ldots, c_T\}$ is a set of T comments related to A. A is labeled either y = 1 (fake news) or y = 0 (true news). The goal is to rank all comments (RC, a ranked list of comments) and all sentences in the news article (RS, a ranked list of sentences) according to their explainability. The explainability of a sentence represents how check-worthy it is, and the explainability of a comment denotes the degree to which users believe the news is fake or true. We use $RS_k$ to denote the k-th most explainable sentence. A formal problem definition is as follows (Shu et al., 2019):

Definition 3.5 (Explainable Fake News Detection). Given a news article A, associated with a set of user comments C and the label y, we aim to learn a fake news detector $f: f(A, C) \rightarrow (\hat{y}, RC, RS)$ such that it maximizes detection accuracy while simultaneously outputting both the comment and sentence lists ranked by their explainability.
3.2.1.2. The approach: dEFEND
The dEFEND (Explainable Fake News Detection) approach consists of four major components: (1) a news content encoder, (2) a user comment encoder, (3) a sentence-comment co-attention component, and (4) a fake news prediction component.

News Content Encoding. This component learns a compact representation of the news content via a hierarchical attention neural network. The “hierarchical structure” reflects the fact that a sentence consists of a sequence of words and a news article consists of a sequence of sentences. “Attention” is the weight associated with a word or a sentence, indicating its importance for fake news detection. The network first encodes word-level representations through a word encoder (e.g., a recurrent neural network), and each sentence vector is the attention-weighted sum of its word representations. The sentence encoder takes the sentence vectors as input and outputs sentence representations. Finally, the news representation is the attention-weighted sum of the sentence representations.

User Comment Encoding. User comments can be useful because they may contain people’s emotions and opinions toward a news article. The user comment encoder therefore seeks to extract semantic information that helps with fake news detection. Since user comments are usually shorter than news articles, the user comment encoder is a single-level encoder similar to the word-level encoder described previously.

Sentence-Comment Co-attention. The goal of this component is to select the news sentences and user comments that are important for fake news detection. To make the motivation clearer, consider the following examples. “Michelle Obama is so vulgar she’s not only being vocal...” is highly related to the fake claim “Pence: Michelle Obama Is The Most Vulgar First Lady We’ve Ever Had”, while the statement “The First Lady denounced the Republican presidential nominee” does not help detect or explain whether the news is fake.
Similarly, a user comment such as “Where did Pence say this? I saw him on CBS this morning and he didn’t say these things...” is more explainable and more useful for detecting the fake news than comments such as “Pence is absolutely right”. To capture the distinct importance of sentences and comments, we can still use the attention
mechanisms described above for news content encoding. The only difference is that we want to learn the attention weights of sentences and comments simultaneously to capture their semantic affinity; hence the term “co-attention”. So, in addition to the attention weights, we also need to learn an affinity matrix F that transforms the user comment attention space into the news sentence attention space.
Fake News Prediction. This component simply minimizes a standard binary classification loss whose inputs are the news and user comment representations output by the co-attention component, together with the news label y ∈ {0, 1}.
3.2.2. Causal Understanding of Fake News Dissemination on Social Media
In fake news detection, comparatively little is known about which user attributes cause some users to share fake news. In contrast to research focused on correlations between user profiles (e.g., age, gender) and fake news, Cheng et al. (2021a) sought a more nuanced understanding of how user profile attributes are causally related to user susceptibility to sharing fake news.7 The key to identifying causal user attributes from observational data is to find confounders, i.e., variables that cause spurious associations between the treatments (user profile attributes) and the outcome (user susceptibility). When left out, confounders can result in biased and inconsistent effect estimates. But what is the main source of confounding bias in fake news dissemination? Various studies in psychology and social science have shown strong relationships between user behavior and user characteristics and activities such as information sharing, personality traits, and trust. Consequently, characterizing user behavior has become a vital means of analyzing activities on social networking sites. Informed by this, we argue that fake news sharing behavior, i.e., the user–news dissemination relations characterized by a bipartite graph (see
7 As we cannot know the exact intentions of users who spread fake news (e.g., gullible or malicious users) using only observed user engagement data, we propose a measure to approximate user susceptibility.
Fig. 3.8: Overview of our framework. We model the fake news dissemination under selection biases (➀) and design three effective estimations of propensity score (➁) to learn unbiased embeddings of fake news sharing behavior (➂). Following the causal graph with the fake news sharing behavior being the confounder (➃), we examine the causal relationships between user profile attributes and susceptibility. Note that the identified attributes are “potentially” causal because as with most other observational studies, no conclusive causal claims can be made.
Figure 3.8 ➀) is critical for addressing confounding in the causal relations between user attributes and susceptibility. Learning fake news sharing behavior is challenging because virtually all observational social media data are subject to selection bias, due both to self-selection (e.g., users typically follow what they like) and to the actions of online news platforms (e.g., these platforms only recommend news that they believe to be of interest to users). Consequently, these biased data only partially describe how users share fake news. To alleviate the selection bias, one can leverage a technique commonly used in causal inference, Inverse Propensity Scoring (IPS) (Rosenbaum and Rubin, 1983), which creates a pseudo-population that simulates data collected from a randomized experiment. In the context of fake news, the propensity is the probability of a user being exposed to a fake news piece. By connecting fake news dissemination with causal inference, we can derive an unbiased estimator for learning fake news sharing behavior under selection biases. We seek to (1) answer why people share fake news by uncovering the causal relationships between user profiles and susceptibility and (2) show how learning fake news sharing behavior under selection biases can be approached with propensity-weighting techniques.
3.2.2.1. Problem definition
Let $\mathcal{U} = \{1, 2, \ldots, u, \ldots, U\}$ denote the users who share fake news, and let $\mathcal{C} = \{1, 2, \ldots, i, \ldots, N\}$ denote the corpus of fake news pieces. $Y_{ui} \in \mathcal{Y}$ is a binary variable representing
interactions between user $u$ and fake news $i$: if $u$ spreads $i$, then $Y_{ui} = 1$; otherwise, $Y_{ui} = 0$. Note that $Y_{ui} = 0$ can mean either that $u$ is not interested in $i$ or that $u$ did not observe $i$. Suppose users have $m$ profile attributes, denoted by the matrix $A = (A_1, A_2, \ldots, A_m)$. Each user $u$ is also associated with an outcome $B \in (0, 1)$ denoting $u$’s susceptibility to spreading fake news. We aim to identify causal user attributes and estimate their effects, which involves the following two tasks:
• Fake News Sharing Behavior Learning. Given the user group $\mathcal{U}$, the corpus of fake news $\mathcal{C}$, and the set of user-fake news interactions $\mathcal{Y}$, we aim to model the fake news dissemination process and learn the fake news sharing behavior $\mathbf{U}$ under selection biases.
• Causal User Attributes Identification. Given the user attributes $A$, the fake news sharing behavior $\mathbf{U}$, and the user susceptibility $B$, this task seeks to identify user attributes that potentially cause users to spread fake news and to estimate their effects.
3.2.2.2. A causal approach
As with other observational studies, data for studying fake news are subject to selection bias. We first provide mathematical formulations of the propensity-weighting model for fake news dissemination under selection biases. We then introduce three propensity score estimators for learning unbiased embeddings of fake news sharing behavior. Under the Potential Outcome framework (Rosenbaum and Rubin, 1983), these embeddings are then used to identify the causal relationships between user attributes and susceptibility. Figure 3.8 gives an overview of the proposed framework.
Modeling Fake News Dissemination. The key signal is the “implicit” feedback collected through natural behavior, such as news reading or news sharing, by a user with unique profile attributes. By noting which fake news pieces a user did and did not share in the past, we may infer the kind of fake news that the user may be interested in sharing in the future. To formulate the process of fake news dissemination, we introduce two binary variables closely related to it: interestingness $R_{ui} \in \{0, 1\}$ and exposure $O_{ui} \in \{0, 1\}$. $R_{ui} = 1$ ($0$) indicates that $u$ is interested (not interested) in $i$; $O_{ui} = 1$ denotes that user $u$
was exposed to fake news $i$, and $O_{ui} = 0$ denotes otherwise. We assume that a user spreads a fake news piece if s/he is both exposed to and interested in it, and that exposure and interest are independent:
$$Y_{ui} = O_{ui} \cdot R_{ui} \;\Rightarrow\; P(Y_{ui} = 1) = P(O_{ui} = 1) \cdot P(R_{ui} = 1). \tag{3.7}$$
Suppose we have a pair of fake news pieces $(i, j)$ with $i \neq j$, and let $D_{pair} = \mathcal{U} \times \mathcal{C} \times \mathcal{C}$ be the set of triples $(u, i, j)$ combining observed (positive) interactions $(u, i)$ with unobserved (negative) interactions $(u, j)$. Because both the interestingness and the exposure variables are unobserved, the model parameters are learned by optimizing the pairwise Bayesian Personalized Ranking (BPR) loss over user-news interactions. In doing so, we assume that observed user-news interactions explain users’ preferences better than unobserved ones and should therefore be assigned higher prediction scores.
Learning Unbiased Sharing Behavior. Next, we learn unbiased fake news sharing behavior from the existing positive interactions between users and fake news using IPS. IPS works as a reweighting mechanism that assigns larger weights to news that is less likely to be observed. The key is to quantify the propensity scores from observational data. Formally, we define the propensity score in fake news dissemination as follows:
Definition 3.6 (Propensity Score). The propensity score of user $u$ being exposed to news $i$ is
$$\theta_{ui} = P(O_{ui} = 1) = P(Y_{ui} = 1 \mid R_{ui} = 1). \tag{3.8}$$
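To make the reweighting concrete, the sketch below estimates a propensity with the news-popularity formula of Eq. (3.10) (defined later in this section) and plugs it into an IPS-weighted BPR loss in which each observed pair $(u, i)$ is weighted by $1/\theta_{ui}$, so rarely exposed news counts more. This is one simple way to combine BPR with IPS and is our assumption, not necessarily the exact estimator of Cheng et al. (2021a). The function names, the toy interaction matrix, and the choice to compare every positive with all negatives (real BPR training usually samples negatives) are illustrative.

```python
import numpy as np

def news_popularity_propensity(Y, eta=0.5):
    """Eq. (3.10): theta_i = (pop_i / max_j pop_j) ** eta, pop_i = # users sharing news i."""
    pop = Y.sum(axis=0).astype(float)
    return (pop / pop.max()) ** eta

def ips_bpr_loss(S, Y, theta):
    """IPS-weighted BPR loss (sketch).

    S: (U, N) predicted scores; Y: (U, N) binary user-news shares;
    theta: (N,) exposure propensities. Each observed pair (u, i) is compared
    with every unobserved pair (u, j) and weighted by 1 / theta_i.
    """
    total, n_pairs = 0.0, 0
    for u in range(Y.shape[0]):
        pos = np.flatnonzero(Y[u] == 1)
        neg = np.flatnonzero(Y[u] == 0)
        for i in pos:
            gaps = S[u, i] - S[u, neg]  # positive score minus each negative score
            # -log sigmoid(gap) == log(1 + exp(-gap)), computed stably
            total += np.sum(np.logaddexp(0.0, -gaps)) / theta[i]
            n_pairs += len(neg)
    return total / max(n_pairs, 1)

# Toy data (hypothetical): 3 users x 3 fake news pieces.
Y = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])
theta = news_popularity_propensity(Y)  # news 0 is the most popular -> theta = 1
loss = ips_bpr_loss(np.zeros(Y.shape), Y, theta)
```

With all scores at zero, every pair contributes $\log 2$ scaled by $1/\theta_i$, so the less popular news pieces 1 and 2 contribute $\sqrt{2}$ times more than news piece 0.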
For the second equality in Eq. (3.8), we can do a reverse deduction:
$$P(O_{ui} = 1) = P(Y_{ui} = 1 \mid R_{ui} = 1) \;\Rightarrow\; P(O_{ui} = 1) \cdot P(R_{ui} = 1) = P(Y_{ui} = 1 \mid R_{ui} = 1) \cdot P(R_{ui} = 1) = P(Y_{ui} = 1, R_{ui} = 1) = P(Y_{ui} = 1). \tag{3.9}$$
The last equality holds because $Y_{ui} = 1$ implies $R_{ui} = 1$, and the resulting identity $P(O_{ui} = 1) \cdot P(R_{ui} = 1) = P(Y_{ui} = 1)$ is exactly Eq. (3.7). Intuitively, the news exposure probability can be approximated by the popularity of the news or the user, or by the content of the news itself. Accordingly, we can define three types of propensity scores: (1) the news-popularity-based propensity assumes that
the probability of a user observing a fake news piece is closely related to its popularity; (2) the user-news-popularity-based propensity also considers the bias induced by user popularity, that is, users who are popular and active on social media are more likely to be exposed to fake news; and (3) the news-content-based propensity uses the news content as an exposure indicator.
Definition 3.7 (News-Popularity-based Propensity). Propensity using relative news popularity is defined as
$$P_{news} = \hat{\theta}^{news}_{\cdot,i} = \left( \frac{\sum_{u \in \mathcal{U}} Y_{ui}}{\max_{i' \in \mathcal{C}} \sum_{u \in \mathcal{U}} Y_{ui'}} \right)^{\eta}. \tag{3.10}$$
Because popularity-related measures typically follow power-law distributions, we include a smoothing parameter $\eta \le 1$ and set it to 0.5.
Definition 3.8 (User-News-based Propensity). Propensity using both relative news popularity and user popularity is defined as
$$P_{user} = \hat{\theta}^{user}_{u,i} = \left( \frac{\sum_{u \in \mathcal{U}} Y_{ui} \cdot F_u}{\max_{i' \in \mathcal{C}} \sum_{u \in \mathcal{U}} Y_{ui'} \cdot F_u} \right)^{\eta}, \tag{3.11}$$
where $F_u$ denotes the number of followers of $u$ and $\eta = 0.5$. In the third formulation, we jointly estimate the propensity score and model fake news dissemination.
Definition 3.9 (News-Content-based Propensity). Propensity encoded by neural networks is defined as
$$P_{neural} = \hat{\theta}^{neural}_{\cdot,i} = \sigma(e_i), \tag{3.12}$$
where $e_i$ denotes the latent representation of the news content and $\sigma(\cdot)$ is the sigmoid function. Here, we implicitly encode the popularity of fake news in the latent space based on the news content.
Identifying Causal User Attributes. With multiple user attributes at hand, we are essentially tackling a multiple causal inference task where user attributes represent the multiple treatments
and user susceptibility denotes the outcome. The goal is to simultaneously estimate the effects of individual user attributes on how likely a user is to spread a fake news piece. But first things first: how do we quantify “user susceptibility”? Intuitively, the more fake news a user has tweeted before, the more susceptible s/he is. Therefore, one simple answer is to measure the percentage of fake news among all the news a user has shared in the past. With the user attributes and this approximated user susceptibility, we can build a linear regression model to estimate the coefficient of each attribute and compare their correlations with the outcome variable. However, because we aim for causal relations, we need to control for confounding bias, especially hidden confounding bias. Previous findings in social psychology (e.g., Talwar et al. (2019)) suggest that we may consider users’ news-sharing behavior as the confounder. This means we can simply add the learned representation of users’ sharing behavior to the linear regression model as a way of “controlling” for the confounding bias. The findings of this work are interesting. For example, the unbiased fake news dissemination model leads to a large gain in prediction accuracy under distribution shift. Furthermore, users’ fake news sharing behavior and true news sharing behavior differ: users who share fake news exhibit more similar sharing behavior, whereas those sharing true news exhibit more diverse sharing behavior. We also found that the number of followers or friends is (potentially) causally related to user susceptibility: more followers → less susceptibility.
3.3. Preventing
The last dimension seeks to prevent or mitigate the negative impact of socially indifferent AI algorithms. We use two bias mitigation tasks in NLP to exemplify the Preventing dimension.
3.3.1. Mitigating Gender Bias in Word Embeddings
Word embeddings are one of the foundations of NLP. You can easily find hundreds or thousands of papers about word embeddings and their applications. The power of word
embeddings comes from their capability to encode the meaning of each word (or common phrase) $w$ as a $d$-dimensional word vector $\vec{w} \in \mathbb{R}^d$. They are trained on word co-occurrence in text corpora and have two nice properties: (1) embeddings of words with similar semantic meanings tend to be close to each other, and (2) the relationship between two words can be reflected in their word embeddings. For example, in the “man is to king as woman is to x” analogy puzzle, simple arithmetic on the word embeddings, $\overrightarrow{king} - \overrightarrow{man} + \overrightarrow{woman}$, will find x = queen because $\overrightarrow{man} - \overrightarrow{woman} \approx \overrightarrow{king} - \overrightarrow{queen}$. But are word embeddings free from inherent bias? In Chapter 1, we discussed potential sources of bias, and a major source is data. As word embeddings are trained on large-scale text, it is almost certain that they absorb the biases of the human world. Imagine searching for “cool programmer t-shirts” and the search engine, which relies on a language model such as Bidirectional Encoder Representations from Transformers (BERT), returns only male-form tees. Or ask the word embeddings to solve another analogy puzzle, “man is to computer programmer as woman is to x”: the embeddings will expose the sexism implicit in the text by returning “homemaker” as the best answer. Now, let us dive into the pioneering work by Bolukbasi et al. (2016) to see how gender bias in word embeddings is detected and mitigated.
3.3.1.1. The problem
A text corpus can contain both gender-neutral words, such as “softball”, “receptionist”, and “programmer”, and gender-specific words, such as “businesswoman”, “girl”, and “father”. Ideally, only gender-specific (gendered) words should exhibit gender differences, while gender-neutral (non-gendered) words should keep the same distance to both “man” and “woman”. However, it has been observed that gender-neutral words acquire stereotypes and bias from the contexts in which they appear in the corpus. The goal is to “debias” these non-gendered words while otherwise preserving the semantic relationships among the word embeddings. “Debias” means that the non-gendered words should be equidistant to gender pairs such as he-she or man-woman. Bias itself consists of direct bias and indirect bias. Direct bias manifests in the relative similarities between gendered words and
non-gendered words, e.g., “man” is closer to “programmer”, while “woman” is closer to “receptionist”. Indirect bias comes from the relative geometry among non-gendered words themselves. For example, the words “bookkeeper” and “receptionist” are much closer to “softball” than to “football”, likely due to the female associations shared by “bookkeeper”, “receptionist”, and “softball”.
3.3.1.2. The approach: Hard debiasing
There are three main steps to mitigate gender bias in word embeddings: identifying the gender subspace, neutralizing the word vectors, and equalizing them.
Identifying the Bias Subspace. The bias subspace is identified from defining sets of words that define the concept of gender itself. The words in each set represent different ends of the bias; for gender, the defining sets can be {she, he} and {woman, man}. To find the bias subspace, we first compute, for each set, the vector differences between the embedding of each word in the set and the mean embedding over the set. An example of defining sets is shown in Figure 3.9, where the right-hand side is the resulting difference matrix. We then identify the $k$ (a predefined constant) most significant components $\{b_1, b_2, \ldots, b_k\}$ of this matrix using dimensionality reduction techniques such as Principal Component Analysis (PCA) or Singular Value Decomposition (SVD). It is observed that the first eigenvalue explains most of the variance in the difference matrix. Therefore, a hypothesis is formed: the top principal component, denoted by the unit vector $g$, captures the gender subspace.
Fig. 3.9: A subset of equality pairs used to identify the gender subspace. “−” on the left-hand side denotes “minus” and the right-hand side is the corresponding difference matrix.
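Assuming the embeddings are stored as NumPy arrays, the subspace identification step can be sketched as follows: center each defining set, stack the differences into the difference matrix, and take its top right-singular vectors. Because each set's differences already sum to zero, this SVD coincides with PCA. The three-dimensional "embeddings" are hypothetical toy vectors in which gender varies only along the first axis (real embeddings have hundreds of dimensions), the function name is ours, and the last lines compute a simple direct-bias indicator, the absolute cosine between a word vector and $g$.

```python
import numpy as np

def bias_subspace(defining_sets, k=1):
    """Return the top-k bias directions and the explained-variance ratios.

    defining_sets: list of (m_j, d) arrays; each row is one word's embedding.
    """
    # Subtract each set's mean embedding; the stacked rows form the difference matrix.
    diffs = np.vstack([S - S.mean(axis=0) for S in defining_sets])
    # Rows of Vt are the principal directions of the (already centered) matrix.
    _, s, Vt = np.linalg.svd(diffs, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Vt[:k], explained

# Hypothetical toy embeddings (d = 3) in which gender varies along axis 0.
emb = {
    "he": np.array([1.0, 0.5, 0.2]), "she": np.array([-1.0, 0.5, 0.2]),
    "man": np.array([0.9, 0.1, 0.4]), "woman": np.array([-0.9, 0.1, 0.4]),
    "programmer": np.array([0.3, 0.8, 0.1]),
}
sets = [np.stack([emb["he"], emb["she"]]), np.stack([emb["man"], emb["woman"]])]
B, explained = bias_subspace(sets, k=1)
g = B[0]  # unit vector; its sign is arbitrary
w = emb["programmer"]
direct_bias = abs(w @ g) / np.linalg.norm(w)  # |cos(w, g)|
```

In this toy setup the first component explains all of the variance, mirroring the observation that a single direction $g$ dominates the difference matrix.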
Removing Bias Components. The next step removes the bias-subspace components from the embeddings of non-gendered words so that they remain equidistant from equality pairs like he-she. Technically, we subtract the projection of each such embedding onto the bias subspace from the original vector, i.e., we neutralize the embedding. We also need to “equalize” the embeddings of gendered words, i.e., make the words in each equality pair differ only in their bias components and have the same vector length, so that non-gendered words are equidistant to all equality pairs. For example, “programmer” should be equidistant to both the he-she and the man-woman pairs. In short, for non-gendered words such as doctor and nurse, the debiasing method removes their bias components; for gendered words such as man and woman, it first centers their word embeddings and then equalizes their bias components.
Formally, given a bias subspace $B$ defined by a set of orthonormal vectors $\{b_1, b_2, \ldots, b_k\}$, the projection of a word vector $w$ onto $B$ is
$$w_B = \sum_{i=1}^{k} \langle w, b_i \rangle \, b_i, \tag{3.13}$$
where $\langle \cdot, \cdot \rangle$ denotes the inner product. We then neutralize the embeddings of non-gendered words by removing this component:
$$\tilde{w} = \frac{w - w_B}{\lVert w - w_B \rVert}, \tag{3.14}$$
where $\tilde{w}$ is the debiased word embedding, i.e., the normalized projection onto the subspace orthogonal to $B$. We further equalize the gendered words in each equality set $E \in \mathcal{E}$, where $\mathcal{E}$ is the family of equality sets. Specifically, for each $w \in E$,
$$\tilde{w} = (\mu - \mu_B) + \sqrt{1 - \lVert \mu - \mu_B \rVert^2} \; \frac{w_B - \mu_B}{\lVert w_B - \mu_B \rVert}, \tag{3.15}$$
where $\mu = \frac{1}{|E|} \sum_{w \in E} w$ is the average embedding of the words in the set, and $\mu_B$ is its component in the identified gender subspace, obtained via Eq. (3.13). So, to “equalize” is to set the part of each gendered word outside $B$ to the simple average $\mu - \mu_B$ and then adjust the bias components so that the vectors have unit length.
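Eqs. (3.13) to (3.15) translate almost line by line into NumPy. The sketch below assumes that the rows of `B` are orthonormal bias directions (for example, the PCA output of the identification step) and that input embeddings have unit length; the toy vectors and helper names are hypothetical, and this is an illustration rather than Bolukbasi et al.'s released implementation.

```python
import numpy as np

def project(w, B):
    """Eq. (3.13): component of w in the subspace spanned by the orthonormal rows of B."""
    return (B @ w) @ B

def neutralize(w, B):
    """Eq. (3.14): drop the bias component and renormalize."""
    v = w - project(w, B)
    return v / np.linalg.norm(v)

def equalize(E, B):
    """Eq. (3.15): E is an (n, d) array of unit-length embeddings forming one equality set."""
    mu = E.mean(axis=0)
    mu_B = project(mu, B)
    nu = mu - mu_B                              # shared part outside the bias subspace
    scale = np.sqrt(max(0.0, 1.0 - nu @ nu))    # length left for the bias component
    out = []
    for w in E:
        direction = project(w, B) - mu_B        # w_B - mu_B
        out.append(nu + scale * direction / np.linalg.norm(direction))
    return np.array(out)

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

B = np.array([[1.0, 0.0, 0.0]])                 # toy gender direction g
programmer = neutralize(unit([0.3, 0.8, 0.1]), B)
he, she = equalize(np.stack([unit([0.8, 0.5, 0.2]), unit([-0.6, 0.5, 0.3])]), B)
```

After these two steps, the neutralized "programmer" has no component along $g$, "he" and "she" differ only along $g$ and both have unit length, so "programmer" is equidistant from them.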
3.3.2. Debiasing Cyberbullying Detection
We have seen how to debias the input data (word embeddings) of upstream tasks. Now, let us look at an example where biases also exist in downstream tasks such as cyberbullying detection. The motivation of Cheng et al. (2021c) was that the promising results in the cyberbullying detection literature may come from deeply biased models that capture, use, and even amplify the unintended biases embedded in social media data. That is, because humans are biased, human-generated language corpora can introduce human social prejudices into model training. Evidence of such bias has been found in toxicity detection and hate speech detection, revealing that tweets in African–American Vernacular English (AAVE) are more likely to be classified as abusive or offensive. Similarly, a cyberbullying classifier may simply take advantage of sensitive triggers, e.g., demographic-identity information (e.g., “gay”) and offensive terms (“stupid,” “ni*r”), to make decisions. Indeed, we find that in the Instagram data for benchmarking cyberbullying detection released by Hosseinmardi et al. (2015), 68.4% of sessions containing the word “gay”, 89.4% of sessions containing the word “ni*r”, and 64.3% of sessions containing the word “Mexican” were labeled as bullying. In Figure 3.10, we compare a standard hierarchical attention network (HAN), a commonly used model for session-based cyberbullying detection, with a HAN debiased using our proposed strategy, on sessions with and without sensitive triggers in the benchmark Instagram data. Specifically, the x-axis represents the probability of the classifier predicting a session as bullying, i.e., the decision score F = p(label = bully), and the y-axis represents the conditional probability density of the decision scores given the triggers, i.e., p(F|Z). Figure 3.10(a) shows that the densities of the standard HAN depend on Z, and Figure 3.10(b) shows that our mitigation strategy largely reduces this dependence.
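A statistic such as “68.4% of sessions containing ‘gay’ were labeled as bullying” is a conditional label rate, P(y = 1 | trigger z appears in the session). Below is a minimal sketch of how it could be computed, assuming each session is available as a single string with a binary label and using simple word tokenization. The toy sessions are hypothetical and are not drawn from the Instagram data; the function name is ours.

```python
import re

def bullying_rate_by_trigger(sessions, labels, triggers):
    """Fraction of sessions containing each trigger word that are labeled bullying (1)."""
    tokenized = [set(re.findall(r"\w+", s.lower())) for s in sessions]
    rates = {}
    for z in triggers:
        hits = [y for tokens, y in zip(tokenized, labels) if z in tokens]
        rates[z] = sum(hits) / len(hits) if hits else float("nan")
    return rates

# Hypothetical toy sessions (one string per session) and labels.
sessions = [
    "nice pic, love the colors",
    "so proud to be gay and out",
    "lol you are gay and stupid",
    "mexican food tonight?",
]
labels = [0, 0, 1, 0]
rates = bullying_rate_by_trigger(sessions, labels, ["gay", "mexican"])
```

A classifier that learns these rates rather than the session semantics would score any session containing a high-rate trigger as likely bullying, which is exactly the shortcut the debiasing strategy targets.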
3.3.2.1. Problem definition
Cyberbullying is often characterized as a repeated rather than a one-off behavior. This unique trait has motivated research that focuses on detecting cyberbullying in entire social media sessions. In contrast to a single text, e.g., a tweet, a social media session is
Fig. 3.10: Conditional probability densities of the decision scores of standard HAN and debiased HAN on sessions with and without sensitive triggers z in the Instagram dataset released by Hosseinmardi et al. (2015): (a) HAN; (b) Debiased HAN.
typically composed of an initial post (e.g., an image with a caption), a sequence of comments from different users, timestamps, spatial location, user profile information, and other social content such as the number of likes. Thus, session-based cyberbullying detection involves a number of characteristics such as multimodality and user interaction. Because our goal is to mitigate bias in natural language, we focus on the text (i.e., the sequence of comments) in a social media session. We formally define debiasing session-based cyberbullying detection as follows:
Definition 3.10 (Debiasing Cyberbullying Detection in a Social Media Session). We consider a corpus of $N$ social media sessions $\{s_1, s_2, \ldots, s_N\}$, in which each session consists of a sequence of comments denoted as $\{c_1, \ldots, c_C\}$. A session is labeled as either $y = 1$, denoting a bullying session, or $y = 0$, denoting a non-bullying session. Let $x_i \in \mathbb{R}^D$ be the textual features (e.g., Bag of Words) extracted from comment $c_i$, where $D$ is the feature dimension, and let $S$ be the list of sensitive triggers (e.g., “gay”). Session-based cyberbullying detection aims to learn an accurate and debiased classifier that uses the sequence of textual data to identify whether a social media session is a cyberbullying instance:
$$F: S \cup \{x_1, \ldots, x_C\} \rightarrow \{0, 1\}, \quad x_i \in \mathbb{R}^D. \tag{3.16}$$
3.3.2.2. A non-compromising approach
An unbiased model for cyberbullying detection makes decisions based on the semantics of a social media session instead of sensitive triggers potentially related to cyberbullying, such as “gay”, “black”, or “fat”. In the presence of unintended bias, a model may perform well on sessions with these sensitive triggers without capturing their semantics.
Assessing Bias. Bias in a text classification model can be assessed with the False Negative Equality Difference (FNED) and False Positive Equality Difference (FPED) metrics. They are a relaxation of Equalized Odds (Section 2.1.1) and are defined as
$$\mathrm{FNED} = \sum_{z} \left| \mathrm{FNR}_z - \mathrm{FNR}_{overall} \right|, \tag{3.17}$$
$$\mathrm{FPED} = \sum_{z} \left| \mathrm{FPR}_z - \mathrm{FPR}_{overall} \right|, \tag{3.18}$$
where $z$ denotes cyberbullying-sensitive triggers, such as “gay”, “black”, and “Mexican”. $\mathrm{FNR}_{overall}$ and $\mathrm{FPR}_{overall}$ denote the False Negative Rate and False Positive Rate over the entire training dataset. Similarly, $\mathrm{FNR}_z$ and $\mathrm{FPR}_z$ are calculated over the subset of the data containing trigger $z$. An unbiased cyberbullying model meets the following condition:
$$P(\hat{Y} \mid Z) = P(\hat{Y}), \tag{3.19}$$
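Eqs. (3.17) and (3.18) are straightforward to compute once each session is tagged with the triggers it contains. The sketch below is our illustration, not the authors’ evaluation code: it assumes binary label arrays and one boolean mask per trigger, and it skips a trigger’s FNR (FPR) term when that trigger’s subset has no positive (negative) sessions, a convention the text does not specify. The toy labels, predictions, and masks are hypothetical.

```python
import numpy as np

def error_rates(y_true, y_pred):
    """Return (FNR, FPR); a rate is nan when its denominator is empty."""
    pos, neg = y_true == 1, y_true == 0
    fnr = np.mean(y_pred[pos] == 0) if pos.any() else np.nan
    fpr = np.mean(y_pred[neg] == 1) if neg.any() else np.nan
    return fnr, fpr

def fned_fped(y_true, y_pred, trigger_masks):
    """Eqs. (3.17)-(3.18). trigger_masks maps each trigger z to a boolean mask
    marking the sessions that contain z."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    fnr_all, fpr_all = error_rates(y_true, y_pred)
    fned = fped = 0.0
    for mask in trigger_masks.values():
        mask = np.asarray(mask, dtype=bool)
        fnr_z, fpr_z = error_rates(y_true[mask], y_pred[mask])
        if not np.isnan(fnr_z):
            fned += abs(fnr_z - fnr_all)
        if not np.isnan(fpr_z):
            fped += abs(fpr_z - fpr_all)
    return fned, fped

# Hypothetical labels and predictions for six sessions.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0]
masks = {
    "gay":     [True, True, True, False, False, False],
    "mexican": [False, False, False, True, True, True],
}
fned, fped = fned_fped(y_true, y_pred, masks)
```

Here the overall FNR and FPR are both 1/3, while the “gay” subset has an FPR of 1, so most of the FPED comes from false alarms on sessions containing that trigger. A perfectly debiased model in the sense of Eq. (3.19) would drive both sums toward zero.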
Here, $\hat{Y}$ stands for the predicted label. Eq. (3.19) states that $\hat{Y}$ is independent of the cyberbullying-sensitive triggers $Z$; that is, a debiased model performs similarly on sessions with and without $Z$, which means that the classifier does not rely on these sensitive triggers for prediction.
Mitigating Bias. The challenge of this task is how to debias while the model observes the comments sequentially. Essentially, debiasing session-based cyberbullying detection is a sequential decision-making process in which decisions are updated periodically to ensure high performance. In this debiasing framework, comments arrive and are observed sequentially. At each time step, two decisions are made based on the feedback from past decisions: (1) predicting whether a
[Figure 3.11 diagram: an agent (the classifier) receives states (the comments observed so far) from the environment (the session data, a sequence of comments), takes actions, and receives rewards from a reward function.]
Fig. 3.11: Overview of the proposed model. The agent (a classifier) interacts with the environment to gather experiences $M_t$ that are used to update the agent.
session contains bullying and (2) gauging the performance differences between sessions with and without sensitive triggers. Our debiasing strategy builds on recent results in reinforcement learning, in particular the Markov Decision Process (MDP) formulation of sequential decision making. In this approach, an agent $A$ interacts with an environment over discrete time steps $t$: the agent selects action $a_t$ in response to state $s_t$, and $a_t$ causes the environment to change its state from $s_t$ to $s_{t+1}$ and return a reward $r_{t+1}$. Each interaction between the agent and the environment therefore creates an experience tuple $M_t = (s_t, a_t, s_{t+1}, r_{t+1})$, and these tuples, gathered over many interactions, are used to train the agent $A$. The agent’s goal is to excel at a specific task, such as summarizing or generating text in other NLP applications, or detecting cyberbullying in our case. Reinforcement learning thus serves as an optimization framework that tries to alleviate the unintended bias while improving (or at least preserving) the classification accuracy. In particular, we consider a standard classifier $F$ (e.g., HAN) as a reinforcement learning agent and the sequence of comments observed at times $\{1, 2, \ldots, t\}$
as state $s_t$. The agent selects an action $a_t \in$ {non-bullying, bullying} according to a policy function $\pi(s_t)$. $\pi(s_t)$ denotes the probability distribution over actions $a$ in response to state $s_t$, whereas $\pi(s_t, a_t)$ is the probability of choosing action $a_t$ in response to state $s_t$. The action can be interpreted as the label $\hat{y}$ predicted from the input comments. The reward $r_{t+1}$ is then calculated for the state-action pair $(s_t, a_t)$, and the cumulative discounted sum of rewards $G_t$ is used to optimize the policy function $\pi(s_t)$. A dilemma often faced by researchers studying bias and fairness in machine learning is the trade-off between fairness and effectiveness. Under this trade-off view, forcing cyberbullying classifiers to follow the proposed debiasing strategy would invariably decrease accuracy. Interestingly, and somewhat counterintuitively, we observe that this approach can outperform biased models w.r.t. overall cyberbullying detection accuracy while also decreasing unintended biases in the data. This non-compromising result may be attributed to the proposed reinforcement learning framework, which makes dependent decisions given the sequential input.
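The two quantities in this formulation, the discounted return $G_t = r_{t+1} + \gamma G_{t+1}$ and a policy-gradient update of $\pi$, can be sketched as follows. This is a generic REINFORCE-style sketch under our own assumptions: a linear softmax policy over the two actions, a discount factor $\gamma$, and per-step rewards supplied by the caller. The chapter does not specify the reward function, so any reward used with this code (for example, one combining prediction correctness with a penalty on the trigger-wise performance gap) is hypothetical, as are all function names.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """G_t = r_{t+1} + gamma * G_{t+1}; rewards[t] holds r_{t+1}."""
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reinforce_gradient(W, states, actions, rewards, gamma=0.99):
    """Gradient of sum_t G_t * log pi(a_t | s_t) for the policy pi(s) = softmax(W @ s).

    W: (n_actions, d) policy weights; states: list of (d,) feature vectors s_t
    (e.g., features of the comments observed so far); actions: ints in {0, 1}.
    """
    G = discounted_returns(rewards, gamma)
    grad = np.zeros_like(W)
    for s, a, g in zip(states, actions, G):
        p = softmax(W @ s)
        one_hot = np.eye(len(p))[a]
        grad += g * np.outer(one_hot - p, s)  # d log pi(a|s) / dW
    return grad

# One gradient-ascent step on a toy one-step episode (hypothetical data).
W = np.zeros((2, 3))
s = np.array([1.0, 0.0, 0.0])
W_new = W + 0.5 * reinforce_gradient(W, [s], [1], [1.0])
```

A positively rewarded action becomes more likely after the update; with a reward that also penalizes performance gaps across triggers, the same update would push the policy toward the debiased behavior described above.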
3.4. Concluding Remarks
3.4.1. Summary
In this chapter, we looked at the “philanthropic” role of AI systems, in particular, how to leverage advanced AI techniques to address challenging societal issues such as cyberbullying, misinformation, and privacy preservation via Protecting, Informing, and Preventing. To realize AI for good, the first question we probably need to answer is “what makes an AI project good”8: Is it the “goodness” of the application, such as how it improves users’ health, environment, or education? Or is it the type of problem being solved, such as predicting fake news or detecting cancer earlier? Asking the right questions is key. It is also necessary to examine the responsible AI principles because applications in “AI for Social Good” are often
8 https://venturebeat.com/2020/10/31/how-to-make-sure-your-ai-for-good-project-actually-does-good/
high-stakes. Again, the four AI responsibilities (functional, legal, ethical, and philanthropic) are not mutually exclusive. To improve AI for good, we need to bring together scholars from communication studies, philosophy, law, computer science, and other disciplines to better understand the nature of AI algorithms and to propose regulatory and technological paths forward.
3.4.2. Additional Readings
We recommend the following readings regarding each dimension:
Protecting:
• Characterization, Detection, and Mitigation of Cyberbullying, Charalampos Chelmis and Daphney–Stavroula Zois. 2018 ICWSM Tutorial. http://www.cs.albany.edu/~cchelmis/icwsm2018tutorial/CyberbullyingTutorial_ICWSM2018.pdf
• Ferrara, E., Varol, O., Davis, C., Menczer, F., & Flammini, A. (2016). The rise of social bots. Communications of the ACM, 59(7), 96–104.
• Boulemtafes, A., Derhab, A., & Challal, Y. (2020). A review of privacy-preserving techniques for deep learning. Neurocomputing, 384, 21–45.
Informing:
• Shu, K., Sliva, A., Wang, S., Tang, J., & Liu, H. (2017). Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter, 19(1), 22–36.
• Lazer, D. M., Baum, M. A., Benkler, Y., Berinsky, A. J., Greenhill, K. M., Menczer, F., ... & Zittrain, J. L. (2018). The science of fake news. Science, 359(6380), 1094–1096.
Preventing:
• Garg, N., Schiebinger, L., Jurafsky, D., & Zou, J. (2018). Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16), E3635–E3644.
• Bias and Fairness in NLP, Chang et al. Tutorial, 2019 EMNLP. http://web.cs.ucla.edu/~kwchang/talks/emnlp19-fairnlp/
Chapter 4
Challenges of Socially Responsible AI
Socially responsible AI is a large umbrella covering a range of important topics, such as those discussed in previous chapters. What we have covered in this book is surely just the tip of the iceberg, and a variety of exciting research questions remain for AI researchers and practitioners to explore, understand, and address. In this chapter, we select a few open problems and challenges facing socially responsible AI and discuss the status quo in each direction.
4.1. Causality and Socially Responsible AI
Causality, a concept often contrasted with correlation, is inherently related to many of the hard problems in AI, such as generalization, fairness, and interpretability. These challenges arise because AI systems are typically trained to improve prediction accuracy without knowing the underlying data generating process (DGP). Causality, with its focus on modeling and reasoning about interventions, can contribute significantly to understanding and tackling these problems and thus take the field to the next level (Schölkopf, 2022). Especially in the era of big data, it is possible to learn causality by leveraging both causal knowledge and copious real-world data, an emerging research field known as causal learning (Guo et al., 2020). In this section, we discuss causality and its inherent connection to socially responsible AI. In particular, we revisit some of the principles in Chapter 2 from a causal perspective.
4.1.1. Causal Inference 101
Let us first differentiate several related terms. A non-technical definition of causality is the following: if doing A can make B happen or change, then A is a cause of B. There are two fundamental tasks in causality. Causal discovery aims to learn the causal relations among variables. Causal inference, on the other hand, aims to estimate the strength of a causal relation (also known as causal effect estimation). Since the discussion of this first challenge is primarily based on causal inference, we briefly review its basics in the following paragraphs. Readers interested in a more detailed introduction to causal inference and causal discovery may refer to surveys such as Guo et al. (2020) and books such as Pearl and Mackenzie (2018). The first thing we need to figure out is how to formulate causal knowledge using causal models. A causal model is a mathematical abstraction that quantitatively describes the causal relations among variables, derived from causal assumptions or prior causal knowledge. Causal models are typically incomplete, so we need to infer what is missing from observational data. If an effect can be estimated from observational data, we say that the effect is identifiable. The two fundamental causal models in causal inference are structural causal models (SCMs) (Pearl, 2009) and the potential outcome framework (Rubin, 1974). SCMs rely on the causal graph, a special class of Bayesian networks with edges denoting causal relationships. A more structured format of SCMs is referred to as structural equations. One of the fundamental notions in SCMs is the do-operator (formalized by the do-calculus; Pearl, 2009), an operation for modeling intervention. The main difficulty in conducting a causal study is the difference between the observational distribution $P(y \mid t)$ and the interventional distribution $P(y \mid do(t))$; the latter describes what the distribution of the outcome $Y$ would be if we were to set the treatment $T = t$.
Structural Causal Models.
An SCM often consists of two components: the causal graph (causal diagram) and the structural equations. A causal graph is a special class of Bayesian networks with directed edges representing cause–effect relations.

Definition 4.1 (Causal Graph). A causal graph $G = (V, E)$ is a directed graph that describes the causal effects between variables, where $V$ is the node set and $E$ is the edge set. In a causal graph, each node represents a random variable, including the treatment, the outcome, and other observed and unobserved variables. A directed edge $t \rightarrow y$ implies a causal effect of $t$ on $y$.

Fig. 4.1: Causal graphs with and without intervention: (a) an example of a back-door path between $T$ and $Y$; (b) a causal graph under the intervention $do(t')$.

What makes $P(y \mid do(t))$ and $P(y \mid t)$ different is the back-door path between $T$ and $Y$, i.e., a path that contains an arrow into $T$, as shown in Figure 4.1(a). If a set of variables $X$ satisfies the back-door criterion (Pearl, 1993), namely (1) no node in $X$ is a descendant of $T$ and (2) $X$ blocks every path between $T$ and $Y$ that contains an arrow into $T$, then we can use the following back-door adjustment formula to identify and estimate the effect of $T$ on $Y$:

$$P(Y \mid do(T = t)) = \int P(Y \mid T = t, X = x)\, dP(x). \tag{4.1}$$

The left-hand side of Eq. (4.1) is defined through causal intervention. An intervention is a change to the DGP, described by the operator $do(t')$, which denotes setting the value of the variable $T$ to $t'$. The interventional distribution $P(y \mid do(t))$ describes the distribution of $Y$ if we force $T$ to take the value $t$ while keeping the rest of the process unchanged. This is equivalent to removing all the inbound arrows to $T$ in the causal graph, as shown in Figure 4.1(b). If the path $T \rightarrow M \rightarrow Y$ exists, we refer to $M$ as a mediator, since it contributes to the overall effect of $T$ on $Y$. Given an SCM, the conditional independencies encoded in its causal graph provide sufficient information to check whether criteria such as the back-door criterion are satisfied.

Structural equations specify, by means of a set of equations (often, though not necessarily, linear), how the variables in the causal graph are related to each other. Given a unit $i$ (i.e., an individual sample), let $x_i$, $t_i \in \{0, 1\}$, and $y_i$ be the features (i.e., background variables), the treatment assignment ($t_i = 1$ for treated and $t_i = 0$ for control), and the outcome variable, respectively. Causal effect estimation is defined as follows:

Definition 4.2 (Causal Effects Estimation). Given $n$ units $\{(x_1, t_1, y_1), \ldots, (x_n, t_n, y_n)\}$, estimating causal effects is to quantify the changes of $Y$ as we alter the treatment assignment from 0 to 1.

Under different contexts, effects can be estimated within the entire population, a subpopulation defined by background variables, an unknown subpopulation, or an individual. The Average Treatment Effect (ATE) $\tau$ is typically used for assessing a population represented by the distribution of $X$:

$$\tau = \mathbb{E}_X[\tau(x)] = \mathbb{E}_X[Y \mid do(T = 1)] - \mathbb{E}_X[Y \mid do(T = 0)], \tag{4.2}$$
where $do(T = 1)$ and $do(T = 0)$ indicate that the treatment assignment is "treated" and "control", respectively.

Potential Outcome Framework. The potential outcome framework interprets causal inference as follows: for a given unit, we can observe only one potential outcome, namely the one under the treatment actually received. The counterfactuals, i.e., the potential outcomes that would have been observed had the individual received a different treatment, can never be observed in reality. A potential outcome is defined as follows:

Definition 4.3 (Potential Outcome). Given the treatment $t$ and outcome $y$, the potential outcome of unit $i$, $y_i^t$, is the outcome that would have been observed if $i$ had received treatment $t$.

The potential outcome framework (Rubin, 1974) articulates the fundamental challenge of causal inference: only one potential outcome can be observed for each unit in the same study. When the population is heterogeneous, the ATE defined in Eq. (4.2) can be misleading, as the same treatment may affect individuals differently. With the potential outcome framework, we can estimate effects at a more granular level. Under heterogeneity, a common assumption is that each subpopulation is defined by a set of features, which leads to the Conditional ATE (CATE):

$$\text{CATE}: \quad \tau(x) = \mathbb{E}[Y \mid do(t = 1), x] - \mathbb{E}[Y \mid do(t = 0), x]. \tag{4.3}$$

An Individual Treatment Effect (ITE) is a contrast between the potential outcomes of a unit:

$$\tau_i = Y_i(do(t = 1)) - Y_i(do(t = 0)). \tag{4.4}$$

Note that $\tau_i$ is not necessarily equal to $\tau(x)$, as the latter is an average over a subpopulation. The goal of the causal effect estimation task is to learn a function $\hat{\tau}$ that estimates the ATE, CATE, or ITE, depending on the degree of homogeneity of the population, for binary treatment options $T = 0$ and $T = 1$. Given ITEs, the ATE under the potential outcome framework can be formulated as the expectation of ITEs over the whole population $i = 1, \ldots, n$:

$$\tau = \mathbb{E}_i[\tau_i] = \mathbb{E}_i[y_i^1 - y_i^0] = \frac{1}{n} \sum_{i=1}^{n} (y_i^1 - y_i^0). \tag{4.5}$$
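To make these quantities concrete, the following minimal Python sketch simulates a hypothetical population in which both potential outcomes are generated for every unit, so that the true ITEs and the ATE of Eq. (4.5) are known. A binary covariate x is assumed to be the only confounder, which lets the sketch contrast the naive difference of observed means with the discrete form of the back-door adjustment in Eq. (4.1), built from per-stratum CATEs as in Eq. (4.3). All numbers are illustrative assumptions, not data from the text.

```python
import random


def simulate_units(n, seed=1):
    """Toy potential-outcome data with a binary confounder x (hypothetical).

    Both potential outcomes y0 and y1 are generated for every unit so that
    the true ITEs are known; in practice only y = y1 if t == 1 else y0
    would be observed.
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        t = 1 if rng.random() < (0.8 if x else 0.2) else 0  # x drives treatment
        noise = rng.gauss(0.0, 1.0)
        y0 = 1.0 + 2.0 * x + noise          # outcome under control
        y1 = y0 + (1.0 + 2.0 * x)           # ITE is 1 if x = 0, 3 if x = 1
        units.append({"x": x, "t": t, "y0": y0, "y1": y1,
                      "y": y1 if t == 1 else y0})
    return units


def mean(values):
    values = list(values)
    return sum(values) / len(values)


units = simulate_units(100_000)

# Eq. (4.5): ATE as the average ITE (requires both potential outcomes).
true_ate = mean(u["y1"] - u["y0"] for u in units)

# Naive contrast of observed means: biased, because x confounds t and y.
naive = (mean(u["y"] for u in units if u["t"] == 1)
         - mean(u["y"] for u in units if u["t"] == 0))

# Per-stratum CATE weighted by P(x): the discrete back-door adjustment,
# computed from observed outcomes only.
adjusted = 0.0
for x in (0, 1):
    stratum = [u for u in units if u["x"] == x]
    cate_x = (mean(u["y"] for u in stratum if u["t"] == 1)
              - mean(u["y"] for u in stratum if u["t"] == 0))
    adjusted += cate_x * len(stratum) / len(units)

print(f"true ATE {true_ate:.2f}, naive {naive:.2f}, back-door {adjusted:.2f}")
```

Analytically, the true ATE in this setup is 2, while the naive contrast is about 3.8 because treated units are mostly those with x = 1; the back-door estimate lands close to 2. The adjustment works here only because x is, by construction, a sufficient adjustment set.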
Standard Causal Assumptions. Just as $P(y \mid do(t))$ and $P(y \mid t)$ are distinct in the SCM framework, $P(y^{t})$ and $P(y \mid t)$ (e.g., for $t = 1$) are not the same within the potential outcome framework. There are three standard causal assumptions.

(1) The fundamental assumption in causal effect estimation is the Stable Unit Treatment Value Assumption (SUTVA). It comprises two conditions: well-defined treatment levels and no interference. A treatment is well defined if two different units $i \neq j$ with the same treatment assignment value receive the same treatment. The no-interference condition requires that the potential outcomes of a unit are independent of the treatment assignments of other units. Formally, $y_i^{\mathbf{t}} = y_i^{t_i}$, where $\mathbf{t} \in \{0, 1\}^n$ denotes the vector of treatments for all units. Causal studies of social media users connected in a social network can violate this assumption, e.g., the dependent happenings of COVID-19, where whether one person becomes infected depends on who else in the social network is vaccinated.

(2) The second assumption, consistency, states that a unit's potential outcome under its observed exposure history is the outcome that will actually be observed for that unit, i.e.,

$$P(Y_{T=1} = 1 \mid T = 1) = P(Y = 1 \mid T = 1). \tag{4.6}$$

It requires that there are no two versions of treatment such that $T = 1$ under both versions but the potential outcome for $Y$ differs between the versions.
(3) The last assumption, unconfoundedness (also called ignorability), posits that the set of confounding variables $S$ can be measured. Formally, $y_i^1, y_i^0 \perp\!\!\!\perp t_i \mid s$, where $s$ denotes the confounders, each of which is a feature that causally influences both the treatment $t$ and the outcome $y_i^t$. In other words, $S$ fully blocks the back-door paths between $t$ and $y$. If, in addition to ignorability, the overlap conditions $P(t = 1 \mid x) \in (0, 1)$ and $P(x) > 0$ are satisfied, we obtain the strong ignorability assumption.

4.1.2. Causality-based Fairness Notions and Bias Mitigation

4.1.2.1. Causal fairness
Even though ethicists and social choice theorists recognized the importance of causality in defining and reasoning about fairness decades ago (e.g., Cohen, 1989; Dworkin, 2002), causality-based fairness notions have been defined only recently, even later than the statistical (i.e., correlation-based) fairness notions discussed in Chapter 2. Empirical studies (e.g., Cappelen et al., 2013) suggest that when choosing policies and designing systems that will influence humans, we should minimize the causal dependence on uncontrollable factors such as perceived race or gender (Loftus et al., 2018). To see why correlation-based fairness can sometimes be misleading, see the detailed example of an automated teacher evaluation system in the survey by Makhlouf et al. (2020). The moral of this example is that any statistical fairness notion that relies exclusively on correlations among variables can fail to detect some biases (e.g., the evaluation system is more likely to fire teachers who were assigned classes with low-level students at the beginning of the academic year). It is therefore important to consider causal relations among variables when discussing fairness. As comprehensive surveys of causality-based fairness notions already exist (e.g., Carey and Wu, 2022; Makhlouf et al., 2020), we only touch on those that are more widely used in practice. From the causal perspective, fairness can be formulated as estimating the causal effects of a sensitive attribute, such as gender, on the outcome of an AI system. In the following, we discuss interventional fairness and counterfactual fairness.
Interventional Fairness. This notion is defined based on the ATE under the SCM framework, specifically, the interventional distribution $P(\hat{Y} \mid do(A = a), X = x)$. It includes total, natural direct, natural indirect, and path-specific causal fairness notions. Total causal fairness answers the question of how the outcome $Y$ would change on average if the sensitive attribute $A$ changed. It is defined as follows:

Definition 4.4 (Total Causal Fairness). Given the sensitive attribute $A$ and decision $Y$, total causal fairness is achieved if

$$P(y_{a=1}) - P(y_{a=0}) = 0, \tag{4.7}$$

where $P(y_{a=1}) = P(y \mid do(a = 1))$.

For example, total causal fairness would require that the admission probability is the same in a world where all applicants are female as in a world where all applicants are male. We can further decompose the total effect into direct and indirect effects, i.e., the causal effect of $A$ on $Y$ consists of both a direct discriminatory effect and an indirect discriminatory effect. Pearl defines them as the Natural Direct Effect (NDE) and the Natural Indirect Effect (NIE), respectively. This may be better understood through the well-known example of the Berkeley college student admissions. In 1973, Berkeley was sued because of potential gender discrimination in its student admissions (Bickel et al., 1975): only 35% of all female applicants were admitted, compared with 44% of all male applicants. Interestingly, when the data were broken down by department, the result was the opposite of what was expected: four of Berkeley's six departments admitted more women than men. Let $X_1$ denote the mediator that leads to the NIE and $X_2$ the department category variable (also a mediator); we can describe this example using the causal graph in Figure 4.2. This causal graph tells us that $A \rightarrow X_2 \rightarrow \hat{Y}$ represents the explainable effect of gender on admission through the mediator $X_2$, e.g., female and male applicants have different preferences for department categories. This may not be considered gender discrimination.¹ $A \rightarrow X_1 \rightarrow \hat{Y}$ delineates the indirect gender discrimination through the mediator $X_1$, e.g., a funding source that depends on gender. $X_1$ is also called the redlining attribute. $A \rightarrow \hat{Y}$ delineates the direct gender discrimination.

Fig. 4.2: The causal graph of the Berkeley College Students Admission example. $A \rightarrow X_2 \rightarrow \hat{Y}$ represents the explainable effect of gender on admission through $X_2$ (the department category), $A \rightarrow X_1 \rightarrow \hat{Y}$ delineates the indirect discrimination (NIE), and $A \rightarrow \hat{Y}$ delineates the direct discrimination (NDE).

¹ It is possible that these different preferences are due to historical bias against females. In this case, $A \rightarrow X_2 \rightarrow \hat{Y}$ is considered NIE.

NDE is used to define direct discrimination:

$$NDE_{a=1,a=0}(y) = P(y_{a=1,\, z_{a=0}}) - P(y_{a=0}), \tag{4.8}$$
where $Z$ denotes the set of mediators. Equation (4.8) requires that $A$ is set to 1 on the direct path $A \rightarrow Y$ and set to 0 on the other paths. In the Berkeley admissions example, $P(y_{a=1,\, z_{a=0}})$ would be the probability of being admitted after changing the gender to male while keeping $X_1$ and $X_2$ at the values they take for a female applicant ($a = 0$). NIE is used to define indirect discrimination:

$$NIE_{a=1,a=0}(y) = P(y_{a=0,\, z_{a=1}}) - P(y_{a=0}). \tag{4.9}$$
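Under the additional assumption that there is no confounding among $A$, the mediators, and $Y$, the nested quantities in Eqs. (4.7) to (4.9) reduce to Pearl's mediation formula, $P(y_{a,\, z_{a'}}) = \sum_z P(y \mid a, z)\, P(z \mid a')$. The minimal sketch below evaluates it for a single binary mediator $Z$ standing in for the mediators; the probabilities are hypothetical and are not the Berkeley data.

```python
# Hypothetical conditional probabilities for a binary SCM A -> Z -> Y, A -> Y
# (no confounding assumed; Z stands in for the mediators).
P_Z1 = {0: 0.3, 1: 0.6}                   # P(Z = 1 | A = a)
P_Y1 = {(0, 0): 0.2, (0, 1): 0.5,         # P(Y = 1 | A = a, Z = z)
        (1, 0): 0.3, (1, 1): 0.6}


def p_z(z, a):
    """P(Z = z | A = a) for binary Z."""
    return P_Z1[a] if z == 1 else 1.0 - P_Z1[a]


def p_y_nested(a_direct, a_mediator):
    """P(y_{a_direct, Z_{a_mediator}}) via the mediation formula:
    sum_z P(Y = 1 | a_direct, z) * P(z | a_mediator)."""
    return sum(P_Y1[(a_direct, z)] * p_z(z, a_mediator) for z in (0, 1))


te = p_y_nested(1, 1) - p_y_nested(0, 0)    # total effect, Eq. (4.7)
nde = p_y_nested(1, 0) - p_y_nested(0, 0)   # Eq. (4.8)
nie = p_y_nested(0, 1) - p_y_nested(0, 0)   # Eq. (4.9)
print(f"TE={te:.2f}  NDE={nde:.2f}  NIE={nie:.2f}")
```

With these made-up numbers, TE = 0.19, NDE = 0.10, and NIE = 0.09, so TE = NDE + NIE. That additivity holds only because $P(Y = 1 \mid a, z)$ has no interaction between $A$ and $Z$ here; it does not hold in general.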
Since the NIE alone cannot differentiate the indirect discrimination from the explainable effect of $A$, we need a definition at a more granular level: the path-specific effect (PSE).

Definition 4.5 (Path-Specific Effect (PSE) (Pearl, 2022)). Given a causal graph $G$ and a causal path set $\pi$, the $\pi$-specific effect of changing the value of $X$ from $x_0$ to $x_1$ on $Y = y$ is given by

$$PSE^{\pi}_{x_1, x_0}(y) = P(y_{x_1|\pi,\, x_0|\bar{\pi}}) - P(y_{x_0}), \tag{4.10}$$

where $P(y_{x_1|\pi,\, x_0|\bar{\pi}})$ represents the post-intervention distribution of $Y$ in which the effect of the intervention $do(x_1)$ is transmitted only along $\pi$, while the effect of the reference intervention $do(x_0)$ is transmitted along the other paths $\bar{\pi}$. The PSE-based fairness notion can then be defined as follows:
Definition 4.6 (Path-Specific Causal Fairness). Given the sensitive attribute $A$, decision $Y$, and redlining attributes $R$, let $\pi$ be a path set that contains some paths from $A$ to $Y$. Path-specific causal fairness is achieved if the following condition is satisfied:

$$PSE^{\pi}_{a=1, a=0}(y) = P(y_{a=1|\pi,\, a=0|\bar{\pi}}) - P(y_{a=0}) = 0. \tag{4.11}$$
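Eq. (4.11) can be evaluated in a toy SCM shaped like Figure 4.2 by letting $A$ take the value 1 along the selected paths and the reference value 0 along all others. The sketch below assumes hypothetical binary variables, mediators $X_1$ (the redlining attribute $R$) and $X_2$ that are independent given $A$, no confounding, and an $A \times X_1$ interaction in the outcome model; that interaction makes the direct and indirect path-specific effects non-additive.

```python
from itertools import product

# Hypothetical SCM shaped like Fig. 4.2: A -> Y (direct),
# A -> X1 -> Y (redlining path), A -> X2 -> Y (explainable path).
P_X1 = {0: 0.2, 1: 0.8}   # P(X1 = 1 | A = a)
P_X2 = {0: 0.4, 1: 0.6}   # P(X2 = 1 | A = a)


def p_y1(a, x1, x2):
    """P(Y = 1 | a, x1, x2); the 0.5 * a * x1 term is an interaction."""
    return 0.1 + 0.1 * x1 + 0.1 * x2 + 0.5 * a * x1


def bern(v, p):
    """Probability that a Bernoulli(p) variable equals v."""
    return p if v == 1 else 1.0 - p


def p_y_path(a_direct, a_x1, a_x2):
    """P(y) when A takes a_direct on A -> Y, a_x1 on A -> X1 -> Y and
    a_x2 on A -> X2 -> Y (X1, X2 independent given A)."""
    return sum(p_y1(a_direct, x1, x2)
               * bern(x1, P_X1[a_x1]) * bern(x2, P_X2[a_x2])
               for x1, x2 in product((0, 1), repeat=2))


def pse(direct=False, via_x1=False, via_x2=False):
    """Eq. (4.11): a = 1 along the chosen paths, a = 0 along the rest."""
    return p_y_path(int(direct), int(via_x1), int(via_x2)) - p_y_path(0, 0, 0)


pse_d = pse(direct=True)                  # pi_d: A -> Y only
pse_i = pse(via_x1=True)                  # pi_i: paths through R = X1
pse_di = pse(direct=True, via_x1=True)    # pi_d union pi_i
print(f"{pse_d:.2f} + {pse_i:.2f} vs {pse_di:.2f}")
```

With these numbers, the direct effect is about 0.10 and the indirect effect about 0.06, yet the effect along their union is about 0.46, so summing the two path-specific effects would understate it.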
Define $\pi_d$ as the path set that only includes $A \rightarrow Y$, and $\pi_i$ as the path set that contains all the causal paths from $A$ to $Y$ that pass through $R$. Direct causal fairness is achieved if $PSE^{\pi_d}_{a=1,a=0} = 0$, and indirect causal fairness is achieved if $PSE^{\pi_i}_{a=1,a=0} = 0$. In the admission example, direct discrimination can be explained as the expected change in decisions for female applicants (i.e., individuals from the marginalized group) if the university were told that these applicants were male (i.e., individuals from the non-marginalized group). Indirect discrimination is then the expected change in decisions for female applicants if the values of the redlining attributes in their profiles were changed as if they were male applicants. Note that for path sets $\pi_d$ and $\pi_i$, the following does not necessarily hold:

$$PSE^{\pi_d}_{a=1,a=0} + PSE^{\pi_i}_{a=1,a=0} = PSE^{\pi_d \cup \pi_i}_{a=1,a=0}.$$

This is because the direct and indirect effects need not combine additively, e.g., when they interact in a nonlinear structural equation.

Counterfactual Fairness. All of the previous fairness notions are defined at the population level, without conditioning on observations of particular individuals. To describe fairness at the individual level, we need the language of counterfactuals. If we observe a set of attributes $O = o$ and use them as conditions when estimating the effect, we can perform causal inference on the subpopulation specified by $O = o$ only. Counterfactual inference infers quantities involving two worlds simultaneously: the real world, represented by the causal model $M$, and the counterfactual world, represented by $M_c$, given the interventional distribution conditioned on the individuals specified by $O = o$. One such individual-level notion is counterfactual fairness (Kusner et al., 2017), which says that a decision is fair if it is the same in both "the actual world" and "a counterfactual world". We briefly discussed it in Chapter 2; now we formally define this fairness notion using the causal language introduced earlier.
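The three steps of counterfactual inference (abduction: infer the background variables $U$ from the evidence; action: set $A \leftarrow a'$; prediction: recompute the descendants of $A$) can be sketched for a hypothetical linear SCM $X := U + \beta A$. The sketch compares a predictor that uses $X$, a descendant of $A$, with one that uses only the abducted $U$, in the spirit of Kusner et al. (2017). The values of $\beta$, the weight $w$, and the observation are illustrative assumptions, and abduction is exact here only because $X$ is a deterministic function of $U$ and $A$.

```python
BETA = 2.0   # hypothetical effect of A on X in X := U + BETA * A
W = 0.5      # hypothetical weight of a linear predictor


def x_from(u, a):
    """Structural equation X := U + BETA * A."""
    return u + BETA * a


def counterfactual_x(x_obs, a_obs, a_cf):
    u = x_obs - BETA * a_obs      # abduction: recover U from the evidence
    return x_from(u, a_cf)        # action + prediction: set A <- a_cf


def pred_on_x(x, a):
    return W * x                  # uses X, a descendant of A (a is ignored)


def pred_on_u(x, a):
    return W * (x - BETA * a)     # uses only the abducted U


def cf_gap(predictor, x_obs, a_obs, a_cf):
    """Counterfactual prediction minus factual prediction for one individual."""
    x_cf = counterfactual_x(x_obs, a_obs, a_cf)
    return predictor(x_cf, a_cf) - predictor(x_obs, a_obs)


print(cf_gap(pred_on_x, x_obs=1.0, a_obs=0, a_cf=1))  # 1.0: not counterfactually fair
print(cf_gap(pred_on_u, x_obs=1.0, a_obs=0, a_cf=1))  # 0.0: counterfactually fair
```

Because $U$ is by construction unaffected by $A$, any predictor built only from $U$ gives the same prediction in the factual and counterfactual worlds; in realistic models, abduction yields a posterior over $U$ rather than a single value.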
Counterfactual fairness is defined as follows:

Definition 4.7 (Counterfactual Fairness). Given $x \in X$ and $a \in A$, the predictor $\hat{Y}$ is counterfactually fair if

$$P(\hat{Y}_{A \leftarrow a}(U) = y \mid X = x, A = a) = P(\hat{Y}_{A \leftarrow a'}(U) = y \mid X = x, A = a) \tag{4.12}$$

holds for all $y$ and any $a' \in A$, where $U$ refers to a set of latent background variables in the causal graph. This definition states that, given what we have observed (i.e., $X = x$, $A = a$), if the factual outcome probability $P(y_a \mid x, a)$ and the counterfactual outcome probability $P(y_{a'} \mid x, a)$ are equal for an individual, then s/he is treated fairly, i.e., as s/he would have been had s/he belonged to another sensitive group. In other words, $A$ should not be a cause of $\hat{Y}$.

Let us see what counterfactual fairness means in a slightly modified version of the Berkeley admissions example. Suppose the university used a machine learning model $\hat{Y}$ to predict whether an applicant should be admitted, and let $a = 1$ denote a male applicant. The probability of a female applicant being admitted is $P(\hat{Y}_0 \mid x, 0)$. Now, assume the gender of this female applicant had been changed to male; the probability of being admitted becomes $P(\hat{Y}_1 \mid x, 0)$. If $P(\hat{Y}_0 \mid x, 0) = P(\hat{Y}_1 \mid x, 0)$, the predictor achieves counterfactual fairness for this applicant.

Counterfactual fairness assumes that fairness can be uniquely quantified from observational data, which does not hold in certain situations because the counterfactual quantity may be unidentifiable. It also assumes that we know the causal graph, which is often unavailable in practice.

Similar to PSE, there is also path-specific counterfactual fairness (PSCF). It can be used to differentiate the explainable effect from gender discrimination, as shown in Figure 4.3. There are two causal paths showing gender discrimination: $A \rightarrow X_1 \rightarrow \hat{Y}$ and $A \rightarrow \hat{Y}$. The former is indirect gender discrimination and the latter is direct gender discrimination. $A \rightarrow X_2 \rightarrow \hat{Y}$ is a non-discriminatory path because it is due to the free will (e.g., preference for a major) of the applicants.

Fig. 4.3: (a) A causal graph with different kinds of paths between the sensitive attribute $A$ and $Y$; (b) $A \rightarrow X_2 \rightarrow \hat{Y}$ delineates a non-discriminatory path while $A \rightarrow X_1 \rightarrow \hat{Y}$ and $A \rightarrow \hat{Y}$ are discriminatory paths.

To achieve PSCF, the predictor needs to satisfy the following condition:

$$P\big(\hat{Y}(a', X_1(a', X_2(a)), X_2(a)) \mid X_1 = x_1, X_2 = x_2, A = a\big) = P\big(\hat{Y}(a, X_1(a, X_2(a)), X_2(a)) \mid X_1 = x_1, X_2 = x_2, A = a\big). \tag{4.13}$$

In the admission example, PSCF measures, for the same female applicant, the difference in decisions between the actual world, in which she is female, and the counterfactual world, in which she is male, along the paths $\pi$. There is no counterfactual difference if the difference satisfies

$$\ln \frac{P(Y = \hat{y}_{a=1} \mid x, a, \pi)}{P(Y = \hat{y}_{a=0} \mid x, a, \pi)} \le \delta, \tag{4.14}$$

where $\delta$ is the discrimination tolerance.

4.1.2.2. Causality for bias mitigation
On a related note, we can also leverage causality to mitigate biases. Here, we take an inclusive view of the forms of bias, so they are not limited to social biases. Owing to its interpretable nature, causal inference can support decisions with greater confidence and can reveal the relation between data attributes and an AI system's outcomes. Common practices for mitigating selection bias and other forms of bias include propensity score-based methods and counterfactual data augmentation.

Propensity Score. The propensity score is used to eliminate treatment selection bias and ensure that the treatment and control groups are comparable. It is defined as the "conditional probability of assignment to a particular treatment given a vector of observed covariates" (Rosenbaum and Rubin, 1983). Due to its effectiveness and simplicity, the propensity score has been used to detect and reduce unobserved biases in various domains, e.g., quantifying algorithmic fairness (Khademi et al., 2019). One of the most popular domains might be recommender systems. To make sense of it, we need to take an interventional view. In recommender systems, a recommendation can be seen as a treatment/intervention, either under our (as developers) control (e.g., recommending a movie) or chosen by others (e.g., a user-selected movie). In either case, we can only observe the factual outcome under the chosen treatment, not the outcome under a different treatment. The consequence is selection bias due to users' self-selection or to the recommended items. We then have biased observations: the observed ratings or clicks do not represent the true ones, e.g., the ratings that would be obtained if users rated items at random. It is reasoning about treatments and counterfactuals that provides a formal basis for addressing selection bias (Joachims et al., 2021). One standard practice is the Inverse Propensity Scoring (IPS) method. Given a user-item pair $(u, i)$ and $O_{u,i} \in \{0, 1\}$ denoting whether $u$ observed $i$, we define the propensity score as $P_{u,i} = P(O_{u,i} = 1)$, i.e., the marginal probability of observing a rating. During model training, an IPS-based unbiased estimator is defined using the following empirical risk function (Schnabel et al., 2016):

$$\arg\min_{\theta} \sum_{O_{u,i} = 1} \frac{\hat{\sigma}_{u,i}(r, \hat{r}(\theta))}{P_{u,i}} + \text{Reg}(\theta), \tag{4.15}$$
where $\hat{\sigma}_{u,i}(r, \hat{r}(\theta))$ denotes an evaluation function of the ground-truth rating $r$ and the predicted rating $\hat{r}(\theta)$, and $\text{Reg}(\theta)$ is a regularizer for model complexity. In Section 3.2.2, we saw how to reduce selection bias in a similar manner when modeling fake news dissemination. Similarly, IPS can be used to address position bias, which occurs in ranking systems where users tend to interact with items in higher ranking positions. To remedy position bias, we can use IPS to reweight each data instance with a position-aware value. The idea is to assign larger weights to items ranked lower in the list than to top-ranked items, thereby compensating for the effect of position bias on user exposure. Given a ranking model $S \in \mathcal{S}$, the learning objective of such models is defined as follows (Joachims et al., 2017):

$$\hat{S} = \arg\min_{S \in \mathcal{S}} \{\hat{R}_{IPS}(S)\}. \tag{4.16}$$
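The position-aware reweighting described above can be illustrated with a small simulation. The sketch assumes a hypothetical position-based click model in which the item at rank $k$ is examined with a known probability $1/k$ and an examined item is clicked exactly when it is relevant. Dividing each click by its examination propensity removes the position bias that the raw click-through rate suffers from.

```python
import random


def propensity(rank):
    """Hypothetical examination probability of the item shown at this rank."""
    return 1.0 / rank


def simulate_clicks(relevance, n_sessions, seed=0):
    """Position-based click model: the item at rank k is examined with
    probability propensity(k) and clicked iff examined and relevant."""
    rng = random.Random(seed)
    clicks = [0] * len(relevance)
    for _ in range(n_sessions):
        for k, rel in enumerate(relevance, start=1):
            if rel and rng.random() < propensity(k):
                clicks[k - 1] += 1
    return clicks


relevance = [1, 0, 1, 0, 1]   # hypothetical ground truth; item k shown at rank k
n = 50_000
clicks = simulate_clicks(relevance, n)
naive = [c / n for c in clicks]                    # raw click-through rate
ips = [c / (n * propensity(k))                     # each click weighted by 1 / propensity
       for k, c in enumerate(clicks, start=1)]
for k in range(len(relevance)):
    print(f"rank {k + 1}: relevance {relevance[k]}  "
          f"naive {naive[k]:.2f}  IPS {ips[k]:.2f}")
```

The naive click-through rate of the relevant item at rank 5 is about 0.2, whereas its IPS estimate is close to its true relevance of 1. In practice the propensities are not known and must themselves be estimated.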
So, we are looking for the ranking system that gives the smallest empirical risk using implicit feedback, such as which items a user actually clicked. As opposed to explicit feedback such as ratings, implicit feedback gives only partial information: we do not know what happened to the items users did not click. Is it because the user considers the item irrelevant, or because s/he did not see it due to its low rank? The solution to this problem is IPS. $\hat{R}_{IPS}(S)$ in Eq. (4.16) is then an unbiased estimate of the risk of a ranking system $S$. Suppose we have $N$ queries $\{q_1, \ldots, q_i, \ldots, q_N\}$ and the result for each query $q_i$ is a ranking list $S(q_i)$. Let $y$ denote an item in the list and $rank(y \mid S(q_i))$ its rank. $r_i(y) \in \{0, 1\}$ denotes the user-specific relevance (1 if relevant and 0 otherwise) and $o_i(y) \in \{0, 1\}$ indicates whether $y$ is revealed to the user (1 if revealed and 0 otherwise). Then we have

$$\hat{R}_{IPS}(S) = \frac{1}{N} \sum_{i=1}^{N} \sum_{y:\, o_i(y) = 1 \,\wedge\, r_i(y) = 1} \frac{\lambda(rank(y \mid S(q_i)))}{Q(o_i(y) = 1 \mid q_i, \bar{y}_i, r_i)}. \tag{4.17}$$
Let us break down this equation. The first summation is over all $N$ queries. The second summation runs, for each query, over all of the clicked items. The assumption here is that an item is clicked if it is both revealed and relevant: $y: o_i(y) = 1 \wedge r_i(y) = 1$. For each clicked item, the numerator $\lambda(rank(y \mid S(q_i)))$ can be any weighting function that depends on the rank of $y$ in the ranking list $S(q_i)$, such as one derived from the well-known Normalized Discounted Cumulative Gain (NDCG) metric. The denominator $Q(o_i(y) = 1 \mid q_i, \bar{y}_i, r_i)$ is the propensity, defined as the position-based exposure probability. The intuition is that while the items users clicked are most likely relevant, a missing click does not necessarily indicate irrelevance: the user may not even have seen the item in the ranking list. The propensity therefore mitigates position bias by accounting for this missingness. Specifically, it is the probability that $y$ is revealed to the user given the query $q_i$, the presented ranking list $\bar{y}_i$, and the user-specific relevance $r_i$. Note that in practice, the propensity is generally unknown and is estimated based on some model of user behavior (see, e.g., Agarwal et al., 2019; Fang et al., 2019).
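Eq. (4.17) is straightforward to compute once clicks and their propensities have been logged. The sketch below is a hypothetical helper, not code from Joachims et al. (2017): it treats every logged click as revealed and relevant, assumes $\lambda(rank) = rank$ as the weighting function (any rank-based $\lambda$ can be passed in), and takes the propensities as given.

```python
from typing import Callable, Dict, List, Sequence


def ips_risk(rankings: Sequence[List[str]],
             clicks: Sequence[Dict[str, float]],
             lam: Callable[[int], float] = lambda rank: float(rank)) -> float:
    """IPS risk estimate in the form of Eq. (4.17).

    rankings[i]: items of S(q_i) in ranked order (rank 1 first), produced by
                 the ranking system S being evaluated.
    clicks[i]:   clicked items of query i (treated as revealed and relevant),
                 mapped to the propensity Q(o_i(y) = 1 | q_i, ...) logged
                 under the presentation the user actually saw.
    lam:         weighting function of the rank under S.
    """
    total = 0.0
    for ranking, clicked in zip(rankings, clicks):
        position = {item: k for k, item in enumerate(ranking, start=1)}
        for item, prop in clicked.items():
            total += lam(position[item]) / prop   # inverse-propensity weight
    return total / len(rankings)


rankings = [["a", "b", "c"], ["d", "e"]]   # S(q_1), S(q_2)
clicks = [{"c": 0.5}, {"d": 1.0}]          # clicked item -> logged propensity
print(ips_risk(rankings, clicks))                        # (3 / 0.5 + 1 / 1.0) / 2 = 3.5
print(ips_risk([["c", "a", "b"], ["d", "e"]], clicks))   # 1.5
```

Moving the clicked item "c" to the top of the first list lowers the estimated risk from 3.5 to 1.5, which is the direction that minimizing Eq. (4.16) rewards.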
4.1.2.3. Counterfactual data augmentation
In Chapter 1, we discussed potential sources of bias, such as data and the misinterpretation of statistical dependence as causality. For example, representation bias appears in the data when we do not have enough samples from marginalized/disadvantaged groups, which may further lead to spurious signals between the sensitive attribute and the outcome. So, if we can "manually generate" samples from the counterfactual world, we might (ideally) have sufficient samples from both non-marginalized and marginalized groups. Models are then trained on data from both worlds. Empirically, this should reduce models' reliance on semantically irrelevant signals, help combat spurious dependence, and improve model robustness under distribution shift. This is referred to as counterfactual data augmentation, a technique that augments training data with their counterfactually revised counterparts via interventions. But how can we generate counterfactuals without knowing the causal graph? One potential solution is to inject causal thinking (invoking interventions and counterfactuals) into real-world settings by leveraging human-in-the-loop feedback to identify causal features (Kaushik et al., 2019). In a sentiment analysis task, for example, human annotators are presented with document-label pairs and asked to make minimal modifications to the documents in order to flip their labels. The sentiment classifier trained on the augmented dataset is shown to assign little weight to associated but irrelevant terms such as "will", "my", and "has". Here, we are actually relying on the implicit causal knowledge in the minds of human annotators. However, it is almost impossible to scale this method to real-world data such as natural language and images due to their large scale and complexity. The natural solution is to generate counterfactuals automatically, given a certain condition (e.g., when the sentiment changes from positive to negative).
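One simple automatic intervention, in the spirit of the word-pair replacement used by Lu et al. (2020), swaps gendered words and adds each swapped copy to the training set with its original label. The sketch below uses a tiny, hypothetical word list and a regular expression; it is illustrative rather than the authors' implementation, and its handling of "her" shows how easily such rules can distort meaning.

```python
import re

# Hypothetical, hand-curated gendered word pairs (illustrative, not exhaustive).
PAIRS = [("he", "she"), ("him", "her"), ("man", "woman"), ("father", "mother")]
SWAP = {}
for a, b in PAIRS:
    SWAP[a], SWAP[b] = b, a

PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, SWAP)) + r")\b",
                     flags=re.IGNORECASE)


def intervene(text):
    """The intervention c(x): swap every listed gendered word, keeping capitalization."""
    def repl(match):
        word = match.group(0)
        out = SWAP[word.lower()]
        return out.capitalize() if word[0].isupper() else out
    return PATTERN.sub(repl, text)


def augment(dataset):
    """Return S plus the intervened copies (c(x), y) with unchanged labels."""
    return dataset + [(intervene(x), y) for x, y in dataset]


data = [("He is a talented engineer.", 1), ("She thanked her father.", 1)]
for x, y in augment(data):
    print(y, x)
```

The second augmented sentence reads "He thanked him mother.": mapping "her" to "him" breaks the possessive use, where "his" would be needed.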
However, whether these generated samples are strictly "counterfactuals" is questionable. Furthermore, without causal knowledge, we have no control over how these samples are generated: when we replace the target with its counterfactual counterpart, the semantic meaning of a sample could change entirely, or the sample could even lose meaning and become incomprehensible. Nevertheless, let us look at an example of using automatic counterfactual generation to reduce gender bias, where the new dataset is used to encourage algorithms not to capture any gender-related information. One such method (Lu et al., 2020) generates a gender-free list of sentences, using a series of sentence templates to replace every occurrence of gendered word pairs (e.g., he:she and her:him/his). Both the templates and the word pairs are curated by humans. It formally defines counterfactual data augmentation as follows:

Definition 4.8 (Counterfactual Data Augmentation). Given input instances $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$ and an intervention $c$, a $c$-augmented dataset $S'$ is $S \cup \{(c(x), y)\}_{(x, y) \in S}$.

The underlying assumption is that an unbiased model should not distinguish between matched pairs and should produce the same outcome for different genders (Lu et al., 2020).

4.1.3. Causality and Interpretability
Let us first revisit the example in Section 2.2. Your loan application was rejected by the bank, and suppose the bank has a nice interpretable model (for example, the LIME model discussed in Section 2.2) that can tell you which attributes in your profile led to the decision. But you still have no idea what changes you need to make to get the application approved. Two example questions you might be interested in are: (1) What will happen if I increase my annual salary by 10%? and (2) What would have happened had I earned 10% more annually? From what we learned about causal inference in Section 4.1.1, the first question requires interventions and the second requires counterfactual thinking. Neither question can be answered using your current profile information alone, as it only gives you the factual outcome. Instead, they can only be answered with causal language and theory. In high-stakes applications, causal interpretability of black-box models is almost indispensable. A rule-based learning algorithm may give a counterintuitive explanation, e.g., that patients with asthma are less likely to die from pneumonia. However, this is because an existing policy requires that asthmatics with pneumonia be directly admitted to the intensive care unit, where they receive better care (Zhao and Hastie, 2021). Such dependence-based explanations might help improve a model's prediction accuracy but have little use, or even a negative impact, in decision-making processes that require understanding the laws of nature (i.e., science). We may better understand the differences using the causal ladder described in Figure 4.4. Since we discussed statistical interpretability in Section 2.2, let us now look at causal interpretability in the following sections.

Fig. 4.4: The causal ladder of interpretability.
Level | Typical Activity | Typical Question | Example
3. Counterfactual Interpretability | Imagining, Retrospection | Why? | Would my loan application have been approved had I earned 10% more annually?
2. Interventional Interpretability | Doing | What if I do X? | What will happen if I increase my annual salary by 10%?
1. Statistical Interpretability | Seeing | What is? | Which profile attribute has the biggest impact on my application decision?

Fig. 4.5: A three-layer feed-forward neural network (a) and its corresponding causal graph (b). We can further simplify the causal graph to (c) because we only observe the input and output layers.

4.1.3.1. Model-based causal interpretability
Model-based causal interpretability seeks to understand the importance of each component of a DNN for its predictions. It estimates the causal effect of a particular input neuron on a certain output neuron in the network. The idea is to view a neural network architecture as a causal graph in which the nodes are the neurons and the edges indicate causal effects between neurons in adjacent layers. The goal, known as causal attribution, is to estimate the effect of each neuron on the output based on the observed data and a learned function (i.e., a neural network). The natural tool to use is the do-operation introduced earlier. A causal graph of a three-layer feed-forward neural network and its transformation can be seen in Figure 4.5. In particular, every $n$-layer neural network $N(l_1, l_2, \ldots, l_n)$ has a corresponding SCM $M([l_1, \ldots, l_n], U, [f_1, \ldots, f_n], P_U)$, where $f_i$ refers to the set of causal functions for the neurons in layer $l_i$, $U$ denotes a group of exogenous random variables that act as causal factors for the input layer $l_1$, and $P_U$ defines the probability distribution of $U$. As we only observe the input and output layers, $M$ can be further reduced to an SCM with only the input layer $l_1$ and the output layer $l_n$, $M'([l_1, l_n], U, f', P_U)$, by marginalizing out the hidden neurons. Finally, following Chattopadhyay et al. (2019), we can estimate the average causal effect (ACE) of a feature $x_i \in l_1$ with value $\alpha$ on an output $y \in l_n$ by

$$\text{ACE}^{y}_{do(x_i = \alpha)} = \mathbb{E}[y \mid do(x_i = \alpha)] - \text{baseline}_{x_i}, \tag{4.18}$$

where $\text{baseline}_{x_i} = \mathbb{E}_{x_i}[\mathbb{E}_y[y \mid do(x_i = \alpha)]]$. The baseline is needed because $x_i$ is a continuous variable, in contrast to the binary treatment typically considered in causal inference.

4.1.3.2. Counterfactual explanations
Counterfactuals are at the highest level of the causal ladder. Unlike model-based interpretability, which examines model parameters to determine the vital components of the model, counterfactual explanations typically describe scenarios such as “If X had not occurred, Y would not have occurred.” This is example-based interpretability: the behavior of a model is explained by generating counterfactual examples. Counterfactual explanations differ from the model-based interpretability above in several ways. First, a counterfactual explanation is local, whereas model-based interpretability is global. Second, a counterfactual explanation involves a simpler causal graph than model-based interpretability, which tries to explain the internal structure of a neural network that can consist of hundreds of neurons. Third, counterfactual explanations are more user friendly. Specifically, we consider the predicted outcome as the event $Y$ and the features fed to the model as the causes $X$. A counterfactual explanation can then be defined as a causal situation in which an output $Y$, which occurs given the feature input $X$, can be changed to a predefined output $Y'$ by minimally changing the feature vector $X$ to $X'$. One of the earliest works on counterfactual explanations is by Wachter et al. (2017), who proposed minimizing both the squared error between the model's prediction for the counterfactual (unknown) and the desired counterfactual outcome (known beforehand) and the distance between the original instances and their corresponding counterfactuals in the feature space. The objective function is defined as

$$\arg\min_{x_{cf}} \max_{\lambda} L(x, x_{cf}, y, y_{cf}) = \lambda \cdot \left(\hat{f}(x_{cf}) - y_{cf}\right)^2 + d(x, x_{cf}), \qquad (4.19)$$

where $x$ and $x_{cf}$ denote the observed and counterfactual features, respectively, and $y$ and $y_{cf}$ are the observed and counterfactual outcomes, respectively. The first term measures the distance between the model's prediction for the counterfactual input $x_{cf}$ and the desired counterfactual output. The second term measures the distance between the observed features $x$ and the counterfactual features $x_{cf}$. $\lambda$ is the hyperparameter balancing the importance of the two distances. $d(x, x')$ is a distance function defined as follows:
$$d(x, x') = \sum_{j=1}^{p} \frac{|x_j - x'_j|}{\mathrm{MAD}_j}, \qquad (4.20)$$

where $\mathrm{MAD}_j$ is the median absolute deviation of feature $j$. That is, $d$ is the Manhattan distance over all $p$ features, with each feature weighted by the inverse of its median absolute deviation.

4.1.3.3. Partial dependence plots
The relation between causal interpretability and the partial dependence plots (PDPs) introduced by Friedman (2001) starts from a curious observation (Zhao and Hastie, 2021). The PDP is one of the most widely used visualization tools for black box models. What is interesting about the PDP is that, under certain conditions, its definition is exactly the same as the back-door adjustment formula (Eq. (4.1)) used to identify the causal effect of $X$ on $Y$. Let us see what we mean here. Given a machine learning algorithm $f(\cdot)$, its partial dependence on variables $X_S \subseteq X$ (or the partial effect of $X_S$ on $Y$) is defined as

$$f_S(x_S) = \mathbb{E}_{X_C}\left[f(x_S, X_C)\right] = \int f(x_S, x_C)\, dP(x_C), \qquad (4.21)$$

where $X_C$ is the complement set of $X_S$. Comparing Eq. (4.21) with Eq. (4.1), we find that the two have exactly the same form. To be more specific, what Pearl (1993) showed is that if $X_C$ satisfies the graphical back-door criterion w.r.t. $X_S$ and $Y$, then the causal effect of $X_S$ on $Y$ can be estimated from observed data by the following formula:

$$P(Y \mid do(X_S = x_S)) = \int P(Y \mid X_S = x_S, X_C = x_C)\, dP(x_C). \qquad (4.22)$$

Equation (4.22) implies that if the complement set $X_C$ successfully blocks the back-door paths between $X_S$ and $Y$, then what the PDP visualizes is the causal relationship between $X_S$ and $Y$. In this regard, three requirements should be satisfied to enable causal interpretability (Zhao and Hastie, 2021): (1) A machine learning model with good predictive accuracy. This ensures that the black box model is close to the true function. (2) A causal diagram defined by domain knowledge. This is needed to verify that the back-door criterion is satisfied. (3) Suitable visualization tools such as the PDP. Interested readers can refer to Zhao and Hastie (2021) for applications of the PDP to real-world datasets.

4.2. How Context Can Help
Context is the core of socially responsible AI due to its inherently elaborate nature, e.g., the “Transparency Paradox”. To understand and quantify the relationships among the various principles (some involve trade-offs and some do not), e.g., fairness, transparency, and safety, we have to place them in specific contexts. One such context is the social context. When existing responsible AI algorithms are introduced into a new social context, current technical interventions may become ineffective, inaccurate, or even dangerously misguided. A recent study (Sühr et al., 2020) found that while fair ranking algorithms such as Det-Greedy (Geyik et al., 2019) help increase the exposure of minority candidates, their effectiveness is limited in job contexts in which employers have a preference for a particular gender. How to properly integrate social context into socially responsible AI is still an open problem. The algorithmic context (e.g., supervised learning, unsupervised learning, or reinforcement learning) is also extremely important when designing socially responsible AI for the given data. A typical example is the feedback loop problem in predictive policing. After a predictive policing model is updated using the discovered crime data (e.g., arrest counts), the retrained model can be susceptible to runaway feedback loops, e.g., police are repeatedly sent to the same neighborhood regardless of the true crime rate (Ensign et al., 2018). A subtle algorithmic choice can thus have huge ramifications for the results. Consequently, we need to understand the algorithmic context to make the right algorithmic choices when designing socially responsible AI systems. Designing context-aware algorithms is the key to achieving socially responsible AI. In the following, we show two such examples.

4.2.1. A Sequential Bias Mitigation Approach
We have shown earlier that current machine learning models for toxicity and cyberbullying detection exhibit problematic and discriminatory performance, resulting in poorer predictions and negatively impacting disadvantaged and minority groups. Despite promising efforts to debias toxicity detection and related tasks (e.g., cyberbullying detection), most research to date is based on two assumptions: (1) bias mitigation is a “static” problem in which the model has access to all of the information and makes a one-time decision, and (2) different types of biases are independent of one another. Yet, comments/words in social media often come in a sequence instead of all at once. In this environment, conventional batch processing can be impractical. Furthermore, the relations among different biases are complex. As shown in Figure 4.6, in a benchmark dataset, sessions containing comments written in Hispanic English (HE) or African American English (AAE) with swear words make up a larger portion of toxic sessions than those without swear words. Recent work (e.g., Kim et al., 2020) also showed evidence of intersectional bias within toxicity detection: AAE tweets were 3.7 times as likely, and tweets by African American males were 77% more likely, to be labeled as toxic. In the social sciences, intersectionality is the idea that multiple identity categories (e.g., race and gender) combine interactively in ways that contribute to greater bias than the bias associated with each category alone. Informed by these findings, we first hypothesize that biases can be correlated in toxicity detection.

Fig. 4.6: Percentages of toxic (red) and non-toxic (green) sessions containing different biases in the benchmark Instagram data (Hosseinmardi et al., 2015). “Swear in WE” denotes that there are swear words in sessions written in White-aligned English (WE).

To effectively mitigate potentially correlated biases with a sequential input, we address two challenges: (1) making sequential decisions based on partial information, e.g., the comments observed so far, given that “static” debiasing may cause unnecessary delay and is less responsive when the input is changing (e.g., topic shifts); and (2) effectively characterizing the relations among individual biases to reduce the overall bias. Conventional debiasing strategies provide a generic, one-size-fits-all solution. A straightforward approach is to add multiple fairness constraints w.r.t. different biases to the training process of a toxicity classifier. However, this approach overlooks the relations among various biases and faces challenging optimization problems. This leads to our second hypothesis: with a sequential input, bias mitigation strategies that consider contextual information (e.g., historical comments) and capture bias relations can improve the debiasing performance in the presence of potentially correlated biases. To test our hypotheses, we study the novel problem of joint bias mitigation for toxicity detection via sequential decision-making (Cheng et al., 2022).
4.2.1.1. The approach
Conventional debiasing approaches are less responsive when the conversations between users are changing. A desirable debiasing strategy should process sequential comments and make dependent decisions (i.e., whether a comment is toxic) based on the observed context. In addition, prior research studied different biases either individually (i.e., debiasing one type of bias at a time) or independently (i.e., debiasing multiple biases that are assumed to be independent of one another). Nevertheless, bias is complex by nature, and different biases might be correlated. Therefore, it is important for a sequential debiasing strategy to identify and capture the relations among various biases while detecting toxic comments. In this section, we first discuss how to measure bias in the presence of multiple types of biases. Then, we detail the proposed joint bias mitigation approach for toxicity detection via sequential decisions.

Measuring Bias. Measuring bias is key to addressing unfairness in NLP and machine learning models. This section presents two categories of bias metrics that quantify the differences in a classifier's behavior across a range of groups within the same identity, e.g., {female, male, other} for gender: the Background Comparison Metric (BCM) and the Pairwise Comparison Metric (PCM). The core idea of BCM is to compare a measure $m$ (e.g., False Positive/Negative Rate) of a group within the sensitive attribute $p$ with the group's background score, i.e., the same measure $m$ computed over the entire population. Formally, we define the BCM-based fairness metrics, $\mathrm{FNED}_{\mathrm{BCM}}$ and $\mathrm{FPED}_{\mathrm{BCM}}$, as follows:

$$\mathrm{FNED}_{\mathrm{BCM}} = \sum_{z \in p} \left|\mathrm{FNR}_z - \mathrm{FNR}_{overall}\right|, \qquad (4.23)$$

$$\mathrm{FPED}_{\mathrm{BCM}} = \sum_{z \in p} \left|\mathrm{FPR}_z - \mathrm{FPR}_{overall}\right|, \qquad (4.24)$$
where $z$ denotes the values to which a sensitive attribute $p \in \mathcal{P}$ can be assigned, and $\mathcal{P}$ is the set of all considered sensitive attributes, e.g., $\mathcal{P} = \{\text{gender}, \text{race}\}$. For example, in the case of $p = \{\text{male}, \text{female}, \text{other}\}$, $\mathrm{FNR}_z$ and $\mathrm{FPR}_z$ are calculated for every group $z \in p$. They are then compared with $\mathrm{FNR}_{overall}$ and $\mathrm{FPR}_{overall}$, which are calculated on the entire population including all of the considered groups within $p$, e.g., FNR averaged over all three gender groups, male, female, and other. Note that this is different from another BCM-based metric defined in Eq. (3.18), where $\mathrm{FNR}_{overall}$ is calculated over a pre-defined list of sensitive triggers. BCM allows us to investigate how the performance of a toxicity classifier for particular groups differs from the model's general performance. When applied to settings with multiple biases, however, BCM can be less effective, as it treats each bias independently. In addition, when a toxicity classifier has low performance, the BCM-based metrics may underestimate the bias. Here, we present PCM, which quantifies how distant, on average, the performances of two randomly selected groups $z_1$ and $z_2$ from all groups in $\mathcal{P}$ are. We formally define the PCM-based metrics as follows:

$$\mathrm{FNED}_{\mathrm{PCM}} = \sum_{\{z_1, z_2\} \in \binom{\mathcal{P}}{2}} \left|\mathrm{FNR}_{z_1} - \mathrm{FNR}_{z_2}\right|, \qquad (4.25)$$

$$\mathrm{FPED}_{\mathrm{PCM}} = \sum_{\{z_1, z_2\} \in \binom{\mathcal{P}}{2}} \left|\mathrm{FPR}_{z_1} - \mathrm{FPR}_{z_2}\right|. \qquad (4.26)$$

In both Eqs. (4.25) and (4.26), we measure the difference between every possible pair of groups in $\mathcal{P}$. For example, in a simplified case with gender = {female, male} and race = {black, white}, each group can be paired with three other groups: if $z_1$ = female, then $z_2$ can be male, black, or white. This forces the algorithm to look into the potential relations across various sensitive attributes.

Sequential Bias Mitigation. When comments come in a sequence, a toxicity classifier needs to make decisions based on incomplete information, i.e., the comments observed so far. The current decision will, in turn, influence both future prediction results and debiasing strategies. In addition, in the presence of multiple biases, debiasing a toxicity classifier can be more challenging due to the need to capture the potential relations among various biases.
To tackle these challenges, in this section, we adopt a reinforcement learning framework similar to the one in Section 3.3.2 that seeks to maximize prediction accuracy and minimize bias measures accounting for cross-bias relations at each time step.
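The BCM- and PCM-based metrics in Eqs. (4.23) to (4.26) reduce to bookkeeping over per-group error rates. Below is a minimal Python sketch, not the authors' implementation. The function names and toy labels are hypothetical, the overall rates are computed on the whole population, and for PCM the groups of every attribute in $\mathcal{P}$ are pooled before all pairs are compared, following the gender/race example in the text.

```python
from itertools import combinations


def error_rates(y_true, y_pred):
    """Return (FPR, FNR) for binary labels where 1 = toxic and 0 = non-toxic."""
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fpr = fp / (fp + tn) if fp + tn else 0.0  # guard: group without negatives
    fnr = fn / (fn + tp) if fn + tp else 0.0  # guard: group without positives
    return fpr, fnr


def group_rates(y_true, y_pred, groups):
    """Map each group label z to its (FPR_z, FNR_z)."""
    rates = {}
    for z in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == z]
        rates[z] = error_rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return rates


def bcm(y_true, y_pred, groups):
    """Eqs. (4.23)-(4.24) for one attribute p; returns (FPED_BCM, FNED_BCM)."""
    fpr_all, fnr_all = error_rates(y_true, y_pred)  # whole-population rates
    rates = group_rates(y_true, y_pred, groups)
    fped = sum(abs(fpr - fpr_all) for fpr, _ in rates.values())
    fned = sum(abs(fnr - fnr_all) for _, fnr in rates.values())
    return fped, fned


def pcm(y_true, y_pred, attributes):
    """Eqs. (4.25)-(4.26); returns (FPED_PCM, FNED_PCM).

    attributes maps an attribute name (e.g. "gender") to per-sample group labels.
    Groups from all attributes are pooled and every unordered pair is compared.
    """
    pooled = []
    for name in attributes:
        pooled.extend(group_rates(y_true, y_pred, attributes[name]).values())
    pairs = list(combinations(pooled, 2))
    fped = sum(abs(r1[0] - r2[0]) for r1, r2 in pairs)
    fned = sum(abs(r1[1] - r2[1]) for r1, r2 in pairs)
    return fped, fned


# Hypothetical toy data: 1 = toxic, 0 = non-toxic.
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
gender = ["f", "f", "f", "f", "m", "m", "m", "m"]
race = ["b", "w", "b", "w", "b", "w", "b", "w"]
print(bcm(y_true, y_pred, gender))                            # → (0.5, 0.5)
print(pcm(y_true, y_pred, {"gender": gender, "race": race}))  # → (2.0, 2.0)
```

With only the gender attribute, PCM compares a single pair and gives (0.5, 0.5). Pooling gender and race gives (2.0, 2.0), because six pairs now contribute, including cross-attribute pairs that BCM never compares.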
In sequential bias mitigation for toxicity detection, the environment includes all of the training sessions, and the agent $A$ is a biased toxicity classifier $F$. State $s_t$ is the sequence of $t$ comments that $A$ has observed so far. $A$ selects an action $a \in \{\text{toxic}, \text{non-toxic}\}$ based on an action-selection policy $\pi(s_t)$, which outputs a probability distribution over actions based on $s_t$. $\pi(s_t, a_t)$ represents the probability of choosing action $a_t$ after observing $t$ comments, i.e., $s_t$. After $a_t$ is selected, the environment returns a reward value $r_{t+1}$ based on the state-action pair $(s_t, a_t)$. The reward values, defined by the toxicity prediction error and the bias metrics, are then used to calculate the cumulative discounted reward $G_t$ (i.e., the sum of all rewards received so far) and to optimize the policy $\pi(s_t)$. At each state $s_t$, the reinforcement learning framework maximizes the expected cumulative reward until $t$, forcing the agent to improve accuracy and mitigate bias. Essentially, the agent makes dependent decisions to adjust to the sequential input. The environment contains the training dataset $D$, in which every session includes a sequence of comments. At step $t$, the environment randomly selects a session and passes the first $t$ comments of that session to the agent $A$, i.e., the toxicity classifier $F$, which outputs a decision probability $\hat{q}_t$. We convert $\hat{q}_t$ into an action $a_t$ using the following criterion:

$$a_t = \begin{cases} \text{toxic} & \text{if } \hat{q}_t \ge 0.5, \\ \text{non-toxic} & \text{if } \hat{q}_t < 0.5. \end{cases} \qquad (4.27)$$

Finally, we define the reward function using the PCM-based bias metrics to jointly evaluate various types of biases as follows:

$$r^{t}_{\mathrm{PCM}} = -l^{t}_{F} - \sum_{S_{p_i} \in S} \alpha_i \cdot \left( \frac{1}{|S_{p_i}|} \sum_{\{z_1, z_2\} \in \binom{p_i}{2}} \left( |\mathrm{FPR}^{t}_{z_1} - \mathrm{FPR}^{t}_{z_2}| + |\mathrm{FNR}^{t}_{z_1} - \mathrm{FNR}^{t}_{z_2}| \right) \right), \qquad (4.28)$$

where $l^{t}_{F}$ denotes the binary prediction loss (e.g., log loss) of the toxicity classifier $F$, $\alpha_i$ represents the importance value of the bias related to the sensitive attribute $p_i \in \mathcal{P}$, and $S_{p_i}$ denotes the sessions with sensitive attribute $p_i$. Similarly, the reward function using the BCM-based bias metrics can be defined as follows:

$$r^{t}_{\mathrm{BCM}} = -l^{t}_{F} - \sum_{S_{p_i} \in S} \alpha_i \cdot \left( \sum_{z \in p_i} |\mathrm{FPR}^{t}_{z} - \mathrm{FPR}^{t}_{overall}| + \sum_{z \in p_i} |\mathrm{FNR}^{t}_{z} - \mathrm{FNR}^{t}_{overall}| \right). \qquad (4.29)$$
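The action rule of Eq. (4.27) and the rewards of Eqs. (4.28) and (4.29) can be sketched as plain functions of per-step statistics. This illustrative Python sketch rests on our own assumptions and is not code from Cheng et al. (2022). It assumes that the per-group rates at step t, the weights α_i, the session counts |S_{p_i}|, and the per-attribute overall rates are computed elsewhere and passed in as dictionaries. As in Eq. (4.28), PCM pairs are taken within each attribute p_i.

```python
from itertools import combinations


def select_action(q_hat):
    """Eq. (4.27): threshold the classifier's toxicity probability at 0.5."""
    return "toxic" if q_hat >= 0.5 else "non-toxic"


def pcm_reward(loss, rates, alpha, n_sessions):
    """Eq. (4.28): negative loss minus weighted pairwise FPR/FNR gaps.

    loss:       prediction loss l_F^t at step t (e.g. log loss)
    rates:      {attribute p_i: {group z: (FPR_z^t, FNR_z^t)}}
    alpha:      {attribute p_i: importance weight alpha_i}
    n_sessions: {attribute p_i: |S_{p_i}|}
    """
    penalty = 0.0
    for attr, groups in rates.items():
        # All unordered pairs {z1, z2} of groups within the same attribute p_i.
        gap = sum(abs(fpr1 - fpr2) + abs(fnr1 - fnr2)
                  for (fpr1, fnr1), (fpr2, fnr2) in combinations(groups.values(), 2))
        penalty += alpha[attr] * gap / n_sessions[attr]
    return -loss - penalty


def bcm_reward(loss, rates, overall, alpha):
    """Eq. (4.29): each group is compared with the attribute's overall (FPR, FNR)."""
    penalty = 0.0
    for attr, groups in rates.items():
        fpr_all, fnr_all = overall[attr]
        gap = sum(abs(fpr - fpr_all) + abs(fnr - fnr_all) for fpr, fnr in groups.values())
        penalty += alpha[attr] * gap
    return -loss - penalty


# Hypothetical statistics for a single attribute at step t.
rates = {"gender": {"female": (0.2, 0.1), "male": (0.1, 0.3)}}
print(select_action(0.73))  # → toxic
print(pcm_reward(0.4, rates, alpha={"gender": 0.5}, n_sessions={"gender": 10}))
print(bcm_reward(0.4, rates, overall={"gender": (0.15, 0.2)}, alpha={"gender": 0.5}))
```

Keeping the reward a pure function of the step's statistics makes it easy to swap the PCM and BCM variants inside whatever policy-optimization loop is used.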
The model was evaluated on two benchmark datasets, yielding several interesting findings: (1) different biases tend to be correlated; (2) the joint debiasing strategy outperforms conventional approaches in terms of both bias mitigation and detection performance; and (3) while most of the empirical findings suggest that the amount of accessible contextual information is critical for bias mitigation, further research is warranted to reach more conclusive findings.

4.2.2. A Multidisciplinary Approach for Context-Specific Interpretability
Hopefully, we are now in general agreement on the need for interpretability, although the concept itself lacks clarity about what it means and how it should be applied in different contexts. What we introduce here is a multidisciplinary approach to interpretability (Beaudouin et al., 2020) that uses context as the starting point. Its “multidisciplinary” view comes from integrating technical, legal, and economic approaches into a single methodology. The approach, shown in Figure 4.7, consists of three major steps: (1) define the main contextual factors, such as whom we explain to, the potential harm, and the legal framework; (2) explore potential tools for interpretable machine learning, such as the post hoc approaches we covered in Section 2.2.3.3; and (3) choose the right level and form of global and local interpretability while considering the incurred cost. Context can be further decomposed into four factors:
• Audience/Recipient factor: To whom is the model providing interpretability? What is their domain of expertise?
• Impact factor: What is the risk level of the current task, and how might interpretability help? Typically, the higher the risk level, the higher the potential harm, and the more likely it is that interpretability is needed.
• Regulatory factor: What is the regulatory environment for the task? What fundamental rights are affected?
• Operational factor: To what extent is interpretability an operational imperative? Is it needed for safety certification or for user trust?

Fig. 4.7: The three pillars of interpretability. Adapted from (Beaudouin et al., 2020).

These four factors also help define the need for interpretability. For example, interpretability might be less important when an AI system serves in a decision-support role than in a decision-making role, especially when the decision maker (e.g., an expert radiologist) recognizes that an automated analysis support tool may misbehave (Beaudouin et al., 2020). More importantly, how do we weigh these four factors? One approach is to require an impact assessment for any AI system that is considered risky. This assessment (Yeung et al., 2019) consists of developing a set of risk assessments: listing everything that might go wrong with the AI system, such as societal and environmental harms, and evaluating the severity level and the probability of each adverse event.
4.3. The Trade-offs: Can’t We Have Them All?
When talking about socially responsible AI, one topic you will often encounter is the “utility-X” trade-off, where “X” can be fairness, interpretability, privacy, etc. This suggests that the principles of socially responsible AI sometimes do not align with the goals of machine learning models, so we have to sacrifice one to improve the other. Moreover, these trade-offs have often been taken as given in the machine learning literature and are accompanied by plenty of empirical results and theoretical analyses. But do we have to make these trade-offs? We discuss this fundamental question and present some potential answers.

4.3.1. The Fairness–Utility Trade-off
Let us start with the simplest fairness notions: demographic parity (or statistical parity) and equal opportunity. Recall that, in a setting with a binary sensitive attribute $a \in \{0, 1\}$, demographic parity requires the probability of a positive prediction ($\hat{y} = 1$) to be the same for both groups (e.g., the majority versus the minority group, such as male versus female). Formally, $P(\hat{y} = 1 \mid a = 0) = P(\hat{y} = 1 \mid a = 1)$. Equal opportunity instead balances the true positive rate of a classifier, $P(\hat{y} = 1 \mid a = 0, y = 1) = P(\hat{y} = 1 \mid a = 1, y = 1)$, i.e., the accuracy on the subgroup of samples with positive labels. Although there are many possible ways to define utility, e.g., prediction performance and computational efficiency, we focus on the most common evaluation metric in classification: accuracy. Let us consider the trade-offs in the following four settings²: the 4-way balanced dataset, the group-balanced dataset, the outcome-balanced dataset, and the imbalanced dataset. They are categorized based on the distributions of a binary outcome variable $y$ and a binary sensitive attribute $a$. We discuss each of these settings next.

4.3.1.1. 4-way balanced dataset
When the dataset is 4-way balanced (Figure 4.8(a)), i.e., it has two groups of equal size and each group has two classes of equal size, we have the same sample size for $(a = 0, y = 0)$, $(a = 0, y = 1)$, $(a = 1, y = 0)$, and $(a = 1, y = 1)$:

$$P(a = 0, y = 0) = P(a = 0, y = 1) = P(a = 1, y = 0) = P(a = 1, y = 1) = 0.25. \qquad (4.30)$$

So, if we are fortunate enough to have an “ideal” classifier, that is, $\hat{y} = y$ for all test samples, then both the demographic parity and the equalized odds/equal opportunity fairness notions defined in Eqs. (2.1) and (2.2) are naturally satisfied. No trade-off is found in this dataset: the ideal classifier has both perfect utility (i.e., 100% test accuracy) and perfect fairness.

Fig. 4.8: The four different settings used to examine the fairness–utility trade-off. They are categorized based on the distributions of both the binary class labels y and the binary sensitive attribute a.

² https://wearepal.ai/blog/when-and-how-do-fairness-accuracy-trade-offs-occur
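The four settings can be checked numerically. The Python sketch below uses hypothetical per-cell counts of 100 samples each, chosen to match the 30%/70% examples used in this section. It builds each dataset, applies the ideal classifier $\hat{y} = y$, and reports the demographic parity gap and the equal opportunity gap. The helper names are our own.

```python
def make_dataset(counts):
    """Expand counts[(a, y)] into parallel lists of group labels a and class labels y."""
    a, y = [], []
    for (group, label), n in counts.items():
        a += [group] * n
        y += [label] * n
    return a, y


def positive_rate(y_hat, mask):
    """P(y_hat = 1 | mask) over the samples selected by mask."""
    selected = [p for p, m in zip(y_hat, mask) if m]
    return sum(selected) / len(selected)


def dp_gap(y_hat, a):
    """Demographic parity gap |P(y_hat=1 | a=0) - P(y_hat=1 | a=1)|."""
    return abs(positive_rate(y_hat, [g == 0 for g in a])
               - positive_rate(y_hat, [g == 1 for g in a]))


def eo_gap(y_hat, y, a):
    """Equal opportunity gap: difference in true positive rates between groups."""
    return abs(positive_rate(y_hat, [g == 0 and t == 1 for g, t in zip(a, y)])
               - positive_rate(y_hat, [g == 1 and t == 1 for g, t in zip(a, y)]))


# Hypothetical cell counts (a, y) -> n for the four settings of Figure 4.8.
SETTINGS = {
    "4-way balanced": {(0, 0): 25, (0, 1): 25, (1, 0): 25, (1, 1): 25},
    "group-balanced": {(0, 0): 35, (0, 1): 15, (1, 0): 35, (1, 1): 15},
    "outcome-balanced": {(0, 0): 21, (0, 1): 9, (1, 0): 49, (1, 1): 21},
    "imbalanced": {(0, 0): 21, (0, 1): 9, (1, 0): 21, (1, 1): 49},
}

for name, counts in SETTINGS.items():
    a, y = make_dataset(counts)
    y_hat = list(y)  # the "ideal" classifier: y_hat = y for every sample
    print(f"{name:>16}: DP gap = {dp_gap(y_hat, a):.2f}, EO gap = {eo_gap(y_hat, y, a):.2f}")
```

Only the imbalanced setting yields a nonzero demographic parity gap (0.40), while the equal opportunity gap is 0.00 in all four settings. Replacing `y_hat` with the predictions of a realistic classifier is a simple way to explore the remaining cases discussed in this section.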
4.3.1.2. Group-balanced dataset
The second setting relaxes the first one a little. Here, the group sizes are still the same, but the classes are imbalanced within each group, and the imbalance is the same across groups, as illustrated in Figure 4.8(b). For example, in each group, only 30% of the samples are positive. Then, $P(a = 0 \mid y = y') = P(a = 1 \mid y = y') = 0.5$ holds for all $y' \in \{0, 1\}$, and $P(a = 0) = P(a = 1)$. Bayes' rule tells us that demographic parity is still satisfied:

$$P(y = 1 \mid a = 0) = \frac{P(y = 1, a = 0)}{P(a = 0)} = \frac{P(a = 0 \mid y = 1)\, P(y = 1)}{P(a = 0)} = \frac{P(a = 1 \mid y = 1)\, P(y = 1)}{P(a = 1)} = P(y = 1 \mid a = 1). \qquad (4.31)$$
So, an ideal classifier still has perfect test accuracy and perfect fairness in terms of both demographic parity and equal opportunity.

4.3.1.3. Outcome-balanced dataset
Now, let us relax the setting even more: the group sizes do not need to be equal, but the class ratio has to be the same in both groups, i.e., the outcome-balanced dataset shown in Figure 4.8(c). For example, the dataset contains 30% female and 70% male applicants, and each gender group has 30% positive samples. In this case, the ideal classifier still satisfies the demographic parity and equal opportunity fairness notions because the positive rate is the same in both groups, $P(y = 1 \mid a = 0) = P(y = 1 \mid a = 1)$. However, the danger is that we have less data representing the minority group, so the model will spend less “effort” on it. A “realistic” classifier with imperfect test accuracy, that is, $\hat{y} \ne y$ for some test samples, is therefore more likely to perform worse for the minority group and thus more likely to violate the two fairness notions.

4.3.1.4. Imbalanced dataset
In this last setting, we relax all the constraints and consider the imbalanced dataset, which has different group sizes and different class ratios across groups, as illustrated in Figure 4.8(d). The ideal classifier still satisfies the equal opportunity fairness notion because $P(\hat{y} = 1 \mid a = 0, y = 1) = P(\hat{y} = 1 \mid a = 1, y = 1) = 100\%$. However, since the class ratios of the two groups are no longer equal, demographic parity is violated even by the ideal classifier, e.g., $P(y = 1 \mid a = 0) = 30\%$ and $P(y = 1 \mid a = 1) = 70\%$. “Realistic” classifiers may well no longer be fair with regard to equal opportunity due to the class imbalance. Nor can they achieve demographic parity, as the minority group ($a = 0$) has much less data and far fewer positive samples. To conclude, with regard to demographic parity and equal opportunity, there is no trade-off between fairness and utility when the data are ideal and unbiased, so the fairness–utility trade-off does not necessarily occur. If the dataset is balanced in specific ways, fairness and utility can be compatible with each other. In practice, this suggests that the trade-off arises from historical differences in opportunities and representation, which make the positive and negative labels of the unprivileged group “less separable” (Dutta et al., 2020).

4.3.2. The Interpretability–Utility Trade-off
The general understanding of the interpretability–utility trade-off is that simpler methods (e.g., linear models), which are less capable of learning complex patterns in the data (and therefore have low utility), tend to be more interpretable. So, here we discuss the interpretability–utility trade-off via the relationship between interpretability and model complexity: simpler models are more interpretable and are presumed to have lower utility. Is there only one type of relationship between interpretability and utility? Are simpler models always less accurate? Does interpretability have to come at some inherent cost? Here, we try to answer these questions by presenting the current findings in the field. Note that in the following, we use model complexity as a proxy for model interpretability, and utility is defined based on prediction performance. While Occam’s Razor suggests that one should use the simplest model that explains the data well, standard practice seems to tell a different story: finding a simple-yet-accurate model is hard. A simpler model is often more interpretable and generalizable. The possibility that a simple yet accurate model exists may be understood through a large Rashomon set³ of almost-equally-accurate models. Intuitively, if the Rashomon set is large enough, it is possible to find different but approximately equally well-performing models inside it; and if the set includes a simple model, then that model is guaranteed to generalize well (Semenova et al., 2019). Consider a training set of $n$ data points $S = \{z_1, z_2, \ldots, z_n\}$, where each $z_i = (x_i, y_i)$ is drawn i.i.d. from an unknown distribution $\mathcal{D}$, with $x_i \in \mathcal{X}$ and $y_i \in \mathcal{Y}$. The goal is to learn a function $f \in \mathcal{F}: \mathcal{X} \to \mathcal{Y}$, where $\mathcal{F}$ is a hypothesis space. We define the loss function for empirical risk minimization as $\phi: \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}^+$ and learn $f$ by minimizing the empirical risk $\hat{L}(f) = \frac{1}{n} \sum_{i=1}^{n} \phi(f(x_i), y_i)$. The empirical Rashomon set is then the subset of functions in $\mathcal{F}$ that perform almost as well as the best model with respect to the loss function.

Definition 4.9 (Rashomon set (Semenova et al., 2019)). Given the Rashomon parameter $\theta \ge 0$, a dataset $S$, a hypothesis space $\mathcal{F}$, and a loss function $\phi$, the Rashomon set $\hat{R}_{set}(\mathcal{F}, \theta)$ is the subspace of $\mathcal{F}$:

$$\hat{R}_{set}(\mathcal{F}, \theta) := \{f \in \mathcal{F} : \hat{L}(f) \le \hat{L}(\hat{f}) + \theta\}, \qquad (4.32)$$
where $\hat{f}$ is an empirical risk minimizer for the training dataset $S$ w.r.t. the loss function $\phi$: $\hat{f} \in \arg\min_{f \in \mathcal{F}} \hat{L}(f)$. The true Rashomon set is the set of models with low true loss:

$$R_{set}(\mathcal{F}, \gamma) := \{f \in \mathcal{F} : L(f) \le L(f^*) + \gamma\}, \qquad (4.33)$$
where $f^*$ is a true risk minimizer. A simple example of the Rashomon set can be found in Figure 4.9. Consider two finite hypothesis spaces, $\mathcal{F}_1$ denoting the simple models and $\mathcal{F}_2$ denoting all models, with $\mathcal{F}_1 \subset \mathcal{F}_2$; that is, $\mathcal{F}_1$ is drawn uniformly from $\mathcal{F}_2$ without replacement. We want the best true risk of $\mathcal{F}_2$ to be close to the best empirical risk of $\mathcal{F}_1$. How do we define “closeness” here? To spare you the details of the heavy theory and proofs,⁴ the answer lies in the Rashomon ratio, which describes the fraction of models that are good, e.g., $\frac{|R_{set}(\mathcal{F}_2, \gamma)|}{|\mathcal{F}_2|}$. It takes the Rashomon set as input and outputs a value between 0 and 1. It can also be viewed as a simplicity measure. So, if $R_{set}(\mathcal{F}, \gamma)$ is sufficiently large, then the Rashomon ratio is sufficiently large, and there is a high chance that the best true risk of $\mathcal{F}_2$ is close to the best empirical risk of $\mathcal{F}_1$, so that a simple-yet-accurate model is likely to exist. “Sufficiently” here is defined by a small constant and the Rashomon parameter $\theta$. We have mostly discussed model simplicity, which is directly related to model interpretability. The broad result of this section is that the utility–interpretability trade-off is not inevitable, especially when the Rashomon set is large. When all of the machine learning models perform similarly well on the validation set, the Rashomon set is likely to be large and the trade-off likely does not exist. In a separate work studying the interpretability of the COMPAS algorithm (Angelino et al., 2017), it was demonstrated that it is possible to construct optimal sparse rule lists that are completely interpretable without sacrificing accuracy.

³ A set of models that all perform roughly equally well is referred to as a Rashomon set.
⁴ Interested readers can refer to Semenova et al. (2019).

Fig. 4.9: A simple example of a Rashomon set in a two-dimensional hypothesis space $\mathcal{F}$. Models below the line are in the Rashomon set $\hat{R}_{set}(\mathcal{F}, \theta)$.

4.3.3. The Privacy–Utility Trade-off
It appears very intuitive that we have to sacrifice utility for privacy: the more data we anonymize, the less value the data will have; the “further away” from reality the anonymized data become, the less useful they are for personalized analytics or model training. In the differential privacy algorithm introduced in Section 2.3, we also have a budget value that quantifies how much privacy we are willing to sacrifice for utility. So, when we gain privacy for individuals, we lose value for society as a whole. If we cannot entirely eliminate the trade-off, can we “optimize” it so that we sacrifice as little utility as possible to get maximal privacy? Or in which settings might we want to sacrifice some privacy to gain more utility? For example, in health recommender systems, users might refuse to share data for commercial purposes but show less concern when data are used for scientific purposes. Where we land on the spectrum between privacy and utility will vary depending on the privacy-preserving approach, the nature of the data, the utility measurement, and the privacy risks. This is nicely summarized by Mackey et al. (2016): “We can have secure houses or usable houses but not both…. An absolutely secure house would lack doors and windows and therefore be unusable. But that does not mean that all actions to make one’s house more secure are pointless, and nor does it mean that proportional efforts to secure my house are not a good idea. The deadbolt on my door may not help if a burglar comes armed with a battering ram or simply smashes my living room window but that does not mean that my lock is useless, merely that it does not (and cannot) provide absolute security”. Basically, this suggests that before we choose any privacy-preserving approach, we at least need a good understanding of the data and of how to define utility gain and privacy loss. In Brickell and Shmatikov (2008), the privacy loss of the published data is defined as the increase in the adversary's ability to learn sensitive attributes of a given identity. The utility gain is defined as the increase in the accuracy of machine learning tasks evaluated on the sanitized dataset.
This direct comparison methodology has a clear flaw: it uses the average privacy loss over all individuals, whereas privacy is a property of every single user. It is therefore problematic to compare utility, an aggregate concept, with privacy, an individual concept. An alternative is to define privacy loss as the adversary’s knowledge gain about the sensitive values of specific individuals, and utility loss as the information loss about the sensitive values of large populations (Li and Li, 2009). The argument is that specific information about a small group of individuals has a greater influence on privacy, whereas aggregate information about a large group of individuals has a greater influence on utility. Under this evaluation framework, it is surmised that a more sophisticated privacy-preserving approach can improve data utility and thus improve the quality of the privacy–utility trade-off.

But is it possible that utility and privacy are not antagonists? Suppose that a recommender system suggests the phenomenal TV series “Game of Thrones”. You hesitate. If you click it, the recommender will better understand your needs and recommend more similar videos, but you will also disclose private information to the system. At first glance, this is a trade-off between privacy and utility. However, if a large number of users have watched “Game of Thrones”, your click actually makes you less distinguishable from other users than you were before the click. So there is no privacy–utility trade-off in this case. If you instead click on an esoteric movie, more of your private information is revealed. This again underscores the importance of defining privacy and utility carefully when we talk about the trade-off. In this example, utility is defined based on the commonality of a user profile, i.e., how close a user’s preferences are to those of others. Specifically, it is the change in the user’s commonality before and after the click (Guerraoui et al., 2017). Formally, we first define the popularity and preferability of an item i:

$$\mathrm{Popularity}(i) = \frac{N_{\mathrm{Like}}(i) + N_{\mathrm{Dislike}}(i)}{N}, \tag{4.34}$$

$$\mathrm{Preferability}(i) = \frac{N_{\mathrm{Like}}(i) - N_{\mathrm{Dislike}}(i)}{N}, \tag{4.35}$$

where $N_{\mathrm{Like}}(i)$ and $N_{\mathrm{Dislike}}(i)$ denote the number of users who like and dislike item i, respectively, and N is the total number of users. The commonality of a user u can then be computed as

$$\mathrm{Commonality}(u) = \sum_{i} \mathrm{Popularity}(i) \cdot \mathrm{Preferability}(i) \cdot e(u, i), \tag{4.36}$$

where

$$e(u, i) = \begin{cases} 1 & \text{if } u \text{ has clicked on } i \text{ and likes it},\\ 0 & \text{if } u \text{ has not clicked on } i,\\ -1 & \text{if } u \text{ has clicked on } i \text{ and dislikes it}. \end{cases}$$
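The utility of a click by user u is the change in her commonality before and after she clicks the item:

$$\mathrm{Utility}(u) = \mathrm{Commonality}'(u) - \mathrm{Commonality}(u), \tag{4.37}$$

where $\mathrm{Commonality}'(u)$ denotes the commonality after the click.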
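The following Python sketch works through Eqs. (4.34)–(4.37) on a toy like/dislike matrix. It rests on our own assumptions and is not the implementation of Guerraoui et al. (2017). Rows are users and columns are items. Entries take the values of e(u, i), and N is the number of rows. The function names are hypothetical. When evaluating Commonality′(u), popularity and preferability are recomputed on the data after the click.

```python
from typing import List

# Rows are users, columns are items; entries follow e(u, i):
# 1 = clicked and liked, 0 = not clicked, -1 = clicked and disliked.
Matrix = List[List[int]]


def popularity(E: Matrix, i: int) -> float:
    """Eq. (4.34): fraction of users who liked or disliked item i."""
    n_like = sum(1 for row in E if row[i] == 1)
    n_dislike = sum(1 for row in E if row[i] == -1)
    return (n_like + n_dislike) / len(E)


def preferability(E: Matrix, i: int) -> float:
    """Eq. (4.35): net fraction of users who liked item i."""
    n_like = sum(1 for row in E if row[i] == 1)
    n_dislike = sum(1 for row in E if row[i] == -1)
    return (n_like - n_dislike) / len(E)


def commonality(E: Matrix, u: int) -> float:
    """Eq. (4.36): sum over items of Popularity * Preferability * e(u, i)."""
    return sum(
        popularity(E, i) * preferability(E, i) * E[u][i]
        for i in range(len(E[u]))
    )


def click_utility(E: Matrix, u: int, i: int, liked: bool = True) -> float:
    """Eq. (4.37): Commonality'(u) - Commonality(u) for one click.

    Assumption: item statistics are recomputed on the post-click matrix.
    """
    after = [row[:] for row in E]  # copy so the input matrix is unchanged
    after[u][i] = 1 if liked else -1
    return commonality(after, u) - commonality(E, u)


# Hypothetical toy data: item 0 is widely liked (the "Game of Thrones"
# case), item 1 is esoteric, and user 0 has not clicked anything yet.
E = [
    [0, 0],
    [1, 0],
    [1, 0],
    [1, 0],
    [1, 1],
]
print(click_utility(E, u=0, i=0))  # click on the popular item
print(click_utility(E, u=0, i=1))  # click on the esoteric item
```

On this toy data both clicks have positive utility. Clicking the widely liked item raises the user's commonality to 1.0, whereas clicking the esoteric item raises it only to about 0.16.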
Privacy is quantified by the disclosure degree of a user profile. Given a recommender dataset E, the disclosure degree of user u’s profile is

$$\delta_u = -\sum_{i} \log P\big(E_i = e(u, i)\big). \tag{4.38}$$

The disclosure risk of a click is then defined as the difference between the disclosure degrees after and before the click:

$$\Delta\delta_u = \delta'_u - \delta_u, \tag{4.39}$$

where $\delta'_u$ is the disclosure degree after the click.
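Eqs. (4.38)–(4.39) can be sketched in the same way. We assume that P(E_i = e(u, i)) is estimated as the fraction of users, including u, whose entry for item i equals e(u, i). The function names and the toy matrix are hypothetical. On this data, a click on the widely liked item lowers the disclosure degree, whereas a click on the esoteric item raises it. This mirrors the “Game of Thrones” example.

```python
import math
from typing import List

# Rows are users, columns are items; entries follow e(u, i) in {1, 0, -1}.
Matrix = List[List[int]]


def disclosure_degree(E: Matrix, u: int) -> float:
    """Eq. (4.38): delta_u = -sum_i log P(E_i = e(u, i)).

    Assumption: P(E_i = v) is the empirical fraction of users whose entry
    for item i equals v. User u is counted, so the fraction is never zero.
    """
    n_users = len(E)
    total = 0.0
    for i, value in enumerate(E[u]):
        matches = sum(1 for row in E if row[i] == value)
        total -= math.log(matches / n_users)
    return total


def disclosure_risk(E: Matrix, u: int, i: int, liked: bool = True) -> float:
    """Eq. (4.39): delta'_u - delta_u for one click of user u on item i."""
    after = [row[:] for row in E]  # copy so the input matrix is unchanged
    after[u][i] = 1 if liked else -1
    return disclosure_degree(after, u) - disclosure_degree(E, u)


# Hypothetical toy data: item 0 is widely liked, item 1 is esoteric.
E = [
    [0, 0],   # user 0 has not clicked anything yet
    [1, 0],
    [1, 0],
    [1, 0],
    [1, 1],
]
print(disclosure_risk(E, u=0, i=0))  # widely liked item: negative
print(disclosure_risk(E, u=0, i=1))  # esoteric item: positive
```

Using natural logarithms, the first click gives Δδ_u = −log 5 < 0 and the second gives Δδ_u = log 2 > 0.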
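A click with a positive disclosure risk therefore compromises user privacy. The “safe zone” (i.e., no trade-off) is when $\Delta\delta_u < 0$ and $\mathrm{Utility}(u) > 0$. In that case, the click both increases the user’s commonality and decreases the disclosure degree of her profile.

4.3.4. Trade-offs among Fairness, Interpretability, and Privacy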
In a more challenging setting, we might want to achieve multiple responsible AI principles simultaneously, such as fairness and interpretability, rather than consider each of them independently. Is there a trade-off among these principles? Recent findings in the field, though few, suggest that there is. For example, if a simple classifier is selected to improve interpretability, we might compromise on fairness and accuracy, which a more complex classifier could otherwise improve. This trade-off can follow different trends depending on the correlations among sensitive features, nonsensitive features, and class labels (Jabbari et al., 2020). Fairness research emerged after privacy research and was largely led by the same researchers (e.g., Cynthia Dwork). Even so, an incompatibility theorem has been proved between fairness and privacy: differential privacy and fairness are at odds with each other for any learning algorithm with non-trivial accuracy. For example, in a simple binary classification setting, any learning algorithm that is (ε, 0)-differentially private and (even approximately) fair cannot outperform a constant classifier in accuracy (Agarwal, 2020). Since this line of research is still at a very early stage, we omit the details of these works. These results nonetheless open exciting directions for future work at the intersections of the principles of socially responsible AI.

4.4. Concluding Remarks

4.4.1. Summary
This chapter discussed three primary challenges of socially responsible AI:

(1) causality is the key to tackling many challenging problems in machine learning and AI, such as fairness and interpretability;
(2) AI is a socio-technical system, and context is an indispensable element in achieving socially responsible AI; and
(3) the trade-offs between utility and responsible AI principles (e.g., fairness, transparency, and reliability), and among the principles themselves, are very complex and should not be taken as given.

Discussing these three challenges paves the way for examining other less explored yet important problems, such as responsible model release and governance, the development of AI ethics principles and policies, and the gaps in responsible AI in industry.

Responsible model release and governance has been receiving growing attention from both industry and academia. It brings together the tools, solutions, practices, and people needed to govern AI systems across their life cycles. At this stage, some research suggests that released models should be accompanied by documentation detailing various characteristics of the system, e.g., what it does, how it works, and why it matters.

Current principles and policies for ethical AI practice are still too vaguely formulated to guide practice. They are also primarily defined by AI researchers and powerful people with mainstream populations in mind. It has been suggested that AI principles be redefined based on philosophical theories in applied ethics and informed by the inputs and values of diverse voices.

Companies see many potential benefits of developing responsible AI systems, such as increased market share and long-term profitability. Yet they lack the knowledge of how to cross the “Responsible AI Gap” between principles and tangible actions. Tech companies need to examine every aspect of their end-to-end AI systems in order to achieve a transformational impact on society.

4.4.2. Additional Readings
• Makhlouf, K., Zhioua, S., and Palamidessi, C. (2020). Survey on causal-based machine learning fairness notions. arXiv preprint arXiv:2010.09553.
• Chen, J., Dong, H., Wang, X., Feng, F., Wang, M., and He, X. (2020). Bias and debias in recommender system: A survey and future directions. arXiv preprint arXiv:2010.03240.
• Kleinberg, J. (2018, June). Inherent trade-offs in algorithmic fairness. In Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems, pp. 40–40.
• Friedler, S. A., Scheidegger, C., and Venkatasubramanian, S. (2021). The (im)possibility of fairness: Different value systems require different mechanisms for fair decision making. Communications of the ACM, 64(4), 136–143.
• Dutta, S., Wei, D., Yueksel, H., Chen, P. Y., Liu, S., and Varshney, K. (2020, November). Is there a trade-off between fairness and accuracy? A perspective using mismatched hypothesis testing. In International Conference on Machine Learning, pp. 2803–2813, PMLR.
• Semenova, L., Rudin, C., and Parr, R. (2022, June). On the existence of simpler machine learning models. In 2022 ACM Conference on Fairness, Accountability, and Transparency, pp. 1827–1858.
• Salamatian, S., Calmon, F. P., Fawaz, N., Makhdoumi, A., and Médard, M. (2020). Privacy-utility tradeoff and privacy funnel. Unpublished preprint.
• Agarwal, S. (2020). Trade-offs between fairness, interpretability, and privacy in machine learning (Master’s thesis, University of Waterloo).
Bibliography
Abdalla, M. and Abdalla, M. (2021). The grey hoodie project: Big tobacco, big tech, and the threat on academic integrity, in Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, pp. 287–297.
Agarwal, A., Zaitsev, I., Wang, X., Li, C., Najork, M., and Joachims, T. (2019). Estimating position bias without intrusive interventions, in Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 474–482.
Agarwal, S. (2020). Trade-offs between Fairness, Interpretability, and Privacy in Machine Learning, Master’s thesis, University of Waterloo.
Al-Qurishi, M., Al-Rakhami, M., Alamri, A., Alrubaian, M., Rahman, S. M. M., and Hossain, M. S. (2017). Sybil defense techniques in online social networks: a survey, IEEE Access 5, pp. 1200–1219.
Al-Rubaie, M. and Chang, J. M. (2019). Privacy-preserving machine learning: Threats and solutions, IEEE Security & Privacy 17, 2, pp. 49–58.
Angelino, E., Larus-Stone, N., Alabi, D., Seltzer, M., and Rudin, C. (2017). Learning certifiably optimal rule lists for categorical data, arXiv preprint arXiv:1704.01701.
Arjovsky, M., Bottou, L., Gulrajani, I., and Lopez-Paz, D. (2019). Invariant risk minimization, arXiv preprint arXiv:1907.02893.
Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., and Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation, PloS one 10, 7, p. e0130140.
Backstrom, L., Dwork, C., and Kleinberg, J. (2007). Wherefore art thou r3579x? Anonymized social networks, hidden patterns, and structural steganography, in Proceedings of the 16th International Conference on World Wide Web, pp. 181–190.
Baeza-Yates, R. (2018). Bias on the web, Communications of the ACM 61, 6, pp. 54–61.
Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473.
Beale, N., Battey, H., Davison, A. C., and MacKay, R. S. (2019). An unethical optimization principle, arXiv preprint arXiv:1911.05116.
Beaudouin, V., Bloch, I., Bounie, D., Clémençon, S., d’Alché-Buc, F., Eagan, J., Maxwell, W., Mozharovskyi, P., and Parekh, J. (2020). Flexible and context-specific AI explainability: A multidisciplinary approach, arXiv preprint arXiv:2003.07703.
Beigi, G., Shu, K., Zhang, Y., and Liu, H. (2018). Securing social media user data: An adversarial approach, in Proceedings of the 29th on Hypertext and Social Media, pp. 165–173.
Bickel, P. J., Hammel, E. A., and O’Connell, J. W. (1975). Sex bias in graduate admissions: Data from Berkeley: Measuring bias is harder than is usually assumed, and the evidence is sometimes contrary to expectation, Science 187, 4175, pp. 398–404.
Blyth, C. R. (1972). On Simpson’s paradox and the sure-thing principle, Journal of the American Statistical Association 67, 338, pp. 364–366.
Bolukbasi, T., Chang, K.-W., Zou, J. Y., Saligrama, V., and Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings, Advances in Neural Information Processing Systems 29.
Boulemtafes, A., Derhab, A., and Challal, Y. (2020). A review of privacy-preserving techniques for deep learning, Neurocomputing 384, pp. 21–45.
Breiman, L. (2001). Random forests, Machine Learning 45, 1, pp. 5–32.
Brickell, J. and Shmatikov, V. (2008). The cost of privacy: destruction of data-mining utility in anonymized data publishing, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 70–78.
Cabañas, J. G., Cuevas, Á., Arrate, A., and Cuevas, R. (2020). Does Facebook use sensitive data for advertising purposes? Communications of the ACM 64, 1, pp. 62–69.
Cappelen, A. W., Konow, J., Sørensen, E. Ø., and Tungodden, B. (2013). Just luck: An experimental study of risk-taking and fairness, American Economic Review 103, 4, pp. 1398–1413.
Carey, A. N. and Wu, X. (2022). The causal fairness field guide: Perspectives from social and formal sciences, Frontiers in Big Data 5, p. 892837.
Carroll, A. B. et al. (1991). The pyramid of corporate social responsibility: Toward the moral management of organizational stakeholders, Business Horizons 34, 4, pp. 39–48.
Carvalho, D. V., Pereira, E. M., and Cardoso, J. S. (2019). Machine learning interpretability: A survey on methods and metrics, Electronics 8, 8, p. 832.
Caton, S. and Haas, C. (2020). Fairness in machine learning: A survey, arXiv preprint arXiv:2010.04053.
Chattopadhyay, A., Manupriya, P., Sarkar, A., and Balasubramanian, V. N. (2019). Neural network attributions: A causal perspective, in ICML (PMLR), pp. 981–990.
Chen, R. T., Li, X., Grosse, R. B., and Duvenaud, D. K. (2018). Isolating sources of disentanglement in variational autoencoders, Advances in Neural Information Processing Systems 31.
Chen, S. and Zhou, S. (2013). Recursive mechanism: towards node differential privacy and unrestricted joins, in Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 653–664.
Cheng, L., Guo, R., Shu, K., and Liu, H. (2021a). Causal understanding of fake news dissemination on social media, in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 148–157.
Cheng, L., Guo, R., Silva, Y. N., Hall, D., and Liu, H. (2021b). Modeling temporal patterns of cyberbullying detection with hierarchical attention networks, ACM/IMS Transactions on Data Science 2, 2, pp. 1–23.
Cheng, L., Mosallanezhad, A., Silva, Y., Hall, D., and Liu, H. (2021c). Mitigating bias in session-based cyberbullying detection: A non-compromising approach, in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 2158–2168.
Cheng, L., Varshney, K. R., and Liu, H. (2021d). Socially responsible AI algorithms: Issues, purposes, and challenges, Journal of Artificial Intelligence Research 71, pp. 1137–1181.
Cheng, L., Guo, R., Silva, Y., Hall, D., and Liu, H. (2019a). Hierarchical attention networks for cyberbullying detection on the Instagram social network, in Proceedings of the 2019 SIAM International Conference on Data Mining (SIAM), pp. 235–243.
Cheng, L., Li, J., Silva, Y. N., Hall, D. L., and Liu, H. (2019b). XBully: Cyberbullying detection within a multi-modal context, in Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 339–347.
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., and Crawford, K. (2021). Datasheets for datasets, Communications of the ACM 64, 12, pp. 86–92.
Cheng, L., Mosallanezhad, A., Silva, Y. N., Hall, D., and Liu, H. (2022). Bias mitigation for toxicity detection via sequential decisions, in The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), Madrid, Spain, 2022.
Clauset, A., Moore, C., and Newman, M. E. (2008). Hierarchical structure and the prediction of missing links in networks, Nature 453, 7191, pp. 98–101.
Cohen, G. A. (1989). On the currency of egalitarian justice, Ethics 99, 4, pp. 906–944.
Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., and Tesconi, M. (2017). The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race, in Proceedings of the 26th International Conference on World Wide Web Companion, pp. 963–972.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database, in 2009 IEEE Conference on Computer Vision and Pattern Recognition (IEEE), pp. 248–255.
Dignum, V. (2019). Responsible Artificial Intelligence: How to Develop and Use AI in a Responsible Way (Springer Nature).
Dinakar, K., Jones, B., Havasi, C., Lieberman, H., and Picard, R. (2012). Common sense reasoning for detection, prevention, and mitigation of cyberbullying, ACM Transactions on Interactive Intelligent Systems (TiiS) 2, 3, p. 18.
Du, M., Liu, N., and Hu, X. (2019). Techniques for interpretable machine learning, Communications of the ACM 63, 1, pp. 68–77.
Dutta, S., Wei, D., Yueksel, H., Chen, P.-Y., Liu, S., and Varshney, K. (2020). Is there a trade-off between fairness and accuracy? A perspective using mismatched hypothesis testing, in International Conference on Machine Learning (PMLR), pp. 2803–2813.
Dwork, C. (2008). Differential privacy: A survey of results, in International Conference on Theory and Applications of Models of Computation (Springer), pp. 1–19.
Dwork, C., Hardt, M., Pitassi, T., Reingold, O., and Zemel, R. (2012). Fairness through awareness, in Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pp. 214–226.
Dwork, C., McSherry, F., Nissim, K., and Smith, A. (2006). Calibrating noise to sensitivity in private data analysis, in Theory of Cryptography Conference (Springer), pp. 265–284.
Dworkin, R. (2002). Sovereign virtue: The Theory and Practice of Equality (Harvard University Press).
Ensign, D., Friedler, S. A., Neville, S., Scheidegger, C., and Venkatasubramanian, S. (2018). Runaway feedback loops in predictive policing, in Conference on Fairness, Accountability and Transparency (PMLR), pp. 160–171.
Fang, Z., Agarwal, A., and Joachims, T. (2019). Intervention harvesting for context-dependent examination-bias estimation, in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 825–834.
Feige, I. (2019). https://faculty.ai/blog/what-is-ai-safety/
Finlayson, S. G., Subbaswamy, A., Singh, K., Bowers, J., Kupke, A., Zittrain, J., Kohane, I. S., and Saria, S. (2021). The clinician and dataset shift in artificial intelligence, The New England Journal of Medicine 385, 3, p. 283.
Fisher, A., Rudin, C., and Dominici, F. (2019). All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously, Journal of Machine Learning Research 20, 177, pp. 1–81.
Fong, R. C. and Vedaldi, A. (2017). Interpretable explanations of black boxes by meaningful perturbation, in Proceedings of the IEEE International Conference on Computer Vision, pp. 3429–3437.
Ford, M. (2015). Rise of the Robots: Technology and the Threat of a Jobless Future (Basic Books).
Freitas, A. A. (2014). Comprehensible classification models: A position paper, ACM SIGKDD Explorations Newsletter 15, 1, pp. 1–10.
Friedler, S. A., Scheidegger, C., and Venkatasubramanian, S. (2021). The (im)possibility of fairness: Different value systems require different mechanisms for fair decision making, Communications of the ACM 64, 4, pp. 136–143.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine, Annals of Statistics, pp. 1189–1232.
Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., and Lempitsky, V. (2016). Domain-adversarial training of neural networks, The Journal of Machine Learning Research 17, 1, pp. 2096–2030.
Gershgorn, D. (2019). https://www.popsci.com/nsas-skynet-might-not-be-able-to-tell-what-makes-terrorist/
Getoor, L. (2019). Responsible data science, in Big Data (IEEE), pp. 1–1.
Geyik, S. C., Ambler, S., and Kenthapadi, K. (2019). Fairness-aware ranking in search & recommendation systems with application to LinkedIn talent search, in KDD, pp. 2221–2231.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets, Advances in Neural Information Processing Systems 27.
Grgic-Hlaca, N., Zafar, M. B., Gummadi, K. P., and Weller, A. (2016). The case for process fairness in learning: Feature selection for fair decision making, in NIPS Symposium on Machine Learning and the Law, Vol. 1, p. 2.
Guerraoui, R., Kermarrec, A.-M., and Taziki, M. (2017). The utility and privacy effects of a click, in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 665–674.
Gulrajani, I. and Lopez-Paz, D. (2020). In search of lost domain generalization, arXiv preprint arXiv:2007.01434.
Guo, R., Cheng, L., Li, J., Hahn, P. R., and Liu, H. (2020). A survey of learning causality with data: Problems and methods, ACM Computing Surveys (CSUR) 53, 4, pp. 1–37.
Hardt, M., Price, E., and Srebro, N. (2016). Equality of opportunity in supervised learning, Advances in Neural Information Processing Systems 29.
Hartigan, J. A. and Wong, M. A. (1979). Algorithm AS 136: A k-means clustering algorithm, Journal of the Royal Statistical Society. Series C (Applied Statistics) 28, 1, pp. 100–108.
Hay, M., Rastogi, V., Miklau, G., and Suciu, D. (2009). Boosting the accuracy of differentially-private histograms through consistency, arXiv preprint arXiv:0904.0942.
Hermann, K. M., Kocisky, T., Grefenstette, E., Espeholt, L., Kay, W., Suleyman, M., and Blunsom, P. (2015). Teaching machines to read and comprehend, Advances in Neural Information Processing Systems 28.
Hosseinmardi, H., Mattson, S. A., Rafiq, R. I., Han, R., Lv, Q., and Mishra, S. (2015). Detection of cyberbullying incidents on the Instagram social network, arXiv preprint arXiv:1503.03909.
Hu, H., Cheng, L., Vap, J. P., and Borowczak, M. (2022). Learning privacy-preserving graph convolutional network with partially observed sensitive attributes, in Proceedings of the ACM Web Conference 2022, pp. 3552–3561.
Jabbari, S., Ou, H.-C., Lakkaraju, H., and Tambe, M. (2020). An empirical study of the trade-offs between interpretability and fairness, in ICML 2020 Workshop on Human Interpretability in Machine Learning.
Ji, S., Li, W., Srivatsa, M., and Beyah, R. (2014). Structural data de-anonymization: Quantification, practice, and implications, in Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pp. 1040–1053.
Joachims, T., Swaminathan, A., and Schnabel, T. (2017). Unbiased learning-to-rank with biased feedback, in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 781–789.
Joachims, T., London, B., Su, Y., Swaminathan, A., and Wang, L. (2021). Recommendations as treatments, AI Magazine 42, 3, pp. 19–30.
Kádár, A., Chrupala, G., and Alishahi, A. (2017). Representation of linguistic form and function in recurrent neural networks, Computational Linguistics 43, 4, pp. 761–780.
Kamishima, T., Akaho, S., Asoh, H., and Sakuma, J. (2012). Fairness-aware classifier with prejudice remover regularizer, in Joint European Conference on Machine Learning and Knowledge Discovery in Databases (Springer), pp. 35–50.
Karpathy, A., Johnson, J., and Fei-Fei, L. (2015). Visualizing and understanding recurrent networks, arXiv preprint arXiv:1506.02078.
Karwa, V., Raskhodnikova, S., Smith, A., and Yaroslavtsev, G. (2011). Private analysis of graph structure, Proceedings of the VLDB Endowment 4, 11, pp. 1146–1157.
Kasiviswanathan, S. P., Nissim, K., Raskhodnikova, S., and Smith, A. (2013). Analyzing graphs with node differential privacy, in Theory of Cryptography Conference (Springer), pp. 457–476.
Kaushik, D., Hovy, E., and Lipton, Z. (2019). Learning the difference that makes a difference with counterfactually-augmented data, in International Conference on Learning Representations.
Khademi, A., Lee, S., Foley, D., and Honavar, V. (2019). Fairness in algorithmic decision making: An excursion through the lens of causality, in The World Wide Web Conference, pp. 2907–2914.
Kifer, D. and Machanavajjhala, A. (2011). No free lunch in data privacy, in Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, pp. 193–204.
Kim, B., Khanna, R., and Koyejo, O. O. (2016). Examples are not enough, learn to criticize! Criticism for interpretability, Advances in Neural Information Processing Systems 29.
Kim, J. Y., Ortiz, C., Nam, S., Santiago, S., and Datta, V. (2020). Intersectional bias in hate speech and abusive language datasets, in ICWSM 2020 Data Challenge Workshop.
Kipf, T. N. and Welling, M. (2016). Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv:1609.02907.
Kleinberg, J., Mullainathan, S., and Raghavan, M. (2016). Inherent trade-offs in the fair determination of risk scores, arXiv preprint arXiv:1609.05807.
Kudugunta, S. and Ferrara, E. (2018). Deep neural networks for bot detection, Information Sciences 467, pp. 312–322.
Kusner, M. J., Loftus, J., Russell, C., and Silva, R. (2017). Counterfactual fairness, Advances in Neural Information Processing Systems 30.
Li, D., Yang, Y., Song, Y.-Z., and Hospedales, T. M. (2018). Learning to generalize: Meta-learning for domain generalization, in Thirty-Second AAAI Conference on Artificial Intelligence.
Li, J., Cheng, K., Wang, S., Morstatter, F., Trevino, R. P., Tang, J., and Liu, H. (2017). Feature selection: A data perspective, CSUR 50, 6, pp. 1–45.
Li, N., Li, T., and Venkatasubramanian, S. (2007). t-closeness: Privacy beyond k-anonymity and l-diversity, in 2007 IEEE 23rd International Conference on Data Engineering (IEEE), pp. 106–115.
Li, T. and Li, N. (2009). On the tradeoff between privacy and utility in data publishing, in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 517–526.
Linardatos, P., Papastefanopoulos, V., and Kotsiantis, S. (2020). Explainable AI: A review of machine learning interpretability methods, Entropy 23, 1, p. 18.
Lipton, Z., Wang, Y.-X., and Smola, A. (2018). Detecting and correcting for label shift with black box predictors, in International Conference on Machine Learning (PMLR), pp. 3122–3130.
Liu, C., Chakraborty, S., and Mittal, P. (2016). Dependence makes you vulnerable: Differential privacy under dependent tuples, in NDSS, Vol. 16, pp. 21–24.
Liu, H., Wang, Y., Fan, W., Liu, X., Li, Y., Jain, S., Liu, Y., Jain, A. K., and Tang, J. (2021). Trustworthy AI: A computational perspective, arXiv preprint arXiv:2107.06641.
Liu, K. and Terzi, E. (2008). Towards identity anonymization on graphs, in Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 93–106.
Liu, L. T., Dean, S., Rolf, E., Simchowitz, M., and Hardt, M. (2018). Delayed impact of fair machine learning, in International Conference on Machine Learning (PMLR), pp. 3150–3158.
Liu, Y. and Lapata, M. (2018). Learning structured text representations, Transactions of the Association for Computational Linguistics 6, pp. 63–75.
Loftus, J. R., Russell, C., Kusner, M. J., and Silva, R. (2018). Causal reasoning for algorithmic fairness, arXiv preprint arXiv:1805.05859.
Lu, K., Mardziel, P., Wu, F., Amancharla, P., and Datta, A. (2020). Gender bias in neural natural language processing, in Logic, Language, and Security (Springer), pp. 189–202.
Luong, M.-T., Pham, H., and Manning, C. D. (2015). Effective approaches to attention-based neural machine translation, arXiv preprint arXiv:1508.04025.
Maaten, L. v. d. and Hinton, G. (2008). Visualizing data using t-SNE, JMLR 9, Nov, pp. 2579–2605.
Machanavajjhala, A., Kifer, D., Gehrke, J., and Venkitasubramaniam, M. (2007). l-diversity: Privacy beyond k-anonymity, ACM Transactions on Knowledge Discovery from Data (TKDD) 1, 1, pp. 3–es.
Mackey, E., Elliot, M., and O’Hara, K. (2016). The Anonymisation Decision-Making Framework (UKAN Publications).
Makhlouf, K., Zhioua, S., and Palamidessi, C. (2020). Survey on causal-based machine learning fairness notions, arXiv preprint arXiv:2010.09553.
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., and Galstyan, A. (2021). A survey on bias and fairness in machine learning, ACM Computing Surveys (CSUR) 54, 6, pp. 1–35.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information, Psychological Review 63, 2, p. 81.
Miller, T. (2019). Explanation in artificial intelligence: Insights from the social sciences, Artificial Intelligence 267, pp. 1–38.
Miller, C. and Coldicutt, R. (2019). People, power and technology: The tech workers’ view, https://doteveryone.org.uk/wp-content/uploads/2019/04/PeoplePowerTech_Doteveryone_May2019.pdf
Mir, D. and Wright, R. N. (2012). A differentially private estimator for the stochastic Kronecker graph model, in Proceedings of the 2012 Joint EDBT/ICDT Workshops, pp. 167–176.
Molnar, C. (2020). Interpretable Machine Learning (Lulu.com).
Muandet, K., Balduzzi, D., and Schölkopf, B. (2013). Domain generalization via invariant feature representation, in International Conference on Machine Learning (PMLR), pp. 10–18.
Narayanan, A. and Shmatikov, V. (2009). De-anonymizing social networks, in 2009 30th IEEE Symposium on Security and Privacy (IEEE), pp. 173–187.
Newman, M. (2018). Networks (Oxford University Press).
Nielsen, A. (2020). Practical Fairness (O’Reilly Media).
Nissim, K., Raskhodnikova, S., and Smith, A. (2007). Smooth sensitivity and sampling in private data analysis, in Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, pp. 75–84.
Olah, C., Mordvintsev, A., and Schubert, L. (2017). Feature visualization, Distill 2, 11, p. e7.
Pearl, J. (1993). [Bayesian analysis in expert systems]: Comment: graphical models, causality and intervention, Statistical Science 8, 3, pp. 266–269.
Pearl, J. (2009). Causality (Cambridge University Press).
Pearl, J. (2018). Theoretical impediments to machine learning with seven sparks from the causal revolution, arXiv preprint arXiv:1801.04016.
Pearl, J. (2022). Direct and indirect effects, in Probabilistic and Causal Inference: The Works of Judea Pearl, pp. 373–392.
Pearl, J. and Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect (Basic Books).
Pedarsani, P. and Grossglauser, M. (2011). On the privacy of anonymized networks, in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235–1243.
Pennebaker, J. W., Francis, M. E., and Booth, R. J. (2001). Linguistic Inquiry and Word Count: LIWC (Mahwah: Lawrence Erlbaum Associates).
Peters, J., Bühlmann, P., and Meinshausen, N. (2016). Causal inference by using invariant prediction: Identification and confidence intervals, arXiv preprint arXiv:1501.01332.
Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018). Deep contextualized word representations, arXiv preprint arXiv:1802.05365.
Pleiss, G., Raghavan, M., Wu, F., Kleinberg, J., and Weinberger, K. Q. (2017). On fairness and calibration, Advances in Neural Information Processing Systems 30.
Quinlan, J. R. (1987). Simplifying decision trees, International Journal of Man-Machine Studies 27, 3, pp. 221–234.
Rahimian, H. and Mehrotra, S. (2019). Distributionally robust optimization: A review, arXiv preprint arXiv:1908.05659.
Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). “Why should I trust you?” Explaining the predictions of any classifier, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144.
Ribeiro, M. T., Singh, S., and Guestrin, C. (2018). Anchors: High-precision model-agnostic explanations, in Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.
Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects, Biometrika 70, 1, pp. 41–55.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies, Journal of Educational Psychology 66, 5, p. 688.
Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead, Nature Machine Intelligence 1, 5, pp. 206–215.
Samarati, P. and Sweeney, L. (1998). Generalizing data to provide anonymity when disclosing information, in PODS, Vol. 98, pp. 10–1145.
Schnabel, T., Swaminathan, A., Singh, A., Chandak, N., and Joachims, T. (2016). Recommendations as treatments: Debiasing learning and evaluation, in ICML.
Schölkopf, B. (2022). Causality for machine learning, in Probabilistic and Causal Inference: The Works of Judea Pearl, pp. 765–804.
Schwab, K. (2021). ‘This is bigger than just Timnit’: How Google tried to silence a critic and ignited a movement, https://www.fastcompany.com/90608471/timnit-gebru-google-ai-ethics-equitable-tech-movement
Selbst, A. D., Boyd, D., Friedler, S. A., Venkatasubramanian, S., and Vertesi, J. (2019). Fairness and abstraction in sociotechnical systems, in Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 59–68.
Semenova, L., Rudin, C., and Parr, R. (2019). A study in Rashomon curves and volumes: A new perspective on generalization and model simplicity in machine learning, arXiv preprint arXiv:1908.01755.
Serrano, S. and Smith, N. A. (2019). Is attention interpretable? arXiv preprint arXiv:1906.03731.
Shu, K., Cui, L., Wang, S., Lee, D., and Liu, H. (2019). dEFEND: Explainable fake news detection, in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 395–405.
Simonyan, K., Vedaldi, A., and Zisserman, A. (2013). Deep inside convolutional networks: Visualising image classification models and saliency maps, arXiv preprint arXiv:1312.6034. Singh, R., Vatsa, M., and Ratha, N. (2021). Trustworthy AI, in 8th ACM IKDD CODS and 26th COMAD, pp. 449–453. Springenberg, J. T., Dosovitskiy, A., Brox, T., and Riedmiller, M. (2014). Striving for simplicity: The all convolutional net, arXiv preprint arXiv:1412.6806. Srivatsa, M. and Hicks, M. (2012). Deanonymizing mobility traces: Using social network as a side-channel, in Proceedings of the 2012 ACM Conference on Computer and Communications Security, pp. 628–637. S¨ uhr, T., Hilgard, S., and Lakkaraju, H. (2020). Does fair ranking improve minority outcomes? Understanding the interplay of human and algorithmic biases in online hiring. Sweeney, L. (2002). k-anonymity: A model for protecting privacy, International Journal of Uncertainty, Fuzziness and KnowledgeBased Systems 10, 05, pp. 557–570. Talwar, S., Dhir, A., Kaur, P., Zafar, N., and Alrasheedy, M. (2019). Why do people share fake news? Associations between the dark side of social media use and fake news sharing behavior, JRCS 51, pp. 72–82. Tang, J. and Liu, H. (2015). Trust in social media, Synthesis Lectures on Information Security, Privacy, & Trust 10, 1, pp. 1–129. Tang, J., Qu, M., and Mei, Q. (2015). Pte: Predictive text embedding through large-scale heterogeneous text networks, in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM), pp. 1165–1174. Thiebes, S., Lins, S., and Sunyaev, A. (2020). Trustworthy artificial intelligence, Electronic Markets, pp. 1–18. Vandewiele, G., Janssens, O., Ongenae, F., De Turck, F., and Van Hoecke, S. (2016). Genesim: Genetic extraction of a single, interpretable model, arXiv preprint arXiv:1611.05722. Varshney, K. R. (2022). Trustworthy machine learning, Chappaqua, NY, USA: Independently Published. 
Wachter, S., Mittelstadt, B., and Russell, C. (2017). Counterfactual explanations without opening the black box: Automated decisions and the GDPR, Harv. JL & Tech. 31, p. 841.
Wang, J., Lan, C., Liu, C., Ouyang, Y., Qin, T., Lu, W., Chen, Y., Zeng, W., and Yu, P. (2022). Generalizing to unseen domains: A survey on domain generalization, IEEE Transactions on Knowledge and Data Engineering.
Wang, Y., Wu, X., and Wu, L. (2013). Differential privacy preserving spectral graph analysis, in Pacific-Asia Conference on Knowledge Discovery and Data Mining (Springer), pp. 329–340.
Wiener, N. (1948). Cybernetics: Or control and communication in the animal and the machine.
Wikipedia (2021a). https://en.wikipedia.org/w/index.php?title=Robustness_(computer_science)&oldid=1009774103, page Version ID: 1009774103.
Wikipedia (2021b). https://en.wikipedia.org/w/index.php?title=Facebook%E2%80%93Cambridge_Analytica_data_scandal&oldid=1035933869, page Version ID: 1035933869.
Wold, S., Esbensen, K., and Geladi, P. (1987). Principal component analysis, Chemometrics and Intelligent Laboratory Systems 2, 1-3, pp. 37–52.
Xiao, Q., Chen, R., and Tan, K.-L. (2014). Differentially private network data release via structural inference, in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 911–920.
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., and Bengio, Y. (2015). Show, attend and tell: Neural image caption generation with visual attention, in International Conference on Machine Learning (PMLR), pp. 2048–2057.
Yang, G., Ye, X., Fang, X., Wu, R., and Wang, L. (2020). Associated attribute-aware differentially private data publishing via microaggregation, IEEE Access 8, pp. 79158–79168.
Yeung, K., Howes, A., and Pogrebna, G. (2019). AI governance by human rights-centred design, deliberation and oversight: An end to ethics washing, The Oxford Handbook of AI Ethics (Oxford University Press).
Yuan, M., Chen, L., and Yu, P. S. (2010). Personalized privacy protection in social networks, Proceedings of the VLDB Endowment 4, 2, pp. 141–150.
Zafarani, R., Abbasi, M. A., and Liu, H. (2014). Social Media Mining: An Introduction (Cambridge University Press).
Zhang, B. H., Lemoine, B., and Mitchell, M. (2018a). Mitigating unwanted biases with adversarial learning, in Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pp. 335–340.
Zhang, Q., Wu, Y. N., and Zhu, S.-C. (2018b). Interpretable convolutional neural networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8827–8836.
Zhang, W., Tople, S., and Ohrimenko, O. (2020). Dataset-level attribute leakage in collaborative learning, arXiv preprint arXiv:2006.07267.
Zhao, Q. and Hastie, T. (2021). Causal interpretations of black-box models, Journal of Business & Economic Statistics 39, 1, pp. 272–281.
Zhou, B. and Pei, J. (2008). Preserving privacy in social networks against neighborhood attacks, in 2008 IEEE 24th International Conference on Data Engineering (IEEE), pp. 506–515.
Index

A
activation maximization, 44
AI responsibility pyramid, 4
attributed modality hotspot, 94
average causal effect, 139
average treatment effect, 126

B
back-door path, 125
background comparison metric, 144
Bayes risk, 77

C
causal attribution, 138
causal discovery, 124
causal effect estimation, 126
causal features, 82
causal inference, 124
causal model, 124
causal path, 67
causality, 124
  identification, 124
  consistency, 127
cause, 67
compound shift, 68
conditional average treatment effect, 126
conditional shift, 68
confounder, 107
counterfactual, 126
counterfactual data augmentation, 136
counterfactual explanation, 139
counterfactual fairness, 131
covariate shift, 68
cyberbullying, 90

D
data privacy, 46
defining set, 114
dendrogram, 64
differential privacy, 51
  -dependent differential privacy, 53
  dependent differential privacy, 52
  global sensitivity, 51
  local sensitivity, 63
directed acyclic graphs, 67
distributionally robust optimization, 81
domain adaptation, 70
  source domain, 70
  target domain, 70
domain generalization, 70

E
explainability, 31
explanation, 31
explicit identifier, 47

F
fairness, 16
  group fairness, 17
    calibration, 19
    demographic/statistical parity, 17
    disparate impact, 17
    equal opportunity, 19
    equalized odds, 17
    independence, 17
    separation, 18
    sufficiency, 19
  in-processing approaches, 24
    adversarial debiasing, 25
    prejudice remover, 24
  individual fairness, 20
    counterfactual fairness, 21
    fairness through unawareness, 21
  post-processing approaches, 27
  pre-processing approaches, 22
    data augmentation, 23
    label altering, 23
fake news, 104
false negative equality difference, 118
false positive equality difference, 118
false positive rate, 18

G
gendered words, 115
generative adversarial networks, 25
  equilibrium, 75
graph anonymization, 59
  k-anonymity, 60
  differential privacy, 62
    edge-differential privacy, 63
    node-differential privacy, 64
graph de-anonymization, 53
  seed-based de-anonymization, 55
  seed-free de-anonymization, 57
  social graph de-anonymization attack, 55
  social media adversarial attack, 58

H
hierarchical random graph, 64

I
ignorability, see also unconfoundedness, 128
implicit feedback, 109
individual responsibility, 4
individual treatment effect, 127
informing dimension, 103
interpretability, 30
  in-model interpretability, 33
  intrinsic interpretability, see also in-model interpretability, 33
    attention, 34
    decision trees, 36
    disentangled representation, 37
    intrinsic global interpretability, 36
    intrinsic local interpretability, 34
    logistic regression, 36
  post-hoc global interpretability, 40
    feature visualization, 44
    model-agnostic, 41
    model-specific, 42
  post-hoc local interpretability, 37
    back-propagation-based, 39
    local interpretable model-agnostic explanation (LIME), 38
    mask-perturbation-based, 40
    model agnostic, 38
    model-specific, 39
    perturbation-based interpretability, 39
  pre-model interpretability, 33
interpretation, 31
intervention, 124
interventional distribution, 125
interventional fairness, 129
invariant prediction, 82
inverse propensity scoring, 134

L
label shift, 67
Lipschitz constant, 65

M
maximum mean discrepancy, 73
median absolute deviation, 140
mediator, 125
multi-modal cyberbullying detection, 92

N
natural direct discrimination, 130
natural indirect discrimination, 130
node kernel, 95
non-gendered words, 115

P
pairwise comparison metric, 144
partial dependence plots, 140
partial effect, 140
path-specific causal fairness, 131
path-specific counterfactual fairness, 132
path-specific effect, 130
potential outcome, 126
privacy-preserving data mining, 47
privacy-preserving data publishing, 47
propensity score, 133
protecting dimension, 89

Q
quasi-identifier, 47

R
Rashomon ratio, 153
Rashomon set, 153

S
sample selection bias, see also covariate shift, 71
sensitive attribute, 47
sensitive trigger, 116
sequential Markov decision process (MDP), 119
social responsibility, 4
socially responsible AI, 4
socially responsible AI algorithms, 7
stable unit treatment value assumption (SUTVA), 127
strong ignorability, 128
structural causal model, 124
structural equation model, 83
syntactic anonymity, 47
  k-anonymity, 47
  l-anonymity, 49
  t-closeness, 49

T
total causal fairness, 129
true positive rate, 18

U
unconfoundedness, 83