Sociological Foundations of Computational Social Science (Translational Systems Sciences, 40) [2024 ed.]. ISBN 9819994314, 9789819994311

This book provides solid sociological foundations to computational social science (CSS). CSS is an emerging research field.


English. 133 pages [130]. 2024.


Table of contents :
Preface
Acknowledgement
Contents
Chapter 1: Introduction
1.1 Backdrop and Purpose of the Book
1.2 Organization of the Book
References
Chapter 2: Sociological Foundations of Computational Social Science
2.1 Introduction: Computational Social Science and Sociology
2.2 Strength and Problems of Computational Social Science
2.3 How Can We Fill the Gap?
2.3.1 Agent-Based Modeling and Interpretation
2.3.2 Digital Data Analysis and Sociology
2.4 Conclusions: Toward Productive Collaboration Between Sociology and Computational Social Science
References
Chapter 3: Methodological Contributions of Computational Social Science to Sociology
3.1 Introduction
3.2 Machine Learning
3.2.1 The Culture and Logic of Machine Learning
3.2.2 Definition of Machine Learning
3.2.3 Prediction and Supervised Machine Learning
3.2.3.1 Sample Splitting
3.2.3.2 Bias-Variance Tradeoff
3.2.3.3 Three-Way Sample Splitting and K-Fold Cross-Validation
3.2.3.4 Regularization
3.2.4 Measurement, Discovery, and Unsupervised Machine Learning
3.3 Potential of Machine Learning
3.3.1 Breaking Away from the Deductive Model
3.3.2 The Machine Learning Paradigm as a Normative Epistemic Model
3.3.3 Machine Learning Model as a Human Decision and Cognitive Model
3.3.4 Challenges for Machine Learning
3.4 Conventional Methods of Statistical Analysis in Sociology
3.4.1 Aims of Quantitative Sociological Studies
3.4.2 Description
3.4.3 Prediction
3.4.4 Causal Inference
3.4.5 Problems of the Conventional Regression Model
3.5 Machine Learning in Sociology
3.5.1 Application of the Prediction Framework
3.5.2 Automatic Coding
3.6 Toward Further Applications in Sociology
3.6.1 Heterogeneity of Causal Effects
3.6.2 Scientific Regret Minimization Method
3.7 Conclusion
References
Chapter 4: Computational Social Science: A Complex Contagion
4.1 The First Wave
4.2 Agent-Based Modeling
4.3 Social Contagion
4.4 From Latte Liberals to Naked Emperors
4.5 The Second Wave
4.6 Structural Holes and Network Wormholes
4.7 Conclusion
References
Chapter 5: Model of Meaning
5.1 Introduction
5.2 Theories of Meaning in Sociology
5.2.1 Weber's Social Action
5.2.2 Schutz's Phenomenological Sociology
5.2.3 Bourdieu's Cognitive Sociology
5.2.4 White and DiMaggio's Sociology of Culture
5.2.5 In Summary: Semantic Models in Cultural Sociology Today
5.3 Preparatory Considerations on the Semantic Model
5.3.1 Function of Meaning: Prediction
5.3.2 Relationality of Meaning
5.3.3 Semantic Learning
5.4 Computational Linguistics Model
5.4.1 Topic Model
5.4.2 Word-Embedding Model
5.4.3 Topic Models and Word-Embedding Models
5.5 Language Model and Explanation of Actions
5.6 Conclusion
References
Chapter 6: Sociological Meaning of Contagion
6.1 Contagion as a Main Theme in Sociology
6.2 Complex Contagion
6.3 Role of Meaning and Interpretation in the Contagion Process
6.4 Big Data Analysis of Diffusions, Interpretation, and Meaning
6.5 Conclusion
References
Chapter 7: Polarization of Opinion
7.1 Background
7.1.1 The Concept of Opinion Polarization
7.2 The Mechanism of Opinion Polarization
7.2.1 Computational Social Science and Opinion Polarization
7.3 New Methodology and Data for Opinion Polarization Research
7.3.1 Network Analysis
7.3.2 Natural Language Processing
7.4 Digital Experiment
7.5 Discussion
References
Chapter 8: Coda
8.1 Revisiting the Relationship Between Computational Social Science and Sociology
8.2 Beyond the Deductive Approach
8.3 A New Way of Using Computational Linguistic Models in Sociology
8.4 What Is the Next Step?
References

Translational Systems Sciences  40

Yoshimichi Sato and Hiroki Takikawa, Editors

Sociological Foundations of Computational Social Science

Translational Systems Sciences Volume 40

Editors-in-Chief
Kyoichi Kijima, School of Business Management, Bandung Institute of Technology, Tokyo, Japan
Hiroshi Deguchi, Faculty of Commerce and Economics, Chiba University of Commerce, Tokyo, Japan

Editorial Board
Shingo Takahashi (Waseda University)
Hajime Kita (Kyoto University)
Toshiyuki Kaneda (Nagoya Institute of Technology)
Akira Tokuyasu (Hosei University)
Koichiro Hioki (Shujitsu University)
Yuji Aruka (Chuo University)
Kenneth Bausch (Institute for 21st Century Agoras)
Jim Spohrer (IBM Almaden Research Center)
Wolfgang Hofkirchner (Vienna University of Technology)
John Pourdehnad (University of Pennsylvania)
Mike C. Jackson (University of Hull)
Gary S. Metcalf (InterConnections, LLC)
Marja Toivonen (VTT Technical Research Centre of Finland)
Sachihiko Harashina (Chiba University of Commerce)
Keiko Yamaki (Shujitsu University)

Yoshimichi Sato • Hiroki Takikawa Editors

Sociological Foundations of Computational Social Science

Editors Yoshimichi Sato Faculty of Humanities Kyoto University of Advanced Science Kyoto, Japan

Hiroki Takikawa Graduate School of Humanities and Sociology The University of Tokyo Bunkyo-ku, Tokyo, Japan

ISSN 2197-8832, ISSN 2197-8840 (electronic)
Translational Systems Sciences
ISBN 978-981-99-9431-1, ISBN 978-981-99-9432-8 (eBook)
https://doi.org/10.1007/978-981-99-9432-8

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore.

Paper in this product is recyclable.

Preface

Computational social science has not fully shown its power in sociology. This is our motivation for publishing this book. It is true that intriguing articles using computational social science have been published in top journals of sociology, but it is another story whether computational social science has a dominant influence in sociology. To the best of our knowledge, it has not yet occupied a central position in sociology. Why not? We tried to answer this question in this book.

Our answer is that computational social science has not attacked central issues in sociology, meaning and interpretation in particular. If it gave answers to research questions that most sociologists deem important but that conventional sociological methods, such as statistical analysis of social survey data, have been unable to answer, computational social science would be more influential in sociology. However, this has not happened yet. This issue stems from two reasons. First, computational social scientists are not necessarily familiar with important sociological concepts and theories, so they tend to begin with available digital data, such as mobile data, without seriously thinking about how analyzing the data contributes to the advancement of the concepts and theories. Second, sociologists have not clearly formalized important concepts and theories in a way that would make it easy for computational social scientists to connect them with their analysis and substantively contribute to elaborating them.

In a sense, computational social science and sociology are in an unhappy relationship. If they collaborate efficiently with each other, sociology will reach a higher level. This book proposes ways to promote that collaboration. In particular, we focus on meaning and interpretation in several chapters and show how to incorporate them in analysis using computational social science. This is because, as mentioned above, they have been central issues in sociology. Thus, if they are properly incorporated in computational social scientific analysis, the results of the analysis will make a quantum leap in sociology, and sociologists will realize the true power of computational social science. As a result, computational social science and sociology will have a happy marriage.


We hope that readers of this book will find it important to realize the collaboration between computational social science and sociology, by the ways proposed in the book, so that sociology can jump to a higher stage with the help of computational social science.

Kyoto, Japan: Yoshimichi Sato
Bunkyo-ku, Tokyo, Japan: Hiroki Takikawa

Acknowledgement

This work was supported by JSPS KAKENHI Grant Number 21K18448. We appreciate the generous grant from the Japan Society for the Promotion of Science. We are also deeply grateful to Hiroshi Deguchi for his encouragement to publish this book.


Contents

1 Introduction (Yoshimichi Sato and Hiroki Takikawa) 1
2 Sociological Foundations of Computational Social Science (Yoshimichi Sato) 11
3 Methodological Contributions of Computational Social Science to Sociology (Hiroki Takikawa and Sho Fujihara) 23
4 Computational Social Science: A Complex Contagion (Michael W. Macy) 53
5 Model of Meaning (Hiroki Takikawa and Atsushi Ueshima) 65
6 Sociological Meaning of Contagion (Yoshimichi Sato) 91
7 Polarization of Opinion (Zeyu Lyu, Kikuko Nagayoshi, and Hiroki Takikawa) 101
8 Coda (Yoshimichi Sato and Hiroki Takikawa) 117

Chapter 1: Introduction
Yoshimichi Sato and Hiroki Takikawa

1.1 Backdrop and Purpose of the Book

Computational social science opened a new door to advancing social scientific studies in two ways. First, social scientists became able to conduct rigorous thought experiments using the technique of agent-based modeling, one of the two main pillars of computational social science. Agent-based modeling creates many actors, who are called agents in the technique, and they are assumed to decide, act, interact with other agents, and learn from their own experience and vicariously (Macy & Willer, 2002). What is most important in agent-based modeling is that experimenters, that is, model builders, do not order agents to behave in a certain way. Rather, agents voluntarily decide, act, interact, and learn. As a result of such voluntary behaviors, a social phenomenon emerges. What experimenters do is to set up an external environment in which agents behave and to observe which external environment creates which social phenomenon via interactions of agents. Because external environments and social phenomena exist at the macro-level and agents exist at the micro-level, agent-based modeling is an excellent tool by which social scientists can precisely study the micro-macro linkage (Coleman, 1990).

Macy and Sato (2002), for example, built an agent-based model to study why and how trust and cooperation propagate in society and the global market emerges. They assumed that the level of mobility of agents among local societies such as villages has an inverted U-shaped effect on the emergence of trust, cooperation, and the global market. To check the validity of their assumption, they set up a model in which some of the agents are assumed to move among local societies, become newcomers, and interact with local people. Then, local people learn to trust and cooperate with the newcomers and decide to leave their local societies, entering the global market. If the level of mobility is very low, local people do not have an opportunity to interact with newcomers and, therefore, cannot learn to trust and cooperate with them. If the level of mobility is very high, local societies become unstable because most of the agents leave their local societies and become newcomers; local people then lose the stable local societies in which they learn to trust and cooperate with newcomers. Only if the level of mobility is moderate do local people learn to trust and cooperate with newcomers and enter the global market, which leads to the propagation of trust and cooperation in society and the emergence of the global market.

In their model, the external environment is the level of mobility of agents, and the social phenomena to be explained are the propagation of trust and cooperation and the emergence of the global market at the societal level. Agents decide whether to trust and cooperate with other agents and whether to enter the global market at the micro-level. Then, their trusting behavior, cooperating behavior, and entry into the global market accumulate at the macro-level. As a result of the accumulation, trust and cooperation propagate at the macro-level, and the global market emerges. In this sense, Macy and Sato's (2002) study is a textbook example of the micro-macro linkage.
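To make the structure of such a model concrete, here is a minimal sketch in Python. It is not Macy and Sato's (2002) actual implementation: the interaction scheme, the learning rule, and all parameter values are illustrative assumptions, and the sketch shows only the macro-micro-macro loop (a macro mobility parameter shapes micro-level interaction and learning, which accumulate into a macro outcome), not their findings.

import random

N_AGENTS, N_VILLAGES, ROUNDS = 200, 10, 300

def run(mobility_rate, seed=0):
    """One simulation run; returns the share of agents entering the global market."""
    rng = random.Random(seed)
    village = [rng.randrange(N_VILLAGES) for _ in range(N_AGENTS)]
    trust = [0.5] * N_AGENTS                     # each agent's propensity to trust
    for _ in range(ROUNDS):
        # Macro parameter: some agents relocate and become newcomers elsewhere.
        for i in range(N_AGENTS):
            if rng.random() < mobility_rate:
                village[i] = rng.randrange(N_VILLAGES)
        # Micro level: random within-village encounters.
        members_by_village = {}
        for i, v in enumerate(village):
            members_by_village.setdefault(v, []).append(i)
        for members in members_by_village.values():
            rng.shuffle(members)
            for a, b in zip(members[::2], members[1::2]):
                cooperated = rng.random() < trust[a] and rng.random() < trust[b]
                for i in (a, b):                 # simple reinforcement learning
                    target = 1.0 if cooperated else 0.0
                    trust[i] += 0.05 * (target - trust[i])
    # Macro outcome: agents trusting enough to leave the village for the market.
    return sum(t > 0.7 for t in trust) / N_AGENTS

for m in (0.0, 0.05, 0.2, 0.8):
    print(f"mobility={m:.2f}  market entry rate={run(m):.2f}")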

Big data (or digital data) analysis is the other main pillar of computational social science.¹ This technique has radically changed the way social scientific studies are conducted. In conventional social scientific studies using empirical data, researchers conduct social surveys to collect the data they need for their studies. Social surveys come in various types: interview surveys, mail surveys, and web surveys. What is common to every type is that researchers design questionnaires to collect necessary information on randomly selected respondents. For example, if a researcher wants to know the relationship between social class and life satisfaction, he or she includes questions on respondents' occupation, education, income, and life satisfaction in the questionnaire and checks whether a positive relationship exists between the class variables (occupation, education, and income) and life satisfaction using correlation analysis and cross-tabulation analysis. In a sense, social scientists actively collect necessary information on respondents in social surveys.

In contrast, social scientists using big data collect information with limited availability in most cases.² For example, if we use mobility data of smartphone users before and after a lockdown caused by the COVID-19 pandemic, we know how the lockdown changed the mobility patterns of the users. However, because we cannot collect the users' opinions about the lockdown from the mobility data, we do not know whether their attitudes toward the lockdown affect their mobility patterns. Furthermore, because we cannot collect information on the social classes of the users from the mobility data, we do not know the differential effects of the lockdown on the users by social class.³ This incompleteness of available data (Salganik, 2018: Chapter 2) is one of the serious problems of big data compared with social survey data.

However, big data has advantages compensating for this disadvantage. As Salganik (2018: Chapter 2) points out, big data is literally big, always-on, and nonreactive. The bigness of big data has several advantages. Compared with social survey data, the most important advantage is that results of analysis are robust and stable even if the data is divided into many categories, because each category has enough samples for analysis. For example, when we create multiway tables using a social survey dataset, it often happens that many cells do not have enough samples for robust analysis. Big data analysis is exempt from this problem because we can collect as many samples as possible. Theoretically, of course, there is an upper limit to the size of big data, but our usual use of big data does not face this problem.

Always-on is also a strong advantage of big data. When social scientists want to collect information on temporal changes in characteristics of respondents such as income, marital status, and life satisfaction using social surveys, they conduct longitudinal data analysis. In the analysis they usually collect data on a regular basis, for example, every month or every year. Thus, in a sense, social survey data is sporadically on, so we do not know the temporal change in characteristics of respondents between two survey waves. In contrast, big data such as mobile data and Twitter data streams without break, so we can collect continuous time-series data on our target population.

Nonreactivity of big data means that collecting big data does not affect the behaviors of the target population. For example, even if people know that their mobile data is collected by their mobile phone company, they do not change their behaviors. They commute to their firms, go to restaurants and bars, and exercise at fitness clubs as usual. They tweet what they want to tweet. In contrast, respondents in social surveys tend to give socially desirable answers to questions asked by interviewers in an interview survey. For example, if a male respondent who supports the sexual division of labor is asked whether he supports it or not by a female interviewer, he probably answers that he does not support it. This social desirability distorts the distributions of variables used in the social survey, but big data does not suffer from this problem.

So far, we have observed the advantageous characteristics of computational social science, focusing on agent-based modeling and big data analysis. Because of these characteristics, computational social science has become influential and indispensable in social science, including sociology. However, it is another story whether it has answered core research questions in sociology. We argue that it has not necessarily answered them yet because most of the studies using it do not properly include meaning and interpretation of actors in their analysis.

Meaning and interpretation have been seriously studied by major sociologists since the early days of sociology. For example, Max Weber, a founding father of sociology, argued that social action is different from behavior (Weber, 1921-22). According to his conceptualization, a behavior becomes a social action if it has a meaning. In other words, a behavior has to be interpreted and given a subjective meaning by the actor and other actors (and the observer, that is, a sociologist observing the behavior) in order to become a social action. Since Max Weber, many giants in sociology have focused on meaning and interpretation as the main theme of their sociological studies: Alfred Schütz (1932), Peter Berger and Thomas Luckmann (1966), Herbert Blumer (1969), Erving Goffman (1959), and George Herbert Mead (1934), to name a few. Their sociological studies have attracted countless sociologists around the world. Thus, if it properly deals with meaning and interpretation, computational social science will enter the core of sociology and truly become indispensable to it. Furthermore, including meaning and interpretation in computational social science will enhance its analytical power.

Herein lies the purpose of this book. That is, we try to explore how computational social science and sociology should collaborate to advance sociological studies as well as computational social science. Each chapter in this book implicitly or explicitly shares this purpose from different perspectives.

¹ Big data and digital trace data are used interchangeably in this chapter.
² Online experiments can collect necessary information on participants. Thus, the discussion here is not applicable to online experiments.
³ The study of the spread of the coronavirus by Chang et al. (2021) is innovative because they succeeded in combining mobility data with data on socioeconomic status and race to predict that infection rates are higher among disadvantaged racial and socioeconomic groups.

1.2 Organization of the Book

Chapter 2, "Sociological Foundations of Computational Social Science" by Yoshimichi Sato, is literally the theoretical part of this book. He highly evaluates the strength of computational social science, consisting of big data (or digital trace) analysis and agent-based modeling. He argues that the three characteristics of big data pointed out by Salganik (2018), that is, bigness, always-on, and nonreactivity, allow sociologists to study social phenomena that they were unable to study with conventional social surveys. Agent-based modeling also opened a door to studying micro-macro linkages from new perspectives. It rigorously explains social phenomena as the accumulation of behaviors of many agents, which was impossible with conventional sociological approaches.

However, computational social science is not exempt from some problems when it is applied to core research questions in sociology. Most studies in computational social science deal with behaviors of people and miss meaning and interpretation, while sociology has emphasized the importance of these two concepts. Thus, it is important to incorporate the two concepts into computational social science for it to be more influential in sociology. Sato examines Goldberg and Stein (2018), who, following Berger and Luckmann (1966), incorporate meaning and interpretation in their agent-based model, and evaluates that Goldberg and Stein (2018) are an
excellent example of collaboration between agent-based modeling and social theory. He also examines other works in line with the ideas of Goldberg and Stein (2018) and concludes that sociologists should start their study with sociological theories and concepts, derive hypotheses from them, and apply techniques of computational social science to check their empirical validity.

In Chap. 3, "Methodological Contributions of Computational Social Science to Sociology" by Hiroki Takikawa and Sho Fujihara, the authors discuss how machine learning, one of the central methods of computational social science, can be used to advance sociological theory. The authors describe machine learning as (1) learning from data and experience, i.e., data-driven; (2) aiming to improve the performance of task resolution, such as prediction and classification; and (3) aiming to develop algorithms that automate task resolution. Then, the potential theoretical contributions of machine learning in sociology are discussed, including breaking away from deductive models, establishing a normative cognitive paradigm, and using machine learning as a model of human decision making and cognition. The chapter thereafter identifies problems with conventional quantitative methods in sociology and discusses the predictive framework and automatic coding as examples of applications of machine learning in sociology, and how sociological theory can be improved through these applications. It is also pointed out that machine learning can shed new light on the elucidation of causal mechanisms, which is a central issue in sociological theory. Finally, it is argued that the Scientific Regret Minimization Method could be an important breakthrough in addressing the "interpretability" problem, which is a major challenge for machine learning. Thus, Chap. 3 shows that the common belief that machine learning is atheoretical and therefore does not contribute to the theoretical development of sociology is false, and that with appropriate and creative applications, machine learning, a central method in computational social science, has great potential for improving sociological theory.

Michael Macy, the author of Chap. 4, "Computational Social Science: A Complex Contagion," looks back over his personal history as a computational social scientist and overlaps it with the development of computational social science. According to him, the first wave of computational social science explored implications derived from theories in social science. Macy (1989) was a pioneering work in understanding how players learn to cooperate in the prisoner's dilemma game. His computer simulation opened the door to a new research area, which would later be called agent-based modeling. He then attacked the question of how trust and cooperation diffuse among strangers by building agent-based models (Macy & Sato, 2002; Macy & Skvoretz, 1998). Macy then moved on to the study of diffusion, or contagion, on social networks. Inspired by the small-world study by Watts and Strogatz (1998), Centola and Macy (2007) proposed a theory or model of "complex contagion." The model assumes that, to adopt a new cultural item, an individual needs to be exposed to more than one individual with the item. This is a sociologically plausible assumption because an individual hesitates to adopt a new cultural item such as participating in a risky social movement. Thus, for him/her to participate in the movement, he/she needs more than one prior adopter, or wide bridges with them.
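The logic of complex contagion lends itself to a compact simulation. The following sketch is not Centola and Macy's (2007) actual model: the network size, seed group, thresholds, and rewiring probabilities are illustrative assumptions. It contrasts adoption under a threshold of one prior adopter (simple contagion) with a threshold of two (complex contagion) on a ring lattice with Watts-Strogatz-style rewiring.

import random

def simulate(n=200, threshold=2, rewire=0.0, seed=1):
    """Final adoption share for a threshold contagion on a (possibly rewired) ring."""
    rng = random.Random(seed)
    # Ring lattice: each node is tied to its four nearest neighbors (wide bridges).
    nbrs = [set((i + d) % n for d in (-2, -1, 1, 2)) for i in range(n)]
    # Rewiring replaces local ties with random long ties, as in a small world.
    for i in range(n):
        for j in list(nbrs[i]):
            if j > i and rng.random() < rewire:
                k = rng.randrange(n)
                if k != i and k not in nbrs[i]:
                    nbrs[i].discard(j); nbrs[j].discard(i)
                    nbrs[i].add(k); nbrs[k].add(i)
    adopted = [False] * n
    for s in (0, 1, 2, 3):                      # a small clustered seed group
        adopted[s] = True
    changed = True
    while changed:                              # spread until no one else adopts
        changed = False
        for i in range(n):
            if not adopted[i] and sum(adopted[j] for j in nbrs[i]) >= threshold:
                adopted[i] = changed = True
    return sum(adopted) / n

for p in (0.0, 0.1, 0.5):
    print(f"rewiring={p:.1f}  simple={simulate(threshold=1, rewire=p):.2f}"
          f"  complex={simulate(threshold=2, rewire=p):.2f}")

On the unrewired lattice both contagions can spread through the overlapping local ties, while heavy rewiring, which helps a simple contagion travel, tends to strand the complex contagion for lack of wide bridges.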


Homophily is a critical factor for wide bridges to be built, but it may also create polarization of beliefs, preferences, and opinions among people. People tend to cluster with those with similar beliefs, preferences, and opinions, which could lead to polarization. To explore the mechanism creating polarization, Macy and his collaborators conducted computer simulations using agent-based models (DellaPosta et al., 2015; Macy et al., 2021), laboratory experiments (Willer et al., 2009), and online experiments (Macy et al., 2019) to study the detailed mechanisms of polarization.

Then, the second wave of computational social science came into the picture: big data analysis. Macy's first study using big data examined the empirical validity of Burt's (1992) theory of structural holes at the population level. Burt himself tested the theory with data on entrepreneurs. In contrast, Macy and his collaborators used telephone communication records in the UK to report that, as suggested by the theory of structural holes, social network diversity calculated using the records is strongly associated with socioeconomic advantages (Eagle et al., 2010). Another study with big data analysis by Golder and Macy (2014) analyzed tweets to find what factors affect people's emotions. Their study showed that the level of happiness measured by positive and negative words is highest when people wake up, but the level declines as time passes. In other words, the level is affected by sleep cycles. The greatest strength of Macy's work in computational social science is that he starts his research with sociological theories and uses techniques of computational social science—agent-based modeling and big data analysis—to test them. That is, theory comes first, and the techniques come second. Therefore, his research has substantively advanced sociological theories.

Chapter 5, "Model of Meaning" by Hiroki Takikawa and Atsushi Ueshima, discusses the potential contribution of computational social science methods to models of meaning and the elucidation of meaning-making mechanisms, which are central issues in sociological theory. The authors point out that in conventional sociology, issues of meaning have been examined almost exclusively through qualitative approaches, and that the theoretical development of sociology as a whole has been hindered by the lack of a quantitative model of meaning. In contrast, the methods of computational social science, coupled with the availability of large-scale textual data, have the potential to help build quantitative models of meaning. With these prospects in mind, Takikawa and Ueshima first formulate meaning-making as a computational problem and characterize it with three key points: the predictive function of meaning, the relationality of meaning, and semantic learning. With this formulation, they discuss the utility of two computational linguistic models—the topic model and the word-embedding model—in terms of theories of meaning. It is argued that the topic model and the word-embedding model have their own advantages and disadvantages as models of meaning, and that integration of the two is necessary. It is then pointed out that, in order to function as an effective model of meaning for sociology, it is necessary to go beyond merely a computational linguistics model for the representation of meaning and to clarify the mechanism that links semantic representations to human action. They propose a model for this
purpose. Thus, in Chap. 5, it is argued that the computational language model of computational social science should not be interpreted merely as an analytical tool, but as a model of human cognition and meaning-making, on the basis of which a sociologically persuasive theory of meaning should be constructed.

Chapter 6, "Sociological Meaning of Contagion" by Yoshimichi Sato, explores ways to substantively combine sociological theories and big data analysis, focusing on contagion or diffusion, one of the main research themes in sociology. In the beginning of the chapter, Sato cites Wu et al. (2020) to show that big data analysis is powerful for nowcasting and forecasting COVID-19. However, he also points out that the contagion or diffusion of a virus is different from that of new ideas, values, and norms. Thus, he is in line with Centola and Macy (2007) in that both stress the difference. He then carefully reviews Centola and Macy (2007) and points out that their theory—the theory of complex contagions—does not fully explain a contagion process. This is because the theory does not incorporate meaning and interpretation in the process. Reviewing a failure case of diffusion in Rogers (2003), sociological theories by Mead (1934) and Berger and Luckmann (1966), and the social mobilization theory by Snow et al. (1986), Sato argues that placing meaning and interpretation in the study of complex contagion would enhance its explanatory power. He then examines Bail's (2016) study of the frame alignment strategies of organ donation advocacy organizations, which applies topic models to their Facebook messages, and concludes that combining the theory of complex contagions and Bail's theory of cultural carrying capacity would give us a deeper comprehension of the mechanism of complex contagion.

Chapter 7, "Polarization of Opinion" by Zeyu Lyu, Kikuko Nagayoshi, and Hiroki Takikawa, discusses how sociology based on computational social science methods can contribute to today's most pressing social problem of opinion polarization. The authors conceptualize opinion polarization as increases in antagonistic and extreme preferences over public policies, ideological orientations, partisan attachments, or cultural norms. They then point out that the rise of opinion polarization has attracted extensive attention because its negative consequences may pose a disruptive threat to democratic societies. According to the authors, the limitations of traditional quantitative social science research, which has primarily relied on conventional statistical analyses of survey data, have become rather apparent in the investigation of opinion polarization because of its complex mechanism and process. In the main part, they review the new methodologies for opinion polarization research, classifying them into three parts: network analysis, natural language processing, and digital experiments. They conclude with three points about the contribution of these methods to sociological theories of political polarization. First, new data and methods can help solve numerous long-standing obstacles once considered insurmountable. Furthermore, with new data and methods sociologists can pose new questions and formulate new theories. Finally, new data and methods can transform the social science paradigm—from the explanation of the current status to the prediction of the future.

Chapter 8 reviews the discussions in the previous chapters and proposes two ways for further collaboration between computational social science and sociology. The first way follows the deductive approach. A sociologist should begin his/her research with a sociological theory, derive hypotheses from it, and check their empirical validity using techniques of computational social science. The difference between this way and conventional sociological research is that, in the former, computational social science techniques can use data that conventional sociological methods were unable to access and, therefore, analyze. The second way is that a sociologist should find new patterns with the help of computational social science, generalize them into hypotheses, and create a new theory to explain them. The key point of the second way is that computational social science techniques find new patterns that could not be found by conventional sociological methods, and such new patterns lead to new theories. The chapter concludes that proper use of computational social science opens a new door to upgrading sociology.

References

Bail, C. A. (2016). Cultural carrying capacity: Organ donation advocacy, discursive framing, and social media engagement. Social Science & Medicine, 165, 280–288.
Berger, P. L., & Luckmann, T. A. (1966). The social construction of reality: A treatise in the sociology of knowledge. Doubleday.
Blumer, H. G. (1969). Symbolic interactionism: Perspective and method. University of California Press.
Burt, R. (1992). Structural holes: The social structure of competition. Harvard University Press.
Centola, D., & Macy, M. (2007). Complex contagions and the weakness of long ties. American Journal of Sociology, 113, 702–734.
Chang, S., et al. (2021). Mobility network models of COVID-19 explain inequities and inform reopening. Nature, 589, 82–87.
Coleman, J. S. (1990). Foundations of social theory. Belknap Press of Harvard University Press.
DellaPosta, D., Shi, Y., & Macy, M. (2015). Why do liberals drink lattes? American Journal of Sociology, 120, 1473–1511.
Eagle, N., Macy, M., & Claxton, R. (2010). Network diversity and economic development. Science, 328, 1029–1031.
Goffman, E. (1959). The presentation of self in everyday life. Doubleday.
Goldberg, A., & Stein, S. K. (2018). Beyond social contagion: Associative diffusion and the emergence of cultural variation. American Sociological Review, 83(5), 897–932.
Golder, S. A., & Macy, M. W. (2014). Digital footprints: Opportunities and challenges for online social research. Annual Review of Sociology, 40, 129–152.
Macy, M. W. (1989). Walking out of social traps: A stochastic learning model for the Prisoner's dilemma. Rationality and Society, 1, 197–219.
Macy, M. W., & Sato, Y. (2002). Trust, cooperation, and market formation in the U.S. and Japan. Proceedings of the National Academy of Sciences of the United States of America, 99(Suppl_3), 7214–7220.
Macy, M. W., & Skvoretz, J. (1998). The evolution of trust and cooperation between strangers: A computational model. American Sociological Review, 63, 638–660.
Macy, M. W., & Willer, R. (2002). From factors to actors: Computational sociology and agent-based modeling. Annual Review of Sociology, 28, 143–166.
Macy, M. W., Deri, S., Ruch, A., & Tong, N. (2019). Opinion cascades and the unpredictability of partisan polarization. Science Advances, 5, eaax0754. https://doi.org/10.1126/sciadv.aax0754
Macy, M. W., Ma, M., Tabin, D. R., Gao, J., & Szymanski, B. K. (2021). Polarization and tipping points. Proceedings of the National Academy of Sciences, 118(50), e2102144118.
Mead, G. H. (1934). Mind, self, and society: From the standpoint of a social behaviorist. University of Chicago Press.
Rogers, E. M. (2003). Diffusion of innovations (5th ed.). Free Press.
Salganik, M. J. (2018). Bit by bit. Princeton University Press.
Schütz, A. (1932). Der sinnhafte Aufbau der sozialen Welt: Eine Einleitung in die Verstehende Soziologie. Springer.
Snow, D. A., Rochford, E. B., Jr., Worden, S. K., & Benford, R. D. (1986). Frame alignment processes, micromobilization, and movement participation. American Sociological Review, 51(4), 464–481.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of 'small-world' networks. Nature, 393, 440–442.
Weber, M. (1921–22). Grundriß der Sozialökonomik, III. Abteilung, Wirtschaft und Gesellschaft, Erster Teil, Kap. I. Verlag von J.C.B. Mohr.
Willer, R., Kuwabara, K., & Macy, M. W. (2009). The false enforcement of unpopular norms. American Journal of Sociology, 115, 451–490.
Wu, J. T., Leung, K., & Leung, G. M. (2020). Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: A modelling study. Lancet, 395, 689–697.

Chapter 2: Sociological Foundations of Computational Social Science
Yoshimichi Sato

2.1 Introduction: Computational Social Science and Sociology

Computational social science, consisting of digital (big) data analysis and agent-based modeling, has become popular and influential in social science. Take Chang et al. (2021), for example. They analyzed mobile phone data to simulate the geographical mobility of 98 million people. One of their major findings is that social inequality affects the infection rate. Their model predicts "higher infection rates among disadvantaged racial and socioeconomic groups solely as the result of differences in mobility" (Chang et al., 2021, p. 82). This finding is important and meaningful to sociologists because one of the most important research topics in sociology, social inequality, is studied from a new perspective with the help of computational social science. This chapter shows the gap between sociology and computational social science and how to fill it. Chang et al. (2021) give us a good clue for this. I will get back to this point later.

2.2 Strength and Problems of Computational Social Science

The strength of computational social science can be summarized as follows. Digital data analysis deals with data having three characteristics: "Big," "Always-on," and "Nonreactive" (Salganik, 2018). Analysis of data with these characteristics can study social phenomena that cannot be studied by analysis using conventional social survey data, which is small, always-off, and reactive.
For example, Salganik et al. (2006) created an artificial music market with about 14,000 participants to study social influence in cultural markets. Participants were recruited from a website and randomly assigned either to an independent condition without social influence or to a condition with social influence. In the independent condition, participants decided which songs they would listen to based only on the names of the bands and their songs and, while listening to them, ranked them. In the social influence condition, participants additionally could see the download counts of each song by previous participants. In the latter condition, participants were randomly assigned to one of eight artificial worlds so that observers could see how each world evolved independently.

Salganik et al. (2006) report three major findings based on the online experiment. First, inequality in popularity among songs is larger in the social influence condition than in the independent condition. Second, the evolution of the inequality in popularity is unpredictable. Even though the eight artificial worlds are under the same conditions, the degrees of inequality differ across the eight worlds. Third, the unpredictability is higher in the social influence condition than in the independent condition. These findings clearly show that the popularity of songs, and of cultural items in general, depends more on social influence than on their own quality, and that popularity evolves unpredictably. In theory these findings could be obtained by a conventional laboratory experiment. In practice, however, it is almost impossible to find about 14,000 people and ask them to come to a laboratory. The method developed by Salganik and his colleagues shows the strength of computational social science.

Agent-based modeling, the other main pillar of computational social science, also has its own strength (Cederman, 2005; Gilbert, 2019; Macy & Willer, 2002; Squazzoni, 2012). The strongest point of agent-based modeling, I would argue, is that it can clearly study the micro-macro linkage and the emergence of a social phenomenon from interaction between agents. For example, Schelling's model of residential segregation, a prototype of agent-based modeling, made a simple assumption about the moving decisions of individuals (agents) (Schelling, 1971). Agents are assumed to have a homophily tendency. If the proportion of an agent's neighbors whose characteristic (race, for example) differs from the agent's own is smaller than his/her threshold, he/she stays in the same place. If the proportion is larger than the threshold, he/she moves to a new vacant place. Then, after iterations, residential segregation emerges at the societal level. Schelling's model clearly demonstrates the transition from the micro-level (the homophily principle at the agent's level) to the macro-level (residential segregation at the societal level).
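Schelling's mechanism is simple enough to state in a few dozen lines. The following minimal sketch is an illustrative reconstruction, not Schelling's (1971) original specification: the grid size, vacancy count, and tolerance threshold are assumed values, and unhappy agents move to a random vacant cell.

import random

SIZE, THRESHOLD = 20, 0.5      # 20 x 20 torus; move if >50% of neighbors differ
rng = random.Random(42)

cells = [0] * 40 + [1] * 180 + [2] * 180     # 0 = vacant; 1 and 2 = two groups
rng.shuffle(cells)
grid = [cells[r * SIZE:(r + 1) * SIZE] for r in range(SIZE)]

def neighbor_mix(r, c):
    """Count (same, different) among the eight surrounding occupied cells."""
    me, same, diff = grid[r][c], 0, 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            v = grid[(r + dr) % SIZE][(c + dc) % SIZE]
            if (dr or dc) and v:
                same += v == me
                diff += v != me
    return same, diff

for step in range(100):                      # iterate until no one wants to move
    vacant = [(r, c) for r in range(SIZE) for c in range(SIZE) if not grid[r][c]]
    moved = False
    for r in range(SIZE):
        for c in range(SIZE):
            if grid[r][c]:
                same, diff = neighbor_mix(r, c)
                if same + diff and diff / (same + diff) > THRESHOLD:
                    vr, vc = vacant.pop(rng.randrange(len(vacant)))
                    grid[vr][vc], grid[r][c] = grid[r][c], 0
                    vacant.append((r, c))
                    moved = True
    if not moved:
        break

shares = []                                  # macro outcome: local homogeneity
for r in range(SIZE):
    for c in range(SIZE):
        if grid[r][c]:
            same, diff = neighbor_mix(r, c)
            if same + diff:
                shares.append(same / (same + diff))
print(f"mean same-type neighbor share: {sum(shares) / len(shares):.2f}")

Starting from a random mix, the mean same-type share typically settles well above what the mild individual preference would suggest, which is the micro-to-macro transition described above.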

Other models deal with the macro-micro-macro transition. Macy and Sato (2002), for example, study the effect of mobility on the emergence of generalized trust and the global market. In the model, agents are randomly allocated to neighborhoods. Agents are then assumed to have propensities to trust, cooperate, and enter the global market; they make decisions comparing their propensities with randomly generated thresholds; and they revise the propensities by social learning. The authors then manipulate the mobility rate to study its effect on the formation of trust, cooperation, and the global market. The mobility rate at the macro-level affects the decision-making and behavior of agents at the micro-level, and their behavior then accumulates and determines the level of trust, cooperation, and the global market at the macro-level.

So far, we have observed the strength of computational social science. It provides powerful tools to social scientists, with which they can study new aspects of social phenomena that cannot be analyzed by conventional methods. However, computational social science, I would argue, finds it difficult to properly study central questions in sociology, such as the emergence of social order and inequality, because it does not include meaning in its models (Goldberg & Stein, 2018).

Take the emergence of social order, for example. Social order has been a central research topic in sociology [e.g., Parsons (1951) and Seiyama (1995)]. Scholars in this research area would agree that coordinated behavior among actors is a necessary condition for social order but not a sufficient condition. Coordinated behaviors must be justified by actors' beliefs about them. Through their belief system, actors observing the coordinated behavior of other actors interpret that the behavior is socially desired and that they should also behave in the same way. Here, as Weber (1921) points out, behavior turns into social action.

The abovementioned model of trust by Macy and Sato (2002) studies the emergence of social order in a sense. The high level of trusting and cooperating behavior and the emergence of the global market seem to be an example of the emergence of social order. However, they are coordinated behaviors of agents. For the behavior to become social action and for social order to emerge, agents must interpret the coordinated behavior as just behavior. However, agents in the model are not equipped with a belief system through which they interpret the behavior of other agents. There is a gap between social order and coordinated behavior, and the model cannot fill the gap.

Social inequality is also a central research topic in sociology, and one of the major research questions in the field is why social inequality exists and persists. The abovementioned study by Salganik et al. (2006) clearly demonstrates how social inequality among songs emerges. However, it does not explain why the inequality persists. In general, social inequality cannot be sustained without people's belief that it is just. It is true that a powerful ruling class can sustain inequality by suppressing people in other classes, but this asymmetric, unjust system leads to social unrest, making society unstable and inefficient. Thus, for social inequality to persist, it is necessary that people believe that it is just and accept it. Although it is an excellent study of inequality, the study by Salganik et al. (2006) does not clearly show how and why the inequality persists. Here again, we observe a gap between social inequality at the behavioral level and people's beliefs about it.

I do not mean to criticize Macy and Sato (2002) and Salganik et al. (2006) in particular. I picked them up as examples of studies in computational social science to show that most of the studies in the field focus on behaviors, not on beliefs. This creates the abovementioned gap. There would be no problem if we were interested only in behavioral patterns and conducted studies within computational
social science. However, I argue that this strategy does not fully exploit the power and potential of computational social science when it is applied in sociology. To do this, we need to fill the gap.

2.3 How Can We Fill the Gap?

2.3.1 Agent-Based Modeling and Interpretation

How can we fill the gap? As abovementioned, we need to assume that actors have a belief system through which they interpret reality and add meaning to it. This is not a new assumption, though. Rather, it stands on sociological tradition. Berger and Luckmann (1966), a classic book of social theory, argued that social "reality" does not exist by itself. Rather, it is socially constructed. My interpretation of their theory on the emergence of social reality is as follows. In the beginning, an actor performs a behavior. Then, another actor observes it. However, he/she does not observe it as it is. He/she interprets it, adds meaning to it, and decides whether to adopt it or not. If he/she adopts it, a third actor follows the same procedure. Then, if many actors follow the same procedure, the behavior turns into a social action and social reality.

Goldberg and Stein (2018) aptly build an agent-based model based on Berger and Luckmann's (1966) theory of the social construction of reality. Because their model is an excellent example of filling the gap, I will explain its details to show how we can fill the gap. Goldberg and Stein (2018) call their theory a theory of associative diffusion and assume a two-stage transmission, which is different from simple contagion. In the beginning of the two-stage transmission process, actor B observes that actor A smokes. Then, actor B interprets the meaning of smoking and evaluates it in his/her cognition system. Finally, actor B decides whether to smoke or not. If he/she decides to smoke, actor A's behavior (smoking) is transmitted to actor B. Based on their theory, Goldberg and Stein (2018, pp. 907–908) propose the following proposition:

    Associative diffusion leads to the emergence of cultural differentiation even when agents have unobstructed opportunity to observe one another. Social contagion does not lead to cultural differentiation unless agents are structurally segregated.

Then, to check the validity of the proposition, they built an agent-based model and conducted simulations to show that the proposition is valid. What makes their model different from other models of contagion is that an agent in their model is equipped with an associative matrix. The associative matrix shows the strength of association between practices, which are exhibited by agents. In the model, two agents—agents A and B—are randomly selected. Agent B observes agent A exhibiting practices i and j at probabilities proportional to agent A's preferences over them. Then, B updates his/her associative matrix. In the update process, the association between practices i and j becomes stronger. This is because
he/she observed that agent A exhibited practices i and j simultaneously. Then, agent B updates his/her preferences over practices. Note that the update does not occur automatically. Agent B calculates constraint satisfaction using the updated preferences and the associative matrix. If it is larger than the constraint satisfaction calculated using the old preferences and the associative matrix, he/she keeps the updated preferences. Otherwise, he/she keeps the old preferences. Constraint satisfaction is a concept developed in cognitive science and shows that an actor (agent) is satisfied if he/she resolves cognitive dissonance. Conducting simulations with this agent-based model, Goldberg and Stein (2018) show that cultural differentiation emerges even if agents are not clustered in different networks.
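The two-stage logic can be sketched as follows. This is a loose, simplified stand-in, not Goldberg and Stein's (2018) actual specification: the constraint-satisfaction measure, the update sizes, and the summary statistic are all invented for illustration.

import itertools
import random

K, N, STEPS = 6, 30, 20000                   # practices, agents, observation events
rng = random.Random(7)
agents = [{"pref": [rng.random() for _ in range(K)],
           "assoc": [[0.5] * K for _ in range(K)]} for _ in range(N)]

def satisfaction(pref, assoc):
    # Higher when strongly associated practices have similar preference levels.
    return sum(assoc[i][j] * (1 - abs(pref[i] - pref[j]))
               for i, j in itertools.combinations(range(K), 2))

for _ in range(STEPS):
    a, b = rng.sample(agents, 2)             # agent B observes agent A
    i, j = rng.choices(range(K), weights=a["pref"], k=2)
    if i == j:
        continue
    # Stage 1: B strengthens the perceived association between practices i and j.
    b["assoc"][i][j] = b["assoc"][j][i] = min(1.0, b["assoc"][i][j] + 0.05)
    # Stage 2: B tentatively shifts preferences toward the observed practices and
    # keeps the change only if it raises constraint satisfaction.
    candidate = b["pref"][:]
    for p in (i, j):
        candidate[p] = min(1.0, candidate[p] + 0.05)
    if satisfaction(candidate, b["assoc"]) > satisfaction(b["pref"], b["assoc"]):
        b["pref"] = candidate

# Crude macro check: how far apart have agents' preference profiles drifted?
distances = [sum(abs(x - y) for x, y in zip(p["pref"], q["pref"]))
             for p, q in itertools.combinations(agents, 2)]
print(f"mean pairwise preference distance: {sum(distances) / len(distances):.2f}")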

I highly evaluate their study because they started with important sociological concepts such as meaning and interpretation. Agent-based modeling did not come first. Rather, they started with some anecdotes, argued that conventional contagion models cannot explain them, showed the importance of meaning and interpretation in explaining them, and built an agent-based model in which agents interpret other agents' behaviors and add meaning to them.

This is what I want to emphasize in this chapter. Many sociological studies using agent-based models, to my knowledge, apply the models to social phenomena focusing on the behavioral aspect of agents, not on the interpretive aspects of agents. Take Schelling's (1971) model of residential segregation, for example. As abovementioned, his model demonstrates how agents' decision-making at the micro-level generates residential segregation at the macro-level. This is one of the strengths of agent-based modeling. However, an agent in the model does not interpret the behaviors of his/her neighbors. He/she reacts to the rate of neighbors of the same character as his/hers. He/she does not ask himself/herself why a neighbor moves or stays; he/she does not add meanings to the behavior of the neighbor. In other words, agents in the model are a kind of automated machine just reacting to the composition of neighbors. Even such a simple model explains the emergence of residential segregation. However, an agent's move can be interpreted in several ways [see also Gilbert (2005)]. If one of his/her neighbors interprets his/her move as a kind of "white flight," he/she might follow suit. In contrast, if the neighbor interprets the move differently, he/she might stay, which would not lead to residential segregation. Thus, different interpretations of agents' behavior result in different outcomes at the macro-level.

Take power for another example. Suppose that agent A does something, say smoking, in front of agent B. If agent B thinks that he/she is implicitly forced to smoke, too, he/she interprets that agent A exercises power over him/her. However, if agent B interprets that agent A smokes just for his/her fun, he/she does not feel forced to smoke. Thus, whether a power relation between agents A and B is realized or not depends on agent B's interpretation of agent A's behavior (Lukes, 1974, 2005).

These examples clearly show that agents in agent-based modeling should be equipped with an interpretation system. As pointed out, Goldberg and Stein (2018)
propose an agent-based model with such a system and report interesting findings based on the simulation of the model.

Squazzoni (2012, pp. 109–122) also argues for the importance of social reflexivity, which is closely related to the concepts of interpretation and meaning, and cites two studies done by him and his colleagues (Boero et al., 2004a, b, 2008). In the first study (Boero et al., 2004a, b), firms in an industrial district are assumed to have one of four behavioral "attitudes" and to replace one with another in certain conditions. In conventional agent-based models, agents are assumed to perform a behavior. In a model in which agents play a prisoner's dilemma game, for example, they cooperate or defect. In contrast, an "attitude" is a bundle of strategies or behaviors. An agent reflects on his/her attitude and replaces it with another attitude if necessary. In the second study (Boero et al., 2008), agents are placed in a toroidal 80 × 80 cell space and are assumed to stay or move based on their happiness. One of the scenarios in their model is that agents have one of four "heuristics" and randomly change them under a certain condition. A "heuristic" is a behavioral rule like an attitude in their first study. Here again, agents reflect on their heuristics and change them if necessary. An important characteristic of the models is that agents choose not a behavior but a rule of behaviors, which is called an attitude or a heuristic. An agent reflects on the rule they chose under certain conditions and replaces it with another rule if necessary. There is no interpretation process in the models. However, if the process were added to the models, agents could be assumed to interpret the conditions in which they are placed, to reflect on the current rule they chose, and to replace it with another rule if necessary.

Sato (2017) also points to the importance of meaning and reflexivity in filling the gap between agent-based modeling and social theory. This is because social theories often point to their importance in the study of modern and postmodern society (Giddens, 1984; Imada, 1986; Luhmann, 1984). Thus, for agent-based modeling to contribute to the advance of social theory, it needs to incorporate meaning and reflexivity in models. Sato (2017) revisits Imada's (1986) triangular framework of action (Fig. 2.1) and proposes a framework for agent-based modeling to incorporate meaning and reflection.

[Fig. 2.1 Relationship between following convention (backward-looking rationality), choosing a new action (forward-looking rationality), and reflecting on the goal and discovering a new goal (reflexivity). Source: Sato 2017, p. 42, Fig. 3.2]

Imada's framework consists of three types of action: following convention, choosing a new action, and reflecting on the goal and
Imada's framework consists of three types of action: following a convention, choosing a new action, and reflecting on the goal and discovering a new goal. Actors follow a convention as long as it does not cause a problem. Borrowing the terminology of rational choice theory, I argue that backward-looking rationality dominates in society because it lightens the cognitive burden on actors. However, if an external change occurs and the convention cannot solve the problems caused by the change, actors apply forward-looking rationality and search for a new action that they believe will solve the problems. If the new action actually solves the problems, it becomes a convention. If it does not, actors reflect on the current goal and try to discover a new goal that they believe would result in a better outcome. Then, if actors can find such a new goal and an action that achieves it, the action becomes a new convention.

The key point of Imada's framework, when we incorporate it into agent-based modeling, is that actors find a new goal if they succeed in reflection. How can this process be modeled? Strictly speaking, it cannot, for the following reason. In Imada's framework, actors find a goal that has not yet been found. Take the concept of "sustainable development," for example. When advanced countries enjoyed economic growth, their goal was development alone, without regard to sustainability. As people, governments, and other organizations came to recognize the social problems caused by development, they invented the concept of "sustainable development," which focuses on both economic development and sustainability. A goal is new precisely because it has not yet been invented; theoretically, then, the set of possible goals must be infinite. However, creating an infinite set in agent-based modeling is impossible. How, then, can we incorporate reflexivity in agent-based modeling?

Sato (2017) proposes assuming that agents have limited cognitive capacity; they consider only a limited number of goals, namely those they already know (set A in Fig. 2.2). Set A is assumed to be included in a larger set, set B in Fig. 2.2. Agents do not know the goals in B − A. If a goal in B − A enters set A and a known goal leaves set A, the entering goal is new to the agents. If agents judge the new goal to be better than the goals in set A, they will choose it. If set B is large enough for the simulation, set A always contains goals that are new to the agents. This could be a second-best solution to the problem mentioned above.
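A minimal sketch of this second-best mechanism might look as follows. The goal names, the sizes of sets A and B, and the scalar "value" used to evaluate goals are illustrative stand-ins, not part of Sato's (2017) specification.

```python
import random

random.seed(1)

# Set B: the larger universe of goals built into the model in advance;
# each goal carries an illustrative "value" used for evaluation.
set_b = {f"goal_{k}": random.random() for k in range(100)}

# Set A: the limited set of goals the agent currently knows.
known = set(random.sample(sorted(set_b), 5))
current = max(known, key=set_b.get)


def reflect(known, current):
    """One reflexive step: a goal from B - A enters set A, a known goal
    leaves, and the agent adopts the newcomer if it seems better."""
    newcomer = random.choice(sorted(set(set_b) - known))
    leaver = random.choice(sorted(known - {current}))
    known = (known - {leaver}) | {newcomer}
    if set_b[newcomer] > set_b[current]:
        current = newcomer
    return known, current


for _ in range(20):
    known, current = reflect(known, current)
print(current, round(set_b[current], 2))
```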


In this subsection, I have referred to studies by Goldberg and Stein (2018), Boero et al. (2004a, b, 2008), and Sato (2017). Although their approaches differ from each other, all of them try to incorporate meaning, interpretation, and reflexivity into agent-based modeling. This line of research contributes to filling the gap between computational social science and sociology and to advancing sociological studies with the help of computational social science, a powerful tool in social science.

2.3.2 Digital Data Analysis and Sociology

Digital data analysis is a rapidly growing body of research in sociology. For example, Chang et al. (2021), mentioned at the beginning of this chapter, is an excellent example of how digital data analysis advances the sociological study of inequality. If the sociological aspect of inequality among social groups had not been included in their study, the study would not have been attractive to sociologists conducting conventional research on social inequality. My point about their paper is that the sociological research question on inequality among social groups led to digital data analysis suitable for answering that question.

Another intriguing example of digital data analysis in sociology is the study of newspaper coverage of U.S. government arts funding by DiMaggio et al. (2013). According to the authors, after two decades of good relations between the National Endowment for the Arts (NEA) and artists, government support for the arts became contentious between the mid-1980s and mid-1990s. A good indicator of the contention is the decline in NEA appropriations from 1986 through 1997. To explain the decline, the authors investigate press coverage of government arts support. Their research question is "how did the press respond to, participate in or contribute to the NEA's political woes?" (DiMaggio et al., 2013, p. 573). To answer it, they applied latent Dirichlet allocation, a topic model for text analysis, to articles published in the Houston Chronicle, the New York Times, the Seattle Times, the Wall Street Journal, and the Washington Post during the abovementioned period. The topic model extracted 12 topics, which the authors grouped into three categories: (1) social or political conflict; (2) local projects and revenues; and (3) specific arts genres, types of grant, or event information.

After a detailed analysis of the topics and newspaper articles, the authors reached the following findings (DiMaggio et al., 2013, p. 602): (1) press coverage of arts funding suddenly changed from celebratory to contentious in 1989, and the contention continued into the 1990s; (2) negative coverage of the NEA emerged when George H. W. Bush was elected president; (3) press coverage reflected three frames for the controversy; and (4) press coverage of government arts patronage differed from newspaper to newspaper. As the authors emphasize, topic modeling is suitable for the study of culture because it can clearly capture the relationality of meaning.


Moreover, one of the strong points of the paper is that it has a clear sociological research question guiding the topic modeling analysis. This became possible, presumably, because Paul DiMaggio, one of the authors, has deep expertise in the sociology of culture.

These two exemplars of digital data analysis suggest that excellent sociological study using digital data analysis should start with good research questions, not with available data. Digital data analysis is a social telescope (Golder & Macy, 2014) with much higher resolution than conventional social surveys and experiments. However, it is sociological expertise, not the data itself, that finds order and pattern in the data obtained through the social telescope. Without such expertise and the research questions based on it, digital data analysis would not contribute to the advance of sociological inquiry.

In addition to sociological expertise, including meaning in digital data analysis would make the analysis contribute substantively to sociological studies. As pointed out in the previous subsection, meaning and interpretation are important concepts in sociological inquiry. The abovementioned study of newspaper articles by DiMaggio et al. (2013) is an excellent example of this approach. Take Twitter data, for example. Suppose that actor A tweets a message supporting politician X. It is not always the case that actor A expresses his/her true opinion in the message. He/she may do so, but he/she may not, hiding his/her true opinion in anticipation of negative reactions from his/her Twitter followers. Actor B, a follower of actor A, does not receive actor A's message as it is. He/she interprets it and tries to understand its meaning. Then, he/she tweets a message about actor A's original message based on his/her interpretation of it. He/she, too, may or may not express his/her true opinion. Other actors, including actor A, in turn read the message, interpret it, and tweet messages of their own, and so on. To the best of my knowledge, most digital data analysis of Twitter data or other text data lacks this interpretation/expression process. However, Twitter data analysis incorporating this process could unveil the relationship between actions (expressing messages) and the inner world of actors, which would attract sociologists working in the main domains of sociology. This is because interpretation and meaning have been central concepts in sociology. Analysis of only behavioral digital data would not enter the central segment of sociology. In contrast, analysis with the interpretation/expression process would promote collaboration between sociologists and analysts of digital data and contribute to the advance of sociology.
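To illustrate the kind of analysis DiMaggio et al. (2013) performed, here is a minimal topic-model sketch using scikit-learn's implementation of latent Dirichlet allocation. The four toy "articles" and the two-topic setting are assumptions made for brevity; the original study fitted 12 topics to a full corpus of newspaper articles.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-ins for newspaper articles on government arts funding.
docs = [
    "senator attacks NEA grant as obscene political controversy",
    "local museum receives arts funding for community project",
    "opera festival announces new season and ticket information",
    "congress debates cuts to government arts patronage budget",
]

# Document-term matrix of word counts, the standard LDA input.
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)

# Fit a two-topic model (the original study used 12 topics).
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(dtm)

# Print the top words that define each topic.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {', '.join(top)}")
```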

2.4 Conclusions: Toward Productive Collaboration Between Sociology and Computational Social Science

Radford and Joseph (2020) emphasize the critical role of social theory when machine learning models are used to analyze digital data. Their coverage of topics is wide, but their argument can be summarized as follows: machine learning models are useless unless researchers start from social theory. A study using machine learning models without social theory is like a voyage without a chart.


It could not generate hypotheses important to social science, and we would not know whether the study's findings are new. The crux of Radford and Joseph's (2020) argument is that social theory, not the data available for machine learning models, comes first.

The main message of this chapter is in line with theirs. For studies using computational social science methods such as agent-based modeling and digital data analysis to be fruitful and to contribute to the advance of sociology, we should start our research with sociological theories and concepts and create hypotheses based on them. Then, computational social science methods help us rigorously test their validity. Most importantly, the methods may lead to new findings that could not have been obtained with conventional sociological methodology. This is highly plausible because the methods are new social telescopes with much higher resolution than conventional methodology. We must then bring the findings back to the original sociological theories and concepts, find their problems, and invent new theories and concepts by fixing them. This is a way to advance sociology with the help of computational social science and to have computational social science contribute substantively to the advance of sociology. Furthermore, in this way, computational social scientists could improve their methods so that the improved methods are more suitable for sociological analysis. This means that sociology contributes to the advance of computational social science. Such collaboration would advance both sociology and computational social science and open a door to exciting new interdisciplinary fields.

References

Berger, P. L., & Luckmann, T. (1966). The social construction of reality: A treatise in the sociology of knowledge. Doubleday.
Boero, R., Castellani, M., & Squazzoni, F. (2004a). Cognitive identity and social reflexivity of the industrial district firms: Going beyond the 'complexity effect' with agent-based simulations. In G. Lindemann, D. Moldt, & M. Paolucci (Eds.), Regulated agent-based social systems (pp. 48–69). Springer.
Boero, R., Castellani, M., & Squazzoni, F. (2004b). Micro behavioural attitudes and macro technological adaptation in industrial districts: An agent-based prototype. Journal of Artificial Societies and Social Simulation, 7(2), 1.
Boero, R., Castellani, M., & Squazzoni, F. (2008). Individual behavior and macro social properties: An agent-based model. Computational and Mathematical Organization Theory, 14, 156–174.
Cederman, L.-E. (2005). Computational models of social forms: Advancing generative process theory. American Journal of Sociology, 110(4), 864–893.
Chang, S., Pierson, E., Koh, P. W., Gerardin, J., Redbird, B., Grusky, D., & Leskovec, J. (2021). Mobility network models of COVID-19 explain inequities and inform reopening. Nature, 589, 82–87.
DiMaggio, P., Nag, M., & Blei, D. (2013). Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of U.S. government arts funding. Poetics, 41, 570–606.
Giddens, A. (1984). The constitution of society: Outline of the theory of structuration. Polity Press.


Gilbert, N. (2005). When does social simulation need cognitive models? In R. Sun (Ed.), Cognition and multi-agent interaction: From cognitive modeling to social simulation (pp. 428–432). Cambridge University Press.
Gilbert, N. (2019). Agent-based models (2nd ed.). Sage Publications.
Goldberg, A., & Stein, S. K. (2018). Beyond social contagion: Associative diffusion and the emergence of cultural variation. American Sociological Review, 83(5), 897–932.
Golder, S. A., & Macy, M. W. (2014). Digital footprints: Opportunities and challenges for online social research. Annual Review of Sociology, 40, 129–152.
Imada, T. (1986). Self-organization: Revival of social theory. Keiso Shobo. (In Japanese).
Luhmann, N. (1984). Soziale Systeme: Grundriß einer allgemeinen Theorie. Suhrkamp.
Lukes, S. (1974). Power: A radical view. Macmillan Education.
Lukes, S. (2005). Power: A radical view (2nd ed.). Palgrave Macmillan.
Macy, M. W., & Sato, Y. (2002). Trust, cooperation, and market formation in the U.S. and Japan. Proceedings of the National Academy of Sciences, 99(Suppl. 3), 7214–7220.
Macy, M. W., & Willer, R. (2002). From factors to actors: Computational sociology and agent-based modeling. Annual Review of Sociology, 28, 143–166.
Parsons, T. (1951). The social system. Free Press.
Radford, J., & Joseph, K. (2020). Theory in, theory out: The uses of social theory in machine learning for social science. Frontiers in Big Data, 3, 18. https://doi.org/10.3389/fdata.2020.00018
Salganik, M. J. (2018). Bit by bit: Social research in the digital age. Princeton University Press.
Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311, 854–856.
Sato, Y. (2017). Does agent-based modeling flourish in sociology? Mind the gap between social theory and agent-based models. In K. Endo, S. Kurihara, T. Kamihigashi, & F. Toriumi (Eds.), Reconstruction of the public sphere in the socially mediated age (pp. 37–46). Springer Nature Singapore Pte.
Schelling, T. C. (1971). Dynamic models of segregation. Journal of Mathematical Sociology, 1, 143–186.
Seiyama, K. (1995). Perspectives of theory of institution. Sobunsha. (In Japanese).
Squazzoni, F. (2012). Agent-based computational sociology. Wiley.
Weber, M. (1921). Soziologische Grundbegriffe. In Grundriß der Sozialökonomik, III. Abteilung, Wirtschaft und Gesellschaft. J.C.B. Mohr.

Chapter 3
Methodological Contributions of Computational Social Science to Sociology

Hiroki Takikawa and Sho Fujihara

3.1 Introduction

Currently, the data environment for sociology is changing dramatically (Salganik, 2018). Traditionally, the main type of data used in quantitative sociology was survey data. Collecting survey data entailed significant financial and human costs; therefore, the data provided by surveys were scarce. Such data are clean, structured, and collected by probability sampling. In the digital age, however, people's behavior is observed and recorded constantly, creating vast amounts of behavioral data known as digital traces (Golder & Macy, 2014). In addition, surveys and experiments using crowdworkers, based on nonprobability samples, are far less expensive, can be collected in large quantities, and are suitable for various interventions (Salganik, 2018). In this digital age of computational social science, data are messy and unstructured yet abundant. As the data environment changes, the methodologies required in sociology also change. In the era of scarce data, the main focus of methodology was how to efficiently extract meaningful information from scarce data (Grimmer et al., 2021, 2022). In the new era of abundant data, however, different methodologies are needed. Moreover, strategies for how sociological theory should be developed using these methodologies must also be considered.

H. Takikawa (✉) Graduate School of Humanities and Sociology, The University of Tokyo, Bunkyo-ku, Tokyo, Japan e-mail: [email protected] S. Fujihara Institute of Social Science, The University of Tokyo, Bunkyo-ku, Tokyo, Japan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 Y. Sato, H. Takikawa (eds.), Sociological Foundations of Computational Social Science, Translational Systems Sciences 40, https://doi.org/10.1007/978-981-99-9432-8_3


There are several possibilities for processing and analyzing such new data, but at the center of these methods are machine learning methods (sometimes more broadly referred to as “data science”) (Hastie et al., 2009; James et al., 2013; Jordan & Mitchell, 2015). Machine learning differs substantially from traditional quantitative methods used in the era of scarce data in its assumptions and “culture” (Grimmer et al., 2021). Sometimes it is argued that machine learning techniques are not useful for theory-oriented social sciences because they are not explanation-oriented and often produce uninterpretable black boxes. One pundit even argued that theory was unnecessary in the era of big data (cf. Anderson, 2008). In contrast, this chapter argues that these techniques are potentially very useful, not only for conventional sociological quantitative analysis but also for sociological theory building. To date, various machine learning approaches have been used in computational social science, but they have not necessarily been systematically utilized to contribute to sociological (social science) theory. Conversely, many sociologists underestimate the potential contribution of machine learning and computational social science to sociological theory. To change this situation, we aim to identify how machine learning methods can be used to contribute to theory building in sociology. In the next section, after noting the differences between the culture of machine learning and that of traditional statistics, we define the concept of machine learning as used in this chapter. We then explain the basic ideas and logic of machine learning, such as sample splitting and regularization. In Sect. 3.3, we summarize three potential applications of machine learning in sociology: breaking away from deductive models, establishing a normative cognitive paradigm, and using machine learning as a model of human decision making and cognition. Section 3.4 summarizes the statistical methods conventionally used in sociology, including description, prediction, and causal inference, and describes their challenges, focusing on the use of regression analysis. Sections 3.5 and 3.6 introduce current applications of machine learning in sociology and discuss future directions.

3.2 Machine Learning

3.2.1 The Culture and Logic of Machine Learning

Machine learning differs from the methodologies traditionally used in the social sciences in its assumptions, in its conception of the desirability and value of research objectives and methods, and, more broadly, in its cognitive culture (Grimmer et al., 2021), but that does not mean that its similarities with traditional statistics should be underestimated (cf. Lundberg et al., 2021). Breiman (2001) described the culture of traditional statistics as a data modeling culture and that of machine learning as an algorithmic culture. The distinction between the two can be explained as follows (Breiman, 2001). Consider data produced as a response y given an input x. How x is transformed into y is initially a black box. In the data modeling culture, we posit probabilistic models of the contents of the black box,


such as linear or logistic regression models. In contrast, in the algorithmic culture, the goal is to discover an algorithm f(x) on x that predicts the response y well, leaving the contents of the black box unknown. Thus, one can describe traditional statistics as generative modeling, which models the generative process of the data, and machine learning as predictive modeling, which is oriented toward prediction rather than the generative process (Donoho, 2017). Mullainathan and Spiess (2017) stated that while the social sciences are traditionally interested in the values of the estimated coefficients β of the explanatory variables in their models, machine learning is interested in the predicted values ŷ. In other words, social science is interested in modeling the contents of the black box, while machine learning is interested in predicting the results produced by the black box.

From this quick introduction, many social scientists may feel that a data modeling culture that models the contents of the black box is more beneficial for sociological theory building. However, this is not true. First, as Breiman emphasizes, the choice between data modeling and algorithmic modeling is not essential; what matters is which model best solves the actual problem (Breiman, 2001). If a problem in the social sciences is better approached by machine learning, its use should not be discouraged. Second, even if the problem to be solved in social science is to understand the causal mechanisms of social phenomena (i.e., the contents of the black box), data modeling is not always the most appropriate means to this end. Breiman argues that, in the end, if an algorithmic model is more effective in solving a problem, then it is also more useful in understanding the mechanisms that create the problem (Breiman, 2001).

3.2.2 Definition of Machine Learning

Machine learning is a broad term that covers a wide range of ideas and is difficult to define because it is used with slightly different meanings in different fields. Molina and Garip (2019, p. 28) describe it succinctly: "machine learning (ML) seeks to automate discovery from data." Grimmer et al. (2021, p. 396) state that "machine learning is a class of flexible algorithmic and statistical techniques for prediction and dimension reduction." Athey (2018, p. 509) states that machine learning is "a field that develops algorithms designed to be applied to data sets, with the main areas of focus being prediction (regression), classification, and clustering or grouping tasks." Jordan and Mitchell (2015, p. 255) describe machine learning as a field that addresses the following two questions:

1. "How can one construct computer systems that automatically improve through experience?"
2. "What are the fundamental statistical-computational-information-theoretic laws that govern all learning systems, including computers, humans, and organizations?"


The latter, theoretical question is not explicitly asked in the context of applications in the social sciences, but it is sometimes necessary for thinking deeply about why machine learning applications succeed or fail in a given situation. Identifying the mechanisms by which humans and organizations learn is also a challenge for sociological theory itself. Setting this theoretical question aside, the above definitions can be summarized in three points:

1. learning from data and experience, i.e., being data-driven;
2. aiming to improve performance on tasks such as prediction and classification; and
3. aiming to develop algorithms that automate task resolution.

Overall, machine learning can be defined as a procedure for learning from data and creating the best algorithm for automating the solution of tasks such as prediction and classification. Machine learning methods are often classified as supervised or unsupervised (Bishop & Nasrabadi, 2006). In this section, we first introduce supervised machine learning and then address unsupervised machine learning.

3.2.3 Prediction and Supervised Machine Learning

The goal of supervised machine learning is to predict a response variable y using predictors or "explanatory variables" x from data (Jordan & Mitchell, 2015). The approach is called supervised because, in the training phase of the model, pairs of predictors and response variables (x, y) are given in advance as training data. Training is performed on these data, and the resulting model is used to predict the value of an unknown response variable. The predictors x can be extremely high-dimensional; images and text are typical examples. Prediction is performed by learning a mapping f from x to y. This mapping can be a simple function, such as a linear regression model, or a very complex one, such as an artificial neural network. In machine learning, we are more interested in ŷ, the outcome of the prediction, than in the coefficients β of the explanatory variables (Mullainathan & Spiess, 2017). Moreover, in machine learning, the in-sample goodness of fit of a model is not the central concern. Rather, we are ultimately concerned with predictive performance and generalization to unknown events. The most important point in machine learning is that the model that best fits the available data is not necessarily the best model for predicting unknown data.

3.2.3.1 Sample Splitting

How, then, can we discover models with good generalization performance? For this purpose, a procedure called sample splitting, the most important procedure in supervised machine learning, is used (Hastie et al., 2009). In sample splitting, the data are divided into training data and test data. In the training data, pairs (x, y) are given, and the model is trained on them. These are data for which the answer (which x corresponds to which y) is provided in advance; in this sense, the learning is supervised. The test data, in contrast, are used to evaluate the trained model. Testing predictive performance on test data is called out-of-sample testing. In this way, models with good generalization performance can be selected. We consider this process in more detail below.

On the training data, the goal is to minimize the loss defined by a loss function. For continuous variables, the loss function can be, for example, the mean of the squared difference between the correct answer $y_i$ and the predicted value $\hat{y}_i$, i.e., the mean squared error (MSE):

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,$$

where n is the sample size of the data set. The smaller the loss, the better the model's fit to the training data. The error defined on the training data is called the training error. However, improving the fit to the training data is not in itself the goal of machine learning. The goal is to improve predictive performance on new data. For this purpose, test data separate from the training data are prepared to evaluate predictive performance. The evaluation on the test data is also based on the MSE (for continuous variables), but it is called the test MSE to distinguish it from the training MSE. The error on the test data is called the test error or generalization error. Thus, in supervised machine learning, the goal is to minimize the generalization error, not the training error.

Why not simply use all the data at hand as training data and adopt the best-fitting model? A good fit to the training data does not necessarily equate to good predictive performance on new, unseen data. This phenomenon is called overfitting: the model overreacts to random fluctuations, or noise, in the training data set. When overfitting occurs, the variability of the model across training sets increases, resulting in poor predictive performance. Therefore, avoiding overfitting is of great practical importance in machine learning (Domingos, 2012).
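As a minimal sketch of sample splitting, the following uses simulated data and scikit-learn; the data-generating function and the 70/30 split ratio are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Simulated data: y depends nonlinearly on x, plus irreducible noise.
x = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=500)

# Sample splitting: hold out test data for out-of-sample evaluation.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

model = LinearRegression().fit(x_train, y_train)

# Training error versus test (generalization) error.
print("training MSE:", mean_squared_error(y_train, model.predict(x_train)))
print("test MSE:", mean_squared_error(y_test, model.predict(x_test)))
```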

3.2.3.2 Bias–Variance Tradeoff

To better understand the avoidance of overfitting, let us introduce the tradeoff between bias and variance (Hastie et al., 2009; James et al., 2013). The expected value of the test MSE, $E\big(y - \hat{f}(x)\big)^2$, which is the quantity to be optimized in machine learning, can be decomposed into the variance of the prediction, $\mathrm{Var}\big(\hat{f}(x)\big)$; the squared bias of the prediction, $\big[\mathrm{Bias}\big(\hat{f}(x)\big)\big]^2$; and the variance of the error, $\mathrm{Var}(\varepsilon)$, where $\hat{f}(x)$ represents the model prediction (James et al., 2013):

$$E\big(y - \hat{f}(x)\big)^2 = \mathrm{Var}\big(\hat{f}(x)\big) + \big[\mathrm{Bias}\big(\hat{f}(x)\big)\big]^2 + \mathrm{Var}(\varepsilon).$$

The variance of the error, $\mathrm{Var}(\varepsilon)$, is a characteristic of the true "world" and cannot be reduced by any model. Therefore, the goal of optimization is to simultaneously reduce the variance and the bias of the predictions. Bias is the difference between the true model and the learned model. For example, if a simple linear regression model is used to approximate a complex nonlinear relationship, the bias will be large. Variance, in contrast, is the amount by which $\hat{f}(x)$ varies across different data sets. High variance means that the parameters of the model change greatly depending on the data set.

A tradeoff exists between bias and variance (Bishop & Nasrabadi, 2006; Yarkoni & Westfall, 2017). In general, bias can be reduced by using complex models. Suppose, for example, that we use a polynomial regression model to improve the fit to a curve. Compared to a first-order polynomial (linear) regression model, the goodness of fit should increase monotonically as second- and third-order terms are added. However, overly complex models have large variance because they are overfitted to a particular data set; a small change in the data set causes a large fluctuation in the estimates. Conversely, a simple model is less affected by differences between data sets, but it cannot adequately fit the data at hand, resulting in large bias. Therefore, a model of moderate complexity is preferred: it avoids overfitting and reduces variance while still providing a reasonable fit and keeping bias low.

Traditional social science is not concerned with the predictive performance of models and has preferred models with low bias [although multilevel analysis, which is heavily used in sociology, can actually be interpreted as an attempt to improve model performance by introducing bias (Gelman & Hill, 2006)] (Yarkoni & Westfall, 2017). In traditional practice, approaches such as splitting the data into training and test samples are almost never used; practically speaking, only training data are used, with the sole aims of increasing the goodness of fit of the model or testing the significance of a particular regression coefficient. Improving only the fit to the training data in this way may lower bias but cause overfitting, resulting in poor predictive performance on unseen data (or the test data). In such procedures, there is no explicit step of evaluating the performance of multiple models and making a choice. Some may note that even in sociology, performance is sometimes compared among several models before a selection is made. However, the analytical tools traditionally used for model selection, such as likelihood ratio tests, AIC, and BIC, have various limitations and may not sufficiently prevent overfitting. With AIC, for example, the penalty becomes insufficient as the data size increases, so complex models are almost always preferred (Grimmer et al., 2021). In contrast, the model selection procedures of sample splitting and


out-of-sample testing used in machine learning are empirical, i.e., data-based and general-purpose selection procedures that can be applied to any model. This method enables the evaluation of the predictive performance of any supervised machine learning model.
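The bias–variance logic can be made visible with a small simulation. In this hedged sketch, the sine-shaped data-generating process and the three polynomial degrees are arbitrary illustrations: the training MSE falls as the model grows more complex, while the test MSE is typically lowest at a moderate degree.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=200)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.5, random_state=1)

# Degree 1 underfits (high bias); degree 15 overfits (high variance);
# a moderate degree balances the two.
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_tr, y_tr)
    tr = mean_squared_error(y_tr, model.predict(x_tr))
    te = mean_squared_error(y_te, model.predict(x_te))
    print(f"degree {degree:2d}: training MSE {tr:.3f}, test MSE {te:.3f}")
```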

3.2.3.3 Three-Way Sample Splitting and K-Fold Cross-Validation

Two points should be noted here. First, the training and testing phases must be strictly separated. If, for example, an out-of-sample test is conducted, the prediction performance turns out to be low, the model is modified, and the out-of-sample test is run again on the same test data, the model will end up overfitted to the test data. Instead, a portion of the training data should be set aside as validation data for checking predictive performance. The model's hyperparameters are tuned on these data; the model trained on the training and validation data is then tested on the test data. In this case, the data are divided into training, validation, and test data (Hastie et al., 2009).

Second, while the division into training and test data is a means of avoiding overfitting, it also carries a risk of underfitting, since reserving data for testing reduces the amount of data available for training. An alternative that makes maximal use of the data at hand is k-fold cross-validation (Hastie et al., 2009; Yarkoni & Westfall, 2017). It is common to use five or ten folds, although the number can vary with the complexity of the model. In this method, the data are split into k folds, and each fold serves once as test data while the remaining folds are used for training. Then, k out-of-sample tests are performed, and the predictive performance of the model is evaluated as the average over the k tests. Because all the data are eventually used for training, this method is more efficient than a one-time split. However, it has the drawback that the computation takes k times longer.
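A minimal k-fold sketch, again with illustrative simulated data, can be written in a few lines with scikit-learn's cross_val_score:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
x = rng.normal(size=(300, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=300)

# Five-fold cross-validation: each fold serves once as test data,
# and predictive performance is averaged over the five splits.
scores = cross_val_score(
    LinearRegression(), x, y, cv=5, scoring="neg_mean_squared_error"
)
print("mean test MSE across folds:", -scores.mean())
```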

3.2.3.4 Regularization

While sample splitting is a means of detecting overfitting, it does not tell us what kind of model to build to prevent overfitting in the first place. Regularization is a tool for building models that avoid overfitting (Hastie et al., 2009; Yarkoni & Westfall, 2017). Overfitting is more likely to occur when a model is complex, so regularization improves the predictive performance of a model by constraining its complexity. The most common approach is to penalize complexity. Specifically, a penalty term for model complexity is added to the loss that the model minimizes. In the Lasso regression model (Tibshirani, 1996, 2011), for example, training minimizes the squared error plus a penalty term proportional to the sum of the absolute values of the regression coefficients. Generally, the more complex a model is, i.e., the more variables it uses, the further the squared error can be reduced; however, the more complex a


model is, the larger the sum of the absolute values of the regression coefficients becomes. In other words, there is a tradeoff between minimizing the squared error and minimizing the sum of the absolute values of the regression coefficients, and the objective of the Lasso regression model is to achieve an optimal balance between the two. From the perspective of the bias–variance tradeoff, the model can be seen as an attempt to improve predictive performance by introducing a bias, namely the penalty term.

The sum of the absolute values of the regression coefficients used as the penalty term in Lasso regression is called the L1 norm. The ridge regression model, in contrast, penalizes the sum of the squared regression coefficients, i.e., the squared Euclidean (L2) norm. In general, the Lasso regression model has an incentive to set some regression coefficients exactly to zero and thus has a strong regularizing effect. Penalized regression is effective when the number of predictors (explanatory variables) is large relative to the sample size. However, the coefficient of the penalty term (a hyperparameter), which determines how strongly the penalty operates, must be tuned appropriately, and for this purpose the sample should be divided into three parts.

The concept of regularization itself is more general and is not limited to adding penalty terms to regression models. There are various other regularization devices in supervised machine learning, such as early stopping, which halts the learning process in artificial neural networks and deep learning before convergence, and dropout, which randomly removes nodes during learning (Goodfellow et al., 2016).
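In the standard textbook notation (assuming the usual formulations rather than any chapter-specific variant), the two penalized objectives described above are

$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\},$$

$$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right\},$$

where the hyperparameter λ controls the strength of regularization: λ = 0 recovers ordinary least squares, and larger λ shrinks the coefficients more strongly.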

3.2.4 Measurement, Discovery, and Unsupervised Machine Learning

Unsupervised machine learning aims to discover the latent structure of data x without correct-answer labels y. In other words, it reduces high-dimensional data x to low-dimensional, interpretable quantities or categories. Unsupervised machine learning includes cluster analysis, such as k-means and hierarchical clustering, principal component analysis, and latent class analysis. In this sense, it is a family of techniques that has been used relatively often in the traditional social sciences. Repositioned in the context of machine learning, two points are important. First, the validity of the classifications and patterns discovered by unsupervised machine learning must be ensured. Although assessing the validity of unsupervised machine learning models is difficult, various methods have been proposed, including human verification (Grimmer et al., 2022). Second, applications to extremely high-dimensional data, such as text, have advanced. Survey data were traditionally not high-dimensional, but with ultrahigh-dimensional data such as text, which contain tens or hundreds of thousands of variables (e.g., the vocabulary appearing in a corpus), the discovery of patterns by


unsupervised machine learning is particularly beneficial. Semisupervised learning, which combines supervised and unsupervised learning, is also used. Topic models and word-embedding models, unsupervised machine learning methods for text analysis, are discussed in detail in Chap. 5.
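As a minimal sketch of unsupervised pattern discovery in text, the following clusters toy documents with k-means after TF-IDF vectorization; the documents and the choice of two clusters are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy documents standing in for high-dimensional text data.
docs = [
    "tax cuts and economic growth policy",
    "budget deficit and fiscal policy debate",
    "new art exhibition opens downtown",
    "museum announces painting retrospective",
]

# TF-IDF turns each document into a high-dimensional vector.
x = TfidfVectorizer().fit_transform(docs)

# k-means reduces the documents to two clusters; no outcome label
# y is involved anywhere in the procedure.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(x)
print(labels)
```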

3.3 Potential of Machine Learning

Machine learning has great potential to transform the traditional way of conducting research on a variety of social science issues (Grimmer et al., 2021). Here, we summarize this potential in three key areas.

3.3.1 Breaking Away from the Deductive Model

Traditional social science has taken the deductive model as its scientific methodology (Grimmer et al., 2021; cf. McFarland et al., 2016), which, according to Grimmer et al., is well suited to testing theories efficiently with scarce data. Deductive models require a carefully crafted theory prior to observing the data. Assuming such a theory exists, a hypothesis derived from it and testable with data is established; data are then obtained, and the hypothesis is tested exactly once. Deductive models are highly compatible with data modeling because both assume a theoretical model of how the data were generated.

However, the deductive model has several drawbacks. First, deductive models often rely on data modeling approaches compatible with hypothesis testing frameworks, especially linear regression models. This excludes many other, more realistic, modeling options. In sociology, this inflexibility of linear regression models has traditionally been a cause of the divide between sociological theory and empirical analysis (Abbott, 1988). Second, the deductive model assumes that theory is available before the data are observed and thus cannot address questions such as how to create and elaborate concepts from data or how to discover new hypotheses in data (Grimmer et al., 2021). In contrast to qualitative research in sociology (Glaser & Strauss, 1967; Tavory & Timmermans, 2014), traditional quantitative research does not explicitly incorporate a theory building phase. Third, although the deductive model serves as a normative scientific methodology, it is difficult to follow its procedures rigorously in actual social science practice. As a result, social science hypotheses face major problems of replicability and generalizability. In practice, reformulating theories and exploring hypotheses by analyzing the data is unavoidable in the social sciences. However, presenting


hypotheses obtained in this way within the framework of a deductive model is an inappropriate practice, such as p-hacking (Simmons et al., 2011) or HARKing (Kerr, 1998).

In contrast, machine learning is built on a different conception than deductive models, which allows for more flexible modeling (Molina & Garip, 2019). First, because machine learning does not follow a hypothesis testing framework, not only simple linear models but also complex, nonlinear models with higher-order interactions are acceptable, as long as they predict well. In addition, machine learning's algorithmic models can be interpreted as agnostic models (Grimmer et al., 2021). The agnostic approach holds that instead of assuming there is one correct model reflecting the real-world process, one should choose the best model for the problem at hand, which allows for flexible model selection. Furthermore, machine learning does not take a hypothetico-deductive approach, which makes it conducive to heuristic and exploratory research (Grimmer et al., 2021; Molina & Garip, 2019). Unsupervised machine learning can be very useful for discovering latent patterns in data, and supervised machine learning frameworks can be used effectively for concept formation and theory building. This approach can also uncover various kinds of heterogeneity through complex modeling, specifically by allowing higher-order interactions (Grimmer et al., 2017; Molina & Garip, 2019). In particular, the identification of heterogeneity in causal effects is extremely useful for sociology, which seeks to elucidate causal mechanisms (Athey & Imbens, 2016; Brand et al., 2021). Finally, because machine learning does not adopt the deductive model, which may not suit certain types of social science practice, it can break free of inappropriate practices such as p-hacking. This is closely related to the second point, discussed below.

3.3.2 The Machine Learning Paradigm as a Normative Epistemic Model

Watts (2014) suggests that machine learning should replace the traditional hypothetico-deductive model as the normative model for social science methodology. Specifically, he argues that successful prediction should be the standard by which social science is evaluated. Many, if not most, sociologists agree that elucidating the causal mechanisms of social phenomena is the ultimate goal of sociology. However, the normative epistemic model for elucidating causal mechanisms has traditionally been assumed to be hypothetico-deductive: the ideal is an experiment conducted on the basis of testable hypotheses about causation. Although the methods of computational social science have greatly expanded the range of large-scale digital experiments, the types of social phenomena that can be tested in this way remain limited. Therefore, sociologists have


traditionally attempted to test causal hypotheses by applying basic regression models to nonexperimental observational data. This practice, however, is now strongly criticized from the standpoint of statistical causal inference (Morgan & Winship, 2014). Watts argues that, in light of this situation, the criterion for evaluation should be the success or failure of predictions in out-of-sample tests rather than the testing of a priori hypotheses (Watts, 2014). As explained previously, the requirement of out-of-sample testing is simple: the model's predictions should be tested on data different from those used to form hypotheses and train the model, and the model with the best predictive ability should be adopted.

The strength of out-of-sample testing is that, by making the avoidance of overfitting the central policy, it allows for more flexible theory building than hypothesis testing does. Out-of-sample tests do not require that a hypothesis be set prior to the study. Researchers can construct hypotheses, theories, and models a posteriori. In other words, they can complicate the model to fit the data at hand better and reduce bias, as long as the out-of-sample test is passed. Hypothesis testing practice, in contrast, can be interpreted as an attempt to lower variance by adhering to a priori hypotheses and reducing flexibility (Yarkoni & Westfall, 2017). Out-of-sample testing can thus also be seen as a way of addressing the bias–variance tradeoff: it strikes a balance between such a priori tests and posterior theory building, allowing for appropriate model selection.

3.3.3 Machine Learning Model as a Human Decision and Cognitive Model

The third point relates to a view different from the previous two. Machine learning and artificial intelligence provide algorithms for discovering and recognizing patterns in data. In a sense, therefore, machine learning can be said to simulate human judgment and cognition. The most obvious application of this aspect is automatic coding (Nelson et al., 2021). Sociology requires conceptualizing, classifying, and coding a variety of data and the events underlying them. When the data and events are few, humans can classify and code them all manually; when the data are plentiful, this is impossible. It is therefore very beneficial for machines to recognize, classify, and code patterns in the data in place of humans.

However, the machine simulation of judgment and cognition, and the classification and prediction based on it, have implications for sociology that go beyond practical convenience. Sociology is the study of human action, and modeling how people judge, perceive, and categorize situations is essential to understanding the mechanisms of action (Schutz & Luckmann, 1973). Traditional analytical tools do not directly model people's judgments and cognition; rather, they model the consequences of actions and the aggregate distribution of outcomes. The connection to sociological theory is therefore only indirect.


Conversely, machine learning is more than just an analytical tool; it can relate directly to sociological theory by modeling people's judgments and cognition (cf. Foster, 2018). For example, the models of natural language processing discussed in detail in Chap. 5 can be used not merely for the practical purpose of automatically summarizing and labeling textual content but also to formalize the way humans process textual meaning. This is taken up thematically in Chap. 5. Furthermore, comparing machine learning-based classification and pattern recognition with human classification and pattern recognition may reveal features and biases of human judgment (Kleinberg et al., 2015). Identifying such features and biases by comparison with machine judgments may lead to further theorizing of human judgment, cognition, and decision making.

Alternatively, machine learning can be used to examine whether people are reading, and acting on, certain signals in certain situations. For example, fictitious prediction problems (Grimmer et al., 2021) are useful for the question of whether people can read the social class or socioeconomic status of tweeters from their daily tweets (Mizuno & Takikawa, 2022). That is, information related to the socioeconomic status of tweeters is collected in advance through surveys and other means, and the question is whether this information can be predicted from the tweets alone. If machines can successfully predict socioeconomic attributes, it is highly likely that people read such signals in their daily interactions; if advanced machine learning models cannot predict them at all, the signals may not exist for humans either, or, if they do, they may be very weak. Thus, machine learning models can be used to examine how people understand and react to the actions (tweets) of others. This is an analytical method that has intrinsic relevance to theory.
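A hedged sketch of such a study design follows. The tweets and labels are fabricated toy data, and the TF-IDF-plus-logistic-regression pipeline is only one simple choice; it is not the pipeline of Mizuno and Takikawa (2022).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical training pairs: tweet text and a survey-derived
# socioeconomic label for the tweeter (toy data, not real).
tweets = [
    "off to the gallery opening then dinner downtown",
    "another double shift, exhausted but rent is due",
    "booked flights for the conference next month",
    "bus was late again, gonna miss the warehouse shift",
] * 10
labels = ["high", "low", "high", "low"] * 10

x_tr, x_te, y_tr, y_te = train_test_split(
    tweets, labels, test_size=0.25, random_state=0
)

# If the model predicts status well out-of-sample, the text plausibly
# carries signals that human readers could also pick up.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(x_tr, y_tr)
print("out-of-sample accuracy:", clf.score(x_te, y_te))
```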

3.3.4 Challenges for Machine Learning

We have discussed the potential of machine learning in comparison with traditional quantitative methods. One of the biggest challenges in applying machine learning to the social sciences is ensuring interpretability (Hofman et al., 2021; Molina & Garip, 2019; Yarkoni & Westfall, 2017). In traditional quantitative work, simple linear models are typically used, and the magnitudes of the coefficients of interest are examined in order to elucidate the "black boxes" that produce social phenomena. As we have noted, this approach has sometimes been practiced inappropriately in the traditional social sciences and has led to various problems. In contrast, machine learning selects and evaluates models in terms of their overall predictive performance, not their individual coefficients. This allows for models with a more realistic degree of complexity, and model realism should be an important prerequisite for elucidating the causal mechanisms of social phenomena. Nevertheless, to open the black box of social phenomena, it still seems essential to clarify how the individual parts of a mechanism work. In other words, being able to interpret which factors and features contribute to a model's predictive performance is also necessary for unraveling the mechanism (Breiman, 2001). This issue of


interpretability is crucial to the task of applying machine learning to social science theory building. Closely related to interpretability is the problem of causal inference (Morgan & Winship, 2014). Causal inference focuses on the effect of a particular treatment, which seems the opposite of machine learning's focus on the overall performance of a model. Nevertheless, applying machine learning to the effective implementation of causal inference is currently among the most active areas of research. For example, machine learning is effective in estimating propensity scores, which are often used for causal inference with observational data (Westreich et al., 2010). In addition, machine learning paradigms fit better than deductive models in tasks such as the exploratory discovery of heterogeneity in causal effects (Brand et al., 2021).

3.4 Conventional Methods of Statistical Analysis in Sociology

3.4.1 Aims of Quantitative Sociological Studies

Sociology has utilized various statistical analysis methods to test sociological theories and hypotheses and to discover patterns and regularity in social phenomena. In this section, we introduce the conventional statistical analysis of sociological studies and the related problems. It is, however, difficult to cover all statistical methods in sociology. To contrast the conventional approach with the methods of machine learning, especially supervised learning, we narrow our focus to statistical methods that analyze data from social surveys using regression models. These methods are commonly used in sociology and other social sciences, such as economics and political science. Quantitative sociology often uses a variety of research methods, such as small local surveys, large national representative surveys, and panel surveys, to gather data about individuals and groups in societies. Although some sociological studies use experimental methods, the primary approach is to observe and gather data about individuals, groups, and societies through social surveys. Using these data, sociologists can quantify social phenomena, compare groups, estimate the size of the association between variables or the “effect” under a particular model, and make causal inferences. Through this process, sociologists aim to understand and interpret social phenomena and explain the underlying mechanisms that produce patterns and regularities in society. Data analysis can be divided into three main tasks: description, prediction, and causal inference (or causal prediction) (Berk, 2004; Hernán et al., 2019). To investigate patterns and regularities and test theories and hypotheses, sociologists often use generalized linear models, particularly linear regression models, to describe the association among variables, predict a variable of interest, and estimate the causal effect of a treatment on an outcome (Berk, 2004).

3.4.2 Description

Description involves measuring the central tendency of a social phenomenon and understanding associations between variables that indicate a social pattern or regularity. This can involve simple calculations of means and percentages over the entire sample or over subsamples of distinct groups. However, regression analysis is also conducted for descriptive purposes, especially for exploring associations between variables (Gelman et al., 2021) and, sometimes, their changes over time and across societies. Linear regression analysis for description constructs the best-fitting model under the assumption of linearity and interprets the estimated coefficients (Morgan & Winship, 2014).

In linear regression analysis, a dependent variable y of interest and several independent variables x believed to be related to it are included in the model (Elwert & Winship, 2010). The model is then improved through the addition or removal of variables or interaction terms, and the final model is chosen based on criteria such as R-squared and AIC. Sometimes independent variables are added or removed step by step, and model fit is compared across multiple models to show the importance of particular variables and to interpret why coefficients have changed. After the final (or preferred) model is chosen, the estimated coefficients (β), standard errors, confidence intervals, and p-values are presented. Because the coefficients are obtained under the strong but simple assumption of linearity, they are easy to interpret: the estimated coefficient of a linear regression model indicates the average change in y for a one-unit change in an independent variable, holding the values of the other variables constant.

This approach is interested in the association between the independent variables x and the dependent variable y, or the "effect" estimated from the preferred model. A regression model is therefore a simple and effective tool for capturing social phenomena and regularities, especially linear associations. The family of regression models has been extended to analyses that incorporate the influence of characteristics at higher levels than the individual, such as region and school (multilevel modeling), and to the analysis of categorical and limited dependent variables. However, such an interpretation makes sense only if the model correctly captures the relationship. It is also important to note that because the coefficients correspond to comparisons between individuals, not changes within individuals (Gelman et al., 2021), they represent associations, not causal effects.
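For concreteness, here is a minimal sketch of a descriptive regression on simulated survey data; the variables and coefficients are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500

# Simulated survey data: income as a function of education and age.
df = pd.DataFrame({
    "educ": rng.integers(9, 19, n),
    "age": rng.integers(25, 65, n),
})
df["income"] = 120 * df["educ"] + 15 * df["age"] + rng.normal(0, 300, n)

# Descriptive regression: each coefficient is read as the average
# difference in income per one-unit difference in a predictor,
# holding the other predictor constant.
model = smf.ols("income ~ educ + age", data=df).fit()
print(model.summary().tables[1])  # coefficients, SEs, CIs, p-values
```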

3.4.3 Prediction

The second type of sociological data analysis involves constructing a model purely to predict y from x. Although similar to description, the predictive task is concerned more with the dependent variable y, or with algorithms for prediction, than with the independent variables x (Salganik et al., 2020). The focus is not on the estimation and interpretation of the coefficients of the independent


variables x but on the accuracy with which y is predicted under the model that includes them (Molina & Garip, 2019). For example, a model can be constructed to predict which individuals are more likely to drop out of high school or experience poverty. Machine learning, particularly supervised learning, has played a significant role in this type of prediction. However, the application of prediction to sociological studies has not been fully explored. This is one reason machine learning methods have been less fully exploited in sociological research than in other social sciences and have not yet made substantial contributions to sociology, especially to studies using social survey data. Nevertheless, predictive tasks are essential and have significant implications for sociological research. We introduce sociological studies using prediction and discuss their implications in Sect. 3.5.

3.4.4

Causal Inference

Regression models are also standard tools in causal inference. While randomized experiments are the gold-standard method for determining causal effects, causal inference in sociology usually has to rely on observational data, especially for ethical reasons. In causal inference, we typically establish a treatment variable (A) and an outcome variable of interest (Y ), use the potential outcome framework to derive causal estimands, and use a directed acyclic graph (DAG) to explain the data generation process and how confounding may occur (Morgan & Winship, 2014). Then, we specify confounding variables (L ) sufficient to identify the causal effects, use confounding variables as controls in the regression model, and estimate the coefficient of the treatment variable. Of course, there are many other methods for estimating causal effects in addition to conventional regression analysis (matching, inverse probability of treatment weighting, g-estimation, g-formula, etc.). For the selection of covariates to be used as control variables, it is recommended that (1) variables that affect one or both the treatment and the outcome are included, (2) those that can be considered instrumental variables are excluded, and (3) those that serve as proxies of unobserved variables that are common causes of the treatment and the outcome are entered into the model as covariates (VanderWeele, 2019). To avoid discrepancy between the estimated regression coefficients and the intended effect, we must consider what is a good control variable and what is a bad control variable (Cinelli et al., 2021). The selection of variables does not aim to increase the predictive or explanatory power of the model but to reduce or not amplify bias. In this respect, regression analysis for causal inference differs significantly from that for description and prediction. The determination of which variables are sufficient to identify a causal effect must be made by a human based on theory, prior research, domain knowledge, etc. Even if the data-generating process is clarified and confounding variables to identify the causal effect are obtained, linear regression analysis may not be appropriate for causal inference. This is because the

38

H. Takikawa and S. Fujihara

relationships among covariates, outcomes, and treatments may not be adequately modeled. Misspecification of the relationships (functional forms) among these variables may lead to biased results (Schuler & Rose, 2017). In cases where there are many variables or the relationships among variables are complex, machine learning methods can be used to build flexible models of the outcome and treatment. Clearly, theoretical considerations about covariates, treatments, and outcomes are necessary. Nevertheless, it is also essential to think in a data-driven manner about which covariates to include in the final analysis and what associations to assume among the covariates (Mooney et al., 2021). Although the purpose of prediction is different from that of causal estimation, predictive tasks are often an intermediate step in causal estimation (Naimi & Balzer, 2018). In addition to estimating propensity scores (Westreich et al., 2010), machine learning plays a significant role in g-computation, which uses predictions under interventions to identify causal effects (Le Borgne et al., 2021). Machine learning is used to both estimate flexible treatment and outcome models in causal inference. Recently, as a doubly robust method for estimating causal effects, targeted maximum likelihood estimation using the ensemble method (SuperLeaner), which combines predictions from several machine learning algorithms, has attracted considerable attention (Schuler & Rose, 2017; Van der Laan & Rose, 2011). Thus, statistical causal inference already incorporates machine learning methods rather than relying solely on conventional regression techniques. Further application of machine learning to causal inference methods will be discussed in Sect. 3.6.

3.4.5

Problems of the Conventional Regression Model

Regression analysis is widely used because it is a simple method of statistical analysis, and the results obtained are transparent (Kino et al., 2021) and (at first glance) easy to interpret. In addition, since regression is covered in textbooks on statistics and quantitative methods in all fields, even researchers who do not specialize in quantitative research know how to read and interpret the results of regression analysis. The interpretability of the estimates is essential in communicating with researchers who are not familiar with statistical methods (Kino et al., 2021). However, regression analysis in social science research has been subject to substantial criticism (Abbott, 1988; Achen, 2005; Morgan & Winship, 2014). Although regression analysis is undoubtedly a useful tool for a variety of research purposes, including description, prediction, and causal inference, the objectives of a research study, theory, hypothesis, and research question tend to be defined within the statistical model, such as estimating and showing significant regression coefficients, which can lead to not adopting the best approach to the research question or to narrowing the scope of the study too much (Lundberg et al., 2021). Although regression analysis methods may be used to test theories and hypotheses, these methods are unlikely to lead to the construction of a new theory or theoretically meaningful target quantity.

3

Methodological Contributions of Computational Social Science to Sociology

39

Regression analysis is used in many sociological studies to examine the association between x and y, controlling for other variables. In descriptive regression analysis with many independent variables, the interpretation of each coefficient (association or effect) can be difficult in practice because a distinction between treatment variables, confounding variables, mediating variables, and collider variables may not be made (Acharya et al., 2016; Keele et al., 2020; Westreich & Greenland, 2013). Inadequate use of control variables can also create biases or magnify existing biases (Cinelli et al., 2021). These coefficients do not tell us how y changes when x changes from one value to another (Morgan & Winship, 2014); they do not tell us information about the causal effect and the effect of the intervention. Regression analysis with a set of independent variables and covariates may be performed to estimate the causal effects of one or more variables, but usually, this does not yield causal effects. As mentioned earlier, causal inference with regression analysis requires different procedures and more specialized knowledge than does regression analysis for description and prediction (Hernán et al., 2019). Linear regression analysis has also been noted as a problematic method of description, data reduction, and smoothing (Berk, 2004). Linear assumptions fail to capture social reality, and more flexible models must be applied if reality is to be better described (Abbott, 1988). Additionally, interpretation of the estimated coefficients of a linear model may appear straightforward, but it is not meaningful if the model is not correctly specified both theoretically and empirically. Although the coefficient that is the best fit in the sample is estimated, it may not necessarily be the best fit out of the sample. Regression analysis is insufficient as an exploratory method because the procedure is not automated and requires a significant amount of researcher discretion, leading to a higher likelihood of p-hacking, low transparency, and poor reproducibility (Brand et al., 2021). This problem is also related to the fact that in quantitative sociological studies, exploratory methods are undervalued, and the exploratory approach has not been fully developed. This undervaluing of the heuristic approach has resulted in the underdevelopment of exploratory regression analysis with less researcher discretion (Molina & Garip, 2019, p. 39). Adopting a heuristic rather than confirmatory approach can also be useful in identifying heterogeneous causal effects. Therefore, an exploratory approach to identify heterogeneous causal effects using machine learning was proposed (Brand et al., 2021) and will be discussed in more detail later. As mentioned above, regression analysis and related methods have been widely used in sociological studies. However, in some cases, these conventional methods may not be effective in meeting the three main goals of data analysis: description, prediction, and causal inference (Berk, 2004; Hernán et al., 2019). To ensure the proper use of regression analysis, it is important to consider various factors. Machine learning, while primarily used for prediction, can also be useful for description and causal inference (Grimmer et al., 2021). It has the potential to address some of the limitations of traditional regression analysis for these purposes.

40

3.5

H. Takikawa and S. Fujihara

Machine Learning in Sociology

There are still few applications of machine learning in sociology (see also Molina & Garip, 2019 for applications in sociology). In this section, we review specific applications of machine learning by introducing the application of the prediction framework and automated coding in sociology. In the next section, we address a more advanced topic: the deployment potential of machine learning in relation to the elucidation of causal mechanisms.

3.5.1

Application of the Prediction Framework

As an example of an application of supervised machine learning in sociology, let us discuss life course prediction using panel data by Salganik et al. (2020). They used panel data from the Fragile Families and Child Wellbeing Study, which included several thousand U.S. families that had given birth to a child around the year 2000, with data from Waves 1–6 collected at the time of the study. They applied a common task framework commonly used in machine learning to challenge a large number of researchers with the following prediction task. The challenge was to predict various life outcomes measured in Wave 6 (e.g., child GPA and termination of primary caregiver) using information about children and families from Waves 1–5. Following a supervised machine learning framework, the Wave 6 data were split into training data and test data (holdout data). Researchers who applied to the challenge used the training data to train their proposed models, which were then evaluated on the test data. Despite a large number of participants (160 teams ultimately submitted models), including researchers, the challenge was not very accurate in predicting life outcomes. Although up to several thousand variables were available from Waves 1 to 5, even the best of the submitted models only marginally outperformed very simple linear and logistic regression models using only four variables in terms of predictive accuracy. This failure may have occurred because the survey data set lacked important variables that were originally needed for prediction or because social life is inherently too complex and unpredictable. What, then, is the significance for sociology of tackling the problem of predicting life outcomes? First, they seek its significance in identifying families and children at risk. This is a policy and practical significance that is easy to understand. Second, there may be new developments in analytical techniques through the prediction task, which has methodological significance. Finally, there are implications for sociological theory: according to Salganik et al., if predictability varies by social context, it stimulates the development of sociological theory to consider why this is so. We provide a more detailed explanation of the relationship between prediction and sociological theory by noting that the results of Salganik et al. show that prediction does not work uniformly well for all subjects but rather works reasonably

3

Methodological Contributions of Computational Social Science to Sociology

41

well for many subjects, with a small number of cases that cannot be predicted using any model. The overall tendency is that the predictions work reasonably well for many subjects, with a few cases that cannot be predicted using any model. This may stimulate the development of sociological theory by asking questions such as why some cases are not well predicted, what theory can explain the poor predictions, and what information not included in the survey should be focused on if an explanation is to be attempted [cf. This may stimulate theoretical development (cf. Garip, 2020)]. Additionally, although different from the present case, suppose that a complex model involving a variety of higher-order interactions is somewhat successful in making predictions. This model has good predictive performance but does not have the easily interpretable structure of an uncomplicated regression model. Therefore, the sociologist must reinterpret and understand the structure of the model to understand why this complex model can predict social reality to some extent. This will require the development of new sociological theories, which opens the possibility of developing sociological theory in a different way through the task of prediction by machine learning. In other words, sociological theory can be constructed in a way that improves the interpretability of models through moderately complex models that capture predictable aspects of the real world, rather than the overly complex, noisefilled real world itself. This is also an idea that leads to the scientific regret minimization method, which will be introduced later.

3.5.2

Automatic Coding

Another application of machine learning in sociology is automatic coding. In the digital society, the share of data that directly record actions, the so-called found data, such as digital trace data and archived text data, is increasing (Salganik, 2018). Such data pose new challenges for sociological research. A particularly large problem is what Salganik calls the incompleteness problem (Salganik, 2018), which can be distinguished from the missing variable problem and the construct validity problem. In the case of structured data, such as surveys, researchers can prepare questions that allow them to measure the concepts (variables) they are interested in. Ideally, the items are also operationalized in advance to adequately measure the constructs. However, in the case of found data that are not generated by the researcher, the researcher may not have the data to operationalize the concepts of interest to him or her, or if he or she does, he or she may still face challenges in operationalizing them appropriately. One way to address the issue of missing variables is to combine found and survey data. The following is an introduction to a procedure that Salganik refers to as amplified asking (Salganik, 2018). As a research case study, we focus on the study by Blumenstock et al. (2015). Their interest was to determine the distribution of wealth in Rwanda. They had log data from cell phones used by the majority of Rwandans, but they did not explicitly include information about income or wealth. Therefore, they conducted a survey asking a subset of cell phone users about their

42

H. Takikawa and S. Fujihara

income and wealth. By doing so, they obtained data (X, y) where X is the cell phone log data and y is the income or wealth level. This can be regarded as supervised data with label y for X. By training the model with this labeled data (X, y) and applying it to the remaining unlabeled “labeled” cell phone log data, they assigned income and wealth information to all data. Of course, the accuracy of the model depends on how it is constructed since the information is extrapolated from the cell phone log data, except for the information actually asked in the survey. Conversely, the problem of construct validity (Lazer, 2015) arises when linking the given found data to the construct y. For example, if one compares the theoretical construct of intelligence as measured by an established test, the Raven progressive matrices test, or by the criterion of writing long sentences on Twitter, the former could be considered much more valid (Salganik, 2018). While these issues always arise with survey data, found data are not created for research, so how to conceptualize the data becomes an even greater issue. The problem of how to create sociologically meaningful concepts from found data, which is not designed for research, is not unique to digital trace data. Traditional methods of content analysis using newspapers, books, and magazines address the same problem (Krippendorff, 2004). Traditionally, in these fields, researchers interpret texts (and sometimes photos and videos) and assign codes that represent sociological concepts. Although the problem of construct validity remains with such methods, it is highly likely that a certain degree of validity can be ensured by coding based on flexible human interpretation. However, this method is extremely expensive and has limited scalability. Therefore, the idea of replacing some or all manual coding with automatic coding by machines has emerged based on today’s large-scale digital trace data and found data. The question then arises as to how machineautomated coding can satisfy construct validity. One method of machine coding is partial automatic coding by supervised machine learning. The main framework is the same as in amplified asking. We have unlabeled found data X. This can be a newspaper article (Nelson et al., 2021) or an image posted on a social networking site (Zhang & Pan, 2019). For a subset of this X, a sociological construct y is first manually labeled by the researcher. The model is trained and evaluated using the data set (X, y) created in this way. Once a model with sufficient performance is trained, it can be used to automatically assign codes to the remaining data sets. Constructs in sociology can be very complex and multidimensional, for example, populism, social capital, and inequality. Nelson et al. (2021) conducted experiments to test the validity of measuring complex concepts in sociology in an automatic coding framework. Specifically, they manually coded inequality and its subconcepts and related concepts in news articles that may contain the concept of inequality and then used them as yardsticks to examine the extent to which partially automatic coding by supervised machine learning matches manual coding. The results show that supervised machine learning is capable of coding with a good degree of validity, with the F1 score, which is the harmonic mean of precision and recall, exceeding the guideline of 70. 
More interestingly, however, is the possibility that examining the idiosyncrasies and biases of machine coding may also lead to a rethinking of the

3

Methodological Contributions of Computational Social Science to Sociology

43

human manual coding framework. Nelson et al. note that it is important to consider the extent to which theoretically interesting categories can be categorized in terms of precision and recall and to choose a coding framework based on this. To extend this point further, it may be necessary to review the coding rules themselves so that machines can construct theoretically interesting categories that are easier to classify. As noted earlier, machine learning models are also formalized models of human cognition and judgment, so the fact that they have a reflexive relationship to human coding is particularly important when applied to complex, “socially constructed” concepts handled by sociology. Nelson et al. (2021) add to this by examining the possibility of fully automating coding through unsupervised machine learning. For example, unsupervised machine learning, such as topic models, can automatically identify potential topics that a given data X addresses. By examining the extent to which such automatic assignment of topics matches the categories assigned by humans, we can examine the possibility of automatic coding via unsupervised machine learning. Conclusively, it is difficult for unsupervised machine learning to reproduce a classification that corresponds exactly to a human predefined concept. Therefore, it would be a mistake to assume that automatic coding by unsupervised machine learning can completely replace manual coding by humans. Rather, the potential of unsupervised machine learning coding lies in its ability to discover new classifications. While machines cannot perform coding as flexibly as humans can, they are free from the biases and narrowness of vision inherent in humans and may discover latent patterns and propose new classifications that humans were previously unaware of. Of course, the usefulness of such classifications must be determined by humans from the standpoint of sociological theory. Adding machine “interpretations” to human interpretations in this way may enable concept formation that is also beneficial in advancing sociological theory.

3.6

Toward Further Applications in Sociology

As we have seen in the previous section, there are various possibilities for the application of machine learning to sociology. In this section, we continue to examine how applications of machine learning can contribute to the development of sociological theory, and in particular, we introduce two methods in relation to the goal of sociology, which is to elucidate causal mechanisms.

3.6.1

Heterogeneity of Causal Effects

Like other social sciences such as economics and political science, sociology has paid great attention to the causal factors that cause social phenomena. Moreover, the main theoretical goal of sociology is to elucidate the causal mechanisms of social

44

H. Takikawa and S. Fujihara

phenomena. When we speak of mechanisms, we mean not only the mere connections between causes and effects but also the elucidation of mechanisms at a deeper level that link causes and effects (Hedström & Ylikoski, 2010; Machamer et al., 2000; Morgan & Winship, 2014). Well known in sociology is the micro–macro mechanism elucidation research program formulated by Coleman (1990). This research program aims to elucidate the mechanisms for macro collective social phenomena from a more micro, action level. How can machine learning be used to elucidate such causal mechanisms? The identification of causal effects itself requires the use of a statistical causal inference framework, which is largely outside the current scope of machine learning. However, identifying the heterogeneity of causal effects is an important step toward a better understanding of causal mechanisms (Salganik, 2018). Heterogeneity of causal effects refers to the fact that causal effects vary by situation, context, and attributes of the intervention target. Machine learning is extremely powerful in the search for such heterogeneity. When searching for effect heterogeneity using the traditional hypothesis testing approach, one is confronted with a variety of problems. First, prior theoretical preconceptions and conventions dictate at what level effect heterogeneity is likely to exist. For example, gender, age, and socioeconomic status are preferred variables in sociology. This in itself is not necessarily a bad thing, but there is a risk that the scope of inquiry of sociological theory is narrowed beforehand (it is not limited to gender, age, and socioeconomic status). In addition, there may be a widespread practice of feeding various interaction terms into regression models and reporting the results of models that incorporate only those interaction terms that actually become significant, which amounts to clear p-hacking (Brand et al., 2021). Finally, heterogeneity is not always adequately captured by first-order interactions in regression models. It is quite possible that it is caused by second-order or higher interactions or even more complex nonlinear mechanisms (cf. Molina & Garip, 2019). However, it is difficult to consider such possibilities with existing quantitative methods (Brand et al., 2021). Athey and Imbens (2016) developed a combination of machine learning and causal inference called causal trees, which is a way to address these issues and is of great use to sociology. Causal trees apply the method of decision trees in machine learning to model the heterogeneity of causal effects in an exploratory manner and without overfitting. Decision trees are a method of constructing a tree for data with covariates and target variables by partitioning the covariates with the goal of predicting a certain target variable (Hastie et al., 2009). The tree consists of multiple nodes in a hierarchical structure. At the first node (sometimes called the root), data are split into two subnodes based on a threshold value for a given covariate. Splitting is performed in such a way that the values of the target variables are most similar within each node. Then, at each node, a further division based on the covariate threshold is made in a similar manner. This process is repeated to construct the final tree. The decision tree construction is highly transparent because the algorithm is relatively simple. It is also easy to interpret visually.

3

Methodological Contributions of Computational Social Science to Sociology

45

In a causal tree, the goal is to predict the treatment effect τ instead of the target variable. As in the usual decision tree, the tree is constructed in such a way that the heterogeneity within the nodes of τ is reduced. However, unlike the usual target variable, the treatment effect τ is a potential outcome and not an observable quantity. Specifically, Athey and Imbens (2016) developed a procedure called honest estimation. In “honest” estimation, the sample is split into data for partitioning the covariate space and data for estimating treatment effects within nodes. The partitioning of the nodes is set up in such a way that the heterogeneity of the treatment effects is captured as much as possible, while the uncertainty in the treatment effect estimation is minimized. In general, the finer the split, the greater the heterogeneity between nodes, while the estimation of the treatment effect within a node becomes more uncertain. In other words, there is a tradeoff between the two goals, and the goal is to make the partition in such a way that they are just balanced. An example of an application of causal trees in sociology is the work of Brand and colleagues (Brand et al., 2021). They use NLSY panel data to examine the extent to which a college degree is effective in reducing the time spent in low-wage jobs. Analysis with causal trees allows them to find not only the average causal effect of a college degree in reducing time in a low-wage job but also heterogeneous causal effects and the extent to which these effects vary across people with particular attributes. Moreover, using the tree enables them to examine not only linear effects but also complex interactions of various factors. The results of their analysis indicate unexpected heterogeneity due to such complex interactions. The effect of a college degree on the reduction of low-wage work was particularly large for those whose mothers were less educated, grew up in large families, and had less social control. Such unexpected findings provide an opportunity to further explore the causal mechanisms and lead to further development of sociological theory. Mediation analysis is another method for mechanism exploration (VanderWeel, 2015). Currently, the causal mediation analysis method is constructed from the perspective of causal inference. The quantities of interest (estimand) include direct intervening to set the mediator variable M to m as well as the treatment A to a, or the direct and indirect effects of setting that the mediator variable M to a natural value (Ma) after the treatment A is set to a. Conditions for the identification of these various direct and indirect effects have been examined, and several methods of estimation have been developed. Machine learning methods are useful in causal mediation analysis, just as they are useful in causal inference. In addition, to understand the effects of treatment variables that change over time, it is necessary to think carefully about the estimand, identification, and estimation. Machine learning methods can also be used for estimation in this context (Lendle et al., 2017; van der Laan and Rose, 2018).

46

3.6.2

H. Takikawa and S. Fujihara

Scientific Regret Minimization Method

The greatest problem with machine learning’s emphasis on prediction is that the theoretical interpretability of the results is limited. In other words, the internal mechanisms through which machine learning models produce good predictions are unknown. Nevertheless, the interpretability of a conventionally simple model does not mean that a simple model should be chosen at the expense of predictability (Yarkoni & Westfall, 2017). Rather, the fact that a model is predictable, even if it is a complex model, can in principle be considered to mean that there exists the possibility of theorizing in it and thus the possibility of making the model interpretable. Therefore, a methodology is needed to build an interpretable social science theory while preserving the predictive performance of machine learning models to the fullest extent possible. This can be positioned as a methodology for integrated modeling that aims to integrate predictive and explanatory capabilities (Hofman et al., 2021). For example, it is said that the coefficients of a linear regression model can be easily interpreted by combining the results into a single quantity, in contrast to machine learning models, which often lack a single interpretable quantity and can be difficult to understand. However, the average partial effect, which is a measure of the “effect” of a particular variable on the outcome of interest, can be obtained using machine learning predictions and interpreted in a similar way to coefficients in traditional regression analysis. If we want to know how much a partial change in one variable x will change y on average, holding other variables constant, we can compute them directly from the predictions. For example, to find the average partial effect, we can take the differences between the predicted value of y for two different values of x (x and x + Δ) and then divide that difference by Δ and average them (Lundberg et al., 2021). If we clearly define the target quantity we wish to obtain, it can be calculated from the predictions. If we prioritize ease of interpretation, we can choose a quantity that is easily understood. Defining a clear and easily interpretable target quantity can help to address many of the challenges associated with interpreting the results of machine learning. When a target quantity is well defined and meaningful from a theoretical perspective, it can be easier to understand and draw meaningful conclusions from the prediction by machine learning. In addition, the “scientific regret minimization” proposed by Agrawal et al. is considered a promising methodology (Agrawal et al., 2020; Hofman et al., 2021). This method seeks to improve social science models by preparing large-scale data and using machine learning methods to focus only on variances that can be explained in principle. Variances that can be explained in principle are those that could have been explained by a better model. In contrast, the inability to explain inherent noise is not a problem; rather, changing the model to accommodate the noise will lead to overfitting and loss of generalization performance. Specifically, the following steps should be taken: 1. Train a theoretically unconstrained machine learning model (black-box model) on a large data set to identify explainable variances in the data set.

3

Methodological Contributions of Computational Social Science to Sociology

47

2. Fit a simple, interpretable psychological model to the same data set. 3. Compare the black-box model with the simple model and improve the simple model. 4. If the predictions of both models are consistent, we have obtained a model that maximizes predictive and explanatory power simultaneously. 5. Validate the model obtained from the above exploratory analysis with new independent data. This method can be understood as a method of sequentially improving the model by alternating between the data and the model. The conventional method also focuses on the divergence between the model’s predictions and the data, and the process is to improve the model in the direction of closing the divergence. The difference between the traditional residual analysis and the scientific regret minimization method is that the former compares data to a social science model, while the latter compares a black-box model to a social science model. Let us denote the true function as f(x), the machine learning model as f ðxÞ, and the social science model as g(x). The goal of social science is to make the social science model as close to the true function as possible, that is, to minimize the difference f(x) - g(x) (the true residual) between the two. Nevertheless, since we cannot know the true model, we cannot know the true residuals either. Therefore, in conventional residual analysis, the model is successively modified to minimize the residual y - g(x) (“raw residual”) between the data and the social science model. In contrast, the scientific regret minimization method focuses on the residual f ðxÞ gðxÞ (“smoothed residual”) between the machine learning model f ðxÞ and the social science model g(x), rather than data y itself. The reason for this is that the larger the data, the more likely it is that the smoothed residuals reflect the true residuals rather than the raw residuals [see the current paper of Agrawal et al. (2020) for proof]. Conversely, attempting to reduce the raw residuals would result in overfitting the model to the noise, which would lead to overfitting. What specific theories could the scientific regret minimization method produce? Agrawal et al. apply this method to a large data set of moral machine experiments (Awad et al., 2018) to propose a more detailed and interpretable moral theory than previously possible. In another study (Peterson et al., 2021), this approach is applied to the domain of risky decision making to derive a modified model for expected utility theory and prospect theory (Kahneman & Tversky, 1979). In sociology, with “scientific regret minimization,” or more generally, with integrated modeling that aims to integrate predictive and explanatory capabilities, it should be possible to perform interpretable theory discovery and theory building to advance the development of sociological theory.

48

3.7

H. Takikawa and S. Fujihara

Conclusion

Today, the data environment surrounding sociology is changing drastically. Accordingly, it is necessary for sociological methods to incorporate new methods, in addition to traditional methods, in response to changes in the data environment. Although machine learning differs greatly from traditional sociological methods in culture, basic ideas, and logic, it has great potential for the development of sociological theory. In particular, machine learning has the potential to contribute to sociological theory in three ways. First, it can break away from the traditional deductive model and incorporate more flexible and heuristic ideas. Second, the machine learning cognitive paradigm offers new cognitive norms that differ from the traditional sociological norms embodied in hypothesis testing. This improves the status quo in terms of replicability and generalizability. Third, machine learning models can provide models of human cognition and judgment. Existing statistical methods in sociology have been concerned with description, causal inference, and prediction, but machine learning methods can approach these issues better or from new angles. In this chapter, we demonstrate the effectiveness of using machine learning methods for prediction, coding, and causal inference with real research examples. Nevertheless, there is a major challenge in using machine learning for the development of sociological theory: the problem of interpretability. Conventional machine learning models do not necessarily focus on interpretability, but by using ideas such as scientific regret minimization, the possibility is now open to build more interpretable models that can be used with sociological theory. This is a valuable chance to shift focus from interpreting the coefficients estimated from a model to constructing various theoretical quantities based on predictions, thereby broadening our perspective on analyzing data. We conclude that sociology could achieve healthier development by incorporating machine learning methods, in addition to traditional statistical methods, into its toolbox.

References Abbott, A. (1988). Transcending general linear reality. Sociological Theory, 6(2), 169–186. https:// doi.org/10.2307/202114 Acharya, A., Blackwell, M., & Sen, M. (2016). Explaining causal findings without bias: Detecting and assessing direct effects. American Political Science Review, 110(03), 512–529. https://doi. org/10.1017/S0003055416000216 Achen, C. H. (2005). Let’s put garbage-can regressions and garbage-can probits where they belong. Conflict Management and Peace Science, 22(4), 327–339. https://doi.org/10.1080/ 07388940500339167 Agrawal, M., Peterson, J. C., & Griffiths, T. L. (2020). Scaling up psychology via scientific regret minimization. Proceedings of the National Academy of Sciences, 117(16), 8825–8835. Anderson, C. (2008). The end of theory: The data deluge makes the scientific method obsolete. Wired magazine, 16(7), 16–07.

3

Methodological Contributions of Computational Social Science to Sociology

49

Athey, S. (2018). The impact of machine learning on economics. In The economics of artificial intelligence: An agenda (pp. 507–547). University of Chicago Press. Athey, S., & Imbens, G. (2016). Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences, 113, 7353–7360. Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., Bonnefon, J. F., & Rahwan, I. (2018). The moral machine experiment. Nature, 563(7729), 59–64. Berk, R. A. (2004). Regression analysis: A constructive critique. Sage. Bishop, C. M., & Nasrabadi, N. M. (2006). Pattern recognition and machine learning. Springer. Blumenstock, J. E., Cadamuro, G., & On, R. (2015). Predicting poverty and wealth from mobile phone metadata. Science, 350(6264), 1073–1076. https://doi.org/10.1126/science.aac4420 Brand, J. E., Xu, J., Koch, B., & Geraldo, P. (2021). Uncovering sociological effect heterogeneity using tree-based machine learning. Sociological Methodology, 51(2), 189–223. Breiman, L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199–231. Cinelli, C., Forney, A., & Pearl, J. (2021). A crash course in good and bad controls. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3689437 Coleman, J. S. (1990). Foundations of social theory. Belknap Press of Harvard University Press. Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10), 78–87. Donoho, D. (2017). 50 years of data science. Journal of Computational and Graphical Statistics, 26(4), 745–766. Elwert, F., & Winship, C. (2010). Effect heterogeneity and bias in main-effects- only regression models. In R. Dechter, H. Geffner, & J. Y. Halpern (Eds.), Heuristics, probability and causality: A tribute to Judea Pearl (pp. 327–336). Joseph Y. Halpern. Foster, J. G. (2018). Culture and computation: Steps to a probably approximately correct theory of culture. Poetics, 68, 144–154. Garip, F. (2020). What failure to predict life outcomes can teach us. Proceedings of the National Academy of Sciences, 117(15), 8234–8235. Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge University press. Gelman, A., Hill, J., & Vehtari, A. (2021). Regression and other stories. Cambridge University Press. Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory: Strategies for qualitative research. Aldine De Gruyter. Golder, S. A., & Macy, M. W. (2014). Digital footprints: Opportunities and challenges for online social research. Annual Review of Sociology, 40, 129–152. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press. Grimmer, J., Messing, S., & Westwood, S. J. (2017). Estimating heterogeneous treatment effects and the effects of heterogeneous treatments with ensemble methods. Political Analysis, 25(4), 413–434. https://doi.org/10.1017/pan.2017.15 Grimmer, J., Roberts, M. E., & Stewart, B. M. (2021). Machine learning for social science: An agnostic approach. Annual Review of Political Science, 24, 395–419. Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as data: A new framework for machine learning and the social sciences. Princeton University Press. Hastie, T., Tibshirani, R., Friedman, J. H., & Friedman, J. H. (2009). The elements of statistical learning: Data mining, inference, and prediction. Springer. Hedström, P., & Ylikoski, P. (2010). Causal mechanisms in the social sciences. 
Annual Review of Sociology, 36(1), 49–67. https://doi.org/10.1146/annurev.soc.012809.102632 Hernán, M. A., Hsu, J., & Healy, B. (2019). A second chance to get causal inference right: A classification of data science tasks. Chance, 32(1), 42–49. https://doi.org/10.1080/09332480. 2019.1579578

50

H. Takikawa and S. Fujihara

Hofman, J. M., Watts, D. J., Athey, S., Garip, F., Griffiths, T. L., Kleinberg, J., Margetts, H., Mullainathan, S., Salganik, M. J., Vazire, S., & Vespignani, A. (2021). Integrating explanation and prediction in computational social science. Nature, 595(7866), 181–188. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning. Springer. Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255–260. Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–292. Keele, L., Stevenson, R. T., & Elwert, F. (2020). The causal interpretation of estimated associations in regression models. Political Science Research and Methods, 8(1), 1–13. https://doi.org/10. 1017/psrm.2019.31 Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196–217. Kino, S., Hsu, Y.-T., Shiba, K., Chien, Y.-S., Mita, C., Kawachi, I., & Daoud, A. (2021). A scoping review on the use of machine learning in research on social determinants of health: Trends and research prospects. SSM Population Health, 15, 100836. https://doi.org/10.1016/j.ssmph.2021. 100836 Kleinberg, J., Ludwig, J., Mullainathan, S., & Obermeyer, Z. (2015). Prediction policy problems. American Economic Review, 105(5), 491–495. Krippendorff, K. (2004). Content analysis: An introduction to its methodology (2nd ed.). Sage. Lazer, D. (2015). Issues of construct validity and reliability in massive, passive data collections. The City Papers: An Essay Collection from The Decent City Initiative. Le Borgne, F., Chatton, A., Léger, M., Lenain, R., & Foucher, Y. (2021). G-computation and machine learning for estimating the causal effects of binary exposure statuses on binary outcomes. Scientific Reports, 11(1), 1435. Lendle, S. D., Schwab, J., Petersen, M. L., & van der Laan, M. J. (2017). ltmle: An R package implementing targeted minimum loss-based estimation for longitudinal data. Journal of Statistical Software, 81(1), 1–21. https://doi.org/10.18637/jss.v081.i01 Lundberg, I., Johnson, R., & Stewart, B. M. (2021). What is your estimand? Defining the target quantity connects statistical evidence to theory. American Sociological Review, 86(3), 532–565. Machamer, P., Darden, L., & Craver, C. F. (2000). Thinking about Mechanisms. Philosophy of Science, 67(1), 1–25. https://doi.org/10.1086/392759 McFarland, D. A., Lewis, K., & Goldberg, A. (2016). Sociology in the era of big data: The ascent of forensic social science. The American Sociologist, 47(1), 12–35. Mizuno, M., & Takikawa, H. (2022). Computational social science on the structure of communication between consumers (Yoshida foundation report). Molina, M., & Garip, F. (2019). Machine learning for sociology. Annual Review of Sociology, 45, 27–45. Mooney, S. J., Keil, A. P., & Westreich, D. J. (2021). Thirteen questions about using machine learning in causal research (you won’t believe the answer to number 10!). American Journal of Epidemiology, 190(8), 1476–1482. https://doi.org/10.1093/aje/kwab047 Morgan, S. L., & Winship, C. (2014). Counterfactuals and causal inference: Methods and principles for social research (2nd ed.). Cambridge University Press. Mullainathan, S., & Spiess, J. (2017). Machine learning: An applied econometric approach. Journal of Economic Perspectives, 31(2), 87–106. Naimi, A. I., & Balzer, L. B. (2018). Stacked generalization: An introduction to super learning. 
European Journal of Epidemiology, 33(5), 459–464. https://doi.org/10.1007/s10654-0180390-z Nelson, L. K., Burk, D., Knudsen, M., & McCall, L. (2021). The future of coding: A comparison of hand-coding and three types of computer-assisted text analysis methods. Sociological Methods & Research, 50(1), 202–237.

3

Methodological Contributions of Computational Social Science to Sociology

51

Peterson, J. C., Bourgin, D. D., Agrawal, M., Reichman, D., & Griffiths, T. L. (2021). Using largescale experiments and machine learning to discover theories of human decision-making. Science, 372(6547), 1209–1214. Salganik, M. J. (2018). Bit by bit: Social research in the digital age. Princeton University Press. Salganik, M. J., Lundberg, I., Kindel, A. T., Ahearn, C. E., Al-Ghoneim, K., Almaatouq, A., Altschul, D. M., Brand, J. E., Carnegie, N. B., Compton, R. J., & Datta, D. (2020). Measuring the predictability of life outcomes with a scientific mass collaboration. Proceedings of the National Academy of Sciences, 117(15), 8398–8403. Schuler, M. S., & Rose, S. (2017). Targeted maximum likelihood estimation for causal inference in observational studies. American Journal of Epidemiology, 185(1), 65–73. https://doi.org/10. 1093/aje/kww165 Schutz, A., & Luckmann, T. (1973). The structures of the life world. Northwestern University Press. Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366. https://doi.org/10.1177/0956797611417632 Tavory, I., & Timmermans, S. (2014). Abductive analysis: Theorizing qualitative research. University of Chicago Press. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58, 267–288. Tibshirani, R. (2011). Regression shrinkage and selection via the lasso: A retrospective. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73, 273–282. https://doi.org/ 10.1111/j.1467-9868.2011.00771.x Van der Laan, M. J., & Rose, S. (2011). Targeted learning. Springer. van der Laan, M. J., & Rose, S. (2018). Targeted learning in data science: Causal inference for complex longitudinal studies. Springer International Publishing. https://doi.org/10.1007/978-3319-65304-4 VanderWeele, T. J. (2015). Explanation in causal inference: Methods for mediation and interaction. Oxford University Press. VanderWeele, T. J. (2019). Principles of confounder selection. European Journal of Epidemiology, 34(3), 211–219. https://doi.org/10.1007/s10654-019-00494-6 Watts, D. J. (2014). Common sense and sociological explanations. American Journal of Sociology, 120(2), 313–351. Westreich, D., & Greenland, S. (2013). The table 2 fallacy: Presenting and interpreting confounder and modifier coefficients. American Journal of Epidemiology, 177(4), 292–298. https://doi.org/ 10.1093/aje/kws412 Westreich, D., Lessler, J., & Funk, M. J. (2010). Propensity score estimation: Neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression. Journal of Clinical Epidemiology, 63(8), 826–833. Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology. Perspectives on Psychological Science, 12(6), 1100–1122. Zhang, H., & Pan, J. (2019). Casm: A deep-learning approach for identifying collective action events with text and image data from social media. Sociological Methodology, 49(1), 1–57.

Chapter 4

Computational Social Science: A Complex Contagion Michael W. Macy

Computational social science is a multidisciplinary umbrella that includes a wide array of research practices enabled by advances in computation. These include computer simulation of social interaction on complex networks; collecting, processing, and analyzing digital trace data from online communities; data wrangling with massive numbers of incomplete and partially structured observations; machine learning; text analysis and natural language processing; geospatial data collection and analysis; and tracking global diffusion across online networks. The importance of social network analysis in computational social science created an opportunity for Sociology to become the disciplinary home of a game-changing field. This book addresses the discipline’s curious reluctance to embrace that opportunity and offers a compelling explanation: the need for a deeper theoretical grounding for the questions that drive the research agenda of computational social science. In a 2014 paper in the Annual Review of Sociology, Scott Golder and I pointed instead to methodological challenges and called for changes in graduate training across the social sciences. “A primary obstacle to online research by social scientists,” we argued, “is the need for advanced technical training to collect, store, manipulate, analyze, and validate massive quantities of semi-structured data, such as text generated by hundreds of millions of social media users. In addition, advanced programming skills are required to interact with specialized or custom hardware, to execute tasks in parallel on computing grids composed of hundreds of nodes that span the globe, and simply to ensure that very large amounts of data consume memory efficiently and are processed using algorithms that run in a reasonable amount of time. As a consequence, the first wave of studies of online behavior and interaction has been dominated by physical, computer, and information M. W. Macy (✉) Department of Sociology and Department of Information Science, Cornell University, Ithaca, NY, USA e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 Y. Sato, H. Takikawa (eds.), Sociological Foundations of Computational Social Science, Translational Systems Sciences 40, https://doi.org/10.1007/978-981-99-9432-8_4

53

54

M. W. Macy

scientists who may lack the theoretical grounding necessary to know where to look, what questions to ask, or what the results may imply” (Golder & Macy, 2014, p. 144).

4.1

The First Wave

Although Scott and I referred to “the first wave,” the study of online interaction in global networks should more appropriately be termed “the second wave” of computational social science. The first wave began decades earlier and involved the use of computational models to explore the logical implications of a set of theoretical propositions. Most notably, the seminal work by Schelling (1971) and Axelrod (1984) examined the dynamics of residential segregation and the evolutionary origins of social order, questions that are central not only to Sociology but to Social Psychology and Political Science as well. In this chapter, I recount my personal involvement in computational social science over five decades, focusing on two themes: the technical obstacles that had to be overcome, and the foundational research questions that have motivated the field. My initial foray into what came to be known as computational social science dates back to my junior year in college and exemplifies the early fascination with abstract computational models. My mentor, Karl Deutsch, was one of the first social scientists to apply simulation, information theory, and system dynamics models to the study of war and peace. One afternoon Prof. Deutsch walked in to our weekly tutorial with a rumpled copy of The Prisoner’s Dilemma: A Study in Conflict and Cooperation by Rapaport and Chammah (1965). He handed me the book and told me to come back when I had finished reading it. The book focused on the dynamics of cooperation in iterated play and introduced computer simulation of stochastic learning models and human subject experiments to test model predictions. In the words of Martin Shubik (1970, p. 193), the book “adopts a completely different approach and starts to do precisely what I believe is necessary—to enlarge the framework of the analysis to include learning and sociopsychological factors, notably reciprocity.” Rapaport went on to win Robert Axelrod’s famous prisoner’s dilemma tournament by submitting the simplest of all strategies entered: “tit for tat.” Rapoport and Chammah’s experiments inspired me to see if I could use computer simulation to dig deeper into their learning-theoretic approach. Suppose the players are unaware of the game’s mathematical structure and instead apply a simple stochastic propensity to repeat behaviors associated with a satisfactory outcome and otherwise explore an alternative. Would the players learn to cooperate? To find out, I asked Prof. Deutsch if I could simulate an iterated two-person PD game, with each player using an identical stochastic strategy based on reinforcement learning. The stochastic learning model was exceedingly simple. The players have a single goal: to use their own behavior to induce the other player to cooperate. Each player begins by flipping a coin to choose whether to cooperate or defect and then

4

Computational Social Science: A Complex Contagion

55

updates its cooperative propensity based on satisfaction with the associated behavior. Players are satisfied when the other player cooperates and dissatisfied when they defect. The problem was that my school still relied on an IBM mainframe, programmed using punch cards. I would have to punch the cards with lines of code, stand in line waiting to submit the deck, wait hours to get back the paper output, learn that I had made a mistake, punch a new card, and resubmit the deck. Worse yet, the prohibitive cost discouraged the exercise of curiosity when funded by a small grant. After a few frustrating nights, I gave up. I complained about the problem to a friend, Jed Harris, who worked at Apt Associates, not far from campus. Jed told me that Apt had installed the latest DEC PDP-7, a computer that would allow me to observe, debug, and quickly modify the game dynamics as they played out in real time. Jed let me use the machine at night, after work. I could accomplish in one evening what would have taken me weeks on the mainframe, and it cost nothing to explore the parameter space. The simulations revealed a self-reinforcing cooperative equilibrium that could quickly recover from small perturbations introduced by occasional defections. However, the stochastic stability of mutual cooperation depended on the learning rate. Highly reactive players would quickly learn to cooperate and could easily recover if one of the players were to test the other’s resolve. In contrast, slow learners were doomed to endless cycles of recrimination, retaliation, and mutual defection. I wrote up the results for Prof. Deutsch and put the paper with his encouraging comments in a cardboard box, where it remained for 20 years.

4.2

Agent-Based Modeling

Fast forward two decades to when Peter Hedstrom called me up from the University of Chicago where he was helping James Coleman launch a new journal, Rationality and Society. Peter reminded me about my old prisoner’s dilemma simulation, which I had mentioned to him back when we were in grad school together and asked me to update the paper and submit it for the journal’s second issue. The paper (Macy, 1989) introduced what later came to be known as “agent-based modeling,” a new approach to theoretical research that replaced “a model of a population” with “a population of models,” where each model corresponds to an autonomous agent interacting with its neighbors. The point I want to underscore is that the roots of computational social science go back to pioneers like Karl Deutsch, Anatol Rapaport, and Albert Chammah. However, the field had to wait 20 years for a new generation of personal computers to catch up with the advances in social theory that first inspired me as an undergraduate. Recognizing the opportunities opened up by universal access to desktop computing, Bill Bainbridge organized a national meeting supported by the National Science Foundation on “Grand Computing Challenges for Sociology” (Bainbridge,
John Skvoretz and I were both in attendance and were inspired by Bainbridge’s call. Our model of the diffusion of trust and cooperation among strangers (Macy & Skvoretz, 1998) was followed up by a PNAS paper with Yoshimichi Sato on trust and market formation in the USA and Japan (Macy & Sato, 2002). I also collaborated with Andreas Flache to leverage the rapidly increasing power of desktop machines to explore the theoretical implications of “backward-looking” alternatives to the forward-looking rationality assumed in classical game theory. This work culminated in a paper in the Annual Review of Sociology, “Beyond Rationality in Theories of Choice” (Macy & Flache, 1995). However, the impact of agent-based modeling extended far beyond our learning-theoretic approach. Robb Willer and I later co-authored a paper in the Annual Review of Sociology (Macy & Willer, 2002) that highlighted a fundamental analytical shift made possible by agent-based modeling, from “factors” (interactions among variables) to “actors” (interactions among agents embedded in social networks).

4.3 Social Contagion

The shift in the focus of computational social science—from collective action to network interaction—was consolidated with the discovery of small-world networks by Watts and Strogatz (1998). As a graduate student at Cornell, Watts was intrigued by an empirical puzzle posed by Stanley Milgram in the 1960s—the “six degrees of separation” popularized as the “Kevin Bacon number.” This is the startling hypothesis that any two randomly chosen people are connected to one another by a mere handful of intermediaries. How is this possible among the billions of people scattered across the planet, each embedded in a small circle of friends and family, and many living in remote towns and villages? Watts and Strogatz found the answer: it takes only a small number of bridge ties between structurally distant communities to give highly clustered networks the short mean geodesic of a random graph. The discovery of small-world networks inspired my paper with Damon Centola on the diffusion dynamics of “complex contagions” (Centola & Macy, 2007). Watts and Strogatz modeled “simple contagions,” which entail transmission from a single prior adopter, as occurs in the spread of pathogens or viral information. For example, if you are exposed to Omicron BA.2, you do not need to be infected by a second individual in order to acquire the disease. However, that is not the case if you want to know whether to protest against public health regulations, to participate in a risky collective action, or to adopt an expensive but unproven innovation. Complex contagions have higher adoption thresholds; that is, they require social reinforcement from multiple prior adopters. In a paper published in AJS (Centola & Macy, 2007), Damon and I enumerated four reasons why social reinforcement may be necessary:

1. Strategic Complementarity: Simply knowing about an innovation is rarely sufficient for adoption (Gladwell, 2000). Many innovations are costly, especially for
early adopters but less so for those who wait. The same holds for participation in collective action. Studies of strikes (Klandermans, 1988), revolutions (Gould, 1996), and protests (Marwell & Oliver, 1993) emphasize the positive externalities of each participant’s contribution. The costs and benefits of investing in public goods often depend on the number of prior contributors—the “critical mass” that makes additional efforts worthwhile.

2. Credibility: Innovations often lack credibility until adopted by neighbors. For example, Coleman et al. (1966) found that doctors were reluctant to adopt medical innovations until they saw their colleagues using them. Markus (1987) found the same pattern for the adoption of media technology. Similarly, the spread of urban legends (Heath et al., 2001) and folk knowledge (Granovetter, 1978) generally depends upon multiple confirmations of the story before there is sufficient credibility to report it to others. Hearing the same story from different people makes it seem less likely that surprising information is nothing more than the fanciful invention of the informant. The need for confirmation becomes even more pronounced when the story is learned from a socially distant contact, with whom a tie is likely to be relationally weak.

3. Legitimacy: Knowing that a movement exists or that a collective action will take place is rarely sufficient to induce bystanders to join in. Having several close friends participate in an event often greatly increases an individual’s likelihood of also joining (Finkel et al., 1989; Opp & Gern, 1993), especially for high-risk social movements (McAdam & Paulsen, 1993). Decisions about what clothing to wear, what hair style to adopt, or what body part to modify are also highly dependent on legitimation (Grindereng, 1967). Non-adopters are likely to challenge the legitimacy of the innovation, and innovators risk being shunned as deviants until there is a critical mass of early adopters (Crane, 1999; Watts, 2002).

4. Emotional Contagion: Most theoretical models of collective behavior—from action theory (Smelser, 1963) to threshold models (Granovetter, 1978) to cybernetics (McPhail, 1991)—share the basic assumption that there are expressive and symbolic impulses in human behavior that can be communicated and amplified in spatially and socially concentrated gatherings (Collins, 1993). The dynamics of cumulative interaction in emotional contagions have been demonstrated in events ranging from acts of cruelty (Collins, 1974) to the formation of philosophical circles (Collins, 1993).

The theory of complex contagion can be understood as an extension of Granovetter’s theory of the strength of weak ties (Granovetter, 1973). According to Granovetter, ties are relationally weak when there is infrequent interaction and/or low emotional salience, as in relations with an acquaintance. However, these ties can nevertheless be structurally strong in that they provide access to information from outside one’s immediate circle. In network terminology, relationally weak ties often have long range, meaning that they connect nodes located in structurally distant network neighborhoods. Damon and I showed that the structural strength of long-range ties is limited to simple contagions that do not require social reinforcement.
In contrast, the spread of complex contagions depends on “wide bridges” composed of densely interwoven ties to multiple sources of influence. The reinforcement of social influence is half the story. The other half is how homophily paves the way for wide bridges through the tendency for ties to form and strengthen among like-minded neighbors. Just as social influence can be reinforced in densely clustered networks, homophily is the magnetic force that pulls network nodes together into the dense local clusters required by complex contagion. As with magnetic force, there is repulsion as well as attraction, but unlike ferromagnets, in social magnetics it is the opposites that repel. Years before my collaboration with Damon, James Kitts, who was my first graduate student, had worked with me to develop a computational model that combined social influence with the network dynamics of attraction and repulsion. Our work drew heavily on John Hopfield’s model of recurrent neural networks (Hopfield, 1982), and James honored our forebear by naming our model “the Hopster.” The model shows how political and cultural polarization can be a unique global attractor, yet other stable configurations are also possible, depending on the density of the belief matrix. These include monoculture and the cross-cutting divisions of a pluralist society.
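The difference between simple and complex contagions can be illustrated with a minimal threshold simulation. The sketch below uses a ring lattice with one narrow long-range shortcut; the network size, seed set, and thresholds are illustrative assumptions, not the model in Centola and Macy (2007):

def ring_lattice(n, k):
    """Ring of n nodes, each tied to its k nearest neighbors on each side."""
    nbrs = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            nbrs[i] |= {(i + d) % n, (i - d) % n}
    return nbrs

def spread(nbrs, seeds, threshold):
    """Synchronous threshold contagion: a node adopts once at least
    `threshold` of its neighbors have adopted. Returns rounds until
    the contagion stops spreading."""
    adopted, rounds = set(seeds), 0
    while True:
        new = {v for v, nn in nbrs.items()
               if v not in adopted and len(nn & adopted) >= threshold}
        if not new:
            return rounds
        adopted |= new
        rounds += 1

plain = ring_lattice(120, 2)
shortcut = ring_lattice(120, 2)
shortcut[0].add(60); shortcut[60].add(0)  # one narrow long-range tie

for threshold in (1, 2):
    print(threshold, spread(plain, {0, 1, 2}, threshold),
          spread(shortcut, {0, 1, 2}, threshold))

A single narrow shortcut roughly halves the rounds needed by the simple contagion (threshold 1), but leaves the complex contagion (threshold 2) unaffected: one distant adopter cannot clear the threshold, so the contagion must still travel the wide, overlapping bridges of the ring.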

4.4 From Latte Liberals to Naked Emperors

The Hopster, it turned out, had much more to teach us about the self-reinforcing dynamics of homophily and social influence. Daniel DellaPosta, Yongren Shi, and I used a descendant of the original model to address the curious tendency for liberals and conservatives to differ not only in policy but also in lifestyle preferences, as documented using the General Social Survey (DellaPosta et al., 2015). However, our underlying theoretical motivation ran deeper: to show how belief systems can self-organize through the forces of attraction and repulsion. The emergent configurations invite post-hoc explanations of cultural fault lines that can be as substantively idiosyncratic as a liberal preference for caffeinated hot beverages, an idea originally proposed by Miller McPherson (2004). A few years later, Sebastian Deri, Alex Ruch, Natalie Tong, and I tested the “latte liberal” hypothesis using a large online experiment (Macy et al., 2019), modeled after the multiple-worlds “music lab” devised by Salganik et al. (2006). The results of our “party lab” experiment confirmed the predicted unpredictability of partisan divisions. The emergent disagreements were as deep as those observed in contemporary surveys, but with one important difference: You could be sure that the two parties would strongly disagree, but it was a coin flip as to who would be on which side. In one “world,” Democrats might join the bandwagon to embrace “great books,” while Republicans rallied around more emphasis on children’s physical fitness, but in the next world the sides would be switched. The problem is that social scientists, like the participants in our study, can only observe the one world we all inhabit. That leaves us susceptible to “just so” stories that plausibly explain the opposing beliefs of each
side, unaware that the sides could just as easily have been switched but for the luck of the draw. The arbitrariness of partisan division invites the reassuring hypothesis that political polarization can be easily reversed simply by reminding everyone that the emperor is naked. Unfortunately, it is not so easy, for two reasons—false enforcement and hysteresis. In a study with Robb Willer and Damon Centola (Centola et al., 2005), we simulated Andersen’s classic fable to show how conformists might falsely enforce unpopular norms to avoid the suspicion that they had complied because of social pressure instead of genuine conviction. In a follow-up study with Ko Kuwabara, Robb and I tested the “false enforcement” hypothesis in a wine-tasting experiment using vinegar-tainted wine (Willer et al., 2009). As predicted by Andersen (as well as by Arthur Miller’s The Crucible), participants who praised the tainted wine were more likely to criticize the lone confederate who refused to go along with the charade—but only when the criticism was performed in public. Polarization may be hard to reverse even in the absence of false enforcement. In collaboration with a team of computer scientists at RPI, I recently used another Hopster variant to investigate the tipping point beyond which a polarized legislature becomes increasingly unable to unite against common threats, such as election interference by a foreign adversary or a global pandemic (Macy et al., 2021). The problem is hysteresis, in which polarization alters the network structure by eliminating the inter-party ties through which pragmatists might “reach across the aisle.” The structural change is difficult if not impossible to reverse, even if the political temperature could somehow be lowered well below current levels. The disturbing implications attracted widespread media attention, including coverage in the New York Times and on CNN.

4.5 The Second Wave

If the “first wave” in computational social science was all theory and little data, the “second wave” was the mirror opposite: big data with little theory. Social science has accumulated a trove of theories waiting for the data that are needed to test them. The transformative potential of online data was celebrated by one of the founders of computational social science, Duncan Watts (2011, p. 266): [J]ust as the invention of the telescope revolutionized the study of the heavens, so too by rendering the unmeasurable measurable, the technological revolution in mobile, Web, and Internet communications has the potential to revolutionize our understanding of ourselves and how we interact. . . . [T]hree hundred years after Alexander Pope argued that the proper study of mankind should lie not in the heavens but in ourselves, we have finally found our telescope. Let the revolution begin.

Is the Web the space telescope of the social sciences? The metaphor is instructive. The power of a telescope, whether in outer space or cyberspace, depends on our ability to know where to point it. Moreover, with millions of observations in global
networks, the challenge is to find differences that are not statistically significant. Theoretical significance then becomes paramount.

4.6 Structural Holes and Network Wormholes

My first study using big data addressed Ron Burt’s theory of structural holes. Nathan Eagle, Rob Claxton, and I used call logs from one of the UK’s largest telecoms to test Ron’s theory at population scale (Eagle et al., 2010). As predicted, we found that economically advantaged communities tended to have more people with ties that link otherwise distantly connected neighbors, although the causal direction remained to be sorted out. More recently, Patrick Park, Josh Blumenstock, and I used these same telecom data, along with global network data from Twitter, to search the social heavens for “network wormholes,” our term for long-range ties that span vast distances in a global communications network (Park et al., 2018). These were not the “wide bridges” of complex contagion; rather, we were searching for Granovetter’s “weak ties,” the “long bridges” that connect otherwise unreachable clusters. Not surprisingly, we found these ties to be extremely rare. For any random edge in a global network, the “degree of separation” along the second-shortest path between its endpoints is almost always close to two hops. Nevertheless, a handful of long-distance “shortcuts” can also be found in the global communication networks made visible by social media. The question then arises: are these shortcuts strong enough to matter? The default assumption in network science, going back to Granovetter, is that they are relationally weak. The “strength of weak ties,” in Granovetter’s theory, is the access they provide to socially distant sources of information, not their affective or social intensity. Long-range ties, or so the theory goes, connect acquaintances with whom interaction is infrequent, influence is low, and bandwidth is narrow. But that is not what we found. Contrary to extant theory, network wormholes have nearly the same bandwidth and affective content as the densely clustered ties that connect a small circle of friends. Another big empirical study was motivated by years of computational modeling with the Hopster. Yongren Shi, Feng Shi, Fedor Dokshin, James Evans, and I used millions of Amazon book co-purchases to see if partisan cultural fault lines extended even to the consumption of science, a realm that is presumably above the political fray (Shi et al., 2017). Could a shared interest in science bridge political differences and encourage reliance on science to inform political debate? Or has science become a new battlefield in the culture wars? We found that the political left and right share an interest in science in general, but not science in particular. Liberals are drawn more to basic science (e.g., physics, astronomy, and zoology), while conservatives prefer applied science (e.g., criminology, medicine, and geophysics). Liberals read science books that are more often purchased by people who do not buy political books, while conservatives prefer science books that are mainly purchased by fellow conservatives.
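The tie “range” measure behind the wormhole search is easy to state precisely: the range of a tie is the length of the second-shortest path between its endpoints, i.e., the shortest path that remains once the direct edge is removed. A minimal sketch, with a toy graph and function names that are illustrative assumptions:

from collections import deque

def tie_range(nbrs, u, v):
    """Length of the shortest path from u to v with the direct u-v
    edge removed (the 'second-shortest' path). Range 2 means the pair
    shares a neighbor; large ranges mark rare long-range ties."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        for w in nbrs[node]:
            if node == u and w == v:  # ignore the direct u-v edge
                continue
            if w not in dist:
                dist[w] = dist[node] + 1
                if w == v:
                    return dist[w]
                queue.append(w)
    return float("inf")  # no alternative path exists

# Toy graph: triangle 0-1-2 plus a path 2-3-4-5 and the edge 0-5.
net = {0: {1, 2, 5}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 0}}
print(tie_range(net, 0, 1))  # 2: an embedded tie, closed in a triangle
print(tie_range(net, 0, 5))  # 4: a long-range tie (via 0-2-3-4-5)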
The most impactful paper of my career was a 2011 study with Scott Golder in which we used millions of messages obtained from Twitter to measure diurnal emotional rhythms (Golder & Macy, 2011). We sorted the tweets into 168 buckets, one for each hour of the week, and then used Pennebaker’s Linguistic Inquiry and Word Count lexicon to measure the level of positive and negative emotion in each bucket. Positive emotion is indicated by the use of words like “excited” or “fun,” in contrast to words like “disappointed” or “frustrated.” We found that people were happiest right around breakfast, but for the rest of the workday it was all downhill. We immediately suspected the effects of an exhausting job ruled by an unpleasant supervisor, but then we noticed the same pattern on weekends as well, except that the starting point was delayed by about two hours, reflecting perhaps the opportunity to sleep in. Using changes in diurnal rhythms relative to sunrise and sunset, we concluded that the pattern is driven largely by sleep cycles, not work cycles. The paper was published in Science and attracted more mainstream media attention than all my other papers combined. A decade later, Minsu Park and I (with the help of company staffers) used global Spotify music logs to track the flip side of the Twitter study—the emotions people are exposed to in the music they choose to stream rather than the emotions they express (Park et al., 2019). We filtered out curated playlists to focus on user-selected music the world over. We discovered that the diurnal pattern in affective preference closely resembles the diurnal cycles that Scott and I detected in expressed emotion. This suggests the possibility that our affective preferences reinforce rather than compensate for our emotional state. For example, when we are sad we do not go for upbeat music to lift our spirits; we listen to something melancholy. Unfortunately, Minsu and I were unable to link users’ Spotify and Twitter accounts, so our affective reinforcement theory remains to be tested at the individual level.
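The bucketing scheme is simple to reproduce. Here is a toy sketch of hour-of-week scoring; the two short word lists merely stand in for the LIWC positive- and negative-affect categories, and the data structure is an illustrative assumption:

from collections import defaultdict
from datetime import datetime

# Toy stand-ins for the LIWC affect categories (illustrative only).
POSITIVE = {"excited", "fun", "happy", "great"}
NEGATIVE = {"disappointed", "frustrated", "sad", "tired"}

def hour_of_week(ts: datetime) -> int:
    """0 = Monday 00:00 local time, ..., 167 = Sunday 23:00."""
    return ts.weekday() * 24 + ts.hour

def diurnal_scores(messages):
    """messages: iterable of (datetime, text). Returns, per hour-of-week
    bucket, the share of positive words minus the share of negative."""
    counts = defaultdict(lambda: [0, 0, 0])  # bucket -> [pos, neg, total]
    for ts, text in messages:
        b = hour_of_week(ts)
        for w in text.lower().split():
            counts[b][2] += 1
            if w in POSITIVE:
                counts[b][0] += 1
            elif w in NEGATIVE:
                counts[b][1] += 1
    return {b: (p - n) / t for b, (p, n, t) in counts.items() if t}

demo = [(datetime(2011, 3, 7, 8), "so excited fun morning"),
        (datetime(2011, 3, 7, 16), "frustrated and tired afternoon")]
print(diurnal_scores(demo))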

4.7 Conclusion

This brief overview of my personal involvement in the first and second waves of computational social science is intended to call attention to the foundational questions that the studies addressed, from the origins of social order to the strength of weak ties in global online networks. There is no shortage of theory in the social sciences, and from its inception, computational social science has ranked among the more theory-driven fields. In contrast, there has been a shortage of data with which to test many of those theories, due largely to the historic difficulty in observing social interaction except in small groups. That is now changing with the global popularity of online activities that leave digital traces, from shopping to blogging. Nevertheless, the social sciences have not taken full advantage of the vast new research opportunities opened up by advanced computational methods. The question is why? I do not believe this hesitancy should be attributed to the failure to ask interesting and important questions. On the contrary, the signature contribution of computational social science is the opportunity to tackle vital questions that would otherwise
be inaccessible. I have not run the numbers, but my casual impression is that these studies are far more likely to appear in general science journals with double-digit impact factors than in highly specialized journals devoted to topics that interest only a narrow audience. The problem is not the disciplinary relevance of the research; I suspect it is instead the price of admission. Rapid advances in computation have been accompanied by equally rapid turnover in the requisite technical skills, from the object-oriented programming that supercharged agent-based modeling to the deep learning and word embedding that have opened up new frontiers in text analysis. These methods require substantial retooling, even for quantitative specialists with advanced statistical training. The updating of graduate training that Scott Golder and I called for in our 2014 Annual Review paper remains to be implemented at scale in any of the social sciences. Until that happens, computational social science is likely to remain confined largely to those who have the necessary skills. The torch will then continue to be carried mostly by computer scientists and socio-physicists, who may be more interested in discovering unexpected patterns in the data than in discovering what those patterns might mean. Meanwhile, our best option is the increasing reliance on interdisciplinary research teams that bring together specialists who not only know how to operate the telescope but also where to point it.

References

Axelrod, R. (1984). The evolution of cooperation. Basic Books.
Bainbridge, W. (1994). Grand computing challenges for sociology. Social Science Computer Review, 12, 183–192. https://doi.org/10.1177/089443939401200203
Centola, D., & Macy, M. (2007). Complex contagions and the weakness of long ties. American Journal of Sociology, 113, 702–734. https://doi.org/10.1086/521848
Centola, D., Willer, R., & Macy, M. (2005). The emperor’s dilemma: A computational model of self-enforcing norms. American Journal of Sociology, 110, 1009–1040. https://doi.org/10.1086/427321
Coleman, J., Katz, E., & Menzel, H. (1966). Medical innovation: A diffusion study. Bobbs-Merrill.
Collins, R. (1974). Three faces of cruelty: Towards a comparative sociology of violence. Theory and Society, 12, 631–658.
Collins, R. (1993). Emotional energy as the common denominator of rational action. Rationality and Society, 5, 203–230.
Crane, D. (1999). Diffusion models and fashion: A reassessment. Annals of the American Academy of Political and Social Science, 566, 13–24.
DellaPosta, D., Shi, Y., & Macy, M. (2015). Why do liberals drink lattes? American Journal of Sociology, 120, 1473–1511.
Eagle, N., Macy, M., & Claxton, R. (2010). Network diversity and economic development. Science, 328, 1029–1031. https://doi.org/10.1126/science.1186605
Finkel, S., Muller, E., & Opp, K. (1989). Personal influence, collective rationality, and mass political action. American Political Science Review, 83, 885–903.
Gladwell, M. (2000). The tipping point: How little things can make a big difference. Little, Brown.
Golder, S., & Macy, M. (2011). Diurnal and seasonal mood vary with work, sleep, and daylength across diverse cultures. Science, 333, 1878–1881. https://doi.org/10.1126/science.1202775
Golder, S., & Macy, M. (2014). Digital footprints: Opportunities and challenges for online social research. Annual Review of Sociology, 40, 129–152. https://doi.org/10.1146/annurev-soc-071913-043145
Gould, R. (1996). Patron-client ties, state centralization, and the whiskey rebellion. American Journal of Sociology, 102, 400–429. https://doi.org/10.1086/230951
Granovetter, M. (1973). The strength of weak ties. American Journal of Sociology, 78, 1360–1380.
Granovetter, M. (1978). Threshold models of collective behavior. American Journal of Sociology, 83, 1420–1443.
Grindereng, M. (1967). Fashion diffusion. Journal of Home Economics, 59, 171–174.
Heath, C., Bell, C., & Sternberg, E. (2001). Emotional selection in memes: The case of urban legends. Journal of Personality and Social Psychology, 81, 1028–1041. https://doi.org/10.1037/0022-3514.81.6.1028
Hopfield, J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences of the United States of America, 79, 2554–2558. https://doi.org/10.1073/pnas.79.8.2554
Klandermans, B. (1988). Union action and the free-rider dilemma. Research in Social Movements, Conflict and Change, 10, 77–92.
Macy, M. (1989). Walking out of social traps: A stochastic learning model for the Prisoner’s dilemma. Rationality and Society, 1, 197–219.
Macy, M., & Flache, A. (1995). Beyond rationality in theories of choice. Annual Review of Sociology, 21, 73–91.
Macy, M., & Sato, Y. (2002). Trust, cooperation, and market formation in the U.S. and Japan. Proceedings of the National Academy of Sciences, 99, 7214–7220.
Macy, M., & Skvoretz, J. (1998). The evolution of trust and cooperation between strangers: A computational model. American Sociological Review, 63, 638–660.
Macy, M., & Willer, R. (2002). From factors to actors: Computational sociology and agent-based modeling. Annual Review of Sociology, 28, 143–166. https://doi.org/10.1146/annurev.soc.28.110601.141117
Macy, M., Deri, S., Ruch, A., & Tong, N. (2019). Opinion cascades and the unpredictability of partisan polarization. Science Advances, 5, eaax0754. https://doi.org/10.1126/sciadv.aax0754
Macy, M., Ma, M., Tabin, D., Gao, J., & Szymanski, B. (2021). Polarization and tipping points. Proceedings of the National Academy of Sciences, 118, e2102144118. https://doi.org/10.1073/pnas.2102144118
Markus, M. (1987). Toward a ‘critical mass’ theory of interactive media: Universal access, interdependence and diffusion. Communication Research, 14, 491–511. https://doi.org/10.1177/009365087014005003
Marwell, G., & Oliver, P. (1993). The critical mass in collective action. Cambridge University Press. https://doi.org/10.1017/CBO9780511663765
McAdam, D., & Paulsen, R. (1993). Specifying the relationship between social ties and activism. American Journal of Sociology, 99, 640–667. https://doi.org/10.1086/230319
McPhail, C. (1991). The myth of the madding crowd. Aldine.
McPherson, M. (2004). A Blau space primer: Prolegomenon to an ecology of affiliation. Industrial and Corporate Change, 13, 263–280.
Opp, K., & Gern, C. (1993). Dissident groups, personal networks, and spontaneous cooperation: The East German Revolution of 1989. American Sociological Review, 58, 659–680. https://doi.org/10.2307/2096280
Park, P., Blumenstock, J., & Macy, M. (2018). The strength of long-range ties in population-scale social networks. Science, 362, 1410–1413.
Park, M., Thom, J., Mennicken, S., Cramer, H., & Macy, M. (2019). Global music streaming data reveal diurnal and seasonal patterns of affective preference. Nature Human Behaviour, 3, 230–236. https://doi.org/10.1038/s41562-018-0508-z
Rapoport, A., & Chammah, A. (1965). Prisoner’s dilemma: A study in conflict and cooperation. The University of Michigan Press.
Salganik, M., Dodds, P., & Watts, D. (2006). Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311, 854–856.
Schelling, T. (1971). Dynamic models of segregation. The Journal of Mathematical Sociology, 1, 143–186. https://doi.org/10.1080/0022250x
Shi, F., Shi, Y., Dokshin, F., Evans, J., & Macy, M. (2017). Millions of online book co-purchases reveal partisan differences in the consumption of science. Nature Human Behaviour, 1, 79. https://doi.org/10.1038/s41562-017-0079
Shubik, M. (1970). Game theory, behavior, and the paradox of the prisoner’s dilemma: Three solutions. Journal of Conflict Resolution, 14, 181–193.
Smelser, N. (1963). Theory of collective behavior. Free Press.
Watts, D. (2002). A simple model of global cascades on random networks. Proceedings of the National Academy of Sciences, 99, 5766–5771.
Watts, D. (2011). Everything is obvious: Once you know the answer. Crown Business.
Watts, D., & Strogatz, S. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393, 440–442.
Willer, R., Kuwabara, K., & Macy, M. (2009). The false enforcement of unpopular norms. American Journal of Sociology, 115, 451–490. https://doi.org/10.1086/599250

Chapter 5: Model of Meaning

Hiroki Takikawa and Atsushi Ueshima

5.1 Introduction

Meaning is a fundamental component of the social world (Luhmann, 1995; Schutz & Luckmann, 1973; Weber, 1946). People inhabiting the social world interpret the meaning of natural objects in their environment and of social objects, including other people, and act based on these interpretations. If we call the mechanism by which people interpret the meanings of objects and of other people’s actions and link them to their own actions the meaning-making mechanism (Lamont, 2000), then social science, which aims to explain the behavior of people and groups, must elucidate this meaning-making mechanism as its fundamental task. In sociology, ever since Weber (1946) placed subjective meaning at the center of his definition of the discipline, considerations related to meaning-making mechanisms—considerations about the relationship between meaning and human action—have accumulated. In terms of methods for elucidating meaning-making mechanisms, meaning and culture have traditionally been considered qualitative in nature, as in the German Geisteswissenschaften tradition (Dilthey, 1910), which is closely related to the establishment of Weber’s sociology. Therefore, although there are exceptions, approaches to meaning have primarily been attempted through qualitative social theory and qualitative research. In contrast, rational choice theory, the most influential formal theory of action in sociology, initially viewed meaning-making mechanisms in an extremely simplistic manner, through a narrowly defined principle of self-interest maximization.

H. Takikawa (✉)
Graduate School of Humanities and Sociology, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
e-mail: [email protected]

A. Ueshima
Graduate School of Arts and Letters, Tohoku University, Sendai, Japan

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
Y. Sato, H. Takikawa (eds.), Sociological Foundations of Computational Social Science, Translational Systems Sciences 40, https://doi.org/10.1007/978-981-99-9432-8_5


Specifically, early rational choice theories framed the meaning-making of surrounding objects and the actions of others solely in terms of self-interest maximization (Becker, 1976; Coleman, 1990). However, it has become clear that theories based on such narrow assumptions have significant limitations in capturing the richer meaning-making mechanisms of the people sociology studies. Because of these recognized limitations, rational choice theories in sociology have been developed by incorporating various “explanatory factors,” such as nonmaterial benefits and altruistic considerations, into actors’ utilities and purposes (Opp, 1999). However, such sociological rational choice theories are at risk of degenerating into mere storytelling, producing models that lack predictive power (Watts, 2014). The main reason for this is overfitting. Overfitting is a machine learning term for fitting the data at hand so closely that a model loses explanatory power in more general cases (see Chap. 3). In general, the more a model is loosened (the more complex it becomes) to allow for rich a posteriori “explanations” that make a given action understandable, the more prone it is to overfitting. If early rational choice theories underfit, recent sociological rational choice theories risk becoming overfitted models that lack predictive ability. The problem here is clear: what is missing is a formal model of meaning-making with superior predictive capacity grounded in empirical evidence. There are several reasons for this absence. In particular, data on the semantic and cultural aspects of human action have traditionally been qualitative, making formal examination difficult. However, with the recent development of digital society, a breakthrough has been achieved in this regard. The large volume of digital text contained in digital traces and archives offers unprecedented opportunities for quantitative access to the semantic world of actors (Evans & Aceves, 2016; Grimmer & Stewart, 2013). In addition, natural language processing and artificial intelligence models that analyze texts not only serve as powerful analytical tools but can also provide clues for building theoretical models of meaning-making mechanisms (Blei, 2012; Mikolov et al., 2013). Of course, the target of sociology is the meaning of social action in general, not limited to the meaning of texts or natural language. However, since most social actions are mediated by language, it would not be incorrect to assume that there is at least some common meaning-making mechanism between the interpretation of social actions and the interpretation of textual meaning. Therefore, it seems promising to construct a theoretical model of meaning-making based on findings in the field of natural language processing, where data and models are the most developed. Such a movement has emerged in recent years, particularly in cultural sociology (Arseniev-Koehler & Foster, 2022; Boutyline & Soter, 2021; Foster, 2018; Kozlowski et al., 2019). In addition, throughout the chapter, we emphasize the link between interpretation and action, which is not necessarily the focus of the current natural language processing literature.
Existing computational social science has shown little motivation to contribute to sociological theory by providing theoretical insights into, and empirical examinations of, meaning-making mechanisms through the quantitative analysis of texts. Therefore,
this chapter examines whether the sociological elucidation of meaning-making mechanisms using textual data and artificial intelligence models is possible, and if so, in what direction exploration should proceed. The outline of this chapter is as follows: In the next section, we review existing theories of meaning in sociology. We summarize the key points of existing theories and argue that, with the introduction of large-scale textual data and cognitive science, the conditions are now in place for refining sociological theories of meaning through computational social science. We then formulate meaning-making as a computational problem and characterize it with three key points: the predictive function of meaning, the relationality of meaning, and Bayesian learning. We then discuss the utility of two computational linguistic models—the topic model and the word-embedding model—in terms of theories of meaning. For these models to be valid as sociological models of meaning production, it is necessary to demonstrate that meaning interpretation motivates actual action. In the penultimate section, therefore, we discuss ways to link semantic representations and actions. The final section concludes the chapter.

5.2 Theories of Meaning in Sociology

In this section, we briefly review sociological theories of meaning.

5.2.1 Weber’s Social Action

According to Weber’s (1946) definition, sociology is “a science which attempts the interpretive understanding of social action to arrive at a causal explanation of its course and effects.” In this definition, action refers to human behavior based on subjective meaning. In this way, Weber established the goal of sociology as causally explaining an action by interpreting the meaning attached to it. (The work of Simmel (1922), who, prior to Weber, placed the problem of understanding meaning at the center of sociology, should not be ignored.) The great significance of Weber’s definition is that he clearly identifies the examination of meaning-making mechanisms as a central issue in sociology. Also important from today’s perspective is Weber’s recognition of the distinction between an understanding of action and the verification of the validity of that understanding through causal explanation. No matter how persuasive a meaning attribution may be, it will not work as a sociological explanation unless it can causally explain the actor’s action. This is very important from today’s perspective in terms of avoiding “overfitting.” It also clarifies that the interpretation of meaning in sociology is done only for the purpose of causally explaining actions.
In addition, Weber proposed the ideal type as a device for such explanation. Ideal types should be constructed, in Weber’s words, to be “meaning-adequate” and should be conducive to causal explanation, which is in line with the argument in this chapter that we should attempt to explain actions and their collective consequences by formalizing actors’ meaning-making mechanisms.

5.2.2 Schutz’s Phenomenological Sociology

Starting from a critique of Weberian interpretive sociology, Schutz offered several important insights into the meaning structure of the social world, which greatly influenced today’s schools of social constructionism (Berger & Luckmann, 1967) and ethnomethodology (Garfinkel, 1967). Schutz’s analysis (Schutz & Luckmann, 1973) focused on the structure of the everyday life-world, which people take for granted in the natural attitude. His theory of typification is particularly important. According to Schutz, actors are both constrained by the world and act upon and intervene in it, and pragmatic motives underlie their meaning-making. People usually carry out their actions without delay based on typified knowledge about things and people. In this way, the knowledge of the everyday world is self-evident. However, when actions based on existing knowledge are confronted with problematic situations, the knowledge and typifications previously considered self-evident are questioned and reinterpreted. Through this dynamic, the meaning of the social world is constituted. Schutz’s typification theory is important in that it explicitly argues that our social world is semantically constituted through knowledge and typifications, and that these are not simply given but are socially constituted, constantly being questioned and revised. Schutz’s argument also excels in the coherence of its logical organization. Although he himself did not propose a formal model, he recognized the importance of using models in the social sciences, and his arguments have a logical structure that is relatively amenable to formal modeling. As such, his typification theory shows a remarkable commonality in logical structure with the formally formulated Bayesian theory of categorization (Anderson, 1991), which we discuss later.

5.2.3 Bourdieu’s Cognitive Sociology

Bourdieu followed the theoretical tradition of Durkheim (1915), who sought to clarify the social origins of people’s perceptions, classifications, and categories (Bourdieu, 1984). Bourdieu is known for his theories of social class, cultural capital, and habitus, but recently the cognitive-sociological aspects of his work have attracted attention (Lizardo, 2004, 2019). He holds that there is a fundamental correspondence between the objective structure of the social world, such as social class, and the
psychological structure of the actors who perceive and categorize the social world, a structure produced in the course of struggles over social position (Bourdieu, 1989). The greatest legacy of Bourdieu’s argument is twofold: First, Bourdieu not only proposed a conceptual apparatus for treating cognition sociologically but also articulated a methodology for analysis using multiple correspondence analysis, which allows for the spatial arrangement of different variables and the examination of their positional relationships. Second, he argued that this method enables us to spatially represent the social world as a social space, and the world of symbols and meanings as a semantic space, and to discuss the correspondence between the two. Despite its limitations, Bourdieu’s method has had a direct impact on today’s quantitative cultural sociology in that it paved the way for a quantitative approach to meaning and, more specifically, suggested the possibility that meaning can be expressed through spatial structures.

5.2.4 White and DiMaggio’s Sociology of Culture

Thus far, we have introduced Weber, Schutz, and Bourdieu’s classical theories of meaning. While these can serve as important inspirations for developing models of meaning, with the exception of Bourdieu, these sociologists did not directly propose formal models. In contrast, White and DiMaggio, whose work is discussed next, developed ideas that lead directly to formal models of meaning and the quantification of meaning. Their arguments can be credited with providing a basis for today’s computational sociology of culture. Early in his career, H. White (1963) paved the way for a quantitative treatment of culture by extending the formal model of kinship structure proposed by Lévi-Strauss and Weil. His next step was to formalize the central sociological concept of roles by means of an algebraic theory of networks, in which White invented the concept of “catnet” and drew attention to the fact that the semantic category “cat,” which refers to people or groups, is inseparably linked to the way a “net” connects these people (White, 2008b; White et al., 1976). Later, White attempted to theorize people’s meaning-making and network-formation mechanisms based on Bayesian ideas, in which identities seek control in an uncertain world, but he did not establish an explicit formal model (White, 1992, 1995, 2008a). Nevertheless, he has had a profound impact on many of the central researchers in cultural sociology today, such as Mohr, Bearman, and DiMaggio, who will be discussed shortly. Another important figure of the quantitative approach in cultural sociology today is DiMaggio, who published a programmatic review article in 1997 on the integration of cultural sociology and cognitive science (DiMaggio, 1997). He criticized traditional sociological theories for failing to deal explicitly with cognition while implicitly making assumptions about it, and he called for the active introduction of theories from cognitive science and cognitive psychology to explore the
mechanisms of meaning-making. He also laid the groundwork for the construction of quantitative semantic models in sociology, pointing to the kinship between sociological models of meaning and models of meaning in computational linguistics, leading to the early introduction of topic models in sociology (DiMaggio et al., 2013).

5.2.5 In Summary: Semantic Models in Cultural Sociology Today

In the classic discussions in sociology by Weber, Schutz, and others, the approach to the problem of meaning was almost entirely qualitative. However, thanks to the efforts of White, DiMaggio, and others, today’s cultural sociology is beginning to approach meaning quantitatively. Of particular importance are:

1. The expansion of data sources, including large-scale textual data, which has driven the development of computational social scientific methods.
2. In conjunction with (1), the permeation of ideas from cognitive psychology and cognitive science into sociology.

These two developments have led to the establishment of a healthy practice in cultural sociology of modifying theories based on empirical data while using formal theoretical models as a foundation (Mohr, 1998; Mohr et al., 2020). In this chapter, we follow this trend and examine the possibilities of a sociological model of meaning-making. For this purpose, we examine models that have been developed mainly in computational linguistics and discuss the possibility of connecting them to a sociological model of meaning. In the next section, we make some preliminary considerations in preparation for examining these models.

5.3 Preparatory Considerations on the Semantic Model

A key to connecting the sociological model of meaning-making with models from computational linguistics is to view meaning-making as a computational problem (cf. Foster, 2018). In other words, we apply the idea of computation to the theory of meaning itself, not merely as a technique for analyzing given data. If meaning-making is viewed as a computational problem, it can be formulated as the activity of extracting meaningful information from an uncertain and noisy environment (Foster, 2018). Thus, it is concerned with the problem of inferring potential “meaning” from noisy data. Meaning-making as a computational problem is to extract the “meaning” behind the data given to our senses (e.g., ink marks, objects with color and shape, the physical actions of others).
There are multiple reasons why meaning-making should be framed as a computational problem. First, there is the prospect that many of the valuable insights from the different theorizations of meaning that sociology has attempted thus far can be modeled in a unified way under the idea of computation. Second, using the idea of computation, we can directly draw on the various computational linguistics and artificial intelligence models that have already been developed to theorize semantic phenomena. This eliminates the need to build a formal model of meaning from scratch. Third, if we can construct a model of meaning from a computational perspective, we also obtain an organic link between methods for analyzing semantic phenomena in sociology and models for theorizing them. Finally, the traditional findings of sociological theory, especially qualitative research, can be communicated to people in other fields, such as computer science, by reconstructing them from a computational perspective. Conversely, this can convince sociologists of the significance of computer science findings. Marr (1982) distinguished three levels of explanation for computation: (1) the level of computational theory, (2) the level of representation/algorithms, and (3) the level of implementation/hardware. We now consider the first level. The second level of representation/algorithms corresponds to computational models, such as the topic models or word-embedding models introduced later (cf. Arseniev-Koehler & Foster, 2022). Hardware is a domain of neuroscience that is not covered here. The level of computational theory is the level that asks what is being done and why. The nature of a computation is intrinsically dependent on the problem that it is solving. Thus, at this level, the question of what problem is being solved, and why, is decisive. What kind of activity are we engaged in when trying to interpret the meaning of words, things, and actions? Formulated as a computational problem, meaning-making is a struggle to extract information from noisy observed data. The challenge is that there is no unique way to extract information from the observed data (Hohwy, 2013). For example, the potential meaning of an observed physical action of another person raising their hand does not have a one-to-one correspondence with the immediately observed individual action (Hohwy, 2013). They might be saying hello, hailing a cab, or stretching. There are multiple possible meanings, and any interpretation can be erroneous (cf. Weber, 1946). A theory of meaning sees this meaning-making as a computational problem and explains how an actor selects one potential meaning from multiple possibilities in an uncertain environment based on observed data, how they correct this interpretation if it is erroneous, and so on. Why do we attempt to infer the meaning of things and events? We would like to emphasize the viewpoint that we do this to predict things and events that will occur and to choose appropriate actions. This perspective was emphasized by Weber (1946) and Schutz (Schutz & Luckmann, 1973), among others, and is also a prevailing idea in cognitive science today (Anderson, 1991; Clark, 2015; Hohwy, 2013). By viewing meaning-making as the extraction of information from an uncertain environment in order to predict the future and choose appropriate actions, we can
more clearly formulate not only the function of meaning but also the forms of meaning (relations) and the learning process of meaning. In the following, we explain (a) the function of meaning in terms of prediction, and then discuss (b) the forms of meaning (relationality), and (c) the learning process of meaning (semantic learning).

5.3.1 Function of Meaning: Prediction

Our desire to know the meaning of things lies in pragmatic motivation (Schutz & Luckmann, 1973). In other words, we must know the meaning of things and events in the world if we are to operate smoothly in it. Knowing the meaning of a thing or event is connected to predicting the things and events related to it, which ultimately leads to actions based on those predictions. For example, if we do not know the meaning of a traffic sign, we may cause a serious traffic accident. Suppose that we do not understand the meaning of a one-way traffic sign and enter a street from the opposite direction. In that case, if an oncoming car is traveling without slowing down, we might cause a serious accident. Understanding the meaning of a sign implies being able to predict the occurrence of events associated with that sign. When there is a one-way sign, we can predict that cars may drive in the direction indicated by the arrow. With such predictions, we can avoid serious accidents, drive smoothly, and arrive at our destination. Understanding the meaning of things makes the world predictable and enables us to live smoothly. What Schutz (Schutz & Luckmann, 1973) calls typification is the central activity of meaning-making, and it, too, aims at prediction. That is, by understanding objects and events in the world as instances of specific categories, we can predict various attributes of these objects. Typification is also called categorization in cognitive psychology (Anderson, 1991). According to Anderson (1991), categories are bundles of attributes. Thus, for a thing with attribute i, if we estimate the category membership K from that attribute, we can then predict another attribute j that the category has from the category membership K. In Schutz’s example, inferring from a mushroom’s appearance (attribute i) that it is edible (attribute j) illustrates the connection between categorization and prediction (Schutz & Luckmann, 1973).
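A toy sketch makes the i → K → j chain concrete; the categories and attributes below are invented for illustration:

# Categories as bundles of attributes (a toy version of Anderson 1991).
CATEGORIES = {
    "edible_mushroom":    {"brown_cap", "ring_on_stem", "edible"},
    "poisonous_mushroom": {"red_cap", "white_spots", "poisonous"},
}

def predict_attribute(observed_i, query_j):
    """Infer category membership K from one observed attribute i,
    then predict whether the category carries attribute j."""
    for category_k, attributes in CATEGORIES.items():
        if observed_i in attributes:
            return category_k, query_j in attributes
    return None, None

print(predict_attribute("brown_cap", "edible"))  # ('edible_mushroom', True)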

5.3.2 Relationality of Meaning

From the idea of meaning as prediction, it is possible to draw another important implication: meaning takes a relational form. In sociology, this corresponds to what Weber (1946) and Schutz (Schutz & Luckmann, 1973) call Sinnzusammenhang.
That meaning is tied to prediction also implies that meaning relates a thing (observed data) to another thing (other observed data) through prediction. From a computational perspective, understanding meaning involves inferring a potential meaning from certain observed data, and the function of inferring meaning is to contribute to prediction. For example, if we observe a “rain cloud” and understand it to mean “it will rain,” we can predict the event “the laundry will get wet.” From another angle, the observed data “rain clouds” can be seen as being associated with the observed data “laundry getting wet.” The meaning “it rains” then manifests as a complex of relationships among “rain clouds appear,” “laundry gets wet,” “humidity rises,” and so on. Thus, meaning appears in the relationships between multiple things. Using Anderson’s notation, the potential meaning K appears as a complex of relationships among data i, j, . . . . Linking this to learning, in anticipation of the next topic, it means that we learn the meaning K of observed data i by relating it to other data j, l, . . . . Let us discuss this next.

5.3.3 Semantic Learning

We stated that we estimate the potential meaning of things and events in order to predict what will happen next. We also mentioned that through such predictions, we connect the relationships among the attributes i, j, . . . of things and events. A prediction can be either correct or incorrect. If we see the color r on a mushroom, think it is an edible mushroom K, and eat it, the mushroom may not be edible ( j) but poisonous (l) and give us a stomachache (Schutz & Luckmann, 1973). In this case, we would conclude that the prediction of j based on the (mis)inferred category K was incorrect, and we would re-estimate the category membership inferred from the mushroom’s color r as poisonous mushroom M rather than edible mushroom K. Additionally, r would thereafter be remembered as being associated with poisonous l, not edible j. Thus, prior estimates are modified and revised a posteriori when predictions turn out to be inaccurate. Here, we simply assumed that if the prediction matched the observed outcome, the estimate would be retained, and if it missed, it would be replaced by another estimate. In other words, the category “edible mushroom K” was estimated from the color r of the mushroom; if the prediction derived from it (“the mushroom is edible j”) was correct, the estimate K was retained, and if it was wrong (“the mushroom is poisonous l”), K was discarded and M was assumed. Here, the learning process is a choice between retaining or discarding the estimated result. In reality, however, this learning process is stochastic, because learning is performed to extract meaningful information out of an uncertain and noisy observed environment. In other words, the more accurate the prediction, the higher the probability that the prior guess is correct, and the less accurate the prediction, the lower that probability. This idea can be formulated using Bayes’ rule. Bayesian learning, which will be introduced again in the section on models of computational linguistics, proceeds as follows: let Pr(K) be the prior belief about
K (e.g., “a mushroom belongs to category K”), and let Pr(j | K) be the probability of an event j occurring when K holds (“a mushroom belonging to K has attribute j”). If the event j that occurs is exactly what K predicts (if Pr(j | K) is large), the posterior belief Pr(K | j) after observing j is strengthened, and if the event differs from the prediction (if Pr(j | K) is small), the posterior belief is weakened. Thus, the meaning-making process is characterized by Bayesian learning through trial and error. By formulating meaning-making as a computational problem, we have established the following: (a) the function of meaning is linked to prediction, (b) meaning appears in a relational form, and (c) meaning is learned by Bayesian trial and error. With these considerations in mind, we will now examine how various computational linguistics models can be used in theories of meaning.
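A single Bayesian update can be made concrete with a short numeric sketch; all probabilities below are invented for illustration:

# Prior beliefs over mushroom categories (illustrative numbers only).
prior = {"edible_K": 0.7, "poisonous_M": 0.3}

# Likelihood of the observed outcome j = "stomachache" under each category.
likelihood = {"edible_K": 0.05, "poisonous_M": 0.90}

# Bayes' rule: Pr(K | j) = Pr(j | K) Pr(K) / sum over all categories K'
evidence = sum(likelihood[k] * prior[k] for k in prior)
posterior = {k: likelihood[k] * prior[k] / evidence for k in prior}
print(posterior)  # belief shifts sharply toward poisonous_M (about 0.89)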

5.4 Computational Linguistics Model

Computational linguistics interprets the understanding and creation of meaning in natural language as a computational problem: extracting meaningful information from an uncertain environment and predicting the behavior of that environment. In the case of natural language, the environment is linguistic (Griffiths et al., 2007). From a computational perspective, a linguistic environment can be formulated as an environment with statistical properties in which individual words and phrases occur probabilistically. Based on this idea, being able to understand the meaning of a text implies being able to predict the words, phrases, and so on used in that text. For example, when reading a text or listening to speech, we check our understanding by predicting what the author or speaker will write or say next. Furthermore, when an unexpected word or phrase appears, we realize that we have misunderstood the meaning of the text or utterance. To formalize the discussion so far: textual semantic understanding is the process of inferring the latent structure that produces the observed features appearing in a text, thereby predicting the various features that will subsequently be produced. This latent structure is named the “gist” by Griffiths et al. (2007), and a model that represents the latent structure generating meaning is referred to as a generative model. The idea of the generative model and Anderson’s computational formulation of categories mentioned above rest on almost identical ideas. A category defines a bundle of features of an object; a gist characterizes a bundle of features of a text, such as observable words and coded phrases. Like a category, a gist is not observable but is rather a latent structure g that generates observable quantities. Identifying this latent structure g is equivalent to the semantic understanding of the text, which makes the occurrence of observable words and phrases predictable. The relationality of meaning has been formulated in computational linguistics as the distributional hypothesis of meaning. The distributional hypothesis was formulated by the linguist Firth (1957), who drew it from the anthropologist Malinowski (at around the same time, the linguist Harris (1954) published a similar idea). Thus, it is fair to say that this

5

Model of Meaning

75

hypothesis was originally derived from the social sciences. According to this hypothesis, the meaning of a word can be inferred from the words around it (“You shall know a word by the company it keeps!”). This hypothesis can be embodied by the idea of meaning as a latent structure. Now, let w be a set of words separated by a certain range, and let g be the latent structure that generates the individual words w1, w2. . .that belong to w (cf. Griffiths et al., 2007). Typically, w is a single sentence, and the individual words that make up that sentence can be thought of as w1, w2. . . . In this case, as the distributional hypothesis states, the meaning of a word is determined by its surrounding words. For example, to know the meaning of word w1, we must estimate the latent structure g. To estimate g, we must take w2, w3. . ., which are located around w1, as clues. Thus, from the observed words alone, the meaning of word w1 can be determined by words w2, w3. . . . As such, the generative model can also be used for learning meaning. The meaning of a word is learned from the meanings of the surrounding words. The occurrence of the next word is predicted from the distribution of surrounding words (via the estimation of the latent structure), and the accuracy of the prediction is increased by modifying the prior guesses a posteriori according to the success or failure of the prediction. As previously mentioned, such a learning process can be formulated within the framework of Bayesian learning. What we want to know is the probability Pr(g| w1) that a word w1 is produced by latent structure g when we observe it. Using the probability formula and transforming the equation, we obtain: Prðgjw1 Þ =

Prðw1 jgÞPrðgÞ : Prðw1 Þ

The right-hand side consists of three quantities, each of which has substantial significance. These three quantities must be available to compute the left-hand side. First, we need a subjective belief (prior belief), Pr(g), before observing w1. Additionally, we must know the probability Pr(w1 | g) that g generates w1. This is called likelihood in Bayesian learning. Finally, we need Pr(w1). This is the probability that w1 will occur, which is called the evidence.

5.4.1

Topic Model

As mentioned earlier, text generation can be viewed as a stochastic process from a computational perspective. There are many possible models for text generation; however, the topic model is the most widespread in applications in computational social science and sociology. Topic models have been widely applied in various areas of sociology, including cultural sociology, social movements, historical sociology, and the history of sociology (DiMaggio et al., 2013; Fligstein et al., 2017;

76

H. Takikawa and A. Ueshima

Fig. 5.1 The structure of a topic model. Note: Topic as a probability distribution over words such as z1, z2, . . .zm generates concrete words w1, w2, . . . A topic is assigned to each word according to multinomial distribution (topic proportion) unique to each document

Nelson, 2020; Takikawa, 2019). Topic models are analytical tools that allow researchers to discover latent topics in coherent texts and examine the words that represent these latent topics. However, for the results of a topic model to have external validity, it must be possible to say that real-world actors are actually extracting similar topics from the text, at least implicitly, and producing meaning. Therefore, in this chapter, we focus on the persuasiveness of the topic model as a semantic model from a computational perspective. The topic model itself has many variations, but the most basic model is the latent Dirichlet allocation method (Blei et al., 2003). Almost all topic models, including this latent Dirichlet allocation method, are hierarchical Bayesian. Hierarchical Bayesian models represent the generation of meaning structurally. This is one of the most prominent features of the topic models. In the hierarchical Bayesian model, text is assumed to be generated by two different probability distributions: a multinomial distribution that assigns a topic to the slot in which the word is generated, and a multinomial distribution that probabilistically generates a specific word in that slot, conditional on an assigned topic. The former is called topic proportion, and the latter is called topic. We assume that the observed word wi is generated by the latent structure g, expressed by these two distributions. Let us examine this in more detail (Fig. 5.1). First, we assume that each word w1, w2, . . ., wn that appears in document d belongs to some topics z1, z2, . . ., zm. We assume that the word we observe is generated probabilistically according to the topic to which it belongs. For example, suppose we assume that a word like “gene” belongs to the topic “genetics” and a word like “brain” belongs to the topic “neuroscience.” In the topic model, we model the topic “genetics” as a probability distribution (multinomial distribution) over words that generates words like “gene” and “DNA” with high probability (Blei, 2012). That is, each word is determined by the conditional probability distribution Pr (w| zj). How are the various topics assigned to the individual word slots in the first place? This topic assignment follows a probability distribution called topic proportion, which is unique to document d. To summarize, a probability distribution called topic proportion is first assigned to each document. Next, a topic is assigned to a slot

5

Model of Meaning

77

according to the topic proportion. Finally, a word is generated according to the distribution of the assigned topic. Assuming such a generative process, how can we resolve the inverse problem— that is, how can we estimate the topic of the observed word? The problem of estimating the meaning of the word lies in whether the latent topic is interpreted as the meaning of the word. Because the topic model has a hierarchical structure, the process of estimating meaning using inverse computation is also somewhat complex. However, this structure guarantees the flexibility of the semantic representation of the topic model. First, based on the distributional hypothesis of meaning, we assign a probable topic to a word based on what other words appear in document d. It is important to note that the meaning (topic) of a word is different if the surrounding words are different, even if they are the same. For example, if words such as “game” and “win” appear around the word “tie,” it is highly likely that the word means a draw. These words can probably be considered to have been generated by the topic “sports.” Conversely, if words such as “fashion” and “mode” co-occur, the word “tie” is likely to mean a kind of costume. Word polysemy can be modeled naturally in the topic model. Second, a single document can have multiple topics that complicate the topic estimation. If a single document is assigned a single topic (a model that assumes this is called a single-membership model), the estimation is simple: choose the topic with the highest likelihood of producing the set of words observed in the document (single-membership topic models are sometimes used in applications to short texts such as Twitter (Yin & Wang, 2014)). However, if we consider that a document is composed of multiple topics, we need to consider mixed-membership models. For example, if it is plausible to think that a scientific paper is composed of multiple topics, such as “genetics” and “neuroscience, “then the generation of the text should be modeled in a mixed-membership model. In this case, it is necessary to assume a multinomial distribution θd, called topic proportion, regarding the proportion of any topic appearing in the document and estimate its parameters. Accordingly, we consider that potentially different topics z1, z2, . . . are assigned to the words w1, w2, . . .of the document in question. Thus, in actual estimation, for each observed word, the topic is estimated by considering together the likely topic it is produced by and the plausible topic ratio for that document. As previously mentioned, one of the advantages of topic models that rely on complex hierarchical and structural representations is their ability to capture word polysemy. Hierarchical models allow for the possibility of a word belonging to multiple topics depending on its “context” (strictly speaking, it is better to say that they allow for the possibility of a word having a high probability of occurring in multiple topics, since topics are probability distributions). As we will see later, this is a particularly important feature of the topic model in terms of its sociological applications. To estimate the parameters of the topic model, a Bayesian inference framework is used. There are two main types of methods: sampling methods, such as Markov chain Monte Carlo methods, and approximate methods, such as variational Bayesian

78

H. Takikawa and A. Ueshima

methods. For technical details, refer to Stayvers and Griffiths et al. (2007) and Blei et al. (2003). To what extent does the topic model capture human understanding and meaningmaking practice? Of course, it is unlikely that human language use undergoes a generative process assumed by the topic model. However, the inverse process framed in terms of Bayesian inference can be regarded as modeling, to some extent, our understanding of the meaning of texts and the process of meaning-making. We infer the meaning of a word in light of its surroundings and thus predict the next word that will appear in a conversation or in a written text. This can be modeled as a process of inverse (Bayesian) estimation of the topic that produces the word. Also, in the process of comprehension, when we read a complex text consisting of multiple topics, such as a scientific paper or a novel, we can infer backward what topic the text covers and predict the subsequent development of the text accordingly. This also corresponds to the fact that the topic model models the process of considering the topic proportion of an entire document when estimating the meaning of a word. The topic model is also compelling from a cultural sociological perspective. DiMaggio et al. (2013) provided a coherent discussion on this. They pointed out the closeness of the topic model to cultural sociology in three or four ways. The first is the relationality of meaning (DiMaggio et al., 2013). Central to cultural sociology is the idea that “meanings do not inhere in symbols (words, icons, gestures) but that symbols derive their meaning from the other symbols with which they appear and interact (DiMaggio et al., 2013, pp. 586–587; Mohr, 1994; Mohr & Duquenne, 1997; Saussure, 1983). As mentioned earlier, the distributional hypothesis of meaning in computational linguistics has its origins in this cultural sociological idea and can therefore be regarded as the basis for the modeling of this idea. Specifically, the topic model embodies the idea of relationality of meaning in the way the topic of a word is determined. In this model, the topic of a word is assigned by its co-occurrence with other words, which describes the process by which meaning is relationally determined. Cultural sociology also emphasizes that meaning is determined by “context.” Thus, the same word may have different meanings in different contexts. This can be called contextual polysemy, derived from the relationality of meaning (DiMaggio et al., 2013). The topic model models this polysemy such that topic assignment changes depending on differences in collocations, even for the same word. As already mentioned, this modeling of polysemy is a strength of topic models that take a structural representation. DiMaggio et al. (2013) applied the topic model to a corpus of newspaper articles on arts funding by the U.S. government. They tested whether the topic model adequately captured the relationship between meaning and polysemy. In terms of polysemy, they examined whether the word “museum,” for example, actually expresses different meanings depending on the different topics assigned by the model. The results show that the word “museum,” assigned to different topics, can indeed be interpreted as meaning different things. The second way regards heteroglossia (DiMaggio et al., 2013). This was originally Bakhtin’s idea that a text should consist of multiple voices. By voices, Bakhtin

5

Model of Meaning

79

refers to “characteristic modes of verbal expression (word choice, syntax, phrasing, and so on) associated with particular speech communities.” (Bakhtin, 1982 [1934–1941], cited in DiMaggio et al., 2013) As we have seen, the topic model recognizes the mixed membership of documents. This feature can be used to model the manner in which a text is composed of various voices. In other words, each topic is viewed as representing a different voice. DiMaggio et al. (2013) clarified whether a topic represents a different voice by examining the language choices, emotional tone, and argument justification methods for each topic. Third, there is an important concept in cultural sociology: a frame. “A frame is a set of discursive cues (words, images, narrative) that suggests a particular interpretation of a person, event, organization, practice, condition, or situation” (DiMaggio et al., 2013, p. 593). In cultural sociology, frames are linked to people’s cognitive styles and prime specific schemas or activate specific association networks. If we consider frames as a kind of latent structure that produces words, then the topic of the topic model can be considered an operationalization of the concept of frames used in cultural sociology (cf. Fligstein et al., 2017). The above discussion shows that the topic model is not only a methodology, but that it can also be combined with a model of cultural sociology, human cognition, and meaning-making mechanisms that should lie behind it.

5.4.2

Word-Embedding Model

Along with topic models, another model commonly used in sociological research is the word-embedding model (Arseniev-Koehler & Foster, 2022; Jones et al., 2019; Kozlowski et al., 2019). A word-embedding model provides an efficient representation of the meaning of a word using vectors, which is also referred to as the distributed representation of words. The most important feature of this model is that meanings can be arranged spatially by representing the meanings of words by vectors. In semantic space, the more the meanings are represented by vectors that are close to each other, the greater the “similarity” between them. In addition, in semantic space, it is possible to perform calculations called analogy calculations. For example, the following calculation is possible. king - man þ woman = queen: Underlying this calculation is the principle that a directional vector such as womanman corresponds to a certain semantic dimension (in this case, the gender dimension), and that by moving the king in that direction in space, meaning can be shifted along that semantic dimension (see Fig. 5.2). The structure of the semantic space in which this analogical calculation is possible has great sociological applicability. For example, this feature can be used to construct axes on the semantic space that represent specific social dimensions of meaning, such as the gender dimension or the social class dimension. These

80

H. Takikawa and A. Ueshima

Fig. 5.2 An example of analogy calculation

constructed axes can be used to examine what gendered or class connotations a particular cultural or social phenomenon, such as a sport or occupation, is imbued with (Kozlowski et al., 2019). Alternatively, we can look at the moral or cultural connotations attached to obesity (Arseniev-Koehler & Foster, 2022). Again, we examine the persuasiveness of word-embedding models not as mere analytical tools but in terms of modeling human meaning-making (cf. ArsenievKoeler & Foster, 2022; Günther et al., 2019; Lake & Murphy, 2023). Wordembedding models can be constructed in several different ways, but we will focus on predictive or artificial neural network models. A typical model of an artificial neural network is the word2vec model (Mikolov et al., 2013). There are two different types of word2vec models: skip-gram models and CBOW models; in this chapter, we introduce the CBOW model. In artificial neural network models, the parameters of the model are learned by repeatedly solving a task by trial and error. The CBOW model uses a neural network with a two-layer structure: a hidden layer with N nodes and an output layer with V nodes (see Fig. 5.3). The input V-dimensional one-hot vector (a vector consisting of the indices of V words appearing in the corpus, where the index part of the word is 1 and the others are 0) is propagated to the hidden layer through a V × N weight matrix W and then to the output layer through an N × V weight matrix W′. The Vdimensional vector is then transformed by the softmax function, and the word with the highest probability is selected for the output. This information about the success or failure of the prediction is propagated through the model by back propagation, and the parameters are adjusted. How would the vector of words or the discrete representation be obtained using this model? It corresponds to a row vector with N elements of a weighting matrix that weighs from the input layer to the hidden layer. In general, the number of words V appearing in a corpus is tens of thousands or more, whereas N is only a hundred to a few hundred, so the dimension of the vector is highly reduced, which means that the embedded vector efficiently stores semantic information. To what extent can the word-embedding model be used for meaning-making? Let us consider its persuasiveness as a model in terms of the predictive function of meaning, relationality of meaning, and semantic learning. The neural network model of word embedding is also based on the predictive function of the meaning. Specifically, the meaning of a word is learned by predicting the target word from the context word of the window size. In other words, distributed

5

Model of Meaning

81

Fig. 5.3 Architecture of artificial neural network in CBOW model

representations of context words are learned in terms of the representations that best predict the target. Thus, the meanings stored in the word-embedding vector are organized under the function of predicting words. Furthermore, the word-embedding model is based on the distributional hypothesis of meanings, evident from the structure of the CBOW model, in which context words are linked to target words through prediction. The idea that meaning is acquired through the task of predicting co-occurring target words is consistent with the idea of the distributional hypothesis that meaning is determined by surrounding words. Thus, this model also captures the relationality of meanings. Finally, in terms of learning, the word2vec CBOW model did not use a Bayesian inference framework. However, it models the learning process of acquiring the semantic content of a word by trial and error through the success or failure of predictions.

5.4.3

Topic Models and Word-Embedding Models

In terms of the predictive function of meaning, the relationality of meaning, semantic learning, topic models, and word-embedding models broadly share these characteristics. However, there are significant differences between topic models and wordembedding models in terms of meaning-making models. The following two points can be pointed out. First, the topic model is based on hierarchical and structural

82

H. Takikawa and A. Ueshima

representations, whereas the word2vec model is based on spatial representations. Second, the topic model is a generative model, whereas word2vec is not directly a generative model. The topic model is a structural representation model, which means that the model is hierarchical, as we have seen earlier. In other words, the topic model is a two-stage process in which topic proportions are determined, topics are assigned based on such proportions, and words are generated probabilistically. Such a structural representation allows for flexible representation of the meaning of words. For example, topic models can represent the polysemy of words. The same word may have different meanings for different topics. This is not possible in a normal embedding model. Another strength of structural representation is its ability to classify words as concepts into qualitatively different groups, such as frames in sociology. On the contrary, the spatial representations made possible by word-embedding models also have aspects that are well in line with the theoretical tradition of sociology. One such tradition is Bourdieu’s model of social spaces. As mentioned earlier, his theory represents both social and symbolic structures in a spatial model, in which the structure of social space corresponds to the structure of people’s cognition (Bourdieu, 1989). The advantage of spatial representation is that different social and cultural meanings can be mapped onto the same space, allowing the study of the location of these meanings. This seems particularly suited to thinking and cognitive practices, through which we compare a set of concepts along a dimension and examine their location along that dimension. Specifically, the assignment of gender stereotypes to certain occupations, for example, or class images attached to cultural practices such as going to the theater, sports, or museums, are often captured by such spatial representations. Kozlowski et al. (2019) examplify such an approarch. In line with Bourdieu’s theoretical position, such spatial representations can be seen as reflecting the structure of our cognitive practices. Apart from Bourdieu, the view that humans use spatial metaphors to construct meaning is well established in conceptual metaphor theory in cognitive linguistics (Lakoff & Johnson, 1999). However, the extent to which cognitive practices can be captured in unstructured homogeneous spaces remains debatable. The first problem is that naive spatial models cannot distinguish between the various aspects of semantic “similarity.” For example, the similarity of meaning as measured by word2vec cannot distinguish between the synonymy and antonymy of words (Arseniev-Koehler & Foster, 2022). This limitation stems from the structural inflexibility of the spatial arrangement and representation of word meanings (Chen et al., 2017; Griffiths et al., 2007). Second, the naive spatial model cannot capture the polysemy of the words. Because the meaning of a word is represented by only one position in space, it cannot capture the process by which the meaning of a word changes according to context. This is a major difference from the structural representation of the topic model, which is better suited for representing word polysemy. Third, there is a major difference in that the topic model is a generative model, whereas word2vec’s embedding model is not a generative model. In terms of

5

Model of Meaning

83

computational theory of meaning, a generative model that models the process of meaning generation is preferable. Therefore, the word2vec embedding model has significant limitations. However, several interesting attempts have been made to transform the embedding model into a generative model. One of these is Arora’s model (Arora et al., 2016, 2018), which has been applied to sociology by DATM (Arseniev-Koehler et al., 2022). Arora proposed a generative model that generates embedding vectors to answer the question of why embedding vectors obtained by word2vec can be added together to form a certain semantic dimension, as in analogical computation (Arora et al., 2016, 2018). The generative model was straightforward. Consider a discourse vector ct on space Rd. The spatial coordinates represent the semantic content being spoken at that moment. ct is the “gist” of the word behind the observed word. Each word also has a latent vector, vw. Now, ct randomly walks on Rd, and the t-th word of a group of words is produced at step t. Specifically, the word produced at time t is determined by the closeness between ct and vw. In other words, w is produced with the following probability of being observed: Pr½the word w is produced at time t jct  / expð < ct , vw > Þ Such a generative model provides a basis for calculating the probability of occurrence of a word using the CBOW model. As we saw earlier, the CBOW model calculates the probability of occurrence of a word c by its closeness to the average of the k context words w1. . .wk. Such averaging naturally follows from the Bayesian inference process of estimating the current c from the words produced by the random walk of c over k periods. When interpreted as a model of semantic cognition, the agent considers the process of estimating the potential meaning (discourse vector) of a word from the distribution of surrounding words, assuming that words appearing in the surroundings are likely to share a potential meaning with the target word. There are other advantages of Arora’s model over word2vec’s CBOW model, in which the observed word can be viewed as a compound construct consisting of multiple discourse vectors or latent meanings analogous to the topic model. This allows the model to address the problem of word polysemy, which is not possible with the conventional embedding model. It also opens up sociological applicability. Arseniev-Koehler et al. (2022) proposed DATM based on this model and applied it to sociological text analysis. Generative modeling of embedding models can be seen as the integration of topic and word-embedding models (cf. Arseniev-Koehler et al., 2022), which is a promising approach. Generative models provide a model of semantic cognition, that is, how humans perceive meaning and generate words. On the contrary, semantic space models, although more limited than structured representations, have advantages over topic models, such as their ability to extract semantic dimensions. Furthermore, the weaknesses of the conventional embedding model, such as the handling of word polysemy, can be overcome to some extent by generative modeling of the

84

H. Takikawa and A. Ueshima

embedding model. However, whether the semantic space model has reached the same level of expressiveness as the structured representation model is an open question. Arora’s generative model assumes a very simple process of inferring the meaning of a target word from surrounding words based on the assumption of a random walk of the potential meanings of words. Further integration of the topic model and embedding model should be pursued by building a generative model that closely approximates the human meaning-making mechanism.

5.5

Language Model and Explanation of Actions

The models presented thus far have focused primarily on how people perceive the meaning of objects and other people (texts about them). However, as Weber (1946) points out, the interpretation of meaning in sociology is ultimately performed to explain human action. While some of our cultural and semantic constructions and their interpretations are directly related to and explain actions, others do not. People’s retrospective and ex-post “explanations” of their actions are at least not directly linked to their actions, but rather justify them ex-post (Swidler, 1986; Vaisey, 2009). In contrast, values and mental images motivate people to act. From the point of view of analyzing meaningmaking mechanisms to explain social phenomena as an accumulation of actions, it is the latter kind of motivational meaning-making that we want to extract. Therefore, the question of whether the semantic structure and semantic space obtained by the topic model and embedding model are actually related to people’s actions, and if so, how they are related, must exist at the final stage of elucidating the meaning-making mechanism. In other words, it is necessary to examine the performance of the meaning-making model by focusing on the extent to which the meaning of an object identified by the meaning-making model can explain subsequent human behavior. If the meaning of the object identified by the model can explain human behavior to some extent, it can be said that the model’s estimation of “meaning” has some validity. Here, we focused on a specific class of meaning-making models, the word2vec model. We examined to what extent the meanings that the word2vec model specifies can explain the succeeding behaviors. In other words, we examined whether and to what extent it is possible to predict people’s behavior using the word vectors obtained from word2vec as explanatory variables. Here, we refer to the method in which word vectors are used as explanatory variables in regression analysis to predict human judgments (Bhatia, 2019; Hollis et al., 2017; Richie et al., 2019). Consider that we would like to use words as explanatory variables and predict people’s responses to the words (a specific example follows later). In this case, we can use word2vec to represent word meanings using 300-dimensional vectors. This means that each explanatory variable (i.e., a word) has 300 semantic dimensions, such as when we used 300 survey questions

5

Model of Meaning

85

regarding the word and quantified the word’s meaning with 300 numeric values. Because each explanatory variable is now represented in 300 numeric values, a regression model in this method has 300 corresponding coefficients that determine the weights for each of the 300 semantic dimensions. Owing to the many explanatory variables, regularization methods such as ridge regression or model comparison based on AIC are often used to prevent overfitting. This method shows that judgments such as risk perception, gender roles, and health behaviors can be predicted with high accuracy (Richie et al., 2019). In a more recent study, Ueshima and Takikawa (2021) used word vectors to predict people’s vaccine allocation judgments regarding COVID-19. Participants rated how much priority vaccination should be given to each of the >130 occupations. The authors used a pre-trained Japanese word2vec resource (Manabe et al., 2019) to obtain a 300-dimensional word vector for each occupation. They reported that regression analysis using word vectors as explanatory variables exhibited a high out-of-sample predictive accuracy for participants’ vaccination priority judgments. To demonstrate the effectiveness of this approach, they compared the word-vector regression model with a benchmark regression model. The benchmark regression contained relevant explanatory variables such as the social importance of each occupation to quantify the occupations. The results of the model comparison showed that the word-vector model predicted vaccination priority judgments better than the benchmark model did. It is notable that the explanatory variables of the benchmark model—social importance, personal importance, and familiarity with each occupation—were obtained from each participant, while the word vector used in this study was not measured for this specific study, demonstrating the usefulness of the word-vector approach. Overall, the results of this study suggest that word vectors can quantify the meanings that people have for each occupation. In regression using word vectors, prediction is made by learning the regression coefficients or weights for each dimension of the word vector from the data. Intuitively, this can be interpreted as modeling how much each of the hundreds of semantic dimensions is weighted when people judge specific domains such as vaccination priority, gender role, or risk perception. Using weights (regression coefficients) for each semantic dimension makes it possible to interpret the criteria used to make judgments (Bhatia, 2019). In Ueshima and Takikawa (2021), participants answered that they would prioritize vaccination for occupations such as nurses. Accordingly, the 300-dimensional weights of a regression model trained by the participants’ vaccination judgments had large dot products with the word vector of “nurse.” This is because larger dot products indicate a higher prioritization of vaccination in this regression model. Importantly, it is possible to calculate the dot products between the regression weights and word vectors other than occupations. By exploring words that have larger or smaller dot products with the obtained 300-dimensional weights, it is possible to interpret the criteria associated with increasing or decreasing people’s judgments of vaccination priority. 
In the case of the vaccination priority judgments, words associated with medical institutions such as a “hospital” and with public service such as the “local government” produced larger dot products with the weights compared to other

86

H. Takikawa and A. Ueshima

common words, suggesting that meanings of these words were related to criteria for rating vaccination priority higher. Such exploratory analyses of judgment criteria help to understand the psychological mechanisms underlying judgments. Moreover, it was possible to predict vaccination priority ratings for occupations that were not included in the study. For example, based on these exploratory analyses, we can infer that an occupation such as a city hall worker would be rated highly. Thus, interpreting the obtained weights can lead to rigorous confirmatory research on new occupations. In summary, using word vectors as predictors or explanatory variables in multiple regression analysis is a promising method for predicting and interpreting human behavior. The fact that the word vectors obtained from word2vec are helpful in predicting human behavior indicates that it captures not only the semantic relationships between words in the linguistic corpus but also, to some extent, human knowledge about the world (Caliskan et al., 2017; Günther et al., 2019). To further develop models of meaning for predicting human behavior, future research should consider that the meanings of words are constructed not only by linguistic information but also by the perceptual and motor systems of humans (Bruni et al., 2014; Glenberg & Robertson, 2000; Lake & Murphy, 2023). Using multimodal data is a promising direction for developing better models of meaning to predict human behavior. Another important direction for future research is to model the heterogeneity of meanings and behaviors among people. At present, corpora used to obtain word vectors often consist of linguistic resources generated by many people and not by a certain individual. Therefore, the vector representation of words to be learned represents the average meaning or knowledge representation of words for the people who generate the corpora. However, the meanings of words should differ among individuals depending on the nature of the words (Wang & Bi, 2021). For example, small children may associate an occupation, such as doctors, with fear, while older people do not. Such heterogeneity of word meanings affects individuals’ behavior differently. Thus, obtaining word vectors that capture the heterogeneity of meanings is a necessary step toward modeling individual behaviors with higher accuracy.

5.6

Conclusion

In this chapter, we discuss the possibility of applying and extending models developed in computational linguistics to construct a sociological model of meaningmaking. The starting point for constructing a sociological theory of meaning-making is to view it as a computational problem, that is, to capture valid information from the environment with uncertainty, predicting next events and thus achieving one’s own goals. From this formulation, we can draw three points of meaning production: the predictive function of meaning, relationality of meaning, and Bayesian learning. Both topic models and word-embedding models can be interpreted as theoretical models of meaning-making, but there are differences between them. The topic model is a hierarchical generative model of language that is highly compatible with cultural

5

Model of Meaning

87

sociology and particularly suited for capturing word polysemy. In contrast, wordembedding models allow for spatial representation and are suitable for capturing the cultural dimensions of events and practices. An integrated model of the topic and word-embedding models is required. The last element that completes the theory of sociological meaning-making is the link between interpretation and action. In this section, we introduce a regularized regression model that examines the relationship between semantic representation and action. Future directions include the incorporation of not only linguistic information but also nonlinguistic information, information about the physical environment and the body, and heterogeneity of meaning interpretation according to the attributes of the actor and the socialization process, enabling a more precise prediction of subsequent actions.

References Anderson, J. R. (1991). The adaptive nature of human categorization. Psychological Review, 98(3), 409. Arora, S., Li, Y., Liang, Y., Ma, T., & Risteski, A. (2016). A latent variable model approach to pmi-based word embeddings. Transactions of the Association for Computational Linguistics, 4, 385–399. Arora, S., Li, Y., Liang, Y., Ma, T., & Risteski, A. (2018). Linear algebraic structure of word senses, with applications to polysemy.Transactions of the Association for. Computational Linguistics, 6, 483–495. Arseniev-Koehler, A., & Foster, J. G. (2022). Machine learning as a model for cultural learning: Teaching an algorithm what it means to be fat. Sociological Methods & Research, 51(4), 1484–1539. Arseniev-Koehler, A., Cochran, S. D., Mays, V. M., Chang, K. W., & Foster, J. G. (2022). Integrating topic modeling and word embedding to characterize violent deaths. Proceedings of the National Academy of Sciences, 119(10), e2108801119. Bakhtin, M. M. (1982). (1934–1941). (M. Holquist, Trans.) In: C. Emerson & M. Holquist (Eds.), The dialogic imagination: Four essays. University of Texas Press, . Becker, G. S. (1976). The economic approach to human behavior. University of Chicago Press. Berger, P. L., & Luckmann, T. (1967). The social construction of reality: A treatise in the sociology of knowledge. Anchor Books. Bhatia, S. (2019). Predicting risk perception: New insights from data science. Management Science, 65(8), 3800–3823. https://doi.org/10.1287/mnsc.2018.3121 Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84. Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022. Bourdieu, P. (1984). Distinction: A social critique of the judgement of taste. Harvard University Press. Bourdieu, P. (1989). The state nobility: Elite schools in the field of power. Stanford University Press. Boutyline, A., & Soter, L. K. (2021). Cultural schemas: What they are, how to find them, and what to do once you’ve caught one. American Sociological Review, 86(4), 728–758. Bruni, E., Tran, K. N., & Baroni, M. (2014). Multimodal distributional semantics. Journal of Artificial Intelligence Research, 49, 1–47. https://doi.org/10.1613/jair.4135

88

H. Takikawa and A. Ueshima

Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183–186. https://doi.org/10. 1126/science.aal4230 Chen, D., Peterson, J. C., & Griffiths, T. L. (2017). Evaluating vector-space models of analogy. ArXiv, 1705(04416), 1–6. Clark, A. (2015). Surfing uncertainty: Prediction, action, and the embodied mind. Oxford University Press. Coleman, J. S. (1990). Foundations of social theory. Harvard university press. Dilthey, W. (1910). Der Aufbau der geschichtlichen Welt in den Geisteswissenschaften. Verlag der Königlichen Akademie der Wissenschaften, in Commission bei Georg Reimer. DiMaggio, P. (1997). Culture and cognition. Annual Review of Sociology, 23, 263–287. DiMaggio, P., Nag, M., & Blei, D. (2013). Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of US government arts funding. Poetics, 41(6), 570–606. Durkheim, E. (1915). The elementary forms of the religious life: A study in religious sociology. Macmillan. Evans, J. A., & Aceves, P. (2016). Machine translation: Mining text for social theory. Annual Review of Sociology, 42, 21–50. Firth, J. R. (1957). A synopsis of linguistic theory, 1930–1955. In Studies in linguistic analysis (pp. 1–32). Basil Blackwell. Fligstein, N., Stuart Brundage, J., & Schultz, M. (2017). Seeing like the Fed: Culture, cognition, and framing in the failure to anticipate the financial crisis of 2008. American Sociological Review, 82(5), 879–909. Foster, J. G. (2018). Culture and computation: Steps to a probably approximately correct theory of culture. Poetics, 68, 144–154. Garfinkel, H. (1967). Studies in ethnomethodology. Polity Press. Glenberg, A. M., & Robertson, D. A. (2000). Symbol grounding and meaning: A comparison of high-dimensional and embodied theories of meaning. Journal of Memory and Language, 43(3), 379–401. https://doi.org/10.1006/jmla.2000.2714 Griffiths, T. L., Steyvers, M., & Tenenbaum, J. B. (2007). Topics in semantic representation. Psychological Review, 114(2), 211. Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267–297. Günther, F., Rinaldi, L., & Marelli, M. (2019). Vector-space models of semantic representation from a cognitive perspective: A discussion of common misconceptions. Perspectives on Psychological Science, 14(6), 1006–1033. https://doi.org/10.1177/1745691619861372 Harris, Z. (1954). Distributional hypothesis. Word. World, 10(23), 146–162. Hohwy, J. (2013). The predictive mind. Oxford University Press. Hollis, G., Westbury, C., & Lefsrud, L. (2017). Extrapolating human judgments from skip-gram vector representations of word meaning. Quarterly Journal of Experimental Psychology, 70(8), 1603–1619. https://doi.org/10.1080/17470218.2016.1195417 Jones, J. J., Amin, M. R., Kim, J., & Skiena, S. (2019). Stereotypical gender associations in language have decreased over time. Sociological Science, 7, 1–35. Kozlowski, A. C., Taddy, M., & Evans, J. A. (2019). The geometry of culture: Analyzing the meanings of class through word embeddings. American Sociological Review, 84(5), 905–949. Lake, B. M., & Murphy, G. L. (2023). Word meaning in minds and machines. Psychological Review, 130(2), 401–431. https://doi.org/10.1037/rev0000297 Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh: The embodied mind and its challenge to Western thought. Basic Books. 
Lamont, M. (2000). Meaning-making in cultural sociology: Broadening our agenda. Contemporary Sociology, 29(4), 602–607. Lizardo, O. (2004). The cognitive origins of bourdieu’s habitus. Journal for the Theory of Social Behaviour, 34(4), 375–401.

5

Model of Meaning

89

Lizardo, O. (2019). Pierre bourdieu as cognitive sociologist. In W. Brekhus & G. Ignatow (Eds.), The Oxford Handbook of Cognitive Sociology. Oxford University Press. Luhmann, N. (1995). Social systems. Stanford University Press. Manabe, H., Oka, T., Umikawa, Y., Takaoka, K., Uchida, Y., & Asahara, M. (2019). Japanese word embedding based on multi-granular tokenization results (in Japanese). In Proceedings of the twenty-fifth annual meeting of the Association for Natural Language Processing. Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. MIT Press. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 26, 3111–3119. Mohr, J. (1994). Soldiers, mothers, tramps and others: discourse roles in the 1907 New York Charity Directory. Poetics, 22, 327–358. Mohr, J. W. (1998). Measuring meaning structures. Annual Review of Sociology, 24(1), 345–370. Mohr, J. W., & Duquenne, V. (1997). The duality of culture and structure: poverty relief in New York City, 1888–1917. Theory and Society, 26, 305–356. Mohr, J. W., Bail, C. A., Frye, M., Lena, J. C., Lizardo, O., McDonnell, T. E., Mische, A., Tavory, I., & Wherry, F. F. (2020). Measuring culture. Columbia University Press. Nelson, L. K. (2020). Computational grounded theory: A methodological framework. Sociological Methods & Research, 49(1), 3–42. Opp, K. D. (1999). Contending conceptions of the theory of rational action. Journal of Theoretical Politics, 11(2), 171–202. Richie, R., Zou, W., & Bhatia, S. (2019). Predicting high-level human judgment across diverse behavioral domains. Collabra: Psychology, 5(1), 50. https://doi.org/10.1525/collabra.282 Saussure, F. (1983). Course in general linguistics. Open Court Press. Schutz, A., & Luckmann, T. (1973). The structures of the life-world (Vol. 1). Northwestern University Press. Simmel, G. (1922). Die Probleme der Geschichtsphilosophie: eine erkenntnistheoretische Studie. Duncker & Humblot. Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. In Handbook of latent semantic analysis (pp. 439–460). Psychology Press. Swidler, A. (1986). Culture in action: Symbols and strategies. American Sociological Review, 51, 273–286. Takikawa, H. (2019). Topic dynamics of post-war Japanese sociology: Topic analysis on Japanese Sociological Review corpus by structural topic model (Japanese). Sociological Theory and Methods, 34(2), 238–261. Ueshima, A., & Takikawa, H. (2021). December. Analyzing vaccination priority judgments for 132 occupations using word vector models. In IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (pp. 76–82). Vaisey, S. (2009). Motivation and justification: A dual-process model of culture in action. American journal of sociology, 114(6), 1675–1715. Wang, X., & Bi, Y. (2021). Idiosyncratic tower of babel: Individual differences in word-meaning representation increase as word abstractness increases. Psychological Science, 32(10), 1617–1635. https://doi.org/10.1177/09567976211003877 Watts, D. J. (2014). Common sense and sociological explanations. American Journal of Sociology, 120(2), 313–351. Weber, M. (1946). From Max Weber: Essays in sociology. Facsimile Publisher. White, H. C. (1963). An anatomy of kinship: Mathematical models for structures of cumulated roles. Prentice-Hall. White, H. C. (1992). 
Identity and control: A structural theory of social action. Princeton University Press. White, H. C. (1995). Network switchings and Bayesian forks: Reconstructing the social and behavioral sciences. Social Research, 64, 1035–1063.

90

H. Takikawa and A. Ueshima

White, H. C. (2008a). Identity and control: How social formations emerge (2nd ed.). Princeton University Press. White, H. C. (2008b). Notes on the constituents of social structure. Soc. Rel. 10-Spring’65. Sociologica, 2(1). White, H. C., Boorman, S. A., & Breiger, R. L. (1976). Social structure from multiple networks. I. Blockmodels of roles and positions. American Journal of Sociology, 81(4), 730–780. Yin, J., & Wang, J. (2014). A Dirichlet multinomial mixture model-based approach for short text clustering. In Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 233–242).

Chapter 6

Sociological Meaning of Contagion Yoshimichi Sato

6.1

Contagion as a Main Theme in Sociology

Contagion or diffusion has been a main theme in sociology.1 Gabriel Tarde, a founding father of the study of diffusion, proposed the laws of imitation and made diffusion as a critical concept in the study of society, that is, sociology (Tarde, 1890). Following the tradition of the study of diffusion laid out by Tarde, Everett Rogers studied classic examples of diffusion in his seminal work (Rogers, 2003). This book, whose first edition was published in 1962, deals with various topics of diffusion that succeeded or failed from a failed case of a practice of water boiling in a Peruvian village to the diffusion of hybrid corn in Iowa to the STOP AIDS program in San Francisco. Computational social science has rapidly and radically advanced the study of contagion. This is partly because computational social science finds it easy to trace the contagion process by utilizing two characteristics of big data, that is, “Big” and “Always-on” (Salganik, 2018). “Big” literally means that the size of the data is large, and “Always-on” means that data is continually collected. Data used by conventional methods in the study of contagion misses these characteristics. Wu et al. (2020), for example, apply big data methods to nowcast and forecast of COVID-19. Their article has three purposes. The first one is to infer the basic reproductive number of COVID-19, R0, and the outbreak size in Wuhan, China, from December 1, 2019, to January 25, 2020. The second one is to estimate the number of cases exported from Wunan to other cities in mainland China. The third one is to forecast the spread of COVID-19 within and outside mainland China. For 1

Contagion and diffusion are used interchangeably in this chapter.

Y. Sato (✉) Faculty of Humanities, Kyoto University of Advanced Science, Kyoto, Japan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 Y. Sato, H. Takikawa (eds.), Sociological Foundations of Computational Social Science, Translational Systems Sciences 40, https://doi.org/10.1007/978-981-99-9432-8_6

91

92

Y. Sato

these purposes, they build a susceptible-exposed-infectious-recovered (SEIR) model to simulate the pandemic of COVID-19 in Wuhan. For the estimation of the model, they need mobility data of people moving from and into Wuhan domestically as well as internationally. They collect data from three sources: “(1) the monthly number of global flight bookings to Wuhan for January and February, 2019, obtained from the Official Aviation Guide (OAG); (2) the daily number of domestic passengers by means of transportation recorded by the location-based services of the Tencent (Shenzhen, China) database from Wuhan to more than 300 prefecture-level cities in mainland China from January 6 to March 7, 2019; and (3) the domestic passenger volumes from and to Wuhan during Chunyun 2020 (Spring Festival travel season. . .) estimated by Wuhan Municipal Transportation Management Bureau and press-released in December, 2019” (Wu et al., 2020, pp. 690–691). Their research is an excellent example of big data analysis applied to the study of pandemic, the diffusion of the corona virus. And a huge number of articles on the diffusion of the corona virus using big data have been published since the outbreak. However, diffusion of new ideas, values, and norms in society studied by Rogers is different from that of a virus such as the corona virus. I will explore the difference in the next section.

6.2

Complex Contagion

Diffusions of new cultural items such as ideas, values, and norms are different from that of viruses. Centola and Macy (2007) call diffusions of new cultural items “complex contagions” and argue that an individual needs to contact more than one source of activation for a complex contagion to occur. In contrast, a simple contagion, a contagion of a virus, for example, occurs if an individual contacts a person having a new item such as a virus. This type of contagion is based on a biological mechanism, and the individual does not need to interpret the meaning of the virus to be infected by it. In contrast, a diffusion of a cultural item is based on a sociological mechanism, and the individual interprets its meaning before he or she decides whether to accept it or not. Take a social movement, for example. Participation in a social movement, say, in a demonstration incurs risks such as being arrested by police and being beaten by opponents of the movement. Thus, exposure to a single source of activation is not enough for an individual to participate in the risky movement. Rather, he or she needs more than one source of activation to participate in it, as Centola and Macy (2007) argue. Why does he or she need more than one source of activation? Centola and Macy (2007) propose four reasons: strategic complementarity, credibility, legitimacy, and emotional contagion. I will examine them in detail in the next section. What is important about them is that, in a sense, Centola and Macy (2007) go back to Rogers’ (2003) study of diffusion because he also recognizes the importance of sociological and psychological factors for a diffusion to occur. Centola and Macy (2007) conduct computer simulations to support their argument. They start with the small world model proposed by Watts and Strogatz (1998).

6

Sociological Meaning of Contagion

93

Watts and Strogatz (1998) assume a ring lattice in which an individual is connected to four nearest neighbors. Then, they randomly cut ties and connect them to randomly chosen individuals. This rewiring connects distant individuals and creates a small world. The originality of their study is that they demonstrate that a simple procedure of cutting and rewiring ties creates a small world, which became famous because of Milgram’s small-world experiment (Milgram, 1967). He conducted innovative social-psychological experiments. He and his collaborators randomly chose a group of people in Wichita, Kansas, and in Omaha, Nebraska, and asked people who agreed to participate in the experiments, who were called a starting person, to send a folder via their acquaintances to a person living or working near Harvard University, who was called a target person. A result of the experiments shows that the median number of intermediate acquaintances for the folder to be delivered from a starting person to the target person is five. The result empirically and scientifically endorses the cliché used in daily life, “What a small world.” A major contribution of Watts and Strogatz (1998) is that they succeeded in creating a small world by a simple procedure of randomly cutting and rewiring ties. Centola and Macy (2007) modify the model of Watts and Strogatz (1998) by changing only one assumption. They make activation thresholds for contagion higher than those assumed in previous studies. This seemingly minor change in the assumption leads to the importance of wide bridges for complex contagion to propagate. Simply speaking, the width of a bridge between two persons is the number of bridges closely connecting them. [See Centola and Macy (2007, pp. 713–714) for a mathematically strict definition of the width of a bridge.] Centola (2018) advanced the study by Centola and Macy (2007) by conducting an online experiment. As he points out, it is almost impossible to collect the whole network data of a society even though its size is small in order to check the empirical validity of the theory of complex contagion. To solve this problem, Centola (2018) created a society online, which has two different social networks: a clustered network and a random network. The society he created was an online health community called the Healthy Lifestyle Network. When a participant in the experiment arrived at a web page of the Healthy Lifestyles, he or she was given overview information on it. Then, if he or she agreed to join the network and signed up for it, he or she was randomly assigned to one of the two networks. In either network, a participant knew that he or she could interact with only a set of neighbors, who were called “health buddies” in the experiment. In other words, a participant does not know the whole structure of the network he or she was allocated to. In the clustered network a participant shared overlapping contacts with other health buddies, so he or she had wide bridges with them. In the random network, a participant does not share such wide bridges with other health buddies. Contagion began with the random choice of a participant. The participant sent a message to his or her neighbors to encourage them to join a health forum website. The neighbors decided whether to join the forum or not. If a neighbor joined the forum, invitation messages were automatically sent from him or her to his or her health buddies to invite them to join the forum.

94

Y. Sato

Joining the forum was not easy, however. A participant could not join the forum only by clicking a “Join” button. Rather, he or she had to fill in answers to questions on a registration form that was long enough for him or her to scroll down to complete. Centola (2018) intentionally designed this form to make the contagion in the experiment more difficult than a contagion of a virus. In other words, this task of joining the forum is an appropriate condition to test the empirical validity of the theory of complex contagions proposed by Centola and Macy (2007). Results of the experiment clearly showed that diffusion in the clustered network evolved faster than that in the random network and that the percentage of adopters was higher in the clustered network than that in the random network. In addition, carefully observing the process of contagion in the two networks, Centola (2018) reported that invitation messages circulated in the same neighborhood in the clustered network, which means that a participant was exposed to the invitation messages via more than one neighbor. In the random network, in contrast, the messages quickly diffused in the network, but, because of the lack of redundancy, the diffusion did not evolve so fast as in the clustered network, and the percentage of adopters was lower than that in the clustered network. These results empirically support the theory of complex contagions.

6.3 Role of Meaning and Interpretation in the Contagion Process

Although it has advanced the study of contagion, I would argue that the theory of complex contagions does not fully explain contagion processes among individuals, because it does not incorporate meaning and interpretation into the contagion process. Centola and Macy (2007) and Centola (2018) proposed four mechanisms to explain why complex contagions need multiple sources to occur: strategic complementarity, credibility, legitimacy, and emotional contagion. Strategic complementarity means that for an individual to engage in a risky, costly behavior such as participation in collective action, he or she needs to know that other people in his or her network have already done so. Credibility means that for an individual to adopt a new item, he or she needs more than one source to believe that the item is credible. Legitimacy is related to strategic complementarity: if some people who are strongly connected to an individual have participated in a risky, costly behavior such as a demonstration, bystanders become more likely to accept it as legitimate, which in turn encourages the individual to participate. Emotional contagion means that emotions are exchanged and amplified in collective action, and such emotional contagion encourages an individual whose close friends are participating to join as well. These mechanisms are sociologically plausible, but they miss interpretation and meaning in the diffusion process (Goldberg & Stein, 2018). Because the theory of associative diffusion by Goldberg and Stein was explained in Chap. 2, I revisit instead the failed case of the diffusion of a boiling-water practice in a Peruvian village (Rogers, 2003) to show the importance of interpretation and meaning in diffusion studies.


Boiling water for drinking was crucial for the health of residents in the peasant village because the water they drank was contaminated. Thus, Nelida, the local health worker representing the Peruvian public health agency in the village, conducted a two-year campaign to persuade villagers to boil water. However, the campaign failed: only 11 of the 200 families in the village began to boil water. Why did the campaign fail even though boiling water was beneficial to the villagers’ health and not risky to them? Rogers (2003) argues that the villagers perceived boiling water as culturally inappropriate. They categorized foods, beverages, and medicines into “hot” and “cold” types. The categorization was not related to actual temperatures. Rather, it was socially constructed among villagers who believed in the legitimacy of the categorization. Furthermore, it was socially connected to illness. In general, sick persons were supposed to avoid extremely hot or cold types, and raw water was categorized as a very cold type. Thus, only sick persons drank boiled water, because villagers thought that boiling removed water’s extreme coldness. Healthy persons, in contrast, did not drink boiled water, because they thought they did not need it. This failed case shows the importance of local culture and norms that dominate people’s schemes of interpretation and meaning. If the villagers had not culturally linked boiled water to illness, they would have accepted the custom of boiling water. Although the four mechanisms for complex contagions are convincing, the theory of complex contagions does not seem to deal properly with interpretation and meaning in the process of contagion. In other words, incorporating interpretation and meaning into its logic would enhance the explanatory power of the theory of complex contagions.

6.4 Big Data Analysis of Diffusions, Interpretation, and Meaning

Before examining in detail how to incorporate interpretation and meaning into big data analysis, let me quickly review their history in sociology to show their importance. Mead (1934) was one of the founding fathers who introduced interpretation and meaning into sociology. To summarize his profound theory very simply, the self consists of the I and the Me. The Me is the expectations of others that the I accepts, and the I reacts to those expectations. Multiple selves interact smoothly with each other if the reactions do not contradict the expectations. The Me, however, is not the expectations of others as they are; rather, the I interprets the meaning of the expectations and reacts to the interpreted expectations. Here we see the interaction between interpretations and reactions.


Berger and Luckmann (1966) took Mead’s theory a step further in sociology. They proposed a theory that explains how reality is constructed by the interactions of actors. Reality, or social reality to be exact, does not exist without actors. Actors interact with other actors, add meanings to their actions, and interpret them. If actions and interpretations fit well together, reality emerges. In other words, reality is socially constructed by the actors involved. Actors then interpret that reality as objective social order and behave in accordance with it. Here again, we observe the interaction between interpretations and actions in the creation of social reality, or social order.

The theory of the social construction of reality by Berger and Luckmann (1966) influenced studies of social movements, of mobilization processes in particular, which are closely related to the theory of complex contagion by Centola and Macy (2007). Resource mobilization theory (e.g., McCarthy & Zald, 1977) used to be a main paradigm in the study of social movements. The theory argues that resources such as social movement organizations and human and financial resources are necessary for social movements to emerge and succeed. However, it was criticized for not precisely explaining how people are mobilized. To overcome this shortcoming, D. A. Snow, the main figure in the development of a new approach called frame theory, and his colleagues focused on how a social movement organization aligns its frame with that of the people it wants to mobilize (Snow et al., 1986). It is often the case that the frame of a social movement organization differs from that of its target people at the beginning of a social movement. In this case, even if the organization has plentiful resources for mobilization, the target people do not participate in the movement, because they do not understand the meaning, and therefore the significance, of the movement owing to the difference between their frame and that of the organization. Thus, the organization tries to adjust its frame so that the target people will interpret the movement as significant and related to their own interests. Here again, it becomes obvious that the people’s interpretation of the adjusted frame and the movement is the key for the organization to succeed in mobilizing its target people.

So far, I have argued for the importance of interpretation and meaning in sociology in general and in the study of social movements in particular. This is also the case when we study the process of diffusion, as we observed in the failure of the boiling-water campaign in a Peruvian village. Then, how can we incorporate interpretation and meaning into big data analysis of diffusion? This is a difficult task because most big data is about people’s behavior, and big data analysis therefore finds it difficult to deal with interpretation and meaning. Of course, people express their opinions on Twitter and Facebook, but big data analysis alone does not tell us how people interpret those opinions and what meanings they attach to them. How can we solve this problem? One possible solution would be to apply topic models to text data from platforms such as Twitter and Facebook (Bail, 2016). Chapter 2 cites DiMaggio et al. (2013), who applied a topic model to newspaper articles to explain the decline in government support for artists between the mid-1980s and mid-1990s. Here I examine Bail’s (2016) work in detail because he and I share the same interest in interpretation and meaning in big data analysis.
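To make concrete what a topic model yields, here is a minimal sketch using scikit-learn’s plain latent Dirichlet allocation on an invented toy corpus (Bail himself used structural topic modeling, a related but more elaborate model that also incorporates metadata):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Invented stand-ins for social media posts; real studies use far larger corpora.
posts = [
    "organ donation saves lives register as a donor today",
    "our church community supports the gift of life",
    "big game tonight support the team and our donor drive",
    "transplant waiting lists grow register as a donor",
]

# Bag-of-words counts, then latent topics with per-post membership scores.
counts = CountVectorizer().fit_transform(posts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
memberships = lda.fit_transform(counts)  # rows: posts, columns: topics
print(memberships.round(2))
```

The per-post topic scores printed here are the analogue of the membership scores discussed below.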


In general, people do not receive discourses expressed by other people as they are. They interpret the discourses and add meaning to them using their cognitive schema, or frame. If the discourses and their cognitive frame are close to each other, people tend to accept them; if not, they tend to reject them. Based on this theoretical idea, Bail (2016) studied how organ donation advocacy organizations align their frames with those of their target population to induce people in that population to engage with them. He proposed a theory of cultural carrying capacity. Organ donation advocacy organizations face a dilemma when they use social media to attract more people. On the one hand, they need to cover various topics in their messages so that more people will resonate with their frame. To cite Bail’s example, an organization that produces messages discussing only sports could attract only basketball fans, but one that spreads messages about sports, religion, or science could induce not only sports fans but also religious people or scientists to endorse its activities. On the other hand, if an organization produces messages covering too many diverse topics, such diversification might limit its capacity to mobilize people, because diversification creates disconnected audiences without the collective identity necessary for mobilization. Thus, theoretically, the diversification of topics in messages has an inverted U-shaped effect on the number of mobilized people. If the level of diversification is very low, as with messages only about sports, the number of mobilized people should be small. Meanwhile, if the level of diversification is very high, the target population becomes fragmented, and the number of mobilized people should be small, too. The number of mobilized people should therefore be largest when the level of diversification is in the middle.

To check the empirical validity of this theoretical reasoning, Bail devised an innovative method: he created a Facebook application. If an organ donation advocacy organization participated in the study and installed the application, it was provided with useful information on, and recommendations about, its online outreach activities. In return, the application collected all publicly available text from the organization’s Facebook fan page, along with Insights data available only to the owner of the fan page, that is, the organization. The application then administered a survey to the representative of the organization to collect information on the organization and its outreach tactics. Forty-two organizations participated in the study and produced 7252 messages between October 1, 2011, and October 6, 2012. These messages were analyzed with structural topic modeling. Topic modeling extracts latent topics from the messages and calculates scores, called membership scores, that show how strongly a message is associated with each topic. Structural topic modeling incorporates metadata, so it can deal with temporal change in the meaning of a word or a group of words vis-à-vis a topic. Eventually, 39 topics were identified. Bail (2016) then created an index of topic diversity using the matrix of membership scores of each post in the topics. He calls the index the coefficient of discursive variation. Mathematically, the coefficient for organization i at time t, denoted $C_{it}$, is defined as follows:


$$ C_{it} = \frac{\sigma_{t-7}}{\mu_{t-7}} $$

“where σ is the standard deviation of the membership matrix of organization i’s posts in the 39 topic categories during the previous week and μ is the mean score of the posts across all topic categories during the same time period” (Bail, 2016, p. 284). The coefficient of discursive variation is analogous to the coefficient of variation: if organization i uses diverse topics in its posts, $C_{it}$ becomes large. The coefficient can thus be used to check the empirical validity of the theoretical argument above. If the coefficient is very small or very large, the number of mobilized people (here, the number of engaged Facebook users by day) should be small; if the coefficient is in the middle, the number of mobilized people should be large.

To show that his theoretical argument is empirically valid, Bail (2016) conducted a sophisticated type of regression modeling with the number of engaged Facebook users by day as the dependent variable. The key independent variable is the coefficient of discursive variation. He also considered other theories proposing factors that might affect the number of engaged Facebook users by day and included variables derived from those theories as control variables. Thus, if the coefficient of discursive variation has an inverted U-shaped effect on the number of engaged Facebook users by day after controlling for the variables derived from other theories, his theoretical argument would be empirically supported. The simple graph in Fig. 6.1 suggests that the theoretical argument is empirically valid. To confirm the robustness of the graph, Bail conducted regression analysis, and its results support the theoretical argument: the coefficient for the coefficient of discursive variation is positive, and that for its square is negative after controlling for other variables. This means that the coefficient of discursive variation has an inverted U-shaped effect on the number of engaged Facebook users by day, as in Fig. 6.1.

The significance of Bail’s (2016) study is that it highlights the importance of meaning and interpretation in mobilization. If an organ donation advocacy organization posts messages focusing on too few topics, its frame does not match the frames of the audience, so it cannot get the attention of a wide audience. If it posts messages about too many topics, the audience interprets its frame as fragmented and contradictory, so it cannot appeal to a wide audience either. Only if it posts messages covering an adequate range of topics can the audience clearly interpret its frame as important to them and become mobilized. In addition, his study explains why more than one source is necessary for complex contagion to propagate. As mentioned above, Centola and Macy (2007) and Centola (2018) proposed four mechanisms for complex contagion to occur: strategic complementarity, credibility, legitimacy, and emotional contagion. From the viewpoint of frame analysis in mobilization, a mobilizing organization (an organ donation advocacy organization in Bail’s study) aligns its frame with the frames of its target population through the four mechanisms. If the four mechanisms work well, the organization finds it easier to make its frame resonate with the frames of the target population.
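As a concrete illustration of the definition above, the coefficient can be computed in a few lines. This is a hedged sketch with a random stand-in for the weekly membership matrix; the exact aggregation in Bail’s analysis may differ in detail.

```python
import numpy as np

# Stand-in for organization i's posts from the previous week:
# rows are posts, columns are the 39 topics, entries are membership scores.
rng = np.random.default_rng(0)
memberships = rng.dirichlet(np.ones(39), size=25)  # 25 posts, 39 topics

# C_it = sigma_{t-7} / mu_{t-7}: the standard deviation of the weekly
# membership matrix divided by its mean.
c_it = memberships.std() / memberships.mean()
print(round(c_it, 3))
```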


Fig. 6.1 Relationship between the coefficient of discursive variation and the number of engaged Facebook users by day. The gray zone represents the 95% confidence interval. (Source: Bail 2016, p. 286, Fig. 1)

Based on Bail’s argument, if a target person receives more than one message with different topics from different sources, his or her frame is more likely to resonate with that of the organization. Here we can clearly understand why complex contagion needs more than one source to propagate. The theory of cultural carrying capacity also helps us understand why the campaign for boiling water in a Peruvian village, discussed in Sect. 6.3, failed. Nelida, who was in charge of the campaign, failed to persuade the villagers to boil water because she did not make the frame of the Peruvian public health agency resonate with that of the villagers. If she had used topics in the persuasion process that overlapped with topics the villagers were interested in, she could have succeeded in persuading them to boil water.

6.5 Conclusion

The study of complex contagion by Centola and Macy (2007) and Centola (2018) is a seminal body of work showing that the contagion of a cultural item is substantively different from that of a virus, but it does not fully explore how meaning and interpretation function in the process of complex contagion.


Conversely, Bail (2016) does not talk about complex contagion, but he proposes a thought-provoking theory, the theory of cultural carrying capacity, emphasizing the importance of meaning, interpretation, and frame resonance when a movement organization tries to mobilize its target population. He checked the theory’s empirical validity by collecting and analyzing big data, namely Facebook posts. Combining the studies by Centola and Macy (2007), Centola (2018), and Bail (2016) gives us a deeper comprehension of the mechanism of complex contagion. This is, however, just one example showing that focusing on meaning and interpretation enriches studies using big data and makes them more significant in sociology. Furthermore, this research strategy would contribute to solving research questions that sociologists have attacked without big data and failed to answer.

References

Bail, C. A. (2016). Cultural carrying capacity: Organ donation advocacy, discursive framing, and social media engagement. Social Science & Medicine, 165, 280–288.
Berger, P. L., & Luckmann, T. (1966). The social construction of reality: A treatise in the sociology of knowledge. Anchor Books.
Centola, D. (2018). How behavior spreads: The science of complex contagions. Princeton University Press.
Centola, D., & Macy, M. W. (2007). Complex contagions and the weakness of long ties. American Journal of Sociology, 113(3), 702–734.
DiMaggio, P., Nag, M., & Blei, D. (2013). Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of U.S. government arts funding. Poetics, 41(6), 570–606.
Goldberg, A., & Stein, S. K. (2018). Beyond social contagion: Associative diffusion and the emergence of cultural variation. American Sociological Review, 83(5), 897–932.
McCarthy, J. D., & Zald, M. N. (1977). Resource mobilization and social movements: A partial theory. American Journal of Sociology, 82(6), 1212–1241.
Mead, G. H. (1934). Mind, self, and society: From the standpoint of a social behaviorist. University of Chicago Press.
Milgram, S. (1967). The small-world problem. Psychology Today, 1(1), 61–67.
Rogers, E. M. (2003). Diffusion of innovations (5th ed.). Free Press.
Salganik, M. J. (2018). Bit by bit: Social research in the digital age. Princeton University Press.
Snow, D. A., Rochford, E. B., Jr., Worden, S. K., & Benford, R. D. (1986). Frame alignment processes, micromobilization, and movement participation. American Sociological Review, 51(4), 464–481.
Tarde, G. (1890). Les lois de l’imitation: Etude sociologique. Félix Alcan.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393, 440–442.
Wu, J. T., Leung, K., & Leung, G. M. (2020). Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: A modelling study. Lancet, 395, 689–697.

Chapter 7

Polarization of Opinion

Zeyu Lyu, Kikuko Nagayoshi, and Hiroki Takikawa

7.1 Background

In recent years, there has been growing concern that opinion polarization and fragmentation are becoming increasingly pronounced over various issues. Specifically, previous research centered on the United States has shown that opinions on several crucial issues have become increasingly divided and polarized since the 1960s (Hetherington, 2001; McCarty et al., 2006). Significant social and political changes, including the civil rights movement, anti-war protests, and the feminist movement, have prompted individuals and organizations to take more pronounced positions on a range of topics, leading to a general trend toward opinion polarization. Furthermore, there is broad scholarly consensus that the degree of opinion polarization has continuously increased since then and that the ramifications of polarization may hold greater significance in today’s social context.

The rise of opinion polarization has aroused extensive interest because its negative consequences can pose a disruptive threat to democratic societies. When individuals or groups are strongly polarized, they typically become less willing to compromise with others and more prone to conflict and even violence. This tendency represents a severe societal risk by undermining the capacity to respond to pressing challenges such as inequality, pandemics, and violence.


In light of these detrimental effects, numerous endeavors have been undertaken to enhance the understanding of opinion polarization. However, systemic changes within society, such as the proliferation of information sources, the increasing diversity of values, and the escalating significance of online discourse, are transforming the conventional formation and dynamics of opinion, presenting new challenges to the study of opinion polarization. Faced with these challenges, the growth of computational power, the increasing availability of data science toolkits, and the expanding magnitude and variety of data have opened up new avenues for research on opinion polarization. This chapter discusses the challenges and opportunities of incorporating novel data and advanced computational methods into the study of opinion polarization. While such interdisciplinary research is valuable for enriching and expanding the scope of investigation, it also raises challenges in grasping fitting theories, adopting appropriate methodologies, and integrating schemas from multiple disciplines. In order to better understand the cutting-edge and innovative work occurring in this field, this chapter will (1) clarify the concept of opinion polarization and distinguish its different variants for theoretical clarity, (2) provide an overview of the primary novel data and methods closely related to the investigation of opinion polarization, (3) review several representative studies to demonstrate how novel data and advanced methods can be applied to address both theoretical and practical questions, and (4) discuss how to integrate theoretical concepts and empirical findings to establish an iterative research framework for opinion polarization.

7.1.1 The Concept of Opinion Polarization

Opinion polarization is a multifaceted concept encompassing numerous variants. Clearly defining the concept is crucial for determining data collection methods, choosing appropriate analytical approaches, and accurately interpreting results. First, opinion polarization is commonly conceptualized as an increase in antagonistic and extreme preferences over public policy, ideological orientations, partisan attachments, and cultural norms. This form of polarization is typically operationalized as a bimodal distribution of opinions. For example, consider ideology, which refers to general beliefs and ideas about politics and governance; it is typical to locate individuals’ ideology on a continuous spectrum from a liberal to a conservative position. Ideology is considered polarized when there are fewer centrists and more sharply conservative or liberal individuals in the population. Likewise, observing opinions on the spectrum shifting toward opposite and more extreme positions indicates a form of polarization related to a specific issue. Second, the concept of “affective polarization” was introduced to describe polarization through disparities in the warmth that people feel toward out-groups versus in-groups (Garrett et al., 2019; Iyengar et al., 2012, 2019; Rogowski & Sutherland, 2016). Affective polarization stems from social identity theory, which suggests that group membership can trigger more positive emotional reactions toward the in-group than toward the out-group and a greater willingness to cooperate with members of the in-group (Iyengar et al., 2012; Tajfel, 1982).


For instance, in politics, partisanship as a social identity contributes to bipolarity: people favor those with similar political views while being strongly biased against those with opposing ones. Thus, increasing affective polarization can be characterized by negative feelings toward opposing political parties or their supporters and positive feelings toward one’s preferred political party.

7.2 The Mechanism of Opinion Polarization

Theoretical thinking also involves generating hypotheses that can be empirically examined. In the discussion of opinion polarization, broadly two mechanisms have been proposed to explain its emergence and growth. On the one hand, opinion polarization is assumed to be associated with the homophilic character of interactions and connections among individuals. More specifically, in order to reduce cognitive dissonance, individuals tend to seek out information that confirms their existing beliefs while ignoring or dismissing information that contradicts them (Stanley et al., 2020). As a result, individuals increasingly interact with those who share their views and are exposed to homophilic information, while distancing themselves from those who hold divergent opinions and avoiding exposure to opposing viewpoints. Because homophilic interactions are deemed to reinforce existing opinions, they often lead individuals to take more extreme positions (Sunstein, 2002).

On the other hand, group-justifying biases can also contribute to opinion polarization. An individual’s group identity constitutes a psychological attachment to a particular group and is considered one of the most crucial factors affecting opinions. Once individuals identify with a group, self-categorization can induce various forms of motivated reasoning, including in-group favoritism and out-group derogation (Tajfel, 1982), as ways to maintain group distinctiveness and advance their status as good group members. In this way, the in-group/out-group distinction can exacerbate the extent of polarization.
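The homophily mechanism can be formalized in many ways. One classic formalization, not discussed in the text itself, is a bounded-confidence (Deffuant-style) model, sketched below with illustrative parameter values: agents adjust their opinions only when they interact with someone whose opinion is already close to theirs, and repeated homophilic interaction splits the population into separated opinion clusters.

```python
import numpy as np

rng = np.random.default_rng(1)
opinions = rng.uniform(-1.0, 1.0, size=200)  # initial opinions
epsilon, mu = 0.3, 0.5  # confidence bound and convergence rate (illustrative)

for _ in range(20000):
    i, j = rng.integers(0, opinions.size, size=2)
    # Homophilic interaction: agents adjust only if already like-minded.
    if i != j and abs(opinions[i] - opinions[j]) < epsilon:
        shift = mu * (opinions[j] - opinions[i])
        opinions[i] += shift
        opinions[j] -= shift

# The population typically ends up in a few well-separated opinion clusters.
print(np.round(np.sort(opinions)[::25], 2))
```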

7.2.1 Computational Social Science and Opinion Polarization

The challenge of disentangling the underlying mechanisms of opinion polarization and their effects has grown with the complexity of social change, including the increasing diversity of identities, inequality, and new information environments.


Although considerable efforts have been devoted to investigating the extent, dynamics, causes, and consequences of opinion polarization, a shifting society composed of new technologies and issues that enable new identities, forms, and practices demands an updated understanding of opinion polarization. Until recently, most quantitative social science research was limited to the conventional statistical analysis of survey data. However, survey-based research relies on the sampling of observations, which often encounters limitations in terms of representativeness, scale, and granularity. The limitations of the traditional method are becoming rather apparent in the investigation of opinion polarization: its mechanisms and processes involve a seemingly endless number of information flows, expressed opinions, and interrelated behaviors among different actors across long time spans, which is difficult to address using surveys of small groups. Fortunately, the development of toolsets and computational capacities offers significant potential to better investigate and understand human behaviors and opinions.

In recent years, an unprecedented amount of digital data alongside a variety of computational methods has illuminated new pathways to overcome the limitations of previous social science research, thereby giving rise to a novel field known as computational social science (Edelmann et al., 2020; Hofman, 2021; Lazer et al., 2009). Broadly speaking, the primary features of the computational social science research paradigm concern collecting, manipulating, and managing big data and utilizing advanced computational methods to improve the understanding of human behaviors and social processes. These features raise strong expectations about the promise they hold for the study of opinion polarization.

First, big data typically encompasses comprehensive and multivariate information about the behaviors of large populations. The primary strength of big data lies in its remarkable scalability and detailed granularity. This allows researchers to monitor human behaviors with high frequency and on a large scale. Such capabilities are invaluable for creating innovative measures of public opinion and for closely examining interrelated human behaviors and social phenomena. From this perspective, the characteristics of big data provide researchers with the tools to explore a broad spectrum of opinions and behavioral dynamics on a substantial scale, which could be instrumental in revealing new aspects and fundamental mechanisms driving opinion polarization.

Also, computational social science has the potential to revolutionize research on opinion polarization through the introduction of advanced computational methods and new techniques. A wide range of computational methods, such as network analysis, natural language processing (NLP), and machine learning, have been applied to investigate the status and mechanisms of opinion polarization. These approaches provide detailed insights into the formation, dissemination, and evolution of opinions in various contexts, reaching a level of scale and depth unattainable with conventional methods. Beyond that, the experimental approach, traditionally used to explore the causality of opinion polarization, has undergone significant evolution due to recent technological advancements.
Specifically, by conducting experiments with thousands of participants in online discourse, digital experiments allow the observation of sizable numbers of heterogeneous individuals in natural settings, which holds great promise for overcoming the limitations inherent in traditional laboratory experimental designs.


By manipulating interventions and conditions in theoretically informed ways during an experiment, researchers can determine which mechanisms produce specific outcomes, answering previously hard-to-tackle questions about the causality of opinion polarization. In general, social scientists can strategically adopt computational tools to unpack the underlying mechanisms of opinion polarization. In summary, it is reasonable to argue that the dual growth in the availability of big data and the power of computational methods is becoming increasingly relevant to the investigation of opinion polarization.

Importantly, inconsistent conclusions concerning the mechanism of opinion polarization indicate its complexity and heterogeneity; no single mechanism may suffice to explain it. Rather, adequately understanding opinion polarization may require more consideration of the variation and randomness of its underlying mechanisms. A proper explanation of opinion polarization is achieved by specifying the actors and contexts and then explicitly demonstrating how these conditions combine and interact to produce the occurrence of, and changes in, opinion polarization. From this perspective, the massive volume of data combined with advanced computational tools provides unprecedented opportunities to consolidate, develop, and extend theories of opinion polarization. This includes observing the dynamics of opinion and its associations with fine-grained behaviors in real social environments, conducting consistency checks to confirm that a theoretical mechanism indeed explains opinion polarization as hypothesized, and uncovering new perspectives and questions to guide further research. In this way, we can refine the concepts and measurements of opinion polarization iteratively through these sequential and inductive processes.

7.3 New Methodology and Data for Opinion Polarization Research

7.3.1 Network Analysis

Generally speaking, network analysis provides useful concepts, notions, and applied tools for describing and understanding how actors are connected. In a network, nodes typically represent actors or institutions, whereas edges represent connections between such entities. Over the recent decade, increasing attention has been devoted to applications of network analysis methods in social science. Much of this interest stems from the flexibility of social networks in defining and modeling various relationships among social entities. Social networks consisting of actors and social relations are ubiquitous in society: people are connected by common interests, citizens by support for the same party, and social media users by interactions on a platform. Importantly, the structure and dynamics of these connections have the potential to yield meaningful insights into a variety of social science problems.


Network analysis has thus become an established field within the social sciences and an indispensable resource for understanding human behaviors and opinions (Borgatti et al., 2009; Watts, 2004).

Previously, since collecting relational data through direct contact is time-consuming and difficult, social network analysis was typically restricted to small, bounded groups. Thanks to the development of information technology, recent years have witnessed an explosion in the availability of networked data. In particular, the rapid increase in the use of social media has generated time-stamped digital records of social interactions. These digital records have reinvigorated social network analysis by enabling analyses of relations among social entities at unprecedented scale and in real time, which has also opened up new opportunities to investigate opinion polarization on social media.

First, network analysis can provide insight into the nature and dynamics of opinion polarization by serving as a method to detect individuals’ opinions. As indicated above, opinion polarization involves the degree to which people hold competing attitudes on specific issues. Here, one crucial question is how to quantify individuals’ opinions. From the perspective of network analysis, the basic idea is that individuals are embedded in social relations and interactions whose patterns can be measured. Specifically, one of the most important characteristics of social networks is the homophily principle, which implies that people’s social networks are homogeneous with regard to sociodemographic characteristics, behaviors, and opinions (McPherson et al., 2001). Accordingly, it is reasonable to assume that people with similar opinion leanings are more likely to share a homophilous social network, so people’s network structure can be assumed to be associated with their opinions. Based on this theoretical assumption, the availability of network data on social media users has elicited many efforts to estimate opinions with network-based measurements. Interactions on social media can be naturally described as a social network in which individuals with shared interests tend to form groups, and individuals within the same community likely share similar opinions. For example, many social media platforms, including Twitter and Facebook, allow users to freely choose whom to follow. These interactions have the potential to serve as a source of opinion detection. Barberá (2015) introduced a systematic framework to estimate individuals’ ideology based on following relationships with politicians. According to the homophily principle, it is reasonable to assume that users following the same politicians tend to share similar political opinions, as the follow network reflects their political preferences. Computationally, the following relationships can be aggregated into an adjacency matrix between politicians and ordinary users. Dimension-reduction algorithms, such as singular value decomposition, can then be applied to map these following relationships into a low-dimensional ideological space. In this way, each node in the follow network can be assigned an estimated ideology based on its network position.
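A minimal sketch of this network-based scaling idea follows, using numpy on an invented follow matrix. Note that Barberá’s actual estimator is a Bayesian ideal point model; the SVD projection below is only a rough analogue of the dimension-reduction step described above.

```python
import numpy as np

# Invented follow matrix: rows are ordinary users, columns are politicians;
# an entry is 1 if the user follows the politician. Two latent camps.
rng = np.random.default_rng(0)
left = rng.random((50, 10)) < np.r_[np.full(5, 0.8), np.full(5, 0.1)]
right = rng.random((50, 10)) < np.r_[np.full(5, 0.1), np.full(5, 0.8)]
A = np.vstack([left, right]).astype(float)

# Center the matrix and take the leading singular vector as a
# one-dimensional ideology score for each user.
U, s, Vt = np.linalg.svd(A - A.mean(axis=0), full_matrices=False)
scores = U[:, 0] * s[0]
print(scores[:5].round(2), scores[-5:].round(2))  # camps separate along this axis
```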
Beyond the following relationship, other interaction behaviors and connections on social media, such as “like,” “retweet,” and “reply,” have also been shown to be associated with individuals’ latent ideology (Bond & Messing, 2015; Wong et al., 2016).


Furthermore, recent studies attempt to employ more sophisticated methods that integrate different types of information attached to nodes and edges in the network to predict ideology. Graph neural networks (GNNs) can handle abundant information, with edges among multiple types of nodes and attributes associated with each node (Zhang et al., 2019), making them suitable for capturing nuanced patterns and relationships. For example, utilizing GNNs, multiple relations on Twitter, including follows, retweets, likes, and mentions, can be aggregated as input for a deep learning model of ideology detection (Xiao et al., 2020).

Second, another stream of research has employed network analysis to investigate opinion polarization from the perspective of homophilous interactions. Selective exposure and the echo chamber are widely used to describe a situation in which people are exposed only to information and ideas that confirm their existing opinions, thereby creating a self-reinforcing cycle that entrenches those opinions (Garrett, 2009; Prior, 2013; Stroud, 2010; Wojcieszak, 2010). Notably, they are assumed to diminish mutual understanding, narrow the diversity of viewpoints people encounter, and ultimately lead to a situation where people have less common ground and feel animosity toward those who hold opposing views, that is, opinion polarization. From this perspective, how individuals interact with others and how they are exposed to information flows could provide important insights into the underlying mechanisms of opinion polarization. A growing body of research has suggested a homophily tendency in connections and interactions on social media. Beyond that, the availability of social network data enables researchers to investigate connections and interactions from more diverse and nuanced perspectives. For example, Conover et al. (2011) employed clustering algorithms to investigate the political communication network on Twitter, demonstrating that the retweet network exhibits two highly segregated communities of users, while the mention network is much more politically heterogeneous. Barberá et al. (2015) suggest that in political discussion, information is exchanged primarily among individuals with similar ideological preferences, yet this homophily tendency is much weaker in discussions of other issues. Bakshy et al. (2015) examine how millions of Facebook users interact with socially shared news; their analysis suggests that the homophily tendency is more pronounced for shared links to hard content such as national news, politics, or world affairs. These studies indicate that the “echo chamber” narrative might be overstated, as the tendency appears more pronounced only in specific contexts.

Beyond that, network analysis can also support the investigation of affective polarization, the gap in feelings between in-group and out-group members. Many social media platforms allow users to express their attitudes toward posts or other users through functions such as “like” and emotional reactions. The sentiments and opinions expressed in interactions inspire network-based investigations of affective polarization. For instance, Rathje et al. (2021) empirically showed that content related to out-groups is more likely to elicit negative emotional reactions such as “angry,” while content related to in-groups is more likely to elicit positive emotional reactions such as “love.”


Brady et al. (2017) find that messages expressing negative emotions toward rival political candidates are more likely to spread within liberal or conservative in-group social networks. Marchal (2022) suggests that negative sentiment is significantly more salient in interactions between crosscutting users than between like-minded users.

To summarize, the combination of network analysis methods and big data allows us to formalize diverse patterns of social networks and investigate their characteristics from more comprehensive and nuanced perspectives. In particular, it enhances our understanding of how interaction patterns, the issues discussed, and the actors involved contribute to the degree of opinion polarization. These implications are crucial for deepening our understanding of the states, mechanisms, and consequences of opinion polarization.
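The segregation finding can be illustrated with a hedged sketch along the lines of Conover et al.’s clustering step, using networkx on a synthetic stand-in for a retweet network (a planted two-block structure; neither their data nor their exact algorithm):

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Synthetic stand-in for a retweet network: two dense blocks of users with
# only sparse ties across blocks (parameters are illustrative).
G = nx.planted_partition_graph(2, 30, p_in=0.2, p_out=0.01, seed=3)

# Detect communities by modularity maximization and gauge their segregation.
communities = greedy_modularity_communities(G)
print([len(c) for c in communities])         # roughly the two planted blocks
print(round(modularity(G, communities), 2))  # high value = strong segregation
```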

7.3.2 Natural Language Processing

Texts and words are integral parts of society. People typically use texts and words to express themselves, make proposals, and communicate with each other. These texts and words can serve as a crucial source reflecting individuals’ beliefs, attitudes, and opinions. In particular, digitalization is generating an unprecedented volume of textual data that can be used to investigate opinions. Many efforts have also been devoted to making textual data easier to acquire. For instance, many governments have established digital archives of policy documents, congressional records, and reports, and social media platforms such as Twitter provide application programming interfaces (APIs) that allow researchers to access and make use of users’ textual data. Despite its great potential, text analysis has always been a difficult task because human language is complex and nuanced. In particular, the increasing availability of these new data sources creates demand for advanced techniques that can handle the scale and complexity of text.

Investigating opinion on the basis of textual data requires a method describing how opinions can be measured and quantified. Traditionally, content analysis of the opinions revealed in texts has been based on hand-coding, which involves a series of processes such as developing a coding schema and training coders. These processes are usually time-consuming; as the scale of textual data grows, it becomes harder to process large amounts of text by hand-coding. Fortunately, the advent of computational methods and ever-increasing computing power has substantially paved the way for further research in this direction. Compared with traditional approaches, NLP techniques now provide a broad spectrum of advanced tools for analyzing large-scale textual data more efficiently (Wilkerson & Casas, 2017). Motivated by these techniques and the increasing availability of textual data, there has been growing interest in capturing the states and dynamics of opinion polarization through the automatic detection of opinions in large textual corpora.


Automatic text analysis requires a model that describes the patterns and structure of texts in a computational way. Typically, the model is designed to transform unstructured textual data into structured data (i.e., numerical representations) that can be further analyzed by various computational methods, a step known as “feature extraction” in NLP. The most common feature-extraction strategy is the bag-of-words (BoW) model, which describes the occurrence of words within a document while disregarding grammatical details and word order. For the detection of opinions, tokens are usually matched against a list of words previously annotated as opinion-related terms. For example, Laver and Garry (2000) developed a dictionary that defines how a series of words relate to specific content categories and political parties. The content of a text can be automatically identified by matching words to their respective categories in the dictionary. Laver et al. (2003) developed the Wordscores method, which replaces the dictionary with reference texts of annotated political placement. More specifically, Wordscores assumes that the political opinion revealed in new texts can be derived from their similarity to the reference texts based on word frequency. Therefore, word frequency information from “reference” texts with annotated ideological positions can be used to make predictions for new texts whose positions are unknown.

The BoW model, while straightforward and manageable, has several inherent limitations. First, it views words as individual entities with distinct meanings, overlooking key aspects like grammatical structure and word order. Second, encompassing all words that occur in a pre-encoded dictionary can be challenging, leading to the exclusion of significant details in text analysis and possibly resulting in bias. Additionally, extensive human intervention is often required to choose relevant words or reference texts for assigning meanings to each text. In particular, a coding scheme is typically compatible only with specific texts, which limits the method’s generalizability. Consequently, the development and maintenance of dictionaries tend to be time-consuming and labor-intensive.

More recently, social scientists have adopted more sophisticated methods to improve the efficacy and accuracy of text-based opinion scaling. Specifically, to overcome the limitations inherent in the BoW model, word embedding models have been applied to estimate the opinions revealed in texts. Broadly speaking, word embedding encodes each word as a dense, low-dimensional vector, where proximity between two vectors indicates greater semantic similarity between the associated words. To achieve this, word embedding models assume that the semantic meaning of a word can be inferred from the words that appear frequently in a small context window around it. Various word embedding approaches and architectures, such as Word2Vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014), define context and semantic meaning in different ways, but they commonly capture the context, relations, and semantic similarity of words in texts. Therefore, using word embeddings to represent text as vectors preserves more of the information in the raw text for opinion detection.
Moreover, word embeddings require less human intervention, since they can be efficiently trained on large, preexisting, unannotated corpora.


With this great advantage, word embedding models hold immense promise in text-as-data research, and several studies have leveraged them as an alternative way to capture the opinions of individuals and organizations. Rheault and Cochrane (2020) employed parliamentary corpora augmented with input variables reflecting party affiliations to train “party embeddings.” Since the word embedding model is powerful at capturing the patterns and characteristics of text and building better feature representations, party embeddings can reflect similarities and differences in ideological positions and policy preferences among parties. Notably, word embeddings can be automatically trained and easily adapted to new tasks or domains based on new data. For example, parliamentary corpora from different countries enable the comparison of opinion polarization across countries, while historical textual data enable the investigation of opinion polarization over time. Furthermore, word embedding models can be flexibly and efficiently applied to various types of corpora. For example, it is reasonable to assume that published posts and profile descriptions reflect the opinions of a social media user, so these textual data can be aggregated at the individual level to produce word embedding representations reflecting individuals’ opinions (Jiang et al., 2022; Preotiuc-Pietro et al., 2017).
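A minimal sketch of training word embeddings with gensim’s Word2Vec on an invented toy corpus follows; with a corpus this small the similarities are essentially noise, but the workflow is the same on parliamentary or social media text.

```python
from gensim.models import Word2Vec

# Invented tokenized corpus standing in for, e.g., parliamentary speeches.
sentences = [
    ["cut", "taxes", "to", "boost", "economic", "growth"],
    ["raise", "taxes", "on", "the", "wealthy"],
    ["invest", "in", "public", "healthcare", "and", "education"],
    ["tax", "relief", "for", "working", "families"],
]

# Each word is encoded as a dense 50-dimensional vector learned from
# co-occurrence within a small context window.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, seed=0)

print(model.wv["taxes"].shape)                 # (50,)
print(model.wv.most_similar("taxes", topn=3))  # nearest words in vector space
```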

7.4 Digital Experiment

Experimentation is one of the most important methodologies in social science for testing theoretical hypotheses and establishing causality. The main advantage of experiments is the possibility of tightly controlling experimental conditions to systematically estimate the effect of a stimulus or condition. However, conventional experiments were typically conducted in offline laboratories or as surveys among relatively small populations. Given the highly controlled settings and limited diversity of participants, it is hard to establish the reliability and generality of experimental insights in real-life situations outside the laboratory and beyond the specific populations studied. In recent years, with the availability of digital platforms and tools, many researchers have begun utilizing the Internet as a novel intermediary for recruiting participants and conducting experiments.

Digital experiments offer several advantages: they not only help scholars conduct more scalable, flexible, and efficient experiments but also provide useful tools that inspire new experimental strategies. First, compared with traditional offline experiments, digital experiments overcome the limitations of space and time, facilitating the recruitment of larger and more heterogeneous participant pools and thereby improving the external validity of experimental research (Peng et al., 2019). Second, digital techniques enable the collection of real-time, fine-grained behavioral data on participants. These digital trace data not only provide additional information for measuring the intensity of treatment effects and temporal changes in behavior but also allow checks of compliance with treatments, enhancing the validity of experiments (Guess et al., 2021; Stier et al., 2020).


Relatedly, digital experiments are more flexible for long-term designs, allowing scholars to assess changes in individual attitudes and behaviors over time. Third, digital experiments can be conducted in natural settings to achieve full ecological validity and avoid demand effects (Mosleh et al., 2022). Since behavioral tracking tools can collect real-world data unobtrusively in naturalistic environments, causal effects can be examined directly by observing actual behaviors.

The power to detect causation has inspired research that leverages digital experiments to investigate the underlying mechanisms of opinion polarization. In practice, it is beneficial to combine digital experiments with other available data, such as survey data and behavioral data, to examine the causality of opinion polarization. Typically, participants are motivated to change their information exposure in online discourse by changing their news feeds or social media following patterns; in this way, the causal effects of information exposure can be examined in natural settings. Other key variables, such as political attitudes, policy preferences, and demographic information, can be accurately estimated by survey. Moreover, digital trace data, such as participants’ social networks, generated content, and interactional behaviors in online discourse, can provide important insights into how individuals’ opinions and interrelated behaviors change over time. The combination of digital experiments and other data sources can thus not only shed light on the causality of political polarization but also provide insight into the cumulative effects of interventions through the investigation of fine-grained behavioral data.

Bail et al. (2018) incentivized a large group of social media users to follow bots that retweeted messages by elected officials and opinion leaders with opposing political views. Evaluating the impact of the treatment on participants’ opinions via surveys, they found that exposure to opposing political views may not mitigate opinion divergence and can even generate backfire effects that intensify political polarization. Similarly, Casas et al. (2023) focus on the effect of exposure to dissimilar views on opinion polarization. In a longitudinal experiment, participants were incentivized to read political articles expressing extreme opposing views; the authors then relied on participants’ survey self-reports and behavioral browsing data to track changes in online exposure and attitudes over time. Guess et al. (2021) incentivized participants to change their browser default settings and social media following patterns to increase the likelihood of encountering partisan news. This research design incorporated a naturalistic nudge within an online panel survey with linked digital trace data, providing significant insights into the long-term consequences of heightened exposure to partisan information. Levy (2021) recruited participants using Facebook ads, asking them to subscribe to conservative or liberal news outlets on Facebook. Together with behavioral tracking data, such as shared posts and liked pages, this study demonstrates how news exposure and the algorithms of social media platforms affect users’ behaviors and attitudes.

Moreover, digital experiments have enriched treatment strategies for investigating the mechanisms of opinion polarization from diverse perspectives.


For example, Mosleh et al. (2021) implemented a field experiment leveraging the platform features of Twitter, which is infested with social bots that mimic human behavior. They created human-like, identical-looking bot accounts with varied self-identified political partisanship. They then randomly assigned Twitter users to be followed by the bots, aiming to estimate the causal effect of political exposure by observing whether users tended to follow back accounts of like-minded partisanship. Chen et al. (2021) conducted a digital experiment to examine biases stemming from both Twitter’s system design and social interactions with other users. They created bots that initially followed a popular news source with a specific partisan bias, programmed them to imitate social media users, and released them into the wild. After 5 months, the generated and consumed content of these bots with varying initial biases was compared, demonstrating how existing political leanings foster echo chambers through sustained exposure to partisan and biased information flows on social media.

7.5 Discussion

This chapter has offered a comprehensive overview of the fundamental principles and components of opinion polarization, illustrating how the intersection of novel data and advanced computational techniques can propel the investigation of its states, mechanisms, and consequences. First, new data and methods can overcome numerous long-standing obstacles once considered insurmountable. There is little doubt that the abundant availability of fine-grained, temporal, and fairly detailed information can provide a more comprehensive understanding of opinion polarization. More specifically, computational social science approaches enable us to observe and describe human behaviors and social processes in ways that are not possible with small data, thereby presenting a novel opportunity to validate and generalize classical theories from a fresh perspective.

Second, new data and methods can help develop new theories and raise new questions about opinion polarization. Indeed, the development of digital techniques has itself dramatically reshaped the way people retrieve information, develop opinions, and communicate with others (Jungherr et al., 2020), which may ultimately create new forms of opinion polarization in online discourse. The increasingly important role of computer-mediated communication underscores the need for more work on the mechanisms and consequences of opinion polarization in new, noisy, and complex information environments (González-Bailón & Lelkes, 2023). Recent work employing novel methods and data has indicated that many established theories do not seem to hold up to empirical scrutiny, underscoring the need to further evaluate the external validity of findings in existing studies. Therefore, future studies should iteratively examine empirical findings and refine theories to validate, clarify, and develop theoretical insights.

7

Polarization of Opinion

113

In summary, as a field that actively incorporates big data and computational methods into social science research, research on opinion polarization has also facilitated the formalization of a paradigm that integrates theories and empirical findings. While social sciences are devoted to formalizing the underlying mechanism and providing interpretative explanations for human behaviors and social changes. Such a theory-driven paradigm of social science, however, has been criticized for limited generalization and failing to offer solutions to real-world problems (Watts, 2017). The availability of big data and computational methods expedites the emergence and development of data-driven social science research. Despite the great potential of novel data and methods, it should be noted that exploratory detection is still necessarily driven by suitable theories that are reflected in how data is collected, what analysis method is adopted, and how a result is interpreted. Indeed, the review of research on opinion polarization has highlighted how to establish an iterative and inductive research framework through the integration of theories and empirical findings. Theoretical clarity strongly influences what variants of opinion polarization are investigated, what types of hypotheses should be examined, and what methods are appropriate to investigate them. In turn, as indicated above, implications derived from empirical findings can contribute to the theoretical grounding of opinion polarization from various perspectives. Especially, the integration of theories and empirical findings is beneficial to translate the abstract debate into precise interventions to prevent the increasing opinion polarization and its negative impacts. Thus, the most crucial implication is that we should leverage the momentum in the methodological innovations and further connect it with appropriate theoretical groundings, balancing theories and methods development by iteratively examining data and refining theories.

References Bail, C. A., et al. (2018). Exposure to opposing views on social media can increase political polarization. Proceedings of the National Academy of Sciences, 115, 9216–9221. Bakshy, E., Messing, S., & Adamic, L. A. (2015). Exposure to ideologically diverse news and opinion on Facebook. Science, 348(6239), 1130–1132. https://doi.org/10.1126/science.aaa1160 Barberá, P. (2015). Birds of the same feather tweet together: Bayesian ideal point estimation using Twitter data. Political Analysis, 23(1), 76–91. https://doi.org/10.1093/pan/mpu011 Barberá, P., Jost, J. T., Nagler, J., Tucker, J. A., & Bonneau, R. (2015). Tweeting from left to right: Is online political communication more than an echo chamber? Psychological Science, 26(10), 1531–1542. https://doi.org/10.1177/0956797615594620 Bond, R., & Messing, S. (2015). Quantifying social media’s political space: Estimating ideology from publicly revealed preferences on Facebook. American Political Science Review, 109(1), 62–78. https://doi.org/10.1017/s0003055414000525 Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network analysis in the social sciences. Science, 323(5916), 892–895. https://doi.org/10.1126/science.1165821 Brady, W. J., Wills, J. A., Jost, J. T., Tucker, J. A., & Bavel, J. J. V. (2017). Emotion shapes the diffusion of moralized content in social networks. Proceedings of the National Academy of Sciences, 114(28), 7313–7318. https://doi.org/10.1073/pnas.1618923114

114

Z. Lyu et al.

Casas, A., Menchen-Trevino, E., & Wojcieszak, M. (2023). Exposure to extremely partisan news from the other political side shows scarce boomerang effects. Political Behavior, 45, 1491–1530. Chen, W., Pacheco, D., Yang, K.-C., & Menczer, F. (2021). Neutral bots probe political bias on social media. Nature Communications, 12, 5580. Conover, M., Ratkiewicz, J., Francisco, M., Goncalves, B., Menczer, F., & Flammini, A. (2011). Political polarization on Twitter. Proceedings of the International AAAI Conference on Web and Social Media, 5, 89–96. https://doi.org/10.1609/icwsm.v5i1.14126 Edelmann, A., Wolff, T., Montagne, D., & Bail, C. A. (2020). Computational social science and sociology. Annual Review of Sociology, 46(1), 61–81. https://doi.org/10.1146/annurev-soc121919-054621 Garrett, R. K. (2009). Echo chambers online? Politically motivated selective exposure among Internet news users. Journal of Computer-Mediated Communication, 14(2), 265–285. https:// doi.org/10.1111/j.1083-6101.2009.01440.x Garrett, R. K., Long, J. A., & Jeong, M. S. (2019). From partisan media to misperception: Affective polarization as mediator. Journal of Communication, 69(5), 490–512. https://doi.org/10.1093/ joc/jqz028 González-Bailón, S., & Lelkes, Y. (2023). Do social media undermine social cohesion? A critical review. Social Issues and Policy Review, 17, 155–180. https://doi.org/10.1111/sipr.12091 Guess, A. M., Barberá, P., Munzert, S., & Yang, J. (2021). The consequences of online partisan media. Proceedings of the National Academy of Sciences, 118(14), e2013464118. https://doi. org/10.1073/pnas.2013464118 Hetherington, M. J. (2001). Resurgent mass partisanship: The role of elite polarization. The American Political Science Review, 95(3), 619. Hofman, J. M. (2021). Integrating explanation and prediction in computational social science. Nature, 595(7866), 181–188. https://doi.org/10.1038/s41586-021-03659-0 Iyengar, S., Sood, G., & Lelkes, Y. (2012). Affect, not ideology: A social identity perspective on polarization. Public Opinion Quarterly, 76(3), 405–431. https://doi.org/10.1093/poq/nfs038 Iyengar, S., Lelkes, Y., Levendusky, M., Malhotra, N., & Westwood, S. J. (2019). The origins and consequences of affective polarization in the United States. Annual Review of Political Science, 22(1), 129–146. https://doi.org/10.1146/annurev-polisci-051117-073034 Jiang, J., Ren, X., & Ferrara, E. (2022). Retweet-BERT: Political leaning detection using language features and information diffusion on social networks. https://doi.org/10.48550/arxiv.2207. 08349 Jungherr, A., Rivero, G., & Gayo-Avello, D. (2020). Retooling politics: How digital media are shaping democracy. Cambridge University Press. https://doi.org/10.1017/9781108297820 Laver, M., & Garry, J. (2000). Estimating policy positions from political texts. American Journal of Political Science, 44(3), 619. https://doi.org/10.2307/2669268 Laver, M., Benoit, K., & Garry, J. (2003). Extracting policy positions from political texts using words as data. American Political Science Review, 97(2), 311–331. https://doi.org/10.1017/ s0003055403000698 Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A.-L., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., & Alstyne, M. V. (2009). Computational social science. Science, 323(5915), 721–723. https://doi.org/10.1126/ science.1167742 Levy, R. (2021). Social media, news consumption, and polarization: Evidence from a field experiment. 
American Economic Review, 111(3), 831–870. https://doi.org/10.1257/aer. 20191777 Marchal, N. (2022). Be nice or leave me alone: An intergroup perspective on affective polarization in online political discussions. Commun Res, 49, 376–398. McCarty, N., Poole, K. T., & Rosenthal, H. (2006). Polarized America: The dance of ideology and unequal riches. MIT Press.

7

Polarization of Opinion

115

McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27(1), 415–444. https://doi.org/10.1146/annurev.soc.27. 1.415 Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 26, 1081–1088. Mosleh, M., Martel, C., Eckles, D., & Rand, D. G. (2021). Shared partisanship dramatically increases social tie formation in a Twitter field experiment. Proceedings of the National Academy of Sciences of the United States of America, 118(7), 9–11. https://doi.org/10.1073/ pnas.2022761118 Mosleh, M., Pennycook, G., & Rand, D. G. (2022). Field experiments on social media. Current Directions in Psychological Science, 31(1), 69–75. https://doi.org/10.1177/ 09637214211054761 Peng, T.-Q., Liang, H., & Zhu, J. J. H. (2019). Introducing computational social science for AsiaPacific communication research. Asian Journal of Communication, 29(3), 205–216. https://doi. org/10.1080/01292986.2019.1602911 Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532–1543). https://doi.org/10.3115/v1/d14-1162. Preotiuc-Pietro, D., Liu, Y., Hopkins, D., & Ungar, L. (2017). Beyond binary labels: Political ideology prediction of Twitter users. In Proceedings of the 55th annual meeting of the Association for Computational Linguistics (Volume 1: Long papers) (pp. 729–740). https:// doi.org/10.18653/v1/p17-1068 Prior, M. (2013). Media and political polarization. Annual Review of Political Science, 16(1), 101–127. https://doi.org/10.1146/annurev-polisci-100711-135242 Rathje, S., Bavel, J. J. V., & van der Linden, S. (2021). Out-group animosity drives engagement on social media. Proceedings of the National Academy of Sciences, 118(26), e2024292118. https:// doi.org/10.1073/pnas.2024292118 Rheault, L., & Cochrane, C. (2020). Word embeddings for the analysis of ideological placement in parliamentary corpora. Political Analysis, 28(1), 112–133. https://doi.org/10.1017/pan.2019.26 Rogowski, J. C., & Sutherland, J. L. (2016). How ideology fuels affective polarization. Political Behavior, 38(2), 485–508. https://doi.org/10.1007/s11109-015-9323-7 Stanley, M. L., Henne, P., Yang, B. W., & Brigard, F. D. (2020). Resistance to position change, motivated reasoning, and polarization. Political Behavior, 42(3), 891–913. https://doi.org/10. 1007/s11109-019-09526-z Stier, S., Breuer, J., Siegers, P., & Thorson, K. (2020). Integrating survey data and digital trace data: Key issues in developing an emerging field. Social Science Computer Review, 38(5), 503–516. https://doi.org/10.1177/0894439319843669 Stroud, N. J. (2010). Polarization and partisan selective exposure. Journal of Communication, 60(3), 556–576. https://doi.org/10.1111/j.1460-2466.2010.01497.x Sunstein, C. R. (2002). The law of group polarization. Journal of Political Philosophy, 10(2), 175–195. https://doi.org/10.1111/1467-9760.00148 Tajfel, H. (1982). Social psychology of intergroup relations. Annual Review of Psychology, 33(1), 1–39. https://doi.org/10.1146/annurev.ps.33.020182.000245 Watts, D. J. (2004). The “new” science of networks. Annual Review of Sociology, 30(1), 243–270. https://doi.org/10.1146/annurev.soc.30.020404.104342 Watts, D. J. (2017). Should social science be more solution-oriented? 
Nature Human Behaviour, 1(1), 0015. https://doi.org/10.1038/s41562-016-0015 Wilkerson, J., & Casas, A. (2017). Large-scale computerized text analysis in political science: Opportunities and challenges. Annual Review of Political Science, 20(1), 529–544. https://doi. org/10.1146/annurev-polisci-052615-025542

116

Z. Lyu et al.

Wojcieszak, M. (2010). ‘Don’t talk to me’: Effects of ideologically homogeneous online groups and politically dissimilar offline ties on extremism. New Media & Society, 12(4), 637–655. https:// doi.org/10.1177/1461444809342775 Wong, F. M. F., Tan, C. W., Sen, S., & Chiang, M. (2016). Quantifying political leaning from tweets, retweets, and retweeters. IEEE Transactions on Knowledge and Data Engineering, 28(8), 2158–2172. https://doi.org/10.1109/tkde.2016.2553667 Xiao, Z., Song, W., Xu, H., Ren, Z., & Sun, Y. (2020). TIMME: Twitter ideology-detection via multi-task multi-relational embedding. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 2258–2268). https://doi.org/10.1145/ 3394486.3403275 Zhang, C., Song, D., Huang, C., Swami, A., & Chawla, N. V. (2019). Heterogeneous graph neural network. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & Data Mining (pp. 793–803). https://doi.org/10.1145/3292500.3330961.

Chapter 8

Coda Yoshimichi Sato and Hiroki Takikawa

8.1

Revisiting the Relationship Between Computational Social Science and Sociology

We explored in previous chapters how computational social science and sociology should collaborate to advance both disciplines. One way, which was emphasized in Chaps. 2 and 6, is to properly incorporate meaning and interpretation in computational social science. These two concepts have been one of the central themes in sociology, while computational social science mainly analyzes behavioral data such as mobile data collected via GPS devices in smart phones. It is true that computational social science deals with text data such as X posts (tweets), but it is rare to study how people reading X posts (tweets) interpret them and add meaning to them. Chapter 2 scrutinized previous literature in computational social science such as Goldberg and Stein (2018), Boero et al. (2004a, b, 2008), Sato (2017), and DiMaggio et al. (2013) to show how to incorporate meaning and interpretation in the studies of agent-based modeling and digital data analysis. Chapter 6, focusing on diffusion or contagion, carefully examined the theory of cultural carrying capacity proposed by Bail (2016) to show that the theory is a milestone for including meaning and interpretation in the analysis of big data and suggested that incorporating the theory in the study of complex contagion proposed by Centola and Macy (2007) and Centola (2018) would enrich the study.

Y. Sato (✉) Faculty of Humanities, Kyoto University of Advanced Science, Kyoto, Japan e-mail: [email protected] H. Takikawa Graduate School of Humanities and Sociology, The University of Tokyo, Bunkyo-ku, Tokyo, Japan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 Y. Sato, H. Takikawa (eds.), Sociological Foundations of Computational Social Science, Translational Systems Sciences 40, https://doi.org/10.1007/978-981-99-9432-8_8

117

118

Y. Sato and H. Takikawa

Another way to promote collaboration between computational social science and sociology is to apply techniques of the former to the latter, which was discussed mainly in Chaps. 3 and 5. The chapters reinterpret the potentiality and possibility of computational social science so that it would contribute to the advancement of sociology. This reinterpretation is worth discussing in detail, so let us discuss it in the next sections.

8.2

Beyond the Deductive Approach

Chapter 3 argues that computational social science is free from the deductive approach. This is an important point in the scientific methodology. In conventional scientific activities including sociology, scientists start with a theory, derive hypotheses from it, set up an experiment or conduct a social survey to check their empirical validity. If their empirical validity is confirmed, the theory is thought to survive the empirical test. If not, scientists revise the theory or invent a new theory to make it more empirically valid. Thus, scientific activities gradually advance scientific knowledge (see Popper, 1959). This is a typical image of the deductive approach. Some of computational social science can also follow the deductive approach. Big data analysis, for example, is useful to check the empirical validity of hypotheses derived from a theory. However, computational social science could go beyond it and contribute to creating a new theory in sociology. It is true that computational social science provided social scientists with social telescopes (Golder & Macy, 2014) with much higher resolution than conventional sociological research. However, from the viewpoint of the deductive approach, we need a theory before observing social universe. Without a theory we could not know which area in social universe we should observe. For example, it does not make sense to only observe mobility patterns collected via smart phones after the lockdown caused by COVID-19. This is because it is unclear why we need to observe the patterns. However, if we have a theory of differential effects of the lockdown on people of different class and ethnicity, we will combine data of such social characteristics and mobility patterns and analyze how the lockdown increased or decreased social inequality (see Chang et al., 2021). In a sense, a sociological study using computational social science should start with a theory if it follows the deductive approach. However, computational social science could contribute to the advancement of sociology without the deductive approach as Chap. 3 argues. Big data enables social telescopes to cover much larger areas than conventional sociological data. Thus, computational social science has possibilities to find a new theory by searching a wide range of social universe. Machine learning, which is focused on in Chap. 3, has this potential. For example, unsupervised machine learning can extract the latent structure of observed data, which could not be extracted by researchers analyzing the data. This means that unsupervised machine learning has potential to find new patterns that might lead to a

8

Coda

119

new theory. Being free from the deductive approach, machine learning, unsupervised machine learning in particular, opened a door to a new way of exploring new theories in sociology. Machine learning has another advantage. Because it is more flexible about modeling than conventional statistical models, machine learning proposes better predictions. When we use conventional statistical models such as regression models, we derive hypotheses from a theory, build models based on them, and estimate the models’ coefficients using data. In other words, the theory used for the modeling limits the scope of social universe the social telescope observes, so the modeling might miss the possibility of finding a new theory that exists outside of the scope. In contrast, as abovementioned, machine learning is free from the deductive approach, so it can search social universe for models that have more predictive power. One caveat should be mentioned, however. Machine learning does not tell us how the selected model should be interpreted. This is different from conventional statistical models. For example, we understand how a regression model estimates its coefficients using data. We can easily understand what the model means based on the interpretation of coefficients. This is generally not the case when we use machine learning model. The model is so complex that it is extremely difficult to grasp the gist of the model. In other words, the selected model is in a black box. To solve this problem, a method called scientific regret minimization method is proposed. This method compares a black-box model obtained by machine learning and a model that is interpretable such as a regression model. If the gap between them is large, the latter is improved to make the gap smaller. And, finally, we find an interpretable model whose gap with the black-box model is the smallest. At the end of this process, we get an interpretable model with strong predictive power.

8.3

A New Way of Using Computational Linguistic Models in Sociology

Chapter 5 explores the possibility of applying topic models and word-embedding models to sociological study of meaning-making. As pointed out many times in this book, meaning and interpretation have been one of the central themes in sociology. Thus, if we successfully apply the models in computational social science to the study of meaning-making, we can show that computational social science is not just a tool for sociological analysis of digital data, but it substantively contributes to the advancement of sociology. In preparation for that, the chapter raises three points of meaning-making: Prediction, relationship, and semantic learning. Prediction is an important function of meaning. We interpret the meaning of things or events such as a non-smoking sign and a physical movement of an actor. We interpret the non-smoking sign as the prohibition of smoking in the room, so we predict that nobody in the room smokes. If we see a stranger with a knife is approaching us, we interpret the behavior as

120

Y. Sato and H. Takikawa

attacking us and predict that he/she will stab us. Based on such predictions, we can take a proper action such as not smoking in the room and running away from the stranger. Relationship is the foundation of prediction. Meaning relates a thing and an event to other things and events, which is a basic characteristic of prediction. We relate a non-smoking sign to predictions that nobody smokes in the room and that we would be punished if we smoked. We relate a stranger approaching us with a knife to a prediction that he/she will stab us. In other words, a thing or an event does not have meaning if it is detached from other things or events. Meaning of a thing or an event exists in the network of the thing or the event with other things and events. Semantic learning is a process that we correct the estimation of the relationship. To cite an example in Chap. 5, suppose that we see the color of a mushroom, think that the mushroom is edible, and eat it. If the mushroom is poisonous and we have a stomachache, we learn that the relationship between the color of the mushroom and its edibility is wrong and update the relationship with the fact that it was poisonous. Do topic models and word-embedding models capture these points so that they would analyze meaning-making from a different viewpoint from that of conventional sociology? Chapter 5 gives a positive answer to this question. Topic models extract latent topics from observed sentences and relate (1) sentences and topics and (2) words in the sentences and topics with probabilities. These characteristics of topic models uncover meaning of a word. As pointed out above, a word does not have a meaning by itself. It has a meaning only if it is related to other words. To cite an example in Chap. 5, a “tie” means a draw if it is strongly related to a topic “sports” and appears with another word “game” also being related to the topic “sports.” In contrast, a “tie” means a necktie if it is related to a topic “fashion” and appears with another word “suits” also being related to the topic “fashion.” Thus, topic models unveil relationality of meaning of a word and polysemy of a word. In the above example, a “tie” has two meanings depending on which words are used with it. Therefore, topic models are suitable for capturing polysemy. Topic models use Bayesian updating methods to estimate their parameters. This is similar to the abovementioned semantic learning. Bayesian updating methods update parameters, which are called posterior parameters, using prior parameters and new information. In the case of the abovementioned mushroom example, we had a prior belief (parameter) about the relationship between the color of the mushroom and its edibility. Then we got new information that the mushroom was poisonous, and that we had a stomachache. Using this information, we revise the prior belief and get a posterior belief (parameter). Of course, we do not rigorously use Bayesian updating methods in everyday life, but we conduct a kind of updating process of parameters. It is obvious that the logic of topic models and people’s meaning-making are different from each other, but we think that we can get useful tips to understand people’s meaning-making by deeply understanding the logic of topic models. This way of using topic models is completely different from their conventional use, but this is a way to apply topic models to meaning and interpretation, major research topics in sociology.

8

Coda

121

Word-embedding models create semantic space, which helps us to understand similarity between meanings of words as well as relationships between them. In the models the meaning of a word is represented as a vector. And like vector calculation in n-dimensional real space, we can conduct calculation of meanings in semantic space. To cite an example in Chap. 5, king - man + woman becomes queen. Word-embedding models using neural networks have the predictive function of meaning. This is because a word is predicted by words surrounding it. The models also capture the relationality of meanings because of the same reason. The prediction of a word from surrounding words implies the relationality of meanings. As for learning, the models use neural networks, which learn via back propagation. Thus, topic models and word-embedding models capture the predictive function of meanings, the relationality of meanings, and learning processes and, therefore, could help us to understand the meaning-making process of people. However, one more problem remains: explaining human actions. Since Max Weber’s interpretive sociology, meaning and interpretation have been key concepts to explain actions. Suppose that a shaman conducts a ritual for rain with his/her villagers. The ritual is not understandable from the viewpoint of modern science. However, if we understand that the shaman and villagers believe that the ritual works well, we properly interpret the meaning of the ritual and explain why they conduct it. How can we explain actions with the help of computational linguistic models? Chapter 5 proposes that a regression model with word vectors as independent variables and people’s response to the words as the dependent variable clarifies how people interpret the meaning of the words. Citing an example in Chap. 5, Ueshima and Takikawa (2021) predicted how people judge priority of vaccination of COVID-19 using a regression model with word vectors as independent variables. Word vectors for occupations were obtained by a word2vec model, a wordembedding model. The model shows how people interpret the meaning of occupations and how people judge which occupations have priority on vaccination.

8.4

What Is the Next Step?

Chapters in the book including Chaps. 3 and 5 have shown that computational social science has the potential to advance sociological research. We have also emphasized the importance of meaning and interpretation for computational social science to substantively contribute to the advancement of sociology and shown examples of such practices. Sociology should also change itself to promote collaboration with computational social science. It has evolved by the combination of sociological theories and empirical studies checking their empirical validity. Conventionally, most of the empirical studies are case studies and statistical analysis of survey data. Both of them are strong tools for the development of new sociological theories. Take study of inequality of educational opportunity, for example. Reduction in the inequality has been an important topic in society. This is because modern societies assume that

122

Y. Sato and H. Takikawa

equality of educational opportunity is an ideal of them. The ideal tells that, in modern society, anybody should equally get access to education, higher education in particular, no matter what family he/she comes from. Therefore, it has been a central topic in the study of social inequality to empirically clarify the degree of the inequality. A study of inequality of educational opportunity in Ireland by Raftery and Hout (1993) is in line with this research tradition. They conducted statistical analysis of data on transition from elementary education to secondary education and from secondary education to higher education. The Irish government conducted reforms of secondary education in 1967. For example, tuition fees became free, and free school transportation was provided. These reforms resulted in the increase in the overall participation rate in secondary education. However, it is another story whether class differentials of the rate were reduced. Intuitively, it is plausible that the reforms increased the opportunity for children from lower classes to enter secondary education, and, therefore, the class differentials decreased. However, this intuitive story should be empirically examined. For this examination Raftery and Hout (1993) analyzed effects of the reforms on inequality of educational opportunity by birth cohorts and social classes. To summarize their findings, the reforms did not reduce the inequality. As for the opportunity to enter secondary education, the class differentials reduced for young cohort. This is an effect of the reform. However, as for the opportunity to complete secondary education and to enter higher education, the differentials did not change. Raftery and Hout (1993) generalized these findings to propose a hypothesis of maximally maintained inequality, which has been cited uncountably. The hypothesis is as follows (Hout, 2006; Raftery & Hout, 1993): 1. Even if educational expansion occurs by the educational reform, class differentials of educational opportunity do not change. It is the case that the transition rates of students from lower classes increase, but the rates of students from any classes increase in parallel. 2. Then, if the demand for better education of students from upper classes is saturated, more educational opportunities become open to students from lower and middle classes. This results in reducing class differentials. 3. Conversely, if educational opportunities shrink, inequality of educational opportunity increases. After proposing the hypothesis, Raftery and Hout (1993) proposed a rational choice model to explain it. To cite it simply, students and their families decide whether to move up to higher education or not by calculating costs and benefits associated with the education. Upper-class students and their families estimate that benefits are larger than costs, so they decide to continue education. In contrast their lower-class counterparts estimate them in the opposite way, so they tend not to continue education. Only if educational opportunity expands and the transition rate of upper-class students saturates, lower-class students and their family begin to estimate that the benefits exceed the costs and decide to continue education. This results in the decrease in class differentials.

8

Coda

123

Raftery and Hout’s (1993) study of inequality of educational opportunity in Ireland is a wonderful example of empirical research in sociology. They reported persistent inequality of educational opportunity even after the reform by analyzing statistical data, generalized their findings to propose the hypothesis, and propose a model or theory to explain it. We argue that sociological research using computational social science should also follow their step. Computational social science techniques often reveal interesting hidden pattens that would not be found by conventional methods in sociology. This is an advantageous point of computational social science. However, if they are not connected to sociological theories, the patterns do not substantively contribute to the advancement of sociology. There are two ways to connect sociology and computational social science. The first way is to begin with a sociological theory following the deductive approach. Then computational social science techniques and data suitable to the empirical test of the theory are selected and used for the test. This way is fruitful if conventional sociological methods and data cannot check the empirical validity of the theory because of their limitations. A problem of this way is, as argued in Chap. 3, that it is difficult to create a new theory. Sociologists following this way tend to check empirical validity of an existing theory rather than to forge a new theory. Of course, excellent sociologists think of a new theory and check its empirical validity with conventional sociological methods or computational social science technique. However, this is not always the case. The second way is to follow the step of Raftery and Hout (1993). That is, finding an interesting pattern using data analysis, generalizing it to a hypothesis, and forging a theory to explain the hypothesis. Computational social science is advantageous compared with conventional sociological methods because the former is more likely to find a pattern that was not expected than the latter is. This may be because machine learning is not affected by the cognitive framework of a sociologist using it. For example, when they conduct regression analysis, sociologists tend to choose independent variables based on their prior sociological knowledge. In contrast, machine learning does not use such knowledge, so it can choose independent variables that were not expected but seem to be plausible. Then sociologists need to forge a theory to explain why the variables were chosen. The relationship between the dependent variable and the independent variable is in a black box. It is not computational social science techniques but sociologists with sociological knowledge and imagination who can create a new theory that explains a newly found pattern and contribute to the advancement of sociology. In conclusion, computational social science has a great potential with which sociologists conduct research at higher levels. However, they should begin it with a sociologically important idea. The idea could be central concepts and theories such as meaning and interpretation. Or it could be effects of educational reforms on inequality of educational opportunity in Ireland as reported by Raftery and Hout (1993). The power of computational social science enables sociologists to conduct research that could not be done by conventional sociological methods. Proper use of computational social science opens a new door to upgrading sociology.

124

Y. Sato and H. Takikawa

References Bail, C. A. (2016). Cultural carrying capacity: Organ donation advocacy, discursive framing, and social media engagement. Social Science & Medicine, 165, 280–288. Boero, R., Castellani, M., & Squazzoni, F. (2004a). Cognitive identity and social reflexivity of the industrial district firms: Going beyond the ‘complexity effect’ with agent-based simulations. In G. Lindemann, D. Moldt, & M. Paolucci (Eds.), Regulated agent-based social systems (pp. 48–69). Springer. Boero, R., Castellani, M., & Squazzoni, F. (2004b). Micro behavioural attitudes and macro technological adaptation in industrial districts: An agent-based prototype. Journal of Artificial Societies and Social Simulation, 7(2), 1. Boero, R., Castellani, M., & Squazzoni, F. (2008). Individual behavior and macro social properties: An agent-based model. Computational and Mathematical Organization Theory, 14, 156–174. Centola, D. (2018). How behavior spreads: The science of complex contagions. Princeton University Press. Centola, D., & Macy, M. W. (2007). Complex contagions and the weakness of long ties. American Journal of Sociology, 113(3), 702–734. Chang, S., Pierson, E., Koh, P. W., Gerardin, J., Redbird, B., Grusky, D., & Leskovec, J. (2021). Mobility network models of COVID-19 explain inequities and inform reopening. Nature, 589, 82–87. DiMaggio, P., Nag, M., & Blei, D. (2013). Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of U.S. government arts funding. Poetics, 41, 570–606. Goldberg, A., & Stein, S. K. (2018). Beyond social contagion: Associative diffusion and the emergence of cultural variation. American Sociological Review, 83(5), 897–932. Golder, S. A., & Macy, M. W. (2014). Digital footprints: Opportunities and challenges for online social research. Annual Review of Sociology, 40, 129–152. Hout, M. (2006). Maximally maintained inequality and essentially maintained inequality: Crossnational comparisons. Sociological Theory and Methods, 21(2), 237–252. Popper, K. R. (1959). The logic of scientific discovery. Hutchinson. Raftery, A. E., & Hout, M. (1993). Maximally maintained inequality: Expansion, reform, and opportunity in Irish education, 1921–75. Sociology of Education, 66(1), 41–62. Sato, Y. (2017). Does agent-based modeling flourish in sociology? Mind the gap between social theory and agent-based models. In K. Endo, S. Kurihara, T. Kamihigashi, & F. Toriumi (Eds.), Reconstruction of the Public Sphere in the Socially Mediated Age (pp. 37–46). Springer Nature Singapore Pte. Ueshima, A., & Takikawa, H. (2021). Analyzing vaccination priority judgment for 132 occupations using word vector models. In WI-IAT ‘21: IEEE/WIC/ACM International conference on web intelligence and intelligent agent technology (pp. 76–82).