The Dynamics of Science: Computational Frontiers in History and Philosophy of Science
ISBN: 0822947374, 9780822947370

Millions of scientific articles are published each year, making it difficult to stay abreast of advances within even the smallest subdisciplines.


English | Pages: 303 [305] | Year: 2022



Table of contents :
Introduction. Tools, Tests, and Data: An Introduction to the New History and Philosophy of Science | Andreas De Block and Grant Ramsey
Part I. Toward a New Logic of Scientific Discovery, Creativity, and Progress
1. Five Models of Science, Illustrating How Selection Shapes Methods | Paul E. Smaldino
2. Pooling with the Best | Justin Bruner and Bennett Holman
3. Promoting Diverse Collaborations | Mike D. Schneider, Hannah Rubin, and Cailin O’Connor
4. Using Phylomemies to Investigate the Dynamics of Science | David Chavalarias, Philippe Huneman, and Thibault Racovski
Part II. Frontiers in Tools, Methods, and Models
5. LDA Topic Modeling: Contexts for the History and Philosophy of Science | Colin Allen and Jaimie Murdock
6. The Potential of Supervised Machine Learning for the Study of Science | Krist Vaesen
7. Help with Data Management for the Novice and Experienced Alike | Steve Elliott, Kate MacCord, and Jane Maienschein
Part III. Case Studies
8. How Not to Fight about Theory: The Debate between Biometry and Mendelism in Nature, 1890–1915 | Charles H. Pence
9. Topic Modeling in HPS: Investigating Engaged Philosophy of Science throughout the Twentieth Century | Christophe Malaterre, Jean-François Chartier, and Davide Pulizzotto
10. Bolzano, Kant, and the Traditional Theory of Concepts: A Computational Investigation | Annapaola Ginammi, Rob Koopman, Shenghui Wang, Jelke Bloem, and Arianna Betti
11. The Evolution of Evolutionary Medicine | Deryc T. Painter, Julia Damerow, and Manfred D. Laubichler


The Dynamics of Science

Edited by

Grant Ramsey and Andreas De Block

The Dynamics of Science
Computational Frontiers in History and Philosophy of Science

University of Pittsburgh Press

Published by the University of Pittsburgh Press, Pittsburgh, Pa., 15260
Copyright © 2022, University of Pittsburgh Press
All rights reserved
Manufactured in the United States of America
Printed on acid-free paper
10 9 8 7 6 5 4 3 2 1
Cataloging-in-Publication Data is available from the Library of Congress
ISBN 13: 978-0-8229-4737-0
ISBN 10: 0-8229-4737-4
Cover art from
Cover design by Melissa Dias-Mandoly


Contents

Acknowledgments ix

Introduction. Tools, Tests, and Data: An Introduction to the New History and Philosophy of Science 3 Andreas De Block and Grant Ramsey

Part I. Toward a New Logic of Scientific Discovery, Creativity, and Progress 1. Five Models of Science, Illustrating How Selection Shapes Methods 19 Paul E. Smaldino

2. Pooling with the Best 40 Justin Bruner and Bennett Holman

3. Promoting Diverse Collaborations 54 Mike D. Schneider, Hannah Rubin, and Cailin O’Connor

4. Using Phylomemies to Investigate the Dynamics of Science 73 David Chavalarias, Philippe Huneman, and Thibault Racovski

Part II. Frontiers in Tools, Methods, and Models 5. LDA Topic Modeling: Contexts for the History and Philosophy of Science 103 Colin Allen and Jaimie Murdock

6. The Potential of Supervised Machine Learning for the Study of Science 120 Krist Vaesen

7. Help with Data Management for the Novice and Experienced Alike 132 Steve Elliott, Kate MacCord, and Jane Maienschein

Part III. Case Studies 8. How Not to Fight about Theory: The Debate between Biometry and Mendelism in Nature, 1890–1915 147 Charles H. Pence

9. Topic Modeling in HPS: Investigating Engaged Philosophy of Science throughout the Twentieth Century 164 Christophe Malaterre, Jean-François Chartier, and Davide Pulizzotto

10. Bolzano, Kant, and the Traditional Theory of Concepts: A Computational Investigation 186 Annapaola Ginammi, Rob Koopman, Shenghui Wang, Jelke Bloem, and Arianna Betti


11. The Evolution of Evolutionary Medicine 204 Deryc T. Painter, Julia Damerow, and Manfred D. Laubichler

Notes 231

References 243

Contributors 279

Index 287



Acknowledgments

Our first debt is to the twenty-seven contributing authors, without whom this book would not have been possible. We also thank the reviewers who gave their time and expertise to provide valuable feedback. We give special thanks to the University of Pittsburgh Press staff, in particular to our editor, Abby Collier, who believed in this project and helped carry it to fruition. The seed for this book was a workshop held at KU Leuven on the cultural evolution of science. We are grateful to the Research Foundation Flanders for their financial support for this workshop.


The Dynamics of Science


Tools, Tests, and Data
An Introduction to the New History and Philosophy of Science

Andreas De Block and Grant Ramsey

In 1973, philosopher Ronald Giere published a now-famous review of a 1970 volume, edited by historian Roger Stuewer (1970), in which he reflected on history and philosophy of science and how integrated they can or should be. According to Giere, there is no intimate relationship between history of science and philosophy of science, only a “marriage of convenience.” In his view, normative questions constitute the field of philosophy of science, and such normative questions cannot be answered by the descriptive work of historiography. One problem with Giere’s view is that the boundaries between the normative and the descriptive are often vague. Another problem is that how science has been done is relevant for how it should be done, much in the same way that moral psychology is relevant for normative ethics (Doris 2002). Last, Giere later wholly renounced his earlier claims about the nature of philosophy of science and defended its naturalization in the sense that “philosophers should be in the business of constructing a theoretical account of how science works” (2011, 61). In other words, Giere later defended the view that he attacked in the 1970s. He came to believe that philosophy of science, much like history of science, should be conceived of as a descriptive discipline, even though the two disciplines address somewhat different research questions. Today, most philosophers of science seem to subscribe to this sort of naturalism (Schickore and Steinle 2006), and many are at least sympathetic to the view that history of science and philosophy of science can be so intertwined as to form an “interdiscipline” (Thorén 2015). Yet it is not always clear how the two disciplines should interact. Some fear that without this clarity, “we are condemned to the dilemma between making unwarranted generalizations from historical cases and doing entirely ‘local’ histories with no bearing on an overall understanding of the scientific process” (Chang 2011, 109). Indeed, philosophers of science all too often cherry-pick examples from the history of science to illustrate or support their philosophical views. This even holds for some of the philosophers with a solid background in the historiography of science, like Thomas Kuhn. Even today, philosophers and historians often engage in detailed analyses of particular discoveries, inventions, scientific debates, and developments and simply assume that these examples are representative of (the history of) science, without actually showing that the examples are paradigmatic (Scholl and Räz 2016). Case studies do have heuristic value, and their use in pedagogy is generally undisputed, but they provide a weak inductive basis for general claims about science. Or, to put it more bluntly, anecdotal evidence is unreliable evidence. This problem is exacerbated both by the lack of historical interest in what are seen as “grand narratives” and by the theory-ladenness of case studies. The choice and interpretation of case studies are quite likely to be profoundly influenced by the researcher’s theoretical presuppositions, idiosyncratic preferences, values, biases, and philosophical training: “Cases are often generated in a manner that does not adequately guard against biases in selection, emphasis, and interpretation” (Steel, Gonnerman, and O’Rourke 2017). This problem is aggravated by the fact that the researcher’s choices are rarely made explicit, which makes an independent assessment of the conclusions or interpretations even more difficult. In this book, we will address whether new computational tools can successfully address some of these problems.
This introduction sketches the philosophical and scientific background of computational history and philosophy of science (HPS), argues for the great potential of these methods and tools, and offers an overview of the chapters that follow, each of which contributes to the realization of the promise that computational HPS holds, a promise that motivates this edited volume.

Naturalism in Philosophy of Science

Clearly, there is not as much progress in philosophy as in science. David Chalmers, for instance, claims that “there has not been large collective convergence to the truth on the big questions of philosophy” (Chalmers 2015, 5). Most of the progress in philosophy is negative, or so Chalmers argues. We now know better than before that some arguments of the great philosophers are deeply flawed, and we know, for example, what knowledge is not. But we have not arrived at a widespread agreement on what knowledge—or any of the core philosophical subjects—is. Plenty of philosophers seem to think that there is little we can do about it. Philosophy, they believe, should be primarily about asking good questions and much less about finding the final answers to them. Others hold that philosophy does make considerable progress but that this progress is obfuscated by how we conceptualize philosophy. In their view, once an important philosophical question has been answered decisively, we stop considering the question to be a philosophical one; it is expelled from the philosophical realm. As Daniel Dennett (1998) put it:

The trajectory of philosophy is to work on very fundamental questions that haven’t yet been turned into scientific questions. Once you get really clear about what the questions are, and what would count as an answer, that’s science. Philosophy no longer has a role to play. That’s why it looks like there’s just no progress. The progress leaves the field. If you want to ask if there has been progress in philosophy, I’d say, look around you. We have departments of biology and physics. That’s where the progress is. We should be very proud that our discipline has spawned all these others.

In this view, Newton’s natural philosophy became modern physics, and Wundt’s experimental philosophy can be seen as one of the first successful attempts at a genuinely scientific psychology. Still other prominent philosophers, such as Timothy Williamson (2006), think that current academic philosophy can and must do better. But what kind of reform is necessary for such an improvement? Since Descartes, philosophers have tried to borrow scientific methods to arrive at the same kind of certainties and cumulative knowledge that the sciences deliver. In recent years, philosophy has witnessed a renewed interest in the use of scientific tools and methods to tackle philosophical issues. In many instances, this tendency is illustrated by evolutionary and formal epistemology (Callebaut and Pinxten 2012; Hendricks 2006, 2010). Ever since the early 1970s, evolutionary epistemology has been analyzing belief systems and belief change with the help of hypotheses, theories, and models that were developed to understand population dynamics in biology. And at the beginning of this century, formal epistemology really took off. It made clear that popular approaches in, for example, computer science and statistics were instrumental in getting a better grip on some of the more vexing philosophical issues. Whereas naturalism is now mainstream in philosophy of science in that it often focuses on descriptive questions and takes the details of scientific research into account, philosophers of science have not seemed overly enthusiastic about embracing naturalistic methods. According to Edouard Machery, “philosophers of science have surprisingly been reluctant to include these methods in their toolbox, but doing so is necessary for philosophy of science to be a genuine part of a naturalized epistemology” (2016, 487). Yet this is beginning to change. Experimental philosophy of science is now a flourishing approach (Griffiths and Stotz 2008; Wilkenfeld and Samuels 2019), and the digital or computational tools of formal epistemology are now regularly applied in philosophy of science (Leitgeb 2011), an application that can itself be seen as a form of applied epistemology. Despite the reality of this tendency in philosophy toward naturalization, it would be an oversimplification to see the rise of computational HPS as nothing but the result of this tendency. As the next section makes clear, the maturation of the science of science also played a role, as did developments within history of science.

HPS as a Science of Science

History and philosophy of science has always had close ties and tensions with other academic disciplines that reflect on science. This holds in particular for sociology of science and its longtime ally, the interdisciplinary field of science and technology studies (STS). Much like traditional HPS, STS relies mostly on qualitative methods and verbal reasoning. However, within the STS community of the 1970s, a number of scholars argued that in order to bear optimally on science and technology policy, more quantitative approaches were necessary (Martin, Nightingale, and Yegros-Yegros 2012). To accomplish that, they regularly joined forces with scholars from scientometrics (Leydesdorff 1989), another (sub)discipline that emerged in the 1970s. It took a while, but eventually this became the precursor of a new and burgeoning field, the science of science.1 In a recent review published in Science, Santo Fortunato and colleagues (2018, 1007) define this field as follows: “The science of science . . .
offers a quantitative understanding of the interactions among scientific agents across diverse geographic and temporal scales: It provides insights into the conditions underlying creativity and the genesis of scientific discovery, with the ultimate goal of developing tools and policies that have the potential to accelerate science.” The science of science does not limit itself to measuring innovation and impact; it also focuses on the epistemic and ethical costs of science policy and the (changes in the) organization of science. Evolutionary biologists Gross and Bergstrom (2019), for example, used economic contest theories to assess the relative efficiency of grant proposal competitions and concluded that these competitions probably hinder science more than advance it, thus tying into recent work in philosophy of science about the epistemic merits of lotteries (Avin 2019). Similarly, quantitative and empirical work on peer review is highly relevant for philosophical reflections on its value (Heesen and Bright 2021). Consequently, papers published under the “science of science” banner are sometimes indistinguishable from quantitative and empirically informed articles in philosophy of science.2

Not all “scientifically conducted” HPS is considered computational. Yet it is not always clear how to draw lines between work that is computational and work that is not. Within history, everyone seems to agree that a computational turn has taken place, but there is disagreement on what exactly constitutes this turn. We believe the two most common names for this turn are indicative of what it means to do computational HPS. First, the term “computational” is used because the methods that are at the heart of this “turn” or “revolution” often involve mathematical abstractions and mathematical and other formal models (Roth 2019).3 Second, the turn is sometimes also called a “digitized turn,” a “big data revolution,” or a “digital revolution” (Gibson, Laubichler, and Maienschein 2019) because this “revolution” to a large extent hinges on a systematic engagement with digitized archives and big data. These two strands or orientations are in line with how Grim and Singer (2020) sketch computational philosophy: “Techniques employed in computational philosophy may draw from standard computer programming and software engineering, including aspects of artificial intelligence, neural networks, systems science, complex adaptive systems, and a variety of computer modeling methods. As a growing set of methodologies, it includes the prospect of computational textual analysis, big data analysis, and other techniques as well.” The two orientations can go hand in hand, as some of the chapters in this volume show, but they need not.
For instance, “the disciplinary exchange between history and computational humanities shows that there are valid research questions for C[omputational] H[umanities] regardless of the size of the data, in particular in the domain of knowledge representation and reasoning” (Piotrowski and Fafinski 2020, 175). Of course, the new tools and methods should not just be naively accepted and applied. A good fit between research question and method is necessary, and computational tools are not well suited to address all the central questions of HPS. Hence, traditional HPS should not be completely replaced by computational HPS. There is also more to HPS than answering questions, and non-computational methods certainly have great heuristic and pedagogic value. Another reason to temper a naive enthusiasm for the use of computational methods in HPS is the steep learning curve. Many historians and philosophers simply lack sufficient training in mathematics and computer science for conducting computational HPS on their own, though they can outsource the more technical work to others who have this training (Gibson, Laubichler, and Maienschein 2019).

We think the computational developments in historiography, the naturalistic turn in philosophy, and the rise of the science of science will inevitably lead to more interdisciplinary approaches and collaborations. A full grasp of the nature and dynamics of science requires a huge toolbox and an academic community that harbors the abilities to use all the tools in that toolbox. Philosophers and historians of science will surely play an important role in that community, but so will data and computer scientists, cognitive scientists, evolutionary theorists, and statisticians. Philosophers of science, even those who are not well versed in computational methods, can bring to this interdisciplinary community their well-trained capacities for conceptual rigor, methodological reflection, and synthesis. These capacities complement the computational skills because philosophical considerations and decisions are necessary for a reliable and valid collection, processing, and interpretation of the data, for instance by identifying “the specific, extant conjectures about the processes of scientific change to be tested” (Laudan et al. 1986, 143). Let’s now consider the promises that computational tools and methods hold for history and, especially, for philosophy of science. We will consider the potential of digitized material as well as the prospects of models and simulations.

How Not to Be Selective

In 2006, an estimated 1.35 million scientific articles were published—almost four thousand per day (Bjork, Roos, and Lauri 2009)—and this number is doubling about every nine years (Van Noorden 2014). Thus, while there was a time when we could keep up with broad swaths of the scientific literature, it is now difficult to stay abreast of advances within even the smallest subdisciplines.
Traditional approaches to the study of science—such as HPS—involve closely reading a relatively small set of books, journal articles, and other documents. The historian may be studying a specific moment in the history of science, trying to uncover what happened and why. The philosopher may be trying to unravel the meaning of particular concepts or to reconstruct forms of scientific inference or explanation. In doing so, each researcher can examine but a single drop in the ocean of literature—their studies may involve depth, but they lack breadth. As we already mentioned, some studies require this depth and don’t greatly suffer from a lack of breadth. Very specific claims about local phenomena are often best addressed through only a handful of case studies. In that sense, the continued use of case studies in philosophy of science is not necessarily a problem. What is worrisome, though, is the still rather widespread reliance on case studies as evidence for general claims about science (Mizrahi 2020). And as Laudan and colleagues already noted in the 1980s, “many of the avowed case studies are not ‘tests’ of the theory in question at all; rather, they are applications of the theory to a particular case” (Laudan et al. 1986, 158). In fact, there are many questions about the nature and history of science that would clearly benefit from—or even require—casting a larger net. If we want to know whether most scientific change is gradual or revolutionary, or what the key sources of scientific novelty are, then neither close reading—the careful reading of text by humans—nor a single case study can serve as the only data. Instead, it would be ideal to have a digital database full of scientific literature and to equip computers with algorithms capable of helping us answer questions like these. Fortunately, the past few decades have seen a massive effort to digitize the academic literature. This literature is thus accessible in ways that it has never been before and can now be subject to distant reading—machine “reading” involving automated textual analysis (Moretti 2013). Having a rich source of data—an ocean of scientific literature—at our disposal can get us partway to answering fundamental questions about the nature and history of science. Importantly, digital tools are not useful just because they can engage with larger corpora. They have additional advantages. For instance, they can detect fine-grained linguistic patterns that would be lost to a (single) human reader and can increase the breadth of research questions (Pence and Ramsey 2018). Despite the surfeit of digital data from the scientific literature, this resource is a largely unexplored frontier.
Some researchers have begun to employ digital approaches to study the scientific literature, but the techniques are still highly experimental. In order to understand the motivation for exploring the computational HPS frontier, let’s consider three published studies of the scientific literature. These highlight the increasing interest in textual analysis, as well as the relative absence of appropriate analysis tools.

Example 1. In 2007, Antonovics et al. published a paper titled “Evolution by Any Other Name: Antibiotic Resistance and Avoidance of the E-word” (Antonovics et al. 2007). Antonovics and his team manually tallied terms in thirty articles from medical and biological journals to test the hypothesis that the term “evolution” is used more rarely in medicine than in biology in the context of antibiotic resistance. Their results supported this claim, but they had to manually read through the papers, limiting their sample size to only thirty. Such a study suffers from a lack of automation. If they had been able to use even the most basic automated techniques—simple searches for “evolution” across a larger corpus—they could have considerably broadened and strengthened their study. Is the difference they detected historically recent, or is it long-standing? Is it true of all Anglophone journals or only those from the United States? Do medical journal articles that use the “e-word” exhibit different patterns of citation? Answering these questions could lead to a much richer understanding of how and why the term “evolution” appears to be avoided in medical journals. In chapter 11 of this volume, Painter, Damerow, and Laubichler use automated techniques to touch upon a few of these issues and show how biologists seem to be more attracted to evolutionary medicine than medical scholars are.

Example 2. In 2013, Overton performed an analysis of how the term “explain” is used in the scientific literature. His paper (Overton 2013) used a set of 781 articles from the journal Science. Although he obtained interesting results, his study was limited in scope (a single journal over a limited span of time) and used only the most basic of techniques (counting the frequency of n-grams, i.e., strings of letters). Science is a highly selective, prestigious general science journal, and this raises questions about the degree to which we can generalize from these results. Is the concept of explanation deployed differently in a journal like this than in narrower, lower-impact journals? How and why has usage changed over time? Is the concept used differently when new phenomena are explained than when old ones are? These are the sorts of questions one could answer only with more comprehensive data analysis tools at hand.

Example 3. In 2014, Dietrich, Ankeny, and Chen (2014) published an article titled “Publication Trends in Model Organism Research.” They examined the rates of citations for articles referencing specific model organisms.
They were interested in the pattern of citations, especially in the wake of the National Institutes of Health specifying in 1990 a list of organisms as official model organisms. Their study used the Web of Science database, but this involved a significant limitation: it was not until 1992 that the Web of Science began to include not only titles but also abstracts. Thus, Dietrich, Ankeny, and Chen (2014) could use titles only, which underreport model organism usage, and the data were likely considerably noisier than they would have been using abstracts or whole texts. Had they used a tool that provides the full text of journal articles, they might have had a much richer dataset and might have been able to go beyond word tallies to examine word associations. For example, what words modify or occur near the names of model organisms?
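The automation missing from these three studies is not exotic. As a minimal sketch in Python, here is how one might tally target terms per document and collect the words that occur near a focus term such as a model organism’s name. The three-sentence “corpus” and the term lists are invented placeholders standing in for real journal articles:

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into word tokens (hyphenated words kept whole)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())

def term_frequencies(corpus, terms):
    """Count how often each target term occurs in each document."""
    targets = set(terms)
    return [Counter(t for t in tokenize(doc) if t in targets) for doc in corpus]

def cooccurrences(corpus, focus, window=5):
    """Tally words appearing within `window` tokens of a focus term
    (e.g., a model organism name) across the whole corpus."""
    counts = Counter()
    for doc in corpus:
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok == focus:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(t for j, t in enumerate(tokens[lo:hi], lo)
                              if j != i)
    return counts

# Toy corpus standing in for medical vs. biological abstracts.
corpus = [
    "Antibiotic resistance emerged in the bacterial population.",
    "The evolution of antibiotic resistance in Drosophila models.",
    "Drosophila is a classic model organism for genetic screens.",
]
print(term_frequencies(corpus, ["evolution", "resistance"]))
print(cooccurrences(corpus, "drosophila").most_common(3))
```

Scaling this up is mostly a matter of swapping the toy list for millions of full-text articles, a more careful tokenizer, and per-journal or per-year groupings; the counting logic stays the same.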


Models and Simulations

Which models—from cultural evolution, game theory, or other sources—can best be used to understand scientific change is still an open question. In 1988, David Hull published Science as a Process, in which he tried to give an evolutionary account of the dynamics of science. To substantiate his claims, he relied on several sources, such as interviews with pupils of successful scientists, information about patterns of journal article publication, and data on the age and number of scientists who accepted Darwin’s thesis on the mutability of species between 1859 and 1869. Yet the data are quite limited and were extracted by hand. Moreover, Hull (1988) does not rely at all on mathematical models of social evolution, and he barely mentions the existence of such methods and how they can be applied to cultural phenomena. This is especially strange, since foundational work in this area had been published just a few years before Hull’s book—two important examples being Boyd and Richerson’s (1985) Culture and the Evolutionary Process and Cavalli-Sforza and Feldman’s (1981) Cultural Transmission and Evolution: A Quantitative Approach. Suppose Hull is right and certain aspects of science are shaped by the same (or very similar) evolutionary forces that shape organisms (as Popper 1968 and Campbell 1974 also contend). It seems obvious, then, that the study of scientific change could benefit from using the mathematical tools and software programs that evolutionary biologists use to model and understand such forces. First, these tools can help to evaluate or strengthen the arguments made about the dynamics and nature of science. After all, it is well known that formalizing arguments helps us to detect errors and to add rigor, albeit sometimes at the expense of comprehensibility. Second and relatedly, formal tools can also help us better model and simulate the dynamics of science.
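To convey what such an evolutionary model of science can look like, here is a deliberately minimal Wright-Fisher-style sketch of method choice in a scientific community, written in Python. It is not any particular model from the literature, and the payoff values are invented purely for illustration:

```python
import random

def cultural_selection(payoffs, n_agents=200, n_gens=100, mutation=0.01, seed=1):
    """Wright-Fisher-style model of method choice in a scientific community.

    Each generation, every lab adopts the method of a lab sampled in
    proportion to payoff (imitation of success), with a small chance of
    switching to a random method (innovation or error).
    Returns the final frequency of each method.
    """
    rng = random.Random(seed)
    methods = list(payoffs)
    pop = [rng.choice(methods) for _ in range(n_agents)]
    for _ in range(n_gens):
        # Sample the next "generation" of labs, weighted by payoff.
        weights = [payoffs[m] for m in pop]
        new_pop = rng.choices(pop, weights=weights, k=n_agents)
        # Occasional random switching keeps variation in the population.
        pop = [rng.choice(methods) if rng.random() < mutation else m
               for m in new_pop]
    return {m: pop.count(m) / n_agents for m in methods}

# Hypothetical payoffs: suppose a rigorous method yields slightly fewer
# publications than a quick-and-dirty one; imitation of success then
# spreads the less rigorous method through the community.
freqs = cultural_selection({"rigorous": 1.0, "quick": 1.2})
print(freqs)
```

Even this toy model illustrates the population-level perspective at stake here: no individual intends the outcome, yet a modest payoff advantage is enough for one practice to take over.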
Agent-based models, for example, can be used to explore the epistemic effects of different sorts of communication between scientists. Quite a few simulation studies have now shown that cognitive division of labor and cognitive diversity bring clear epistemic advantages (Thoma 2015). Some of the findings of such agent-based modeling, however, are much less intuitive. For example, Zollman (2010) has shown that the best-functioning communication networks are not the fully connected ones, a finding further explored by Bruner and Holman in chapter 2 of this volume. Of course, many useful simulations and models are not intrinsically linked to evolutionary explanations of the change of scientific knowledge. Paul Thagard, for instance, was a staunch opponent of evolutionary epistemology (Thagard 1980) but contributed enormously to the computational study of scientific networks. His seminal article on this topic, “Explanatory Coherence” (Thagard 1989), has now been cited more than twelve hundred times. That said, many philosophers keep on highlighting and exploring the similarities between evolutionary dynamics and the dynamics of science. Epistemic landscapes are regularly likened to, and even inspired by, fitness landscapes (Alexander, Himmelreich, and Thompson 2015). Likewise, evolutionary theory–inspired terms such as the “cultural red king effect” have been coined to refer to effects that simulations of scientific communication bring to the fore (O’Connor 2017; see also chapter 3 in this volume, where Schneider, Rubin, and O’Connor dig deeper into the cultural red king and related effects).

Structure

This book is structured in three parts. Each part treats different aspects of the history and philosophy of science with model- or algorithm-based approaches. Part I, “Toward a New Logic of Scientific Discovery, Creativity, and Progress,” comprises the first four chapters of the book. It deals with questions that have been central to philosophy of science and explores how the new techniques can elucidate them. Although many insights in philosophy of science have been gained through traditional philosophical methodologies, the new computational methods explored here can shed fresh light on old questions and may be instrumental in deciding debates at the center of the field.

In chapter 1, Smaldino shows that the classic “hypothesis testing” model of science is flawed and discusses a series of Darwinian models free from the flaws of the classic model. These models treat science as a population-level process. Each of these models highlights different incentives and suggests that the current dynamics often result in science that lacks robustness and reproducibility.
Tools, Tests, and Data

On the other hand, the models also suggest how selection pressures on individual scientists and their work, instantiated by publishing and funding practices, can be altered in such a way that they structurally advance methodological rigor. In chapter 2, Bruner and Holman investigate how norms and incentives shape scientific group inquiry. Social structure matters, and as a result, many philosophers of science have investigated how norms and incentives can lead to more or less successful inquiries. More specifically, Bruner and Holman use agent-based modeling with realistic assumptions to study whether less competent scientists will come to interact with the most competent researchers. They find that scientists will eventually pool with the best, but that this will not always result in a communication structure that is optimal for the reliability of science. They conclude that the social structures that emerge from the simulations are suboptimal. This result has implications for understanding these structures but also provides some guidance for how to improve them. In chapter 3, Schneider, Rubin, and O’Connor use agent-based modeling to explore how underrepresentation of minority groups within science affects diverse collaborations. Arguably, such collaborations are epistemically beneficial and morally desirable. Hence, many initiatives have been taken to promote diverse collaborations. Yet Schneider, Rubin, and O’Connor’s modeling indicates that many of these initiatives will not improve the diversity of collaborative groups, in part because they have unintended negative consequences. In chapter 4, Chavalarias, Huneman, and Racovski describe how the evolution of science involves both historical patterns and particular processes. They argue that, traditionally, philosophers have based their account of patterns almost exclusively on a study of the processes. In their chapter, they show how this unidirectional approach can be replaced by two-way reasoning that explicitly draws on phylogenetic methods. The phylomemetic approach that Chavalarias et al. present in their chapter allows for better reconstructions of the dynamics of science than the methods traditionally used in philosophy of science. Part II, “Frontiers in Tools, Methods, and Models,” contains three shorter, method-focused chapters. These chapters are intended to provide the reader with insight into the cutting edge of computational HPS. This will inform readers of the state of the field but will also serve as a kind of instruction manual for future work. In chapter 5, Allen and Murdock use latent Dirichlet allocation (LDA) topic modeling to understand and quantify conceptual similarity and conceptual change, the sensitivity of meanings to context, and pathways of intellectual influence.
Andreas De Block and Grant Ramsey

They show how this method is relevant for HPS researchers and that its relevance goes beyond its use as a mere search and retrieval tool. According to them, many of the pitfalls of LDA can be avoided by conceiving of topics in a topic model as representing contexts, not just as “buckets of words.” In chapter 6, Vaesen scrutinizes the promises and pitfalls of different machine learning techniques to track and analyze disciplinary change in fields like economics, sociology, and cognitive science. He focuses on how many of the shortcomings of unsupervised techniques can be overcome by human supervision. He argues that, although the potential of supervised machine learning techniques is hitherto relatively unexplored within HPS, this family of techniques is able to deal with relevant classification features such as semantics, grammar, and style, features that mostly escape the unsupervised tools. Compared to unsupervised techniques, supervised techniques suffer less from information loss and can incorporate more classification features. In chapter 7, Elliott, MacCord, and Maienschein give useful advice on how to manage data. HPS researchers are typically not very well trained in managing and storing data. With the advent of digital humanities, however, more and more researchers must overcome the data hurdle. They present different principles to manage data and illustrate the use of those principles with two digital projects from the history of science. Part III, “Case Studies,” focuses on the application of models and automated textual analyses to gain new insights into the history of science. While parts I and II focus on theoretical and epistemological aspects of computational tools, part III considers what fruits can come from applying these tools and methods to specific historical moments and episodes in science. These case studies can further clarify the strengths and limitations of the new techniques for studying the dynamics of science. In chapter 8, Pence uses digital tools to sketch a network of discourse for the late nineteenth-century controversy over the way in which Darwin’s insights should be reconciled with theories of heredity. The network of discourse he reveals shows a couple of surprising features of how the debate unfolded. Pence’s contribution shows how digital tools can be used to develop and assess sociological, historical, and philosophical claims about important scientific debates. In chapter 9, Malaterre, Chartier, and Pulizzotto use topic modeling to examine the appearance and disappearance of socially engaged papers in the journal Philosophy of Science throughout the twentieth century. The results of their analysis corroborate existing views about the presence of socially engaged philosophy of science before the 1960s and its relative decrease thereafter.
Yet their analysis also suggests a continuing interest in engaged philosophy of science, and particularly in feminist philosophy of science and in science and values, that had not been identified earlier. They discuss these findings as well as some advantages and limitations of topic modeling in the context of HPS. In chapter 10, Ginammi, Koopman, Wang, Bloem, and Betti deploy a mixed method to shed light on Bolzano’s ideas about noncausal explanations and the hierarchy of concepts. To examine a large volume of text, they combine a self-developed web-based information retrieval tool with more traditional methods such as close reading. They find interesting, and hitherto unknown, continuities and discontinuities between Bolzano and Kant on the traditional theory of concepts.


In chapter 11, Painter, Damerow, and Laubichler combine evolutionary reasoning with computational tools to understand the growth and maturation of evolutionary medicine. They show that evolutionary medicine has become more interdisciplinary over the years and attracts researchers from a diversity of fields. Yet their network analysis also indicates that the pattern of collaborations is clearly not very interdisciplinary. Their computational analyses permit a better framing of the discussion about interdisciplinarity by distinguishing different layers as well as by providing quantitative evidence.


Part I Toward a New Logic of Scientific Discovery, Creativity, and Progress

Chapter 1

Five Models of Science, Illustrating How Selection Shapes Methods Paul E. Smaldino

“The good thing about science is that it’s true whether or not you believe it.” This oft-repeated quote, attributed to the astrophysicist and TV presenter Neil deGrasse Tyson, was seen everywhere at the March for Science, a set of gatherings held around the world on April 22, 2017. The quote has become a rallying cry for supporters of science—and of the application of scientific knowledge in daily life—against widespread science denialism. And of course, science should be defended. Carl Sagan, Tyson’s predecessor as host of Cosmos, noted that science not only increases our knowledge of the world but also serves as a bulwark against superstition and charlatanry (Sagan 1996). However, there is a counterpoint to Tyson’s claim. Plenty of scientific results are not true. During the first decade of the twenty-first century, the biotech company Amgen attempted to confirm the results of fifty-three published oncology papers deemed “landmark” studies. Of these, they claim to have successfully replicated only six (Begley and Ellis 2012).1 In 2015, a team of 270 researchers calling themselves the Open Science Collaboration repeated one hundred studies from published psychology papers. Of these, they successfully replicated only thirty-nine results (Open Science Collaboration 2015). In 2016, neuroscientists discovered design errors in the most popular statistical packages used to analyze fMRI data, indicating that as many as 70% of the results obtained using these packages may be false positives (Eklund, Nichols, and Knutsson 2016). And in 2018, a team of social scientists targeted twenty high-profile studies published in the prestigious journals Science and Nature and successfully replicated only twelve; even among these, most of the effects turned out



to be smaller than originally published (Camerer et al. 2018). Indeed, a survey conducted by Nature in 2016 revealed that a large proportion of empirical scientists, hailing from fields as diverse as chemistry, biology, physics, earth sciences, and medicine, had failed to replicate other researchers’ results (Baker 2016). This is a problem. Our understanding of the world relies on facts. Charles Darwin understood the perniciousness of false facts, writing in The Descent of Man, “False facts are highly injurious to the progress of science, for they often endure long; but false views, if supported by some evidence, do little harm, for every one takes a salutary pleasure in proving their falseness; and when this is done, one path towards error is closed and the road to truth is often at the same time opened” (1871, 385). What he is saying in his overwrought Victorian prose is that we shouldn’t worry too much about false theories, because academics are competitive and love to take each other down a peg by demonstrating logical inconsistencies in one another’s theories. Since logic is a common language in science, the competition for theoretical explanations remains relatively healthy. However, any coherent explanation must rely on a firm foundation of facts. If our facts are false, we end up wasting our time arguing about how best to explain something that isn’t even true. Science involves both theory building and fact finding. This chapter focuses on the fact-finding aspect, and as a shorthand the search for facts is what I will mean henceforth by the term “science.” In this sense, science can be viewed as a process of signal detection for facts. We wish to discover true associations between variables. However, our methods for measurement are imprecise. We sometimes mistake noise for signal and vice versa. How we conceptualize the scientific enterprise shapes how we go about the business of conducting research, as well as how we strive to improve scientific practices. 
In this chapter, I’ll present several models of science. I’ll begin by showing ways in which the classic “hypothesis testing” model of science is misleading and leads to flawed inferences. As a remedy, I’ll discuss models that treat science as a population process, with important dynamics at the group level that trickle down to the individual practitioners. Science that is robust and reproducible depends on understanding these dynamics so that institutional programs for improvement can specifically target them.

A First Model of Science: Hypothesis Testing

1.1. A first model of science. Hypotheses are investigated and results, with characteristic error rates, are recorded. The real epistemic state of each hypothesis, true or false (T or F), is unknowable except through this sort of investigation.

Early in our schooling, many of us are taught a simple and somewhat naive model of science as “hypothesis testing” (figure 1.1). The scientist comes up with a hypothesis about some natural system. She cannot directly infer the essential epistemic state of the hypothesis, whether it is true or false. Instead, she investigates the hypothesis by experimentation or other empirical means, which results in either a positive result in support of the hypothesis or a negative result indicating a lack of support. The alignment between her results and the epistemic state of the hypothesis is necessarily imprecise. There is some risk of a false positive, α = Pr(+|F), as well as a false negative, β = Pr(−|T). These outcomes are sometimes called Type 1 and Type 2 errors, respectively.2 This uncertainty forces us to ask: How confident should our scientist be in her results? Consider the following scenario. Dr. Pants investigates one of her many hypotheses. Using her well-tested method, the probability that the test will yield a false positive result is 5%. That is, Pr(+|F) = 0.05. If the hypothesis is true, the probability that the test will correctly yield a positive result is 50%. That is, Pr(+|T) = 0.5. The test is conducted, and the result is positive! Now, what is the probability that Dr. Pants’s hypothesis is correct? You may be tempted to answer 95%. After all, the probability of a false positive is 5%, and it’s clear that 100 − 5 = 95. If this is your answer, you are not alone. When a version of this question was posed to students with scientific training, 95% was indeed the most common answer, at least in years past (Gigerenzer and Hoffrage 1995). Why is this wrong? Recall that we are looking for the probability that the hypothesis is true conditional on obtaining a positive result, Pr(T|+). Fortunately, we have a handy mathematical tool for computing exactly this sort of conditional probability. Using Bayes’ Theorem, we can write out our conditional probability as follows:



Pr(T|+) = Pr(+|T) Pr(T) / [Pr(+|T) Pr(T) + Pr(+|F) Pr(F)]

You’ll notice right away that there’s a term in this equation I haven’t provided: Pr(T). This is the prior probability that any hypothesis being tested by Dr. Pants is true, often called the base rate. We ignore the base rate at our peril.

A Second Model of Science: Hypothesis Selection and Investigation

Imagine now that Dr. Pants tests not one but one hundred hypotheses. Of these, ten are true and ninety are false. If you want a more concrete example, imagine Dr. Pants runs a behavioral genetics lab. She is looking for single nucleotide polymorphisms (SNPs) that correlate with a heritable behavioral disorder. She tests one hundred SNPs, of which ten are actually associated with the disorder. Thus, the base rate is b = 0.1. If this seems low, consider that for many disciplines, the base rate may actually be much lower. Every association tested, every statistical test run, is a hypothesis that may be supported. Dr. Pants tests her hypotheses using the method described above, with α = 0.05 and β = 0.5. So what is the probability that a hypothesis with a positive result actually reflects a true hypothesis? In this case, it’s roughly 50%, not 95% (figure 1.2). And the lower the base rate, the lower this posterior probability gets. Worse yet, in reality we can never know for certain the epistemic states of our hypotheses, nor can we easily estimate the base rate. Our results are all we have. So now we have a second model of science that includes the process of hypothesis selection as well as the experimental investigation of that hypothesis (figure 1.3).3 We can capture this model in terms of the posterior probability that a positive result indicates a true hypothesis using the notation introduced so far:
Pr(T|+) = (1 − β)b / [(1 − β)b + α(1 − b)]
This Bayesian model of science was introduced by Ioannidis (2005) in his now classic paper, “Why Most Published Research Findings Are False.” The analysis is straightforward. If the base rate, b, is low, then even a moderate false positive rate (such as 5%) will lead to a low posterior probability and a large number of false positives. One concern about this model is that it treats each hypothesis in isolation. It ignores the social and public aspect of science. Scientists don’t just produce results; they also try to publish them, and some results are easier to publish than others. Once published, results can then be replicated, and with new information comes the opportunity for new estimates of the epistemic states of the underlying hypothesis.
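The arithmetic behind these claims is easy to check numerically. The short Python sketch below (my own illustration, not code from the chapter) computes Pr(T|+) from the base rate b, the false positive rate α, and the power 1 − β, reproducing the Dr. Pants example and showing how the posterior collapses as the base rate falls.

```python
def posterior(b, alpha, power):
    """Pr(T|+): probability a hypothesis is true given a positive result.

    b     -- base rate, Pr(T)
    alpha -- false positive rate, Pr(+|F)
    power -- Pr(+|T), i.e., 1 - beta
    """
    return (power * b) / (power * b + alpha * (1 - b))

# Dr. Pants: b = 0.1, alpha = 0.05, power = 0.5
print(round(posterior(0.1, 0.05, 0.5), 3))  # ~0.526, not 0.95

# The lower the base rate, the lower the posterior probability:
for b in (0.5, 0.1, 0.01):
    print(b, round(posterior(b, 0.05, 0.5), 3))
```

With a base rate of 1%, the same method yields a posterior below 10%: most positive results would then be false positives, exactly the Ioannidis point.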

1.2. The importance of base rate. Left: 100 hypotheses are tested, of which 10 are true (the base rate is b = 0.1). Right: 50% of the true hypotheses and 5% of the false hypotheses yield positive results, producing a posterior probability that a positive result is actually true of approximately Pr(T|+) = 0.5.

1.3. A second model of science. Investigation is preceded by hypothesis selection. The inner circles indicate the real epistemic value of each hypothesis. Black indicates false, white indicates true. The gray outer circle represents the fact that these epistemic values are unknown before investigation.



1.4. A third model of science. After hypothesis selection and investigation, results are communicated. Some results end up published and become part of the literature, which can accrue through replication. This is indicated by the rectangle containing concentric circles. Each layer represents a positive (white) or negative (black) result. Results that are not published end up in file drawers, unknown to the scientific community.

A Third Model of Science: The Population Dynamics of Hypotheses

The first two models of science both portray a science in which each hypothesis is investigated in isolation. But consider what happens to a result once the hypothesis has been investigated. The researcher will sometimes decide to publish the result. I say “sometimes” because some results are never published, especially when they don’t support the hypotheses being tested. These results end up in the “file drawer” (Rosenthal 1979). Once published, the studies supporting a given hypothesis can be replicated, whether by other labs or by the one that generated the original result. Our third model conceptualizes hypothesis testing as a dynamical system involving a large number of hypotheses being tested by a large number of scientists (figure 1.4). A scientist first selects a hypothesis to test. A novel hypothesis is true with probability b, the base rate. The hypothesis is investigated, producing results. These results can then be disseminated to the scientific community via publication. This stage is important, because not all results are published with equal probability. Novel positive results are usually the easiest to publish. Negative results are published at much lower rates (Fanelli 2012), possibly due to being rejected by journal editors but also because they are viewed as carrying low prestige for researchers and are therefore rarely submitted (Franco,


Malhotra, and Simonovits 2014). Once findings are published, they can be replicated. The results can then be added to the literature, but only if they are published. As results accrue, each hypothesis is associated with a record of positive and/or negative results in the published literature. Because some types of results are more likely than others to be published, the published literature likely reflects a biased record of investigation. This dynamical model was introduced and analyzed in an earlier paper (McElreath and Smaldino 2015). Our analysis focused on the probability that a hypothesis was true, conditional on its publication record.4 For simplicity, we operationalized the publication record as a tally of the net positive findings—that is, the number of positive results minus the number of negative results in the published literature. Although this conditional probability was influenced to some degree by all of the model’s parameters, we found that the two parameters exerting the largest influence—by far—were the base rate, b, and the false positive rate, α. If the base rate is high (so that most tested hypotheses are true) and the false positive rate is low (so that most positive results reflect true hypotheses), then a single positive result likely reflects a true hypothesis. However, as base rate decreases and false positive rate increases—to values that, I must add, I view as quite realistic for many disciplines—then more and more successful replications are necessary to instill the same amount of confidence in the truth of a hypothesis. Above all, this indicates that replication is important for a healthy science (Smaldino 2015). Indeed, our analysis showed that replication studies are valuable even when they use designs with different methodological power than the original investigations. More than that, we shouldn’t be surprised that some results fail to replicate. Some erroneous results are inevitable. 
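The role of the publication record can be illustrated with a toy Monte Carlo. The sketch below is a deliberately simplified stand-in for the McElreath–Smaldino model (the publication probabilities and replication count are my own arbitrary choices): it simulates many hypotheses, sends some negative results to the file drawer, and then estimates how often a hypothesis with a given net tally of published results is actually true.

```python
import random

def simulate(n_hyp=200_000, b=0.1, alpha=0.05, power=0.5,
             pub_pos=1.0, pub_neg=0.2, n_rep=3, seed=1):
    """Estimate Pr(hypothesis is true | net tally of published results).

    Each hypothesis gets one original study plus n_rep replications.
    Positive results are published with probability pub_pos, negative
    results with probability pub_neg (the file-drawer effect).
    """
    rng = random.Random(seed)
    tallies = {}  # net tally -> [number true, total count]
    for _ in range(n_hyp):
        is_true = rng.random() < b
        tally = 0
        for _ in range(1 + n_rep):
            positive = rng.random() < (power if is_true else alpha)
            published = rng.random() < (pub_pos if positive else pub_neg)
            if published:
                tally += 1 if positive else -1
        rec = tallies.setdefault(tally, [0, 0])
        rec[0] += is_true
        rec[1] += 1
    return {t: c[0] / c[1] for t, c in sorted(tallies.items())}

probs = simulate()
for tally, p in probs.items():
    print(f"net tally {tally:+d}: Pr(true) ~ {p:.2f}")
```

Even in this crude version, a single net positive result is weak evidence when the base rate is low, while a hypothesis whose every study replicates positively is almost certainly true, echoing the analysis described above.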
When methods are imperfect, both false positives and false negatives may be common. That said, the model also illustrates that improvements to the practices and culture of science should focus on factors that increase the base rate of true hypotheses and lower the rate of false positive results, so as to decrease the number of false facts in the published literature. A number of factors lead to false discovery. False facts are more common when:

- Studies are underpowered, because small sample sizes tend to lead to false positives and ambiguous results.
- Negative results aren’t published, distorting the publication record by eliminating disconfirmatory evidence.
- Statistical techniques are misunderstood, leading to false positives and ambiguous results.




- Surprising results are the easiest to publish, because such results have a low base rate of being true, given priors to the contrary.

Although the factors in this list may be new to some readers, scientists have, in general, been aware of these issues for decades. Why, then, isn’t science better? Understanding how scientific practice—and not just scientific knowledge—changes over time requires a new model that includes the scientists themselves in the model dynamics. Before introducing such a model, I’ll need to say a few words about some of the incentives that structure human social behavior.

A Brief Interlude on Incentives

Science is the search for truth about the natural world, for a better understanding of our universe. Scientists, however, are also human beings who need steady employment and the resources to conduct their research. Obtaining those jobs and securing that funding is far from trivial these days. There are currently far more PhDs looking for employment in academia than there are permanent positions for them to fill. In several disciplines, including biomedicine and anthropology, the creation of new PhDs outpaces the creation of new faculty positions by a factor of five (Ghaffarzadegan et al. 2015; Speakman et al. 2018). More generally, the number of open faculty positions in scientific disciplines is only a small fraction of the number of total PhDs awarded each year (Cyranoski et al. 2011; Schillebeeckx, Maricque, and Lewis 2013). This creates a bottleneck at which selection is nonrandom. In academic science, this selection pressure is often linked to an individual’s publication history, as evinced by the clichéd admonition to “publish or perish.” Successful scientists are certainly publishing more. Since just the early 2000s, the number of publications at the time of hiring for new faculty has more than doubled in fields such as evolutionary biology (Brischoux and Angelier 2015) and cognitive psychology (Pennycook and Thompson 2018).
A large study of over twenty-five thousand biomedical scientists showed that scientists who ended up as principal investigators (PIs) consistently published more papers and placed them in higher-impact journals than those researchers who ended up leaving academia (van Dijk et al. 2014). It may not be immediately obvious that preferential rewards for productivity and impact factor are bad things. Indeed, it seems that we should want scientists to be productive and we should want their work to have a wide impact. Don’t we want our scientists to be awesome? The difficulty is that awesomeness is in reality quite complicated and multidimensional. The importance of research may not be manifest for


quite some time, and a lack of productivity can just as easily reflect careful study of a difficult problem as it can a lack of drive. This difficulty becomes a serious problem when awesomeness is assessed with crude, quantitative metrics like paper count, journal impact factor, and h-indices. Savvy social scientists have long been aware of this danger. As Campbell (1976, 49) noted, “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” When incentives to publish drive scientists, science itself may become distorted. There is evidence that scientists do, in fact, respond to incentives. In China, as in several other countries, PIs are often given cash rewards for publishing in top English-language journals. This system began in the early 1990s with small rewards, but the size of the rewards has grown tremendously. As of 2016, Chinese researchers were paid, on average, $984 for a paper in PLOS ONE, $3,513 for a paper in the Proceedings of the National Academy of Sciences, and a whopping $43,783 for a first-author paper in Science or Nature (Quan, Chen, and Shu 2017). Correspondingly, between 2000 and 2009, Chinese submissions to the journal Science nearly quadrupled (Franzoni, Scellato, and Stephan 2011). China was recently declared the world’s largest producer of scientific papers (Tollefson 2018). Such cash-for-papers incentives can be found in several other countries, including India, Korea, Malaysia, Turkey, Venezuela, and Chile (Quan, Chen, and Shu 2017). The West is not immune either. For example, I recently had dinner with some American psychologists, who told me with pride about how much their graduate students published. Their program provided a cash prize of several hundred dollars for the best student paper each year.
When I asked how they assessed the best paper, they told me that a first-author publication in a top journal was the best indicator. “Do you read all the papers?” I asked. The answer was no; the journal’s reputation was deemed a sufficient mark of quality. It is not hard to see how students in this program are incentivized not only to produce papers but to produce a particular type of paper. Evidence that scientists respond to incentives can be more subtle. Vinkers, Tijdink, and Otte (2015) looked at relative word frequencies in PubMed abstracts between 1974 and 2014. They found dramatic increases in the frequencies of positive, congratulatory words. Frequencies of the words “innovative” and “groundbreaking” had each increased 2500%. Frequency of “novel” had increased 4000%. And frequency of “unprecedented” had increased 5000%. There are, of course, two possible explanations for this shift in word frequencies. The first is that




contemporary scientific research is actually twenty-five times more innovative than it was forty years ago. The other, a smidge more likely, is that scientists are responding to incentives to distinguish their work as important and pathbreaking. A system that rewards novel, innovative results can—and does—incentivize cheating. Recent examples include Jan Hendrik Schön in physics, Diederik Stapel in psychology, and Brian Wansink in nutrition science. A personal favorite is a case of fraud uncovered by the editors of the British Journal of Clinical Pharmacology. The authors of a paper claiming impressive results suggested as reviewers several prominent scholars in their field. These scholars were contacted as reviewers, and all returned glowing reviews within just a few days. One of the editors grew suspicious at the quick responses from the busy big-shot scientists, and contacted them at the email addresses listed on their university web pages. They were all surprised by the emails, because none of them had heard of the paper in question. The explanation: when the authors submitted their manuscript, they had provided fake email addresses for their suggested reviewers and submitted forged reviews of their own paper (Cohen et al. 2016).5 Fraud surely happens, but it’s also probably the exception rather than the rule. Most scientists are well-meaning people who want to learn about the world. The problem is that incentives for maximizing simple quantitative metrics, which act as proxies for more meaningful but multifaceted concepts like productivity and influence, can be detrimental even if all actors are well intentioned. To help explain why, we’ll turn to a new model of science that includes the scientists as well as the hypotheses.
A Fourth Model of Science: Variation, Heritability, and Selection

Science is a cultural process that, like many cultural processes, evolves through a Darwinian process (Richerson and Boyd 2005; Mesoudi 2011; Smaldino 2014; Smaldino and McElreath 2016). Philosophers of science including Campbell (1965), Popper (1979), and Hull (1988) have discussed how scientific theories evolve by variation and selective retention. But scientific methods can also evolve. Darwinian evolution requires three conditions to occur:

1. There must be variation.
2. That variation must have consequences for survival or reproduction.
3. Variation must be heritable.


Research practices and methods certainly vary. That variation leads to differences in the sorts of results that are produced and, consequently, the publications that arise from those results. These publications have consequences in determining who is successful in terms of getting hired and promoted, securing grants, attracting graduate students and postdocs, and placing those trainees in positions heading their own research groups. And variation in practice is partly heritable, in the sense that trainees acquire research habits and statistical procedures from mentors and peers. Researchers also acquire research practices from successful role models in their fields, even if they do not personally know them. Therefore, when researchers are rewarded primarily for publishing, habits that promote publication are likely to be passed on. If we want to understand how we might minimize false discoveries, we need a model of science that includes variation among scientists. This model has two phases: Science and Evolution (figure 1.5). In the Science phase, each research lab chooses and investigates hypotheses and tries to publish their results, just as in our third model of science. However, the methods used by each lab can differ, which affects the rate at which they conduct research and the probability of certain results. More specifically, consider a population of labs, all conducting research. We make the following assumptions:

- Each lab has characteristic methodological power, Pr(+|T). Increasing power also increases false positives, unless effort is exerted. This is because it is easy to have perfect power if every result is positive, but correctly eliminating the false hypotheses requires additional work.6
- Additional effort also increases the time between results because each study requires more work.
- Negative results are harder to publish than positive results.
- Labs that publish more are more likely to have their methods “reproduced” in new labs.
1.5. A fourth model of science. Dynamics occur in two phases: Science and Evolution. In the Evolution stage, labs compete for research positions. A lab’s methods are represented by its shading, and its prestige is represented by its size. When a new position opens, it is more likely to be filled by someone using the methods of more prestigious labs.

This model was first presented and analyzed in another paper with Richard McElreath (Smaldino and McElreath 2016). First, we found that if effort is held constant and power is allowed to evolve, power evolves to its maximum value and the false discovery rate (the proportion of published results that are incorrect) skyrockets. Everything is deemed “true,” and we have no information about anything. This scenario is pretty unrealistic. We have fairly good ways of assessing the power of research methods, and no one would ever allow this to happen. However, effort is notoriously difficult to assess. If we hold power constant and allow effort to evolve, we find that effort reliably evolves to its minimum value, and once again the false discovery rate balloons. To reiterate, this dynamic requires no cheating or strategizing on the part of our agents, only that publication is a determinant of job placement. We have referred to this dynamical process as “the natural selection of bad science” (Smaldino and McElreath 2016). What does this mean? It means that if our model of science is at least moderately realistic, and incentives for publishing do drive selection on research methods, then we should see evidence for impediments to the improvement of scientific methods on the timescale of generations. If, on the other hand, incentives are rewarding methodological rigor, we should see a steady increase in the quality of methods for scientific inquiry. In 1967, Paul Meehl cautioned about the misuse of p-values, warning that scientists were wrongly interpreting their meaning and consequently generating lots of false positives (Meehl 1967). In 2016, the American Statistical Association published its “Statement on p-Values,” cautioning about their misuse and warning that scientists were wrongly interpreting their meaning and consequently generating lots of false positives. The ASA bemoaned, “Let us be clear. Nothing in the ASA statement is new. Statisticians and others have been sounding the alarm about these matters for decades, to little avail” (Wasserstein and Lazar 2016, 130). In 1962, Jacob Cohen published a meta-analysis of abnormal and social psychology experiments, noting the frustratingly low statistical power of most published research (Cohen 1962). He cautioned

Five Models of Science, Illustrating How Selection Shapes Methods

that many studies were not sufficiently powered to provide adequate confirming or disconfirming evidence, leading to an excess of spurious results. In the late 1980s, two studies provided new meta-analyses investigating whether there had been any improvement to the average statistical power of psychological research (Sedlmeier and Gigerenzer 1989; Rossi 1990). They found no improvement. Recently, Richard McElreath and I updated those studies and confirmed that there was no improvement to the average statistical power in the social and behavioral sciences through 2011, with an average power to detect small effects of 0.24 (Smaldino and McElreath 2016).7 Szucs and Ioannidis (2017) provided a focused study of ten thousand papers published in psychology, medicine, and cognitive neuroscience journals between 2011 and 2014 and similarly found very low power in all three fields. The natural selection of bad science appears to be pernicious.

I previously noted the importance of replication for assessing the true epistemic value of hypotheses. Could replication similarly help to curb the degradation of methods? One particularly interesting, if extreme, suggestion came from Rosenblatt (2016), who proposed that the authors of each published paper, or their host institutions, sign a contract committing them to pay a fine if their studies fail to replicate. Let me be clear: this is a terrible idea. As stated earlier, occasional failure to replicate is to some extent the price of doing business in scientific research. However, it is one of the more concrete suggestions for using replication to improve science. So we put it—or something like it—into the model. Under our replication extension, all labs committed a proportion r of their investigations to replicating previously published results.
We assumed that all replications were publishable regardless of the result and carried half the prestige of a novel positive finding.8 If another lab successfully replicated a finding, the lab that originally published it got a small boost in prestige. If another lab failed to replicate a finding, the original authors suffered a tremendous loss of prestige. To be honest, we thought this extreme intervention would curb the decline in effort and the runaway false discovery rate. In hindsight, it is clear why it didn't. Although some labs did suffer a huge loss of prestige, the most successful labs were still those that cut corners and avoided being caught.

Incentive structures that push scientists to boost quantitative metrics like publication counts and impact factors can lead to the degradation of methods. This dynamic requires no fraud or ill intent on the part of individual actors, only that successful individuals transmit their methods.9 From this, we might conclude that changing individual behavior—each of us improving our methods—is not sufficient to improve scientific methods; this requires institutional change. Specifically, it requires that the selection bottlenecks of hiring and promotion are not overly focused on those metrics but instead provide a more nuanced assessment of researcher quality that maintains high methodological integrity.

Unfortunately, institutional change is far from easy. For the most part, institutions are not meant to change easily. They provide a stable framework that structures social interactions and exchanges and ensures some consistency in the operation of a society in the absence of enforcement by specific individuals (North 1990). This means that we run into trouble when our institutions are unhealthy. If we are to change the institutional incentives for publishing in academic science, we should be aware that such change will likely be slow.

Is there anything else that can be done in the short run? There are many efforts currently under way to improve the norms and institutions of academic science regarding rigor and reproducibility, often under the banner of the "Open Science" movement (Nosek, Spies, and Motyl 2012; Munafò et al. 2017). Some of these new norms include preregistration and registered reports (Nosek and Lakens 2014; Chambers 2017), preprints (Bourne et al. 2017; Smaldino 2017b), double-blind and open peer review (Mulligan, Hall, and Raphael 2013; Okike et al. 2016; Tomkins, Zhang, and Heavlin 2017), and better training in methods, statistics, and philosophy of science. At the same time, funding agencies are increasingly paying attention to what gets funded, and some have been shifting how they fund new research projects. How do these developments influence the conclusions from our fourth model of science?

A Fifth Model of Science: Follow the Money

Our fourth model of science makes several pessimistic—if realistic—assumptions about the way academic science works in our era. However, changes in just the last few years prompt us to challenge some of these.
I want to focus on three specific assumptions and discuss what happens when we relax or alter them.

Assumption 1: Publishing negative results is difficult or confers little prestige. This assumption is realistic, because negative results are rarely published (Fanelli 2012) or even submitted (Franco, Malhotra, and Simonovits 2014). However, there is an increasingly large push to publish negative results. Many journals now accept registered reports, in which the research plan is peer reviewed before a study is conducted. Once approved, the paper's acceptance is contingent only on adherence to the submitted plan and not on the character of the results (Nosek and Lakens 2014; Chambers 2017). A recent study by Allen and Mehler (2019) found that among studies using registered reports, 61% of results did not support the authors' original hypotheses, compared to estimates of 5%–20% of null findings in the wider literature.10 What if publication bias against negative results were eliminated?

Assumption 2: Publishing positive (confirmatory) results is always possible. This assumption ignores the corrective role of peer review in maintaining high-quality research. The assumption is realistic, because there is little evidence that peer reviewers can act as effective gatekeepers against false discovery. The many failed replications discussed earlier in this chapter testify to that. Peer review may in many cases be more about maintaining group norms than about weeding out error. There is widespread evidence that peer reviewers can be biased toward prestigious individuals and institutions and against authors who are women and underrepresented minorities (Budden et al. 2008; Tomkins, Zhang, and Heavlin 2017). If peer review were reliable, we should expect consistency among reviewer recommendations. Instead, a number of studies have found low correlation between reviewer decisions on grant panels (Cole and Simon 1981; Marsh, Jayasinghe, and Bond 2008; Mutz, Bornmann, and Daniel 2012), conference proceedings (Langford and Guzdial 2015; Deveugele and Silverman 2017), and journal articles (Peters and Ceci 1982; Cicchetti 1991; Nicolai, Schmal, and Schuster 2015). Nevertheless, we increasingly see efforts to improve the conditions that facilitate effective peer review. Registered reports remove biases based on the novelty or expectedness of a study's results (Nosek and Lakens 2014; Chambers 2017). Double-blind peer review aims to reduce biases, including those based on prestige, familiarity, gender, race, or ethnicity (Mulligan, Hall, and Raphael 2013; Okike et al. 2016; Tomkins, Zhang, and Heavlin 2017).
Journals increasingly require or incentivize open data and methods, which improves the ability of peer reviewers to assess results, and the increased use of repositories such as OSF (Open Science Framework) and GitHub has helped to facilitate this behavior. Open peer review and the increased use of preprint servers also allow a greater number of critical eyes to read and comment on a manuscript before it is published (Bourne et al. 2017; Smaldino 2017b). And better training in statistics, logic, and best research practices—as evidenced by the popularity of books, massive open online courses, podcasts, symposia, and conferences on Open Science—may promote more informed reviews. What if peer review were effective at filtering out false discovery?

Assumption 3: Research productivity is constrained only by the ability to complete projects. This assumption ignores the role of funding, which is required for much scientific research. This assumption was justified




1.6. A fifth model of science. In addition to the Science and Evolution phases, labs also compete for grant funding, which enables them to conduct more research.

by the desire to ignore differences in access to funding and focus on the bottlenecks at hiring and promotion. Moreover, if one assumes that success in securing grant funding results from success in the quantity and prestige of one's publications, then including explicit funders in the model is unnecessary. Instead, what if funders ignored publication records, or even focused on funding projects with the most rigorous methods?

The norms of hiring and promoting researchers based on simple metrics are entrenched in deeply rooted tradition and diffused across many academic institutions; they will not be changed quickly or easily. In contrast, the recent changes highlighted above are occurring rapidly, due to greater top-down control from journals and funders. To investigate the consequences of these changes, we will once again revise our model of science.

We again consider a finite population of labs. Each lab has a characteristic methodological rigor (or lack thereof), which is linked to the false positive rate of the results it obtains. In our fourth model, a lab's productivity was limited only by its rigor. This time, investigating hypotheses requires funding. Each lab is initialized with some start-up funds it can use to conduct research. Once these funds are exhausted, additional funds must be acquired from grant agencies.
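A minimal sketch of this setup, with invented names and a flat per-study cost (the published model's accounting may differ):

```python
from dataclasses import dataclass

STUDY_COST = 1.0  # assumed flat cost per investigation

@dataclass
class Lab:
    rigor: float        # one minus the lab's characteristic false positive rate
    funds: float        # start-up funds; research halts when these run out
    publications: int = 0

    def can_research(self) -> bool:
        return self.funds >= STUDY_COST

    def run_study(self) -> None:
        """Spend funds on one investigation; outcome handling (publication,
        peer review) would happen elsewhere in the full model."""
        if not self.can_research():
            raise RuntimeError("out of funds: the lab must win a grant first")
        self.funds -= STUDY_COST
```

A lab with two units of start-up funding, for example, can run exactly two studies before it must enter the Grant Seeking phase described next.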


To our two phases of Science and Evolution, we add a third: Grant Seeking (figure 1.6). In the Grant Seeking phase, some of the labs apply for funding, and the one that best matches the funding agency's allocation criteria is awarded a grant. We might consider any number of strategies. My colleagues and I have considered those based on publication quantity, funding labs at random, and targeting those labs with the most rigorous methods. The Science phase looks quite similar to that of our previous models, having three stages—hypothesis selection, investigation, and communication. Here we may also take the opportunity to study changes to peer review and publication bias as discussed. In the communication stage, positive results are always published, and negative results are published with probability p. Erroneous results (in which the result does not reflect the real epistemic state of the hypothesis) are successfully blocked during peer review with probability r. The Evolution phase works exactly as it did in the previous model, such that labs with more publications are most likely to transmit their methods to the next generation. This is worth repeating: the selection pressure for publication quantity is still present. For a detailed analysis of this model, see Smaldino, Turner, and Contreras Kallens (2018). Here, I summarize our main results.

First, we can ask whether, in the absence of any contributions from funding agencies, curbing publication bias and improving peer review can promote substantial improvements to reproducible science. There is bad news, then good news, and then bad news again. The bad news is that, taken one at a time, each of these improvements must be operating at nearly maximum levels for any improvements to occur. That is, negative results must be published at the same rate as positive results, and peer reviewers must be nearly perfect in detecting false discoveries.
The good news is that the effects of these two interventions are additive, so that moderate improvements to both publication bias and peer review can decrease the rates of false discovery to some extent. The bad news (again) is that this effect operates on the published literature: more published results are true, but it does little to improve the quality of the scientists who produce that published research, at least in terms of methodological rigor. We still get bad scientists; it's just that institutions won't allow them to publish their worst work. This is doubly troubling if we then expect those same corner-cutting researchers to perform exemplary peer review.

We next turned to an exploration of funding strategies. We first studied very simple strategies and found that a strategy of purely random funding allocation is little better than directly funding labs based on publication history. We did find that if funding agencies could effectively target those research groups using the most rigorous methods, the degradation of research quality could be completely mitigated. This is, however, a big "if." Rigor is notoriously difficult to assess, and it is probably quite unrealistic to assume that funders could consistently and accurately infer the quality of a lab's methods. So it appears at first glance that random allocation is unhelpful and that funding focused on rigor works but is probably a pipe dream.

These results were discouraging, to say the least. However, we then started paying more attention to the emerging literature on modified funding lotteries, which incorporate aspects of funding strategies focused on both randomness and rigor. Recently, a number of scholars and organizations have supported a type of lottery system for allocating research funds (Barnett 2016; Fang and Casadevall 2016; Bishop 2018; Avin 2018; Gross and Bergstrom 2019), usually proposing that a baseline threshold for quality must first be met in order to qualify projects for consideration in the lottery. Although rigor may be difficult to assess precisely, at least some information about the integrity of a research lab is often available. Such lotteries may confer advantages not directly related to reproducibility, including (1) promoting a more efficient allocation of researchers' time (Gross and Bergstrom 2019); (2) increasing the funding of innovative, high-risk/high-reward research (Fang and Casadevall 2016; Avin 2018); and (3) reducing gender and racial bias in funding, as well as systemic biases arising from repeat reviewers or proposers coming from elite institutions (Fang and Casadevall 2016). Such biases can lead to cascading successes that increase the funding disparity between those who, through luck, have early successes and those who don't (Bol, de Vaan, and van de Rijt 2018). However, the potential influence of modified lotteries on reproducibility had not previously been studied.
We investigated a funding strategy in which funds were awarded randomly to the pool of qualified applicants. Applicants were qualified if their methodological rigor (equivalent to the inverse of their characteristic false positive rate) did not fall below a threshold. We found that this strategy could be extremely effective at reducing false discoveries, even when using fairly modest thresholds (such as restricting funding to labs with false positive rates below 30%). Even better, when modified lotteries were paired with improvements to peer review and publication bias, the model produced dramatic improvements to both the scientific literature and the scientists producing that literature. This indicates that funders who prioritize research integrity over innovation or productivity may be able to exert a positive influence over the landscape of scientific research above and beyond the individual labs they fund.
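The modified lottery just described can be sketched as follows; the dictionary key `fpr` and the default threshold value are illustrative, not taken from the published model.

```python
import random

def modified_lottery(labs, threshold=0.30, rng=None):
    """Award one grant uniformly at random among labs whose false
    positive rate ("fpr") falls below the qualifying threshold.
    Returns None when no lab qualifies."""
    rng = rng or random.Random(0)
    qualified = [lab for lab in labs if lab["fpr"] < threshold]
    return rng.choice(qualified) if qualified else None

# Lab B's false positive rate of 0.50 disqualifies it from the draw:
labs = [{"name": "A", "fpr": 0.10},
        {"name": "B", "fpr": 0.50},
        {"name": "C", "fpr": 0.25}]
winner = modified_lottery(labs)
```

The key design choice is that rigor only gates entry to the pool; among qualified labs, the award is pure chance, so publication records exert no pull at this bottleneck.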


Many of the interventions heralded by the Open Science movement—including registered reports, preprints, open data, and the like—have undeniable value. This model indicates that these interventions are likely to be insufficient to sustain the persistence of high-quality research methods as long as there are strong incentives for maximizing simple quantitative metrics like publication quantity and impact factor, which act as proxies for desirable but complex and multifaceted traits. On the other hand, the model also provides room for cautious optimism. Even in the face of strong selective pressures for publication at the key bottlenecks of hiring and promotion, science may nevertheless be improved by countervailing pressures at other bottlenecks, such as the competition for funding, if they promote rigor at the cost of productivity.

Discussion

This is a chapter about how institutional incentives shape behavior in academic science. Methods are shaped by cultural selection for practices that help researchers optimize the criteria on which they are judged, hired, and promoted. Selection can shape practices even in the absence of strategic behavior to change those practices. If methods are heritable, selection is sufficient to be damaging. The improvements promoted by the Open Science movement, as well as by well-intentioned funding agencies, are important. The models indicate that they can do some good. Beyond what is captured by the models, these practices may produce normative shifts by becoming associated with prestige and by promoting the informal punishment of transgressors. However, the models also indicate that Open Science practices are not sufficient if selection continues to favor easily measured evaluation metrics over more holistic, multidimensional assessments of quality. This conclusion forces us to consider exactly what properties we want in our academic scientists. This is also a chapter about cultural evolution.
In the last few decades, a new interdisciplinary field has emerged. It has provided formal models, increasingly backed by empirical research, of how individuals maintain cooperative participation (e.g., Boyd and Richerson 1992; Hooper, Kaplan, and Boone 2010), how they acquire and transmit cultural information (e.g., Henrich and Gil-White 2001; Kendal et al. 2018), and how the population dynamics of cultural traits unfold as a result (e.g., Boyd and Richerson 1985, 2002; Mesoudi 2011; Turchin et al. 2013; Waring, Goff, and Smaldino 2017).

In October 2018, the Cultural Evolution Society held its second meeting in Tempe, Arizona, with over two hundred participants representing psychology, anthropology, archaeology, behavioral ecology, genetics, linguistics, economics, sociology, engineering, and mathematics. It behooves those who are interested




in the science and sociology of science to pay attention to this field, for its primary focus is cultural stability and the dynamics of cultural change. It also appeared to me, as a participant, that much of the science presented was of unusually high quality. It is possible that, when one has to present work to those unfamiliar with the methodological norms of a small subfield, there is a strong incentive to be extraordinarily thorough and transparent. Although field-specific expertise is invaluable in assessing research, it may also be that cross-disciplinary communication has an important role to play in maintaining methodologically rigorous research.

This is also a chapter about models. I have presented a series of five models, each of increasing complexity, to help us understand and explain the process and cultural phenomenon of scientific research. How we model science shapes our ability to identify both problems and solutions. Even at their most complex, models involve drastic oversimplification. The models I have presented focus on hypothesis testing—the fact-finding portion of science—and ignore the critical role of theory building. In these models, hypotheses are independent of one another, rather than interconnected. Hypotheses are formulated as clearly true or false, and results are formulated as unambiguously positive or negative. The later models characterize competition as being solely about publication, whereas network effects and research topics also drive success. Perhaps most importantly, the models ignore innovation and the social significance of results. Taken in isolation, these models represent a fairly crude way of thinking about science. However, the point of a model is not to capture all the nuances of a system. The point of a model is to be stupid (Smaldino 2017a).
By being stupid, a model clarifies the aspects of the system we should be paying attention to and makes clear the aspects we do not include, forcing us to consider their influence on a system we now at least partially understand. Models are not the sum total of our understanding, but they can scaffold our imaginations toward a richer and deeper understanding of complex systems (Haldane 1964; Schank, May, and Joshi 2014).

The models I have presented have focused on the factors that make positive results more or less likely to represent true facts. That is an important question about how science works, but it is far from the only question. A more complete understanding of the system requires many models with many perspectives and many different stupid oversimplifications. With them we can consider, for example, how false facts are canonized through publication bias (Nissen et al. 2016; Romero 2016), how funding allocation affects the efficiency of research effort (Avin 2018; Gross and Bergstrom 2019), how group loyalties and gatekeeping institutions can stifle innovative paradigms


(Akerlof and Michaillat 2018), how scientists select important research questions (Strevens 2003; Weisberg and Muldoon 2009; Thoma 2015; Alexander, Himmelreich, and Thompson 2015; Bergstrom, Foster, and Song 2016; O'Connor 2019; Zollman 2018), and how we might develop better theories (Stewart and Plotkin 2021).

To some extent, this is a chapter about how incentives for publication ruin everything and how those incentives have to change. However, it should not be taken as a story about how we academics are powerless in the face of the mighty incentives. It's true that we inherit the culture into which we are born and develop, but it's also true that we collectively create the culture in which we participate. Collectively, we have the power to change that culture.


Chapter 2

Pooling with the Best

Justin P. Bruner and Bennett Holman

Work in epistemology is increasingly focused on better understanding the flow of knowledge and the various social factors that influence the creation and dissemination of knowledge. This reorientation of epistemology is far-reaching, forcing us to reconsider and reevaluate the way in which deliberation occurs in a democratic society, as well as helping us better understand and structure legal and political institutions. To illustrate this more precisely, consider the case of scientific inquiry. Science is a deeply social activity, for even though experiments are sometimes conducted in isolation, the interpretation and sharing of scientific results is governed by social and publication practices. These practices can be evaluated based on the epistemic outcomes they tend to produce. Publication bias, the tendency to publish statistically significant results and "shelve" nonsignificant results, has a dramatic and detrimental effect on our ability to correctly assess the efficacy of certain medical interventions (Begley and Ellis 2012; though recent developments in statistics could mitigate this issue: Bruner and Holman 2019). To provide another example, the norms that determine how credit and esteem are allocated help ensure that the community as a whole distributes its efforts across various projects in an efficient fashion (Kitcher 1990). Social structure matters, and as a result many philosophers of science have investigated how norms and incentives can lead to more or less successful inquiry.

One aspect that has been of particular interest in recent years is the effect communication structure has on experimentation. Scientists communicate their findings and hunches to peers as well as the media in both formal and informal venues (journal articles, interviews, over beer). Kevin Zollman (2007, 2010) has shown how communication networks, and in particular the density of such networks, shape the reliability of group inquiry. Relatedly, Huttegger and Skyrms (2009) have explored how agents can learn to communicate and arrange communications in a mutually beneficial fashion. Here, we build on these and other papers on the topic, bringing into focus a picture of how groups can come to structure their interactions so as to promote reliable group inquiry. In particular, we focus on whether myopic individuals using simple decision-making heuristics will naturally produce communication structures in which those members who are more knowledgeable or competent have greater influence on the group as a whole. This would ensure not only that individuals benefit from pooling information but also that they come to interact and pool with those who are particularly competent in the domain of interest. Being able to do so would provide us with good reason to think that epistemically beneficial group structure can naturally emerge without top-down imposition.

However, ultimately we find that the social structures that naturally emerge from our computer simulations are often second best. Furthermore, our analysis uncovers a hitherto unnoticed epistemic social dilemma. If the community determines the truth or falsity of some proposition on the basis of a vote (using majority rule), a randomly generated communication network may be preferred to a hierarchical network (where the most competent are always consulted). Yet from the perspective of each agent, individual accuracy is maximized when one defers to the most competent in the community. What is epistemically best for the individual and what is best for the group appear to come apart.

In this chapter, we first discuss some of the prior work done in network epistemology.
After discussing the effect network structure has on inquiry, we discuss different accounts of how this structure could naturally arise. Finally, we provide two models of this process and discuss their import for social epistemology.

Network Epistemology

Network epistemology is important in a vast number of philosophical projects. Since social structure is ever present, we would expect any group activity that involves epistemic aims to depend, in large part, on social structure. For instance, in a democracy, individuals have private beliefs which they may pool or share with others, and the structure of the communication environment may make a difference as to whether the group as a whole is reliable (Grim et al. 2018). Likewise,




the social and communication structure of science may influence what projects scientists decide to engage in, which may, in some cases, result in discoveries that otherwise wouldn't have been possible (Zollman 2010). Epistemologists have recognized the enormous importance of communication and have recently turned to simulations and agent-based models to try to better understand how information is pooled and processed by a community. There is no one moral to be found here but instead a suite of different models that can be drawn on to inform discussion in a particular domain. We outline some below.

Perhaps one of the earliest models of network epistemology is due to French (1956), DeGroot (1974), and Lehrer and Wagner (1981), who all contributed to the development of pooling or aggregation models. Briefly, each individual begins with a particular opinion about some question (how likely it is to rain tomorrow, for instance), as well as an array of influences, where each entry in the array specifies how much weight they give the opinion of a particular individual in the group. All individuals use their array and the beliefs of others to update their own belief. If this process is iterated, it can be shown that under certain conditions the community will converge on a stable state where all members believe the same thing. The exact content of what they believe, of course, will be determined by the initial beliefs of the agents and the level of influence the agents grant one another. Details of this model aside, what is most important for us is that individuals have an influence array which corresponds to a social network and that this social network induces consensus.

Bala and Goyal (1998) consider a network game where all individuals have some private information and the exchange of information is mutually beneficial.
A complete network will ensure that all private information is "out in the open," but if establishing and maintaining a line of communication between two agents is costly, then the complete network is inefficient. In cases where connections are costly to establish and information does not deteriorate—that is, the quality of information is the same regardless of how many links stand between two agents—any minimally connected network is efficient. Furthermore, these networks are also stable: no individual wants to change their connections.

Building on Bala and Goyal, Kevin Zollman (2007) developed a suite of network models involving the so-called bandit problem. In Zollman's model, individuals must take one of two possible actions. Individuals endeavor to take the "better" of the two actions and can acquire information about the quality of the options from either personal experience or the experience of their neighbors. Individuals then update their beliefs about the two actions on the basis of prior rounds of experimentation. Somewhat surprisingly, Zollman demonstrates by computer simulation the existence of a trade-off between speed and reliability. Consensus occurs in short order in dense networks with ample connections, but it is more likely to be agreement on the wrong view than if a sparser set of connections had resulted in a slower-forming consensus.

Network Formation

Communication matters, and as the previous section has illustrated, the specific structure of the epistemic group determines, in large part, just how effective communication is. But where does this social structure come from? In the models described above, the existence of a fixed communication structure was taken for granted. Yet communication networks in our social world are fluid and ever evolving. Moreover, there are cases where communication fails to occur even though the sharing of information would be mutually beneficial. Thus, an account of group inquiry should explore not only the effect social structure has on the flow of information but also under what conditions agents will opt to share private information and, if they do, the contours of the resulting communication network.

Consider, for instance, the network game of Bala and Goyal (2000). The optimal communication structure depends on the costs associated with communication and on whether information deteriorates. In the simple case where ties are costly but information does not deteriorate, the circle is an equilibrium. Yet if the community is originally arranged in some alternative fashion, will individuals change their connections in a way that eventually leads to the circle arrangement? Huttegger and Skyrms (2008) provide an early affirmative answer to this question by exploring a model where agents update their links using the Herrnstein reinforcement dynamic, a common model of learning used in psychology.
Cost-minimizing communication structures emerge without the influence of some third party or social planner.1

Similarly, Holman and Bruner (2015) adapt Zollman’s (2007, 2010) network epistemology models to consider cases of dynamic network formation. As in Zollman’s original model, agents confronted by a two-armed bandit update their beliefs based on a series of experiments conducted by themselves as well as by others they are connected to on a social network. Individuals in Zollman’s original model cannot sever ties with their peers and are forced to give equal consideration to all they communicate with. Holman and Bruner relax these assumptions and allow agents to amplify or weaken ties with peers. In particular, an agent strengthens a tie with a peer who produces data consistent with the focal agent’s beliefs (and weakens the connection if this is not the case).
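The Herrnstein reinforcement dynamic mentioned above can be sketched in a few lines. This is an illustrative reconstruction under our own simplifying assumptions (the class, partner names, and payoffs are hypothetical, not taken from Huttegger and Skyrms): each agent keeps a weight for every potential partner, chooses partners with probability proportional to accumulated weights, and adds the payoff received to the chosen partner’s weight.

```python
import random

class ReinforcementLearner:
    """Herrnstein ("matching law") reinforcement over possible partners:
    choice probabilities are proportional to accumulated weights."""

    def __init__(self, partners, initial_weight=1.0):
        self.weights = {p: initial_weight for p in partners}

    def choose_partner(self):
        partners = list(self.weights)
        totals = [self.weights[p] for p in partners]
        return random.choices(partners, weights=totals, k=1)[0]

    def reinforce(self, partner, payoff):
        # Successful interactions make the same choice more likely later.
        self.weights[partner] += payoff

# Toy run: partner "b" always pays off, "c" never does,
# so the learner should come to favor "b".
random.seed(0)
learner = ReinforcementLearner(["b", "c"])
for _ in range(500):
    p = learner.choose_partner()
    learner.reinforce(p, 1.0 if p == "b" else 0.0)
```

Because unreinforced weights never grow, the learner ends up consulting the rewarding partner almost exclusively, which is the mechanism by which cost-minimizing structures can emerge without a social planner.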



Justin Bruner and Bennett Holman

Holman and Bruner find that while this allows for polarization in the short run, the individuals eventually converge on the complete network.

In later work, Bruner and Holman (forthcoming) considered cases of “apparent disagreement,” in which two agents had seemingly conflicting beliefs but, unbeknownst to them, an unknown factor actually mediated their situation such that both agents were correct. As an example, consider the case where two doctors disagree about whether drug A or drug B is superior. Suppose it is in fact true that drug A is superior for the types of patients one doctor sees and drug B is superior for the types of patients the other doctor sees (perhaps because the second drug is poorly metabolized by the elderly and the first doctor works in a nursing home). Given that the doctors remain in the dark about this fact, Bruner and Holman explore the conditions under which such communities can form communication structures that maintain an appropriate dissensus about the superior action. In short, the normatively appropriate state for the community is to be polarized.2

Relatedly, Goodin and Spiekermann (2015, 2018) explore a network model involving two “epistemic groups.” Those of the same group have the same interests, and in-group and out-group members have opposing interests. However, group membership is not immediately obvious, and individuals must learn to pool with in-group members and avoid pooling with out-group members. While in these simulations they show that it is sometimes possible for agents to learn to communicate only with in-group members, this is not at all a moral certainty, and Goodin and Spiekermann outline conditions under which the minority group (the “epistemic elites”) is able to sway the population due to the emergent communication structure.
Works by Goodin and Spiekermann (2015) and Bruner and Holman (forthcoming) are interesting because they both explore situations where agents must in some sense learn to connect to the right people. While the main issue with the Bala-Goyal model was ensuring there were no costly redundancies in the communication network, the problem in these latter models is to identify and pool with reliable agents. Barrett, Skyrms, and Mohseni (2019) explore this kind of effect in the context of a model that resembles the Bala-Goyal model. In their model, some individuals are better than others at learning some truth from nature experimentally. Individuals must then decide either to learn asocially from nature or to connect to others and access the truth through their social connections. They find that this setup results in a rather hierarchical structure, in which the most competent agent learns asocially and others connect either to this agent or to others who are connected to this asocial learner. In other words, the community is able to naturally arrange itself so that the most reliable individuals are deferred to and others do not bother to experiment but instead learn socially.

The model found in Barrett, Skyrms, and Mohseni (2019) is closest to the simulation we introduce in the next section. However, some features of their model strike us as unrealistic. For one, communication is limited: if an individual selects to learn “socially,” they are restricted to learning from just one person at that time. Furthermore, individuals are forced to choose between asocial and social learning. This may be a reasonable assumption in some cases, but in general both kinds of learning are available to individuals, and more often than not agents choose to supplement their individual experimentation with input from their peers. In the next section we develop our own model, which combines features from some of the models discussed above. Our aim is to understand whether individuals can structure their interactions in a way that maximizes their epistemic success. We consider cases where connections are costless and individuals can learn from their peers as well as from experimentation.

Models

We now introduce two models of network formation. In both models agents engage in individual and social learning; that is, individuals get information about the world (i.e., they receive a private signal from nature) but also have the opportunity to pool their opinions with others. Thus, at the end of a round of inquiry, the beliefs of an agent will likely be a function of both their private information and the beliefs of others.

Model 1

In this model individuals attempt to determine which of two states of the world obtains (S1 or S2). At the beginning of each round, each individual receives a signal from nature indicating which of the two states obtains. Individuals have a reliability level, r, and the signal they receive from nature corresponds to the actual state of the world with probability r. Nature then randomly selects an agent to update their beliefs. This first agent decides whether they want to pool with others and, if so, which individuals they pool information with. Note that the individual has the option to not “pool” with themself—that is, they can ignore the signal they receive from nature and instead defer to others in the community. Once the agent decides whether to incorporate their private signal, pooling takes place as follows. The agent does a headcount of all the agents they pool with. If a majority believe S1 obtains, the agent adopts the belief that S1 obtains. If more believe S2 obtains, the agent likewise adopts the belief that S2 obtains. In the case of a tie, a fair coin toss determines whether the first agent believes S1 or S2.

After this first agent pools, nature then selects an individual from the pool of agents who have not yet acted to update their beliefs. Note, however, that if this second person selects to pool with the individual first selected by nature, the belief they pool with is not just a reflection of this person’s private signal from nature. Instead, the belief may be the result of prior pooling. This process continues until all individuals in the community have had the chance to pool with others, at which point the round ends. At the end of the round the actual state of the world is then revealed. Those who correctly identify the state of the world are more likely to consult in future rounds the individuals they pooled with.3 In other words, individuals learn to reinforce successful behavior, meaning that the more success an individual has when they consult another, the more likely they are to consult that individual in the future.4

In the second round of the simulation, nature selects a different state of the world (S1 or S2), and individuals all receive new private signals from nature. Nature then selects a different ordering of agents. The first agent in this ordering then selects to pool (or not), and so on. Reinforcement occurs once again when all individuals have had the opportunity to pool with their peers.

We now consider an implementation of this model. In particular, we restrict ourselves to rather small communities (five individuals) and consider what the communication network looks like after two hundred rounds. Further, we consider cases where the distribution of reliability is uniform and individuals are competent, in the sense that they have a better than 50% chance of identifying the correct state of the world.
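As a rough sketch, one round of this process can be rendered in Python. This is an illustrative reconstruction from the prose above, not our actual simulation code; the function and variable names are our own, and we simplify by having each agent always consult exactly two sources, possibly including themselves.

```python
import random

def run_round(reliabilities, weights, pool_size=2):
    """One round of the majority-pooling dynamics sketched above.

    reliabilities: r_i for each agent (chance their private signal is correct).
    weights[i][j]: agent i's reinforcement weight on consulting agent j
    (j == i means attending to one's own private signal).
    """
    n = len(reliabilities)
    state = random.choice([0, 1])                       # the true state (S1/S2)
    beliefs = [state if random.random() < r else 1 - state
               for r in reliabilities]                  # private signals
    consulted = {}
    for i in random.sample(range(n), n):                # nature's ordering
        # Choose pool_size distinct sources, proportional to past success.
        sources = set()
        while len(sources) < pool_size:
            sources.update(random.choices(range(n), weights=weights[i]))
        consulted[i] = sources
        votes = sum(beliefs[j] for j in sources)        # headcount for state 1
        if 2 * votes != len(sources):
            beliefs[i] = int(2 * votes > len(sources))  # majority wins
        else:
            beliefs[i] = random.choice([0, 1])          # tie: fair coin toss
    # The true state is revealed; successful pooling choices are reinforced.
    for i, sources in consulted.items():
        if beliefs[i] == state:
            for j in sources:
                weights[i][j] += 1.0
    return beliefs, state
```

Iterating run_round two hundred times for five agents with reliabilities above 0.5, and then inspecting the relative sizes of weights[i][j], gives a rough analogue of the influence measure we track.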
With this setup in hand, we note that our model is similar to Zollman’s network epistemology model, as the individuals are engaged in a process of gathering information from nature, but unlike Zollman’s model, individuals do not directly share their evidence but instead pool beliefs. Likewise, the way in which network paths are forged differs from that of Bruner and Holman (forthcoming); unlike their model, the agents in our model gain access to the truth at some point, meaning they can update their links on the basis of how reliable other agents are. As mentioned, our model is closest to that of Barrett, Skyrms, and Mohseni (2019).5 In their model, individuals have the option of learning socially or asocially, and individuals have different levels of competence. Yet in their model individuals face a forced choice—they must either learn asocially or learn from a peer, and when social learning does occur, it is restricted so that an individual can learn from only one agent.
However, we allow both forms of learning. Since individuals in the real world can help themselves to both forms of learning, we consider our model to be a closer approximation of the way in which members of an epistemic community learn.

Model 1 Results

We now turn to the results of our computer simulation. In the baseline model, five agents are considered. We assume the signal from nature that each individual gets is better than chance (i.e., people have a better than 50% chance of getting the state correct if they just guess based on the signal they receive from nature). Finally, individuals have the option of pooling with two individuals. Individuals need not pool, however, and they can also choose not to attend to their private signal. Simulations are run for a total of two hundred rounds.

In this case, we find a strong positive correlation between reliability and influence. In other words, the more reliable the agent, the more likely others will defer to them and take their beliefs into account when they pool. Figure 2.1 illustrates this tendency. Note, however, that the relationship between influence and reliability is by no means perfect. More reliable individuals are more likely to be consulted by their peers, but this does not mean that the most reliable individual will be directly consulted by all others in the community.

Whether individuals indirectly consult others via the communication network is a messy matter. We find that in many cases individuals are indirectly connected to the most reliable individual because they pool with someone who in turn pools with the most reliable agent. Thus, the focal individual is, to a certain extent, influenced by the most reliable agent. Yet the level of influence is rather weak, as the agent is not directly pooling with the most reliable but instead pooling with an associate. This means that the private signal of the reliable agent is diluted by being mixed with the signals of many others.6 For this reason it is not always advantageous to be indirectly linked to the most reliable agent, and this is exactly what we see in simulations.
In some cases, individuals are neither directly nor indirectly connected to the most reliable agent, instead interacting with the second or third most reliable agents, who in turn pool with each other. Nonetheless, the striking pattern displayed in figures 2.1 and 2.2 remains: the more reliable the agent, the more likely it is that others will consult them. This relationship can be strengthened by changing key parameters in the model. For instance, if we instead allow the simulation to run for one thousand rounds (as opposed to two hundred), the relationship between reliability and influence is more pronounced (see figure 2.2). Giving the agents more time to determine whom they want to interact and pool with results in more consultations with those who are reliable, although it is still the case that not all agents directly defer to the most reliable group member.

2.1. Relationship between reliability and influence. Data comes from 1,000 simulations, each run for 200 rounds. The x axis measures an individual’s reliability as a proportion of the average reliability of the community. The y axis tracks influence as the proportion of individuals who consult a given agent.

2.2. Relationship between reliability and influence. Data comes from 1,000 simulations, each run for 1,000 rounds. The x axis measures an individual’s reliability as a proportion of the average reliability of the community. The y axis tracks influence as the proportion of individuals who consult a given agent.

Changing the distribution of reliability also strengthens the relationship between reliability and influence. If reliability is determined by a draw from the uniform distribution from 0 to 1, then there is a much tighter relationship between reliability and influence. Those with lower-than-average reliability are almost never consulted. Moreover, these less reliable individuals also learn to ignore the private signal they receive from nature and instead defer to the beliefs of others in the community.

Model 2

We now introduce a related but importantly different model of inquiry. Individuals once again receive a private signal from nature. However, the signal is now conceived of as a number on the unit interval. One natural interpretation of this is that the agent is no longer attempting to determine which of two possible states obtains and is instead attempting to estimate some parameter (such as the success rate of a drug treatment, for instance, or the likelihood of some future event). As a result, signals are modeled as a number on the unit interval. An individual can then select to pool their signal with others, as we discuss below. After pooling has occurred, the underlying parameter of interest is revealed. If the agent’s estimate is close to the actual parameter, they firmly reinforce their behavior. Put slightly differently: agents are more likely to pool with those who in previous rounds helped them better estimate the parameter of interest. As in model 1, we simulate this for two hundred rounds with an eye toward the social structure that naturally emerges.

Before we discuss this, however, a few important additional details of our model are in order. As mentioned, individuals receive a signal from nature in the form of a number on the unit interval. We assume that this signal is drawn from a normal distribution with mean µ and variance σ². In our baseline model, we assume individuals draw from independent and identical distributions. We then alter this to capture the fact that some agents may be better placed to identify the parameter of interest. This is done in two ways. In the first, all individuals draw from a normal distribution with the same mean, but some individuals draw from “noisier” distributions. For instance, individual i draws from the distribution N(µ, σ_i²), while j draws from the distribution N(µ, σ_j²), where σ_i² < σ_j². Although both draw from normal distributions centered around µ, agent i’s signal is more likely to be close to the actual value of the parameter, since this agent is drawing from a less noisy distribution. Consider, for instance, the limiting case where agent k draws from the distribution N(µ, σ_k²), where σ_k² is 0.001. In this case, the value of agent k’s signal will almost always be nearly identical to the true parameter value of interest. In other words, some agents can be said to be “more reliable” than their peers.

Our second deviation from the baseline considers situations where agents may be systematically biased. For instance, an individual may have a tendency to overestimate parameters, always thinking new medical treatments, for instance, are slightly more efficacious than they in fact are. While this optimist may have a sunny disposition, there may also exist more naturally skeptical or incredulous individuals who tend to underestimate treatment efficacy, to stick with our example. This, once again, can be modeled in a somewhat straightforward fashion. Our optimist, for instance, draws from N(µ + β_i, σ²) while our pessimist draws from the distribution N(µ − β_j, σ²), where β_i and β_j are a measure of just how biased the individuals are.7

It is worth briefly pausing to discuss how these alterations are to be interpreted. As mentioned, individuals in our model are taken to be engaged in inquiry. Why, then, would optimism regarding some parameter translate into their receiving a signal from nature that is systematically skewed? Why is nature responsive to their sunny (or rainy) disposition? We can provide two interpretations. The first is that nature is not responding to their disposition. Instead, individuals allow their bias to influence the way in which they interpret signals from nature. On this reading, both the optimist and pessimist may receive the same signal, but the optimist is prone to misinterpret the signal in a particular way.
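The signal scheme just described is easy to mimic numerically. In this illustrative sketch (our own, with hypothetical names mu, sigma, and beta for the true parameter, noise, and bias), a private signal is a normal draw around the truth, shifted by the agent’s bias, and pooling is simple averaging.

```python
import random
import statistics

def draw_signal(mu, sigma, beta=0.0):
    """Private signal from nature: normal around the true parameter mu,
    shifted by bias beta (positive = optimist, negative = pessimist).
    We ignore clipping to the unit interval for simplicity."""
    return random.gauss(mu + beta, sigma)

def pooled_estimate(signals):
    """Model 2 pooling: average the beliefs of everyone consulted."""
    return statistics.mean(signals)

random.seed(42)
mu = 0.5  # the true parameter (e.g., a drug's success rate)

# A "reliable" agent draws from a tight distribution around the truth.
reliable = [draw_signal(mu, 0.001) for _ in range(1000)]

# An optimist and a pessimist with equal and opposite biases.
optimist = [draw_signal(mu, 0.1, beta=0.2) for _ in range(1000)]
pessimist = [draw_signal(mu, 0.1, beta=-0.2) for _ in range(1000)]

# Averaging the two complementary biases roughly cancels the error,
# which is why biased agents can still be worth consulting.
paired = [pooled_estimate([o, p]) for o, p in zip(optimist, pessimist)]
```

The paired estimates cluster around the truth even though neither source is individually accurate, anticipating the complementary-bias result reported in the next section.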
Another viable interpretation is that in the course of inquiry, the individuals are using different methods or tools to gain information (i.e., the agents have different ways of generating a signal from nature). These tools can be miscalibrated and as a result systematically provide under- or overestimates. This could be due to no fault of the agent—perhaps the individual mistakenly thought her method of inquiry was more reliable than it in fact is.

Either way, once individuals receive private signals from nature, they must then determine which of their peers they pool with. This process is nearly identical to the way pooling takes place in model 1, with two exceptions. When an agent decides to pool with another, they take that agent’s belief and average it with the beliefs of the others they pool with.8 Once pooling takes place, the actual value of the parameter of interest is revealed to the community and the individuals then engage in the reinforcement process. If an individual’s belief is close to the actual underlying value of the parameter, the individual reinforces heavily on those they consulted, meaning the agent is more likely to consult with these individuals in the future.

Model 2 Results

We observe some familiar patterns. As was the case with model 1, there is a general tendency for more reliable agents to have more influence over the community. Recall that reliability is determined by the features of the distribution nature draws from when selecting a signal. In particular, if the mean of this distribution is centered on the truth and the variance is low, then the agent will often receive a signal that is very close to the true value of the parameter. In the case where all distributions are centered at the truth, those agents associated with a distribution with smaller-than-average variance are reliable in the sense that they are more likely than their peers to receive a signal that is close to the truth. In these cases, as in model 1, individuals tend to consult these reliable peers.

What of those cases where individuals can be systematically biased? As discussed above, individuals with a sunny disposition may routinely overestimate the efficacy of a drug or medical treatment, while their less optimistic peers may often underestimate efficacy. Will these “biased” individuals be routinely ignored? Will the population instead just attend to those “unbiased” individuals? As it turns out, the intuitive claim that biased individuals will be ignored is not quite right. In some cases, individuals with complementary biases end up listening to one another. An optimist’s hope can be mediated by a skeptic’s pessimism. If individuals are biased in different directions but the strength of bias is approximately the same, the result is an assessment that is on average close to the truth. And in some settings, the optimist does best to consult with the pessimist and not pool with an unbiased agent. Pooling with the unbiased agent will not completely eliminate the bias inherent in the optimist’s initial judgment.
Overall, the results of this slightly more complicated model reinforce the findings from model 1, with the caveat that biased agents can still be influential, but their influence may be limited to those with a complementary bias.

Pooling with the Best and an Epistemic Social Dilemma

As we saw, while there is a tendency for individuals to pool with those with higher-than-average reliability, it is not the case that the most reliable member of the group is always consulted. In other words, individuals don’t always identify the most reliable agent and pool with them. Nonetheless, the emergent network ensures that on average individuals are more likely to arrive at the correct answer. Even though the most reliable members are not always consulted, endogenous network formation is in the interest of the individual agents, as it allows them to better estimate the truth.

Yet there is a sense in which the kind of endogenous network formation we have observed is, from the perspective of the group, epistemically problematic. Consider the following situation, where all individuals consult only a few members, one of whom is the most reliable agent in the community. In this case, we posit, the group may not do well epistemically if the method the group uses to determine the truth is a majority-rule vote. Consider, for instance, the case where the most reliable individual is correct 75% of the time while all others have a reliability score of 70%. If all agents pool with the most reliable individual, then when this most reliable individual gets things wrong, the whole community may be misled (the result of a vote will be unanimous support for the wrong state of the world). Thus, this kind of hierarchical community may do epistemically worse than a community with a random communication network. In this latter case, while individuals may not be pooling with the best individual, their errors are not perfectly correlated, since they do not all pool with the same individual or small set of individuals.

Taken together, this suggests a kind of epistemic social dilemma. Individual agents who just care about their individual epistemic success (their accuracy, so to speak) will have an incentive to find and identify more competent agents and learn from them. As our simulations indicate, this is likely to happen—agents will often end up pooling with those individuals who indeed are extraordinarily reliable.
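The worry can be made concrete with a quick Monte Carlo check (our own illustration of the 75%/70% example above, not a simulation from this chapter): five voters either vote their private signals independently or all copy the most reliable member before a majority vote.

```python
import random

def majority_vote_accuracy(reliabilities, defer_to_best, trials=20000):
    """Fraction of trials in which a majority vote identifies the true state."""
    best = max(range(len(reliabilities)), key=lambda i: reliabilities[i])
    correct = 0
    for _ in range(trials):
        # True means the voter's private signal is correct this trial.
        votes = [random.random() < r for r in reliabilities]
        if defer_to_best:
            votes = [votes[best]] * len(votes)  # everyone copies the best agent
        if 2 * sum(votes) > len(votes):
            correct += 1
    return correct / trials

random.seed(7)
rs = [0.75, 0.70, 0.70, 0.70, 0.70]
independent = majority_vote_accuracy(rs, defer_to_best=False)  # roughly 0.85
deferential = majority_vote_accuracy(rs, defer_to_best=True)   # roughly 0.75
```

Independent voting exploits the jury-theorem effect and beats any single member’s reliability, while deferring to the best caps the group at that member’s 75%: the correlated-error worry just described.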
Yet if all behave in this fashion and look to the most reliable agent, the independence assumption underlying Condorcet’s jury theorem is violated,9 and a majority vote will often fail to correctly identify the state of the world.

This epistemic social dilemma is distinct from other dilemmas discussed in the literature. List and Pettit (2004), for instance, consider an epistemic social dilemma, but their dilemma holds only under rather stringent assumptions. For one, their dilemma requires that individuals understand Condorcet’s jury theorem and the assumptions that underlie it and, moreover, have a clear sense of how their individual behavior conforms (or fails to conform) with those assumptions. In our case, all that is required is for the individuals to care about their individual accuracy and to know that there is a limit to how many people in the community they can pool with. Given that both time and cognitive constraints limit the number of individual agents one can reasonably consult, we contend that this is a reasonable restriction to impose.

Summing up, we’ve seen how it is possible for communication networks to endogenously emerge and that these networks are on the whole epistemically beneficial. In other words, left to their own devices, individuals tend to gravitate toward those who do better than average at the task of tracking the truth. Furthermore, this is likely to occur even when our agents are not particularly cognitively sophisticated. All that was required was a very rudimentary form of reinforcement learning. As we’ve mentioned, this work is related to a small handful of recently published papers exploring endogenous epistemic network formation, as well as the more general study of how simple heuristics can lead to surprisingly sophisticated social behavior.

Some might hesitate to conclude much from this burgeoning literature because work in this area has relied almost exclusively on computational methods (agent-based models, computer simulations). Simulations, like any method, have their limitations, and empirical investigation is surely warranted. We invite the skeptic to view computational methods as a first step toward better understanding the social organization of scientific and epistemic communities and as a viable means of generating hypotheses that can be empirically tested. For example, some early theoretical results on epistemic networks have been confirmed by laboratory experiments (Jönsson, Hahn, and Olsson 2015).

Finally, recall our discussion of epistemic social dilemmas: social arrangements that are optimal from the perspective of the individual but fail to maximize group accuracy. Social dilemmas of this kind suggest that increasing the cost of communication, or the cost and difficulty of identifying reliable agents, may in some instances be to the benefit of the group.
Furthermore, flat or egalitarian communication networks—where there is no central hub—may outperform more hierarchical structures. This last point is particularly troubling, as both the communication and the social structure of many scientific communities appear to more closely resemble the latter than the former.


Chapter 3

Promoting Diverse Collaborations

Mike D. Schneider, Hannah Rubin, and Cailin O’Connor

Philosophers of science and social scientists have argued that diversity in scientific communities is critical to the progress of science and have explored initiatives that might help diversify science. However, there has been much less work done on promoting diversity in collaborative teams, where scientists are actually interacting and working with those unlike themselves. Rubin and O’Connor (2018) use evolutionary game-theoretic models to show that when members of one social group tend to get more credit for collaborative endeavors, this can disincentivize collaboration between groups, leading individuals to collaborate mostly with those like them. This sort of process can negatively impact the progress of science whenever collaboration benefits from diversity.

In this paper, we use a similar evolutionary framework to explore the conditions that promote diverse collaborations in scientific disciplines. In particular, we employ agent-based, game-theoretic models of actors in collaboration networks to test how discriminatory norms interact with individuals’ decisions to collaborate across identity groups. We consider various types of policy proposals aimed at improving diversity, including measures to promote the representation of minority groups and active incentives for diverse collaboration, focusing on the latter. As we will outline, a tension arises: some policies that could successfully increase the diversity of scientific collaborations will also increase the level of inequity experienced within the community. In other words, we identify cases where policies to promote epistemic goods and policies to promote social goods in scientific communities can come apart. However, as we further argue, segregation into scientific subfields based on social identity can have negative effects if and when subfields associated with minority groups lose standing.

We will begin by briefly highlighting the ways that personal diversity might matter to the success of scientific collaboration, using a few historical examples. We then describe the modeling framework we will employ here, which uses bargaining models from game theory to explore the emergence of patterns of collaboration. We will also discuss a few relevant previous results. The next section describes different proposals for increasing diversity in collaboration networks. We then look at one of these in more detail, exploring the possibility of directly rewarding diverse collaborations. In the final section, we again appeal to the history of science to discuss how a prolonged lack of diverse collaborations in an otherwise diverse scientific community might itself lead to inequities.

Why Diverse Collaborations Might Matter

Diversity has been championed as an important feature of successful academic communities, both by those in feminist epistemology/philosophy of science and by those doing formal work in social epistemology. As mentioned, though, this chapter addresses diverse collaborations, not simply diverse communities. That is, we are interested in what might cause collaboration networks to be homophilic, segregated along social identity lines, and what interventions might break these patterns.

One question of obvious relevance to this exploration is: Does the type of homophily we discuss actually impede epistemic progress? Presumably, we should be most worried about homophily in epistemic groups if it hurts scientific inquiry. So long as diverse ideas are present somewhere in a community, we might ask, why should it matter whether collaborations themselves are diverse? One might imagine a circumstance in which a researcher from one social identity group is likely to figure out A and a researcher from another group B.
If they collaborate, then they might also together conclude C, which follows from A and B. However, the community has another route to concluding C: A and B are published separately, whereupon the community as a whole has access to these ideas and any member can conclude C. This is somewhat similar to the picture Okruhlik (1994) has in mind—diverse researchers will generate and test diverse hypotheses, which will then be assessed by the usual scientific methods. In such a case, diversity within a community matters, but diversity within collaborative groups does not.

There are a few reasons to think that diversity within collaborations themselves might be important as well, though. First, it is possible that independent discoveries or ideas (accessible to members of different groups), which together would generate outcomes worthy of publication, are on their own relatively insignificant. In the example from above, A and B might not individually warrant publication and only be valuable realizations insofar as they jointly imply C. If this is the case, it is reasonable to think that A and B would never be published on their own. This might be especially likely if members of one group struggle to publish in top journals due to reputational effects. (This possibility will be discussed more below.)1

In addition, there is some evidence that the actual process of group deliberation and decision-making is altered, and sometimes improved, by personal diversity. Diversity seems to prompt individuals to state their background assumptions and beliefs, rather than assuming they are shared, and to challenge others’ assertions more readily. This is sometimes called information elaboration. For example, Sommers (2006) finds that racially diverse juries deliberate differently by sharing more information. Phillips, Northcraft, and Neale (2006) find that small, racially diverse groups tend to solve problems better than homogeneous ones. And van Dijk, van Engen, and van Knippenberg (2012), in a meta-analysis, argue that objective measures of performance show general benefits of diversity in teams. More specific to academia, it has been shown that culturally diverse collaborative teams tend to be more productive, arguably due to the presence of diverse skills, experiences, and cognitive frameworks (Barjak and Robinson 2008).2 Additionally, Campbell et al. (2013) find that in ecology, gender-diverse collaborations generate work that is cited more by peers, and they argue that this is an indication of its higher quality. (Though Bear and Woolley [2011] and Eagly [2016], who give literature reviews of work investigating gender diversity and group performance, find contextually sensitive and mixed results.)
Last, information spreads more slowly through homophilic networks (Golub and Jackson 2012), so a homophilic epistemic network will be less efficient in that it will take longer for the community to reach various conclusions, assuming (as we think is reasonable to do) that people are more likely to engage with their coauthors in terms of discussing relevant research and spreading ideas. Under certain assumptions, such homophilic networks can also prevent the spread of new and better scientific practices throughout the community as a whole (Schneider, n.d.). In order to elucidate the sorts of cases where diversity might improve collaboration, we will now pull details from a few cases in the history of science. Without regular collaboration between women and men, sexology at the turn of the twentieth century simply would not
have prospered as it did. Leng (2013) illustrates how women who participated in the predominantly masculine discourse concerning human female sexuality helped improve the state of the field at the time: “As the British physician Havelock Ellis pointed out, the female sex drive was an ‘elusive’ phenomenon, a ‘mocking mystery’ even, because social prohibitions against female sexual expression made it extremely difficult to acquire accurate and comprehensive information—for male physicians, at any rate” (132–33). Although the male sex drive was well studied, to have a general theory of sexology, the men who dominated the community needed to collaborate with women who were in the position to identify which of their conclusions were not, in fact, general ones but were specific to the male sex drive. Women researchers, on the other hand, were for obvious reasons less able to generate successful insights into aspects of male sexuality. It was by virtue of there being active collaborations across gender lines (even if they did not actually coauthor—the academic environment generally promoted sole-authored publications) that the field of sexology improved (Leng 2013, 147). Another historical example can be found in the early history of public museums, in which collaborations across gender lines were essential to the success of a new field of museum pedagogy. In the United States at the start of the twentieth century, museum educators, predominantly women, were experts on innovative pedagogical techniques appropriate for educating the public and, in particular, for educating younger audiences. These women brought their expertise in pedagogy and library science to newly public-facing natural history museums, run by men with advanced degrees in the natural sciences (Kohlstedt 2013). 
Productive collaborations between men and women whose expertise differed according to their respectively gendered occupations allowed museums to emerge as institutions of public (and often hands-on) learning and allowed the field of museum pedagogy to thrive.

Bargaining and Discrimination

The previous section made the case for the potential importance of diverse collaborations to science. We will now use evolutionary game theory to explore possible avenues for promoting such diverse collaborations, drawing on previous results regarding the dynamics of discrimination and collaboration. Rubin and O’Connor (2018) introduce a framework with two elements—a network, where each researcher is a node and each collaborative engagement a link between nodes, and a bargaining game representing collaborative interactions between the individuals on the network. Agents update their collaboration strategies, as well as their
network links, creating a dynamic system where the way academics treat each other can influence whom they decide to work with. It may sound unintuitive to use a bargaining game as a representation of academic collaboration, but in fact, collaboration is a strategic interaction where researchers have to bargain, whether implicitly or explicitly, to decide (1) how much work each will do and (2) how to divide credit in the form of author order. The particular bargaining game they use, which we will also employ, is a mini version of the Nash demand game. In this model, two actors each demand some portion of a resource—a low, medium, or high amount. When applied to scientific collaboration, these demands are best understood as requests for author position relative to amount of work done. An academic who does the majority of work on a project and is first author is demanding a medium amount, whereas a first author who did little work makes a high demand.3 If their demands are compatible in that they do not exceed the resource (credit per time, in this case), they each get what they demand. If they are incompatible, the assumption is that the actors cannot peaceably reach an agreement, so they instead get a poor payoff called the “disagreement point.” For simplicity’s sake, we assume the resource has a value of 10, the medium demand is always 5, and the other two demands are compatible (3 and 7, for example, or 1 and 9). Table 3.1 shows a payoff table of this game where “Low” and “High” are 4 and 6, respectively, and the disagreement point is set to 0. This game has three “Nash equilibria,” or strategy pairings where actors have no incentive to change what they are doing. Because no one can get a higher payoff by deviating from such an equilibrium, they tend to be stable and, in particular, to show up as the end points of evolutionary processes.

Table 3.1. A payoff table for a mini Nash demand game. Rows represent strategies for player 1, and columns for player 2. Entries list payoffs for combinations of strategies with player 1 first.

                        Player 2
                  Low       Med       High
Player 1  Low     4, 4      4, 5      4, 6
          Med     5, 4      5, 5      0, 0
          High    6, 4      0, 0      0, 0

In particular, this game has one fair equilibrium where both actors demand “Med” and two unfair or inequitable ones where one actor demands “High” and the other “Low.” We call these latter two equilibria inequitable because one collaborator does more work per credit received and the other less.
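Since the game is tiny, its payoff matrix and equilibria can be checked by brute force. The following sketch (our own illustrative Python, not code from the chapter) encodes the payoffs just described, with Low = 4, Med = 5, High = 6, a resource of 10, and a disagreement point of 0, and enumerates the Nash equilibria:

```python
# Mini Nash demand game: each player demands Low (4), Med (5), or High (6)
# from a resource of size 10; incompatible demands (sum > 10) yield the
# disagreement point of 0 for both players.
DEMANDS = {"Low": 4, "Med": 5, "High": 6}
RESOURCE = 10

def payoffs(d1, d2):
    """Return the pair of payoffs for demands d1 (player 1) and d2 (player 2)."""
    if DEMANDS[d1] + DEMANDS[d2] <= RESOURCE:
        return DEMANDS[d1], DEMANDS[d2]
    return 0, 0  # disagreement point

def is_nash(d1, d2):
    """A strategy pair is a Nash equilibrium if neither player can gain
    by unilaterally switching to another demand."""
    p1, p2 = payoffs(d1, d2)
    no_better_1 = all(payoffs(alt, d2)[0] <= p1 for alt in DEMANDS)
    no_better_2 = all(payoffs(d1, alt)[1] <= p2 for alt in DEMANDS)
    return no_better_1 and no_better_2

# Recovers the fair equilibrium (Med, Med) and the two inequitable ones,
# (Low, High) and (High, Low).
equilibria = [(a, b) for a in DEMANDS for b in DEMANDS if is_nash(a, b)]
```

Deviating from (Med, Med) to High, for instance, pushes the joint demand to 11 and yields the disagreement payoff of 0, which is why the fair split is stable.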


In the models we will consider, and those considered by Rubin and O’Connor (2018), agents play this Nash demand game on a network and may be of two types, which might represent two races or genders or cultural groups. Individuals can condition their demands based on the type of their partner.4 Note that this model has the capacity to represent something like a discriminatory norm or convention. When members of each group tend to make fair demands within their own type, but between groups one side tends to demand High and the other Low, we call this discrimination. Rubin and O’Connor (2018) show that under many conditions discrimination in this sense can emerge in such models. (See also Poza et al. 2011.) In particular, they find that a group is more likely to end up being discriminated against when they are in the minority.5 Furthermore, when members of one group face discrimination, they tend to break out-group links and collaborate with those like themselves. This results in a homophilic network, rather than one where collaborations tend to include diverse members. (See also O’Connor and Bruner 2019.) In real epistemic groups, notably, similar patterns have been observed. Studies have found that women are less likely to hold prestigious first and last author positions (West et al. 2013; Larivière et al. 2013). Feldon et al. (2017) find that women students in biology labs tend to put in more work but are less likely to be granted authorship on papers. Additionally, in some disciplines, researchers have found that women are less likely to collaborate in general and more likely to collaborate with other women (Ferber and Teiman 1980; McDowell and Smith 1992; Boschini and Sjögren 2007; West et al. 2013). Botts et al. (2014) also find that Black philosophers tend to cluster into subfields. It should be noted that a similar effect is expected when external forces distribute credit inequitably, even if academics are treating each other fairly. 
For example, in economics author order is alphabetical, but women who coauthor are much less likely to receive tenure than men who coauthor, though this effect is ameliorated if they collaborate with other women (Sarsons 2017). The models from Rubin and O’Connor (2018) would predict that in such a case women would tend to learn to stop collaborating with men. (And indeed, economics is one of the disciplines where this very pattern has been observed.)

Diversity Initiatives

When inequity or discrimination disincentivizes collaboration between members of different social identity groups, what steps can be taken to re-incentivize collaboration among socially diverse partners? Here, we will discuss some possible policy proposals aimed at improving the diversity of individual collaborations using the framework just described.

Improving Minority Representation

As mentioned, when one group is in the minority, the chance that they end up discriminated against in network models is higher. Other results on the emergence of bargaining norms suggest that, in general, minority status may lead to similar disadvantage (Bruner 2019; O’Connor 2017; Mohseni, O’Connor, and Rubin 2021). Perhaps, then, a solution is to try to increase the prevalence of members of minority groups in order to promote fair bargaining norms (and diverse collaborations). In the modeling framework described, though, the addition of individuals to a minority group in an existing community will not change existing patterns of bargaining. This is because new members of a group will be in an environment where those around them already tend to adhere to some norm, meaning that their best response will involve adhering to the same norm. Imagine, for example, a community where women are in the minority, everyone makes fair collaborative demands of their in-group, and when men and women collaborate, men demand High and women Low. If more women are added to this network, they will meet men who demand High of them, meaning that Low is their most successful response. These new members of the community will learn to demand Low of men and eventually to avoid collaborating with them, leading to a perpetuation of homophily and nondiverse collaborations. In communities where there is not a stable norm but some variety of behaviors, we should still expect all those incoming academics who meet discriminating out-group members to learn to avoid them. This is not to suggest that there are not good reasons to promote the presence of underrepresented minorities in epistemic groups. The point here is merely that we should not expect the simple addition of minority individuals to change inequitable patterns of behavior to equitable ones or to decrease collaborative homophily.

Special Grants for Diverse Collaborations

Another suggestion might be that grant-giving agencies create special initiatives to promote diverse collaborations. There are a few ways to do this. In Rubin and O’Connor (2018), academics only have a certain number of collaborative links available to them. This makes sense, since no one has an infinite amount of time and resources for academic work. Grant agencies, then, might introduce initiatives to clear up the schedules of academics who are interested in engaging in a new collaborative project with an out-group member. This could involve, for example, a paid course release. Another possibility is special money to hire research assistants who can lighten workload and so create more time for those
interested in an out-group collaboration. These interventions might be thought of as creating new links for academics, but ones that can be used only for diverse collaborations. As a result, we should expect between-group collaborations to increase under this sort of initiative. There is a possible downside, however. Under this type of initiative, academics will choose to collaborate with out-group members even in cases where they are being discriminated against. After all, some amount of academic credit is better than none. If there are norms and patterns of discrimination in an academic community, then out-group links tend to involve someone being taken advantage of. The gain in diversity within collaborations is a loss in equity. Another similar initiative might increase the credit granted to collaborations between scientists in different groups. That is, instead of increasing links, increase the size of the credit pie that collaborators share. This could be achieved by making it more likely that diverse groups win grants or by giving larger grant amounts to projects with diverse investigators. (Scientists with such grants can publish more, generating more credit.) It is less immediately clear what the effects of this sort of initiative might be, as researchers still must choose whether to collaborate with in- or out-group members, only now with an added incentive to out-group collaboration. Therefore, we provide a model in the next section to evaluate the possible consequences of this type of initiative.

Modeling Increased Credit

As in Rubin and O’Connor (2018), our agents can learn to update both their bargaining strategies (collaborative behavior) and their network structure (whom they collaborate with). We start with an empty network, with each agent’s bargaining strategy randomly determined. In each round, there is some small probability that each agent will take an action.
If an agent takes an action, there is a chance they will update their set of collaborators and a chance they will update their bargaining strategy (agents do not update both at once). Each agent’s bargaining strategy consists of two parts: a demand when interacting with an in-group member and a demand when interacting with an out-group member. Agents receive payoffs from each successful collaboration. Their total payoff then is just the sum of all these payoffs. (Agents who either have not formed collaborative links yet or are only part of collaborations in which the parties’ demands exceed the whole will have a total payoff of zero.) Agents update their strategies by using what is called myopic best response: the strategy an agent picks is the one that would have gotten them the best payoff in the
last round, given the demands of their collaborators.6 This captures the idea that agents are trying to choose a strategy that is likely to result in them getting the most out of a successful collaboration, while avoiding the poor payoff from a failed collaboration. The evolution of the collaborative network is slightly more complex. We employ a model similar to Watts (2003) in which agents can choose to form or break links with other agents based on their payoffs from bargaining with those other agents. A player can unilaterally sever a link, but both players must consent to a new link being formed. This represents the fact that all the researchers involved in a collaboration must consent to be part of the collaboration. Additionally, agents have a maximum number of links, capturing the fact that there are a limited number of projects academics can work on. As mentioned, we begin with an empty network (there are no links between any nodes). At each time-step, two nodes are chosen at random. One of these is an agent who will update their links, and the other is either a potential or current collaborator of the agent. If it is a potential collaborator, we determine whether both parties will consent to form a new link between them—each will consent if either they do not already have the maximum number of links or they can increase their payoff by breaking a link with another collaborator in order to form this new link. (If an agent chooses to break a link, they break the one that gives them the lowest payoff, chosen randomly in the case of a tie.) If both agents consent, they will form the link; otherwise no links are formed or broken. By contrast, if we have chosen a current collaborator, the agent has an option to break the link and form a link with a new randomly chosen collaborator. Again, both the agent and the new potential collaborator must consent to form the link; otherwise no links are formed or broken. 
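The two update rules just described can be sketched as follows (our own illustrative Python under the chapter's stated assumptions; the function names are ours, not the authors'). Myopic best response picks the demand that would have earned the most against collaborators' last-round demands, and a new link requires the consent of both parties:

```python
# Illustrative sketch of the update rules, not the chapter's simulation code.
DEMANDS = {"Low": 4, "Med": 5, "High": 6}
RESOURCE = 10

def payoff(my_demand, partner_demand):
    """One collaboration pays your demand if the two demands are
    compatible, and the disagreement point (0) otherwise."""
    if DEMANDS[my_demand] + DEMANDS[partner_demand] <= RESOURCE:
        return DEMANDS[my_demand]
    return 0

def myopic_best_response(partner_demands):
    """Choose the demand that would have earned the highest total payoff
    against what one's collaborators demanded in the last round."""
    return max(DEMANDS, key=lambda d: sum(payoff(d, p) for p in partner_demands))

def consents_to_link(current_payoffs, prospective_payoff, max_links):
    """An agent consents to a new link if they have a free slot, or if the
    new collaboration would pay more than their worst current one (which
    they would then break)."""
    if len(current_payoffs) < max_links:
        return True
    return prospective_payoff > min(current_payoffs)

# Against two High-demanders and one Med-demander, demanding Low earns
# 4 + 4 + 4 = 12, while Med earns only 0 + 0 + 5 = 5, so Low is best.
best = myopic_best_response(["High", "High", "Med"])
```

This also illustrates the dynamic discussed earlier: an agent surrounded by High-demanders learns to demand Low, even if fair division is their preference in the abstract.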
To give a big picture of what this model represents, imagine a community of researchers, some men and some women. They have regular collaborative partners, with whom they divide labor and credit. Sometimes they realize that their collaborative strategies are not working (because, for example, they are too aggressive about credit or maybe doing too much work), and they try something that will be more successful given what their partners are doing. Sometimes they encounter a new collaborative partner who is less demanding and decide to drop an older working partner. Over time, the whole group settles down into stable partnerships and stable demands for work and credit. As mentioned, we are particularly interested in the effects of changing the payoffs for between-group collaborations, with the idea that this might promote diverse collaborations. In the Nash demand game we presented, actors divide a resource of size 10. We vary the size of the “pie” for between-group collaborations by multiplying the total available payoff (credit) by some amount, π. We look at results for π ranging from 0.5 to 2, in intervals of 0.125. That is, we look at cases where between-group collaborations are half as valuable to twice as valuable as within-group collaborations.7

Results

Figure 3.1. Level of homophily, measured using inbreeding homophily, over changes in the payoffs for between-group collaborations.

A preliminary result is that varying the payoffs for between-group interactions does not change the relationship between minority group size and majority discrimination discussed in Rubin and O’Connor (2018). Majority group members are still more likely to discriminate when the minority group is smaller, no matter the amount of credit allocated to between-group collaboration.8 We were also interested in how changing the value of between-group collaborations affected the between-group collaboration levels. In order to quantify this sort of effective diversity of the network, we use the following measure of homophily, called inbreeding homophily:
IH_i = (H_i - w_i) / (1 - w_i)
where H_i is the proportion of a group i’s links that are within-group links and w_i is the fraction of the population that group comprises (Currarini, Jackson, and Pin 2009). This measure takes into account what level of between-group linking would be expected given the relative sizes of the groups, and then yields a number that is positive when between-group linking is less than expected (i.e., when there is homophily) and negative when it is greater than expected. Figure 3.1 shows that varying π affects homophily in a predictable way. When between-group collaborations are less valuable than within-group collaborations (π < 1), homophily is high. As these between-group collaborations become more valuable, homophily decreases and becomes negative; that is, there are more between-group than within-group collaborations. So the results suggest that special incentives increasing the credit available to members of diverse collaborations could, indeed, improve the diversity of collaborative scientific groups. But this diversity comes at a cost. Rubin and O’Connor (2018) found that even when many members of the majority group have underlying discriminatory strategies, there was not much actual discrimination occurring in their models since minority group members were able to break links with discriminators.

Figure 3.2. Majority discrimination over changes in the payoffs for between-group collaborations. The solid line is given in terms of the scale on the left-hand y axis, and the dashed line is given in terms of the scale on the right-hand y axis.

One concern is that increasing π will lead to more discrimination against the minority group: minority group members will be incentivized to accept discriminatory collaborations because they are
worth more than fair collaborations with in-group members (i.e., π · L > M). Figure 3.2 shows how varying π affects discrimination. We look at the proportion of discrimination, which is the proportion of between-group collaborations that involve a majority member who demands High. In addition, we look at the instances of discrimination, found by simply adding up the total number of collaborations in which a majority member discriminates against a minority group member. Note that this is averaged over every run of the simulation, so it averages over different minority sizes. The same general trend is observed for any size of the minority, but the specific numbers will differ. We can see from figure 3.2 that increasing π only slightly increases the proportion of discrimination.9 This might seem encouraging, but we should not be too optimistic. The totaled instances of discrimination increase precipitously as π increases, as shown by the dashed line in figure 3.2. This is because, as shown in figure 3.1, increasing the size of the pie increases the total number of between-group links. Since the between-group links tend to be discriminatory at a higher rate, this means that individuals are now incentivized to accept these discriminatory interactions. To be completely clear, the majority group is not acting any worse than before (i.e., demanding High more often), but the policy brings minorities in contact with existing discriminatory behavior. If all we were concerned with was the proportion of between-group collaborations that were discriminatory, then this policy might seem beneficial: it increases diversity, and majority group members are about as likely to discriminate as before. Furthermore, the policy as modeled improves outcomes for everybody by increasing between-group credit. However, the minority group is now receiving a greater number of inequitable outcomes as a result of the increase in out-group collaboration. 
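Both quantities in play here can be made concrete with a few lines of Python (our own sketch; the toy network and all names below are hypothetical, not from the chapter): the inbreeding homophily of each group, and the threshold at which a boosted between-group pie makes a discriminatory Low payoff (π · L) exceed a fair in-group split (M):

```python
def inbreeding_homophily(links, members, population_size):
    """Inbreeding homophily for one group, in the sense of Currarini,
    Jackson, and Pin (2009): IH_i = (H_i - w_i) / (1 - w_i), where H_i is
    the fraction of the group's links that stay within the group and w_i
    is the group's share of the population."""
    group_links = [l for l in links if l[0] in members or l[1] in members]
    within = [l for l in group_links if l[0] in members and l[1] in members]
    H = len(within) / len(group_links)
    w = len(members) / population_size
    return (H - w) / (1 - w)

# Hypothetical toy network: six agents, a majority group (0-3) and a
# minority group (4-5), with mostly within-group collaborations.
links = [(0, 1), (0, 2), (1, 3), (2, 3), (4, 5), (3, 4)]
majority, minority = {0, 1, 2, 3}, {4, 5}
ih_majority = inbreeding_homophily(links, majority, 6)  # positive: homophily

# Threshold for accepting a discriminatory out-group link: pi * L > M,
# i.e., pi > M / L. With Low = 4 and Med = 5, any pi above 1.25 makes an
# inequitable between-group collaboration pay more than a fair
# within-group one.
pi_threshold = 5 / 4
```

With π sweeping from 0.5 to 2, a sizable portion of the parameter range therefore lies above this acceptance threshold, which is one way to see why instances of discrimination rise with π.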
And, on average, there is far more discriminatory behavior across the entire community. Furthermore, as we will explain in the next section, the situation is worse when we more accurately represent the conditions under which a diversity initiative like this would typically be implemented.

Nonrandom Starting Points

The models discussed in the last section start with a random distribution of strategies and no network structure. But diversity initiatives tend to be implemented in communities where there is already a lack of diversity and where discriminatory behavior is present. To represent these starting conditions more realistically, we alter the above model in two ways.

Figure 3.3. Level of majority discrimination over changes in the payoffs for between-group collaborations. The solid line is given in terms of the scale on the left-hand y axis, and the dashed line is given in terms of the scale on the right-hand y axis.

First, instead of beginning with an empty network, we start with a homophilic network. In particular, we make use of multi-type random graphs—networks used to model populations with multiple social identity groups (Golub and Jackson 2012). Each agent has some probability of forming a link with each in-group member, p_in, and some probability of forming a link with each member of the other group, p_out. When p_in > p_out, the network is homophilic. Here, we set p_in = 0.1 and p_out = 0.05. After we form the network, we then ensure that no agent has more than the maximum number of links. For agents that have more links than the maximum, we randomly choose links to break until they are at the maximum number.10 This procedure results in networks that, on average, have an inbreeding homophily of 0.18, which roughly matches the average level of homophily when π = 1.11 Second, in order to represent the fact that minority groups often receive less from collaborations in real academic communities, we do not start with a random distribution of strategies. Instead, we alter the probabilities with which between-group strategies are assigned initially so that there is a 45% chance that a majority group member will demand High against a minority group member and, in turn, a 45% chance of a minority group member demanding Low against a majority group member (each of the other strategies is employed with equal probability). Additionally, to capture the fact that the fair outcome is most common within groups, there is initially a 90% chance of demanding Med of in-group members. These choices are arbitrary but capture the sort of case where discrimination and homophily are already occurring in the community. In this altered model, we see interesting changes in how varying π affects discrimination.12 As figure 3.3 shows, increasing π leads to more discrimination, now in terms of both overall instances of discrimination and the proportion of between-group collaborations that are discriminatory. This occurs because as minority members seek to form between-group links, it is more likely these links will be discriminatory for two reasons. First, it is less likely they will find majority group members demanding Med. Second, it is more likely they will be able to successfully collaborate with a majority group member demanding High (because the minority group members are more likely to demand Low between groups). That is, minority members are more likely both to encounter and to accept inequitable demands than when we looked at random starting conditions above. So, as it turns out, more accurately representing the conditions under which we believe these initiatives might be implemented only makes the negative consequences of the policy more prevalent. Note that these results are for only mild increases in the likelihood of majority members demanding High (an increase from 33% under the assumption of random starting points to 45%). More initial potential discrimination would only lead to greater increases in actual discrimination.

Implications

We have argued that diversity initiatives that promote between-group collaborations may achieve their goal while unintentionally fostering inequity in the credit awarded to minority group members. So, in the short term at least, epistemic and social goods come apart. This is pertinent in light of a common argumentative technique in discussions about increasing diversity in academic communities: here are X, Y, and Z reasons why increasing diversity will promote various social goods. If these fail to convince, here are the purely epistemic benefits as well. Call these “private sins as public goods” arguments because they convince those only interested in epistemic gains to incidentally promote socially beneficial policies.13 Unfortunately, this argumentative strategy is not available if an epistemically beneficial policy turns out to have unintended negative social consequences. The instinct to give “private sins as public goods” arguments backfires in these situations, insofar as the default expectation is that all initiatives ought to carry explicit epistemic benefits, while social benefits are an afterthought.

There are further implications when thinking about the long term, after the initiative ends. If the initial lack of diverse collaborations is due to minority group members breaking links with majority group members in order to protect themselves from discrimination, then when the initiative ends, homophily will likely reappear. In the meantime, the initiative has not only perpetuated but further entrenched inequity. This is because even though both majority and minority group members’ absolute payoffs are increased during the initiative, the difference between the minority payoffs and majority payoffs also increases.14 So the initiative only temporarily fosters diversity, while the majority group accumulates more power, prestige, and so on, by collaborating with the minority and receiving an undue amount of credit for those collaborations. It is important to note that our aim is not to argue against the implementation of these diversity initiatives, and we do not take our results to yield any specific policy recommendations. Rather, we think these results caution against naive instincts that any policy will be helpful. Some initiatives may need to be complemented by other policies to achieve their goals without further exacerbating the situation they are intended to amend. For instance, the diversity initiatives discussed in this section might helpfully be complemented with improved standards for awarding credit to ensure equitable collaborations.

A Contagion of Disrespect

Earlier, we argued that diverse collaborations can carry epistemic benefits. However, as the previous section showed, there may be circumstances in which policies to promote epistemic goods and policies to promote social goods in scientific practice come apart. One may worry that the epistemic benefits of diverse collaborations are insufficient to merit policies that entrench social inequity.
Unfortunately, there may be further reason to think that homophily, by itself, will lead to a different sort of inequity, which, perhaps, diverse collaborative links might solve. That is, although we have argued that some initiatives to promote effective diversity may carry negative social consequences, there are reasons to think that we need to do something, because persistent lack of effective diversity can create new social and epistemic harms. Homophilic scientific communities can come to champion different scientific subdisciplines, or “niches.”15 The cases we are worried about are those where niches championed by minority groups become low prestige by virtue of their associations with marginalized researchers while, correspondingly, niches championed by the majority group enjoy elevations of prestige. That is, lack of effective diversity can create the conditions for particular domains of study to become ghettoized by
virtue of their associations with particular marginalized social groups. They suffer a contagion of disrespect, where new results in these niches are increasingly dismissed as unimportant to the production of scientific knowledge.16 In the modeling framework discussed, this sort of process will lead to a state where the total size of the pie for one in-group gets smaller. As evidence for this contagion of disrespect, we provide two examples from the history of science, in which niches that came to be associated with women declined in prestige as a result. We end by considering the effects of such a contagion on individuals’ choices to suffer inequitable collaborations across groups.

Child Study

Beginning in the 1870s and 1880s, various “men of science” interested in the emerging field of naturalized psychology began to focus on infant cognitive development (Lorch and Hellal 2010). These men began publishing detailed accounts of their children’s psychological developments and encouraging others to do the same (von Oertzen 2013, 176). As von Oertzen notes, however, “scientific empiricism of this kind . . . presented unforeseen obstacles. The intimate space of the nursery, widely regarded by contemporaries as a quintessentially female domain, restricted fathers’ and other men’s access to human offspring” (176). This created an opportunity for scientifically minded women, such as Milicent Shinn, to boldly go where men were culturally barred. In addition to compiling extensive notes on the early development of her niece that became a widely circulated book in the new field of child study in the 1890s, she also trained and established a network of college-educated mothers and aunts to behave as citizen-scientists, who provided valuable findings. Experimental psychologists and social scientists, nearly all men, were quick to categorically dismiss the field of child study for its reliance on women’s observations. Von Oertzen (2013), quoting from one paper by Shinn, writes, One prominent critic, American psychologist James Mark Baldwin, asserted that “only the psychologist can ‘observe’ the child, and he must be so saturated with his information and his theories that the conduct of the child becomes instinct with meaning for mind and body. This is just the difference between the mother and the psychologist—she has not theories: he has. She may bring up a family of a dozen and not be able to make a single trustworthy observation; he may be able from one sound of one yearling to confirm theories of the neurologist and educator which are momentous for the future training and welfare of the child.” (186)



Mike D. Schneider, Hannah Rubin, and Cailin O’Connor

Baldwin’s criticism will strike contemporary readers as backward. Nonetheless, such views were widely held in the scientific community of the period. Despite the women in the field having amassed a wealth of data and analysis on child development, the credibility of the field of child study dwindled over the next twenty years (von Oertzen 2013, 190). Eventually, the field became overshadowed by experimental psychology, a discipline whose methodology favored men who controlled access to labs suitable for experiments. Here we see a clear case where the fact that the practitioners of the field were women led directly to a devaluation of the work done.

Home Economics

At the time of her death in 1911, Ellen Swallow Richards was the head of the Department of Social Economics at MIT and president of the Home Economics Association. Richards, the “‘engineer’ of the modern home economics movement . . . saw domestic science as a way to move women trained in science into employment in academics and industry” (Stage 1997, 5). Home economics achieved this goal, as it “tied the kitchen to the chemical laboratory, emphasizing nutrition and sanitation” (Stage 1997, 5). However, over the next several decades, home economics transformed from an academic discipline to an “intentional conspiracy to keep women in the kitchen” (Silva 1998, 570).17 What is responsible for the field’s fall from grace? There are good reasons to think that home economics suffered a serious decline in prestige precisely because its practitioners were predominantly women. As Rossiter (1997, 96) explains, in the 1950s and 1960s, “[College] administrators, all men then, held skeptical and hostile attitudes toward home economics, even as they expressed unabashed ignorance about the field. . . . To them such female domination constituted proof that the field was out of date.” While home economics was “one of the primary areas in which educated women found professional employment in academia and business from the 1900s to the 1960s” (Stage 1997, 4), and enrollment of students in the field modestly grew throughout the first half of the 1900s, the number of faculty working in the field and funding for research declined at increasing rates (Rossiter 1997, 98–99). In the context of this decline, centers of research in the discipline during the 1960s and 1970s began to rebrand the subject as “human ecology” (Stage 1997, 6). That is, a new field was created whose methods and domains of inquiry happened to coincide exactly with the older field because its reputation had so deteriorated.

Promoting Diverse Collaborations

Choosing Inequity

Our models would predict that this sort of contagion of disrespect would incentivize members of minority groups to collaborate with the majority group, despite receiving inequitable amounts of credit.18 If niches lose prestige, the total payoff awarded to pairs of researchers working in these niches (the size of the pie) shrinks. At a certain point, the payoff for equitable collaborations in low-prestige niches will be less than the payoff for minority individuals accepting inequitable bargaining outcomes when collaborating with majority members in high-prestige niches. At this point, minority researchers should come to prefer the unfair collaboration with a majority group member over the fair collaboration with a minority group member. This is similar to what was described above when the size of the pie was increased for between-group (diverse) collaborations. However, in that case diverse collaborations had higher payoffs than either the minority or majority group collaborations. In this case, it is only the minority collaborations that have lower payoffs. The history of science provides countless cases of women who succeeded in making contributions to a variety of high-prestige fields of science but who were not awarded a fair share of the credit. In other words, they elected to channel their efforts into highly inequitable collaborations, rather than, for instance, into equitable collaborations in low-prestige niches associated with their gender. In light of what has been said here, one explanation of this choice is that they elected to suffer inequitable bargaining norms so as to enjoy access to the sort of high-prestige research happening in the niche dominated by members of the majority group. The suggestion here is that while initiatives aimed at promoting diverse collaborations may lead to inequities, other social processes associated with homophily can do likewise. 
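The threshold reasoning here can be made concrete with a toy calculation. The sketch below is not taken from the chapter's models: the function name `minority_payoff`, the pie sizes, and the bargaining shares are all invented for illustration.

```python
# Illustrative payoff comparison with made-up numbers (not from the chapter's models).
# A collaboration's total credit ("the pie") depends on the prestige of its niche;
# bargaining norms determine each partner's share of that pie.

def minority_payoff(pie, share):
    """Credit a minority researcher takes home from one collaboration."""
    return pie * share

high_prestige_pie = 10.0  # hypothetical credit available in the majority-dominated niche
unfair_share = 0.25       # hypothetical inequitable bargaining outcome for the minority partner

# Pie size below which a fair 50/50 split in the low-prestige niche
# pays less than the unfair share of the high-prestige pie.
threshold = high_prestige_pie * unfair_share / 0.5
print(f"Equitable in-group collaboration loses once its pie drops below {threshold}")

# As the low-prestige pie shrinks (the "contagion of disrespect"),
# the rational choice flips toward the inequitable collaboration.
for low_prestige_pie in (6.0, 5.5, 5.0, 4.0):
    fair = minority_payoff(low_prestige_pie, 0.5)
    unfair = minority_payoff(high_prestige_pie, unfair_share)
    better = "in-group (fair)" if fair > unfair else "out-group (unfair)"
    print(f"pie={low_prestige_pie}: fair={fair:.2f} vs unfair={unfair:.2f} -> {better}")
```

The point of the sketch is only the crossover: once prestige loss pushes the in-group pie below the threshold, accepting inequitable bargaining outcomes becomes the payoff-maximizing choice.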
This should complicate our thinking about the potential benefits and pitfalls of initiatives to promote diversity.

Parting Remarks

We have seen how a variety of good-faith policy proposals to improve effective diversity in the name of scientific progress might succeed but might also risk further entrenching social inequity in scientific communities. Our aim is not to conclude with explicit policy recommendations on the basis of these simplified models; rather, it is to have identified risks for various sorts of diversity initiatives to carry unintended negative consequences for the scientific communities involved. In particular, our investigations have suggested how policies to promote epistemic goods and policies to promote social goods in scientific communities can easily come apart. On the other hand, we have also suggested how a lack of any sort of policy at all may itself carry negative consequences, both epistemic and social, in the long term.

Obviously the agent-based models we develop are highly simplified. They fail to capture many relevant aspects of the real-world systems they represent. To give a few examples, we assume that collaborations are always dyadic, whereas in real scientific communities collaborations can contain many members, or else researchers can work alone. We assume that there are no costs to breaking old collaborative ties and forming new ones. And we assume that in choosing collaborative partners, academics ignore factors related to personality, proximity, and identity, instead focusing only on the expected credit benefits of different pairings. For this reason, one should not take these models to directly represent the expected dynamics in academic groups.

This said, our models can play several useful roles in reasoning about collaboration and homophily. First, they are useful in illuminating a possible way that diversity initiatives may go wrong—namely in promoting contact, and thus discrimination, between identity groups. This should direct further attention toward this possibility from those aimed at intervening in academia. Second, these models provide a helpful framework for thinking about and classifying different ways of intervening. For instance, creating “new links” (i.e., time for further collaboration) is not the same as “increasing the pie,” though both may promote diverse collaborations. Third, they allow us to study costly, difficult, potentially harmful interventions in a low-risk, low-cost way. The results are not definitive themselves but can direct further research into potential ways that collaborations in academia might be diversified.

Chapter 4

Using Phylomemies to Investigate the Dynamics of Science David Chavalarias, Philippe Huneman, and Thibault Racovski

Making sense of the way science evolves is a basic concern for philosophy of science. This concern involves two kinds of questions. The first one is about the patterns of the evolution of science: Is it gradual? Is it continuous? Discontinuous? How is the growth of knowledge likely to be measured and assessed? The second one involves the processes causing those patterns: What are the basic features of science likely to yield such patterns? For instance, if science proceeds by the accumulation of observations, supporting in turn inductive inferences, the growth of knowledge will be cumulative, with possibly some acceleration due to novel experimental or technical access to new data. If the evolution of scientific ideas is rather mostly due to the refutation of previous hypotheses, as in Popper’s fallibilistic account (Popper 1959), then science should rather appear as a discontinuous process. Addressing the former kind of question requires the examination of long-term historical data. The latter kind of question is addressed through the study of the ways science is done, by focusing either on historical case studies or on contemporary science. In each case, one examines the way theories are formed, revised, or changed, hypotheses are tested, and models are elaborated and validated. And in each case, the meaning of all those terms, both for the scientists themselves and for the philosopher who focuses on them, has to be questioned. Thus, there are two sets of answers that can be given to the question of the evolution of science, one on the historical patterns and the other on the processes, although the two issues are mutually dependent and cannot be decoupled. However, this dependence has traditionally tended to be unidirectional. In most cases, philosophers draw their account of the patterns of scientific progress—its continuous character, discontinuous character, rate of increase, etc.—from a conception of how science works. For instance, Kuhn’s idea of the discontinuity of scientific progress is directly related to his account of scientific activity, focused on the two notions of “normal science” (with its usual “riddles”) and “scientific revolution” (Kuhn 1957, 1962). These notions are derived from his examination of the actual practice of science and of the way it is taught. This shows an interesting contrast with what often happens in some empirical sciences, where approaches pertaining to each way of reasoning coexist.1

In this article, we argue that the time has come for philosophy of science to develop a two-way reasoning through a quali-quantitative methodology that takes advantage of the recent evolution in the media that record science as a body of knowledge.2 Throughout history, this body of knowledge has been distributed over several media: people’s minds, artifacts, and textual materials, to mention only the main ones. But over the past few decades, text documents have migrated to machine-readable media that harness the power of methods derived from text mining and complex system modeling. This paves the way to the study of the evolution of science through the evolution of its textual digitized traces. The fact that textual records are both the most commonly used and the most enduring medium for recording and transmitting knowledge legitimizes this approach.

In the first section of this chapter, we will introduce the notions of phenomenological and theoretical reconstruction and indicate how this distinction may help to articulate a two-way reasoning for history and philosophy of science.
Then we consider the classical discussion about the relation between philosophical claims about science and historical case studies, and we indicate some of the shortcomings of any attempt to use historical data as evidence for philosophical claims. Another section argues that phenomenological reconstruction, based on text mining, could in principle provide ways to understand the evidential/justificatory role historical sequences may play for conceptions of scientific dynamics and to avoid some of the issues (biases, hidden assumptions) that plague usual philosophical theorizing about the process of science. Finally, we introduce the “phylomemetic” approach as an example of such reconstruction, the “phylomemy” being the graph of filiations between words used in scientific papers or their metadata, in a way parallel to a phylogeny in evolutionary biology. We present its principle, provide some examples of the phylomemetic patterns that can be detected, and then argue that major theses about the dynamics of science could be tested by unraveling “signatures” of the hypothesized processes detectable within phylomemies.

Articulation between Data-Driven and Theory-Driven Research

The articulation between observation of patterns and understanding of processes can be conceptualized in the context of theoretical thinking about complex systems, where researchers deal with large amounts of data and huge interaction networks. The articulation between the two ways of reasoning is dealt with through the distinction between phenomenological reconstruction and theoretical reconstruction (Bourgine et al. 2009). Theoretical reconstruction consists in proposing models and processes (formal or computational systems) that synthesize phenomenal diversity and whose generative properties make it possible to explain past phenomena, predict future events, or reproduce observed structures. This is the case, for example, of the L-systems, a formal grammar invented by the Hungarian biologist Lindenmayer (1968), whose aim is to model the process of plant development or bacteria proliferation. Once the parameters of an L-system that most closely reproduces the patterns of a given plant species have been inferred, it is possible to simulate the growth of individuals of that species and analyze inter-individual heterogeneities by introducing variation in some of the parameters (figure 4.1). L-systems are widely used, for example, in animated films to simulate realistic landscapes. In hard sciences terminology, theoretical reconstruction generally corresponds to formal and computational modeling.

4.1. L-systems applied to plant morphogenesis. A few parameters can generate a wide variety of forms that match existing plant morphologies. Source: public domain.

On the other hand, when the challenge is to understand and model complex systems, unaided intuition cannot handle their intrinsic subtleties and nonintuitive properties. To observe and further understand through modeling a complex object O∈𝓞, we thus first select the properties to be observed and measured, then reconstruct from the data collected those properties and their relations as a formal object R∈𝓡 described in a high-dimensional space. Then some dimension reduction is applied to R to get a human-readable representation in a space 𝓥 (cf. figure 4.2). The chain 𝓞↦𝓡↦𝓥 defines what is called phenomenological reconstruction. The quality of a phenomenological reconstruction is measured by its ability to propose, from the raw data, representations in 𝓥 that make sense to us and provide affordances for modeling and conceptual understanding. Ideally, phenomenological reconstruction may provide us with candidate concepts and relations, which, when integrated into a theoretical reconstruction, can then serve as a basis for human experimental work (Bourgine et al. 2009).

Phenomenological reconstruction is rarely thought of as a prerequisite for theoretical reconstruction. In fact, the name “data,” on which the models are validated, suggests entities that are directly accessible to experience and that do not need special treatment to be integrated into a conceptual model. All you have to do is observe or measure. This is, however, rarely the case (Leonelli 2016), especially when the object of study is a complex system.
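To make the rewriting mechanism behind L-systems concrete, here is a minimal sketch. The rule sets are standard textbook examples, not taken from this chapter, and `lsystem` is a hypothetical helper name.

```python
# Minimal L-system interpreter: rewrite every symbol of the string
# in parallel at each step, using a dictionary of production rules.
def lsystem(axiom, rules, steps):
    s = axiom
    for _ in range(steps):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

# Lindenmayer's original algae model: A -> AB, B -> A.
algae_rules = {"A": "AB", "B": "A"}
print(lsystem("A", algae_rules, 5))  # ABAABABAABAAB
# Successive string lengths follow the Fibonacci sequence.
print([len(lsystem("A", algae_rules, n)) for n in range(6)])  # [1, 2, 3, 5, 8, 13]

# With bracketed symbols read as turtle-graphics commands (F = draw forward,
# +/- = turn, [ ] = push/pop state), similar rule sets generate the
# plant-like forms shown in figure 4.1.
plant = lsystem("X", {"X": "F+[[X]-X]-F[-FX]+X", "F": "FF"}, 3)
```

Theoretical reconstruction in the chapter's sense then amounts to fitting the parameters of such a rule set so that the generated forms match observed morphologies.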
For example, a dump of the millions of papers3 science produces annually would be a very cumbersome dataset for those who do not know how to apply a phenomenological reconstruction to prestructure it. Phenomenological reconstruction provides both inspiration for new theoretical reconstructions and tests for existing competing theoretical reconstructions. Theoretical reconstruction provides some indication of the objects to be examined in phenomenological reconstruction and predicts some patterns that might be found in the latter.

For the study of science, the generic output in 𝓥 of a phenomenological reconstruction is a structure described with some spatiotemporal resolution (e.g., the evolution and ramifications of major research fields in quantum computing over thirty years; see below). Although hypotheses about the underlying processes that guide the evolution of science could help to find the relevant phenomenological reconstruction method, a phenomenological reconstruction does not make such hypotheses. Its relevance is measured by its heuristic power in theoretical reconstruction activity and the increased capacity it gives us to process and interact with massive datasets (how to store them, how to browse them, how to retrieve them).

4.2. Articulation between phenomenological reconstruction and theoretical reconstruction.

This chapter focuses on a type of phenomenological reconstruction that addresses the question of understanding the evolution of ideas and scientific fields from the point of view of the evolution of the academic vocabulary. This approach is complementary to methods that rely on other types of data, such as citation data or coauthorship. It has some specific advantages, discussed below.

Philosophy of Science, History of Science, and the Testing of Theories of Scientific Evolution

Problems with the Production of General Philosophical Theories

As mentioned earlier, in most cases, philosophers start from theoretical reconstruction; that is, they start from the study of the processes of science to infer patterns of scientific evolution. This method faces a problem of generalization, for several reasons. First, not all scientific disciplines function identically. For example, there are different styles of scientific thinking, different epistemic norms, and different sociological organizations depending on scientific disciplines or research communities. Second, the range of practices defined as science is historically changing; for example, alchemy and music would at some point have been considered sciences, though they no longer have this status. Third, there are different scales of analysis of science depending on the entities and activities that are the main objects of attention. One can focus on entities such as statements (e.g., the ideal gas law), hypotheses (e.g., the “hypothesis of natural selection”; see Gayon 1998), theories (e.g., the theory of quantum electrodynamics or of sexual selection), models (e.g., the Wright-Fisher model in population genetics), experiments, texts (Newton’s Principia Mathematica), observations, or data points. The practice of science can involve activities as diverse as writing, computing, observing, experimenting, teaching, lecturing, reading, drawing graphs and diagrams, and programming. Discriminating between what belongs to the very nature of science and what is sociologically contingent is not straightforward.4

This plurality of types of scientific practices may partly explain the existence of competing views on the dynamics of science. Different views on the latter may be explained by generalizations grounded on different types of scientific practices or levels of analysis. For example, Karl Popper considered mostly hypotheses, the way they are tested, and the consequences this has upon the growth of science.
David Hull (1988) considered mostly theories as units of a Darwinian process he thought was taking place in science and intended to draw analogies with Darwinian phylogenies—the equivalent of natural selection being the set of interactions among scientists and between scientists and the world, leading to differential representation of various theories among scientists. Thomas Kuhn focused on theories but forged a new concept to define the relevant level of analysis, indissociably conceptual and philosophical, namely the “paradigm” (Kuhn 1962; Kuhn and Epstein 1979). In any case, the differences between the levels of analysis imply that different patterns of scientific progress are allowed to coexist, since they will be constructed by considering these different levels. It is, for example, logically possible that the evolution of theories is discontinuous, whereas the evolution of hypotheses or concepts or models is more continuous.

Previous Attempts at Two-Way Reasoning and Their Shortcomings

What we expressed here in terms of patterns and processes of scientific change—or of phenomenological and theoretical reconstructions—can be related to a tradition lamenting the lack of testing of philosophical theories of scientific change. This tradition has generally formulated the problem in terms of a relation between philosophy of science and history of science, where the former is conceived as the source of theories on scientific processes and the latter as a source of empirical data or patterns that can be used to test and refine these theories. This can be conceived as a form of two-way reasoning related to the one we are promoting. However, this approach suffers from problems and biases that the phylomemetic approach permits one to avoid or mitigate.

The first attempt at a rigorous historical testing of philosophical theories of scientific change was developed in the 1980s by Larry and Rachel Laudan and Arthur Donovan with the “Scientific Change” project at Virginia Polytechnic Institute (hereafter the VPI project; Laudan et al. 1986). Laudan and colleagues envisioned the process of testing as a relation between philosophical theories and historical case studies (Laudan et al. 1986; Donovan and Laudan 2012). They construed philosophy of science as an empirical science of science rather than an enterprise relying on a priori standards (see Hull 1992; Scholl 2018). Three principles guided the VPI project: (1) testing the empirical support of theories should be done comparatively rather than in isolation; (2) the different theories should not be tested as wholes but broken down into claims or theses, some of which should be extracted and tested; (3) just like in natural sciences, hypothetico-deductive methods are more fruitful than inductive methods—that is, one should test claims rather than start with the historical record in order to formulate new theoretical claims. Even though the VPI project was short-lived, it elicited several reactions (see Scholl 2018 for a recent review of the debate).
However, there are several problems with this approach to the relation between philosophy of science and history of science:

(1) History of science might not provide the right kind of data to test philosophical theories because historians are interested in the particular. This is especially relevant in recent decades when the interests of historians have moved away from methodological and foundational aspects of science, toward questions of practice, social context, material culture, and so on. This worry was formulated by Laudan himself (1989) and others (e.g., Pinnick and Gale 2000), who advocated for a properly philosophical method of developing historical case studies. However, it can be said that historians are condemned to the particular, since they can only work on limited samples of scientific materials. This is even more important for recent science because, since the beginning of the twentieth century, reading (let alone mastering) the whole scientific literature in any field of study has become an impossible task for any historian. Thus, the choice of samples raises a major issue for any attempt to test general claims about the evolution of science. Distinct choices of documents constitute different sets of evidence and may support opposite claims about patterns. For instance, it may be that Kuhnian discontinuities established by looking at major innovative texts disappear when one takes into account many more documents. The lack of a robust phenomenological reconstruction here raises an issue for evaluating theoretical models of processes.

(2) Another concern has been formulated as the “dilemma of case studies” (Pitt 2001; see also Faust and Meehl 1992). If one works according to a “top-down” approach, following the hypothetico-deductive method as the VPI project did, there is a risk of selection bias in the choice of case studies; that is, there is a risk of cherry-picking the case studies that confirm theoretical claims rather than those that do not. Conversely, if one works “bottom-up,” starting from case studies and trying to infer general claims from them, one falls prey to a form of the problem of induction: How many case studies are sufficient to justify a general claim?

(3) A third concern regards the “theory-ladenness” of case studies. Even if methodological solutions to selection bias are implemented, there is a deeper sense in which case studies are influenced by theory. The historical record is not a repository of data readily informed for the testing of theories. To investigate historical case studies and make sense of the historical material, researchers must rely on conceptions of scientific activities. The theories that they explicitly or implicitly endorse influence the constructions of their historical narratives. This compromises the status of case studies as empirical evidence (see Hull 1992; Nickles 1995; Richards 1992).
One example of theory-ladenness is the bias toward papers or books written by “important scientists,” namely, scientists like Newton or Darwin, whose work has undoubtedly provided major changes in our understanding of the natural and social world. Thus, concern for the testing of philosophical theories of scientific change has been repeatedly expressed over the last few decades, and debates regarding the appropriate methods for such testing are still ongoing. However, this literature concentrates on historical case studies as the main empirical resource for the testing. The potential contribution of the phenomenological reconstructions provided by text-mining analyses for the formulation and testing of philosophical theories has not been taken into account in this debate. It is not the purpose of this chapter to adjudicate whether historical case studies can ultimately be a valuable source of data for philosophical theories of scientific change.5 Rather, we argue that phenomenological reconstructions produced by phylomemies provide another form of data that solves, avoids, or mitigates several of the problems attached to historical case studies and that gives insights into the evolution of science that historical case studies cannot provide.

Interest of Using Phenomenological Reconstruction to Study Science Evolution

We claim that a technical and methodological shift—due to computational tools, the existence of accessible databases of scientific journals (PubMed, arXiv, Web of Science, etc.), and the development of text-mining methods and complex network analysis—now makes possible a phenomenological reconstruction of scientific evolution in the same way that postgenomic techniques made possible a science of genomic networks and architectures (Richardson and Stevens 2015). Thus, here enter data-mining and automated methods considering very large databases of documents. Because they do not focus on a small sample of “significant” papers, they do not assume any arguments about metrics, significance, and so on. The obvious difference, however, is that they consider papers in a nonsemantic way: they do not read the propositions, theorems, and statements; they do not understand models. In principle, they do not suffer all the biases we indicated concerning the establishment of patterns in scientific evolution. Whether science progresses continuously or discontinuously, whether it’s a Darwinian process or not, one can arguably think that those computational methods may offer good tests for the hypotheses philosophers of science elaborated regarding the history of science.
In contrast with most of the arguments from philosophers of science, which consider together the process of science and the patterns of scientific evolution and often infer the latter (as detected on small samples) from the former, data mining applied to large journal databases makes it possible to start from detected patterns in large literature and then infer possible processes. Thus, pattern detection is the first task, and on this basis one can directly test claims about the general shape of scientific evolution. In a second step, an automated phenomenological reconstruction would allow one to couple patterns and processes, in a way similar to some analyses in evolutionary biology. Namely, patterns detected may be seen as signatures of some processes. The concept of signature is pervasive in ecology and evolutionary biology. For instance, molecular evolutionists scrutinize populations’ genomes to detect patterns of variation that will signal the effect of natural selection or, inversely, the effect of stochastic processes (“random genetic drift”) acting on gene pools.6 The signature of a process is the pattern that very probably is there because of such a process; even if the process cannot be detected, the pattern can reliably indicate its existence. A proper phenomenological reconstruction can therefore allow one to detect a signature in the data, provided that signatures of specific processes have been characterized, in the same way evolutionists can characterize the signatures of selection, drift, or hitchhiking (a process through which genes that confer no selective advantage can still go to fixation in a population because they are associated with positively selected genes).

A quantitative data-mining approach therefore would optimally provide us with two things: (1) ways to test philosophical claims about science evolution while mitigating the sample biases and grain issues mentioned above and (2) characterizations of “signatures” of various kinds of processes that could allow one to assess theoretical reconstructions about scientific processes by inferring processes from these typical detected signatures present in the phenomenological reconstruction. Before turning to the specific data-mining tool we will use, namely phylomemy reconstructions, we will talk about the general approach it instantiates and then distance it from classical scientometrics, which cannot answer the question about patterns that interest philosophers.

Analyzing the Evolution of Science through Text Mining

Principles of Scientometrics Analysis

Scientometrics, the quantitative analysis of science, as a field can be traced back to the 1950s (Garfield 1955) and has its first applications in the fields of communication and information science and management science. It was initially focused on the question of information retrieval and the identification of articles relevant to a given search engine query by means of citation analysis; it did not intend to reach general conclusions on the structure of science and its evolution. Until now, most scientometrics studies have been turned toward the evaluation and management of science with a focus on measures of scientific activity: the quantification of the quality and impact of scholars, based on citations, patents, funding, and so forth (Abbasi, Altmann, and Hossain 2011; Sinatra et al. 2016); the measure of the prestige of journals (West, Bergstrom, and Bergstrom 2010); the description of the scientific landscape (Börner, Chen, and Boyack 2003; Kawamura et al. 2017); and the detection of hot or emergent topics and predictions of research landscape developments.

A second branch of the quantitative analysis of science has developed in sociology around the analysis of the co-occurrence of terms in documents, or co-word analysis (Callon et al. 1983; Zitt 1991), detailed hereafter. This approach, which aims to quantify the way in which the elements of a discipline-specific vocabulary are related in its production, was explicitly oriented toward the analysis of the dynamics of science from a sociological perspective, for which co-citation analysis was considered insufficient and limited. In addition to some well-known weaknesses of co-citation analysis (e.g., low sensitivity or recall in the identification of recent domains, fragmentation of some identified communities; cf. Braam, Moed, and van Raan 1991a, 1991b; Callon et al. 1983), Callon et al. (1983) argued that citation as a social practice is not well defined and can cover meanings as diverse as allegiance, recognition, and reciprocity. This makes it difficult to interpret the cognitive structures that this methodology highlights. Moreover, as these same authors have pointed out, the practice of citation is absent in some knowledge-production contexts. Citation analysis alone is therefore not relevant for a generic study of the contexts of scientific production, which Latour and Woolgar (1986) have proposed to group under the concept of literary inscription, and which includes not just scientific articles but also reports, projects, and patents.

Over the last decade, fostered by the availability of very large academic archives, scientific dynamics has received increasing attention (Börner 2010; Chen et al. 2009; Zeng et al. 2017). Some work focused on citation network reconstruction (Börner, Maru, and Goldstone 2004); another relied on top-down categorization of scientific fields from editors to compare their semantic diversity and evolution (Dias et al. 2018).
Most of these projects try to evaluate scientific articles rather than scientific production as a body of knowledge: which are the key papers of a field, how can papers be clustered, which are the potential seminal papers, and so forth. Following mainstream scientometrics, the main goal is either to evaluate the scientific production and its producers or to improve scientific information retrieval systems. Very few papers are focused on tracking conceptual innovations within fields, the extinction or emergence of concepts, or the conceptual bifurcations and merging, as is required by any attempt to capture patterns of scientific evolution. That is why we will turn to an approach that is not in the pure tradition of scientometrics, namely phylomemy reconstruction, as a method for the phenomenological reconstruction of science dynamics.



David Chavalarias, Philippe Huneman, and Thibault Racovski

4.3. Number of records in the Web of Science database for the query “quantum computing.”

In what follows, we will focus on methods based on co-word analysis to illustrate how far we can go with the mere analysis of textual content. Undoubtedly, this type of analysis would then benefit from being combined with other quantitative approaches such as citation analysis or coauthor analysis, and it has already been demonstrated that this combination is valuable for bibliometric evaluation (Noyons, Moed, and Luwel 1999) or information retrieval (Braam, Moed, and van Raan 1991a, 1991b). Comparisons and combinations with other approaches based on word statistics, like latent Dirichlet allocation (Blei, Ng, and Jordan 2003) or word embedding (Pennington, Socher, and Manning 2014; Levy and Goldberg 2014), might also prove relevant. We will consider for our examples the Web of Science (WoS) metadata database as a proxy for the digitized production of science. We assume that there is a consensus on the fact that peer-reviewed papers referenced by the WoS are part of the body of knowledge that science produces. We should, however, acknowledge that this database covers scientific production only partially and that access to only the titles and abstracts of scientific papers can lead to only a partial phenomenological reconstruction compared to what can be done with the full text. As we will show, phenomenological reconstruction on partial data nevertheless gives a fairly accurate picture of science dynamics as a whole.7

From Words to the Macro-Structures of Science

The most basic scale of vocabulary analysis is probably the mere quantification of the popularity of terms in the literature, that is, the evolution of the volume of publications that mention these terms. Figure 4.3 depicts, for example, the evolution of the number of records mentioning one of the following keywords in the Web of Science database (33,500 records in total until 2018): quantum computer, quantum computing, quantum processing, quantum algorithm, or quantum communication. Apart from the fact that the field of quantum computing seems to have emerged around 1994 and that it has experienced some accelerations (in 2002, 2008, and 2016) and evolutionary plateaus, little can be said about this curve. In addition, the term "quantum computing" itself may have referred to different concepts over time, a phenomenon that cannot be revealed by this plot. In order to get at the higher-order organization of the science of quantum computing that we will call the meso-scale, we need to analyze how this term has been linked by scholars to other terms like "entanglement," "optical lattices," "quantum matter," and the like. Callon, Courtial, and Laville (1991) pioneered this approach with what they called co-word analysis. Its methodological foundation is the idea that "the co-occurrence of key words describes the contents of the documents in a file," with the emphasis on the fact that "a simple counting of co-occurrences is not a good method for evaluating the links between co-words" (Callon, Courtial, and Laville 1991, 160–61), and more sophisticated metrics should be used in order to reveal the relational structure between terms. Once a metric is given, the scientific objects that we deal with are word networks: for a given set of documents, we have on one side a list of text strings or n-grams (nodes) and on the other side their relations expressed as association strengths in the chosen metric (links). We can then apply graph and network theories to identify relevant structures within these word networks.
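The co-occurrence counting and association scoring just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the toy documents and the 0.5 link threshold are invented for the example, while the scoring follows Callon, Courtial, and Laville's equivalence index e_ij = c_ij² / (c_i · c_j):

```python
from collections import Counter
from itertools import combinations

# Toy corpus: each document reduced to its set of indexed terms
# (invented examples, for illustration only).
docs = [
    {"quantum computing", "qubit", "entanglement"},
    {"quantum computing", "quantum algorithm", "qubit"},
    {"entanglement", "quantum communication", "qubit"},
]

occ = Counter()   # c_i: number of documents mentioning term i
cooc = Counter()  # c_ij: number of documents mentioning both i and j
for terms in docs:
    occ.update(terms)
    cooc.update(frozenset(p) for p in combinations(sorted(terms), 2))

def equivalence(i, j):
    """Equivalence index of Callon et al.: e_ij = c_ij**2 / (c_i * c_j)."""
    return cooc[frozenset((i, j))] ** 2 / (occ[i] * occ[j])

# Links of the word network; keep only sufficiently strong associations
# (the 0.5 cutoff is arbitrary, chosen for this toy example).
links = {tuple(sorted(p)): equivalence(*p) for p in cooc}
strong = {pair: e for pair, e in links.items() if e >= 0.5}
```

Clustering the resulting weighted network, for example with a community-detection algorithm, would then yield the groups of terms that stand for scientific issues.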
These structures will be subnetworks with particular properties; the choice of the properties to be searched for, and therefore of the associated network clustering algorithms, is another component of the methodological approach to be discussed. The end result, however, is the identification of groups of terms and their relational structure that, in the corpora considered, define some scientific issues, like the set {qubit manipulation, quantum information science, quantum thermodynamics, superconducting nanocircuits, microtraps, trap array, quantum gate}. The cognitive hypothesis in co-word analysis is that the meaning of a word (text string) is revealed by its patterns of interactions with other words (for example, the meaning of "match" is perceived completely differently when it co-occurs with "gas cooker" and "kitchen" than when it co-occurs with "soccer" and "goal"). These patterns of interactions provide a context for the interpretation of a word, and a word can have different meanings if it takes part in distinct interaction patterns. It should be noted that this approach doesn't exhaust what can be said about the meaning of words in a text. More sophisticated methodologies like pragmatic sociology (Chateauraynaud 2003), argumentative analysis (Palau and Moens 2009), or sentiment analysis (Liu 2015; Pang and Lee 2008) can be complementary to the co-word approach. A network of relations between a set of words, which defines the interpretative context of each word in the set, is a building block of the meso-level of a phenomenological reconstruction based on vocabulary. Its size can range from describing a very specific problem to a long list of related issues, depending on how generic or specific the desired point of view is (which will depend on the choices of metrics and clustering algorithms). By adjusting the resolution of the clustering parameters, the investigator can identify the structuration of knowledge domains into substructures. What we have described so far holds for any set of documents. When documents are time-stamped, as is the case for scientific production, the phenomenological reconstruction of the meso-level elements can be performed over different periods of time to obtain a temporal series of clusterings, the clusters being word networks representing scientific fields. Their composition and hierarchical organization are likely to evolve, and Chavalarias and Cointet (2013a) proposed a method called phylomemy reconstruction to analyze such evolution. Chavalarias, Lobbé, and Delanoë (2021) have formalized this approach as a phenomenological reconstruction and introduced the distinction between the notions of level and scale. Phylomemy reconstruction is the operation that reconstructs the conceptual kinship between scientific fields. It is a phenomenological reconstruction of the semantic networks of science that highlights how meso-level entities evolve through time at different levels of observation.
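To give an intuition of the kinship step (the actual method of Chavalarias, Lobbé, and Delanoë (2021) is considerably more sophisticated), conceptual kinship between fields of successive periods can be approximated by vocabulary overlap. In the sketch below, fields are plain sets of terms, overlap is measured by the Jaccard index, and the 0.1 threshold is an invented illustration rather than a parameter of the published workflow:

```python
def jaccard(a, b):
    """Vocabulary overlap between two fields (sets of terms)."""
    return len(a & b) / len(a | b)

def kinship_links(fields_t, fields_t1, threshold=0.1):
    """Link a field of period t to a field of period t+1 when their
    vocabularies overlap enough (a simplified conceptual-kinship test)."""
    return [(i, j)
            for i, a in enumerate(fields_t)
            for j, b in enumerate(fields_t1)
            if jaccard(a, b) >= threshold]

def classify(fields, links_in, links_out):
    """Categorize fields by their position in the dynamic graph:
    emergent (no ancestor), declining (no descendant),
    merging (several ancestors), branching (several descendants)."""
    statuses = []
    for i, _ in enumerate(fields):
        n_in = sum(1 for _, j in links_in if j == i)
        n_out = sum(1 for k, _ in links_out if k == i)
        tags = []
        if n_in == 0:
            tags.append("emergent")
        if n_out == 0:
            tags.append("declining")
        if n_in > 1:
            tags.append("merging")
        if n_out > 1:
            tags.append("branching")
        statuses.append(tags or ["steady"])
    return statuses
```

Because a field may have several ancestors and several descendants, the inheritance structure this produces is a lattice rather than a tree.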
4.4. Multiscale organization of phylomemetic networks. Left: from the micro-level of the evolution of term occurrences to the evolution of the macro-level scientific fields, passing through the meso-level of local networks of keywords describing some domain of interest. Right: the lineages of scientific fields arranged into phylomemetic branches. Each circle represents an element of the meso-scale (a scientific field described as a network of keywords computed over a given period). Vertical links between circles stand for conceptual kinship detection. Time flows top-down. The positions of circles in this dynamical graph can serve as a categorization of scientific fields: emergent (at the beginning of a branch), declining (at the end of a branch), branching (giving birth to several subfields), or merging (resulting from the recombination of previous fields). The part of the branch highlighted by the caption is the domain of rehabilitation engineering detailed in figure 4.8.

The resulting multilevel organization, mathematically defined as a foliation on a temporal series of clusterings, is a complex object ϕ∈𝓡 described in a very high-dimensional space. To get a human-readable representation φ in a space 𝓥, called a phylomemetic network, some dimension reduction, parameterized by the desired level of observation, is applied to ϕ. Each phylomemetic network φ depicts the lineages of scientific fields at a given level of observation, arranged within phylomemetic branches (see figure 4.4). Each branch describes how fields of a given domain have strengthened, merged, split, or weakened over time. This notion of level of observation is one of the originalities of the method of phylomemy reconstruction. Complex systems display structures at all scales, with a hierarchical organization reflecting the interactions between the entangled processes that sustain them (Chavalarias 2020). Their description mobilizes the notions of levels and scales, "level" being generally defined "as a domain higher than 'scale'" and "scale" referring to "the structural organization within a level" (Li et al. 2005, 574). In biology, for example, the choice of level of observation determines what the main entities under study (organs, cells, genes, etc.) are, while the choice of a scale determines the smallest resolution adopted to describe these entities. As for science, scientific research domains are sustained by socioeconomic processes that guide the progress of science with a more or less broad focus (laboratories, research networks, journals, funding agencies, etc.). These research domains can be defined at different levels of observation by core questions and research objects of different granularity that characterize their unity over time. For example, proteomics is part of cell biology, which is itself part of biology. The method of phylomemy reconstruction as developed by Chavalarias, Lobbé, and Delanoë (2021) makes a clear distinction between these notions of level and scale in the phenomenological reconstruction of science evolution. The choice of a level of observation determines the range of intrinsic complexity of the dynamic entities we want to observe,




4.5. Workflow of phylomemy reconstruction from raw data to global patterns. The output is a set of phylomemetic branches where each node is constituted by a network of terms describing a research field. These nodes are a proxy of scientific fields and can have different statuses: emergent, branching, merging, declining. Source: Chavalarias and Cointet 2013b.

which are called “branches of science,” and the choice of a scale defines the extrinsic complexity of their description. One of the main differences between level and scale is that the concept of level is ontologically linked to the notion of time, since the components of a level derive their unity from some underlying dynamic process, while the notion of scale does not necessarily imply time. To summarize, the workflow of phylomemy reconstruction starts from a large set of publications as the raw data and ends with the production of a multilevel structure characterizing the transformations of large scientific domains that can subsequently be visualized and analyzed at a given spatiotemporal scale (cf. figure 4.5). In between, it applies sophisticated methods of text mining, co-occurrence indexation, transformations of temporal co-occurrence matrices into similarity matrices, and complex network analysis on the resulting graphs (see figure 4.5). We invite the reader to refer to Chavalarias, Lobbé, and Delanoë (2021) for technical details on the concepts and method. The concept of phylomemy (and phylomemetic networks) has been proposed by analogy with the concept of the phylogenetic tree, a formal object that represents some aspects of cultural evolution through the analysis of its digital traces. It relies on the notion of meme, understood as “cultural information that passes along from person to person, yet gradually scales into a shared social phenomenon” (Shifman 2013, 364–65). However, it is important to emphasize where the analogy begins and ends. Phylomemy reconstruction is a concept that belongs to a phenomenological reconstruction approach. Consequently, it does not imply any particular hypothesis on the nature of the phenomena that have generated the reconstructed pattern (here the social learning processes that govern the diffusion dynamics), which should be elaborated and tested in a theoretical reconstruction step. The notion of meme


4.6. Sample of phylomemetic branches of the reconstruction of the domain of future and emerging technologies. This phylomemy has been generated with the software WordsEvolution developed by David Chavalarias.

must therefore be understood in its broad sense, without any particular hypothesis about the ontological nature of memes or about the nature of the media in which they spread; nothing commits phylomemetic reconstructions to memetics, understood as the view of cultural evolution initiated by Dawkins (1976). The only property we retain from the definition of "meme" is that it is something that is transmitted nongenetically, with some variation, from one person to another. A phylomemy of science aims to study how researchers' practices in the use of terms spread across a population and change over time. From a theoretical perspective, it is important to note that the formal object that represents inheritance patterns in phylomemies is a lattice, contrary to the traditional phylogenetic approach that deals with trees. This avoids any a priori hypothesis on the nature of these diffusion processes. It accommodates the horizontal transfer of terms between distinct clusters of terms, in a way parallel to lateral gene transfer in phylogenetics.

Examples of Phylomemy Reconstruction

Before highlighting the advances that this method brings for history and philosophy of science, let us present a concrete example of what it




4.7. Phylomemetic branch of quantum computing from the phylomemy reconstruction of the scope of FET Open projects between 1994 and 2007. Each square in the phylomemy is a set of terms defining a scientific field. As an example, the details of fields 1 and 2 are given. Fields that contain terms appearing for the first time in the phylomemy have a dark-gray header and corresponding terms are written in larger font size. This reconstruction highlights the reconfiguration of the fields around 2000 at a moment when two negative results have challenged mainstream approaches. We can clearly observe two phases of development involving many vocabulary innovations, with the pace of innovation slowing down in between at a moment when major negative results were published. Source: Chavalarias 2016.

leads to. We will be interested here not in the technical details that can be found in Chavalarias, Lobbé, and Delanoë (2021) but rather in the nature of outputs that can be produced with this methodology. In collaboration with the European Commission,8 we had access to about 5,000 authors’ keywords of projects submitted to the Future and Emerging Technologies (FET Open) funding scheme between 2009 and 2010. The co-occurrence matrix of these keywords has been processed from the full WoS database between 1990 and 2010 (about twenty-nine million documents), and then the phylomemy reconstruction workflow (steps 2 to 4 in figure 4.5) was applied (Chavalarias and Cointet 2013b; Chavalarias 2016). The goal was both to identify the science behind FET Open projects and to get insights into how it has unfolded through time (figure 4.6).


One of the branches of this phylomemy depicts the domain of quantum computing (figure 4.7). This domain of research can be defined by terms such as "quantum computing," "quantum computers," "quantum processing," or "quantum algorithms." An example of a subfield from this domain is the relational network formed by the terms "qubit manipulation," "quantum information science," "quantum thermodynamics," "superconducting nanocircuits," "microtraps," "trap array," and "quantum gate." The domain of quantum computing is particularly interesting from the point of view of phylomemy reconstruction for several reasons: (1) As can be observed in figure 4.7, this field emerged in the 1990s, which means that its production is very well covered by digital archives, whose coverage largely increased in the early 1990s. (2) This field is extremely well defined: it corresponds to an exotic branch of computing based on a new physics with a very specific vocabulary. (3) This is an area where theory and experiment must go hand in hand, each of these activities being well represented in the publications; hence the phylomemetic reconstruction has good chances of capturing all relevant aspects of the field's dynamics. (4) This is an area of high societal and strategic importance. For some tasks, quantum computers are theoretically billions of times faster than conventional computers, which would allow them to "break" all the cryptographic protocols currently used. The first nations that produce a quantum computer will have a major technological advantage over others, with access to all private or confidential data, which is why the U.S. military became involved in this area early on.

The quali-quantitative analysis of this branch is very revealing of the different phases of development of this domain of research in its first twenty years.9 The subfields that feature brand-new concepts from the perspective of this structure (i.e., terms that appear for the first time in these subfields) are highlighted in dark gray. The morphology and the pace of innovation displayed by this branch alert the reader that the domain of quantum computing may have undergone a reconfiguration around the first decade of the twenty-first century. Indeed, two negative results challenged mainstream approaches precisely at that moment, with Noah Linden and Sandu Popescu (2001) proving that the presence of entanglement is a necessary condition for a large class of quantum protocols, which, coupled with




Braunstein et al.'s (1999) result, called the validity of the main quantum computer approach into question. Both phases of development (1990–1999 and 2001–2007) involve many innovations and a diversity of approaches, reflected by the number of distinct subfields. By contrast, the pace of innovation (as indicated by the frequency of dark-gray labels) seems to have slowed down around the time of the negative results, with a lower variety of fields. This description of the evolution of this domain is radically different from what can be inferred from the mere volume of publications (figure 4.3), which increases steadily over this period. By studying this branch in greater depth, we would be able to determine the details of this reconfiguration, the respective importance of technological and theoretical innovations, the role of the turnover among the academic community, and more. What appears here is a specific profile of interrelation between science and technologies that is proper to this field but that could be compared to what happens in other fields (e.g., bioscience in relation to biotechnologies, statistical physics in relation to modeling and computing techniques). Thus, the philosophical questions regarding the dependence between science and technique—the reason why Gaston Bachelard, highly aware of the dependence between the two domains, coined the word "phénoménotechnique"—can be dealt with on the basis of compared patterns of such dependencies detectable in phylomemies. It is not possible here to present an exhaustive taxonomy of phylomemetic patterns, but in addition to the case of domain reconfiguration, we briefly present two other examples that illustrate how this methodology makes possible a cross-domain comparison of evolutionary dynamics: domain hybridization and abrupt domain emergence. Domain hybridization happens when a research domain incorporates research results and theories from another research domain.
This is the case, for example, for the domain of rehabilitation engineering. Initially focused on concepts such as orthosis, amputation, prosthesis, and the like in the early 1990s, this domain has incorporated the work from the cognitive sciences on mirror neurons and brain-computer interfaces to evolve toward brain-machine interfaces and neural prostheses for active orthosis. In addition to highlighting the hybridization of these two domains and its timing, this phylomemetic branch also reconstructs the main events of conceptual emergence with good precision (Chavalarias and Cointet 2013b): the term "neuroprosthesis" becomes part of the phylomemy in 1994, two years after the seminal paper of Kovacs, Storment, and Rosen (1992); the merging of the two branches takes place in 2000, one year after the first workshop on brain-machine interface


4.8. The phylomemetic branch of rehabilitation engineering. This branch has progressively evolved to incorporate the work in cognitive science on mirror neurons and brain-computer interfaces.

supported by the Eastern Paralyzed Veterans Association; the term "retinal prosthesis" appears in 2000, which coincides with the first clinical trial of a permanently implanted retinal prosthesis. Thus, morphological and semantic transformations of this phylomemetic branch synthesize the main transformations of this field over twenty years. Abrupt domain emergence happens when a sustainable domain including a large number of subfields suddenly appears after a conceptual




4.9. Phylomemetic branch of massively multiplayer online games that emerged abruptly in 2003.

or technological innovation. This is the case for example for the research on massively multiplayer online games (figure 4.9) that was first mentioned in 2002 in Web of Science (two publications) and appears in the phylomemy in 2003 as a sustainable branch, with a lot of new terms (dark-gray labels) pointing to conceptual innovations. The morphological analysis of phylomemetic branches can be coupled with a statistical analysis of the content of the fields. For example, Chavalarias and Cointet (2013a) have demonstrated that the position of a field in a phylomemy of science (a field can be emergent, steady, declining, etc., cf. figure 4.4) is strongly correlated with cohesion measures on the terms that make up the field. In particular, it has been observed that a non-emergent field with a low cohesion measure is much more likely to disappear in the next period than any other non-emergent field. Thus, a phylomemy conveys important information about the dynamics of scientific domains that should be taken into account in a theoretical reconstruction. We take advantage of this example to insist on the fact that a phylomemy is a formal structure that is a phenomenological reconstruction,


and consequently it makes no particular assumption about the nature of the phenomena that generated the reconstructed pattern. However, it makes choices on the data and the values of the parameters used in the reconstruction, and the output result is only a projection of the global complex structure of science in the form of a visualization that can be grasped by the human mind. If we had chosen other parameters, the displayed structure might have been different, illustrating other important aspects of the dynamics of the domains under study; conversely, structures that hold for a large range of parameters will likely be core structures of science dynamics. In addition, the reader must have noticed that even these projections are intrinsically multiscale and that one would have to be able to zoom in and out on the overall structure to fully understand it. For this reason, the implementation of phylomemy reconstruction methods into macroscopes like Gargantext is of utmost importance to unleash their heuristic power (Lobbé, Delanoë, and Chavalarias 2021).10

Discussion

The study of phylomemies paves the way for a quali-quantitative study of science evolution, which relies not merely on simple numbers and their trends (number of publications, quotations, h-indices, etc.) but also on elements of the morphology of the branches of science and on the different types of relationships these branches have with each other and the lower levels of term dynamics. "Scale" refers here to a hierarchy in complexity: terms, concepts, hypotheses, models, and theories are distinct scales. Term dynamics, in this method, is the lowest scale. Clusters of terms will match with concepts, and theories will match with clusters of higher order (clusters of clusters). "Topics" are clusters that sit on the same scale as concepts but are defined and labeled within the phylomemy.
Since one can consider the diachronic changes in those clusters, and sets of clusters, descriptions of the evolution of science can be given at several scales. Each one, however, needs to include a discussion of the relation between the scale of terms and the scale of concern, and of the assumptions used to define this relation. Another major descriptive concept regarding phylomemies is "level," which has also been used throughout this chapter. Levels can be defined in terms of time intrinsic to the process (defined by the rate of change of terms) and in terms of space in the phylomemy (considering a discipline, a set of disciplines, etc.). Phylomemies therefore allow for analyses at various levels and scales, which explains why they can be used to assess philosophical claims about the dynamics of science (those claims having been defended at several levels and scales).




The relationships between phylomemetic branches may be of distinct types such as hybridization, thematic divergence, and conceptual or methodological borrowing. The various processes hypothesized by philosophers to account for scientific dynamics at various levels—Kuhnian, Hullian, Lakatosian, or Popperian processes—can be discovered within the phenomenological reconstruction, to the extent that signatures of those processes can be characterized in the terms of the phylomemies. For instance, a scientific process as hypothesized by Lakatos (1978)—namely, the elaboration of a core set of hardly modifiable views and methods surrounded by a “protective belt” of concepts, models, and methods that will be more likely revised—may correspond to specific patterns where two kinds of scientific contents can be distinguished according to two different rates of change. In turn, this can be detected by determining a ratio of dark-gray versus light-gray labels (see figures 4.4 to 4.9), which measures the introduction of new terms. A philosophical analysis of the relations between terms, concepts, and topics (as defined in the phenomenological reconstruction) should be undertaken here, but it would not change the method of measuring those differential ratios in order to find out the signature of a “protective belt/varying periphery concepts and tools” mechanism proper to Lakatos’s view of scientific dynamics as a dynamics of “research programs” progressively negotiated. In turn, the Kuhnian idea of “scientific revolution” would correspond to signatures where a large discontinuity occurs and spans across several thematic fields. It also implies that a “normal science” period should be detected by its very low rate of change compared to short periods of revolution. 
This low rate of change will also appear as a relative conservatism of terms, and possibly as a conservatism of topics through branching processes (which indicates that subfields develop but that the same way of using core concepts, methods, references, exemplars, and so on is conserved through the branching). The "abrupt domain emergence" (described above) may indeed be part of the signature of a Kuhnian revolution; however, the rapidity of the sweep has to be measured against its extension (which is not very wide in the case of massively multiplayer online games, for example) in order to talk of a genuine Kuhnian revolution. Here philosophical arguments are required to determine this ratio (intensity of sweep/extension of range), but the method of inferring back from pattern signatures to processes is still adequate. Thus, regarding the very general philosophical question about the process of science sketched earlier in this chapter, the preliminary findings pointed out above already provide some insights.
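The differential rate of new-term introduction invoked above can be operationalized very simply. As an illustration only (the lineage data are invented), the function below computes, for each period of a field's lineage, the share of terms that have never appeared before, a rough analogue of the dark-gray versus light-gray label ratio; a Lakatosian "hard core" would show up as a subset of terms with a persistently near-zero rate, while a Kuhnian revolution would show up as a simultaneous spike across many fields:

```python
def innovation_rates(periods):
    """For each period (a set of terms used by a field), compute the
    share of terms never seen in earlier periods - a rough analogue of
    the ratio of dark-gray to light-gray labels in the figures."""
    seen, rates = set(), []
    for terms in periods:
        new = terms - seen            # terms appearing for the first time
        rates.append(len(new) / len(terms) if terms else 0.0)
        seen |= terms                 # remember everything seen so far
    return rates
```

A "normal science" stretch would then correspond to a long run of low rates, and a revolutionary episode to a spike shared across several branches at once.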


First, patterns appearing in phylomemies parallel known patterns in speciation study: rapid speciation, branching, fission, and so on. Were it confirmed by more extensive analysis of science phylomemies, this might be one argument in favor of a Darwinian viewpoint as Hull (1988) advocated it. Second, not all patterns are signatures. Many phylomemetic patterns by themselves are equivocal. Signatures, as we argue, are particular patterns that point to specific generative processes in a more unambiguous way than just any pattern. Hypothesizing specific signatures and searching for them in phylomemies (and not only detecting patterns) is what allows for deciding between rival philosophical theories of scientific processes at a given level and scale. Considering differential ratios of various gray labels in phylomemies, as we just hypothesized, defines three different signatures that would indicate Kuhnian, Lakatosian, or Popperian patterns of scientific evolution. In biology, tests of selection have been designed since the 1980s by evolutionary biologists to detect the signatures of natural selection and genetic drift. Those tests compare variation due to drift and due to selection, and they often focus on the nucleotide substitutions in the genome that are synonymous (namely, changing the nucleotide does not change the amino acid coded by the nucleotide triplet) versus those that are nonsynonymous (such a change changes the amino acid). When directional (positive or negative) selection occurs, the rate of nonsynonymous substitutions shifts relative to the rate of synonymous substitutions (since nonsynonymous substitutions alter fitness and are therefore targeted by selection in one direction or the other, more than synonymous substitutions are). One of the most common tests, the McDonald–Kreitman test, compares within-species and inter-species differences between these two kinds of substitution.
Concerning phylomemies, once we know the signature of some process, one can analogously design tests that would detect these signatures. If science can evolve by natural selection, according to Hull’s thesis, for instance, one should be able to design within-branch and between-branch comparisons of the different rates of change of various gray labels in order to identify topics that are under directional selection.11 Third, scientific fields and domains generally display very different evolutionary dynamics. The patterns seen may therefore be signatures of very diverse processes; for instance, the “abrupt emergence” described in our last domain might be analogous to what evolutionary genetics calls a “selective sweep” (Hermisson and Pennings 2005), where selection on a locus carries along all neighboring loci and induces a rapid genomic change. But other domains may rarely witness such signatures.
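To make the analogy concrete, the core of the McDonald-Kreitman comparison can be written down in a few lines. The sketch below is ours, not part of any phylomemy toolkit; the function name is invented, and the counts (of the kind reported in the classic Drosophila Adh study) are illustrative only.

```python
# Minimal sketch of the McDonald-Kreitman comparison described above.
# The neutrality index NI = (Pn/Ps) / (Dn/Ds) contrasts within-species
# polymorphism (Pn, Ps) with between-species fixed differences (Dn, Ds)
# for nonsynonymous (n) versus synonymous (s) changes. NI near 1 is
# consistent with neutrality; NI < 1 indicates an excess of
# nonsynonymous divergence, a classic signature of positive selection.

def neutrality_index(pn: int, ps: int, dn: int, ds: int) -> float:
    """Pn, Ps: within-species polymorphisms; Dn, Ds: fixed differences."""
    return (pn / ps) / (dn / ds)

# Illustrative counts only:
ni = neutrality_index(pn=2, ps=42, dn=7, ds=17)
print(f"NI = {ni:.3f}")
```

An analogous phylomemetic test would substitute, say, within-branch and between-branch rates of change of gray labels for the substitution counts.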



David Chavalarias, Philippe Huneman, and Thibault Racovski

This calls for further investigation, especially of the different social structures of domains and their respective relations to technology—the former requiring other kinds of data than phylomemies do. In any case, this may indicate that there is not one process of science but various science-producing processes, whose integrated theory has yet to come. This gestures toward a pluralistic view of science’s processes. At some level, the Hullian signature of a Darwinian process may dominate, while at another level the dynamics of science would provide the signature of a Popperian or Kuhnian process. It might also be that depending upon the scale analyzed—be it theories or models—the signatures are not the same, and therefore the dominant processes are not the same. What phylomemies allow is a test of hypotheses about the general dynamics of science at several levels and scales: different signatures at different levels and scales would define a pluralistic view of science dynamics.

Formulating and Revisiting Questions Based on Phylomemetics

We argued in this chapter that phylomemies provide an example of the kind of phenomenological reconstructions required for a bottom-up approach to the dynamics of science, one that would make possible a search for domain-characteristic processes of science in a robust manner. By robust we mean that the biases characteristic of the usual philosophical reconstructions can be attenuated. Of course, biases still exist, and their neutralization requires multiplying points of view through cross-analyses (for instance, changing the notion of term proximity in the reconstruction, the notion of clustering in the detection of scientific fields, or the text-mining methods that delineate a specific vocabulary). Experts in the field should also take part in checking phylomemetic reconstructions through a back-and-forth movement between maps, expert knowledge, and phylomemies.
Moreover, we do not claim that all controversies about the dynamics of science in philosophy of science can be solved by appealing to phylomemies. This is so, first, because no dataset is immune to philosophical characterization (for instance, about the role of publications in the process of science), and, second, because we cannot bypass the need (acknowledged above) for philosophical reflection on the relations between concepts, terms, tags, and topics in order to refine what “signatures” should be. Last, no philosophical theory of a scientific process can emerge directly from the phenomenological reconstruction. We argue for a complementarity rather than a revolution in the philosophy of science.

To conclude, we should indicate, on the basis of the previous developments, that with the help of this new methodology several questions about the evolution of science can be revisited:

1. How can we characterize the multilevel structure of science, its organization into disciplines and subdisciplines, and its dynamics? What are the different regimes of science? Are there patterns of concept emergence and disappearance within branches of science that are characteristic of particular domains? How heterogeneous are the timescales of science evolution between disciplines?

2. What influences the evolution of a branch of science the most: Conceptual or methodological innovation? The arrival of a new generation of scientists? The capitalization of results stemming from another branch of science? A technological innovation?

3. How do scholars flow through the global structure of science? How heterogeneous are scholars with respect to their role in building the phylomemy? Do we have contributors specialized in developing emerging branches, others consolidating mainstream ideas, and some building the interdisciplinary bridges that open up new directions of research?

Even though the methodology of phylomemy reconstruction is still in its early development, we are confident that this approach can be brought to a point where the morphological and quantitative features of reconstructed branches of science will reflect the types of research regimes that take place within them. The detailed and cross-domain characterization of these typologies will open up new perspectives for comparison with the science regimes hypothesized by philosophers of science through more traditional methods. We also expect phylomemy reconstruction, and similar methodologies, to be game changers in our relation to science production, providing new sources of empirical observations and testable hypotheses for the history and philosophy of science.


Part II. Frontiers in Tools, Methods, and Models

Chapter 5

LDA Topic Modeling: Contexts for the History and Philosophy of Science

Colin Allen and Jaimie Murdock

In their introduction to a special issue of the Journal of Digital Humanities in 2012 devoted to topic modeling for the digital humanities, the editors Elijah Meeks and Scott Weingart began by lampooning the state of the field, with its obscure jargon and self-inflicted wounds. As other contributions to this volume show,1 we are not alone in believing in the promise of topic modeling as a research tool for the humanities in general and for history and philosophy of science (HPS) in particular. However, as we shall argue, realizing this potential and minimizing additional self-harm require a shift in the way that topic models have been used and discussed, moving away from a word-centered conception of topics and toward a document- and context-centered conception of the models. The potential of topic modeling makes it worth mastering the jargon and developing an appreciation for the inside jokes in the introduction by Meeks and Weingart:

Topic modeling could stand in as a synecdoche of digital humanities. It is distant reading in the most pure sense: focused on corpora and not individual texts, treating the works themselves as unceremonious “buckets of words,” and providing seductive but obscure results in the forms of easily interpreted (and manipulated) “topics.” In its most commonly used tool, it runs in the command line. To achieve its results, it leverages occult statistical methods like “dirichlet priors” and “bayesian models.” Were a critic of digital humanities to dream up the worst stereotype of the field, he or she would likely create something very much like this, and then name a popular implementation of it after a hammer. (Meeks and Weingart 2012)


The “command line” to which Meeks and Weingart refer is the text-only interface to the computer’s operating system that is modeled on computer terminals from the 1970s and whose operation manuals bear some resemblance to a book of spells. The “hammer” to which they refer is the popular implementation of latent Dirichlet allocation (LDA) topic modeling named MALLET. To many, LDA topic modeling has indeed seemed like a blunt instrument whose significance is obscure to all but a cult of magicians. How, after all, could a technique that begins by throwing away all the syntactic information in language yield anything about the meanings of text? In this bag-(or bucket-)of-words approach, “man bites dog” is indistinguishable from “dog bites man,” and thus the surprising is rendered indistinguishable from the banal. Furthermore, for reasons of computational tractability and model interpretability, the highest-frequency words are eliminated before the models are constructed (a practice known as “stoplisting”). Gone are the most common prepositions, conjunctions, and pronouns, as well as important operators such as “not.” Thus, not only is the surprising reduced to the banal, but exact opposites of meaning collapse to a single representation. It is hard to see how such a seemingly blunt instrument could be of interest to scholars in the humanities. Philosophers and historians of science may be forgiven for being especially skeptical given their concerns with the subtleties of scientific reasoning and explanation and the shifts in meaning and understanding that follow theoretical change, but analogous concerns arise for any other humanities discipline. The initial applications of topic modeling rather reinforced the idea that the models provide little to sustain the interest of anyone interested in detailed understanding of texts or intellectual history, whether literary or scientific.
As Meeks and Weingart (2012) put it, given a topic model, “You would marvel at the output, for a moment, before realizing there isn’t much immediately apparent you can actually do with it.” A decade later, much has been done with topic models, but they still tend to perplex all but the innermost circle of practitioners. We surmise, however, that the difficulty of seeing what can be done with topic models is partly a product of some common misconceptions about how they are conjured and partly a product of how they are typically presented. Our aim here is to illustrate through our own work the application of topic modeling to questions that interest historians and philosophers of science, going beyond simplistic presentations that tend to give scholars the idea that the algorithms produce results that are superficial and perhaps unreliable. The facts about topic models and the ways in which they are often misrepresented and misunderstood frame our attempt in this chapter to convince readers that, despite appearances, LDA topic
modeling provides a lot more of value to HPS research than merely providing for enhanced search and information retrieval from large sets of documents. There is much room for the interplay between human intelligence and sophisticated algorithms to expand the range of questions about science that HPS scholars will ask and can answer. Although it is impossible to assess the approach without some insight into the workings of the algorithms, we also believe the case for their use can be made and understood without first gaining expert-level understanding. Below we will provide a brief introduction to LDA topic modeling, but more detailed introductions can be found in various places, including the contributions to the aforementioned special issue of the Journal of Digital Humanities edited by Meeks and Weingart (see especially David Blei’s contribution to that issue, Blei 2012b; see also Blei 2012a and Vaesen’s chapter in this volume). Our take on topic models extends to dissatisfaction with the term “topic modeling” itself and urges a reorientation to documents and the contexts in which they are written. Although not ideal, a better label might have been “context modeling.” Such a relabeling would, nevertheless, help to avoid the ordinary connotations of the term “topic,” which suggests something that could be a title for a lecture, a course, or a thesis: “veterinary medicine in the Andes,” for example, or the “quantum states of electrons.” The implicit but oft-articulated story behind topic models is that they provide a (very partial) theory of the writing process (Boyd-Graber, Hu, and Mimno 2017). The texts that authors produce typically combine a few topics. A treatise on veterinary medicine in the Andes is likely to touch upon some related topics such as physiological adaptations to altitude or the state of veterinary education in South America, but it may also digress into geology or meteorology. 
From this perspective, topic modeling provides an account of how documents are generated by selecting among words associated with the multiple topics. We do not challenge the idea that topic models provide a theory about writing. However, by recasting them as context models, we think we get a better account of the relevance of the models to writing, as we shall explain below. Although our arguments are addressed explicitly to scholars in HPS, we believe they generalize to topic modeling in other humanities disciplines, including history and literary studies. It is worth acknowledging, however, that some of our interpretive concerns about topic models will seem more relevant to those with historical interests than to those with primarily literary concerns.2 We also make the case that philosophers of science are particularly well placed to contribute to the interpretive questions because of their attention to models in science.
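The generative story can be made concrete with a toy sampler. Everything in the sketch below—the two “topics,” their word probabilities, and the document’s topic mixture—is invented for illustration; real models have thousands of words per topic and are learned from data rather than written by hand.

```python
import random

# Toy illustration of the LDA generative story: a document is produced by
# repeatedly (1) drawing a topic from the document's topic mixture, then
# (2) drawing a word from that topic's word distribution.
# All "topics" and probabilities below are invented for illustration.

topics = {
    "veterinary": {"llama": 0.4, "altitude": 0.3, "vaccine": 0.3},
    "geology":    {"strata": 0.5, "andes": 0.3, "uplift": 0.2},
}

def generate_document(topic_mixture, length, rng):
    words = []
    for _ in range(length):
        # Step 1: pick a topic according to the document's mixture.
        topic = rng.choices(list(topic_mixture), weights=topic_mixture.values())[0]
        # Step 2: pick a word according to that topic's word distribution.
        word_dist = topics[topic]
        words.append(rng.choices(list(word_dist), weights=word_dist.values())[0])
    return words

rng = random.Random(0)
doc = generate_document({"veterinary": 0.8, "geology": 0.2}, length=10, rng=rng)
print(doc)
```

Fitting a model inverts this story: given only the documents, the training process recovers topic mixtures and word distributions that could plausibly have generated them.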



On the Use and Abuse of Topic Models

A frequent target of topic modelers—the low-hanging fruit, as it were—has been the back issues of scholarly journals, identifying the changing distribution of topics over time.3 As tantalizing as the prospect might be that this technique would reveal something novel about the history of ideas, the presentations of the distributions and fluctuations of topics uncovered in this way rarely seem to go far enough. Part of the problem concerns the highly variable intelligibility of the word distributions identified as “topics” by the LDA algorithm. The flexible way in which people understand these so-called topics has been analogized to “reading tea leaves” (Chang et al. 2009), and the tendency of topic modelers to use relatively short lists of ten to twenty words to represent each topic exacerbates the difficulty of coming to a good understanding of the models. Our intermittent use of scare quotes around “topics” is intended to flag places where overinterpretation looms. Technically, LDA outputs are not topics as English speakers would primarily understand that term. Rather, each “topic” is a total probability distribution over the vocabulary in the corpus—that is, the sum of all the word probabilities within one topic is equal to one—and every word is assigned a non-zero probability in every topic (although most words are assigned a vanishingly small probability in most topics). Simultaneously, each document is represented as a total probability distribution over the topics, and every topic is assigned a non-zero probability in every document, albeit skewed to relatively few topics. The sum of the topic probabilities within one document is likewise equal to one. The number of topics is chosen by the modeler, but their content is not. The model is initialized with random probabilities assigned to the word-topic and topic-document distributions.
Only through an iterative training process that updates these distributions does anything interpretable emerge. Specifically, the models are trained by a Bayesian process, which tests document-word distributions generated from the model against the observed distributions sampled from the documents and concurrently adjusts the word-topic and topic-document probability distributions so as to better match the word distributions found in the actual documents. The probability assignments in the models become stable with repeated training passes through the full corpus, making it reasonable to terminate the training after a few hundred iterations of this process. The shapes of the word-topic and topic-document distributions are controlled by two parameters—technically “hyperparameters” or “priors” on the Dirichlet distribution (named for the nineteenth-century mathematician Peter Gustav Lejeune Dirichlet)—that are also chosen by the modeler. These hyperparameters skew the algorithm toward producing word-topic and topic-document distributions that have most of their probability mass (or “weight”) concentrated in relatively few of the words and topics assigned to topics and documents respectively. As Blei (2012b) explains, the choice of the hyperparameters represents a trade-off: “On both topics and document weights, the model tries to make the probability mass as concentrated as possible. Thus, when the model assigns higher probability to few terms in a topic, it must spread the mass over more topics in the document weights; when the model assigns higher probability to few topics in a document, it must spread the mass over more terms in the topics.” The practical upshot here is that when topics are heavily loaded on a few words, they will be less successful at accounting for the words in any given document, so more topics will need to be assigned to that document to account for its word distribution, but this runs counter to the imperative to load documents with relatively few topics. In the extreme, imagine a topic that puts nearly all of its probability mass on one word—e.g., “lion”—another that skews equally heavily on “tiger,” a third on “elephant,” and so on. A document containing a normal mixture of words—“Lions and tigers mostly avoid elephants in Africa and India respectively”—would need low weightings on lots of such skewed topics to represent its actual word distribution well. But distributing the probability mass over lots of topics is not compatible with the hyperparameter setting that favors distributing the probability mass over fewer topics per document, so the training process must compromise by assigning some of the probability mass to more words in each topic.
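The iterative training just described can be illustrated with a bare-bones collapsed Gibbs sampler, one standard way of fitting LDA (MALLET uses a variant of this scheme; Blei’s original formulation used variational inference). The toy corpus, the number of topics K, and the hyperparameter values alpha and beta below are arbitrary choices made for the sake of a runnable sketch, not the settings used in any study discussed here.

```python
import random
from collections import defaultdict

# Bare-bones collapsed Gibbs sampler for LDA: a sketch of the iterative
# training described above, not a production implementation.
docs = [
    "lion tiger lion savanna tiger".split(),
    "electron quantum electron spin".split(),
    "tiger lion electron".split(),
]
K, alpha, beta = 2, 0.1, 0.01          # topics and Dirichlet hyperparameters
V = len({w for doc in docs for w in doc})

rng = random.Random(42)
z = [[rng.randrange(K) for _ in doc] for doc in docs]  # random initial topics
ndk = [[0] * K for _ in docs]               # per-document topic counts
nkw = [defaultdict(int) for _ in range(K)]  # per-topic word counts
nk = [0] * K                                # total words per topic
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        t = z[d][i]
        ndk[d][t] += 1
        nkw[t][w] += 1
        nk[t] += 1

for _ in range(200):                        # training iterations
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]                     # remove the current assignment
            ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
            # Conditional probability of each topic for this word, shaped
            # by the alpha (document) and beta (word) priors.
            weights = [
                (ndk[d][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta)
                for k in range(K)
            ]
            t = rng.choices(range(K), weights=weights)[0]
            z[d][i] = t                     # record the new assignment
            ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1

# Per-document topic mixtures: each row sums to one, as described above.
theta = [
    [(ndk[d][k] + alpha) / (len(docs[d]) + K * alpha) for k in range(K)]
    for d in range(len(docs))
]
for d, row in enumerate(theta):
    print(d, [round(p, 2) for p in row])
```

Raising alpha spreads each document’s mass over more topics; raising beta spreads each topic’s mass over more words, which is the trade-off Blei describes.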
Given reasonable and typical selections for these hyperparameters, even if the model is trained with as many topics as there are documents, LDA assigns a mixture of topics to each document, though many of these topics will be relatively hard to interpret. With too many topics, some of the topics are specialized on just a few documents, also making them less useful for discovering relationships in the wider corpus. When training models with considerably fewer topics than documents, the LDA process achieves generalizable and interpretable results via a form of data compression. However, with too few topics, each topic becomes very general and less useful for identifying informative relationships among the documents. While statistical methods exist for computing the number of topics that gives the best statistical fit for a given corpus, scholars and other human users may prefer a coarse-grained scheme (fewer topics) for some purposes while preferring a more fine-grained scheme (more topics) for others. Furthermore, the value of more specialized topics may be more apparent to domain experts than to other users. The fact that the statistical fit of the topic model to the documents in the corpus does not correlate with user judgments about topic quality was the point of the “tea leaves” paper by Chang and colleagues (2009).

Many articles report the results of topic modeling in ways that drive misunderstanding of the models. First, it is typical to display only a subset of the topics found by LDA, and only the most readily interpretable, thus making the overall model seem more interpretable than it would otherwise appear. The embarrassment of “junk” and “jargon” topics—that is, topics that are hard to interpret—is thereby swept under the rug. For example, Malaterre and colleagues write, “47 [of the 200 topics] appeared to be either too generic or polysemic to be precisely related to any meaningful issue in philosophy of science. We therefore grouped these 47 topics under the label ‘Jargon’ and set them aside” (Malaterre, Chartier, and Pulizzotto 2019, 221). Similarly, albeit with different goals, Lambert and colleagues report that they “studied the British Medical Journal between 1960 and 2008, identifying 100 topics using latent Dirichlet allocation, which we filtered for those directly concerned with clinical practice or medical research using the words most highly associated with each topic, leaving us with 73 topics” (Lambert et al. 2020, 358). To be fair, these researchers understand that their decision to omit certain topics from their analyses is driven by their particular explanatory interests and that different interests might entail making the distinction between “junk” and “jargon” more precise, with the former perhaps providing a guide to better corpus preparation prior to modeling and the latter proving useful for genre or style analyses.
Our point is only that the practice of ignoring such topics serves to reinforce a fraught strategy that centers on directly assigning meaning to topics. Second, by showing only the ten or so highest-weight words for each topic, such presentations neglect most of the words that contribute to the topics’ roles in representing the corpus documents. For example, in the 200-topic model that we constructed from 665 nonfiction English-language books read by Charles Darwin between 1837 and 1860 (Murdock, Allen, and DeDeo 2017), typically 500–600 words are required to account for 50% of the probability mass for any given topic. Looking only at the first ten or twenty words may provide little understanding of why that topic has been assigned a high weight for a given document. Third, this limited way of presenting the topics also leads readers who don’t fully understand LDA to the incorrect assumption that documents are assigned a high proportion of a given topic because they contain all the words listed as “in” the topic. While it is indeed somewhat likely that a high-probability word from a given topic appears in any document for which the topic is highly weighted, it is not guaranteed. One of the strengths of topic modeling is that thematically similar documents may be assigned similar topic profiles despite considerable differences in the vocabulary they contain. Another source of misunderstanding is that the phrase “words in the topic” is easily taken to encompass all and only the words that the authors have presented for each topic. Analyses of topic models that focus on interpretation of the topics often distort a fundamental feature of topic modeling: it is not a discriminative model (i.e., one that sorts entities into distinct categories) but rather a mixed membership model (Airoldi et al. 2014); thus a document does not “belong” to a single topic, nor does a “word.” The distributions themselves, independent of any interpretation of their components, can illuminate a collection of documents. For example, in our analysis of Darwin’s readings, we did not select a subset of meaningful topics and analyze only those. Rather, we analyzed the time sequence of Darwin’s reading choices in respect to fluctuations in an information-theoretic distance measure of “surprisal” (i.e., Kullback-Leibler divergence; see below) applied to the entire topic distribution—junk, jargon, or otherwise. Indeed, it is worth emphasizing that the machine itself has no understanding, and thus to it, all the topics are on a par, whether or not they make sense to a human reader who comes to them with the bias that “topics” must make sense. Those who work with topic models should, as Jeffrey Binder says, “resist attempts to present computational results in forms that readily appeal to our assumptions and intuitions about language” (Binder 2016, 212).
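The point that hundreds of words may be needed to account for half of a topic’s probability mass is easy to check given a model. The sketch below applies the idea to a synthetic Zipf-shaped “topic” rather than to the Darwin model itself; the vocabulary size and distribution are invented for illustration.

```python
# How many top-ranked words does it take to cover a given share of a
# topic's probability mass? Illustrated on a synthetic Zipf-shaped
# "topic"; the vocabulary size and shape are invented for illustration.

def words_for_mass(word_probs, target=0.5):
    """Count the highest-probability words whose cumulative mass reaches target."""
    total = 0.0
    for count, p in enumerate(sorted(word_probs, reverse=True), start=1):
        total += p
        if total >= target:
            return count
    return len(word_probs)

# Synthetic topic: probabilities proportional to 1/rank over 10,000 words.
V = 10_000
raw = [1 / rank for rank in range(1, V + 1)]
total_mass = sum(raw)
topic = [w / total_mass for w in raw]

n = words_for_mass(topic, 0.5)
print(n)  # far more than the ten or twenty words usually displayed
```

Even this crude synthetic topic needs dozens of words to reach half its mass, which is why ten-word topic summaries are so lossy.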
It is important to emphasize here that no currently available form of artificial intelligence (AI) or machine learning (ML) can supply the meanings that matter for genuine scholarship and understanding.4 For the foreseeable future, computers will remain mere tools for humans to use creatively. But the stupidity of AI/ML does not render it useless.

Topics as Contexts

The shift to thinking of the “topics” in a topic model as representing contexts helps deal with the problems outlined above in various ways. First, it helps us reconceptualize the issue of junk topics. The assumption that LDA finds topics in the ordinary English sense practically forces one into interpretive mode, so that when a topic has no easily available interpretation, it is natural to label it “junk.” For instance, here are the ten highest-probability words from two “topics” in an 80-topic LDA model of the 881 letters that Thomas Jefferson’s grandson chose to include in his edition of Jefferson’s papers (Randolph 1829), just a small sample from among the thousands written by Jefferson, the author of the US Declaration of Independence in 1776, the United States’ first secretary of state from 1790 to 1793, and its third president from 1801 to 1809:

Topic A: vessels, war, British, vessel, port, Britain, sea, peace, enemy, ships.
Topic B: honor, respect, obedient, humble, servant, also, sentiments, esteem, take, think.

Setting aside for the moment the concern about guessing the content of a topic from just ten words, the first of these suggests an obvious interpretation, one that is borne out by it being assigned the most weight in a letter concerning British seizure of a ship inbound to the United States from the French West Indies, written in 1792 by Jefferson to Edmond-Charles Genêt, the French envoy to the United States. The second “topic” consists of words one might expect to find in letters but is not obviously topical in the ordinary sense. It turns out that this particular “topic” is most represented in the letters Jefferson wrote to various diplomats during his term as secretary of state in the early 1790s, especially to the British envoy George Hammond, but also in many of his letters to Genêt. These are not just any old letter words but words more likely to be used in the specific context of writing letters to diplomats. Writing takes place in historical contexts, which influence the words selected. The contexts of writing may include topics in the ordinary sense (i.e., the subjects addressed in the writing) as well as the situation of the writer in the historical moment, as influenced by the particular networks of family, friends, colleagues, and culture at large and the author’s roles in institutions and society more broadly. Different writing contexts entail different audiences: letters to friends and family versus business associates or diplomats, philosophical treatises, public speeches, or scientific publications. Each of these contexts changes the likelihood of the author selecting certain words even when the topic of discussion (in the ordinary sense) is nominally the same. Conversely, the appearance of the same word in different contexts may produce minor or major variations in meaning. 
The meaning of “realism” in the context of philosophy of science is fairly similar to the use of this word in the context of general metaphysics but quite different from its meaning in the context of political philosophy. Likewise, “topic” in a discussion of LDA topic modeling does not mean the same as “topic” in the context of a library or in a public debate. And lest it go without saying, the contexts in which words are used change over time, so the context of “mass” in Einstein’s usage is not the same as in Newton’s, yet both these contexts are more similar to each other than the post-1940 context in which the phrase “probability mass” emerges.

Figure 5.1. Cumulative divergence of Origin from English nonfiction books read by Darwin 1837–1860, generated by repeatedly fitting the Origin to a 200-topic model of the reading corpus (“sampling”). The x axis represents the number of books read in the order that Darwin read them. The y axis represents KL divergence between the books read and samples of the Origin. Thicker dashed lines represent the mean (long dashes) and the upper and lower bounds of divergence (short dashes) at each point in the reading sequence. Lighter dashed lines each represent the results of one sample, although they appear dark when multiple samples overlap. Samples form clusters according to dominant topic. The cluster just below the maximum (top dashed lines) has the greatest divergence from the reading model, always close to the maximum. The samples in this cluster are dominated by a topic whose highest probability words are “forests,” “timber,” “teak,” “forest,” and “Ceylon.” The big drop in divergence at the 518th item, indicated by the vertical dashed line, corresponds to Falconer’s report on the teak forests in Burma, published in 1852, which Darwin recorded reading on August 11, 1853. (For details of the sampling process and the clusters that emerge, see Murdock, Allen, and DeDeo 2018.)

A second benefit of the shift toward thinking in terms of contexts rather than topics is that it reorients us back toward a document-centered view of LDA. In fact, we disagree with the statement by Meeks and Weingart (2012) that topic modeling is “focused on corpora and not individual texts.” It would be more accurate to say that topic modeling is typically deployed in ways that lead scholars to focus on corpora and not individual texts. Studies focusing primarily on changes in topic distributions through the life of a journal exemplify exactly this. Thinking that the job of LDA is to find topics that are latent in a corpus sends one tripping toward the problems that Meeks and Weingart identify. But contexts matter to the understanding of particular documents. It is indeed banal to be told that Darwin’s On the Origin of Species contains topics related to botany; one hardly needs topic modeling for that. But it is far less banal to use a topic model to help identify the way in which, say, one document provides part of the context for the production of another. To illustrate, consider how one might compare Darwin’s Origin to the books that he read. At any given moment in his reading sequence, the books read so far contain some mixture of topics. Each new book read changes the aggregate mixture slightly. In the process of writing Origin, Darwin assembles a new mixture. Using Kullback-Leibler (KL) divergence, we can assess how different the mixture in Origin is from the mixture of topics aggregated across the subset of books Darwin has read up to any point in the sequence. This is illustrated in figure 5.1, but with a twist. Because there is a random element in the way in which the topics derived from the reading list are assigned to the written book, it is necessary to check that we obtain the same or similar answers from a repeated sampling process (for details see Murdock, Allen, and DeDeo 2018). When we run the process multiple times, we find that the samples fall into distinct clusters, and one such cluster reveals a large signature for one particular book: Hugh Falconer’s (1852) Report on the Teak Forests of the Tenasserim Provinces provides part of the context in which Darwin produced Origin (the topmost cluster of lines in figure 5.1).
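The divergence computation behind figure 5.1 can be sketched in a few lines. The topic mixtures below are invented stand-ins; the actual analysis fits the Origin to a 200-topic model of the reading corpus and repeats the sampling, as the figure caption describes.

```python
import math

# Sketch of the comparison described above: KL divergence between a
# "written book" topic mixture and the aggregate mixture of the books
# read so far. All mixtures below are invented for illustration.

def kl_divergence(p, q):
    """KL(p || q) in bits; assumes q has support wherever p does."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def aggregate(mixtures):
    """Uniform aggregate of the topic mixtures of the books read so far."""
    n = len(mixtures)
    return [sum(m[k] for m in mixtures) / n for k in range(len(mixtures[0]))]

origin = [0.5, 0.3, 0.2]   # invented topic mixture of the written book
reading = [                # invented mixtures of successive books read
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.4, 0.4, 0.2],
]

# Divergence of the written book from the growing reading context:
for t in range(1, len(reading) + 1):
    d = kl_divergence(origin, aggregate(reading[:t]))
    print(f"after book {t}: KL(Origin || reading) = {d:.3f} bits")
```

A sharp drop in this quantity after a particular book, as with Falconer’s Report in the actual analysis, flags that book as contributing disproportionately to the context of the written work.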
Instead of focusing on the high-probability words associated with the topics, we use the topics more holistically to address the interaction between the documents (books) in Darwin’s reading list and the book that he wrote. The context in which Falconer’s Report is relevant to Darwin’s writing is the latter’s discussion of the similar adaptations of different species of trees in similar mountain climates found at different altitudes and latitudes worldwide, constituting part of his argument for the power of natural selection to shape the characteristics of species. The turn to a document-centered view of topic models as context models applies not just to a theory of writing but to an account of reading as well. In our previous work we showed that the sequence of Darwin’s readings between 1837 and 1860 was not haphazard, but neither did he show the same pattern of choices through time; shifts in the pattern correspond to changes in his work context (Murdock, Allen, and DeDeo 2017). More specifically, using the topic weights assigned to the immediately prior book that he had read, as well as the aggregate of the topic weights assigned to the totality of books he had read to that point, we were able to apply an information-theoretic measure of distance to the next book he read. By focusing on Darwin’s reading trajectory through the documents, rather than on trying to interpret individual topics, we were able to detect shifting patterns in how close he stayed to the last-read book and whether he returned to books similar to those he had read previously or whether he went fishing for information in areas where he had not previously done any reading. These shifts correlated with three main phases of his work life: first organizing and publishing his notes from the voyage of the Beagle, next taking up the intensive study of barnacles from 1846 to 1854, and then turning to the organization of his notes for the book that would eventually be published in 1859, On the Origin of Species (Darwin 1859). Thus, we showed how the reading contexts extracted via LDA are related to Darwin’s intellectual endeavors. Subsequently, we have begun to extend the reading model to Darwin’s writings, with preliminary results published by Murdock, Allen, and DeDeo (2018). Thinking about how LDA helps us to identify contexts led us to become less concerned with finding labels for topics (as a way of interpreting them) and more interested in how the models help us find significant relationships between the documents that he read and wrote. For instance, we have results (again preliminary) indicating that the KL divergence to Darwin’s Origin is actually lowered by Whewell’s History of the Inductive Sciences (Whewell 1837), which he read soon after coming off the Beagle, whereas it is increased by the works of Francis Bacon (Montagu 1825–1834), which he read a couple of years after reading Whewell. This provides some additional evidence in support of Francisco Ayala’s (2009) suggestion that Darwin’s methodology owes more to Whewell than to Bacon (10035), despite Darwin’s overt claim to be following Bacon’s inductivist method (10033). The examples of the preceding two paragraphs describe our ongoing attempts to apply topic modeling to questions within HPS using a document-centered approach. Although these methods occasionally look at the signature contributions of specific “topics” to the documents, they work without regard to whether those topics are directly interpretable as stand-alone artifacts or in respect to the corpus as a whole. Similarly, we previously described methods for comparing Darwin’s writings on evolution to Alfred Russel Wallace’s 1858 essay using the model of Darwin’s readings (Murdock, Allen, and DeDeo 2018). The preliminary results reported there indicate greater similarity, in information-theoretic terms, of Wallace’s essay to Darwin’s earlier essays than to the
This provides some additional evidence in support of Francisco Ayala’s (2009) suggestion that Darwin’s methodology owes more to Whewell than to Bacon (10035), despite Darwin’s overt claim to be following Bacon’s inductivist method (10033). The examples of the preceding two paragraphs describe our ongoing attempts to apply topic modeling to questions within HPS using a document-centered approach. Although these methods occasionally look at the signature contributions of specific “topics” to the documents, they work without regard to whether those topics are directly interpretable as stand-alone artifacts or with respect to the corpus as a whole. Similarly, we previously described methods for comparing Darwin’s writings on evolution to Alfred Russel Wallace’s 1858 essay using the model of Darwin’s readings (Murdock, Allen, and DeDeo 2018). The preliminary results reported there indicate greater similarity, in information-theoretic terms, of Wallace’s essay to Darwin’s earlier essays than to the



Colin Allen and Jaimie Murdock

Origin. Darwin himself recognized this similarity to his earlier essays immediately, remarking to Charles Lyell in a letter dated June 18, 1858, “If Wallace had my MS. sketch written out in 1842, he could not have made a better short abstract!”

Toward More Robust Modeling Practices

In repeatedly emphasizing the preliminary nature of the results, we may seem again to be tripping toward problems that Meeks and Weingart (2012) identified in the remainder of a sentence already quoted above: “You would marvel at the output, for a moment, before realizing there isn’t much immediately apparent you can actually do with it, and the article would list a few potential applications along with a slew of caveats and dangers.” Reasons for caution are many. In addition to the tendencies toward overinterpretation of topics mentioned above, there are multiple technical issues. A far from exhaustive list includes sensitivity of the models to which volumes were obtained for the corpus and how the documents were fed into the model (whether as whole books, chapters, journal articles, pages, four-hundred-word chunks, etc.); sensitivity to the digitization process and cleanup of the text (treatment of hyphenation, inclusion or removal of headers and footers, etc.); unitization of terms as single words or using multiword phrases; removal of terms (by stoplist, by word frequency, as “foreign,” etc.); parameterization of the models (number of topics, choice of Dirichlet hyperparameters, number of cycles of training); and stochasticity in the training process (stemming from the choice of seed for initial randomization, the nature of the chosen sampling process, etc.). Philosophers of science are well positioned to bring their expertise to bear on assessing topic models because of their attention to modeling practices in science, including issues such as model robustness and the representational status of models.
We agree with the sentiment expressed in the (different) context of philosophy of cognitive science by Paul Smaldino (2017a), who titled his article, “Models Are Stupid, and We Need More of Them,” itself a twist on the much earlier proclamation attributed to statistician George Box, “Essentially, all models are wrong, but some are useful” (Box and Draper 1987, 424). Despite the necessarily partial view provided by any particular modeling approach, and the stupidity of AI/ML, computational models in general and topic models in particular provide useful contexts for discoveries about documents important to the history and philosophy of science. In our own work we have tried to establish that results are robust across models with different numbers of topics (Murdock, Allen, and DeDeo 2017, appendix D.1), and we have also attempted to investigate systematically the behavior


of models with different numbers of topics across different sample sizes (Murdock, Zeng, and Allen 2015). The space of possible investigations is huge, however, and much more work of this kind needs to be done. Working simultaneously with multiple models fosters the kind of “interpretive pluralism” that characterizes humanities computing (Rockwell and Sinclair 2016). It is also consonant with the kind of ensemble modeling used for weather forecasting (Gneiting and Raftery 2005). Emphasizing the interaction between models and human needs reflects the origins of LDA topic modeling in the field of information retrieval (Blei, Ng, and Jordan 2003)—a corner of information science that aims to support people to find what they need when confronted with large amounts of text. As topic modeling has been taken up within the digital humanities, and since Weingart and Meeks wrote their quo vadis in 2012, the level of understanding and analysis of what can be done with topic models has continued to develop among those who work with them intensively. This understanding has proceeded on two fronts: one is the recognition of the role of human interpretation, in rebuttal of the oft-expressed worry that computational methods seek to replace or reduce human understanding with (mere) mechanically derived statistical summaries of the text; the second is based on successful application of topic modeling to questions that scholars in the humanities should and do care about. To the first point, Geoffrey Rockwell and Stéfan Sinclair, as indicated in the title of their book, Hermeneutica (Rockwell and Sinclair 2016), stress the way in which computational models become themselves objects of interpretation by scholars who have different interests and interpretive strategies. Andrew Piper (2018) continues the theme of critical engagement with the models in his book Enumerations. 
He adds necessary nuance to earlier arguments by Franco Moretti (2013), who coined the phrase that became the title of his book, Distant Reading, and Matthew Jockers (2013), whose book Macroanalysis pioneered the use of network visualizations based on topic models of very large corpora. Piper is aware that computational analyses gain credence not intrinsically because of the quantity of data but rather by their representativeness of texts that may have been overlooked or otherwise marginalized by traditional means of analysis. The field of literary studies has traditionally held the view that generalizations about literature can only be justified on the basis of close readings of particular texts, but Piper’s book shows how the same kinds of generalizations can be supported independent of these close readings. Ted Underwood (2019) argues that topic modeling in literary studies supports a hypothesis testing approach that can reduce hindsight bias arising when computational methods are applied unguided by specific




hypotheses, addressing the problem of how we prove that these methods illuminate more than we already know. This growing literature theorizing and justifying the practice of topic modeling for the humanities has been dominated so far by scholars of history and literature. Philosophers have barely engaged, predominantly sharing the prejudice that such techniques have little to say on issues of philosophical interest. The applications of topic modeling that are acclaimed by authors such as Piper or Underwood—for example, to determine the historical emergence of genre in literature—do not strike philosophers as central to their concerns. By having a foot in history, the integrative study of history and philosophy of science is in an interesting position of being forced to close the gap (or better, following Schickore 2011, to eliminate the false opposition) between concern for the particular (in the details of specific episodes of science) and the quest for abstraction (via generalizations that contribute to understanding the significance of those episodes). Although our own work on Darwin focused on the very particular reading and writing behavior of a single scientist, we pursued this project with the rather general goal of developing methods for measuring and tracking such things as conceptual similarity, conceptual change, the sensitivity of meanings to context, and pathways of intellectual influence. Topic models are both wrong and oblivious to these higher-order goals of understanding. They only partially capture aspects of language that are relevant to the sorts of meaning extracted by competent close readers of the texts. However, the evidence they provide speaks in new ways to existing questions and leads to new questions that could have been reached by traditional close reading or by other computational means, but are perhaps less likely to have been reached without the assistance of LDA. 
Our focus on LDA is not rooted in any pretense that it is the only tool that matters for computational text analysis. Other approaches have their uses too. Nor is it automatically the right tool for any given purpose. Historian Jo Guldi (pers. comm.; Guldi, n.d.) argues that older, simpler algorithms for analyzing and comparing documents—such as n-gram counts (simple quantification of occurrences of n-word phrases) and tf-idf (term frequency–inverse document frequency, originally described by Karen Spärck Jones [1972])—provide “white-box” methods that are preferable to “black-box” methods such as LDA because they are more comprehensible to nonexperts. Researchers who are among the LDA cognoscenti may gravitate toward more sophisticated tools for the wrong reasons, and without checking to see whether something else would work just as well for the purpose at hand. One


worries about swatting the proverbial fly with the proverbial elephant gun. Direct comparison of methods may, however, help justify the use of both. Malaterre and colleagues, for instance, footnote their decision to use LDA because of “its proven reliability for identifying topics in large corpora” (Malaterre, Chartier, and Pulizzotto 2019, 217); in the same footnote they also mention benchmarking LDA against a simpler approach, k-means clustering, preferring LDA because it produced quantitative results that were just as good and more interpretable. (Exercise for the reader: Insert suitable warning about interpretability here!) To their credit, Malaterre, Chartier, and Pulizzotto (2019) do provide their readers with titles of documents related to the topics in their model, thus taking a step toward the document-centered view we are advocating. Similar points apply not just to the choice of modeling approach but also to the methods used to analyze the models. In our own work on Darwin’s reading behavior, we have preferred the (relatively complex) Kullback-Leibler divergence measure to assess similarity between document-topic distributions, rather than the conceptually simpler cosine similarity sometimes used on these vectors of topic weights. We also found that while both a simpler (white-box) rank-ordering method and the fully quantitative KL divergence measure (relatively black-box to some due to its complexity, although perhaps less opaque than neural network models) could capture Darwin’s overall pattern of reading selections, only KL divergence allowed us to adequately quantify shifts in his behavior between exploitation and exploration and to correlate these shifts with the major epochs in his research career described above. Where the methods converge, we gain confidence in both, and where they diverge, we are confident that the extra complexity is worthwhile.
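The contrast between the two measures can be made concrete. Here is a hedged Python sketch with toy topic-weight vectors of our own invention (not the Darwin data):

```python
import math

def cosine_similarity(p, q):
    """Angle-based similarity between two topic-weight vectors
    (1.0 means the vectors point in the same direction)."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm = math.sqrt(sum(pi * pi for pi in p)) * math.sqrt(sum(qi * qi for qi in q))
    return dot / norm

def kl_divergence(p, q):
    """Asymmetric divergence D(p || q); 0.0 means identical distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Invented 3-topic document distributions:
a = [0.70, 0.20, 0.10]
b = [0.60, 0.25, 0.15]

sim = cosine_similarity(a, b)  # near 1: the documents look alike
div = kl_divergence(a, b)      # near 0: little "surprise" moving from b to a
```

Cosine treats the topic weights as a geometric direction, while KL divergence treats them as a probability distribution; the two often agree on which documents are close, but they need not agree on how much closer one pair is than another.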
A legitimate concern (raised by an anonymous reviewer) is that our choice to use KL divergence was entirely post hoc, based on finding a method that yields the results we wish. In fact, however, we chose KL divergence not for those reasons but (as explained at the beginning of Murdock, Allen, and DeDeo 2017) because it is a widely used measure within cognitive science. In this chapter, following our previous work, we have focused on the most basic, unembellished form of LDA topic modeling. More sophisticated applications of LDA exist, such as dynamic topic modeling (Blei and Lafferty 2006), as well as other approaches to modeling text, such as neural-network-based word embeddings (e.g., Word2vec; Mikolov et al. 2013) and combined methods (such as lda2vec; Moody 2016). While such methods provide incremental advances on standard information retrieval benchmarks, they have yet to be shown to have specific benefits for computational humanities in general, or history




and philosophy of science in particular. Furthermore, dynamic LDA, and related approaches that make topics variant rather than invariant entities over time, may suggest relationships among documents that are spurious. For example, Robert Rose, in our research group at Indiana University, ran some simple experiments with dynamic topic models showing that if topics were allowed to evolve in response to new documents being added to the corpus, a topic that was previously prominent in early documents concerning (for example) theology could morph to become dominant in later documents about (for example) symbolic logic, due to simple word ambiguities (e.g., “church” as referring initially to the Catholic Church and later appearing as the name of a seminally important logician). Of course, this particular issue could have been solved by tokenizing the name and the noun differently instead of treating both as instances of the same “word,” but it is indicative of a much broader problem of colexification of different concepts that cannot easily be solved algorithmically. It also might be thought that the problem can be avoided because a typical corpus for HPS research would not contain such a strange mixture of theology and mathematical logic. This optimism is undermined by actual experience, however. Thus, for instance, the corpus of 1,315 books we used for the study reported by Murdock et al. (2017) contained examples from logic, theology, comparative psychology, and many other areas; and it also coincided with a shift in the use of the term “anthropomorphism” from theological and anthropological contexts to the context of discussions about the nature of animal minds. A dynamic topic modeling approach with a fixed number of topics might have been forced to repurpose a topic assigned to early texts about theology to fit later texts about comparative psychology, whereas the simpler, static approach we took differentiated these topics within the corpus taken as a whole. 
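The tokenization point can be illustrated with a deliberately naive sketch (invented sentences; a real pipeline would use proper named-entity recognition rather than this crude capitalization heuristic):

```python
def naive_tokens(text):
    """Standard lowercase bag-of-words tokenization: collapses the noun
    "church" and the surname "Church" into a single word type."""
    return [w.strip(".,!?").lower() for w in text.split()]

def case_aware_tokens(text):
    """Keep mid-sentence capitalized words distinct, so the proper name
    "Church" survives as its own token (a crude stand-in for NER)."""
    tokens = []
    for i, w in enumerate(text.split()):
        w = w.strip(".,!?")
        tokens.append(w if (i > 0 and w[:1].isupper()) else w.lower())
    return tokens

# Invented example sentences:
theology = "The church preached on Sunday."
logic = "Later, Church proved the theorem."
```

Under `naive_tokens`, both sentences contribute to the same word type "church"; under `case_aware_tokens`, the logician keeps a token of his own. As noted above, however, capitalization is only one surface cue, and the broader problem of colexified concepts has no such easy algorithmic fix.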
So, although such a simple, unpublished experiment as the one we conducted with “church” and “Church” is not definitive, it suggests that more work needs to be done to determine whether more sophisticated versions of topic modeling are appropriate for the needs of HPS scholars. In our view, the document-centered view helps to keep the correct goal in sight. Fluctuations or other changes in topics are not the important outputs of LDA topic modeling. Rather, the power of the topic models lies in their ability to reveal relationships among appropriately contextualized documents. In the paragraph from Meeks and Weingart (2012) with which we opened this chapter, they refer to the “seductive but obscure results in the forms of easily interpreted (and manipulated) ‘topics.’” If LDA topic


modeling can reorient the consumers of topic models away from the “topics” and back to the documents, we believe that its results may be rendered less seductive and less obscure. While the need for good information retrieval makes it worthwhile to model large corpora, often comprising documents that have been aggregated for institutional reasons such as library collections or the contents of professional journals, we have chosen a different path in our HPS work: to model the reading and writing behavior of specific individuals. This focus on people and the documents they read has led us to ask questions about their exploration and exploitation of the cultural contexts in which they found themselves and to view topic models as tools for identifying influence and measuring creativity within those contexts. The documents are, after all, the ultimate repositories of authors’ meanings and can only be read and understood by human beings, given the current limitations of all forms of AI and machine learning. But we hope to have convinced our readers that, judiciously used, LDA topic modeling is among the algorithms that are worthy of exploration and exploitation by historians and philosophers of science, who will continue to supply the meanings that inform our understanding of science and its philosophical significance.


Chapter 6

The Potential of Supervised Machine Learning for the Study of the Evolution of Science

Krist Vaesen

There is a growing scholarly interest in scientific cartography, that is, the production of maps that track the changing structure of scientific fields. Such maps aim to monitor, among other things, the behavior of bodies of knowledge over time, the major research themes within a field, and the interactions among scholars or research teams within and across fields (Börner et al. 2012; Suominen and Toivanen 2016). They could, it has been claimed, serve as a useful complement to more qualitative approaches to the history, philosophy, and sociology of science. Usually, scientific maps are produced based on bibliometric analyses of datasets that are extracted from scientific citation indexing services (e.g., Web of Science or Scopus—see the overview in Morris and Van der Veer Martens 2008). A very recent development in scientific cartography, however, is the use of datasets that contain, in a digitized format, the full text of scientific articles (as provided by, e.g., JSTOR Data for Research) and their analysis by means of text-mining machine learning techniques. Such analyses have yielded maps of the issues and themes covered by major journals in, among others, economics (Ambrosino et al. 2018; time span 1845–2013), sociology (Giordan, Saint-Blancat, and Sbalchiero 2018; time span 1921–2016), cognitive science (Priva and Austerweil 2015; time span 1980–2014), environmental policy (Murakami et al. 2017; time span 1990–2010), and philosophy of science (see chapter 9 in this volume).


Each of these studies uses the same machine learning technique, namely topic modeling. This chapter sketches what topic modeling amounts to and the kinds of questions it might help us address. I argue that topic modeling is limited in scope because it does not take advantage of the machine supervisor’s prior knowledge about the field under study (i.e., it is an unsupervised technique) and because it cannot reckon with such things as argumentation, semantics, grammar, and style. I then discuss the sense in which supervised machine learning algorithms might overcome these limitations. I urge researchers to at least explore the ways in which such supervised machine learning algorithms can contribute to the science of science.

The Strengths of Topic Modeling

Topic modeling is an automated method for identifying topics in a corpus of texts. The most common topic modeling technique, latent Dirichlet allocation (LDA), treats texts as “bags of words” (so it does not take into account grammar or semantics). LDA is an unsupervised technique: ex ante, the human operator only has to define the number of topics that the machine is to extract from the corpus. What the topics are about is established by the operator ex post. Consider the following toy example, a corpus that comprises three documents.

Document 1: cell, mitochondria, cell
Document 2: cell, environment, species
Document 3: environment, species, habitat

This small corpus includes the word types “cell,” “mitochondria,” “environment,” “species,” and “habitat.” Suppose we instruct LDA to construct two topics, labeled Topic_1 and Topic_2. LDA will estimate probability distributions over the corpus’s vocabulary and calculate, for both Topic_1 and Topic_2, the probabilities of each word type (see table 6.1).
The numbers in table 6.1 should be read as follows: if one were to sample Topic_1 (see the values in the first column), the probability of drawing the word “environment” is 0.012 (the same goes for “habitat” and “species”), the probability of drawing the word “cell” is 0.717, and the probability of drawing “mitochondria” is 0.247. For Topic_2 (see the values in the second column), the probabilities are 0.390 for “environment” and for “species,” 0.200 for “habitat,” and 0.010 for “mitochondria” and “cell.” Each word type is then assigned to the topic under which it has a high probability of being drawn:




Topic_1: {mitochondria, cell}
Topic_2: {environment, species, habitat}

Table 6.1. Probabilistic topic model for the corpus, part I: probabilities of drawing a word token of the word types “environment,” “habitat,” “mitochondria,” “cell,” and “species” when sampling Topic_1 (first column) and Topic_2 (second column). The probability distribution of each topic is discrete, with all probabilities summing to one.

                Topic_1   Topic_2
environment       0.012     0.390
habitat           0.012     0.200
mitochondria      0.247     0.010
cell              0.717     0.010
species           0.012     0.390

Basically, topics are just lists of terms that are likely to co-occur in the documents of the corpus. The operator establishes what topics are about. For instance, if the above topic descriptions had been derived from a random sample of articles in biology (rather than from the documents of our toy example), one might conclude that the corpus covers themes in microbiology (Topic_1) and ecology (Topic_2). LDA also produces a table that sets documents against topics, as in table 6.2. The table presents the probabilities of drawing Topic_1 (first column) or Topic_2 (second column) when sampling, respectively, Document 1, Document 2, and Document 3 (the rows of the table). Document 1 appears to be associated primarily with Topic_1, Documents 2 and 3 with Topic_2. Accordingly, Topic_2 is, across documents, better represented in the corpus than Topic_1.

Table 6.2. Probabilistic topic model for the corpus, part II: probabilities of drawing Topic_1 (first column) and Topic_2 (second column) when sampling, respectively, Document 1, Document 2, and Document 3. The probability distribution of each document is discrete, with all probabilities summing to one.

                Topic_1   Topic_2
Document 1        0.963     0.037
Document 2        0.346     0.654
Document 3        0.037     0.963
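The two tables combine into a generative model of the documents: the probability of drawing a given word from a document is the topic-weighted sum of that word’s per-topic probabilities. A small Python sketch using the values from tables 6.1 and 6.2 (the helper function `word_probability` is ours, for illustration):

```python
# Word probabilities per topic (table 6.1).
topics = {
    "Topic_1": {"environment": 0.012, "habitat": 0.012, "mitochondria": 0.247,
                "cell": 0.717, "species": 0.012},
    "Topic_2": {"environment": 0.390, "habitat": 0.200, "mitochondria": 0.010,
                "cell": 0.010, "species": 0.390},
}

# Topic probabilities per document (table 6.2).
documents = {
    "Document 1": {"Topic_1": 0.963, "Topic_2": 0.037},
    "Document 2": {"Topic_1": 0.346, "Topic_2": 0.654},
    "Document 3": {"Topic_1": 0.037, "Topic_2": 0.963},
}

def word_probability(word, doc):
    """P(word | doc) = sum over topics of P(word | topic) * P(topic | doc)."""
    return sum(topics[t][word] * weight for t, weight in documents[doc].items())

p_cell_doc1 = word_probability("cell", "Document 1")  # ~0.69: "cell" dominates Document 1
```

Consistent with the tables, “cell” is very likely in Document 1 (driven by Topic_1), while “environment” is far more likely in Document 3 than in Document 1.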


When applied to time-stamped documents, such analyses help us explore the evolution of topics over time. By way of illustration, Ambrosino et al. (2018) performed an LDA analysis on 250,846 articles published from 1845 to 2013 in 188 economics journals. Ambrosino et al.’s results suggest, among other things, that econometric and mathematical methods came to dominate economics from the 1980s onward until, around 2010, this dominance was taken over by economic history and the social aspects of economics. Another example concerns a study by Giordan, Saint-Blancat, and Sbalchiero (2018), who analyzed the abstracts of 3,992 articles published between 1921 and 2016 in the American Journal of Sociology. The authors found that, by the 1960s, the journal had become a popular venue for articles dealing with topics related to “sociology as social criticism of society.” At the same time, they observed a decrease in the presence of other topics, including the sociological study of religion, the psychological study of social processes, and the need to affirm sociology as a scientific discipline. LDA can also tell us something about degrees of specialization. The skewness of the distribution of topics across papers indicates whether the discipline in question is principally interested in a narrow set of questions or rather evenly spreads its attention over many topics. Likewise, the skewness of the distribution of topics within papers can be used to distinguish between specialist and generalist works (see, e.g., Murakami et al. 2017). Topic modeling is a technique that allows one to explore the thematic structure of corpora that are too large to be manually analyzed. It is an automated approach that, at least ex ante, requires little to no input from the operator. As illustrated above, topic models are useful in exploring chronological changes in the research agendas of disciplines and in assessing disciplines’ degree of specialization.
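One concrete way to operationalize this skewness idea (our own illustrative choice, not necessarily the measure used in the cited studies) is the normalized entropy of a paper’s topic distribution: values near 0 indicate a specialist paper concentrated on one topic, values near 1 a generalist paper spread evenly across topics.

```python
import math

def normalized_entropy(weights):
    """Entropy of a topic distribution divided by its maximum (log K):
    0 = fully specialized on one topic, 1 = perfectly even spread."""
    h = -sum(w * math.log(w) for w in weights if w > 0)
    return h / math.log(len(weights))

# Hypothetical within-paper topic distributions:
specialist = [0.94, 0.02, 0.02, 0.02]  # one dominant topic
generalist = [0.25, 0.25, 0.25, 0.25]  # attention spread evenly
```

Averaging such scores across a journal’s papers, year by year, would give one simple quantitative track of a discipline’s degree of specialization.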
Furthermore, they might complement more traditional forms of scientific cartography (i.e., cartography based on bibliometric data). Such traditional maps are good at capturing relationships (e.g., relationships between the documents within a given corpus, as inferred from citation data) but bad at specifying what these relationships are about. Given their focus on the content of scholarly work, topic models may indeed provide such specifications.

The Limits of Topic Modeling

Although topic modeling techniques such as LDA minimize the amount of ex ante intervention, they require substantive interventions ex post: topics must be interpreted in a meaningful way. To illustrate the difficulties of the interpretation of topics, consider Murakami et al.’s (2017) LDA analysis of the 675 articles that appeared, over the period 1990–2010, in Global Environmental Change (a journal dedicated to publishing research on the human and policy dimensions of global environmental change). One of the topics that appears to have been prominent in early volumes of the journal but that declined afterward comprises the terms {countri, develop, nation, world, intern, their, india, global, industri, most} (note that the terms are stemmed, e.g., developed → develop, national → nation). Murakami and colleagues (2017) label this topic “Developing and developed countries” (so basically all countries). But, in the absence of a reliable framework for carrying out such interpretations, the topic seems equally consistent with the labels “Developing countries,” “Developed countries,” and “National and global industries” (to name just a few of the possible alternatives). Furthermore, it is unclear what the decline in popularity of the topic actually means. Even if one accepts Murakami et al.’s label (namely, “Developing and developed countries”), its decline might imply any of the following: a decreasing interest in developing countries, in developed countries, or in countries as such. The latter, in turn, might imply that there is a trend toward more theoretical work, toward work dealing with levels below that of the nation-state, or toward work targeting specific countries. Or, more generally, the decline might simply mark a change in terminology (e.g., countries → states, developing → low and middle income). Additionally, in order to construct a consistent story, the interpretation of the trajectory of any given topic must be brought into accord with the interpretation of trends in other topics. Hence, the degrees of interpretive freedom increase as the number of topics increases. To be clear, the point is not merely that such interpretive freedom makes the development of topical maps, to a significant extent, arbitrary (a limitation that LDA analysts themselves have acknowledged; e.g., Ambrosino et al. 2018).
The above also suggests that the scope of LDA is limited by the amount of information that human operators are capable of processing ex post. Indeed, in order to keep their interpretive task within manageable limits, researchers typically work with relatively small numbers of topics. For instance, for a corpus of 250,846 articles covering a time span of 168 years, Ambrosino et al. (2018) constructed no more than 100 topics. Surely not all topics will be equally relevant to mapping disciplinary change. Yet we lack a framework for deciding which topics are most indicative of which types of disciplinary change, and thus for deciding how many topics to consider. Suppose that in the 168 years covered by Ambrosino et al.’s dataset there is a period of 30 years during which economists were primarily concerned with applied (as opposed to theoretical) work. Given that this era of applied research is so short, topics that could mark it are unlikely to appear in the list of


topics that are most characteristic of the full 168-year period and, consequently, the era would go unnoticed. Additionally, even if the topics characteristic of the 30-year period were to make it to the list of topics characteristic of the 168-year period, their salience might not be immediately obvious (e.g., topics including verbs such as “solve” or “tackle” might be indicative of applied research but could easily be overlooked). Another type of loss of information, and hence of detail, is not so much due to the limits of the human operator as to the bag-of-words principle on which LDA operates. LDA treats documents as sets of word tokens and thereby disregards any information concerning documents’ style, semantics, and grammar. It thus disregards much of what humans take into account when interpreting a text. Indeed, a study by Al-Doulat, Obaidat, and Lee (2018; see also below) suggests that the accuracy of machine-based text interpretation can be increased substantially by adding, in addition to content-based features (as extracted by means of LDA), features that pertain to style (e.g., sentence length, readability), grammar (e.g., frequencies of determiners, comparatives, superlatives), and semantics (e.g., frequencies of words of negation, belief, modality, conditionality). In sum, LDA is a data-driven approach that is good at the initial exploration of corpora but, due to substantive information loss, limited in the level of detail it can achieve. There is a family of machine learning techniques that might help us overcome these limitations.

The Untapped Potential of Machine Learning

The studies described above (namely, Murakami et al. 2017; Ambrosino et al. 2018; Giordan, Saint-Blancat, and Sbalchiero 2018) underutilize machines’ capabilities.
They (merely) deploy LDA as a data-reduction technique—that is, a technique that, much like standard principal component analysis or factor analysis, extracts the most relevant information from datasets that are too big to be processed in their entirety. Accordingly, they do not take advantage of the human operator’s prior knowledge, knowledge that could be used to train the machine to make inferences about novel data (Gennatas et al. 2020). Techniques that do involve such training, or supervised techniques, might yield maps that are more detailed than those produced by means of LDA. To get a sense of how such techniques would work, suppose we aim to test the hypothesis, suggested by a qualitative study by Katzav and Vaesen (2017), that the Philosophical Review (one of the top generalist philosophy journals) became less pluralistic over the period 1940–1970. More specifically, we want to know whether the journal has increasingly published analytic work at the expense of pragmatist work.

Krist Vaesen

In a supervised machine learning approach, we would use our prior disciplinary knowledge to first manually classify a representative sample of articles from Philosophical Review into the categories “Analytic philosophy” and “Pragmatism.” The interpretive work is thus done ex ante (rather than ex post as in LDA) and pertains to things that the operator is accustomed to interpreting (namely, full texts rather than topics and topic lists). In a second step, we extract from these articles all information that might enable the machine to perform similar classifications. Such classification information may include content-based features (e.g., topics established by means of LDA), as well as features relating to text style, grammar, and semantics, and even articles’ metadata. The two steps result in a training dataset that comprises example input-output pairs—with articles’ classification features as inputs and their respective classifications into “Analytic philosophy” or “Pragmatism” as outputs. In a third step, the machine, relying on algorithms such as support vector machines or neural networks,1 establishes a set of functions that best map the inputs to the desired outputs. Fourth, the trained machine uses these functions to classify new data, namely, all articles from the corpus that are not included in the training set. Since each article in the corpus has now been assigned a label (“Analytic philosophy” or “Pragmatism”), and each article has a time stamp, one can finally assess whether, between the 1940s and the 1970s, analytic work indeed came to dominate the pages of the Philosophical Review. Let me clarify the senses in which such a supervised machine learning (SML) approach would address the concerns raised about LDA above. To start, SML potentially suffers less from information loss than LDA does. Recall that the scope of LDA is limited by the number of topics that a human operator can process. In SML, in contrast, topics are processed by a machine.
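The four-step procedure just described can be sketched in miniature. The toy classifier below is a nearest-centroid model over bag-of-words frequencies, standing in for the support vector machines or neural networks the chapter mentions; the training snippets and labels are invented for illustration, not drawn from any real corpus.

```python
from collections import Counter

def features(text):
    """Step 2: extract a crude bag-of-words feature vector (token frequencies)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}

def train(labeled_texts):
    """Step 3: learn one centroid (average feature vector) per label."""
    centroids = {}
    for label, texts in labeled_texts.items():
        acc = Counter()
        for text in texts:
            acc.update(features(text))
        centroids[label] = {tok: v / len(texts) for tok, v in acc.items()}
    return centroids

def classify(text, centroids):
    """Step 4: assign the label whose centroid best matches the text (dot product)."""
    vec = features(text)
    def overlap(centroid):
        return sum(w * centroid.get(tok, 0.0) for tok, w in vec.items())
    return max(centroids, key=lambda label: overlap(centroids[label]))

# Step 1: a hand-labeled training sample (hypothetical snippets, not real articles).
training = {
    "Analytic philosophy": ["the proposition entails a necessary logical truth about meaning"],
    "Pragmatism": ["inquiry is a practical instrument for resolving doubt in experience"],
}
centroids = train(training)

# Classify an article outside the training sample.
label = classify("the proposition entails its logical consequence", centroids)
```

In practice one would use a proper learner and a far larger labeled sample, but the structure is the same: label, extract features, fit, classify, and then aggregate labels over time stamps.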
Accordingly, SML’s inferential base can easily exceed the one hundred or so topics that LDA studies are restricted to and thus is less likely to miss out on salient classification features (but see below). Additionally, to reiterate, SML can incorporate classification features that topic models disregard. Returning to the aforementioned study by Al-Doulat, Obaidat, and Lee (2018), these authors used several text analysis tools (namely, the Natural Language Toolkit, the Linguistic Inquiry and Word Count, and TextSTAT) to compute, for each article (and article class) in their training dataset, corresponding indicators of grammar, semantics, and style. These indicators included, for instance, the number of nouns, verbs, proper nouns, determinants, comparatives and superlatives; frequencies of negation, belief, surprise, conditional, modal, existential, and interjection words; and word- and sentence-level complexity measures. By incorporating such information
(that is, in addition to incorporating content information, as extracted by LDA) in the classification of new articles, classification accuracy increased by up to 30%.

Note also the sense in which SML would benefit from the operator’s knowledge about the corpus under study. In LDA, such knowledge only plays a role in the interpretation stage (namely, the ex post interpretation of topics). SML, in contrast, uses it in three stages: prior hypothesis setting, development of the training dataset, and interpretation of the results. The operator, ex ante, specifies a target and, through training, directs the machine toward it. SML thus has the potential of addressing very specific questions, including those that we have seen to be unapproachable by means of purely data-driven, exploratory tools (such as LDA).

This is not to say that SML has no weaknesses or that, relatedly, it will actually realize the potential that is suggested by the above. For one, although there are few limits to the amount of classification information one can feed a machine, machines might struggle with identifying the most relevant features in input spaces that exhibit high dimensionality. Put simply, the larger the haystack, the more difficult it will be for the computer to find the needle. Further, the size of the training dataset that a machine needs in order to complete a given task successfully is proportional to the complexity of the task. In the worst case, the training dataset reaches the size of the full dataset (and the machine becomes redundant). Finally, one might have worries about the transparency of SML models. Indeed, neural networks—the type of model deployed by Al-Doulat, Obaidat, and Lee (2018)—have been said to suffer from opacity (Creel 2020; Sullivan, forthcoming). A neural network with ten layers might have thousands of connections and adjustable weights.
Accordingly, it is extremely difficult to reconstruct the “reasons” for the decisions that the machine made in developing the network model; it very well might be that the trained machine’s classification accuracy is attributable to reliance on spurious factors. For instance, a machine might be able to distinguish accurately between pictures of a wolf and of a dog, not because of its ability to discriminate conspicuous features of wolves from conspicuous features of dogs (e.g., facial geometry, color) but because of its ability to discriminate snowy backgrounds (wolves) from non-snowy ones (dogs) (Ribeiro, Singh, and Guestrin 2016). Surely, modelers might rely on saliency maps to identify the factors that contributed most to classifications made by the model and thus to identify major spurious factors. Still, such maps will leave inaccessible most of the machine’s lower-level decisions (Sullivan, forthcoming).2
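For simple bag-of-words models, one crude way to surface such spurious cues is to rank tokens by how strongly their relative frequencies differ between the two classes. The wolf/dog captions below are invented to mirror the snow example; the ranking is only a model-level stand-in for a genuine saliency map.

```python
from collections import Counter

def top_discriminative(class_a_texts, class_b_texts, k=3):
    """Rank tokens by the gap between their relative frequencies in two classes."""
    freq_a = Counter(w for t in class_a_texts for w in t.lower().split())
    freq_b = Counter(w for t in class_b_texts for w in t.lower().split())
    n_a, n_b = sum(freq_a.values()), sum(freq_b.values())
    vocab = set(freq_a) | set(freq_b)
    gap = {w: freq_a[w] / n_a - freq_b[w] / n_b for w in vocab}
    return sorted(vocab, key=lambda w: abs(gap[w]), reverse=True)[:k]

# Hypothetical captions: "snow" tops the ranking, flagging a background cue
# rather than any feature of the animals themselves.
wolves = ["a wolf in snow", "snow snow everywhere a wolf"]
dogs = ["a dog on a sofa", "a dog on grass"]
ranked = top_discriminative(wolves, dogs)
```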
This acknowledged, it is far from clear that the opacity of SML approaches is more problematic than the opacity of LDA approaches, for opacity pertains to the inaccessibility not just of models but also of their derived outputs (Sullivan, forthcoming). Now, whereas in SML it is models (but not outputs) that are difficult to understand, the reverse holds for LDA. As mentioned, LDA itself provides the modeler little guidance in interpreting the topics it constructs. The interpretability of topic models has been the subject of some recent studies (see, e.g., Arnold et al. 2016). Yet such studies have not yet been done in the context of classifying scientific articles, let alone in a comparative setting (that is, LDA versus SML). Hence, as regards opacity, the relative merits and shortcomings of LDA and SML are still to be established.

Generally, there is little to go on to assess the above weaknesses in the context of scientific cartography and, thus, to assess how SML would perform in such a context. Although SML is not a new technique, I know of only one study that has used SML to mine scientific texts: the aforementioned study by Al-Doulat, Obaidat, and Lee (2018; for other applications of SML, see, e.g., Wang et al. 2012; Gibbons et al. 2017; the review by Hyde et al. 2019). These authors assessed the extent to which a trained machine could classify medical articles into subdomains (e.g., “Diabetes,” “Cancer,” “Neurology,” “Pediatrics”). From a corpus of 100,000 full-text and already classified medical articles, the authors first drew a sample of papers, from which they extracted both content-based classification features (by means of LDA) and stylistic classification features (features pertaining to text style, grammar, and syntax). The training data thus comprised the classification features of the papers in the sample (as inputs) and the papers’ desired classification (as outputs).
A deep neural network was trained on the training dataset, and the trained machine classified the entire corpus (excluding the articles belonging to the training sample). Given that the classification of each paper in the corpus was known ex ante, Al-Doulat, Obaidat, and Lee (2018) could ex post determine the trained machine’s classification accuracy. The machine appeared to have been right in 82% of the cases. It is an open question whether similar accuracy levels can be achieved in the study of the dynamics of science. Several factors affect classification accuracy: the dimensionality of the input space, the size and quality of the training dataset, the distance between the categories that the machine is supposed to distinguish (larger distances, expressed in terms of, e.g., content, style, grammar, syntax, metadata, typically resulting in higher classification accuracy), and so forth. How these factors will play out in any given case is hard to predict. One cannot, for instance, determine the distance between “Analytic philosophy” and
“Pragmatism” (see the earlier example of the Philosophical Review), let alone compare it with the distance between the categories of Al-Doulat, Obaidat, and Lee (2018), without actually carrying out the relevant analyses; to put it proverbially, the proof of the pudding is in the eating. Still, given Al-Doulat, Obaidat, and Lee’s preliminary results and given the possible gains, it would be reasonable at least to explore the use of SML in the context of scientific cartography. To what types of questions could one try to apply SML? I agree with what Malaterre, Chartier, and Pulizzotto write in their topic modeling case study (see chapter 9 in this volume): “If one wishes to identify more sophisticated . . . features of the corpora, other computational approaches [other than topic modeling] should be used.” Which “more sophisticated features” SML could identify remains to be seen. Given that SML allows one to incorporate semantic information in one’s analysis, perhaps it could help track such things as scientific (dis)agreement. A topic model might be able to indicate that, say, theories about anthropogenically induced climate change have become a major issue in climate science but will have little to say about the extent to which participants in the debate endorse or reject those theories. Potentially, incorporation in SML of frequencies of negation, belief, surprise, conditional, modal, existential, and interjection words and markers of argumentation might enable reconstructing (argumentative) shifts in the debate. Further, scientific topics might be addressed by means of different approaches. Consider the study by Malaterre, Chartier, and Pulizzotto (see chapter 9 in this volume). Their topic models suggest that the topic “science-and-values” was more prominent in the journal Philosophy of Science before than after the 1960s. 
Science-and-values, though, can be studied from different perspectives (e.g., pragmatism, logical empiricism, process philosophy, neo-Thomism, Marxism). Malaterre, Chartier, and Pulizzotto’s finding thus is consistent with a wide variety of lower-level processes (e.g., marginalization of all these perspectives, marginalization of a couple or only one of these perspectives). Accordingly, Malaterre, Chartier, and Pulizzotto’s study doesn’t tell us the likely actual story behind the shift in Philosophy of Science, namely that all approaches except logical empiricism were virtually banned from the pages of the journal. Given that approaches in philosophy (and, arguably, also elsewhere) typically have characteristic forms of argumentation, style, and grammar, one might hope that introducing such features (by means of SML) would increase the granularity of Malaterre, Chartier, and Pulizzotto’s reconstruction. Regarding features pertaining to argumentation, argument mining currently is a very active field of inquiry among machine learning
scholars. Argument mining is defined as “the automated detection [usually relying on machine learning] of the argumentation structure and classification of both its component elements and their argumentative relationships” (Moens 2018, 2). The model for argument detection that is most used among argument miners was developed by the philosopher Stephen Toulmin (1958) in his book The Uses of Argument (see Lippi and Torroni 2016; Moens 2018). Toulmin suggested that a typical argument comprises six component types: claim, data, qualifier, warrant, rebuttal, and backing (Moens 2018). Recently, his model has been simplified to aid argument mining (Moens 2018); some researchers merely make a distinction between an argument’s claim and its premises (Mochales and Moens 2011), while others add a distinction between supporting and refuting elements in claims and premises (Habernal and Gurevych 2017). Argument mining has proven to be far from straightforward. Discourse markers (e.g., “because”), for example, may have multiple meanings or simply be missing (Moens 2018). Or the premises for or against a claim might appear far apart in the text from the claim in question, making it difficult for the machine to recognize that they are linked. Despite these difficulties, a recent survey by Lawrence and Reed (2020) indicates that the field is making progress in resolving some of the difficulties that argument miners have been running into.

Finally, training a machine to perform a specific task (as in SML, but not in topic modeling) requires considerable disciplinary background knowledge (e.g., knowledge about the different ontologies underlying pragmatism and neo-Thomism, or underlying positivist and interpretivist approaches in economics and anthropology).
Such solid background knowledge puts researchers in a good position to deduce very specific hypotheses (e.g., pragmatism incorporated elements of logical empiricism at the time it started to lose ground), hypotheses which SML then aims to test. As mentioned earlier, topic modeling, in contrast, tends to be data-driven: the answers it produces might, but need not, be answers to the (specific) questions the modeler has.

I have tried to provide an accessible introduction to machine learning techniques in the context of scientific cartography. The use of such techniques is very recent and, at present, has been limited to one type, namely topic modeling. Topic modeling is useful for discovering broad patterns of disciplinary change, but the tool cannot deliver the level of granularity that would be needed for answering some of the deeper questions about the dynamics of science. It remains to be seen whether such deeper questions can indeed be addressed by other machine learning techniques—techniques that, like
SML, do more than just data reduction. In any case, there are plenty of questions to test SML on, for example, the degree to which disciplines are pluralistic, the degree to which disciplines have changed in terms other than topics (e.g., changes in the use of rhetorical devices, changes in ontological commitments), developments over time in a discipline’s levels of consensus/disagreement, and developments over time in the extent to which a given discipline allows for or involves interdisciplinarity. I suspect, or at least hope, that we soon will come to know more about these kinds of applications of machine learning.
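Some of the questions just listed, such as tracking levels of disagreement, could start from very simple features of the kind discussed earlier. The sketch below counts coarse argumentative and hedging markers per 100 tokens; the cue lexicons are ad hoc stand-ins for the far richer representations of claims, premises, and their relations that argument-mining systems learn.

```python
import re

# Ad hoc cue lexicons (hypothetical, for illustration only).
CLAIM_CUES = {"therefore", "thus", "hence", "consequently"}
PREMISE_CUES = {"because", "since", "given"}
HEDGES = {"might", "may", "perhaps", "possibly", "arguably"}
NEGATIONS = {"not", "no", "never", "cannot"}

def argumentation_profile(text):
    """Count coarse argumentative markers, normalized per 100 tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens) or 1
    def rate(lexicon):
        return 100 * sum(tok in lexicon for tok in tokens) / n
    return {
        "claim_markers": rate(CLAIM_CUES),
        "premise_markers": rate(PREMISE_CUES),
        "hedges": rate(HEDGES),
        "negations": rate(NEGATIONS),
    }

sample = ("Because the data are noisy, the estimate might be biased; "
          "therefore the conclusion is not secure.")
profile = argumentation_profile(sample)
```

Aggregated over a journal's articles per decade, such profiles would give a first, crude signal of shifts in hedging or disagreement.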


Chapter 7

Help with Data Management for the Novice and Experienced Alike Steve Elliott, Kate MacCord, and Jane Maienschein

With the powerful analyses they enable, digital humanities tools have captivated researchers from many different fields who want to use them to study science and its evolution. Researchers often know about the learning curves posed by these tools and overcome them by taking workshops, reading manuals, or connecting with communities associated with the tools. But a further hurdle looms: data management. Digital tools, as well as funding agencies, research communities, and academic administrators, require researchers to think carefully about how they conceptualize, manage, and store data and about what they plan to do with that data once a given project is over. The difficulty of developing strategies to address these issues can prevent new researchers from sticking with digital tools and can flummox even senior researchers. Data management is especially opaque to those from the humanities (Akers and Doty 2013). To help overcome the data management hurdle, we present five principles to help researchers, novice and experienced alike, conceptualize and plan for their data:

1. Create and use a data management plan
2. Recognize what counts as data
3. Collect and organize data
4. Store data and determine who can access it
5. Share data
We illustrate the use of those principles with two digital projects from the history of science, the Embryo Project ( and the Marine Biological Laboratory (MBL) History Project (history, both of which store data in the HPS Repository ( The Embryo Project produces a digital science outreach publication about the history of developmental biology, while the MBL History Project uses multiple types of digital media to preserve and communicate the history of science at the Marine Biological Laboratory in Woods Hole, Massachusetts. We have conducted the two projects for more than a decade, and while they are large projects involving dozens of researchers and tens of thousands of pieces of data, the principles we have gleaned from administering them apply also to projects with fewer researchers and data. Those two projects began with a few people working on relatively small sets of data, and they grew in part because of their ability to manage data. The principles also apply beyond the digital realm, so those who collect and manage data by more traditional means will find them useful as well. The principles are broad enough that history and philosophy of science (HPS) researchers can use them to design plans for data that complement the unique features of their individual research projects.

Create and Use a Data Management Plan

A data management plan (DMP) is a document specific to a given research project that addresses how researchers in the project collect, organize, preserve, and share their data. There are at least three reasons why researchers construct DMPs for their projects. First, governmental funding agencies and foundations increasingly require DMPs as part of any grant proposal.
In the United States, such requirements apply to key funders of digital and computational HPS projects, such as the National Endowment for the Humanities and the National Science Foundation, the latter of which funds such projects via programs focused on science and technology studies and on the science of science (NSF 2015; Maienschein et al. 2019). In Europe, the European Research Council also requires DMPs and publishes a template for proposal DMPs (ERC 2017). The same is quickly becoming true for funders throughout the world. Without a DMP, many projects simply won’t be eligible or competitive for funding.

Second, a good DMP improves the overall quality of a research project. As researchers grapple with making DMPs, they are forced to consider and detail other practices besides the posing of interesting research questions. As researchers construct DMPs, they must address if the data
they plan to collect can yield answers to their research questions; if the data can be collected in specified time frames; whether and to what extent they will need protocols to collect and analyze data; and so on. Researchers improve the design and execution of their projects when they address those kinds of questions.

Third, a good DMP provides institutional memory for a project. Research teams often face turnover, especially in academic settings, as undergraduate and graduate researchers, postdocs, and even primary investigators may join or leave projects from year to year. Without documents like DMPs, the institutional memory for managing data travels with individuals, not with the project. If a research team creates a DMP, they improve the reliability of their data management, and they can more efficiently and economically train new members. Even for projects conducted by sole investigators, DMPs help those investigators ensure the fidelity of data management across projects.

A DMP is usually a living document. Researchers need not design optimal plans for their projects at the outset lest their projects fail. Rather, as their projects progress, researchers tinker with their plans and improve them. If researchers keep the principles in the next sections in mind, they will be able to revise their plans judiciously.

DMPs vary in length depending on the types of data being collected and processed, the procedures for acquiring and storing data, and other factors. While DMPs are highly diverse in appearance, they address at least the following points: (1) roles and responsibilities for the data, (2) expected data, (3) period of data retention, (4) data format and dissemination, and (5) data storage and preservation of access. There are a number of tools available to researchers to construct DMPs, of which we recommend the DMPTool (available at This site compiles publicly shared DMPs as well as templates and best practices for many funding bodies.
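The five points above can be captured in a structured document from the start of a project. The skeleton below uses hypothetical field names and example values; real funders (and the DMPTool templates) prescribe their own formats.

```python
# A minimal DMP skeleton covering the five points; everything here is illustrative.
dmp = {
    "roles_and_responsibilities": {
        "data_manager": "lead investigator",
        "collectors": ["graduate researcher", "postdoc"],
    },
    "expected_data": ["digitized archival photographs", "interview transcripts"],
    "period_of_data_retention": "ten years after project end",
    "data_format_and_dissemination": {
        "formats": ["TIFF (master)", "JPEG (display)", "CSV (metadata)"],
        "dissemination": "open-access repository",
    },
    "data_storage_and_preservation": "institutional repository with off-site backup",
}
```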
The further principles for data management that follow are framed in terms of DMPs, but the principles apply to data management more generally, too.

Recognize What Counts as Data

Those who study science often collect data. But many researchers trained in disciplines like philosophy, historiography, or social theory question whether they collect or employ data in their research (Akers and Doty 2013). Rarely, some argue, do they create spreadsheets of measurements of the world. Here, we provide some accounts of data, some general examples of kinds of data, and specific examples from the MBL History Project, indicating that data include many kinds of things collected and used by those who study science.
There are several useful ways to think about data. In 2 C.F.R § 200.315 (2013), the US federal government defines “research data” for federal funding awards as “the recorded factual material commonly accepted in the scientific community as necessary to validate research findings.” Sabina Leonelli proposes two important features of data. A datum is first something “treated as potential evidence for one or more claims about phenomena,” and second, “it is possible to circulate it among individuals” (Leonelli 2015, 817). In Leonelli’s account, something may count as a datum in one research context but not in another. Something becomes a datum only once researchers relate it to specific phenomena and research aims. Importantly, its function as a datum does not depend on its original context of collection. More colloquially, researchers often treat data as anything placed in a database, especially—but not necessarily—if that database is digital in format.

Under those accounts, data include many kinds of things collected and employed in nondigital studies of science. What is a banker’s box in a library archive but a database? The items in it are all data, as are copies or reproductions of them. Letters, records, manuscript drafts, newspaper clippings, diaries, receipts, photographs, government documents, and so on—all are data. More clearly, so is information collected from people or social groups: interview recordings and transcripts, ethnographic notes, survey results, and the like. Less obviously, but no less importantly, information collected via informal studies of texts—reading notebooks, marginalia and highlighted texts, annotated bibliographies, etc.—is data as well. All of those kinds of data underwrite the products traditionally crafted in studies of science, from historical narratives and interview analyses to premises of arguments. Insofar as we digitize those items, the digitized versions also count as data.
Similarly, many kinds of information collected via computational tools count as data. Many tools start with corpora of texts and yield data such as word counts or frequencies, coauthor relations, citation relations, text annotations, geographic locations, and temporal frames, to name just a few. The above kinds of data underwrite analyses of networks, principal components, topics, and evolving languages or practices. In the digital realm, “data” can refer to a digital text or recording and to the information extracted from it, such as word frequencies and bibliographic data. Many digital projects use data in both senses. The MBL History Project is an example of a project that uses many kinds of data and that treats anything that goes in its database as data. The project digitizes items related to the history of the MBL, such as photographs, records of courses, and records of organisms collected or used at the campus. It also collects and digitizes interviews with MBL
scientists, local community members, and historians, and it has created a searchable database of all individuals associated with the courses or who have come as investigators over the past 120-plus years. Ultimately, the project uses digital tools to represent trends and changes in the laboratory’s history, telling stories with digital exhibits, which integrate short narrative encyclopedia articles with digitized items from the MBL archives and interviews with MBL community members. Once those items are stored in a digital database, they themselves become data objects.

While the MBL History Project takes many kinds and iterations of things as data, those decisions may not be suitable for other projects. For a given project, the lead investigator(s) should determine the kinds and instances of data to collect and store based on the questions of the project. Exploratory projects might include many kinds and iterations of data, while more focused projects might be more selective.

Collect and Organize Data

When researchers plan how they collect and organize data, they accomplish at least two ends. First, they prepare to systematically collect data so as to increase the chances that those data can be used reliably to address research questions. Second, they increase the chances that others can replicate their data collection processes and results.

When planning to collect data, researchers often begin with a series of lists on a DMP. First, they list the kinds of data they’ll be collecting, be those quantitative measurements, citation relations, whole text documents, survey results, interviews, or any other kinds of data mentioned earlier. They also inventory the sources of their data. For instance, if they are collecting citation data, the source might be corpora collected from JSTOR. If collecting survey data, the source might be a group of scientists at a professional conference.
Next, they inventory any tools or computer programs needed to collect their data, such as Python, Zotero, special APIs (application programming interfaces), subject indexes, digital surveys, voice recorders, and archive permissions. Researchers also use DMPs to address whether they need approval from an institutional review board (IRB) or an ethics committee to collect the data. If so, they state which board, the dates of submission and approval of materials to the board or committee, and contact information for the ethics reviewer assigned to the case. If researchers must anonymize their data for institutional ethics approval, they summarize their scheme for doing so. Next, some researchers construct a roster of data collectors. These are the people who collect data, their relations to the project, the date
ranges they worked on the project, and permanent contact information. If the project requires ethics approval for data collection, the roster also includes the dates when the collectors passed their ethics trainings and information on how to verify that training.

Finally, researchers often construct at least two kinds of step-by-step protocols that ensure the reliability or fidelity of data collection across individual data collectors. The first protocol makes explicit each step of the collection process, such as locating the data source, interacting with it to pull information from it, organizing data, and storing data. The second protocol provides a procedure for tagging each chunk of data according to a naming scheme. The appropriate size chunk depends on the project, but consistent tagging ensures that researchers will not confound iterations of their own data, especially for projects with many datasets.

That brings us to organizing data. Researchers aim to organize their data so as to distinguish and identify data, search data easily, and draw clear inferences from them. To achieve those ends, researchers use metadata schemes of categories to label information about data not captured by the data themselves. For instance, if the data are a set of citations extracted from a corpus of documents, then metadata might include information about how the dataset was constructed, including who collected it, when, where, using what tools, how long it took, and what kind of object or medium the data are captured in. Metadata might also include evaluations of the dataset: how complete it is, whether it was collected according to community standards or protocols, if it has known problems, who evaluated it and when. Those two kinds of metadata help researchers search data after they have been collected. Furthermore, metadata can include the categories or parameters that structure the data.
Using the example from above, such categories could include article authors, article titles, journal titles, and dates associated with the articles from which each citation was drawn. In that example, the metadata are the categories that we might expect to label the columns in a spreadsheet of data, in which each row collects information for a single datum. This third kind of metadata enables researchers to make inferences from their data. Researchers should design their metadata schemes according to the specific needs of their projects and to their procedures for storing their data (see the following section). Regardless of their practices for storing data, researchers can rely on out-of-the-box and widely used metadata standards, such as Dublin Core ( We mention protocols or standard operating procedures often in this section. We encourage those who study science to think about and
draft protocols for collecting, tagging, and annotating data and suggest that they do so from the beginnings of their projects. As projects progress, researchers can revise their protocols in light of experience. Those protocols will help with the fidelity and reproducibility of data collection, with the reliability of inferences drawn from those data, and with the facility by which researchers can manage, search, and reuse their data. But developing protocols early in a project and iteratively revising them can save a lot of heartache later. It can also save a lot of money, as nothing eats into funding like having to, or having to pay an assistant to, organize and evaluate mountains of data after they have been collected.

The MBL History Project was set up to collect and organize a variety of data types. For instance, a large portion of the project is devoted to digitizing archival materials at the Marine Biological Laboratory in Woods Hole, Massachusetts. These data, which range from photographs to institutional records to course notebooks, were digitized following extensive collaborations with archivists, with standards in excess of those set by the Library of Congress for digitization efforts, in order to ensure usability in the future. Materials from the archives were scanned using flatbed scanners set to capture 600 dpi TIFFs. These TIFFs acted as the archival master files and were uploaded to the open-access HPS Repository. Each TIFF file was converted to a smaller file—JPEG in the case of photographs and PDF in the case of documents—for ease of display and user access. These converted files were stored along with the master TIFF files, as separate bitstreams within the HPS Repository. The multiplicity of file types was designed to ensure ease of deployment across multiple use cases—from website display to publication replication.
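A workflow like the one just described can pair each archival master with a machine-readable metadata record. The sketch below writes a JSON sidecar using a handful of Dublin Core elements; the file name, field values, and the sidecar convention itself are hypothetical illustrations, not the projects' actual repository setup.

```python
import json
from pathlib import Path

def write_sidecar(master: Path, record: dict) -> Path:
    """Write a metadata record as a JSON sidecar next to an archival master file."""
    sidecar = master.with_suffix(".json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar

# Hypothetical item described with a few Dublin Core elements.
record = {
    "dc:title": "Embryology course, group photograph",
    "dc:creator": "unknown photographer",
    "dc:date": "1923",
    "dc:format": "image/tiff",  # 600 dpi archival master
    "dc:rights": "open access",
}
sidecar_path = write_sidecar(Path("mbl_photo_0001.tif"), record)
```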
Metadata was created for each digitized item using a Dublin Core standard taxonomy, and controlled vocabularies were created by archivists for several of the Dublin Core properties at the outset of the project to ensure metadata standardization across the project. These metadata standards and controlled vocabularies are deployed for all projects that use the HPS Repository to store and organize their data. In addition to digitization, the researchers with the MBL History Project have conducted numerous video interviews with MBL scientists and community members, which are published on YouTube. The project’s principal investigator received IRB approval for these interviews, and a core set of standard questions was catalogued to facilitate interviews by multiple project researchers. Given the various kinds of data they collect, the MBL History Project and the Embryo Project collaborated on a metadata manual. This manual is specific to the standards set by the Dublin Core Metadata

Help with Data Management for the Novice and Experienced Alike

Initiative, which both projects use. The projects use it to train people to understand and code metadata for the various kinds of data stored in the HPS Repository. We encourage others to use the manual as a template to develop manuals specific to their own projects (DHPS Consortium 2013).

Store Data and Determine Who Can Access It

Researchers who manage their data well must decide how they will store and preserve those data. Three of the most important issues are who can access stored data, where to store them, and for how long. When determining who can access stored data, researchers must consider at least people in their research team and researchers outside of their team. Many researchers assume any person anywhere should have access to all of their data, from raw data to cleaned data. But there are often good reasons for circumscribing access. A lead researcher may prefer a more restricted set of access permissions, granting those who are analyzing data access only to cleaned and anonymized data. For instance, the lead researcher may want to prevent novice or student analysts from accidentally destroying raw datasets or from seeing the names of people who may have provided confidential information. For data that have been anonymized, the researcher must decide who has access to the key that links actual names to anonymized names. For help determining these permissions and making them explicit, the researcher can rely on a team roster and on ethics review board approvals, as discussed earlier. Outside of their teams, researchers must determine if they want to share their data with researchers more generally. Sharing data helps ensure that others can replicate results and that data have use outside of the contexts in which researchers collected them. On the other hand, if researchers plan to share their data, it may limit their ability to collect confidential information. We discuss shared data further below.
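Returning to metadata for a moment, a minimal Dublin Core record of the kind discussed above can be sketched as follows. The element values here are invented for illustration; real records would draw terms for elements like `type` and `format` from a project's controlled vocabularies.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def dublin_core_record(fields):
    """Build a minimal Dublin Core XML record from a dict of DC elements."""
    root = ET.Element("metadata")
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")

# A hypothetical record for one digitized item.
record = dublin_core_record({
    "title": "Embryology course notebook, 1893",
    "creator": "Unknown student",
    "date": "1893",
    "type": "Text",
    "format": "image/tiff",
})
print(record)
```

Even a sketch like this makes the value of a shared standard visible: any repository that understands Dublin Core can index the `title`, `creator`, and `date` elements without project-specific negotiation.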
Once they’ve determined who can access their data, researchers can choose where to store them. Those working with digital data generally store their data either on their own hardware or in cloud storage. If using their own hardware, researchers should specify which machines will be used and where on the machines the data will live and provide a directory structure to organize multiple files. Cloud storage includes things like encrypted university servers, Dropbox, Google Drive, Amazon storage, and data repositories. If using cloud storage, researchers should specify which service they will use, the methods of access, and the directory structure. We discuss community repositories in the next section. Many researchers aim to keep at least two copies of their data in two distinct locations. For instance, many store data on their local hardware




but also back it up on a cloud service. For the Embryo Project, we store (and work on) all of our data in a secure university Google Drive shared among team members, but we archive everything on the Digital HPS community repository. We never discard or alter the raw data in case we must return to them. Time is often a difficult issue for data management. Some researchers outline at least a five-year plan for the life of their data, but many ignore temporal aspects altogether. When considering time, researchers should specify the period for which they will store data, what is to be done with the data once the project ends, how often to transfer the data from extant storage media to new storage media, and what others should do with the data if the primary researchers all leave the profession for some reason.

Share Data

When appropriate to their research projects, we encourage researchers to publish their data or to use digital data repositories. These repositories include community repositories like the PhilSci Archive, ECHO, GitHub (github.com), Dryad, and our own digital HPS Repository; institutional repositories like those at Stanford, MIT, and Arizona State; and data journals including Scientific Data. Using data repositories can benefit researchers in several ways. It can decrease the number of decisions researchers must make when managing data. Data repositories provide a metadata scheme to store data, they preserve data on their own servers, often with no termination date, and they have specialists who curate the data. Furthermore, by depositing data in repositories, researchers may get credit for sharing or publishing their data. This also enables others to replicate their analyses and results, making for stronger empirical claims resulting from quantitative and qualitative analyses (Freese and Peterson 2017).
Repositories also benefit research communities, enabling more researchers to have more data, dedicating people to evaluate the quality of different datasets, and enabling researchers to address increasingly complex questions. There is also evidence that researchers in other fields use published data to identify potential collaborators and develop new projects, a practice that could lead to more collaborative projects in HPS (Pasquetto, Borgman, and Wofford 2019). While publishing data provides potential benefits, it also raises practical and ethical considerations. One practical issue is that, for data to be reusable, they must be formatted in ways that enable such reuse.


Researchers are unlikely to reuse published data if they don’t trust their provenance or cannot computationally process them (Pasquetto, Borgman, and Wofford 2019). Some in data science have developed broad principles to suggest that published data be findable, accessible, interoperable, and reusable (FAIR) (Wilkinson et al. 2016), but it remains an open task for those in HPS to discuss and develop community principles for publishing data. A related issue is that it takes time to prepare datasets for publication: many data publishers request the dataset, any protocols, codes, or scripts used to analyze the data, and a README document that provides instructions for the previous items. The time to create these can eat into research time (Tenopir et al. 2015). Another practical issue is that researchers are still developing norms by which to acknowledge the use of published data, with the practice of citing such data slow to catch on (Stuart 2017). Published data have a range of uses, including in replication studies, in meta-analyses, for novel research questions, to train people, and to calibrate instruments and algorithms. One open task is to develop research norms and concrete practices by which to acknowledge such uses so that they can factor into professional rewards and motivate the outlay of effort and time used to publish data. There are also ethical considerations. First are considerations like privacy and autonomy owed to people represented by data. These considerations are especially relevant to researchers who record and analyze interview or biomedical data, and we encourage digital HPS scholars who find themselves working with such data to refer to relevant literature for best ethical practices for publishing (e.g., Mittelstadt and Floridi 2016; Zook et al. 2017; Antes et al. 2018). 
Second, there are ethical relations that hold between those who collect and deposit data, repository curators, downstream users, and the public at large (Johnson and Bullock 2009). For instance, most acknowledge that if primary data collectors plan to publish data, they should disclose those plans to anyone providing permissions to collect the data in the first place. But many argue that the text of such disclosures should accompany published data. That way curators and downstream users can determine if the data can be archived and reused in good faith. Finally, we note that the costs and benefits of publishing and using shared data are not the same for scholars in different parts of the world. There is substantial variation across geographical regions about best practices and desirability for publishing and reusing data (Tenopir et al. 2015). Researchers in low- and middle-income countries (LMICs) often face a range of overlapping obstacles that make publishing data difficult and often undesirable (Rappert and Bezuidenhout 2016; Bezuidenhout et al. 2017). These obstacles include weighing the best uses




of limited access to high-speed internet, lack of sufficient equipment or training to access and use repositories, using personal funds to produce raw data, and the need to guard against data vultures during the span of projects. Furthermore, researchers in LMICs report often spotty access to and training for software required to digitize, store, and analyze data (Vermeir et al. 2018). This is true even for free and open-access software, which these researchers report they are highly interested in learning and developing. So even if data repositories operate on open-access software and make published data freely available, it doesn’t necessarily follow that researchers in LMICs can usefully interact with those repositories. We suggest that these practical and ethical considerations about shared open data provide research topics that HPS scholars are particularly well positioned to address, especially given recent interest in how values and community norms influence science (Douglas 2016). First, many HPS scholars analyze the kinds and quality of scientific knowledge, especially when produced with the aid of novel technologies, of which data repositories are an example. As a result, HPS scholars can help articulate epistemic assumptions and consequences implicit within proposed open-data principles, such as the FAIR principles. HPS scholars can help show how, and according to what arguments, such principles produce better knowledge. Data are not simply good or bad, FAIR or not; they are so in relation to often implicit research aims. HPS scholars can help show the extent to which different sets of principles endorse some aims, technologies, objects of study, and research questions over others. Second, many HPS scholars analyze research ethics in contexts of either small or big data.
There is an opportunity to articulate the ethical relations that do or should hold among different members of a research team and among folks who work at different stages of the data publishing workflow, including data depositors, curators, and downstream users. Similarly, an opportunity exists to articulate the ethical relations that do or should hold among researchers who share a field or discipline but live across regions that are vastly dissimilar politically and economically. Researchers from wealthy nations in North America and Europe are much more likely than their peers in LMICs to control the infrastructure and governance of tools like open-data repositories (Kindling et al. 2017). To what extent does such control contribute to or exacerbate inequities among researchers? For researchers from wealthy nations who build and govern data repositories, what obligations should they owe their peers in low- and middle-income nations? How should these questions be addressed within HPS? These are important questions.


Further Resources

We close with brief notes about finance and further resources. Issues of finance pervade all aspects of data management. For each data management plan, we recommend that researchers develop a budget that anticipates and records annual costs for all of the activities planned. Budgeting helps especially when applying for grants, and it helps researchers trim potentially unnecessary and expensive practices from their research designs. Researchers should use further resources when preparing for data management, especially as they develop larger projects. Two of us (MacCord and Maienschein) were part of an NSF panel that produced an open-access report on data management plans for those who study science (NSF 2015; Maienschein et al. 2019). We also recommend the web application DMPTool, which helps researchers construct simple DMPs. The site also shares many examples of DMPs. From other disciplines, helpful reports include McLellan-Lemal (2008), Goodman et al. (2014), and Michener (2015). For metadata we suggest using the Digital HPS Metadata Manual as a template for working with Dublin Core standards (DHPS Consortium 2013). While data management has long been a focus of librarians, two books aim specifically at researchers (Corti et al. 2014; Briney 2015). A few organizations worth watching include the Digital Curation Centre, the Research Data Alliance (rd-alliance.org), and the Digital HPS Consortium.


Part III
Case Studies

Chapter 8

How Not to Fight about Theory
The Debate between Biometry and Mendelism in Nature, 1890–1915
Charles H. Pence

Thomas Kuhn’s (1962) massively influential Structure of Scientific Revolutions proposed a pattern for the development of a scientific theory now intimately familiar to philosophers and historians of science: After the initial struggles to formulate a unified theory, a paradigm emerges in an area of study for the first time. This paradigm consists not just of theoretical content but also of metaphysical and epistemological commitments, textbooks and patterns of training, and a set of problems that, it is expected, the paradigm will be able to solve, along with the kinds of solutions for those problems that would be deemed acceptable. The solving of those problems constitutes the everyday work of most scientists, Kuhn’s normal science. Should one of those problems persistently resist solution (or dissolution), a field might enter a state of crisis, where active replacements for the dominant paradigm are investigated and their future prospects evaluated. If a new paradigm is found to outperform the ruling orthodoxy, we have a revolution, and the new worldview becomes dominant in its place.1 This is the standard picture that we present to our undergraduates. Despite its nearly trite status, however, it still provides us with an interesting point of departure for analysis of any instance of theory change. For one, Kuhn leaves the period of crisis tantalizingly underdeveloped. Why do crises really emerge? How do they disappear? What are the varying contributions of social factors like persuasion, empirical results provided by “nature,” and novel theoretical developments? What of the



bogeyman of Kuhn’s “incommensurability,” the idea that scientists operating in different paradigms in some sense “practice their trades in different worlds” (Kuhn 1962, 150), necessarily talking past one another’s internalized ways not just of interpreting but of seeing scientific data? Here, it seems, is a place where the intervention of digital methods might genuinely be able to shed light on the subject. For crises do not only take place in the heads of scientists, in their notebooks, or in the experiments performed in their laboratories—they are also played out in the literature that these scientists produce. Especially since the middle of the nineteenth century (with the proliferation of frequently published scientific magazines like Science, Nature, or The American Naturalist), the journal literature is perhaps even the primary venue in which such debates have occurred. We should thus be able to explore the signature of crises as they happen in these publications, and precisely such an exploration is my aim in this chapter. I examine a particularly divisive and public debate in the late nineteenth-century study of inheritance, a significant portion of which took place in the correspondence pages of Nature, and draw some speculative conclusions for the structure of scientific communities during theoretical crises. By analyzing the network of author mentions in Nature in this period, we reveal new and interesting relationships between the relevant players—different from both the basic networks of paradigm membership and the more complex networks of training and education—that support (albeit tentatively) some conclusions about how community members react to sustained theoretical disagreement.

Selecting a Case Study

Before I lay out the history of the episode I’ll be analyzing, however, I should offer a few words about why I selected it.
First, we have a standard, “potted” history of the crisis—limited and problematic though it may be (more about this later)—offered by William Provine’s (1971) often-cited book on the history of population genetics, as well as a detailed sociological account, thanks to Kyung-Man Kim (1994). Second, the debate takes place over a fairly delimited time period (roughly 1890–1910), and a significant part of it occurs in a single journal, Nature.2 Finally, that time frame lies fully in the public domain, somewhat ameliorating copyright concerns related to using the full text of the articles. Each of these features makes a digital-humanities analysis of this case significantly easier, more likely to yield fruitful results, and better contextualized.


The Biometry-Mendelism Debate

The publication of Charles Darwin’s (1859) On the Origin of Species did not precipitate the kind of immediate, wholesale conversion to Darwin’s worldview which that work’s lofty stature in the contemporary scientific canon might suggest. The Origin was, to be sure, amazingly successful and quickly convinced the vast majority of working naturalists of the truth of one of Darwin’s two main points—common descent, the claim that all life is descended, in a treelike structure, from some small number of distant ancestors. But this was only part of Darwin’s argument (for more on the development of Darwin’s views, see Murdock et al. 2017). He also offered a mechanism for diversification and the eventual production of novel species—natural selection—and it was much less accepted in the short term. In a period often, though problematically, called the “eclipse of Darwinism” (Huxley 1942; Bowler 1992; but see Largent 2009), the fate of selection waxed and (mostly) waned, as it confronted a variety of interesting criticisms, until the development of early population genetics in the 1920s and 1930s kicked off the Modern Synthesis that still grounds much of evolutionary biology today.3 One of the most significant issues that plagued natural selection was this: it seemed to many impossible to believe that selection was powerful enough to produce new species, in the absence of an understanding of the mechanism of variation. Only when we know how variations are generated within organisms and then passed from parents to their offspring can we go on to learn how selection could bias that transmission to cause speciation. And this approach, in turn, only works on the assumption of Darwin’s hypothesis that speciation would be the result of gradual, piecemeal changes from one generation to another—some scientists, such as William Bateson, remained convinced that these small, gradual variations would never suffice for producing species-level differences.
Blending inheritance provides one example of the kind of difficulties that could, at least potentially, arise within the process of heredity. If all characters of offspring are merely blends of the characters of their parents, for any favorable variation to “stick” in a population requires that some mechanism prevent it from being blended back into, and thus swamped by, the ancestral character. Darwin was, to be sure, aware of the trouble for his theory raised by blending inheritance (Vorzimmer 1963).4 As a result, Darwin appeals to geographic isolation, the power of the struggle for existence (Vorzimmer 1963; Gould 1985), and machinery from his (long-held) theory of pangenesis (Hodge 1985); he




also increases the bulk amount of variation available to natural selection (Depew and Weber 1995, 196). But the general problem festered. It is important to note that this plays into a preoccupation across the biological sciences of this period with a number of different “problems of variation.” Beyond the questions of blending and mere quantity of variation, this era also saw the debates over August Weismann’s theory of the germ plasm, the role of chromosomes as possible bearers of variations, Herbert Spencer’s support for the necessity of the directedness of organic variation to the explanation of complex adaptations, and debate over the role of the environment in generating variation, among others. Worries about variation were therefore “in the air” in the period and intimately tied to the nature of selection and whether it sufficed to explain the features of the natural world (see Pearce [2014, 18–20] and Beatty [2016] for excellent surveys of the broader landscape). One of the biologists most keen on solving the twin problems of variation and selection was the aforementioned William Bateson, who as early as the 1890s was already, as Radick (2012, 718) has put it, “well on his way to developing a vigorously dissenting saltationist perspective on evolution,” aiming to solve the question by finding evidence of large, discontinuous mutations responsible for speciation. Bateson published his Materials for the Study of Variation in 1894 (the subtitle of which, not often quoted, is Treated with Especial Regard to Discontinuity in the Origin of Species), calling on those interested in Darwinism to return to the descriptive work of a morphological understanding of variation itself: “As the first step towards the systematic study of Variation we need a compact catalogue of the known facts, a list which shall contain as far as possible all cases of Variation observed. 
To carry out such a project in any completeness may be impossible; but were the plan to find favour, there is I think no reason why in time a considerable approach to completeness should not be made” (Bateson 1894, vi). Meanwhile, as Bateson rallied a group of researchers to his cause, the biologist W.F.R. Weldon began pursuing statistical research into natural selection under the tutelage of Francis Galton (Weldon 1890). This work (and a fortuitous move from Cambridge to University College London) brought him into contact with the statistician Karl Pearson, who had long been interested in biological problems, both for their own sake (as an aid to eugenics) and as a potentially fruitful field of application for statistical methods (Pearson 1892). These two men would form a fast friendship and productive partnership that would last some twenty-five years, until Weldon’s untimely death from pneumonia at the age of forty-six (Pearson 1906). In the interim, they would form the antithesis to Bateson’s research program. Known as the “biometricians” or the “biometrical


school,” they were staunch Darwinian gradualists, committed to the use of statistical methods to demonstrate the ability of natural selection to generate new species from small, individual variations. Relations between the sides began to sour after Weldon (who had been Bateson’s mentor at Cambridge) published a harshly negative review of Bateson’s Materials (Weldon 1894). As the debate became progressively more heated, the work of Gregor Mendel (1866) was “rediscovered” in 1900 (published as Druery and Bateson 1901). While the biometricians initially believed this might provide an interesting statistical case study for their cause, Bateson and his allies (soon to be known as the Mendelians) saw in Mendel’s peas precisely the sort of discontinuous variation for which they had long been searching. For a variety of reasons—commonly cited are, at least, an increase in the amount of Mendelian data available (Cock and Forsdyke 2008), a well-run popular campaign of lectures by Bateson (Radick 2012), and the death of Weldon (see Vicedo [1995] for a helpful review)—the Mendelians had carried the day as of around 1906 (Sloan 2000). The combination of statistics with genetics would await the work of biologists like Fisher, Haldane, and Wright during the development of the Modern Synthesis (Provine 1971). So much for the standard historical tale we tell about this period. As can be seen, it focuses on key players (Bateson, Weldon, Pearson, Mendel), and it also attempts to explain the start of a crisis across an entire subdiscipline in terms of a small number of “key events,” without much detail as to how those events precisely influenced the actors involved (e.g., it ignores the kinds of fine-grained details studied in Darbishire’s case by Ankeny [2000] or for Yule by Tabery [2004]). Finally, while the fact of the debate’s conclusion by 1910 is acknowledged by all involved, there is a bewildering lack of clarity about why it might have ended. More detail would be enlightening. 
To offer a first expansion of our lens, we turn to the work of Kyung-Man Kim (1994), whose book Explaining Scientific Consensus: The Case of Mendelian Genetics extends the network of players to include a variety of what he calls “paradigm articulators.” These are writers who “articulated the still inchoate paradigms” of biometry and Mendelism “by extending and elaborating the theory” but who did not, in general, “evaluate their mentor’s theory” (Kim 1994, 35). Importantly, five of the central paradigm articulators, Kim argues, “converted” from biometry to Mendelism between 1903 and 1910. These biologists—A.D. Darbishire, Edgar Schuster, George Udny Yule, Raymond Pearl, and George Shull—were instrumental in producing the consensus around Mendelism, as they brought a significant shift in resources and interest (momentum, one might say) toward the Mendelian side.




8.1. The network of training and influence in the biometry-Mendelism debate, between 1900 and 1910. Redrawn after figure 2 of Kim (1994). Arrows indicate mentorship, lines indicate collaboration. The five paradigm articulators whose “conversion” Kim singles out are in boldface.

I lack the space here to evaluate Kim’s claim about the influence of paradigm articulators on its own merits.5 But what makes Kim so interesting for my purposes is the expansion in scope: rather than a narrow view only of Pearson, Bateson, and occasionally Weldon, Kim turns our attention to structures of education, training, and theory transmission as they actually appear on the ground (the full network of actors considered by Kim in detail is found in figure 8.1). It is precisely this impulse toward casting a wider, more comprehensive net that underlies my turn toward digital methods. To really understand the nature of a scientific community during a period of crisis, it clearly won’t suffice to consider only the “elite” players.

Building a Digital Analysis

If, as the traditional history has it, the biometry-Mendelism debate is a crisis in the nascent field of genetics, and if, as Kim’s sociological story has it, it involves a wide array of players from across the field, then we should be able to detect its signature in the journal literature of the period. As I mentioned, a number of important contributions to the discussion occur in the pages of Nature.6 The first step in analyzing the crisis digitally, then, was to build a corpus of relevant articles, using evoText (Ramsey and Pence 2016; now known as Sciveyor), an analysis platform that includes the entire print run of Nature. I began by compiling a


list of articles authored by central individuals in Kim’s network (figure 8.1)—these included Bateson, Pearson, Weldon, the five “converted” paradigm articulators, and Wilhelm Johannsen (who is also central to Kim’s analysis).7 This resulted in a “seed set” of 144 articles. To expand this set, I used the Named Entity Recognizer (NER) provided by the Stanford Natural Language Processing (NLP) project (Manning et al. 2014) to construct a list of every proper name referenced in any of the articles within the seed set. This roster was manually culled to include only biologists (taken broadly to include authors of natural histories, explorers, etc.). This increased the number of biologists to ninety-eight, of whom fifty-two had published in Nature.8 These authors, in turn, published 1,622 articles in that journal between 1872 and 1940. This constituted the dataset analyzed in the following.9

Citations, Mentions, and the Network of Discourse

If the goal is to offer a new lens into community structure, then a classic methodology would be to utilize citation networks, an approach now more than fifty years old (Price 1965). Networks of collaboration (or even of content or topic) can be explored by noting which documents have references in common (bibliographic coupling) or which documents are often cited at the same time (co-citation). Unfortunately, this methodology is not applicable to the articles in the biometry-Mendelism dataset. Citation practices had yet to be standardized in Nature in the late nineteenth century. To take one example, a letter from J.T. Cunningham to Nature on July 30, 1896, refers to a prior letter from July 16 as follows: “It appears to me that Prof. Weldon’s argument, referred to in Nature of July 16 (p. 245), is accurately represented in the following illustration” (Cunningham 1896). What’s more, the letter to which Cunningham is referring is not even authored by Weldon himself—it is a recounting of one of Weldon’s arguments by E. Ray Lankester (1896). It is clear that no automated system would be able to extract the actual citations as they occur, and in many cases it isn’t clear that the modern notion of “citation” is even applicable. We thus need some way to determine how authors are related to one another without invoking citations. The simplest such method, as it turns out, works perfectly well for our purposes. Rather than looking for formal citation, we simply search each of the articles in the dataset and determine when each mentions the name of one of the other authors in our set. For example, if an article by Pearson mentions Punnett (or vice versa), then we add a connection between Pearson and Punnett (or increase the weight of that connection if it already exists). This produces a network that I have dubbed the network of discourse. While less precise




8.2. The network of discourse for the entire 1,622-article dataset. Colors of nodes indicate membership in one of three modularity classes, size of nodes indicates number of occurrences in the corpus, and thickness of edges indicates number of connections between nodes. Bateson is the largest solid-black node, at upper right. Weldon is the small black node to its lower left. Pearson is the largest white node, bottom center. Note that this “hairball” network is extremely difficult to interpret. Network visualizations throughout created using Gephi (Bastian, Heymann, and Jacomy 2009).
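The mention-counting procedure behind this network can be sketched in a few lines of Python. The article snippets below are invented stand-ins for the 1,622-article corpus, and matching on bare surnames is deliberately simplified; a real pipeline would have to cope with initials, honorifics, and ambiguous names.

```python
import re
from collections import Counter

def mention_network(articles, authors):
    """Build a weighted 'network of discourse' from raw article texts.

    An edge (A, B) gains weight whenever an article by A mentions B's
    surname (or vice versa); mentions are matched as whole words.
    """
    edges = Counter()
    patterns = {a: re.compile(rf"\b{re.escape(a)}\b") for a in authors}
    for author, text in articles:
        for other in authors:
            if other != author and patterns[other].search(text):
                # Undirected edge: store endpoints in sorted order.
                edges[tuple(sorted((author, other)))] += 1
    return edges

# Toy corpus for illustration only.
articles = [
    ("Pearson", "Mr. Bateson's criticism of the law of ancestral heredity..."),
    ("Bateson", "Prof. Pearson and Prof. Weldon maintain that..."),
    ("Weldon", "The statistics given by Prof. Pearson show..."),
]
edges = mention_network(articles, ["Pearson", "Weldon", "Bateson"])
print(edges)
```

The resulting weighted edge list can then be exported to a tool such as Gephi for visualization and modularity-based community detection.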

than a citation network, as it turns out, the network of discourse is still powerful enough to draw interesting conclusions. The network of discourse for the entire debate may be found in figure 8.2. In general, this network is too dense and too interconnected to offer us any significant information. One feature, however, stands out. A number of methods are available to differentiate clustering in the graph, most common among them the modularity statistic. Modularity measures the distinctness of communities within a network by determining whether the number of edges that fall into a hypothesized set

How Not to Fight about Theory

of groups is greater or less than that expected if edges were distributed according to chance. Algorithms for detecting modularity traditionally accept a parameter that lets their level of "grain" be selected by the user, and they will then attempt to break the network up into clusters of the requested size (Blondel et al. 2008; for more on modularity detection in general, see Rivelli 2019). Robust modularity (i.e., a pattern of groups that remains roughly the same through a range of values of the fine- vs. coarse-grain parameter) indicates genuine community structure or clustering within the network—that is, it indicates that the nodes of the network really do separate into subgroups and that these groups are not simply random results of the algorithm. In the full network as shown in figure 8.2, we indeed get robust clustering, into more or less three groups. Most interestingly, we see that the network here does not simply recapitulate the network of paradigm membership (clusters do not simply sort biometricians from Mendelians), nor the network of training as identified by Kim. In fact, Bateson and Weldon fall into the same cluster, and a different cluster from Pearson—the first signal of the controversy in the literature as found in the data.

The Network of Discourse over Time
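To make the procedure concrete, here is a minimal sketch, in Python, of the two steps described above: building the weighted mention network from article texts, and scoring a candidate partition with Newman's modularity. The author names and article snippets are hypothetical, and a real implementation would need the name disambiguation (initials, titles, misspellings) that this sketch ignores.

```python
from collections import defaultdict

def mention_network(articles, authors):
    """Build a weighted 'network of discourse': for each article, add (or
    strengthen) an edge between its author and every other known author
    whose name appears in the article's text."""
    weights = defaultdict(int)
    for author, text in articles:
        for other in authors:
            if other != author and other in text:
                edge = tuple(sorted((author, other)))
                weights[edge] += 1
    return dict(weights)

def modularity(weights, communities):
    """Newman modularity Q of a partition of a weighted, undirected network.
    `communities` is a list of disjoint sets of node names."""
    m = sum(weights.values())           # total edge weight in the network
    degree = defaultdict(float)         # weighted degree of each node
    for (u, v), w in weights.items():
        degree[u] += w
        degree[v] += w
    q = 0.0
    for comm in communities:
        internal = sum(w for (u, v), w in weights.items()
                       if u in comm and v in comm)
        total_deg = sum(degree[n] for n in comm)
        q += internal / m - (total_deg / (2 * m)) ** 2
    return q

# Hypothetical toy data: the author of each text, plus abridged contents.
authors = ["Pearson", "Weldon", "Bateson", "Cunningham"]
articles = [
    ("Pearson", "On the results recently published by Prof. Weldon ..."),
    ("Weldon", "A reply to Mr. Bateson ..."),
    ("Bateson", "Heredity: a response to Prof. Weldon and Prof. Pearson ..."),
]
net = mention_network(articles, authors)
```

A Louvain-style algorithm (Blondel et al. 2008) then searches for the partition maximizing Q at a given resolution; rerunning that search across a range of resolutions and checking that the resulting groups stay roughly the same is the robustness test described above.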

To amplify the signal, and to capture the ebb and flow of the debate over the period 1890–1910, I segmented the network by date, visualizing the networks consisting only of articles published between 1885 and 1889, 1890 and 1894, 1895 and 1899, 1900 and 1904, 1905 and 1909, and 1910 and 1914. These five-year windows were selected so that the first window occurs entirely before the debate between biometricians and Mendelians breaks out (Bateson’s Materials is not published until 1894, and Weldon’s first statistical work dates from 1890) and the last window occurs entirely after the debate has cooled (Weldon’s death is in 1906, and Kim’s analysis has it that the last of the paradigm articulators to “convert,” Raymond Pearl, does so around 1909). Let’s begin by looking at the topology of the network over these time slices and then turn to what this case study might tell us more broadly. When we analyze this network in detail (in particular, attempting to detect communities within each time slice of the network), we find a variety of interesting features. Prior to 1884 and after 1910, we see no robust clustering—that is, the clusters that the modularity algorithm provides change dramatically as the coarse- versus fine-grain parameter is adjusted, indicating that the clusters produced by the algorithm are merely artifacts. The network has a very standard center-periphery structure, with the center occupied by the expected players—prolific writers from the correspondence pages such as E. Ray Lankester (Lester 1995),




8.3. The network of discourse in Nature, from 1895 to 1899. The following nodes are labelled: W.F.R. Weldon: 1; E. Ray Lankester: 2; William Bateson: 3; J.T. Cunningham: 4; Karl Pearson: 5; W.T. Thiselton-Dyer: 6. Node colors indicate membership in one of two modularity classes, node sizes indicate number of appearances in the corpus, and thickness of edges indicates strength of connection between nodes.

Thomas Henry Huxley, and George J. Romanes. As the controversy really begins to heat up, between 1895 and 1899, the impact on the network of discourse becomes more significant (figure 8.3). The most commonly appearing names, and the strongest links, are precisely those between the authors engaged in this debate. Right at the heart of the conversation we find Weldon (1), Pearson (5), and Bateson (3), along with Lankester, Cunningham, and W.T. Thiselton-Dyer, all of whom had been exchanging salvos of letters and articles on the controversy. Only a few other edges in the network even come close to producing the volume of cross-mentions that these figures do. This signal of the debate itself thus dominates other discussions about and among these biologists in this period. When we move just five years forward, however, to the period between 1900 and 1904, the network shifts dramatically (figure 8.4). Pearson (5), Lankester (2), Cunningham (4), and Thiselton-Dyer (6) no


8.4. The network of discourse in Nature, from 1900 to 1904. Node numbers are the same as in figure 8.3. Node colors indicate membership in one of four modularity classes, with the two smallest classes collapsed into gray. Node sizes indicate number of appearances in the corpus, and thickness of edges indicates strength of connection between nodes.

longer even appear in the same cluster as Bateson (3) and Weldon (1). While the other authors here seem to have moved on to other concerns, Bateson and Weldon have, essentially, become consumed by this debate. Not only are they talking only to and about one another, but few other members of the network mention their works at all during these five years. The network visualization itself is even reminiscent of Weldon and Bateson standing in a corner, arguing with one another, being primarily ignored by the rest of the biologists publishing in Nature. After 1904, we see a brief signal of the heavy collaboration between Pearson and Raymond Pearl between 1905 and 1909, and the network returns to the structure it had before the debate took off in the first place.

Local and Global

I want to offer conclusions at two levels: first, for what this analysis can offer to scholars of the debate between biometry and Mendelism, and




second, for what we may be able to infer about scientific communities and Kuhnian crises.

Biometry and Mendelism

First and foremost, we can see from these results that the method used here, focusing on networks of discourse rather than citation networks, does in fact offer us reasonable results in the case of the biometry-Mendelism debate in Nature. This is an important finding in and of itself, as it was entirely unclear prior to running the analysis whether mere mention would suffice for indicating significant connection between authors in the debate. As we have seen, however, the network of discourse does indeed separate clusters of those individuals contributing to the biometry-Mendelism debate from those not. I anticipate that this method will be useful in a variety of other contexts where citation data are unavailable or prohibitively difficult to extract. Network analysis without citations is thus a promising area for future research. Further, in this case, the network of discourse does not simply recapitulate features of the community of which we were already well aware. For example, it would be only of trivial interest if the network of discourse simply sorted the biometricians into one cluster and the Mendelians into another, or if it looked precisely like Kim's network of training and collaboration (indicating that collaborators, mentors, and mentees made references to one another far more often than to others). Rather, the network of discourse seems to give us a novel way of approaching the structure of this debate, one that both carries features of intrinsic interest and inspires future research questions. For example, the mobility of Pearson in the network throughout this controversy is notable. The idea that Weldon and Pearson were importantly different figures with different philosophical backgrounds and biological commitments has recently gathered some steam (Radick 2005; Pence 2011), and the role of Pearson in Weldon's extended disagreement with Bateson would be a fruitful locus to explore this question further.
Similarly, in the network between 1890 and 1894, a tight cluster of authors forms that includes Romanes, Lankester, Spencer, and Weismann. When I initially generated the network, I was unable to account for this clustering, but later study points to a debate over Weismann’s results occurring precisely during this period.10 While I lack the space to pursue it here, we see yet again the digital analysis serving as a way to generate interesting research questions. Had there not been a ready explanation for this second set of clusters at hand, the digital methods would themselves have generated a novel object for historical investigation.


Of course, this analysis is not without its problems. Most significantly, the work so far only analyzes Nature, which at this point is still more or less centered in the United Kingdom (for more on the history of the journal, see Baldwin 2015). This harms the analysis in a variety of ways. Perhaps the most significant is that an entire school around Charles B. Davenport, operating in the United States (and hence less likely to publish in Nature), is nearly invisible in this analysis. A significant number of the authors identified in Kim's work are thus nowhere to be found here. Further, in the middle of this period, the biometricians launch their own journal, Biometrika—the first issue is published in 1901 (Weldon, Pearson, and Davenport 1901)—as a response to perceived difficulties publishing in the journals of the Royal Society, which the biometricians believe are being taken over by Bateson and his allies (Pearson 1906, 34–35). Integrating an analysis of Biometrika would present a variety of technical challenges. It is not immediately clear how the analysis here could be expanded to include multiple journals. For one, the fact that the rate of publication in Nature is relatively constant over this period means that no "normalization" was required to control for the number of papers published in each of the time slices. It is likely that the entire publication output of Biometrika would be much smaller, and thus some sort of weighting would be needed for the data from Nature not to completely overwhelm that from Biometrika. If, on the other hand, a separate network were created just for the Biometrika contributions, we run into the problem of how to compare and draw conclusions from the various networks that result. Again, this is a promising area for future work. I should also here consider the place of these results in a broader picture of the history of the biometry-Mendelism debate.
I have noted already that these results offer us an interestingly different perspective from either that of the standard history or Kim’s sociological take. Such perspectives, then, are always at least somewhat intrinsically valuable. But does this view tell us anything original about the history of the case? And to the extent that it does, is it useful? It would take much more space than I have here to make the case in full, but I believe that we do find a facet of the controversy that is genuinely novel in this analysis, and one that aligns with what we know to have been going on in the same period in the other work of figures like Weldon, Bateson, and Pearson. To perhaps overextend a military metaphor, the biometry-Mendelism controversy was fought and won (or lost) on a number of fronts. These biologists were engaged in, at the very least, theoretical work within their own traditions aimed at expanding and solidifying their accounts of the biological world, debates within




their own traditions about the most effective ways to move forward, debates between the two approaches of biometry and Mendelism, and various kinds of positioning with respect both to other fields of science and to the public. The analysis of the correspondence pages of Nature offers insight into the campaign on at least two of these many fronts. Short articles in Nature in this period, as Melinda Baldwin has detailed, served as a fast-moving, discursive outlet for an entire generation of young British men of science (Baldwin 2015, 63–67). Weldon himself notes its peculiar place in correspondence with Pearson over the future direction of their journal, Biometrika. He laments that, despite the fact that “people now want very technical journals, plus Rudyard Kipling and the evening papers,” nonetheless “Nature survives, somehow. Do you think you would be proud to run another such?” (Weldon 1901a). Since Nature is not technical enough to publish “real” results, nor as loose as the “evening papers,” discussion there must “be more or less gaseous, and quote a lot of details,—otherwise one will have to fight the quotations out, letter by letter & comma by comma” (Weldon 1901b). Why, then, continue to publish there? First, such contributions served the important function of positioning projects with respect to the broader scientific community as a whole, an angle on the biometry-Mendelism debate that is not often enough considered by our more internalist approaches to the issue and the period. Second, the peculiar role of Nature meant that it would serve as one of the only possible places for the inter-camp debate between the biometricians and the Mendelians to occur. The technical work of both sides—though clearly often written with an attack on the opposing camp in mind—was too focused to really permit this kind of discussion, nor was such discussion suitable enough for the general populace to make the evening papers. 
We should thus expect, I think, that Nature would be a place where debates like these are overrepresented.11

Scientific Communities in Crisis

If we take the biometry-Mendelism case as an example of a community in the middle of a Kuhnian crisis, what lessons might we be able to draw about scientific theory change as a whole? First and most obviously, we should be hesitant to think that the structure of such communities will have many common features, or even many temporally stable features, during the course of the crisis. Kuhn, for his part, describes crises as particularly all-consuming and significant. They constitute, he says, "a period of pronounced professional insecurity" (1962, 67–68) or occur as a result of the "breakdown of the normal technical puzzle-solving activity" (1962, 69). The data support this picture in part—this is precisely what we see from 1895 to 1899 (figure 8.3), with the crisis figures forming a well-connected network at the center of biological discussion in one of the most significant journals in the field. But we see precisely the opposite of this from 1900 to 1904. The crisis, which is taken by many historians to remain the central concern in the field in this later period, only finds itself on the fringe of the network of discourse. Normal science proceeds apace for the majority of the researchers I have analyzed here. Before I continue, I should note that one might simply take these network data as evidence that, despite the emphasis placed upon this episode by historians like Provine, there quite simply was no crisis in the field at this point. And there are good reasons to think that some kind of more complex story is at work. As already mentioned, Vicedo has persuasively argued that we should be skeptical of "forcing many historical actors into the monolithic and static categories of Biometry and Mendelism" (1995, 374).12 The unpacking of the biometry-Mendelism controversy into a variety of separate, independent research debates, as performed for example by Olby (1989), could also be read as casting doubt on the idea that there is a biometry-Mendelism debate, as opposed to a cluster of related debates. On the one hand, I am sympathetic to the idea that we have likely oversimplified the kind of debates occurring in this period. Weldon's archival materials, for instance, indicate a wide array of concerns with the life sciences of his day, including chromosome theory, agricultural breeding, statistics, probability theory, measurement, and experiment. On the other hand, the data from 1895 to 1899 are fairly compelling: there was a controversy here, significant enough to dominate the literature for several years. It resulted in the redistribution of academic resources, spawned the founding of journals, and altered the creation and dispensation of professorships.
By the final time period (and even to some extent from 1905), there is no detectable signal that the discussion is continuing in the literature. The debate thus, as a matter of empirical fact, disappeared. There is therefore a perfectly coherent way in which to read the data that support the existence of some form of crisis—though perhaps only a crisis on a very small scale13—and resolution. Returning to the broader morals that we might draw from this case, another notable feature is the ability (or lack thereof) of scientists involved in the crisis to move back and forth between debates concerning the crisis itself and the work of "normal science," which continues to move forward during the period. To take just a few examples from the case study, Pearson seems to be particularly adept at moving into and out of the crisis debate. At least as measured by clustering in the network




of discourse, he is involved in 1895–1899, not involved from 1900 to 1904, and involved again from 1905 to 1909. This forms a stark contrast with Weldon, who from 1890 onward can, it seems, do very little in the pages of Nature but contest Bateson. Bateson lies in the middle, involved in the debate for around fifteen years but recognizing his victory after Weldon's death. The explanation for these responses, on the other hand, is less obvious and another place for further fruitful work. Some of the relevant factors will likely be personal. In a letter to Pearson, after he had been invited to a public debate on Mendelism at the British Association, Weldon writes that "what I hate is that I want to get a definite result. I want the thing to be proved nonsense. That is a thoroughly unhealthy and immoral frame of mind, and I expect it will lead to a well deserved smash" (Weldon 1902). Bateson was similarly attracted to personal controversies (Cock and Forsdyke 2008). Other explanations, in turn, are likely to be institutional. Although Weldon's Oxford position was quite comfortable, he regularly lamented his inability to obtain quality students, while Bateson had a revolving cadre of devotees at Cambridge. The interplay of these and other influences deserves further study. A further feature of the temporal evolution of this network is evocative. Those most invested in the debate, in our example here—Weldon and Bateson—exhibit a tendency not to make themselves central players in the broader literature but to marginalize themselves. This class of scientists—we might call them paradigm debaters or paradigm warriors—seems to sacrifice their connection to what remains of normal scientific practice, instead focusing single-mindedly on the active crisis debate. If this is a recurring feature of scientific crises, it has gone entirely unremarked upon in the literature and deserves to be examined in further case studies. This leads to one final question.
Do any features of the digital analysis point toward general features of scientific crises that are exemplified in this case? In particular, the biometry-Mendelism controversy is often pointed to as an example of “fruitless” debate—had the interlocutors only been able to see past their petty differences, it is sometimes said, we could have potentially seen the development of the Modern Synthesis several decades earlier than it actually appeared. While I think this latter claim both is historically unfounded and does not do justice to the legitimate problems with which the biologists in this period were wrestling, it is notable the extent to which the two sides were unable to come to anything like worthwhile exchange. Several of the features exposed by the digital analysis might help us understand why that would have been so. The presence of these “paradigm warriors,” along with their lack of mobility in and around the networks of discourse, might indicate that


the debate had “degraded”—that the partisans truly invested in it were no longer able to argue about the merits in the course of performing normal scientific, technical work and were reduced to public fighting in a nontechnical venue. We see this to some degree mirrored in the private work of the biometricians, who had in large part decided by about 1904 that the way to prevail was not to engage directly in what we might call “productive” debate with the Mendelians, but rather to produce a theory that could convince others of the validity of the biometrical program independently of its role as weapon in the debate (see, e.g., the work of Weldon described after his death in Pearson [1908]). In that sense, this case offers us an example of how not to fight about theory—disengagement on a technical level combined with acerbic argument on a public level seems to have rendered it more acute and more personal. In short, not only has the analysis of the network of discourse proven fruitful for the biometry-Mendelism debate, but it has unearthed a number of features of the debate that have not been sufficiently studied. In turn, many of these point toward questions about crises and scientific revolutions more generally, a compelling set of problems for any account of theory change in the sciences, from Kuhn to today. Digital analyses can, indeed, make good on their twofold promise: to reveal interesting facts about the philosophy and history of science and to generate novel research questions that we would have been likely to miss without digital aid.


Chapter 9

Topic Modeling in HPS

Investigating Engaged Philosophy of Science throughout the Twentieth Century

Christophe Malaterre, Jean-François Chartier, and Davide Pulizzotto

Many corpora that used to be hard to access are now easily found in digital form. This is notably true of scientific corpora that can be mined for history and philosophy of science (HPS) purposes, but also of philosophy of science corpora that can be examined in the framework of history of philosophy of science investigations. Open-source textual analytical tools now abound that can readily be implemented provided one has basic programming skills in R or Python (e.g., Aggarwal and Zhai 2012). One of the simplest and yet probably most powerful tools to be used is topic modeling. Such algorithms make it possible to automatically uncover the topics that populate any given set of texts. Among many other things, this can be done with a view to identifying the research topics of any given scientific community, to mapping their evolution over time, or even as a first step toward conceptual analyses of critical notions used in any field of science (e.g., Malaterre et al. 2021). To provide a concrete test case for describing and discussing these methodologies, we investigate a particular genre of philosophy of science that focused on topics such as science policy, science and political ideology, science and society, or science and values: in short, we aimed at characterizing a form of socially engaged philosophy of science that is said to have marked the discipline for several decades from the 1930s to the 1960s before disappearing almost completely until being recently advocated for again (Howard 2003; Douglas 2010; Cartieri and Potochnik 2014; Vaesen and Katzav 2019; Dewulf 2021). To do so, we used topic-modeling algorithms that were applied to the complete full-text corpus of Philosophy of Science from 1934 until 2015. Our analyses corroborate these views about the


changing fate of engaged philosophy of science throughout the twentieth century, while also providing complementary insights, including areas for additional investigations. We start by introducing topic modeling and the key intuitions that guide these computational tools. We then present the case study of socially engaged philosophy of science and its context. We thereby set the stage for applying topic-modeling approaches whose methodological steps we describe. We then present the results of our analyses, describing which topics have been found that relate to some form of engaged philosophy of science and how these topics have evolved over time. We assess these results in light of existing research about the history of philosophy of science and notably Howard's (2003) findings, since his article is—to our knowledge—one of the most detailed historical perspectives on the question. Finally, we discuss the methodology, highlighting its strengths, in particular as a complementary tool to usual historical or philosophical approaches, but also some of its most significant limitations.

The Philosophy of Topic Modeling

Text-mining approaches—notably topic modeling—are statistical algorithms designed to uncover unknown information from texts (see also chapters 5 and 6 in this volume). These approaches exploit the fact that words are used not at random in texts but in specific combinations and frequencies. For instance, in philosophy of science texts, one can expect that words such as "natural" and "selection" will tend to occur more frequently together than with words such as "quantum" and "mechanics." One can also expect that "natural" and "selection" will tend to occur more frequently in philosophy of biology articles than in philosophy of physics, contrary to "quantum" and "mechanics." Analyzing which words tend to co-occur in which text documents can thereby be informative about the semantic content of specific sets of documents.
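This lexical intuition can be made concrete with a few lines of code: count, for every pair of words, the number of documents in which both appear. A minimal sketch, using invented snippets as stand-ins for real articles:

```python
from collections import Counter
from itertools import combinations

def cooccurrences(documents):
    """Count, over a corpus, how many documents contain each pair of words.
    Pairs are stored in sorted order so ('natural', 'selection') and
    ('selection', 'natural') are counted as the same pair."""
    counts = Counter()
    for doc in documents:
        vocab = sorted(set(doc.lower().split()))
        for pair in combinations(vocab, 2):
            counts[pair] += 1
    return counts

# Invented stand-ins for real articles.
docs = [
    "natural selection acts on heritable variation",
    "darwin explained adaptation by natural selection",
    "quantum mechanics predicts spectral lines",
]
pairs = cooccurrences(docs)
```

A real pipeline would of course add tokenization, stop-word removal, and frequency weighting, but the underlying signal exploited by topic models is exactly this kind of co-occurrence pattern.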
Though only investigating texts at the superficial level of their lexicon, text-mining approaches aim at revealing deeper underlying semantic regularities by appealing to the coherent use of words by authors (Firth 1957). Text-mining algorithms have been specifically devised to computationally analyze word patterns in digitized text corpora, notably corpora that are too large for manual analysis (e.g., Srivastava and Sahami 2009; Aggarwal and Zhai 2012). Topic-modeling algorithms in particular identify sets of words that occur with similar associative patterns in a given corpus so as to group them into meaningful topics. They thereby make it possible to explore a corpus without any a priori knowledge of its content, and notably without any preconceived idea of which topics may be present or not. The




outputs of such algorithms include, on the one hand, topics construed as ordered lists of words and, on the other, indications of which topics are present in which documents throughout the corpus. In turn, analyzing in detail these topics—and notably their most significant words and most closely related documents—makes it possible to investigate the thematic content of that corpus. The evolution of topics over time can also be examined, notably by considering publication dates. Such diachronic analyses enable assessment of the extent to which specific topics were or were not present in specific groups of texts during specific time periods. Several topic-modeling approaches have been developed, including latent semantic analysis (LSA) (Deerwester et al. 1990), probabilistic latent semantic analysis (pLSA) (Hofmann 1999), or latent Dirichlet allocation (LDA) (Pritchard, Stephens, and Donnelly 2000; Blei, Ng, and Jordan 2003), the last being one of the most commonly used models. In this chapter, we specifically focus on the LDA model. This model computes optimal probability distributions of words within topics and of topics within documents through an iterative convergent process (see also chapters 5 and 6 in this volume). One of its advantages is that it encodes—via Dirichlet prior distributions—the intuition that documents are usually structured to convey a limited set of topics and that, in turn, topics are usually expressed with a limited number of words. Such topic models have been successfully deployed and tested on many occasions, lending confidence to the approach (e.g., Griffiths and Steyvers 2004; Newman and Block 2006; DiMaggio, Nag, and Blei 2013). It is this model that we applied to the complete full-text corpus of the journal Philosophy of Science from its start in 1934 until 2015 (Malaterre, Chartier, and Pulizzotto 2019). 
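As a minimal illustration of such a diachronic analysis, the following sketch averages document-topic probabilities over publication periods. The topic labels and numbers are made up for illustration, not outputs of the actual model.

```python
from collections import defaultdict

def topic_prevalence_by_period(doc_topics, years, period=10):
    """Average each topic's probability over the documents of each period.
    doc_topics: list of dicts mapping topic label -> probability;
    years: publication years aligned with doc_topics."""
    sums = defaultdict(lambda: defaultdict(float))
    ndocs = defaultdict(int)
    for dist, year in zip(doc_topics, years):
        p = (year // period) * period   # e.g., 1936 -> 1930s bucket
        ndocs[p] += 1
        for topic, prob in dist.items():
            sums[p][topic] += prob
    return {p: {t: s / ndocs[p] for t, s in topics.items()}
            for p, topics in sums.items()}

# Hypothetical document-topic distributions from an LDA fit.
doc_topics = [
    {"science-and-values": 0.6, "logic": 0.4},   # a 1936 article
    {"science-and-values": 0.4, "logic": 0.6},   # a 1938 article
    {"science-and-values": 0.1, "logic": 0.9},   # a 1964 article
]
trend = topic_prevalence_by_period(doc_topics, [1936, 1938, 1964])
```

Plotting such per-period averages is what reveals whether a topic rises, falls, or disappears over the life of the journal.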
In what follows, we conduct an in-depth analysis of topics that relate more specifically to some form of engaged philosophy of science, notably essays that dwell on science policy, on matters of political ideology, on the sociology of knowledge, or on science and values.

Topic Modeling as a Means to Investigate Socially Engaged Philosophy of Science

In recent years, views in favor of a socially engaged philosophy of science have surfaced in the literature, contrasting with the neutral and apolitical stance that has marked the discipline, at least for several decades (Douglas 2010; Cartieri and Potochnik 2014). Advocates of such socially engaged philosophy of science ask for a renewed debate about the cultural, social, and political mission of the discipline, bringing to the forefront questions about the role of science in democratic societies, the proper place of values in science, the role for publics in science, and


many others. Interestingly, the discipline has not always been apolitical or socially disengaged. Quite the contrary. Many of its early founders in the 1920s and 1930s—notably émigré logical empiricists—were motivated by explicit social and political concerns, promoting a specific image of science in the service of mostly socialist and anti-conservative ends. All throughout the 1950s, many philosophers of science debated the role of science in society and its articulation with sociopolitical frameworks, sometimes championing an empiricist conception of science as perhaps one of the most efficient tools against threats such as fascism (Howard 2003; Uebel 2005; Reisch 2009). The situation changed markedly in the 1960s, with the rapid disappearance of social and political concerns from the philosophy of science literature. Several factors likely contributed to this change, including the varying character of the philosophy of science as exemplified by the changing editorial policy of the journal Philosophy of Science, the professionalization of the discipline that was accompanied by a noticeable drift to the political center and toward political neutrality, and the broader context of McCarthyism in the United States and the Cold War (Howard 2003). Decisions made at the History and Philosophy of Science subprogram of the US National Science Foundation's Social Science program have also been identified as playing a role (Vaesen and Katzav 2019). For still others, it is the absence of a leading philosopher interested in the science-society relationship that is to blame for the narrowing focus of the philosophy of science at that time (Dewulf 2021).
Howard argues that at least four genres of literature that had been common in the journal Philosophy of Science before 1959 nearly totally disappeared thereafter: (1) essays on science and value, (2) explicitly ideological essays or essays explicitly concerning matters of political ideology, (3) essays on science planning, science policy, and related topics, and (4) essays on the sociology of knowledge (2003, 66). If these changes indeed happened, they should show in a topic-modeling analysis of the journal. At the very least, the confrontation of Howard’s claim with text-mining approaches should be informative: if the results go in the same direction, the coherence of text-mining and historical methodologies should strengthen both approaches and the claim; if not, they should raise novel questions for investigation. Taking Howard’s claim as a working hypothesis, we were curious to see whether a topic modeling of the full-text corpus of Philosophy of Science over the past eighty years would reveal the presence of socially engaged topics in the first decades of the journal, followed by their disappearance from the 1960s onward, with, maybe, a resurgence in the last decade due to a recent renewal of interest. Of course, it could



Christophe Malaterre, Jean-François Chartier, and Davide Pulizzotto

be the case that socially engaged papers did not diminish during that time but were instead published in other philosophy of science journals. Analyzing a broader set of journals may therefore lead to contrasting results. Nevertheless, Howard's claim about Philosophy of Science itself remained. We therefore decided to focus on that journal, all the more so because we had previously carried out a topic-modeling analysis of its complete corpus (Malaterre, Chartier, and Pulizzotto 2019). We used that topic modeling as a basis and set out to explore in more detail the topics and articles that could relate to Howard's four genres.

Topic-Modeling Methodology

One of the peculiarities of topic modeling is that it identifies groups of words that tend to co-occur frequently in text segments. Because such groups of words often form semantically coherent sets, they can usually be interpreted as specific topics that capture the thematic content of the corpus. Topic-modeling algorithms make it possible to identify which documents are the most closely associated with any given topic while also mapping which topics are the most dominant in any given text. With the addition of publication years, diachronic analyses can be conducted that show how topics have evolved in relative predominance in the corpus over the years. It is this overall methodological approach that was implemented to identify the topics of the journal Philosophy of Science and their evolution over the last eight decades, from 1934 until 2015. The topic-modeling algorithm we used is based on the well-known LDA model (Pritchard, Stephens, and Donnelly 2000; Blei, Ng, and Jordan 2003).
This algorithm is part of a larger family of unsupervised statistical algorithms for topic discovery in texts that—by virtue of being unsupervised—make it possible to explore corpora without any a priori knowledge of their specific content and, in particular, without any specific knowledge of which topics might or might not be present (hence the intuition that LDA uncovers unknown "latent" structures—but see chapter 6 for a word of caution about unsupervised algorithms). The LDA model construes topics as probability distributions over all the words that are present in the corpus. In turn, each document is characterized by a probability distribution over topics. Consequently, one can make sense of any given topic by looking at the words that have the highest probability in that topic, and, conversely, one can characterize the topical content of any document by analyzing the topics that have the highest probability of occurrence in that document. The overall topic-modeling methodology can be described in five main stages (more details can be found in Malaterre, Chartier, and Pulizzotto 2019).
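To make these two kinds of distributions concrete, here is a minimal, standard-library-only sketch of LDA with collapsed Gibbs sampling. It is an illustrative toy, not the chapter's pipeline (which ran an LDA library for Python with k = 200 over the full corpus); the two-document corpus, the function names, and all parameter values below are invented.

```python
# Toy collapsed Gibbs sampler for LDA (illustrative only, stdlib only).
import random
from collections import defaultdict

def toy_lda(docs, k, iters=200, alpha=0.1, beta=0.01, seed=0):
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    v = len(vocab)
    n_tw = defaultdict(int)          # topic-word counts
    n_dt = defaultdict(int)          # document-topic counts
    n_t = [0] * k                    # tokens currently assigned to each topic
    z = []                           # current topic assignment of every token
    for d, doc in enumerate(docs):   # random initialization
        zs = []
        for w in doc:
            t = rng.randrange(k)
            zs.append(t); n_tw[(t, w)] += 1; n_dt[(d, t)] += 1; n_t[t] += 1
        z.append(zs)
    for _ in range(iters):           # Gibbs sweeps
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]          # withdraw the token's current assignment
                n_tw[(t, w)] -= 1; n_dt[(d, t)] -= 1; n_t[t] -= 1
                # Full conditional p(z = t | everything else), up to a constant.
                weights = [(n_tw[(t2, w)] + beta) / (n_t[t2] + v * beta)
                           * (n_dt[(d, t2)] + alpha) for t2 in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t; n_tw[(t, w)] += 1; n_dt[(d, t)] += 1; n_t[t] += 1
    # phi: per-topic word distributions; theta: per-document topic distributions.
    phi = [{w: (n_tw[(t, w)] + beta) / (n_t[t] + v * beta) for w in vocab}
           for t in range(k)]
    theta = [[(n_dt[(d, t)] + alpha) / (len(doc) + k * alpha) for t in range(k)]
             for d, doc in enumerate(docs)]
    return phi, theta

docs = [["value", "social", "policy"] * 4, ["atom", "quantum", "energy"] * 4]
phi, theta = toy_lda(docs, k=2)
for t, dist in enumerate(phi):       # read each topic off its most probable words
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {t}: {top}  doc probabilities: {[round(th[t], 2) for th in theta]}")
```

On toy documents with disjoint vocabularies the sampler separates the two themes, and each document's topic distribution concentrates on its own topic; reading a topic off its most probable words is the interpretive move described above, at miniature scale.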

Topic Modeling in HPS

(1) Corpus retrieval and cleaning. All full-text articles of Philosophy of Science from 1934—its first year of publication—until 2015 were retrieved from JSTOR, hence a total of eighty-two complete years. A peculiarity of the journal is that it started publishing the proceedings of the biennial meetings of the Philosophy of Science Association (PSA) in the mid-1990s. Because these proceedings—which were published separately from the journal from their start in 1970 until 1994, and then jointly—were peer-reviewed papers, like regular articles only slightly shorter, it was decided to keep them in the corpus. On the other hand, editorials, announcements, book reviews, and other notices, as identifiable by specific metadata, were removed. This resulted in a corpus of 4,602 articles (3,730 regular articles and 672 proceedings articles). Cleaning consisted in removing textual noise, mostly resulting from the optical character recognition (OCR) process (such as issues with hyphenated words, presence of page numbers, and headers and footers).

(2) Data preprocessing. Preprocessing was done in a standard way. This included sentence and word tokenization, spelling normalization by lemmatization, and morpho-syntactic disambiguation by part-of-speech tagging—so as to encode and prepare the data in a suitable way for computational analysis. To this effect, the TreeTagger algorithm (Schmid 1994) was used together with the Penn Treebank for tagging (Marcus, Marcinkiewicz, and Santorini 1993). Following this step, specific types of words—such as determiners, prepositions, or pronouns—were removed to reduce the lexicon (and hence computational time) and the noise in the topic modeling. Words that occurred in fewer than fifty sentences in the corpus were also removed. The preprocessing stage thereby resulted in a lexicon of 10,658 distinct words distributed among 976,263 sentences.

(3) Topic modeling.
The topic modeling of the corpus was done by implementing an LDA-based algorithm (Blei, Ng, and Jordan 2003) together with a Gibbs sampling method to facilitate the iterative convergence of the model, as described in Griffiths and Steyvers (2004). The LDA was performed through an API for Python (https://py...). Like all other well-established topic-modeling approaches, LDA models consider the number k of topics to be a parameter whose value is known beforehand. Heuristics have been developed to estimate proper values for k, yet their efficiency is limited; in practice, this is done through trial and error and a critical analysis of the topic-modeling results. In the present case, after comparing the resulting topics for several values of k, the value k = 200 topics appeared to offer a proper topic granularity in terms of




both interpretability (i.e., the semantic coherence of a topic's most probable words) and specificity (i.e., minimal overlap between topics). As we will see below, this k value resulted in four topics being relevant to socially engaged philosophy of science (though these topics are also present in other types of work). Choosing a higher value for k might have led to more-specific topics but in higher numbers, while choosing a lower value might have led to a single yet broader topic. In light of these considerations, k = 200 appeared as a most reasonable granularity. The topic-modeling stage thereby resulted in 200 probability distributions over the corpus lexicon (probabilities of finding specific words in any specific topic) and 4,602 probability distributions over the 200 topics (probabilities of finding specific topics in any specific article).

(4) Topic interpretation. As mentioned above, topics are probability distributions over the words of the corpus. By analyzing the most probable words for each topic, one can usually infer the semantic content that these groups of words convey in the corpus and thereby attribute meaningful labels to each topic. Interpretation can also be guided by examining the documents in which the different topics are the most probable. Though manual and error-prone, topic interpretation remains strongly constrained both by the sets of dominant words that characterize each topic and by the documents in which topics appear.

(5) Diachronic topic analysis. By taking into account the publication year of each article, diachronic distributions of all topics were evaluated over the eighty-two years of Philosophy of Science, from 1934 until 2015. In order to average out possible year-to-year variance, the corpus was split into seventeen periods of five years each (except for the last period, which included only two years).
For every topic, its probability of being found in any given period was computed by averaging the probability of finding that topic in all articles of the period.

Applying Topic Modeling to Assess Engaged Philosophy of Science

To assess Howard's claim about the disappearance of four genres in Philosophy of Science after 1959 and, more broadly, the thesis of the existence of a politically and socially engaged philosophy of science pre-1960s, we examined the topics that resulted from the topic modeling. First, examining the top thirty words of each topic and searching for keywords that included "value," "social," "sociology," "policy," and "political," we retrieved three topics of direct interest for the question at stake: economy,


Table 9.1. Topics with keywords related to socially engaged philosophy of science among their top thirty words (topic ID is a unique identification number used for computational purposes).

Topic: Economy (topic ID 2)
Top 30 words: economic; price; economics; public; social; political; society; interest; war; demand; good; market; economy; production; policy; exchange; cost; economist; people; labor; resource; supply; firm; service; private; business; government; engineer; party; income

Topic: Philosophical-schools (topic ID 21)
Top 30 words: logical; empiricist; empiricism; view; doctrine; carnap; philosophical; philosophy; russell; kant; metaphysical; positivist; thought; tradition; materialism; positivism; dialectical; metaphysic; epistemological; position; traditional; epistemology; idealism; dewey; idea; mach; critical; conception; philosopher; critique

Topic: Social-science (topic ID 22)
Top 30 words: social; science; culture; society; study; cultural; human; historical; history; political; sociology; institution; anthropology; sociological; man; life; practice; anthropologist; natural; people; scientist; psychology; kinship; moral; interest; sociologist; technology; organization; ideology; psychological

Topic: Science-and-values (topic ID 50)
Top 30 words: judgment; make; scientist; objective; moral; agreement; value; concern; decision; regard; opinion; judge; subjective; ethical; ought; base; judgment; expert; personal; person; involve; objectivity; agree; disagreement; epistemic; consensus; consideration; basis; rather; intuition

social-science, and science-and-values. Second, considering the top thirty publications and searching for the articles that Howard had identified, we retrieved one additional topic on top of the three previous ones: philosophical-schools (this can be explained by the fact that the terms “value,” “social,” “sociology,” “policy,” and “political” do not describe the full range of genres that Howard identified). The top thirty words shed light on each topic (table 9.1). For each topic, we also listed the top thirty articles sorted by time period (table 9.2). Interestingly, many of these articles do cover themes that clearly reflect engaged philosophy of science, and many (but not all) of the articles identified by Howard are present, in addition to others that are clearly relevant despite not having been identified by Howard (most of them predating Howard’s [2003] chapter). Note that one of the topics—science-and-values—includes



many relevant articles to socially engaged philosophy of science, yet none was reported by Howard. Note also that none of the four topics in this topic modeling are exclusively related to engaged philosophy of science themes, all simultaneously covering other neighboring questions. The topic Economy includes several terms that clearly express economics-related questions (e.g., “economic,” “price,” “demand”), as well as other terms that could also refer to engaged philosophy issues (e.g., “social,” “public,” “policy,” “war,” “people”). Among the most strongly associated articles, many include ideological or policy-oriented essays—for instance, on Soviet science, on communist scientific methods, or on dialectical materialism—that clearly relate to Howard’s genres (2) and (3) throughout the 1960s, but also later relevant articles, for instance on engineering, technology, and ethics in the 1980s. Yet the topic also includes a significant share of philosophy of economics articles—for instance, related to questions about the scientific status of the discipline or the formalization of economic theory—as well as a few more general articles that concern modeling or complexity and that illustrate their theses by examining examples drawn from economics. The topic Philosophical-schools is characterized by words that relate to logical empiricism (e.g., “logical,” “empiricism,” “carnap”), to philosophical doctrines (“doctrine,” “philosophical,” “metaphysical,” “position”), and also to some possibly more engaged themes, notably if one considers “dialectical” as related to “materialism,” for instance. Examining the top thirty articles for the topic confirms this characterization. These articles include essays that indeed relate to philosophical schools in a broad sense, notably empiricism but also idealism and existentialism (likely bound together by the use of similar sets of words such as “doctrine,” “logical,” or “philosophical”). 
Yet the top articles also include essays strongly tinged with ideological and political questions, for instance on the social role of empiricism, on dialectical fundamentalism, or on Marxism, that clearly relate to Howard's genres (2) and (3). When it comes to the topic Social-science, its top words appear to include three sets of terms: sociocultural terms (e.g., "social," "sociological," "society," "culture"), historico-anthropological terms (e.g., "history," "anthropology"), and human and political terms (e.g., "human," "people," "political"). The characterization of the topic is confirmed by examining its top thirty articles, which include articles about the sociology of knowledge—Howard's genre (4)—but also articles about the philosophy of the social sciences, history, and anthropology, as well as a significant share of essays related to science and values—Howard's genre (1). It is interesting to note that the topic is also strongly connected to post-1960s articles that examine gender and other biases on culture,


science, and science studies, most notably starting in the 1980s, for instance with articles by Harding, Haraway, or Smocovitis (see table 9.2). As for the topic Science-and-values, its most significant terms include words related to scientific judgment and values (e.g., "judgment," "opinion," "expert," "value") but also to moral philosophy (e.g., "moral," "ethical"). Among the top thirty articles, one finds articles that clearly relate to ethics (for instance, on moral obligation and normativity in psychology) but also numerous articles that concern other facets of engaged philosophy of science, including articles about value judgments in science, objectivity (or lack thereof), and expertise, as well as science and democracy, from authors such as Bisbee, Frank, Hartman, Scriven, McMullin, and, more recently, Elliott or Steele. Note that none of these articles had been reported by Howard. Note also that the topic includes none of the articles he had identified, which shows the potential of topic modeling both as an analytic tool and as a heuristic approach. The topic-modeling methodology we used makes it possible to aggregate topic probabilities per time period—simply by averaging topic probabilities across articles that belong to the same time period—hence revealing the diachronic patterns of topics over time (see figure 9.1). The results of these analyses show that all four topics (Economy, Philosophical-schools, Social-science, and Science-and-values) were more prominent in Philosophy of Science before the 1960s than after, though this pattern is less marked for the topic Science-and-values. As we noted, however, the four topics do not exclusively map onto socially engaged philosophy of science papers, being found also in other groups of papers that use similar terms.
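The per-period aggregation behind these diachronic patterns can be sketched as follows. The articles, years, and probabilities are invented toy values; the real inputs would be the 4,602 document-topic distributions produced by the topic model.

```python
# Sketch of the diachronic analysis: average each topic's probability
# over the articles of each five-year period (toy data only).
from collections import defaultdict

def period_of(year, start=1934, width=5):
    """Map a publication year to the label of its five-year period."""
    p0 = start + ((year - start) // width) * width
    return f"{p0}-{p0 + width - 1}"

def diachronic(articles, n_topics):
    """articles: list of (year, [p_topic0, p_topic1, ...]) pairs."""
    buckets = defaultdict(list)
    for year, dist in articles:
        buckets[period_of(year)].append(dist)
    return {
        period: [sum(d[t] for d in dists) / len(dists) for t in range(n_topics)]
        for period, dists in sorted(buckets.items())
    }

articles = [
    (1936, [0.75, 0.25]), (1938, [0.25, 0.75]),   # fall in period 1934-1938
    (1961, [0.0, 1.0]),   (1963, [0.5, 0.5]),     # fall in period 1959-1963
]
print(diachronic(articles, 2))
# → {'1934-1938': [0.5, 0.5], '1959-1963': [0.25, 0.75]}
```

A declining per-period average for a topic, as in the second toy topic's mirror image here, is the kind of signal figure 9.1 plots for the four retained topics.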
In order to assess whether the topic-diachronic patterns indeed reflected the evolution of socially engaged philosophy of science, we counted the number of engaged philosophy of science papers among the top thirty articles for all four topics per time period (on the basis of table 9.2) and plotted the ratio of that number over the total number of articles in the corpus per time period (see figure 9.2). The diachronic evolution follows a very similar pattern, which lends the approach additional confidence.

Discussion of Results

Topic-modeling approaches applied to the full-text corpus of Philosophy of Science made it possible to retrieve topics and articles that help better describe and document the place of engaged philosophy of science in the journal. Overall, the four topics that were initially identified—on the basis of some keywords—as possibly relevant to examining the existence of some form of engaged philosophy of science indeed turned out


Hartung, Frank E. (1945) “The Social Function of Positivism” Hartung, Frank E. (1947) “Sociological Foundations of Modern Science” White, Leslie A. (1947) “The Locus of Mathematical Reality: An Anthropological Footnote” Zerby, Lewis (1945) “Normative, Descriptive, and Ideological Elements in the Writings of Laski”

Hartung, Frank E. (1945) “The Social Function of Positivism” Johnson, A.H. (1945) “Whitehead’s Theory of Actual Entities: Defence and Criticism” Klausner, Neal W. (1947) “Three Decades of the Epistemological Dialectic 1900–1930”


Rautenstrauch, Walter (1945) “What Is Scientific Planning?” Winthrop, Henry (1945) “Conceptual Difficulties in Modern Economic Theory” Feuer, Lewis S. (1948) “Indeterminacy and Economic Development”

Kluckhohn, Clyde (1939) “The Place of Theory in Anthropological Studies”





Mayer, Joseph (1935) “The Techniques, Basic Concepts, and Preconceptions of Science and Their Relation to Social Study” Bierstedt, Robert (1938) “The Meanings of Culture” Merton, Robert K. (1938) “Science and the Social Order”

Emery, A. (1935) “Dialectics versus Mechanics: A Communist Debate on Scientific Method” Frank, Philipp, Shorr, Philip (1937) “The Mechanical versus the Mathematical Conception of Nature”

Emery, A. (1935) “Dialectics versus Mechanics: A Communist Debate on Scientific Method” Mayer, Joseph (1936) “Pseudo-Scientific Economic Doctrine” Mayer, Joseph (1936) “Pseudo-Scientific Economic Doctrine—Continued”

Social-science (22)

Philosophical-schools (21)

Economy (2)


Table 9.2. Top thirty articles per selected topic (topic ID in parentheses) and per time period (articles cited by Howard [2003] are in bold type; other articles of direct relevance to engaged philosophy of science are underlined).

Science-and-values (50)

Johnson, A.H. (1945) "Whitehead's Theory of Actual Entities: Defence and Criticism"

Bisbee, Eleanor (1937) "Objectivity in the Social Sciences"


Bergmann, Gustav (1956) “Russell’s Examination of Leibniz Examined” Freistadt, Hans (1956) “Dialectical Materialism: A Friendly Interpretation” Riepe, Dale (1958) “Flexible Scientific Naturalism and Dialectical Fundamentalism” Mikulak, Maxim W. (1958) “Soviet Philosophic-Cosmological Thought”

Boulding, Kenneth E. (1956) “Some Contributions of Economics to the General Theory of Value”


Brodbeck, May (1954) “On the Philosophy of the Social Sciences” Gewirth, Alan (1954) “Can Men Change Laws of Social Science?” Wein, Hermann (1957) “Trends in Philosophical Anthropology and Cultural Anthropology in Postwar Germany”

Social-science (22) Shils, E.A. (1949) “Social Science and Social Policy” Simpson, George (1950) “The Scientist—Technician or Moralist?”

Philosophical-schools (21) Feuer, Lewis S. (1949) “Dialectical Materialism and Soviet Science” Feigl, Herbert (1950) “Existential Hypotheses: Realistic versus Phenomenalistic Interpretations” Cerf, Walter (1951) “Logical Positivism and Existentialism”

Economy (2)

Feuer, Lewis S. (1949) “Dialectical Materialism and Soviet Science” Shils, E.A. (1949) “Social Science and Social Policy” Reiser, Oliver L. (1949) “A Resolution of the ‘East-West Problem’ by Way of a Scientific Humanism” Merton, Robert K. (1949) “The Role of Applied Social Science in the Formation of Policy: A Research Memorandum” Somerville, John (1952) “A Key Problem of Current Political Philosophy: The Issue of Force and Violence”


Science-and-values (50)


Frank, Jerome (1949) “The Place of the Expert in a Democratic Society”






McCarthy, Thomas (1978) “History and Evolution: On the Changing Relation of Theory to Practice in the Work of Jürgen Habermas”

Coffa, J. Alberto (1976) “Carnap’s Sprachanschauung Circa 1932” Norton, B. (1977) “On the Metatheoretical Nature of Carnap’s Philosophy” McCarthy, Thomas (1978) “History and Evolution: On the Changing Relation of Theory to Practice in the Work of Jürgen Habermas”

Gaa, James C. (1977) “Moral Autonomy and the Rationality of Science” Singer, Marcus G. (1977) “Justice, Theory, and a Theory of Justice” Vickers, John M. (1978) “On the Reality of Chance”

Scriven, Michael (1972) “The Exact Role of Value Judgments in Science”

McCarthy, Thomas (1972) “The Operation Called Verstehen: Towards a Redefinition of the Problem”

Laudan, Larry (1971) “Towards a Reassessment of Comte’s ‘Méthode Positive’” McCarthy, Thomas (1972) “The Operation Called Verstehen: Towards a Redefinition of the Problem”

Sussmann, Héctor J. (1976) “Catastrophe Theory: A Preliminary Critical Study” Olson, Mancur (1976) “Cost-Benefit Analysis, Statistical Decision Theory, and Environmental Policy” Miller, R.W. (1978) “Methodological Individualism and Social Explanation”



Science-and-values (50) Turnbull, Robert G. (1960) “Imperatives, Logic, and Moral Obligation” Levi, Isaac (1962) “On the Seriousness of Mistakes” Hartman, Robert S. (1962) “Axiology as a Science”




Social-science (22) Ehrlich, Howard J. (1962) “Some Observations on the Neglect of the Sociology of Science” Gellner, Ernest (1960) “The Concept of Kinship: With Special Reference to Mr. Needham’s ‘Descent Systems and Ideal Language’” Gellner, Ernest (1963) “Nature and Society in Social Anthropology” Naroll, Raoul (1961) “Two Solutions to Galton’s Problem”

Philosophical-schools (21) Sellars, Roy Wood (1960) “Panpsychism or Evolutionary Materialism” Mattick, Paul (1962) “Marxism and the New Physics”

Economy (2)

Hodges, Donald Clark (1962) “The Dual Character of Marxian Social Science” Mattick, Paul (1962) “Marxism and the New Physics”


Hatfield, Gary (1984) “Spatial Perception and Geometry in Kant and Helmholtz” Lewis, Joia (1988) “Schlick’s Critique of Positivism”

Dunnell, Robert C. (1984) “Methodological Issues in Contemporary Americanist Archaeology” Haraway, Donna J. (1984) “Primatology Is Politics by Other Means” Jarvie, I.C. (1984) “Anthropology as Science and the Anthropology of Science and of Anthropology or Understanding and Explanation in the Social Sciences, Part II” Kincaid, Harold (1986) “Reduction, Explanation, and Individualism”


Hausman, Daniel M. (1984) “Philosophy and Economic Methodology” de Marchi, N., Kim, J. (1988) “Ceteris Paribus Conditions as Prior Knowledge: A View from Economics”

Social-science (22) Harding, Sandra (1980) “The Norms of Social Inquiry and Masculine Experience” Hartsock, Nancy C.M. (1980) “Social Life and Social Science: The Significance of the Naturalist/ Intentionalist Dispute”

Philosophical-schools (21) Shope, R. (1979) “Eliminating Mistakes about Eliminative Materialism” Bernstein, R.J. (1982) “What Is the Difference That Makes a Difference? Gadamer, Habermas, and Rorty”

Economy (2)

Cyert, Richard M., Pottinger, Garrel (1979) “Towards a Better Microeconomic Theory” Gravander, Jerry W. (1980) “The Origin and Implications of Engineers’ Obligations to the Public Welfare” Hartsock, N. (1980) “Social Life and Social Science: The Significance of the Naturalist/Intentionalist Dispute” Rogers, C. Thomas (1980) “The EndUse Problem in Engineering Ethics” Hausman, D.M. (1981) “John Stuart Mill’s Philosophy of Economics” Bantz, D. (1982) “The Philosophical Basis of Cost-Risk-Benefit Analyses”


Science-and-values (50)

Mayo, Deborah G. (1988) “Toward a More Objective Understanding of the Evidence of Carcinogenic Risk”

Van Fraassen, Bas C. (1980) “Rational Belief and Probability Kinematics” Kourany, Janet A. (1982) “Towards an Empirically Adequate Theory of Science” Thagard, Paul (1982) “From the Descriptive to the Normative in Psychology and Logic” McMullin, Ernan (1982) “Values in Science”





Jarvie, I.C. (2001) “Science in a Democratic Republic”

Uebel, Thomas E. (2000) “Logical Empiricism and the Sociology of Knowledge: The Case of Neurath and Frank” Richardson, Alan (2002) “Engineering Philosophy of Science: American Pragmatism and Logical Empiricism in the 1930s”

  Steel, Daniel (1998) “Warfare and Western Manufactures: A Case Study of Explanation in Anthropology”

Jarvie, I.C. (2001) “Science in a Democratic Republic” Mallon, Ron, Stich, Stephen P. (2000) “The Odd Couple: The Compatibility of Social Construction and Evolutionary Psychology” Nichols, S. (2002) “On The Genealogy of Norms: A Case for the Role of Emotion in Cultural Evolution”

Smocovitis, Vassiliki Betty (1994) “Contextualizing Science: From Science Studies to Cultural Studies”

Social-science (22) Harding, Sandra (1992) “After Eurocentrism: Challenges for the Philosophy of Science”

Philosophical-schools (21) Richardson, Alan (1990) “How Not to Russell Carnap’s Aufbau” Creath, Richard (1990) “The Unimportance of Semantics” Mormann, Thomas (1991) “Husserl’s Philosophy of Science and the Semantic Approach”

Economy (2)

Kauffman, Stuart A. (1990) “The Sciences of Complexity and ‘Origins of Order’”

Science-and-values (50)

Levi, Isaac (1999) “Value Commitments, Value Conflict, and the Separability of Belief and Value” Allchin, Douglas (1999) “Do We See through a Social Microscope? Credibility as a Vicarious Selector”

Pierson, Robert (1994) “The Epistemic Authority of Expertise” Barrett, Jeffrey A. (1996) “Oracles, Aesthetics, and Bayesian Consensus” Kyburg, Henry E. (1997) “Quantities, Magnitudes, and Numbers” Goldman, Alvin I. (1997) “Science, Publicity, and Consciousness”

van Fraassen, Bas C. (1992) “From Vicious Circle to Infinite Regress, and Back Again”




Neuber, Matthias (2011) “Feigl’s ‘Scientific Realism’”




Philosophical-schools (21) Frost‐Arnold, Greg (2005) “The Large Scale Structure of Logical Empiricism: Unity of Science and the Elimination of Metaphysics”

Economy (2)

Alexandrova, Anna (2008) “Making Models Count” Ross, Don (2008) “Ontic Structural Realism and Economics”





Social-science (22)

Science-and-values (50)

Bradley, Richard, Dietrich, Franz, List, Christian (2014) “Aggregating Causal Judgments” Keren, Arnon (2015) “Science and Informed, Counterfactual, Democratic Consent” Rolin, K. (2015) “Values in Science: The Case of Scientific Collaboration” O’Neill, Elizabeth (2015) “Which Causes of Moral Beliefs Matter?”

Elliott, Kevin C. (2011) “Direct and Indirect Roles for Values in Science” Steele, Katie (2012) “The Scientist qua Policy Advisor Makes Value Judgments” Magnus, P.D. (2013) “What Scientists Know Is Not a Function of What Scientists Know”


Figure 9.1. Evolution of topic probability (y axis) over time periods (x axis) in the corpus of Philosophy of Science.

Figure 9.2. Evolution of the cumulative topic probability for the four retained topics (left-side y axis) and of the ratio of the number of socially engaged papers among the top thirty papers for the four topics to the total number of articles per time period (right-side y axis), over time periods (x axis), in the corpus of Philosophy of Science.


to be good markers for identifying articles that pertained to the four genres identified by Howard (2003) throughout the eight decades of Philosophy of Science. About twenty of the articles retrieved are identical to those identified by Howard, hence a significant overlap. Note, however, that not all of the articles identified by Howard were retrieved: about thirty of them were not present in the top thirty articles of the four selected topics (though they are, of course, present in the corpus). On the other hand, about thirty-five new articles were identified that were not present in Howard's list and that nevertheless touch upon a diverse range of questions that are highly relevant to engaged philosophy of science, mostly in the 1980s and 1990s (and a few after Howard's chapter publication date, as could have been expected). Howard's thesis about the disappearance of four genres of literature in Philosophy of Science after 1959 is well corroborated when it comes to genres (2), "explicitly ideological essays or essays explicitly concerning matters of political ideology," and (3), "essays on science planning, science policy, and related topics," as well as genre (4), "essays on the sociology of knowledge." Yet the decline of genre (1), "essays on science and value," is probably less pronounced, as several articles relevant to that genre continued to appear, most notably after the 1980s (see in particular the underlined articles in table 9.2 for the topics Science-and-values and, to a lesser extent, Social-science). One limitation of the analyses comes from the delineation of the topics. It is rare that topics—which are generated through a bottom-up and unsupervised approach—map one-to-one onto themes of interest. As mentioned, the four topics we retrieved do not perfectly map onto Howard's four genres.
In particular, one topic, Science-and-values, does not include any of the articles identified by Howard among its top thirty articles, while another, Philosophical-schools, does not include the most central keywords used to depict socially engaged philosophy of science. The other two topics, Social-science and Economy, include terms that can be used in other contexts as well and are indeed present in articles that are not about socially engaged philosophy of science. This means that the terms used in socially engaged philosophy of science may also be used in other contexts (see chapter 5 for a plea for context sensitivity of topic modeling). It may also mean that the terms used in engaged philosophy of science are used in association with other terms that are not necessarily linked to this field and that can serve other argumentative purposes. Finally, it may reflect the fact that engaged philosophy of science covers a broad range of quite different things, some of which were identified by Howard and others not (as is the case for the topic Science-and-values and its associated articles).




This situation has two consequences. First, not all of the articles retrieved by looking at their probability of displaying one of the four topics are, in fact, directly linked to engaged philosophy of science. As we have seen, some of these articles concern, for instance, the philosophy of economics, logical empiricism, idealism, the philosophy of anthropology, or many other questions. It is just that these articles share common sets of words with articles that are of direct relevance to the four genres and that all of these articles feature, with relatively high probability, terms among the top terms of each topic. It is therefore necessary to complement the analyses with a careful assessment of the relevance of all retrieved articles. Second, the topic-diachronic patterns of figure 9.1 also cannot be said to represent exclusively the evolution of the four genres over time: it is likely that only a fraction of the patterns is due to these four genres. However, since the patterns all show more significant topic probabilities before the 1960s than after, the evolution of the four topics denotes a significant change in style in Philosophy of Science during the 1960s. A possible solution to these limitations could be to conduct a finer-grained topic modeling, typically one in which the number of topics would be increased so as to split some of the four topics into finer-grained ones (for instance, Economy into a topic on communism and another on economics, or Social-science into a topic on sociology and another on values, though one cannot determine beforehand how the topic-modeling algorithms will turn out to allocate terms). As noted above, the patterns also align with the count of socially engaged philosophy of science papers among the top thirty articles of the four topics (relativized to the corpus size per time period, as shown in figure 9.2).
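The relativized count described above amounts to a simple per-period ratio, sketched below; the counts are invented placeholders, not the chapter's data.

```python
# Sketch of the ratio underlying figure 9.2: engaged papers found among
# the four topics' top articles, relative to corpus size, per period.
# All counts below are invented toy values.

def engaged_ratio(engaged_counts, corpus_counts):
    """Both arguments map a period label to a number of papers."""
    return {p: engaged_counts.get(p, 0) / corpus_counts[p] for p in corpus_counts}

corpus_counts = {"1934-1938": 120, "1959-1963": 160}   # total articles per period (toy)
engaged_counts = {"1934-1938": 18, "1959-1963": 4}     # engaged papers per period (toy)
print(engaged_ratio(engaged_counts, corpus_counts))
# → {'1934-1938': 0.15, '1959-1963': 0.025}
```

Normalizing by corpus size matters because the journal published many more articles from the 1970s onward; raw counts alone would understate the relative decline.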
Such similarity in diachronic patterns increases the confidence that the topic evolution somehow indeed captures relevant features of the evolution of socially engaged philosophy of science research. One advantage of the present topic model is its ability to retrieve a broader scope of articles than the ones that had been manually identified, notably by Howard (2003). This is the case not only for the pre-1960s period but most significantly for the decades that followed: the algorithmic method made it possible to identify several additional articles that concerned, for instance, the social impacts of engineering and technology, the relevance of gender biases in science, notably in the social sciences, and more generally the influence of values in scientific judgment. These findings tend to relativize Howard's thesis: the four genres did not so much disappear after the 1960s; they certainly decreased in relative significance but also changed in content. One genre in particular—genre (1), "essays on science and value"—has even been considerably more present than Howard assessed, especially as revealed by

Topic Modeling in HPS

the post-1960s articles we retrieved. These articles show that there still was some form of engaged philosophy of science after the 1960s, maybe even more significantly after the 1980s. Yet, when relativized to all other articles, the topic probability patterns do not show any significant upward trend after the 1980s: they simply indicate a stagnation of the four genres at a rather low level in terms of their relative probabilities of occurrence compared to other topics in the journal articles. These findings are in line with the increased numbers of articles that the journal published from the 1970s onward (including the proceedings of the Philosophy of Science Association meetings; see Malaterre, Chartier, and Pulizzotto 2019). From a methodological point of view, the findings also show the value of computational approaches to texts as a complement to the usual historical approaches, at minimum from a heuristic point of view, but also with the potential of providing more quantitative views over a much broader scope of documents. As we have seen here, computational approaches can be used with a view to testing specific hypotheses, for instance about the presence of certain themes within a given corpus and how they evolved over time. These approaches thereby lend themselves to investigations that can be both exploratory and hypothetico-deductive.

Methodological Strengths and Limitations

Broadly speaking, topic modeling is a well-tested approach to textual analysis. On numerous occasions, LDA topic modeling has been shown to be a very reliable algorithmic tool for identifying topics in large text corpora (e.g., Griffiths and Steyvers 2004; Blei and Lafferty 2009; DiMaggio, Nag, and Blei 2013). As with any tool or instrument, its reliability depends on how it is put to use and how results are interpreted.
In this respect, all stages of the methodology involve crucial operations and decisions, often relying on several cycles of feedback loops between parameter setting, computer simulations, and careful inspection of intermediate results (Hu et al. 2014). For instance, corpus retrieval and cleaning (stage 1) must be done appropriately. These are often the most time-consuming and yet the least rewarding tasks, but they are also absolutely necessary, simply being a prerequisite for carrying out any subsequent analysis. Data preprocessing (stage 2) requires caution, in particular when it comes to inspecting and validating the lexicon that resulted from lemmatization processing and word filtering based on POS tagging (visual inspection aided by specific queries for stop words, special characters, and word frequencies). For topic modeling (stage 3), different topic-modeling algorithms and implementations can be used to check for robustness, and, for each of these, different parameters need
to be carefully chosen. One of these parameters, as mentioned above, is the number k of topics, which ultimately rests on an appreciation of the proper granularity of the results given a certain background knowledge of the corpus and research objectives. Topic interpretation (stage 4) also needs to be done with a lot of caution, in particular to avoid narrow or biased labeling of topics. This interpretative step is, however, very much constrained by the top words for each topic—in particular how these top words contrast from topic to topic—and by the sets of documents in which the topics are the most probable. The diachronic topic analysis (stage 5) is probably more straightforward, as it follows from adding metadata—publication years—to the results of stage 3. Nevertheless, examining how top articles are spread timewise and checking against known historical episodes that are supposed to be present in the corpus can help a researcher gain confidence in the methodology and the quality of its implementation. One of the limitations of topic modeling is that the methodology only results in identifying topics—which are nothing other than ordered lists of words—and their relative probability of occurrence in documents, and nothing more. If one wishes to identify more sophisticated or simply different features of the corpora, other computational approaches should be used, for instance, conceptual-analysis or argument-mining methodologies (e.g., Peldszus and Stede 2013; Swanson, Ecker, and Walker 2015). Topic modeling makes possible certain analyses in terms of occurrence and evolution of topics in a corpus but cannot reveal deeper features such as the argumentative relationship between topics in given sets of articles. One should therefore be cautious not to infer too much from the simple co-occurrence of topics in documents or time periods.
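Since a topic is, in the end, nothing but a ranked word list, the raw material of the interpretation stage can be shown in a few lines. The sketch below is our own illustration: the vocabulary and the topic-word weights are invented, standing in for the matrix an LDA implementation would return:

```python
# Illustrative sketch: turn a topic-word weight matrix into ranked word lists.
def top_words(topic_word, vocab, n=3):
    """Return, for each topic, its n highest-weight words."""
    topics = []
    for row in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda i: row[i], reverse=True)
        topics.append([vocab[i] for i in ranked[:n]])
    return topics

vocab = ["science", "value", "economy", "logic", "society"]
topic_word = [
    [0.05, 0.30, 0.02, 0.03, 0.60],  # a topic one might label "Social-science"
    [0.10, 0.05, 0.70, 0.05, 0.10],  # a topic one might label "Economy"
]
labels = top_words(topic_word, vocab)
```

Labeling such lists remains the researcher's interpretative task; the algorithm only delivers the rankings and the per-document probabilities.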
That being said, it is worth noting again that topic modeling belongs to an unsupervised family of data-driven approaches to textual analysis. In other words, it works without any a priori knowledge about which topics indeed populate a corpus (though, of course, knowledge about the corpus content in general is helpful, for instance, to interpret topics or make sense of the findings). In this way, topic-modeling methods can be said to provide empirical grounding to specific theses about the content of the corpora onto which they are applied. In the case of Philosophy of Science, the topics of the present topic modeling were found according to such data-driven approaches, and there was no guarantee at the start that the model would make certain topics salient at certain times according to patterns that matched what was independently known of the history of philosophy of science. Furthermore, an advantage of the methodology is its scalability and its efficiency in inferring meaningful patterns in corpora that are too large for manual scholarly analyses.


Investigating corpora by means of text-mining algorithms is extremely powerful. It makes it possible to examine, in a systematic fashion, sets of texts that are too large for traditional analyses. It can provide empirical grounding to historical and philosophical claims by means of quantified information about the content of these corpora, possibly with the help of metadata such as publication years or others (hence playing something of a hypothesis-testing role). It can also serve as a heuristic to identify possible claims and documents that may, in turn, be subjected to scrutiny under the more usual methodologies of history and philosophy of science (hence playing an exploratory role). Topic modeling is one such tool, and the LDA approach described here is one among several approaches to topic modeling. In the present case, the LDA model made it possible to easily identify several clusters of words relevant to investigating the scope and historical pattern of socially engaged philosophy over eight decades of Philosophy of Science. It also made it possible to identify sets of articles—including some that had not been identified before—as relevant to this topic, hence demonstrating the heuristic value of the methodology. As pointed out above, other text-mining analyses would definitely be possible, including finer-grained topic modeling (so as to identify more precise topics in line with the research questions) and author community detection (in order to identify the clusters of authors related to engaged philosophy of science and the relationships between those authors), among many others. One of the key challenges in applied text-mining research is the formulation of questions tractable by algorithmic approaches. This goes both ways. It first requires a good understanding of what the tools can and cannot do, and hence strong technical capabilities.
But it also requires creativity to formulate meaningful questions from historical and philosophical perspectives that such tools can be made to answer. For these reasons, text mining is bound to change the way research is done in HPS, notably leading to larger and more multidisciplinary teams that can share corpora, text-mining expertise, and deep knowledge of the field.


Chapter 10

Bolzano, Kant, and the Traditional Theory of Concepts
A Computational Investigation

Annapaola Ginammi, Rob Koopman, Shenghui Wang, Jelke Bloem, and Arianna Betti

Recent research shows that valuable contributions are obtained by applying even simple, well-known computational techniques to texts from the history and philosophy of science (van Wierst et al. 2016). In this chapter we substantiate the point by relying on computational text analysis to address an open question regarding work on the general methodology of the sciences by Bernard Bolzano (1781–1848). A Bohemian polymath known to most for an important theorem of analysis bearing his and Karl Weierstrass's name, and for his work on the infinite, Bolzano was arguably the most important logician between Gottfried Wilhelm Leibniz and Gottlob Frege.1 We focus on certain aspects of Bolzano's theory of concepts relevant to his theory of grounding (Abfolge), a sophisticated notion of (noncausal) scientific explanation contained in his main work, the Theory of Science (Wissenschaftslehre; Bolzano 1837). Given that grounding ("proving why" or "giving reasons why") is, traditionally, scientific explanation (Dear 1987, 138; van den Berg 2014, chs. 2–3; Beaney 2018), it is rather odd that Bolzano's ideas on grounding are highly appreciated in logic and metaphysics (Correia 2010; Schnieder 2011; Fine 2012, 2022; Rumberg 2013; Poggiolesi 2016a, 2021) but rarely discussed in the philosophy of science. One exception is Roski (2019), who stresses the relevance of Bolzanian grounding for the concept of unification roughly in the sense of a minimal deductive
base à la Kim (1994; see also Friedman 1974; Kitcher 1989). As Betti (2010) argues, the oddity of the situation is easy to explain, as grounding is a form of conceptual, noncausal explanation, and philosophy of science tends to marginalize noncausal explanations; still, restricting the scope of the meaning of "explanation" to "causal explanation" is dangerous, as it might hamper our understanding of the past, making us miss important insights from a wealth of historical case studies and thus engendering an undesirable separation of philosophy of science from its history. Bolzano's stance on scientific explanation is due to his adherence to a (broadly speaking) axiomatic, millennia-old tradition that originated from Aristotle's Analytica Posteriora (de Jong and Betti 2010; de Jong 2001; Betti 2010; Lapointe 2010; Roski 2017) and was influential upon Descartes, Newton, Kant, Frege, Carnap, and Dedekind, to name just a few (Losee 2001, ch. 8; Anstey 2017; de Jong 2010; Macbeth 2016; Klev 2011, 2016). Importantly, axiomatics is not a purely logico-mathematical affair: notable axiomatic approaches exist in biology (see Smocovitis 1992 on Woodger and Haldane) and in physics (see Brading and Ryckman 2008 on Hilbert; Glymour and Eberhardt 2016 on Reichenbach), including influential twentieth-century takes such as Patrick Suppes's (starting from McKinsey and Suppes 1953, 1955; McKinsey, Sugar, and Suppes 1953). Despite claims that axiomatic efforts cannot in fact properly accommodate experimental science (Muller 2011, 97), recent work in the history of biology shows that no such incompatibility exists (van den Berg and Demarest 2020).
According to the axiomatic tradition, a proper science is cognitio ex principiis—knowledge from the principles—and this at two levels: at the level of truths (true statements), insofar as in a proper science certain fundamental truths are the principles from which all other truths follow; and at the level of concepts, insofar as certain fundamental concepts are the building blocks of which all other concepts are composed or defined (de Jong and Betti 2010, 190). Bolzano, famously, was the first to give a thorough account of the key relation among truths in a proper science as cognitio ex principiis in the Wissenschaftslehre, namely in terms of the relation of ground and consequence.2 Grounding among conceptual truths imposes a hierarchy that is related to a hierarchy among concepts. Since grounds are simpler than their consequences for Bolzano (Roski and Rumberg 2016, § 3.1), and since conceptual truths consist exclusively of concepts, it follows that concepts themselves are ordered according to simplicity. Yet simplicity is not the only ordering constraint relevant for grounding: grounds are also required to be more general than their consequences; and, importantly, the two constraints
of simplicity and generality are sometimes in conflict in Bolzano's writing on grounding (Roski 2017, § 4.4). Why exactly are simplicity and generality in conflict? No answer to this question is available yet, for little is still known about exactly how Bolzano's truth and concept hierarchies interact. We contribute here to clarifying the latter point by showing that Bolzano acknowledges two non-interdependent orderings of concepts: the ordering by composition (from simple to more complex) and the ordering by subordination (from general to more particular).3 Only the latter, we maintain, is to be identified with Bolzano's hierarchy of concepts. That this finding is not only historically relevant might already be clear from its relevance to a proper understanding of Bolzano's grounding and the relevance of the latter for present-day philosophy mentioned above; but there is also a specific and direct repercussion of our results upon present-day research on Bolzano-inspired, proof-theoretic logics of grounding (e.g., Poggiolesi 2016b, 2018, 2021; Poggiolesi and Francez 2021). Namely, our results offer an alternative philosophical basis for the grounding order with respect to Poggiolesi's account, which relies on conceptual simplicity (Poggiolesi, pers. comm.). One interesting advantage of this possible alternative is that, from the modern point of view, conceptual generality might also be a technically easier notion to explicate than conceptual simplicity.

To arrive at our results we used a mixed method: an ideengeschichtlich model approach coupled with both computational techniques (text mining) and traditional close reading, that is, fine-grained manual textual analysis including following the cross-referencing indicated by the source text.
As for the ideengeschichtlich model approach: the model approach (which is independent of the use of computational tools) requires using systematized clusters of stable and variable conditions (models) to fix the precise meaning of the concept(ion)s under scrutiny (see Betti and van den Berg 2014). Models in this sense are especially useful to trace (dis)continuities among different thinkers. We approach the issue of the ordering of concepts in Bolzano by focusing on the continuities between his ideas and Kant's on the so-called traditional theory of concepts. For, notwithstanding the fact that the literature tends to focus on his differences with Kant, Bolzano was heavily influenced by Kant (see, e.g., Blok 2016, § 4.4); furthermore, Kant's conception of science, like Bolzano's, closely fits the traditional ideal of cognitio ex principiis (de Jong 2001, 2010), and Kant's theory of concepts follows the traditional doctrine of praedicabilia behind the "tree of Porphyry."4 The latter is a hierarchical ordering of being and, derivatively, concepts, such that the lower, more complex concepts are composed of higher, simpler ones. Our approach makes methodological sense because the conceptual apparatus that Kant's ideas presuppose has the right type of compatibility for us to compare the two thinkers fruitfully; in addition, if the traditional theory of concepts plays in Bolzano the role that it plays in Kant's theory of science, then it is prone to be of substantial influence on Bolzano's views on grounding—a circumstance that thus far has gone unnoticed.

As to the computational techniques, we use an interactive, web-based information retrieval tool with a close-reading visualization interface at the front end called BolVis. BolVis has been codeveloped by our team for the specific goal of aiding philosophers in analyzing unusually extended text collections (van Wierst et al. 2018). Its main functionality is to perform queries on a collection of digitized texts (the corpus) and to return the corpus's most relevant sentences for that query.5 The corpus counts about 11,000 pages (about 35%) from the Bernard-Bolzano-Gesamtausgabe (BGA), the modern edition of all of Bolzano's writings, that have been professionally digitized and corrected to 99% accuracy for the sole purpose of internal research use within our team. Given our methodological and interdisciplinary angle here, this chapter is written in a somewhat unusual way for one in philosophy: for one thing, the philosophical results we obtain computationally are reported in a way that matches their actual discovery, a style reminiscent of use case descriptions in computer science papers focusing on tool evaluation. We do this to highlight the explicit, logbook-style manner of research appropriate to truly interdisciplinary efforts in teams composed of philosophers and computational experts with equal weight. We introduce the traditional theory of concepts before discussing computationally appropriate formulations of our research questions so that they can be addressed with a tool such as BolVis.
We offer a technical description of BolVis for a philosophy audience. The final section describes how we use BolVis to answer our research questions and how it aids text-based philosophical research in general. The appendix (online only) describes the corpus (A), the algorithmic steps (B), and an additional use case (C) related to ours, included for control, that is, to rule out that the valuable results we get on our use case stem from overfitting (i.e., don't generalize).6

The Traditional Theory of Concepts

De Jong and Betti's (2010) Classical Model of Science (CMS) models the ideal of science as "knowledge from the principles" as a system of seven conditions with fixed and variable parts. Three of these conditions concern concepts (de Jong and Betti 2010, 186); here are the first two:

A science S is a proper science according to the CMS, if:
(2a) There are in S a number of so-called fundamental concepts (or terms).
(2b) All other concepts (or terms) occurring in S are composed of (or are definable from) these fundamental concepts (or terms).

Conditions 2a and 2b express that in a proper or ideal science according to the CMS there is a distinction between fundamental and non-fundamental concepts, where the latter are (ultimately) defined or composed starting from the former. Conditions 2a and 2b can be said to capture the common minimal core of the definitional ordering of concepts in any axiomatic conception of science. A fortiori, 2a and 2b also hold for the so-called traditional theory of concepts. What is distinctive about this theory is a specification of 2b to the effect that two situations occur: (1) the definitional order proceeds by term conjunction (logically speaking) or mereological concept-composition (ontologically speaking, that is, when one considers concepts as a special kind of object) of simple(r) concepts, so that complex concepts are composed from simple(r) ones; (2) the composition/conjunction at issue is of genus proximum and differentia specifica according to the tree of Porphyry. In other words, for A, a, b concepts, in the traditional theory of concepts, definitions have this form (de Jong 2010, § 4):

A = a ⊕ b,

where the (complex) definiendum A is a conjunction or composition of the (simpler) definientia a and b, where a is genus proximum (with respect to definiendum A) and b is the differentia specifica (with respect to genus a). A famous example of such a genus differentia definition is the traditional definition of the species human as a rational (= differentia) animal (= genus). So defining A (e.g., human) comes down to indicating that the concept-whole A is composed of concept-part a (genus proximum) and concept-part b (differentia specifica). Note that in this example the genus (animal) is a more fundamental concept than the definiendum (though not absolutely fundamental, as it is in turn capable of analysis). Importantly, such genus differentia definitions obey a hierarchical system of concepts ordered by generality: going up in the hierarchy, one finds ever higher and more universal genera; going down, one encounters ever lower and more specific species (cf. de Jong 1995, 623). Previous research on Kant has shown that he followed the traditional theory of concepts—a key factor to understanding Kant’s famous analytic-synthetic distinction in the context of his philosophy of science (de Jong 1995). This suggests the following specification for CMS for Kant (specification underlined, henceforth CMSk):

Bolzano, Kant, and the Traditional Theory of Concepts

A science S is an ideal science according to the CMSk, if:
(2ak) There are in S a number of concepts that are fundamental.
(2bk) All non-fundamental concepts in S are definable as a conjunction/composition of genus proximum and differentia specifica.7
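The hierarchical ordering that such conditions describe can be made concrete in a toy model. In the sketch below (our illustration, not the authors' formalism), a concept's intension is a set of marks, a genus-differentia definition adds one mark to its genus, and the inverse relation between intension and extension discussed later in this chapter falls out as a consequence:

```python
# Toy model of the traditional theory of concepts (our illustration).
def define(genus, differentia):
    """Species = genus (+) differentia: the species inherits all marks of its genus."""
    return genus | {differentia}

living = {"living"}
animal = define(living, "sentient")   # animal = living + sentient
human = define(animal, "rational")    # human = a rational animal

concepts = {"living": living, "animal": animal, "human": human}

def extension(name):
    """Concepts contained *under* a concept: those whose intension includes all its marks."""
    return {n for n, marks in concepts.items() if marks >= concepts[name]}

# Going down the tree, intensions grow and extensions shrink.
assert len(human) > len(animal) > len(living)
assert len(extension("human")) < len(extension("animal")) < len(extension("living"))
```

The chapter's own example fits this shape: human is defined as a rational (differentia) animal (genus), and the genus animal is in turn capable of further analysis.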

We leave aside here all details related to conjunction/composition issues8 and concentrate on finding the answer to this question:

(Q) Are, according to Bolzano, all non-fundamental concepts in an ideal science definable per genus proximum and differentia specifica?

In the next section, we set up our research to answer this question. Ours is a more articulated and nuanced answer than that given by de Jong and Betti (2010), which is an unargued no.

Research Setup for Bolzano and the Traditional Theory of Concepts with BolVis

When relying on a text-mining tool such as BolVis for text-based historico-philosophical research, we need to take into account two basic methodological circumstances related to the fact that BolVis's query input and output are term-based (see also the concluding section of chapter 9 in this volume by Malaterre, Chartier, and Pulizzotto). First, since (Q) is formulated as a concept-based philosophical question, BolVis cannot be used to answer it; (Q) must first be translated into (multiple) term-based questions (van Wierst et al. 2016), including at least this:

Note that we take it to be a matter of course that it is always possible to pass from questions such as (Q) to questions such as (Q*) whenever (Q) itself can be answered in a traditional way by reading texts: the process of identifying terms that represent in a text the concepts of interest is one that we as philosophers carry out routinely, but also largely implicitly. In computational text analyses, however, that process needs to be made as explicit as possible. Second, we need to identify which other term-based questions like (Q*) we need to answer, which means identifying as many relevant query terms as we can beforehand. The more we identify, the larger the chance that we arrive at all the corpus passages pertinent to (Q). Key to this process is an understanding of the context of our questions: the more precise and more in-depth that understanding, the more promising will be our queries. We know from traditional research that the tree
of Porphyry, which relies with modifications on Aristotelian doctrine, comprises five praedicabilia (genus, species, differentia, proprium, and accident), so looking for proprium, for example, instead of genus might give pertinent results as well. For an understanding of the context of our questions, we rely on de Jong’s (1995) reconstruction of Kant’s account of concepts.9 We will see how this helps us identify (potentially) relevant query terms and create hypotheses about Bolzano’s account. Kant related the composition of concepts to a famous distinction he introduced between two types of a priori judgments: analytic and synthetic (de Jong 1995, §§ 7, 10). Complex concepts, in Kant’s view, are composed of other concepts: the characteristics, parts, or partial concepts of that complex concept (de Jong 1995, 622). The proximate parts of a complex concept—also called by Kant the constitutiva, rationes, or essentialia of that concept—make up the logical essence of that concept (de Jong 1995, 633, 635–37) and are always two: a kind or genus concept and a specific difference (differentia specifica) concept (de Jong 1995, 624, 633).10 In accordance with the tradition, Kant held that concepts are ordered in a hierarchy as “higher” and “lower” concepts based on the relation between genus and species, where higher/lower relate to the concepts’ position in a Porphyrian tree: a genus is a higher concept with regard to its (remote) species, and a species is a lower concept with regard to its proximate genus and (if any) its remote genera (de Jong 1995, 626). Kant also expresses the hierarchical ordering of concepts in terms of intension and extension and the relation of “contained in” and “contained under.” Kant calls the collection of partial concepts (proximate as well as remote) of a given complex concept the concept’s intension and says that these partial concepts are contained in that complex concept (de Jong 1995, 622–23, 626). 
Kant calls the lower concepts relative to a given concept (i.e., that concept’s species) that concept’s extension and says that they are contained under that concept (de Jong 1995, 622–23, 626). Intension and extension are inversely related in Kant’s view: if a certain concept is contained in another one, then the latter is contained under the former (de Jong 1995, 626–27). Hence, Kant accepted what is known as the “canon of reciprocity”: the bigger a concept’s intension, the smaller its extension (de Jong 1995, 622–23, 626). Answering our question (Q) will establish whether Bolzano’s account of concepts is similar to Kant’s. So far, interpreters have claimed that it rather isn’t. Siebel (2011, 102), for example, claims—in arguing that Bolzano’s famous criticism of Kant’s distinction between analytic and synthetic judgments is not to the point—that Bolzano misunderstood Kant because he failed to see that Kant adhered to the traditional
theory of concepts.11 Siebel thus seems to imply that Bolzano himself had a different conception of concepts. By contrast, we focus on the similarities between Bolzano and Kant. As we shall see, this enables us to shed new light on Bolzano's take on CMSk.

Method and Tool(s)

To obtain the results we present in the next section, we use the following (iterative) workflow: we identify, starting from an abstract philosophical question, promising query terms for the text corpus and then close-read the (sections containing the) sentences returned by BolVis. The text queries we performed are indicated as Query and Exact Query. Exact Query indicates exact string-matching searches. Given an input query—for example, Definition as in Exact Query 1 below—BolVis returns sentences containing exactly the string queried; for instance, (1) Definition or even (2) Def[ini]tio[n], but not (3) Definitionen or definiren, as the corpus is not stemmed, nor (4) Erklärung. The query syntax allowed by the tool, however, admits rather complex exact queries via regular expressions,12 and a result like (3) can be easily obtained this way. Query, by contrast, indicates a search functionality that returns results related, but not necessarily identical, to the initial query, such as (4). For example, in Query 1, we search for "genus proximum et differentiam specificam," and BolVis returns, among other strings of text related to the initial query, one important two-language sentence containing the string "genus prox. und differentia spec." The language technology behind BolVis makes the Query kind of results possible. In particular, note that we did not have to restrict the search to a particular language. BolVis is a web interface built on top of a text-mining tool called Ariadne, developed at OCLC Research (Koopman et al. 2015; Koopman, Wang, and Englebienne 2019).
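The contrast between the two query modes can be imitated with ordinary regular expressions. The sentences and patterns below are our invented illustrations; BolVis's actual query syntax is not reproduced here:

```python
# Illustrative sketch: token-exact matching vs. a pattern covering variant forms.
import re

sentences = [
    "Eine Definition ist ...",      # exact form, cf. result (1)
    "Die Definitionen sind ...",    # inflected form, cf. result (3)
    "Wir wollen etwas definiren.",  # older verb spelling, cf. result (3)
    "Eine Erklärung ist ...",       # related term with no string overlap, cf. (4)
]

def query(pattern, sentences):
    return [s for s in sentences if re.search(pattern, s)]

exact = query(r"\bDefinition\b", sentences)                      # only the exact form
variants = query(r"\b(Definition(en)?|definiren)\b", sentences)  # inflections and spellings too
```

A result like (4), by contrast, is out of reach for any string pattern; retrieving it requires the distributional language technology described next.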
BolVis makes the technology from Ariadne easily available to users via their web browser, offering a way to navigate the structure of the corpus and close-read the text. Several instances of Ariadne exist, one for each corpus of application,13 and three different versions exist based on our Bolzano corpus. For this chapter, we have used two of them: the web version with BolVis as interface at the front end and a desktop version from autumn 2018 with a more basic interface that also allows exact searches. The inner workings of the two versions of Ariadne are identical. For reasons of space, we confine a step-by-step description of Ariadne’s algorithmic procedure to appendix B online, but note that we consider step-by-step descriptions of this type to be of key methodological importance for tool transparency. The language technology used by Ariadne is statistical and relies on the distributional hypothesis (Harris 1954; Sahlgren 2008), which

states that words with similar meanings tend to occur in similar contexts. Natural language processing develops and applies statistical models based on this idea—distributional semantic models (DSMs). While little conceptual work is done in clarifying what aspect of language DSMs exactly capture (Gladkova and Drozd 2016), DSMs prove helpful for finding units of text, such as paragraphs or sentences, that are related to specific search terms in a large corpus, even if the search terms themselves are not used in the relevant paragraph.14 In general, DSMs rely on the so-called vector space model: the basic tenet behind it is that it is fruitful to model words or documents mathematically as vectors in high-dimensional spaces and to see certain properties of vectors as representing (measures of) similarity between words. In previous work we described the basics of the vector space model in connection with another software we developed, SalVe, based on that model (van Wierst et al. 2016). In explaining the working of Ariadne in this section, we take for granted some of the basics just mentioned. Ariadne uses count-based modeling; that is, it creates DSMs that rely on counting how often words co-occur with other words in the same unit of text (a sentence, a paragraph, etc.) or word window of a certain size. Let's consider table 10.1 for an example.

Table 10.1. A count vector for the word Gattung

  Artunterschied   zur   Art   Urtheile   Urtheilen   Theile   Beschaffenheiten
  5                …     88    …          …           …        …

In table 10.1, a vector is represented as an array of numbers modeling the word Gattung in our corpus (we also say that the vector is constructed or generated for the word Gattung by using a certain method). The Gattung-vector has as many components as the number n of (other) words in the corpus (more precisely word types): n will also be the number of dimensions of the vector space. Since the corpora in DSMs tend to be very large, the number of dimensions of the vector space also tends to be very large (typically a few hundred thousand dimensions).
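A count vector of this kind, and the cosine measure mentioned later in this section, can be computed in a few lines. The two-sentence "corpus" below is invented for illustration and is not the Bolzano corpus:

```python
# Illustrative sketch: count co-occurrences within a sentence window,
# then compare two count vectors by cosine similarity.
import math
from collections import Counter

corpus = [
    ["gattung", "art", "begriff"],
    ["gattung", "art", "artunterschied"],
]

def count_vector(word, corpus):
    """Count how often each other word occurs in a sentence together with `word`."""
    counts = Counter()
    for sentence in corpus:
        if word in sentence:
            counts.update(w for w in sentence if w != word)
    return counts

def cosine(u, v):
    keys = set(u) | set(v)
    dot = sum(u[k] * v[k] for k in keys)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

v_gattung = count_vector("gattung", corpus)
v_art = count_vector("art", corpus)
```

On a realistic corpus such vectors have tens of thousands of components, almost all zero, which is precisely the sparsity issue raised at the end of this section.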
The Bolzano corpus we use in the research here is quite small compared to typical corpora in distributional semantics, but it still has over nineteen thousand different word types, which implies that the dimensionality of the semantic space of the Bolzano corpus is also above nineteen thousand. Table 10.1 displays seven of the n components of the Gattung-vector (seven of the over nineteen thousand dimensions of the semantic space), associated with the words Artunterschied, zur, Art, Urtheile, Urtheilen, Theile, and Beschaffenheiten. The numbers specified in the table indicate that Gattung and Artunterschied co-occur only five times, whereas Gattung and Art co-occur eighty-eight times within a certain word window.

Under the distributional hypothesis we mentioned above, count vectors—namely, vectors the components of which are co-occurrence counts—may be thought of as representations of word usage insofar as (very) similar words will (ideally) have (very) similar vectors. This is also why each vector component is often thought of as representing an aspect of the meaning of a word—in some rough sense of "meaning" related to language use that remains unclear in NLP and related fields.15 We won't attempt to clarify the issue here: we follow practice and talk of "semantic similarity" between vectors, but it's important to stress that semantic similarity of two words in this context is simply a mathematical measure of distance between the two vectors generated for the words in question given a certain corpus; a widely used distance measure is, for example, cosine similarity (see Gladkova and Drozd 2016 for a useful discussion of similarity and relatedness in distributional semantic models).

Note that what we just saw might help clarify one important reason why the corpora used in distributional semantic modeling are typically very large. DSMs rely on a notion of semantic similarity based on actual statistics of word usage: this makes data sparsity problematic, because small corpora are bound to be data sparse, as the data here are occurrences of words in context. Consider the vector v of a word A occurring only a few times in a corpus of several million (token) words: v will be sparse, as co-occurrence counts in every dimension will be low. Suppose now that A is, on an intuitive understanding of meaning, very similar in meaning to another word B; however, word B occurs far more frequently in the corpus.
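The rare-word problem just set up can be made concrete with a small computation. Cosine similarity is the measure named above; the two vectors below are invented toy values (A a rare word, B a frequent near-synonym), not counts from the Bolzano corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# B is frequent, so its counts are rich; A occurs only once, so
# almost all of its (potential) dimensions are zero.
vec_B = {"begriff": 90, "art": 30, "umfang": 40}
vec_A = {"art": 1}

raw_sim = cosine(vec_A, vec_B)
# raw_sim stays well below 1.0: the corpus offers too little
# evidence for A, even if A and B are close in meaning.
```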
The vectors for A and B would be quite different from each other, despite the fact that A and B are close in meaning in an intuitive sense, because the corpus contains too little evidence for all the possible dimensions of the infrequent word. To obviate this problem, in most DSMs co-occurrence counts are normalized to control for word frequency. This also happens in the Ariadne model of our Bolzano corpus.

However, the larger the corpora, the higher the number of dimensions of the vector space. Since systems that need to compute a very high number of dimensions are computationally inefficient, techniques of dimension reduction are usually applied, as a smaller semantic space (i.e., the geometrical entity defined by the number of dimensions) is easier to process. Dimension reduction consists, very roughly, in removing some of the over nineteen thousand columns in table 10.1, while aiming still to preserve distance relations among vectors (and thus among words). Preserving distance among vectors is important because, as we have seen, vector distance is what represents similarity relations among
words as they are used in the corpus: removing dimensions from vectors means influencing the distances among them and thus the representation of similarity relations among vectors. Under certain conditions, distance is generally considered preserved. It remains difficult, though, to have any intuitive understanding of which aspects of meaning, if any, are captured by vectors that undergo dimension reduction. This is because such vectors are created from highly compressed information and because the compression in question typically makes use of randomization techniques that influence the end result. Dimension reduction is also applied in the Ariadne model for our corpus.

What is the impact of normalization or dimension reduction for our case? This is a question we aren't yet able to answer satisfactorily. Research in applying various types of distributional semantic modeling to philosophical corpora as small as ours is in its infancy, and it generally does not include fine-grained investigations such as tweaking, for example, the type of dimension reduction used (or not using any) or the removal of very infrequent words (or not removing anything). We will come back to this very important point—proper evaluation—in the conclusion.

Results

In this section, we show how we use BolVis to answer our question (Q): Are, according to Bolzano, all non-fundamental concepts in an ideal science definable per genus proximum and differentia specifica? As we saw above, a natural first step to take to find the answer—that is, to discover to what extent Bolzano adhered to CMSk—is to check whether Bolzano discussed (Q) explicitly and, if so, what he wrote about it. To this aim, we initially focus on (Q*): Are there units of text in our Bolzano corpus containing the terms genus proximum and/or differentia specifica or containing semantically related words?

Query 1

We query BolVis for "genus proximum et differentia specifica." BolVis returns a list of sentences,16 of which we skim the first fifty. Of these, results 1–6 actually contain the terms genus proximum and differentia specifica, while result 12 contains those terms abbreviated ("genus prox. und differentia spec."); the remainder are sentences (partly) in Latin. The sentences containing the query terms (or their abbreviations) all seem relevant, especially the second on the list, this sentence from the Wissenschaftslehre, section 559 (henceforth WL§559): "Unrichtig, und zum Theile schon mit den beiden vorigen widerlegt, däucht mir auch der fast allgemein angenommene Kanon, daß eine jede Erklärung aus genus proximum
und differentia specifica bestehen müsse." Here, Bolzano writes that the almost universally accepted canon that every definition must consist of genus proximum and differentia specifica seems to him wrong, and already partially refuted by what he wrote before. This seems to be a good starting point for our research, so we turn to a close reading of the source text.

Close reading. In WL§559, Bolzano discusses existing accounts of definition (his term is Erklärung), and from this we learn some important things about Bolzano's own views.17 Bolzano believes definitions of a certain concept give the parts of that concept—that is, its partial concepts (Bestandtheile)—plus the way in which these parts are connected in that concept. A definition (the definiens) is not merely equivalent to the concept it defines (the definiendum) but is identical to it in Bolzano's view, and thus every complex concept has exactly one definition. About genus-differentia definitions specifically, Bolzano writes that it is not the case that every definition consists of genus proximum and differentia specifica, because such definitions do not work for certain kinds of concepts: negative, objectless, and imaginary concepts. Later, in WL§559.13, Bolzano writes that even for concepts that are highly complex, it is not wrong to say that they consist of two parts: genus proximum and differentia specifica. So far Bolzano seems to tweak (rather than outright reject) condition 2bk of CMSk. It remains to be clarified, for cases in which according to Bolzano definitions may be considered of the genus-differentia type, what role he attributes to the genus and differentia concepts. Specifically, do genus and differentia concepts determine the conceptual hierarchy captured in the CMSk, as they do for Kant and the tradition?
The third result of the previous query seems interesting: a sentence of WL§509 in which Bolzano writes that one cannot find the highest genus (höchsten Gattung) by means of identifying genus proximum and differentia specifica but rather by identifying that simple concept which has the largest extension (weitesten Umfange).

Query 2

To gather more evidence on Bolzano's concept of genus, we query BolVis for Gattung (genus). Of the first seven sentences that BolVis returns, four are from WL§117, so we close-read that section.

Close reading. In WL§117, Bolzano discusses the ancient doctrine of the universals (Universalia oder Praedicabilia) and discusses the concepts genus, species, and differentia (Unterschied). We learn that Bolzano regards the notion of genus as highly similar to his own notion of common concept (Gemeinbegriff). However, not all common concepts are genera, according to Bolzano, for some common concepts have just two
objects in their extension and are therefore not genera but species, to wit, lowest species. Moreover, the concepts of genus, of species, and even that of necessary property (proprium) are relative, in the sense that one can call a certain concept in one context a genus and in another a species or necessary property, Bolzano writes. Notably, Bolzano writes that many, but not all, species concepts are composed of a genus concept and a differentia concept. As an example, he mentions the concepts "real" (wirklich) and "possible" (möglich), which he takes to relate to one another as species and genus, and thus the former to be subordinated (untergeordnet) to the latter, whereas he takes it not to be the case that the concept "real" consists of the concept "possible" plus something else (a differentia). For any two simple (einfache) concepts of which one is subordinated to the other, Bolzano adds, it necessarily holds that they have no partial concepts in common at all.18

This is remarkable: apparently, according to Bolzano, it can be that two concepts are simple and one is subordinated to the other. This implies that there are two different ways to order concepts, both in line with Bolzano's views: by composition and by subordination. Our new hypothesis is that there are two orderings of concepts: from more to less general and from simple(r) to (more) complex. We now want to know which of these two orderings should be taken to be "the conceptual hierarchy" in Bolzano's view.

Query 3

We query for Unterordnung (subordination), and the first result is the title of WL§97, a section dedicated to exactly that notion.

Close reading. When we read that section, we see that Bolzano explains that a concept A is subordinated to a concept B iff every object that falls under A (i.e., every object to which A refers) also falls under B, but not vice versa (and thus concept A is more particular than B, and B more general than A). Importantly, Bolzano here talks of "higher" (höher) and "lower" (niedriger) concepts: if a concept A is subordinated to a concept B, then Bolzano says B is a higher concept than A, and A a lower concept than B. Now we want to check whether Bolzano talks of "higher" and "lower" concepts also with regard to composition, that is, whether Bolzano would also say that a concept A is higher than a concept B in case B is composed of A and something else.

Query 4

To this aim, we do several queries related to the conceptual hierarchy (e.g., höher niedriger Vorstellung and unterordnung allgemein)19 and for all of them check the first fifty sentences that BolVis returns. In none of these sentences does Bolzano use the words “higher” and “lower”
with respect to the composition of concepts (except in a passage from WL§97, in which he discusses a view from other logicians that he explicitly rejects), and in many of them he instead uses these words again in relation to the notion of subordination. We conclude that according to Bolzano, the hierarchy of concepts is governed by subordination, not by composition.

We have seen that for Kant the genus and differentia determine the concept's position within the conceptual hierarchy, and the latter is based on the concept's intension—and, in virtue of the canon of reciprocity, on extension, too. For Bolzano, by contrast, the conceptual hierarchy is based on extension only—namely, via the relation of subordination—and this position is available to Bolzano only in virtue of his rejection of the principle of reciprocity.20 The key step here is indeed Bolzano's severance of the two orderings of subordination (related to generality) and composition (related to simplicity): Bolzano's notions of genus and species regard only the concept's extension in the Bolzanian sense—that is, the objects falling under the concept—and it is the concept's extension that determines a concept's place in the conceptual hierarchy; the concept's place in the hierarchy is not (also, and equally) determined via the intensional operation of adding differentiae to the (intensional) genus as part of the content of a concept, as it is in Kant.

However, we have also seen that Bolzano argues that although not all, still many definitions are (or may be considered) of the genus-differentia type. What, then, does he see as the role of such definitions? In order to understand this, we first need a better idea of what exactly definitions are in Bolzano's view.

Query 5

We query for “definition” (Erklärung) and obtain several sentences in which Bolzano defines this concept: a definition of a concept A gives the parts of A plus the way in which these parts are connected in A (e.g., WL§§23, 555, 559). However, we note that in writing about definitions, Bolzano uses many words that have to do with engaging in scientific activity from the subjective point of view of human agents rather than with the objective side of science in itself. For example, in WL§559, Bolzano writes that definitions serve to make concepts (more) “distinct” (Verdeutlichung) and, in WL§23, that definitions amount to “pointing out” (angeben) the components of a complex concept. Now Bolzano is famous for having been the first to rigorously distinguish between science in an objective sense, that is, as a body of mind-independent truths and concepts, and science in a subjective sense, that is, as an activity that people engage in, as something presented in textbooks and taught to
students, and so on (de Jong 2001). The distinction between the objective and the subjective order of science is rather clearly reflected in Bolzano's use of language in the two different cases (van Wierst et al. 2016). Therefore, at this point our hypothesis is that definitions, for Bolzano, play a role only on the subjective side and not on the objective side of science. Our query with BolVis does not give any result that explicitly confirms our hypothesis, but it does give many more sentences concerning definitions in which Bolzano again uses words that have to do with science as a subjective activity.

Exact Query 1

Hoping to find more relevant passages, we query Definition using exact search, and here we find our confirmation. In his philosophical diaries (Philosophische Tagebücher 1811–1817), Bolzano writes: Defi[ni]tion[en] s[in]d k.[eine] eig[en]tl.[ichen] Urth[ei]le, n[ä]hml[i]ch k[ein]e solch[e]n, [die] [einer] Wiss.[enschaft] an sich g[e]hört[en] sond.[ern] hist.[orische] Urth[ei]le, w[e]lch[e] uns sag[en], d[a]ß wir z[u]r B[e]z[ei]ch[nun]g d[ie]s[e]s o[der] j[ene]s B[e]g.[riffes] d[ie]ß o[der] j[ene]s Z[ei]ch[en] w[ä]hl[en] Urth[ei]l[e], w[e]lch[e] w[o]hl als Hülfs[mi]tt[e]l z[u]r V[e]rst[än]d[i]g[un]g üb[e]r d[ie] Wiss.[enschaft] nothw.[endig] s[in]d, ab[e]r [ni]cht S[ä]tze d[er] Wiss.[enschaft] s[e]lbst ausmach[en]. (Bolzano 1803–1817, 38–39)21

Even though this passage is from an early date and Bolzano’s vocabulary is not yet as exact as later on in the WL, here he also clearly distinguishes between the subjective side (signs, comprehension) and the objective side of science (“science in itself”). And Bolzano expresses very explicitly in this passage that definitions belong to the former, not the latter. Because this passage confirms the hypothesis we formed based on passages from the WL, we feel confident enough to conclude that Bolzano consistently held that definitions have their place merely on the subjective, and not on the objective, side of science; that is, they do not have a place within the objective order of truths but merely help human beings to discover and understand those truths. This implies that genus-differentia definitions in Bolzano’s view also belong merely to the subjective side of science. And thus, it seems Bolzano believes that considering complex concepts as consisting of a genus and differentia may be useful for, for example, making a number of concepts (not all!) distinct, or for the presentation of truths, but the notions of genus and differentia do not play a role with respect to the (relations between) truths and concepts in themselves.

Bolzano, Kant, and the Traditional Theory of Concepts

Let us finally formulate an answer to our question: (Q) Are, according to Bolzano, all non-fundamental concepts in an ideal science composed of (definable per) genus proximum and differentia specifica? No, at least not to Kant's extent, but this does not mean that Bolzano's position is an all-encompassing rupture with Kant's, as suggested in the literature. In particular, Bolzano retains the specific role that the conceptual architecture of the praedicabilia plays with respect to the analytic/synthetic and a priori/a posteriori dichotomies.

First note that the term "fundamental" in conditions 2a and 2b of the CMS has become somewhat ambiguous when considering Bolzano's views—or better: 2a and 2b do not tell the whole story. In Kant and the tradition, as we know from de Jong (1995), the hierarchy of concepts is such that the fundamental concepts are the simplest and most general concepts. That is, Kant does not (need to) distinguish the conceptual order imposed by generality from the conceptual order imposed by simplicity, because for him complexity and generality are inversely related. But for Bolzano, as we have seen, absolutely simple concepts are not necessarily equally general, and thus Bolzano does (need to) distinguish the two orders. In the order imposed by composition, simple concepts are more fundamental than (more) complex concepts, whereas in the order imposed by subordination, general concepts are more fundamental than (more) particular concepts.

But with respect to the former, as we have seen, Bolzano writes that even for highly complex concepts, it is (in most cases) not wrong to say that they consist of two parts: a genus proximum and a differentia specifica. However, such genus and differentia concepts may serve in Bolzano's view in definitions and may be epistemically useful to make concepts distinct or in the presentation of a science, but they do not play a technical role in the objective order of a science.
Non-fundamental concepts, according to Bolzano, are composed of or definable from fundamental concepts, as required by condition 2b of CMS, only if "fundamental" means the simplest and not (also) the most general concepts. But our research has revealed that Bolzano speaks of higher and lower concepts when discussing the order imposed by subordination, and not when discussing the one imposed by composition. This suggests that, even though Bolzano certainly accepts conditions 2a and 2b, these conditions for Bolzano do not capture "the conceptual hierarchy," for by this he means the hierarchy of concepts in terms of subordination (i.e., generality). We take it that this ordering is the one relevant for grounding. This finding has important repercussions for existing accounts of grounding, and for any present-day research relying on them, and opens up a substantial line of research, which we will pursue in future work.
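The severance of the two orderings can be illustrated with a small toy model. Bolzano's example of "real" and "possible" is from the text; the extensions assigned below are invented placeholders, and the set-based encoding is our own construction, not a claim about Bolzano's formal apparatus.

```python
# Toy model of Bolzano's two orderings (all extensions invented).
# Each concept has an extension (objects falling under it) and
# parts (its partial concepts); "possible" and "real" are both simple.
concepts = {
    "possible": {"extension": {"o1", "o2", "o3", "o4"}, "parts": set()},
    "real":     {"extension": {"o1", "o2"},             "parts": set()},
}

def subordinated(a, b):
    """a is subordinated to b iff everything under a also falls
    under b, but not vice versa (ordering by generality)."""
    return concepts[a]["extension"] < concepts[b]["extension"]  # strict subset

def composed_of(a, b):
    """a contains b as a partial concept (ordering by composition)."""
    return b in concepts[a]["parts"]

# "real" is lower than "possible" by subordination, yet "possible"
# is not a part of "real": the two orderings come apart.
```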

What do 2a and 2b in Bolzano's embodiment of the CMS look like, then? We have argued that Bolzano's view is a modification of the traditional theory of concepts, which we can fix as follows: In a proper science S,

(2aB) There are a number of concepts in S that are fundamental in the (mereological) sense of being simple(st) (of no/least complexity) and a number that are fundamental in the (quasi-semantic) sense of being most general (least particular).

(2aBbis) Among simple(st) concepts, some are more general than other concepts.

(2bB) Not all, though likely most, of the non-fundamental concepts in S in the mereological sense, i.e., the (more) complex ones, are definable per genus proximum and differentia specifica.

DSM Ariadne Is in Need of Proper Evaluation for Philosophy

To arrive at the reconstruction of Bolzano's embodiment of the CMS that wraps up the last section, we have applied a mixed method, including a close-reading tool, BolVis, to a corpus of a large part of Bolzano's writings. Searching on BolVis for freely specified fragments of text returned results ranked by semantic similarity to the initial query, in the sense we explained above. BolVis has been both complementary and crucial to our investigation, enabling in practice the examination of a far larger volume of text than is usual in more traditional modes of research in philosophy. The advantages of this approach for philosophy and related disciplines are undeniable. We have put Ariadne's technology to valuable use to find out which words or fragments of text are most similar to a certain input (query)—that is, words with vector representations with a high cosine similarity—even if the words that are used in the text are completely different from the query.

Note, however, that the language technology behind BolVis, Ariadne, has never been properly evaluated against expert judgments from philosophy or related domains. That is, whereas Ariadne has been evaluated against generic technical benchmarks such as STS (semantic textual similarity, cf. Koopman, Wang, and Englebienne 2019),22 methodological questions arising from our specific computational procedure for corpora such as ours do not yet have a proper answer. We don't know, for example, why exactly we got results mostly from the Wissenschaftslehre. The point is that we can't take this at face value.23 We don't know whether we missed out on important passages that the
system did not output (recall)—or those it did output but were misranked from the viewpoint of the philosophers’ judgment of relevance (precision). We do know that some of the output passages were, for the philosophers, irrelevant and that we generally cherry-picked the relevant ones. Whereas fixing standards of relevance for the field that can approximate expert judgment is highly challenging, it is also the only way to increase trustworthiness of computational text analysis: we need to design experiments to make these standards explicit and measurable.24 Our use of models is a step in this direction, but going beyond the kind of work we do in this chapter requires that we provide annotations for model variants such as CMSK and CMSB that are more specifically of a linguistic type. This will be a topic for future work.
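The evaluation called for here could, for instance, score a system's ranked output against expert relevance judgments. The sketch below computes precision and recall at a cutoff; the ranking and the annotated relevant set are invented examples, not actual BolVis output or annotations.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall of the top-k ranked passages, given the
    set of passages an expert judged relevant."""
    top_k = ranked[:k]
    hits = sum(1 for p in top_k if p in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Invented example: system ranking vs. expert-annotated relevant set.
ranking = ["s3", "s1", "s7", "s2", "s9"]
expert_relevant = {"s1", "s2", "s4"}

p, r = precision_recall_at_k(ranking, expert_relevant, k=5)
# Two of the five returned passages are relevant (precision),
# and two of the three relevant passages were found (recall).
```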


Chapter 11

The Evolution of Evolutionary Medicine

Deryc T. Painter, Julia Damerow, and Manfred D. Laubichler

For several decades interdisciplinary research has been pushed by funding agencies, science administrators, and generations of well-intentioned scientists. Interdisciplinary research is needed, so the argument goes, because the problems we face in medicine, environmental sciences, sociology, or anthropology—the list can go on—are too complex to be mapped onto one traditional discipline. While the motivation for interdisciplinary research is clear (Frodeman, Klein, and Pacheco 2017), its actual success is less obvious. For one, we don't quite know how to measure interdisciplinarity. Although there have been many attempts at measuring interdisciplinarity (Bordons et al. 2015; Morillo, Bordons, and Gómez 2001; Nichols 2014; Porter et al. 2007; Porter and Rafols 2009), they typically refer to simple collaborations between field A and field B without taking into account the knowledge exchange at the heart of interdisciplinarity. Multiple studies find the need to distinguish interdisciplinarity from multidisciplinarity and transdisciplinarity (Collin 2009; Frodeman, Klein, and Pacheco 2017; Helmane and Briška 2017; Klein 2008; Stock and Burton 2011): where interdisciplinarity combines disciplines in an integrative approach, multidisciplinarity simply uses two separate approaches from different disciplines, and transdisciplinarity occurs when disciplines are integrated but the product transcends disciplines and becomes more than the sum of its parts. Here, we expose the heterogeneity of interdisciplinarity through computation.

We also have a difficult time distinguishing different degrees of interdisciplinarity. Do we mean actual collaborations between scholars from different disciplines, or are we more interested in a combination of
different conceptual and methodological approaches, perhaps even in one person's work? And how closely are those two layers linked? Does the successful application of different approaches require collaboration between scholars with different backgrounds? How can we tell whether any interdisciplinary approach is "better" and in what ways?

Traditionally these questions are addressed in the context of individual case studies, such as with breakthrough discoveries. While those narratives provide detailed insights into some localized scientific cultures, we have no way of answering questions about interdisciplinarity on a larger scale. Yet understanding across individual cases is exactly the kind of information we need if we want to retool the scientific enterprise toward greater degrees of interdisciplinarity.

In this study of evolutionary medicine, we asked these questions using evolutionary biology analogies and based on a complete (and continuously growing) dataset of all publications in evolutionary medicine over the last four decades. By using these analogies, we can frame our results in familiar terms, move beyond traditional historical questions, and better understand the nature of interdisciplinarity and how to measure it.

Evolutionary medicine is an interesting case, as it was quite intentionally created as an interdisciplinary field of science by Randolph M. Nesse and George C. Williams, first with a conceptual essay (Williams and Nesse 1991) and later a successful book (Nesse and Williams 1994). These publications serve, in effect, as the founder effect for the field. Their argument was quite straightforward. Humans are the product of evolution; so are diseases. In order to better understand and treat them, medicine needs to incorporate evolutionary perspectives. Conceptually the argument was easy to follow and quite convincing.
Yet the actual scientific practice is another matter, and it is difficult to assess from individual reports whether and how evolution makes a difference to clinical practice. Analyzing the large corpus of all publications claiming to incorporate evolutionary biology into medicine can give us an answer to the question: What difference did it make to bring evolution into medicine? And we ask, in effect, did evolutionary medicine become a new scientific field distinguishable from other disciplines?

In analyzing the evolution of evolutionary medicine, we first observe that the field grew steadily—in what can be interpreted as successful population dynamics represented by exponential growth (figure 11.1). The specific patterns of how evolutionary medicine was established follow those of many other scientific fields (from initial informal gatherings and interest groups to the establishment of a scientific society and a journal), showing a standard growth rate for new areas of science (figure 11.1), parallel to how a new species colonizes a new niche. As any new species
becomes established, it undergoes an intricate dance of highly regulated growth and maturation. Similarly, we provide evidence that evolutionary medicine also goes through a kind of growth and maturation phase.

So far so good. But just how interdisciplinary has evolutionary medicine been? To address this question, we created different types of networks and quantitative metrics that crucially depend on a much larger dataset that one of us analyzed for his dissertation (Painter 2019). We show in a nutshell the following: Evolutionary medicine, as a scientific community, attracts researchers from different backgrounds—this pattern is a combination of migration and selection—including people from medical fields and evolutionary biologists, with the latter dominating slightly. At the level of participation, the field is indeed interdisciplinary. Publications in evolutionary medicine appear in a number of different journals, and again both medical and evolutionary biology journals are well represented. Evolutionary biologists are also publishing their research in more medical journals and vice versa. This again supports the claim of interdisciplinarity.

Then, we compared how authors collaborate to the way keywords appear together within the publications. This is analogous to comparing the genotype (seen as the mixture of two parental types) of an organism to its phenotype (the individual appearance). The authors can be thought of as part of the genotype of evolutionary medicine, as they possess codified knowledge about their respective fields within themselves. This codified knowledge manifests as the language and ideas contained in their publications, giving evolutionary medicine its substantive characteristics—its phenotype. Analyzing the whole corpus, we find a conceptual history that reflects some of the emerging narratives given by practitioners about the history of the field, with a gradual diversification of topics and concepts.
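The two kinds of networks being compared can be sketched as weighted co-occurrence graphs. The publication records below are invented stand-ins for bibliographic metadata, and the dictionary-of-edge-weights representation is our simplification, not the authors' actual pipeline.

```python
from collections import Counter
from itertools import combinations

# Invented records: authors (the "genotype" side) and keywords
# (the "phenotype" side) of each publication.
publications = [
    {"authors": ["Nesse", "Williams"], "keywords": ["evolution", "disease"]},
    {"authors": ["Nesse", "Stearns"],  "keywords": ["evolution", "medicine"]},
    {"authors": ["Stearns"],           "keywords": ["medicine", "disease"]},
]

def cooccurrence_edges(records, field):
    """Weighted edges: how often two items appear in the same record."""
    edges = Counter()
    for rec in records:
        for a, b in combinations(sorted(set(rec[field])), 2):
            edges[(a, b)] += 1
    return edges

coauthor_net = cooccurrence_edges(publications, "authors")   # collaboration
keyword_net  = cooccurrence_edges(publications, "keywords")  # conceptual
```

Comparing the structure of the two graphs (e.g., their clustering or community structure) is one way to contrast who collaborates with which ideas travel together.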
But if we perform a more fine-grained analysis based on the different kinds of journals where research is published and the backgrounds of those researchers, we can identify distinct subdiscourses that follow their own logic and history and create their own conceptual communities. Network analysis allows us to quantify these differences (see Painter, Daniels, and Laubichler 2021 and Painter, Daniels, and Jost 2019). These results challenge some of the perceptions that evolutionary medicine is a unified interdisciplinary discourse. They also suggest that we need comparative analyses of other interdisciplinary fields in order to investigate whether these are common patterns. Of course, this can only be done in the context of computational history, because traditional methods are ill equipped to process large amounts of data. Ideally we
can base these analyses on linked databases with standardized storage and metadata standards.

We also looked at the patterns of collaboration in the evolutionary medicine corpus. Here we found a surprisingly low number of interdisciplinary collaborations between individuals who explicitly identify as interested in evolutionary medicine. At this level the field clearly lacks interdisciplinarity.

These results of computational analyses allow us to better frame the discussion about interdisciplinarity by distinguishing different layers as well as providing quantitative evidence. Our analyses complement historical narratives of individual cases and can detect successes and failures, and therefore they provide a broader context for science policy. Learning to complement traditional historical scholarship with computational methodologies will prove essential for established historians and budding scholars, as future analyses may seem incomplete without some form of data analysis to accompany them. In addition, it is also a prime example of the evolution of knowledge.

The Evolutionary Medicine Corpus

A corpus is essentially a collection of texts, usually about a particular person or topic. When one sets out to create a corpus of texts, there are several considerations that must be dealt with. Reppen (2010) identifies several useful examples. Clearly defined questions are the pillar of corpus creation. We chose to define our corpus as broadly as possible with the driving question, How successful has evolutionary medicine been at bringing evolutionary biology into medicine? We justify using a broad definition because of the interdisciplinary nature of the phenomenon we study. By choosing to be more inclusive rather than restrictive, we are able to include disciplines that begin outside the scope of evolutionary medicine and then map their path of integration.
With more restrictive criteria, new disciplines would simply appear as part of evolutionary medicine, with no way to measure how they got there. An inclusive perspective provides a more complete picture of evolutionary medicine. This corpus is the same evolutionary medicine corpus used in Painter, Daniels, and Laubichler (2021) and Painter, Daniels, and Jost (2019). It was created using two groups:

1. individuals who self-identify as interested in evolutionary medicine from the International Society for Evolution, Medicine, and Public Health (ISEMPH) global directory for interested scholars, clinicians, students, and community supporters (EvMedNetwork) (Nesse 2019) and



Deryc T. Painter, Julia Damerow, and Manfred D. Laubichler

2. contributing authors to the two major evolutionary medicine textbooks available at the time (Gluckman, Beedle, and Hanson 2009; Trevathan, Smith, and McKenna 1999).

For each individual in these groups, a comprehensive list of publicly available publications was obtained. This proved to be a herculean task, as the individuals first had to be disambiguated. Disambiguation involves correctly attributing authorship: it is a well-known issue that scientists do not always publish under exactly the same version of their name. Middle initials may be included or excluded depending on journal practice and personal preference, and in some cultures many people share exactly the same name. Disambiguation is an important step, but perfection is not required for large corpora; small imperfections have been estimated to have little impact on the overall results (Barabási et al. 2002; Newman 2001).

The Clarivate database Web of Science (WoS) was queried for each member's publication history and provided exhaustive metadata, including author names, publication titles, and full citation records, totaling 13,564 metadata records. Of these, 1,241 were duplicate records identified by more than one individual; removing them left a corpus of 12,323 unique records. Figure 11.1 shows the number of evolutionary medicine publications each year, built from the evolutionary medicine metadata, compared to other biology disciplines heavily involved with evolutionary principles. Future work will explore how these growth rates scale to the total number of publications in the WoS.

The metadata were used to collect PDF versions of each publication, and the text was extracted where embedded text was available. Older PDFs were processed through ABBYY FineReader PDF, an optical character recognition tool (Heliński, Kmieciak, and Parkoła 2012).
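The two cleaning steps described here, merging name variants and dropping duplicate records, can be sketched in a few lines. This is a deliberately crude illustration, not the authors' pipeline: the record fields and the surname-plus-first-initial heuristic are our own simplifying assumptions.

```python
def name_key(author: str) -> tuple:
    """Collapse name variants like 'Nesse, Randolph M.' and 'Nesse, R. M.'
    to a shared (surname, first initial) key -- a crude disambiguation
    heuristic, not a full author-disambiguation algorithm."""
    last, _, first = author.partition(",")
    return (last.strip().lower(), first.strip()[:1].lower())

def dedupe(records):
    """Drop metadata records that share a title; duplicates occur when
    two tracked individuals coauthored the same paper."""
    seen, unique = set(), []
    for rec in records:
        key = rec["title"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

records = [
    {"title": "Dawn of Darwinian Medicine", "author": "Nesse, Randolph M."},
    {"title": "Dawn of Darwinian Medicine", "author": "Williams, G. C."},
    {"title": "Pleiotropy and Senescence", "author": "Williams, George C."},
]
# Name variants collapse to the same key; duplicate titles are merged.
assert name_key("Williams, G. C.") == name_key("Williams, George C.")
assert len(dedupe(records)) == 2
```

In practice one would also compare coauthor lists and affiliations before merging, since two distinct people can share a surname and initial.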
After we removed 5,867 text files that could not be deciphered because of poor-quality conversions, 53% of the corpus (6,456 full-text files) remained, distributed normally between authors with a background in biology and those with a background in clinical medicine. Our corpus thus effectively consists of two parts: the metadata for all the evolutionary medicine records and the remaining full-text documents.

The individuals in the EvMed corpus are doctors, nurses, veterinarians, industry research scientists, academic professors, gym owners, and other members of the general populace. Most people in the EvMed corpus came from the EvMedNetwork list, which is not a venue for publishing original research but a place to collect and localize knowledge about evolutionary medicine (Nesse 2019). Anyone interested in evolutionary medicine can sign up to be on the EvMedNetwork list. Not everyone on
the list has published in a peer-reviewed journal; therefore, not everyone will appear in the analyses. The corpus is representative of the people who contribute to or are interested in evolutionary medicine. While it undoubtedly includes some publications not pertaining directly to evolutionary medicine, the corpus is overwhelmingly complete and contains the vast majority of all evolutionary medicine journal articles and books available. This is supported by Alcock (2012), who reported similar statistics in an analysis of evolutionary medicine publication trends from 1991 to 2010.

When individuals sign up on the EvMedNetwork, they are given the opportunity to provide a short biography describing where they work and what they are interested in, as well as contact information. This information was used to classify individuals by their professional background and interests into three groups: evolution, medicine, and other. Often, individuals explicitly stated their professional backgrounds; others were researched through institutional biographies, lab web pages, Google Scholar accounts, PubMed IDs, and the like. Since we are interested in interdisciplinarity and the flow of knowledge, classifying expertise in this way is necessary in order to measure interdisciplinarity.

Growth and Development

Founder Effect

Every new species is shaped by a founder effect: its founding population carries only a subset of the genetic variation present in the parent species. These differences are further reinforced as populations become isolated and gradually evolve into a new species. The evolutionary potential of an emerging species is constrained by the amount of genetic variation present in its founding members.

New scientific fields are similarly influenced by their founders. Evolutionary medicine begins with "Dawn of Darwinian Medicine" (Williams and Nesse 1991) and the follow-up book Why We Get Sick: The New Science of Darwinian Medicine (Nesse and Williams 1994). These ideas grew into an endeavor to better educate medical professionals by introducing them to the general principles of evolution (Nesse et al. 2010). The two founders of evolutionary medicine introduced the initial conceptual DNA: George C. Williams, best remembered for his work on senescence and his hypothesis of antagonistic pleiotropy (Williams 1957), and Randolph M. Nesse, who, prior to evolutionary medicine, was a distinguished professor and practitioner of psychiatry at the University of Michigan. In a 2016 interview with David Sloan




11.1. Publications in the Web of Science. It is typical for new areas of research to begin with few publications and then grow exponentially after reaching a particular saturation threshold. Note that the y-axis is on a log scale. The general theme of slow growth followed by a rapid increase in publications is shared by all the examples.

Wilson, Nesse recounts how he found Williams in his search for answers about senescence (Wilson and Nesse 2016). Nesse remembers two main questions fueling his curiosity: (1) Why had natural selection not gradually eliminated genes that cause faster aging? and (2) Why is the body not better designed? These two questions eventually led Nesse to meet with Williams, and the process of the speciation of evolutionary medicine from medicine and evolutionary biology began.

Initial Growth

To understand the evolution of a scientific field, one must understand how early events in its development constrain and enable later stages. Here, we borrow the mechanistic approach of the field of developmental evolution to explore the role of key events in the growth and development of evolutionary medicine (Laubichler and Maienschein 2007). Figure 11.1 illustrates that, while there are individual variations, many scientific fields follow a typical pattern of growth: many years of relatively few publications, followed by a tipping point after which the number of yearly publications drastically increases. The growth and evolution of these fields are driven by specific events in their respective histories. The first such events in evolutionary medicine came in the early days after the publication of Why We Get Sick, when much of the goal of evolutionary medicine was focused on getting evolutionary biology into medical school curricula (MacCallum 2007; Nesse 2008a,
2008b). It was observed in 2003 that the majority of medical schools in the United States and the United Kingdom did not have a single evolutionary biologist on staff (Nesse and Schiffman 2003). Stephen C. Stearns, known for his work on life histories (Stearns 1992), was one of the earliest adopters of evolutionary medicine. Nesse and Stearns argued at length that evolutionary biology is a basic science and a necessary framework for fledgling doctors to organize their expansive medical fact base (Nesse and Stearns 2008). The collaborations between Nesse and Stearns led to symposia, which eventually grew into full-blown evolutionary medicine conferences. The first meeting of the Evolutionary Medicine Network in 2005, held at the Institute for Theoretical Biology, Humboldt University, Berlin, marks the tipping point in the transition away from relatively few publications per year.

Other developmentally significant events can be found along the growth curve shown in figure 11.1. The EvMed Review was founded in 2008 as an information nexus for the evolutionary medicine community. The year 2013 saw the creation of the first journal dedicated to evolutionary medicine, Evolution, Medicine, and Public Health. The simple act of mapping the number of publications revealed historical events worthy of further investigation, and a developmental evolution framework allows for such questions. For instance, after 2005, when the first meeting was held, evolutionary medicine experienced rapid growth in the number of publications, undoubtedly a result of increased exposure and legitimacy within the scientific community. And after the evolutionary medicine journal was created, providing a dedicated repository for publications about evolutionary medicine, more articles about evolutionary medicine than ever before were being published each year.

Population Dynamics

Evolutionary medicine is a diverse field. In our corpus the individuals are classified based on their self-reported expertise or by researching their publication history, and the journals in which they published are classified based on their assumed main reader base. Individuals were given one of three designations: evolution, medicine, or other. The EvMed corpus contained 549 individuals classified with an evolutionary biology background, 210 individuals who claimed a medical background, and 95 people classified as other. These are only the individuals registered in the EvMedNetwork and contributing to the textbooks; the entire corpus contains over 33,000 authors.

Not every individual registered in the EvMedNetwork had published in journals available in the WoS database. The WoS platform covers 34,200 journals, books, proceedings, patents, and datasets, with 151 million individual publication records, 37.2 million patents, and 7.3 million datasets. The journals were classified as biology, medical, or other based on their intended audience. The EvMed corpus contains publications from 623 biology journals, 1,520 medical journals, and 588 journals labeled "other"; the latter includes general-interest journals such as Science, Nature, and PNAS (Proceedings of the National Academy of Sciences). There are more than twice as many medical journals as biology journals, an interesting dichotomy given that the majority of individuals have an evolutionary biology background while the majority of journals containing articles about evolutionary medicine serve a medical audience.

11.2. Author Background and Publication Location. The authors in the evolutionary medicine corpus were labeled by their reported professional interests. Those with a clinical background were labeled medicine, and those with an evolutionary biology (or similar field) background were labeled evolution. The rest of the individuals were labeled other (not shown). The journals that published the evolutionary medicine publications were labeled in a similar fashion based on the journal's reader base: if a journal was most likely to be read by evolutionary biologists, it was labeled biology; if it was more likely to be read by a clinician, it was labeled medical. For the journals, there were also a general-interest category and an other category (both not shown).

Migration

Evolutionary medicine was created to bring the lessons learned from evolutionary biology into medicine. This represents a kind of knowledge migration, with knowledge moving from discipline to discipline. Here, the disciplinary background of the individual scientist represents the kind of knowledge they contribute, and the journal indicates the discipline to which this knowledge migrates.


When the publications themselves are examined, 3,924 articles were published in journals with the biology designation and 5,981 in medical journals, the opposite of what one might expect given the expertise of the individuals. This indicates a certain level of interdisciplinarity within evolutionary medicine. Figure 11.2 illustrates which author backgrounds were represented among the publications in each type of journal. In biology journals, 3,118 publications were authored by individuals with an evolutionary biology background; in our analogy, no knowledge is said to have migrated outside of biology. However, 4,220 publications in medical journals came from authors with an evolutionary background, indicating that knowledge from biology is migrating to medicine. In the reverse direction, 743 publications in biology journals came from authors with medical backgrounds, compared to 1,661 publications in medical journals from the same group. Surprisingly, only 65 publications in biology journals and 102 publications in medical journals contained authors with both medical and evolutionary biology expertise.

These results also show that interdisciplinarity is not as straightforward as one might expect. If we only examine the individuals or the journals they publish in, the population dynamics, evolutionary medicine is a successful interdisciplinary field. That is, 64% of the individuals have an evolution background, 25% have a medical background, and 11% have a miscellaneous background. When the journals are examined, roughly 50% of the total articles in the corpus are published in medical journals and 33% in biology journals, with the remaining 17% published in general-audience and unrelated journals. If we define interdisciplinarity only by who is publishing in a scientific field and where, evolutionary medicine appears to satisfy that definition. Figure 11.2 clearly shows how these researchers publish.
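A cross-tabulation like the one underlying figure 11.2 can be computed in a few lines; the records below are toy values standing in for the corpus data, not the actual counts.

```python
from collections import Counter

# Hypothetical records: (author background, journal type), one per publication.
publications = [
    ("evolution", "biology"), ("evolution", "medical"),
    ("evolution", "medical"), ("medicine", "medical"),
    ("medicine", "biology"), ("evolution", "biology"),
]

flows = Counter(publications)
total = sum(flows.values())

for (background, journal), n in sorted(flows.items()):
    print(f"{background:9s} -> {journal:7s}: {n} ({100 * n / total:.0f}%)")

# "Migrated" knowledge: publications whose journal audience differs
# from the author's home discipline.
migrated = sum(n for (bg, j), n in flows.items()
               if (bg, j) in {("evolution", "medical"), ("medicine", "biology")})
print("cross-discipline share:", round(migrated / total, 2))
```

On real data the input would be the classified WoS metadata records rather than a hand-written list, but the tally is the same.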
The majority of the articles in the EvMed corpus are published by individuals with an evolutionary background in medical journals, followed by individuals with an evolutionary background publishing in biology journals. This is not surprising given that the majority of individuals in the EvMedNetwork possess an evolutionary biology background. Figure 11.2 also illustrates that most of the articles in the corpus are published in medical journals.

Selection

Sometimes a historian of science engaged in computational analysis will need to assume the mantle of scientist. We can learn something about the conceptual ecosystem of evolutionary medicine by examining those who choose to publish about it. When we consider that twice
as many evolutionary biologists as clinical professionals are involved in evolutionary medicine, it is reasonable to conclude that evolutionary medicine is somehow a more attractive research agenda for evolutionary biologists than for medical professionals: they have a higher fitness in the evolutionary medicine conceptual ecosystem. If scientists tend to recruit other scientists into evolutionary medicine, and doctors other doctors, the difference in conceptual fitness shows up simply in participation rates. It helps to think about this in terms of selection: individuals with a high fitness for a particular environment thrive and reproduce, and this seems to have been the case for individuals with an evolutionary biology background upon entering the evolutionary medicine ecosystem.

Genotype

For the purpose of our argument, we use a simplified version of the genotype/phenotype distinction. We define the set of individuals identified from the EvMedNetwork list and the evolutionary medicine textbooks as the genotypes of evolutionary medicine; the equivalent of the genetic information is the knowledge encoded within these individuals. Examining how individuals collaborate with each other is then analogous to studying the interactions within the genotypes of evolutionary medicine.

So far, we have shown how a well-curated corpus with minimal processing can produce interesting insights into the structure of a scientific field. With the metadata from the EvMed corpus, we created coauthorship networks to examine macro-scale patterns of collaboration within evolutionary medicine. We are aware that this simplified metaphor is just that: simple. There is a plethora of ways in which knowledge can be transferred within a community, and we chose to examine collaborations as only one mode of transmission. Future work should include data on conference attendance, acknowledgments, committee participation, and the like.

Let us begin by defining our network graph as G = (V, E), where V is the set of nodes (sometimes referred to as vertices) representing authors, and E is the set of edges connecting the nodes, each edge representing shared authorship of a publication. Networks are a way to measure pairwise relationships between entities. Coauthorship networks contain edges that do not have a particular direction of information flow: an edge is simply present or absent. An edge can also be weighted based on how many times two people have coauthored papers together. Other kinds of networks have directed edges, as is the case with email networks, where an edge runs from the sender to the recipient. The coauthorship networks presented in this chapter are
undirected and weighted, meaning the edges imply mutual information flow: it doesn't matter whether we say A coauthored with B or B coauthored with A, and the edges store information about how many times two authors have collaborated. A network represents the formal nature of interactions without providing any access to their actual substance. For example, a coauthorship network tells us that two people wrote a paper together, but it doesn't tell us who was first author, what the paper was about, or what the arguments were.

Coauthorship networks are a type of social network in which the nodes are authors and the edges between them are a proxy for some kind of working relationship. Coauthorship networks are not new (Newman 2001); see Barabási et al. (2002), Bordons et al. (2015), Kumar (2015), Glänzel and Schubert (2004), and Li, Liao, and Yen (2013) for other ways in which coauthorship networks are beneficial. In this section, we focus on two main points:

1. From 2005 to 2006, the evolutionary medicine network underwent a drastic change in its coauthorship network structure.

2. We identified latent structures within the coauthorship networks by examining the edge Forman-Ricci curvature distributions.

Figure 11.1 illustrates that evolutionary medicine experienced slow growth in the decade following its inception. Many of the early evolutionary medicine articles are one- or two-author publications. This is the time when a scientific field decides its boundaries and determines who is part of it and who is not (Bettencourt et al. 2008; Cahlık and Jiřina 2006; Herrera, Roberts, and Gulbahce 2010). Figure 11.1 shows that this period lasted approximately ten years for evolutionary medicine. During this time, there were many publications about what evolutionary medicine is and why it should be in the medical school curriculum (LeGrand and Brown 2002; Lochmiller and Deerenberg 2000; Nesse and Berridge 1997; Nesse and Williams 1997; Trevathan, Smith, and McKenna 1999; Weiner 1998).
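The weighted, undirected coauthorship graph described above can be built directly from per-paper author lists. The sketch below uses hypothetical papers and authors, not the EvMed data.

```python
from collections import Counter
from itertools import combinations

# Toy author lists, one per publication (invented, not the EvMed corpus).
papers = [
    ["Nesse", "Williams"],
    ["Nesse", "Stearns"],
    ["Nesse", "Williams"],
    ["Stearns"],                # single-author paper: contributes no edges
]

# Undirected, weighted edges: a sorted author pair is one edge regardless
# of order, and the weight counts repeat collaborations.
edges = Counter()
for authors in papers:
    for a, b in combinations(sorted(set(authors)), 2):
        edges[(a, b)] += 1

# Degree of a node = number of distinct coauthors.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

print(dict(edges))   # {('Nesse', 'Williams'): 2, ('Nesse', 'Stearns'): 1}
print(dict(degree))  # {'Nesse': 2, 'Williams': 1, 'Stearns': 1}
```

The resulting degree counts are exactly what the degree distributions discussed in this section are built from.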
Then, a year after the first recorded evolutionary medicine conference at Humboldt University, the publication and collaboration patterns change drastically. More multiauthor articles are published, and the coauthorship network becomes scale-free and small-world for the first time. Scale-free networks follow a power law in their degree distribution d(n): the number of nodes of degree n is proportional to n^(−α), with the exponent α typically between 2 and 3 (Albert and Barabási 2002). Degree distributions are a direct result of the normative publishing habits of a particular field. The degree of a node is simply the number of neighbors connected to that node. A small-world network is one in which most nodes are not neighbors of one another, yet any two nodes can be reached from each other through a short chain of intermediate neighbors. Barabási et al. (2002) and Newman (2001) show that degree distributions from coauthorship networks of medicine, neuroscience, astrophysics, and computer science are significantly distinct from one another because of the differing collaboration norms of those disciplines. For example, it is not uncommon in medicine or physics to publish with a very large number of coauthors, and in the history of science to publish with only one or two. Evolutionary medicine is the marriage of two distinct fields; as it matures, it begins to settle into its own normative collaboration habits.

While degree is a node property, the Forman-Ricci curvature is an edge property. Originally introduced in Forman (2003), it has proved useful for network analysis in Sreejith et al. (2017), Saucan et al. (2018), and Painter, Daniels, and Jost (2019). The curvature of an edge between two vertices X and Y is defined as

R(X, Y) = 4 − deg(X) − deg(Y).  (1)

The minus signs and the number 4 come from Riemannian geometry (see Jost 2017); in this study, we consider them historical conventions. Forman-Ricci curvature distributions can be used in network analysis to detect hidden communities that are not always visible when "just looking" at a network. Interestingly, the Forman-Ricci edge curvature distribution typically has more than one hump (Saucan et al. 2018), and these humps correspond to hidden communities within a network. In a coauthorship network, these communities are researchers who often publish within the same group of coauthors. This does not necessarily indicate exclusivity or elitism; rather, it is a signature of subgroups within a network.
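Once node degrees are known, equation (1) is a one-line computation per edge, and the distribution of its values is the humped histogram discussed here. The edge list below is an invented toy example.

```python
from collections import Counter

# Toy undirected coauthorship edges (hypothetical, not the EvMed corpus).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Forman-Ricci curvature of an edge, equation (1): R(X, Y) = 4 - deg(X) - deg(Y).
curvature = {(u, v): 4 - degree[u] - degree[v] for u, v in edges}

print(curvature)
# Edges attached to well-connected hubs get strongly negative curvature;
# the histogram of these values is the distribution whose peaks mark
# latent communities.
print(Counter(curvature.values()))
```

Here node A is a small hub (degree 3), so its edges all have curvature −1, while the peripheral edge (D, E) has curvature +1.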
Such subgroups could form around publication in a particular subfield or between complementary labs sharing a large grant. Evolutionary medicine is built from two major disciplines, evolutionary biology and medicine; it is therefore not surprising that the Forman-Ricci edge curvature distributions consistently display at least two major peaks. It is noteworthy that as evolutionary medicine transitioned between 2005 and 2006, a third, middle peak was beginning to develop. This peak is quite prominent when the EvMed corpus is analyzed as a whole. The third peak is interesting because 2005 to 2006 was a transitional period for evolutionary medicine, and the three peaks might suggest a practical and conceptual melding of evolutionary biology and medicine. The largest peak in figure 11.3i is symbolic of the large group


11.3a. Coauthorship Networks. (a) and (d) show a fundamental shift in publication patterns: (a) experienced more single-author publications than (d). Single-author publications are not shown in (a), (d), or (g). The collaboration patterns shifted between 2005 and 2006, as shown in (b) and (e): (b) appears as a random degree distribution, but (e) shows a shift toward a scale-free, small-world network. (c), (f), and (i) exhibit latent community structures, represented by the peaks in the Forman-Ricci distribution.

of evolutionary biologists, just as the smallest peak represents the group of medical professionals. Analysis of the Forman-Ricci edge curvatures also reveals a group of individuals with moderate curvature (curvature being inversely related to centrality), suggesting a latent interdisciplinary group. When all three journal types are considered together, 993 publication records contained two or more individuals identified in the EvMedNetwork list with an evolutionary background, while only 22 publications contained two or more individuals from the list with a medical background. Out of 13,564 publications, 231 contained at least one person
with an evolutionary background and at least one person with a medical background. It is striking that so few of the overall articles (231 of 13,564) contain at least one individual with an evolution background and at least one with a medical background. This small number of actual interdisciplinary collaborations is startling. Nesse and Stearns (2008) echo these findings, interpreting the citation maps in Rosvall and Bergstrom (2007) as showing very little community cross talk between evolutionary biology and medicine in 2007.

Interdisciplinarity is complex. We see a high level of interdisciplinary knowledge flow from evolution to medicine by virtue of individuals with an evolutionary background publishing in medical journals, which is exactly the goal set out by Williams and Nesse. However, evolutionary medicine lacks the interdisciplinary collaborations that some consider fundamental to an interdisciplinary field and to interdisciplinary research.

Phenotype

Scientific fields are composed of individuals and the journals in which they publish. In biology, a phenotype is the set of observable traits of an organism. The previous section illustrated how the individuals and the way they work together can be considered the genotype. Continuing with the evolution analogy, the knowledge produced by the individuals (that is, the articles they publish in journals) can be interpreted as the phenotype of a scientific field.

Keywords are embedded within each publication of the EvMed corpus and represent major themes or parts of an article. The Web of Science provides researchers with lists of author-supplied keywords and what it proprietarily refers to as KeyWords Plus. For this study, we identified our own keywords based on full-text analysis in WordSmith Tools, a computational linguistics program used to extract keywords from the plain text of the PDF files (Scott 2008). Before extracting keywords, each plain-text file is turned into a word list in which every unique word is catalogued alongside its frequency of occurrence. The word lists are then compared to a reference corpus meant to represent a general frequency distribution of words for a given language. In this study, we chose the Baker-Brown corpus for General American English (AmE06) (Baker 2006). The AmE06 corpus was selected because it is general and not specifically intended for scientific journals; we required a baseline that would remain uniform and would not exclude scientific keywords. A curated stoplist removed irrelevant words that contributed nothing to the keyword analysis. The keywords were identified using WordSmith Tools' default significance threshold p-value of 10^−6 in a standard chi-


11.4b. Keyword Co-occurrence Networks. Keyword co-occurrence networks are graphs of nodes representing keywords and edges that represent a shared publication. The keywords are grouped using the technique found in Waltman, Van Eck, and Noyons (2010) and visualized in Van Eck and Waltman (2009). The size of the keywords represents their frequency of occurrence. Only the highest frequency keywords are displayed, and the edges are removed to aid in visualization.


square test with Yates's correction for a 2 × 2 table (Yates 1934). The Yates-corrected Pearson's chi-squared statistic is

χ² = Σ (|O − E| − 0.5)² / E,

where O is the observed frequency and E is the expected frequency in each cell of the 2 × 2 table, the expected frequencies being derived from N, the total number of words; one such test is computed for each word. Keywords are subsequently normalized and combined for consistency through lemmatization ("organisms" combined with "organism," etc.).

The keywords from each publication are then used to create keyword co-occurrence networks (KCNs). Again, we formally define our network graph as G = (V, E), where V is the set of nodes, here representing keywords, and E is the set of edges, here representing a shared publication: if two keywords appear in an article together, they are connected in the network. Keywords are concise descriptions of content. KCNs have previously been used to measure trends in technology foresight research (Su and Lee 2010), innovation systems (Lee and Su 2010), environmental health and safety systems (Radhakrishnan et al. 2017), LED and wireless broadband patents (Choi and Hwang 2014), and library and information science. For a comprehensive overview of keyword co-occurrence networks, see Börner (2010) and Börner, Chen, and Boyack (2003).

Through KCNs, this study identifies salient topics in various divisions of evolutionary medicine. In figure 11.4, the KCNs are divided into the four major divisions illustrated in figure 11.2. Parsing the networks in this manner creates knowledge landscapes for the kind of information traveling from a specific expertise to an intended audience.

Figure 11.4a is a keyword co-occurrence network for individuals with an evolutionary background publishing in evolution journals. The network clusters are identified using the technique found in Waltman, Van Eck, and Noyons (2010) and visualized in VOSviewer (Van Eck and Waltman 2009). The topmost cluster focuses on social insects; it comes from a group of individuals using social insect behaviors to understand human behaviors in relation to disease. Below that is a cluster of keywords relating to infectious disease.
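The per-word keyword test described above amounts to a Yates-corrected chi-squared statistic over a 2 × 2 table (this word vs. all other words, study corpus vs. reference corpus). The sketch below is a generic illustration of that test, not WordSmith Tools' implementation, and the counts are invented.

```python
# Yates-corrected chi-squared keyword test for a single word:
# 2x2 table rows = (study corpus, reference corpus),
# columns = (this word, all other words).

def yates_chi2(word_count, corpus_size, ref_word_count, ref_size):
    observed = [
        [word_count, corpus_size - word_count],
        [ref_word_count, ref_size - ref_word_count],
    ]
    n = corpus_size + ref_size
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            row = sum(observed[i])
            col = observed[0][j] + observed[1][j]
            expected = row * col / n
            # Yates's continuity correction: subtract 0.5 from |O - E|.
            chi2 += (abs(observed[i][j] - expected) - 0.5) ** 2 / expected
    return chi2

# A word 200x over-represented in the study corpus scores very high ...
print(yates_chi2(400, 100_000, 20, 1_000_000))
# ... while a word with equal relative frequency scores near zero.
print(yates_chi2(10, 1_000, 100, 10_000))
```

In a full pipeline this statistic would be compared against the threshold corresponding to the chosen p-value for every word in the word list, keeping only significant words as keywords.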
The left cluster primarily deals with population genetics. Keywords related to sex and mate selection populate the rightmost cluster. Cancer is overwhelmingly represented in the bottom cluster. These five clusters are all centered around “evolution.” This is unsurprising as evolution is foundational to evolutionary biology. These are five main themes used by individuals
with an evolutionary background when they publish for an evolutionary reader base. When the boundary areas are examined, interesting connections form: life history theory bridges sex to cancer, and competition connects social insects, infectious disease, and sex. Boundary areas between the major clusters represent the language used to create a cohesive message.

We want to point out that real-world complex networks, like these keyword networks, are notoriously difficult to visualize. We chose to illustrate the networks in figure 11.4 as density clusters of the highest-frequency keywords, without edges, so that the casual reader can check our conclusions. The underlying edges create hidden subgraphs within the overall networks that are not readily apparent when the network is drawn with thousands of edges. Therefore, simplicity and comprehension dictate an alternative visualization, in which denser areas of the graph are colored darker and subgraphs are identified by their spatial proximity to one another.

In figure 11.4b, the major clusters are not as clearly divided as in figure 11.4a. Here, we see public-health-related keywords just above a large cluster containing keywords relating to aging, various types of cancer, and the central nervous system. This suggests that individuals with an evolutionary background view these three themes as closely related to public health when publishing for medical professionals. In this network, "evolution" is not a central term but a bridge between clusters. The dark cluster near the bottom right contains evolutionary psychology and sex keywords, connected through "evolution" to the cancer keywords. Two separate clusters contain a variety of psychiatric disorders. "Depression" (not the economic kind, although that may well lead to clinical depression) bridges to the public health cluster, implying that biologists view widespread depression as a growing public health concern.
The keyword co-occurrence network from medical professionals publishing in biology journals, figure 11.4c, clusters similarly to figure 11.4b. “Evolution” is once again the central unifying term. The groupings, however, are smaller and more numerous than those in figure 11.4a or figure 11.4b. The cluster in the lower-right area of the network is populated by keywords relating to climate change. This type of cluster appears only in this one network. Medical professionals view climate change as a clinical problem for scientists. It should also be noted that this cluster is not directly connected to “evolution” like the other clusters. This implies that the publications containing information about climate change view it as outside the purview of evolution. The border area on the left shows several keywords about the microbiome. The microbiome is the collection of microorganisms that live in conjunction

The Evolution of Evolutionary Medicine

with another organism, often in the gut or on the skin (Turnbaugh et al. 2007). The two clusters on the right side of the network both contain keywords relating to women’s health, mate choice, and reproduction. The cluster near the top and the cluster to its left focus on treating infectious diseases. The cancer cluster is shown near the bottom of the graph. Last, figure 11.4d is a keyword co-occurrence network from medical professionals publishing in medical journals. Here, we see a return to larger clusters, similar to the figure 11.4a network. However, the keywords within these clusters do not clearly map onto distinct content areas. There is a small cancer cluster near the center left of the network. As is to be expected, it is connected directly to “evolution.” In this network, however, “evolution” is not a central keyword but a bridge keyword, as in figure 11.4b. The cluster on the right contains neuroscience keywords mixed with psychiatric disorders. Above this cluster, women’s health is connected to “vaccine” and “pediatrics.” Continuing counterclockwise around the outside of the network, the next cluster is populated with arthritis keywords. Below this, a smaller cluster centers on “apoptosis,” regulated cell death. Afterward, we come to the cluster in the bottom-left section of the network. Here we find a group of keywords relating to illnesses of the digestive tract. The final cluster as we complete the loop around the network is the one on the bottom right. Human development and public health keywords dominate this cluster. The keyword co-occurrence networks in figure 11.4 illustrate three major insights into the phenotype of evolutionary medicine. 1. When individuals are publishing for their peers (e.g., figure 11.4a and figure 11.4d), the clusters of keywords are larger and defined primarily by content. These individuals know the boundaries of their content and frame it in such a way as to conform with the accepted practices of that field. 
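A keyword co-occurrence network of the kind analyzed here can be built in a few lines of code. The sketch below is an illustrative reconstruction, not the authors' pipeline: the per-publication keyword lists are invented for the example, whereas the chapter's real networks were built from Web of Science metadata.

```python
from itertools import combinations
from collections import Counter

# Hypothetical per-publication keyword lists (invented for illustration).
papers = [
    ["evolution", "cancer", "life history"],
    ["evolution", "public health", "depression"],
    ["cancer", "inflammation", "evolution"],
]

# Edge weight = number of publications in which two keywords co-occur.
edges = Counter()
for kws in papers:
    edges.update(combinations(sorted(set(kws)), 2))

# Degree of a keyword = number of distinct keywords it co-occurs with.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

print(edges[("cancer", "evolution")])  # 2
print(degree.most_common(1))           # [('evolution', 5)]
```

In this toy corpus, “evolution” co-occurs with every other keyword and so sits centrally in the network, mirroring the role the chapter reports for it in the biology-journal networks.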
We see the opposite occur when they publish outside their comfort zone (e.g., figure 11.4b and figure 11.4c). Here, clusters are smaller, more numerous, and less dominated by a singular theme of keywords. 2. There are recurring themes in evolutionary medicine that are specific to the authors and audience. Medical professionals publish on women’s health issues regardless of the intended audience. It is the connections to women’s health that change depending on the audience. In figure 11.4c and figure 11.4d, women’s health is closely connected to “evolution.” However, figure 11.4c shows women’s health as a bridge between the climate change cluster and infectious disease cluster, and figure 11.4d links it more closely to the mental health




cluster. This explicitly illustrates how KCNs can identify changes in context and usage. This is a reasonable conclusion because researchers with specialized knowledge will converse differently based on how much of that knowledge is shared by the audience. Cancer is a prominent cluster in biology journals, shown in figure 11.4a and figure 11.4c. Figure 11.4b illustrates biologists publishing extensively on cancer in medical journals, too. One might expect, given the clinical nature of cancer, that it would occupy a large central role in the network. However, for medical professionals publishing in medical journals (e.g., figure 11.4d), cancer forms a small, centrally located cluster. This indicates a fundamentally different conceptualization, grounded in the richness of its connections to other content areas through direct links to “inflammation,” “infection,” and “disease.” 3. “Evolution” is not a frequent or central keyword in medical journals. Figure 11.4b puts “evolution” between chronic illnesses, sex, and evolutionary psychology. It is completely disconnected from psychiatric diseases like “trichotillomania” (a hair-pulling disorder), “obsessive compulsive disorder” (a disorder characterized by intrusive thoughts and compulsive behaviors), and “depression” (feelings of sadness and loss of interest). Figure 11.4d identifies evolution as low frequency, bridging the women’s health cluster to the small cancer cluster. Conversely, “evolution” is centrally located in the keyword co-occurrence networks of biology journals, as in figure 11.4a and figure 11.4c.

Some Initial Reflections Based on These Analyses

Here we have highlighted the difficulties of measuring interdisciplinarity and offered examples of how computational history of knowledge can produce rich, robust analyses of interdisciplinarity by using evolutionary biology analogies. 
We used a corpus built from a list of individuals who self-identified as interested in evolutionary medicine and included contributors from two evolutionary medicine textbooks. We gathered their metadata from the Web of Science and available publications. Were any of the individual aspects of evolutionary medicine to be considered separately, they would produce very different conclusions. If we were to measure the interdisciplinarity of evolutionary medicine only by the founders (founder effect), it would be considered interdisciplinary because Williams and Nesse were accomplished in the fields of evolutionary biology and psychiatry respectively. If one were to consider the backgrounds of only individuals who self-identify as interested in evolutionary medicine and contribute to


evolutionary medicine textbooks (population dynamics), it would appear that evolutionary biology is overrepresented because the majority of the individuals come from an evolutionary science background. This may be an artifact of a sort of academic fitness in the evolutionary medicine ecosystem (selection). Perhaps evolutionary biologists find it easier to get published when they write about evolutionary medicine, or they see themselves as having more to gain from participating in evolutionary medicine than medical professionals. Hopefully, this is not the case, as both sides stand to benefit from incorporating evolution, the foundation of biology, in medicine. It is possible to measure the interdisciplinarity of evolutionary medicine by examining how much knowledge moves between evolutionary biology and medicine (migration). The data show evolutionary biologists publishing the majority of their articles in medical journals—see figure 11.2. This is in line with the goals of evolutionary medicine, bringing evolution into medicine, and would be considered interdisciplinary. If we were to consider only the population of authors, who they publish with, and how they collaborate (genotype), we find that the number of collaborations between individuals with an evolutionary background and individuals with a medical background is staggeringly low. Were this the only consideration, evolutionary medicine would not be very interdisciplinary. Here, we explicitly examine the breakdown of collaborations by using coauthorship networks to understand the professional collaboration patterns of evolutionary medicine. We found that evolutionary medicine undergoes a drastic transformation between 2005 and 2006. Evolutionary medicine begins to consist of more multiauthor publications in 2006. The degree distribution shifts from a random distribution to a scale-free distribution. This change was also found in the Forman-Ricci curvatures as hidden subgroups began to form. 
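The two network diagnostics mentioned here, degree distributions and Forman-Ricci curvature, are straightforward to compute from an edge list. The sketch below uses a hypothetical hub-and-spoke coauthorship graph (not the EvMed data) and the simplest combinatorial form of Forman-Ricci curvature for unweighted edges, F(u, v) = 4 - deg(u) - deg(v); the chapter's own analysis may use a weighted or augmented variant.

```python
from collections import Counter

# Toy coauthorship edge list: author 0 is a hub with four collaborators.
edges = [(0, 1), (0, 2), (0, 3), (0, 4)]

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

def forman_ricci(u, v):
    # Combinatorial Forman-Ricci curvature of an unweighted edge.
    return 4 - degree[u] - degree[v]

# Degree distribution: fraction of nodes with each degree.
n = len(degree)
dist = {k: c / n for k, c in Counter(degree.values()).items()}

print(dist)                                    # {4: 0.2, 1: 0.8}
print([forman_ricci(u, v) for u, v in edges])  # [-1, -1, -1, -1]
```

Highly negative curvature values concentrate on hub edges, which is one way such an analysis can flag the emergence of tightly connected subgroups as a network shifts toward a scale-free degree distribution.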
Again, this would be considered interdisciplinary. Finally, we examine the content of the publications (phenotype). By extracting the keywords from publications in the EvMed corpus, we dissected the substance of the information produced by biologists and medical professionals flowing into biology and medical journals. We found that there are recurring themes dependent on the author and the intended audience. Overall, KCNs showed a unified network of keywords integrating evolution and medicine in different ways depending on the audience. Regardless of the audience, the KCN analysis suggests evolutionary medicine is interdisciplinary. In conclusion, we discovered that evolutionary medicine is, to a certain extent, a truly interdisciplinary field. We show there are many ways to measure interdisciplinarity, and depending on the measurement, separate observers may draw vastly different conclusions. Thus, in order to truly measure interdisciplinarity, one must take a pluralistic approach and measure multiple aspects of a field with multiple definitions of interdisciplinarity. In evolutionary medicine, the knowledge being transferred between disciplines, evolution into medicine and vice versa, is substantial. The majority of the publications come from biologists publishing in medical journals. However, actual interdisciplinary collaborations between individuals with different backgrounds are sorely lacking. Less than 1% of all the publications in the EvMed corpus were authored by someone with an evolutionary background collaborating with someone from a medical background. This speaks to the overall difficulty of actual interdisciplinary research in the current academic environment (Maglaughlin and Sonnenwald 2005; Sá 2008; Van Rijnsoever and Hessels 2011). Computational history of knowledge is still growing, with many parallels to evolutionary medicine. Computational techniques from complexity science, linguistics, and the social sciences are being incorporated into the traditional history of science realm. There is much work left to be done. Many of these analyses need robust, large-scale studies across many fields. Only with a multitude of data can computational historians begin to make broad generalizations about the patterns in science. This study is limited in scope to the genesis and evolution of evolutionary medicine. Future studies could apply these same techniques to evolutionary biology and medicine separately to examine how the seeds of evolutionary medicine, and other fields, began to take root within the parent disciplines. We hope this study will provide a guiding path for any “traditional” historians interested in incorporating computational techniques into their research. 
Computational methods allow traditional historians to analyze their data at much larger scales than analog methods. Traditional historical methods provide colorful subtext for the data gathered by the computational historian. While these may be seen today as separate divisions, they will grow and develop together, not as distinct endeavors. This can be achieved only by incorporating computational methods into history of science programs. Computational techniques should be a core component of a history of science education. Perhaps there are more parallels between history of science and evolutionary medicine than meets the eye.


Introduction: Tools, Tests, and Data 1. It could be argued, though, that science of science is much older and dates back to the 1930s. According to the former chairman of Thomson ISI, Eugene Garfield, “science of science” was back then what we now call scientometrics (Garfield 2009). 2. Still, the references to philosophy of science in science of science reviews are quite limited. The review paper by Fortunato and colleagues (2018) does not refer to a single paper published in a philosophy of science journal. 3. Today, much of the computing is done by computers because this is often more efficient or convenient, but computation clearly does not require computers. In fact, sometimes more insight is yielded when the computation is done by a person. For instance, “simple probabilistic models . . . often give a better glimpse into their overall behavior when they are stated as formulas instead of as computer programs” (Anderson 2014, 6). Chapter 1: Five Models of Science, Illustrating How Selection Shapes Methods 1. Unfortunately, Amgen signed secrecy agreements with the scientists who produced the original work, so the specific studies that failed to replicate are unknown (Harris 2017). 2. In reality, many scientific results carry explicit uncertainty and are more probabilistic than implied by my simple dichotomous characterizations of hypothesis states and experimental outcomes. I use these dichotomies for ease of explanation, as an excessive appeal to nuance can hinder the development of theoretical explanations (Healy 2017). Readers interested in nuance of this sort are directed to Gelman and Carlin (2014), as well as many other works in the philosophy of science.


Notes to Pages 22–46

3. The model portrays hypothesis selection as preceding investigation. In reality, new hypotheses do sometimes arise at various stages in the scientific process, but for simplicity we organize the model of science in this stylized way. 4. Two related studies are worth mentioning here. First, Romero (2016) examined how meta-analysis of a publication record could fail to correctly identify real effects under constraints of limited budgets, investigator biases, and publication bias. Second, Nissen et al. (2016) examined how the accumulation of biased results can lead to the “canonization of false facts.” 5. It should go without saying that you should never do this. 6. This is essentially the logic of signal detection theory. 7. Of course, many individual studies had quite high power. 8. The results are qualitatively the same as long as replications are worth more than nothing and less than the prestige of a novel finding. 9. Selection for publication is of course not the only force shaping the cultural evolution of science, and methodological rigor is not the only behavior that cultural selection acts upon. For example, Holman and Bruner (2017) showed how industry funding can shape results to align with corporate interests in a manner reminiscent of Noam Chomsky’s famous quip to a journalist: “I’m sure you believe everything you’re saying. But what I’m saying is if you believed something different, you wouldn’t be sitting where you’re sitting.” Likewise, O’Connor (2019) used a model similar to ours to study how selective forces shape risk-taking and conservativism in hypothesis selection. 10. Of course, this result might be at least partly driven by researchers being more likely to use registered reports when they are less confident in their hypotheses. Chapter 2: Pooling with the Best 1. Relatedly, Huttegger, Skyrms, and Zollman (2014) consider the learning dynamic probe-and-adjust and apply this dynamic to the Bala-Goyal network game. 
Probe-and-adjust is a low-rationality learning rule, and they find that even this extraordinarily simple learning dynamic leads to the formation of a social network that enables efficient information transfer. 2. Since polarization isn’t the central focus of this paper, we mention only in passing a number of formal models that have been used to study this phenomenon. 3. To clarify, this means individuals are more likely to consult those they pooled with even though some of the agents in question failed to get the right answer. Alternatively, the individual may reinforce their relationship with only those who themselves identify the correct state of the world. While this may appear to be a more reasonable assumption, we believe this chapter has demonstrated that epistemically beneficial communication networks can naturally emerge even when individuals are not reinforcing in a particularly fine-grained fashion. 4. In particular, we use the so-called Herrnstein reinforcement dynamics (Skyrms 2010). Agents reinforce over all possible subsets of agents they can consult.


In our simulations, we limit the number of agents an individual can consult to two or three, although in principle the agent could consult the entire community. We feel, however, that unlimited communication is infeasible due to the cost associated with deliberation and information acquisition. 5. There are also similarities to the opinion dynamics model of Hegselmann and Krause (2002). In their model, individuals have priors and then pool with others until the population converges on some belief. Unlike their basic model, however, we assume some agents are more reliable than others, place individuals on a network, and allow individuals to update their connections on the basis of reliability. 6. Furthermore, it may be the case that the associate comes after the focal agent in the randomly chosen order. In this case, the associate will not yet have had the opportunity to pool with the most reliable individual. 7. Likewise, bias could be modeled as a noise term. For instance, the individual’s evidence from nature could be construed as N(µ, σ²) + N(β_i, σ₁²). 8. Of course, in the real world this is hardly how pooling happens, as individuals often weigh the opinions of some more than others. In our model, individuals can still discriminate against those they believe to be unreliable, but they must treat all of the individuals they deem reliable in a similar fashion. This simple linear pooling model is considered for the sake of tractability. A more nuanced exploration involving a more sophisticated pooling procedure will be left for another time. 9. We are of course assuming here that individuals vote on the basis of their beliefs after pooling has taken place. If instead when individuals vote they ignore their current belief and just report the private signal they received from nature, this tension between group and individual epistemic rationality would not occur. Chapter 3: Promoting Diverse Collaborations 1. Thanks to Liam K. Bright for input in this discussion. 2. 
Freeman and Huang (2015) similarly find that ethnically diverse collaborative papers tend to be published in more prestigious journals and cited more often, possibly due to diversity of knowledge, though, in this case, also possibly due to network effects. That is, the authors might belong to different social networks so, together, they can inform more people about their work, resulting in increased citations. 3. Notice that this model assumes a “credit economy” for academics, where they are motivated to seek credit, just as others might be motivated by money. This is a standard assumption in formal social epistemology. 4. In focusing on two types, their models ignore the possibility of intersectional identities. For instance, a researcher might be both a woman and Black or both a man and a disabled person. O’Connor, Bright, and Bruner (2019) look at related, non-network models where there are multiple intersectional identities. They find that inequitable conventions can emerge across any division into types. Whenever




that happens, the work of Rubin and O’Connor (2018) predicts that homophily should emerge. Furthermore, our analysis of interventions to increase out-group collaboration should apply to these more complicated scenarios directly. 5. This is a result of asymmetries in their number of between-group links. In particular, between any two groups there is some number, n, of collaborative engagements. This means that when one group is in the minority, they will have more between-group collaborative links on average than the majority group. As a result, their strategy updates will tend toward more conservative, lower demands, which means that the chance they end up receiving less credit at equilibrium is increased. See Rubin and O’Connor (2018) for further details. 6. In the event of a tie for best, one of the best responses is chosen at random. 7. Data were collected for networks of one hundred agents with a high demand of 6, probability of taking an action set to 10%, a probability of updating a link rather than a strategy of 20%, minority size ranging from 10% to 50% of the population in intervals of 10%, and maximum number of links set to 3 or 9. Each combination of parameters was run one hundred times and for ten thousand rounds. A copy of the code has been made available on the Open Science Framework at 8. In fact, the proportion of majority expected to discriminate (i.e., the number of majority group members whose out-group strategy is to demand High) is very similar to what Rubin and O’Connor (2018) find, ranging from about 0.4 when the minority is 10% of the population to about 0.1 when the minority is 50% of the population. 9. The one exception to this is at π = 1.125, where there is a dip in actual discrimination. 
This is because both majority and minority members prefer fair between-group collaboration to fair within-group collaboration (5 · 1.125 > 5), while both (but most notably the minority) still prefer fair within-group collaboration to receiving the low payoff from between-group collaborations (5 > 4 · 1.125), meaning that they break off between-group links with discriminators. Note that even though the proportion of discrimination decreases at this point, the instances of discrimination increase. 10. This means not all the agents will have the maximum number of links at the start of the simulation, but those agents can form links up to the maximum as they update their collaborations. 11. The average homophily was estimated by forming ten thousand networks for each possible minority group size, then averaging over all data points. 12. The first two results from this section are affected in predictable ways. The minority group is still more likely to be discriminated against the smaller it is, but a greater proportion of the majority discriminate at every group size. This should be unsurprising, since we started with the majority more likely to discriminate. For the effect of π on the amount of homophily, the shape of the line is the same as in figure 3.1, but homophily varies from about 0.65 for π = 0.5 to about −0.15 for π


> 1.25. This is because we started the network with homophily around 0.18 and simulations were only run for ten thousand rounds. 13. Thanks to Jan-Willem Romeijn for this phrase. 14. Consider a simplified example to demonstrate this point. A minority group and a majority group member each have a fair within-group link, garnering a payoff of 5. Then an initiative corresponding to π = 1.5 is put in place, and they form a between-group discriminatory link. Now the minority group member receives a payoff of 6 and the majority group member a payoff of 9. Both agents receive higher payoffs than before, but inequity has increased—the majority group member benefits much more from the initiative. 15. There is some evidence suggesting that minority groups may organize in this way. For instance, Botts et al. (2014) find that Black philosophers tend to cluster in subfields. 16. Thanks to Liam K. Bright for this phrase. Such a “contagion” is reminiscent of the well-studied “devaluation view” in sociology of employment, which holds that “a change in the gender composition of an occupation will lead to a change in the valuation of the work being performed, leading to a change in occupations’ relative pay rates” (Levanon, England, and Allison 2009, 868). There is some evidence that such a contagion exists even before the niches become meaningfully segregated. For instance, Larivière et al. (2013) find that papers across many scientific disciplines for which women were the sole author, first author, or last author were cited less often. It could be interesting to apply the quali-quantitative methods outlined by Chavalarias, Huneman, and Racovski in chapter 4 of this collection to attempt to track the contagion and its epistemic consequences, beginning from such early moments. 17. When the activist Robin Morgan spoke to the American Home Economics Association convention in 1972, she declared, “As a radical feminist, I am here addressing the enemy” (Stage 1997, 1). 18. 
Thanks to Liam K. Bright again. Chapter 4: Using Phylomemies to Investigate the Dynamics of Science 1. For example, in evolutionary biology, the first approach, from processes to patterns, is well exemplified by the neutral theory in molecular evolution: here, the theoretical modeling of neutral evolution and the constancy of the substitution rate of nucleotides supports some ideas regarding patterns of evolution—e.g., the notion of an “evolutionary clock” (Kimura 1983). The second approach, from patterns to processes, is often found in paleobiology (Jablonski 2000; Sepkoski and Ruse 2009). For instance, the theory of “punctuated equilibria” elaborated by Gould and Eldredge (1977) relies on the fossil record to point at punctuated patterns of phenotypic evolution. This then sets the agenda for research about processes; the task consists in identifying the processes likely to promote such patterns—namely, stasis (long periods of almost no evolutionary change) and punctuation (short periods of important morphological and functional change in clades) (Raup et al. 1973; Erwin 2000). 2. Science is usually defined as a “1) body of knowledge, 2) method, and 3) way of knowing” (Abell and Lederman 2007, 833). Here we focus on science as body of knowledge. 3. Jinha (2010) estimated the total number of scientific papers published during the year 2009 as about 1,500,000. The same study estimated that the total number of scientific articles ever published reached the fifty million mark that year. There is a high margin of error for these estimates, but they give an idea of the scale of production. Medline, the biomedical database, alone had more than 1.28 million new records in 2019. pubmed?term=(%222018%2F01%2F01%22%5BDate%20-%20Create%5D%20 %3A%20%222019%2F01%2F01%22%5BDate%20-%20Create%5D). 4. For example, one extreme but influential position on this matter states that all differences between those elements are arbitrary or conventional and that one should turn to a descriptive approach integrating all those elements (Latour 1987). 5. Some philosophers and historians have endorsed this approach of the VPI project and have refined it to overcome its problems (see Burian 2002; Haufe 2016; Hull 1992; Scholl 2018). 6. In effect, natural selection tends to constrain genomic sequences—since genomic sequences decreasing mean fitness are counterselected—while stochastic processes assume equal fitness; therefore, all alleles being indifferent to reproductive success, all of them can be there and the amount of variation across the population regarding the focal sequence may be higher. Many tests have been designed (Tajima 1989; McDonald and Kreitman 1991), but the very idea is that the fact that selection or drift acts upon populations produces different patterns of genomic variation; thereby, and inversely, detected patterns signal the underlying processes. 7. 
One might, however, ask about the strategic importance, for the study and development of science, of making the scientific corpora we collectively produce publicly available at large scale for text- and data-mining purposes. 8. EU FP7 FET Open project TINA, #245412, with the principal investigator David Chavalarias. 9. See Wikipedia, s.v. “Timeline of Quantum Computing and Communication,” last modified January 10, 2022, quantum_computing for a timeline. 10. We define visualization software as macroscopes based on an analogy with the role that microscopes play in biology. Gargantext, 11. The analogy between phylomemies (and more generally phylogenies of cultural objects) and biological evolution goes in both directions. Bedau and Packard (1991) designed tests for the recognition of patterns of punctuation in technological evolution by identifying statistical patterns in phylogenies of patents. Woodberry,


Korb, and Nicholson (2009) imported this test into evolutionary biology in order to detect the punctuated equilibria hypothesized long ago by Gould and Eldredge (1977). Chapter 5: LDA Topic Modeling We are grateful to Jo Guldi and two anonymous referees for helpful comments on an earlier draft. CA also would like to thank the audiences for his talks at the LEAHPS II conference at the University of Hannover in August 2019 and in the University of Pittsburgh Mellon-Sawyer Information Ecosystems lecture series in February 2020. Finally, we are grateful for the invitation to contribute to this volume and the patience of the editors during the writing process. 1. See especially the chapter by Christophe Malaterre, Jean-François Chartier, and Davide Pulizzotto, and the chapter by Krist Vaesen. 2. Thanks to Jo Guldi for this observation. 3. Important examples include treatments of the journal Science by Blei and Lafferty (2006, 2007); of Cognition by Cohen Priva and Austerweil (2015); of the Journal of the History of Biology by Peirson et al. (2017); of the Proceedings of the Cognitive Science Society by Rothe, Rich, and Li (2018); and of Philosophy of Science by Malaterre, Chartier, and Pulizzotto (2019; see also chapter 9 in this volume by these authors). Similar projects have been pursued with other temporally sequenced datasets (see Brauer and Fridlund [2013] for an early review), from parliamentary debates in France (Barron et al. 2018) and Britain (Guldi and Williams 2018; Guldi 2019a, 2019b) to the eighteenth-century Encyclopédie (Roe, Gladstone, and Morrissey 2016) and nineteenth-century novels (Jockers 2013). 4. Many have made this point, but see Ravenscroft and Allen (2019) for specific discussion of the strengths and weaknesses of LDA for argument-based analysis in HPS. For general discussion of the limits of current AI/ML, see Mitchell (2019); Smith (2019); Marcus and Davis (2019). 
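For readers who want to try LDA topic modeling on a small scale, scikit-learn provides a standard implementation. The sketch below is purely illustrative (the four toy documents are invented) and is not the pipeline used in any of the studies cited in these notes.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus; real HPS applications fit LDA to thousands of articles.
docs = [
    "gene mutation selection drift population",
    "selection population fitness gene",
    "telescope orbit planet star",
    "star planet gravity orbit",
]

# Build a document-term count matrix, then fit a two-topic LDA model.
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Each document receives a probability distribution over the two topics.
doc_topics = lda.transform(X)
print(doc_topics.shape)  # (4, 2)
```

With only a handful of documents the inferred topics are unstable; the value of LDA in history and philosophy of science comes from fitting large, temporally sequenced corpora of the kind surveyed in note 3.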
Chapter 6: The Potential of Supervised Machine Learning for the Study of the Evolution of Science 1. Various algorithms can be used for supervised machine learning, including, among others, support vector machines (SVM), neural networks (or deep learning), applied alternating decision tree (ADTree), least absolute shrinkage and selection operator (LASSO), random forest (RF), ridge regression, elastic net regression (Enet), and conditional inference forest (CF). The extent to which these algorithms are useful in the context of the study of scholarly articles is an open question. 2. Much of the literature on machine learning and opacity is concerned with the question of whether opacity stands in the way of explanation and understanding (see Sullivan, forthcoming, and references therein). This question is tangential to my concerns. I am only interested in the use of SML in the context of scientific




cartography, which doesn’t so much target explanation as description of trends in science. Explanations of those trends, if given at all by scientific cartographers, would still come from techniques other than machine learning (e.g., qualitative study). Chapter 7: Help with Data Management for the Novice and Experienced Alike Thanks to the National Science Foundation for a workshop grant on Data Management, SES-1430608, and to Arizona State University for generous support. Chapter 8: How Not to Fight about Theory For extensive discussions during this project, thanks to Juan Escalona Mendez, Trevor Pearce, and Gregory Radick. Thanks to an audience at the Science of Evolution and the Evolution of the Sciences Conference, held at KU Leuven, especially Simon DeDeo, Cailin O’Connor, Jan Heylen, Erick Pierson, and Grant Ramsey; an audience at University College London, especially Luke Fenton-Glynn and Juan Camilo Chacon-Duque; and an audience at the University of Leeds HPS Centre Seminar, especially Alex Aylward, Richard Bellis, Ellen Clarke, Richard Forsyth, Juan Escalona Mendez, Gregory Radick, and Juha Saatsi. Thanks to Nature Publishing Group for access to the Nature content. 1. All of this sounds a bit evolutionary, of course, a fact not lost on either Kuhn himself (who draws the analogy at the end of Structure) or on David Hull (1988), who significantly expanded on the analogy between scientific theory change and evolution in his Science as a Process. 2. Access to the Nature corpus is provided courtesy of Nature Publishing Group to the evoText Project (Ramsey and Pence 2016), with thanks. 3. This standard story is, of course, a simplification of the real facts of the case—in particular, there is more continuity between the biology that immediately follows Darwin (and will be discussed here) and the Modern Synthesis than is accounted for by this view. 
My full argument for this claim can be found in a recent book (Pence 2022), though the discussion here differs from the story there in a number of ways, as this chapter was written years earlier.

4. Vorzimmer (1963) also convincingly argues that this is not merely due to Darwin’s reading of Fleeming Jenkin’s review of the Origin, as is often supposed. For more on the details of the substance of Jenkin’s argument and its relationship to the critical environment around the Origin, see Bulmer (2004).

5. For skeptical takes on Kim’s major thesis, see Barnes (1996) and Vicedo (1995). Vicedo helpfully questions the general utility of the categories “Mendelian” and “biometrician,” an important task I am forced to avoid here. Ankeny (2000) also disputes the usefulness of the “conversion” metaphor, at least for the case of Darbishire, another important issue with Kim’s view, and the reason that I persistently place “conversion” in scare quotes. For an important predecessor to Kim’s approach, see the similar community-of-actors approach in Rudwick (1985).

6. I have analyzed a portion of this content in detail elsewhere (Pence 2011, 2015).

7. Schuster, Pearl, and Shull contributed no articles to Nature. All the data discussed here, along with a detailed discussion of the methodology used, is available under the CC-BY license at An animated network visualization of the data discussed below is also available at https://na

8. One iteration of this “snowball sampling” proves to be the sweet spot—it provides a large enough corpus for analysis, whereas running a second iteration of snowball sampling would require the manual inspection of some 16,427 proper names that appear in the current dataset of 1,622 articles, a clearly impractical undertaking. The Stanford NER is too inaccurate to use its output without at least some sort of manual filtering. Edward Murray East and Richard South also had to be removed from analysis, as searching for instances of their last names (for obvious reasons) returns much more noise than signal.

9. I performed this “seed set with snowball sampling” method for two reasons: (1) to focus, to the extent that I could, on biological discussions in Nature, as opposed to those from other disciplines; and (2) to minimize confusion regarding common last names, so that, when a surname is mentioned, it is likely to refer to the relevant biologist.

10. Thanks to Trevor Pearce, in particular, for bringing this debate to my attention.

11. Writing for the official history of the journal, Ruth Barton even suggests that such controversies were actively courted by Nature’s early editorial team (unfortunately, this web resource now appears without author and citation information; see

12. Both this critique and Vicedo’s apt worry about the scope of Kim’s analysis, particularly the justification of which scientists are included and which are left out, are simply inherited by my work here. Many of the concerns with the analysis that I mentioned above are identified precisely because they would help us address Vicedo’s worries.

13. Kuhn himself argued that revolutions might happen on any scale, from those as broad as the Copernican to those as fine-grained as the development of a new piece of technical methodology by scientists studying the use of X-rays (Kuhn 1962, 92–93).

Chapter 9: Topic Modeling in HPS

The authors thank JSTOR for kindly providing access to the complete digitized corpus of Philosophy of Science. They also thank Grant Ramsey and Andreas De Block for their editorial initiative. Funding from Canada Foundation
for Innovation (Grant 34555), Canada Social Sciences and Humanities Research Council (Grant 435-2014-0943), and Canada Research Chair (CRC-950-230795) is gratefully acknowledged. Malaterre conceived the study, analyzed the results, and wrote the manuscript. Chartier and Pulizzotto collected and pretreated the corpus, ran the LDA analyses, and contributed to the manuscript.

Chapter 10: Bolzano, Kant, and the Traditional Theory of Concepts

Work on this chapter has been funded by NWO and OCLC under projects 277-20-007 and 314-99-117. The authors are grateful to Sylvia Pauw, Hein van den Berg, Anna Bellomo, Greta Adamo, and Davide Quadrellaro for discussion of an earlier draft.

1. Cf. Morscher (2018), which is also an excellent introduction to Bolzano’s ideas.

2. Among historical investigations alongside Betti (2010), see Roski (2017) for a reconstruction. Bolzano’s grounding is a relation between truths or collections of truths. When A is grounded in B, then Bolzano says that B is the ground of A, and A the consequence of B.

3. We make the point for the later Bolzano, thus confirming for this period the point made for the early Bolzano by Blok (2016, 216, 219).

4. Cf. de Jong 1995, 624; and Porphyry’s famous introduction to Aristotle’s Categories (Warren 1975, 34).

5. BolVis is not yet publicly available, as its back end is a count-based, word-embedding language modeling software, Ariadne, which is under development in industry and which our team is currently evaluating on the basis of an expert-controlled ground truth for a corpus containing virtually all of Willard Van Orman Quine’s works in English. The specifics of the Bolzano corpus we use here are contained in appendix A.

6. The appendix is available online at Concepts in Motion, “Ginammi et al. 2020: Appendix,”

7. 2bk is called the “Postulate of Classical Definition” in de Jong and Betti (2010, 190); it was widespread in the eighteenth century (see, e.g., the lexica of Zedler 1734, 409–10; and Walch 1726, 479, as well as Wolff 1732, 208–9; Müller 1733, 270; Layritz 1755, 90–91; and Meier 1762, 451). Note that, in terms of eighteenth-century theories such as those of Wolff and Kant, the postulate applies specifically to concepts, not intuitions, that is, to nominal definitions and the part of real definitions corresponding to nominal ones. For Kant in particular, see van den Berg 2014, 19–20; and Nunez 2014.

8. See, however, note 44 in the appendix.

9. For other relevant work that builds on de Jong (1995), see Anderson (2004, 2005).
10. In an email of September 3, 2019, Wim de Jong confirmed to the authors that it may be doubted that Kant took the logical essence to consist of the proximate parts only: in some places, Kant seems to suggest that he took both the proximate and the remote parts of concepts (i.e., all partial concepts) together to make up the logical essence. If he indeed held the latter, then, as we will see, Kant’s distinction between logical and real essence corresponds exactly to Bolzano’s distinction between foundational essence and essence in the broad sense. In this interpretative choice, we took the option that is least favorable to the argument of the present chapter (i.e., that Bolzano’s views on concepts are highly similar to Kant’s).

11. See Blok (2016, 208–9) for a passage from Bolzano (1810) showing that Siebel’s claim cannot be correct.

12. “Syntax,” Go, version go1.17.6, January 6, 2022, regexp/syntax/.

13. The initial instance of Ariadne uses a very large corpus of scientific articles; another, called LittleAriadne, uses a corpus of astrophysics journals (Koopman, Wang, and Scharnhorst 2017).

14. Among studies showing the relevance of DSMs for the digital humanities, see Recchia (2016) and, for philosophy in particular, Herbelot, von Redecker, and Müller (2012).

15. Predictive models use machine learning to train a pattern-recognizing system (usually an artificial neural network) to predict either the context of a word or the word itself by tweaking a large number of parameters. The parameter values that the system “learns” for a word are treated as the vector representation of that word. These vectors typically have far fewer dimensions than count vectors such as the one in table 10.1, but, unlike in count-based models, it is less clear what each dimension represents (see also chapter 5 by Allen and Murdock, this volume, for a critical discussion of the sophistication and transparency of computational methods).

16. See van Wierst et al. (2018) for (visual) details of the query output (results list) of BolVis.

17. In addition to the views reported here, we also found in WL §559 that Bolzano distinguishes between “essential” and “derivative” properties; this is the starting point for our second use case in appendix C.

18. Cf. on this point in the early Bolzano, Blok (2016, 215).

19. Vorstellung (idea) is Bolzano’s general term for those parts of a proposition that are not themselves a proposition. Concepts (Begriffe) are a special kind of idea for Bolzano, namely ideas that do not contain intuitions. To keep things simple, we translate both Vorstellung and Begriff as concept.

20. Bolzano’s rejection of this principle is regularly mentioned or discussed in the secondary literature—e.g., Centrone 2010; Lapointe 2011; Roski 2017, 23—though not, as we do here, in connection with the praedicabilia, the hierarchy of
concepts and the analytic/synthetic and a priori/a posteriori dichotomies. Blok (2016, 219) does make this connection for the early Bolzano.

21. “Definitions are no real judgments, namely no such judgments which belong to a science in itself; but historical judgments, which tell us that we have chosen this or that sign to signify this or that concept; judgments, which are necessary for the comprehension of science, but are not parts of science themselves. (Just as little as the paper, the letters, etc. on and with which a mathematical theory is printed, belong to this mathematical theory.)” Note that this text comes from a manuscript that has been printed in such a way in the BGA as to display all editorial insertions. Clever preprocessing makes it possible to extend searches in Ariadne to fragments such as these, which are otherwise virtually impossible to search via full exact matching. We transcribe the text from BolVis as is—including encoding problems (e.g., V[e]rst[√n]d[i]g[un]g).

22. “STSbenchmark,” IXA, mark, last modified January 22, 2019. STS datasets comprise image captions, news headlines, and user forums in English.

23. Note, en passant, that even if this result could be taken at face value, it would not mean that we could have “just read the Wissenschaftslehre.” First, there are key passages that come from other texts; second, if there had been none, the negative result that we did not obtain passages from other texts would be relevant in itself: it would save us the work of going through about 8,500 pages of text by hand in search of something that (probably) is not there.

24. An anonymous reviewer rightly pointed out that the computational methods we use do not substitute for human judgment but supplement it, and that they are fundamentally a heuristic. We agree. Our caution about the methods we employ can then be put like this: we don’t yet know how good these methods are as a heuristic.
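The count-based distributional models contrasted with predictive ones in note 15 can be sketched in a few lines of stdlib-only Python. This is a toy illustration, not Ariadne’s actual pipeline: the corpus, the window size of 2, and the word choices are all invented for the example. Each word gets a vector of context-word counts, and words that occur in similar contexts come out similar under cosine similarity.

```python
from collections import Counter, defaultdict
from math import sqrt

corpus = "the concept of ground the concept of essence every concept has parts".split()

# Count-based context vectors: for each word, count the words appearing
# within a +/-2 token window (one dimension per context word).
window = 2
vectors = defaultdict(Counter)
for i, word in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            vectors[word][corpus[j]] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[w] * v[w] for w in u)
    norm = lambda c: sqrt(sum(n * n for n in c.values()))
    return dot / (norm(u) * norm(v))

# "ground" and "essence" share the contexts "concept" and "of",
# so their count vectors are similar.
sim = cosine(vectors["ground"], vectors["essence"])
```

Unlike the learned parameters of a predictive model, every dimension here is interpretable: it is simply the count of one context word, which is the transparency point note 15 makes about count vectors such as the one in table 10.1.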


Abbasi, A., J. Altmann, and L. Hossain. 2011. “Identifying the Effects of Coauthorship Networks on the Performance of Scholars: A Correlation and Regression Analysis of Performance Measures and Social Network Analysis Measures.” Journal of Informetrics 5:594–607.
Abell, S.K., and N.G. Lederman. 2007. Handbook of Research on Science Education. Mahwah, NJ: Lawrence Erlbaum Associates.
Aggarwal, Charu C., and ChengXiang Zhai, eds. 2012. Mining Text Data. New York: Springer Science and Business Media.
Airoldi, Edoardo M., David M. Blei, Elena A. Erosheva, and Stephen E. Fienberg. 2014. Handbook of Mixed Membership Models and Their Applications. London: Chapman and Hall.
Akerlof, G.A., and P. Michaillat. 2018. “Persistence of False Paradigms in Low-Power Sciences.” Proceedings of the National Academy of Sciences 115 (52): 13228–33.
Akers, Katherine G., and Jennifer Doty. 2013. “Disciplinary Differences in Faculty Research Data Management Practices and Perspectives.” International Journal of Digital Curation 8:5–26.
Albert, R., and A.-L. Barabási. 2002. “Statistical Mechanics of Complex Networks.” Reviews of Modern Physics 74 (1): 47–97.
Alcock, J. 2012. “Emergence of Evolutionary Medicine: Publication Trends from 1991–2010.” Journal of Evolutionary Medicine 1:c1–12. jem/235572.
Al-Doulat, A., I. Obaidat, and M. Lee. 2018. “Unstructured Medical Text Classification Using Linguistic Analysis: A Supervised Deep Learning Approach.” 15th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA), Aqaba, Jordan.
Alexander, J.M., J. Himmelreich, and C. Thompson. 2015. “Epistemic Landscapes, Optimal Search, and the Division of Cognitive Labor.” Philosophy of Science 82 (3): 424–53.
Allen, C., and D.M.A. Mehler. 2019. “Open Science Challenges, Benefits and Tips in Early Career and Beyond.” PLOS ONE 17 (5): e3000246.
Ambrosino, A., M. Cedrini, J.B. Davis, S. Fiori, M. Guerzoni, and M. Nuccio. 2018. “What Topic Modeling Could Reveal about the Evolution of Economics.” Journal of Economic Methodology 25 (5): 329–48.
Anderson, B. 2014. Computational Neuroscience and Cognitive Modelling: A Student’s Introduction to Methods and Procedures. Thousand Oaks, CA: Sage.
Anderson, R. Lanier. 2004. “It Adds Up After All: Kant’s Philosophy of Arithmetic in Light of the Traditional Logic.” Philosophy and Phenomenological Research 69 (3): 501–40.
Anderson, R. Lanier. 2005. “The Wolffian Paradigm and Its Discontent: Kant’s Containment Definition of Analyticity in Historical Context.” Archiv für Geschichte der Philosophie 87 (1): 22–74.
Ankeny, Rachel. 2000. “Marvelling at the Marvel: The Supposed Conversion of A.D. Darbishire to Mendelism.” Journal of the History of Biology 33 (2): 315–47.
Anstey, Peter R. 2017. The Idea of Principles in Early Modern Thought: Interdisciplinary Perspectives. New York: Routledge.
Antes, Alison L., Heidi A. Walsh, Michelle Strait, Cynthia R. Hudson-Vitale, and James M. DuBois. 2018. “Examining Data Repository Guidelines for Qualitative Data Sharing.” Journal of Empirical Research on Human Research Ethics 13:61–73.
Antonovics, J., J.L. Abbate, C.H. Baker, D. Daley, M.E. Hood, C.E. Jenkins, and D. Sloan. 2007. “Evolution by Any Other Name: Antibiotic Resistance and Avoidance of the E-word.” PLOS Biology 5 (2): e30.
Arnold, C.W., A. Oh, S. Chen, and W. Speier. 2016. “Evaluating Topic Model Interpretability from a Primary Care Physician Perspective.” Computer Methods and Programs in Biomedicine 124:67–75.
Avin, S. 2018. “Policy Considerations for Random Allocation of Research Funds.” RT: A Journal on Research Policy and Evaluation 6 (1). https://doi.org/10.13130/2282-5398/8626.
Avin, S. 2019. “Mavericks and Lotteries.” Studies in History and Philosophy of Science Part A 76:13–23.
Ayala, Francisco J. 2009. “Darwin and the Scientific Method.” Proceedings of the National Academy of Sciences 106, Suppl. 1: 10033–39. https://dx.doi.org/10.1073/pnas.0901404106.
Baker, M. 2016. “Is There a Reproducibility Crisis? A Nature Survey Lifts the Lid on How Researchers View the ‘Crisis’ Rocking Science and What They Think Will Help.” Nature 533 (7604): 452–55.
Baker, P. 2006. Baker-Brown Corpus. Edinburgh: Edinburgh University Press.
Bala, V., and S. Goyal. 1998. “Learning from Neighbours.” Review of Economic Studies 65 (3): 595–621.
Bala, V., and S. Goyal. 2000. “A Noncooperative Model of Network Formation.” Econometrica 68 (5): 1181–229.
Baldwin, Melinda. 2015. Making Nature: The History of a Scientific Journal. Chicago: University of Chicago Press.
Barabási, A.-L., H. Jeong, Z. Néda, E. Ravasz, A. Schubert, and T. Vicsek. 2002. “Evolution of the Social Network of Scientific Collaborations.” Physica A: Statistical Mechanics and Its Applications 311 (3–4): 590–614.
Barjak, F., and S. Robinson. 2008. “International Collaboration, Mobility and Team Diversity in the Life Sciences: Impact on Research Performance.” Social Geography 3 (1): 23.
Barnes, Barry. 1996. “Review of Explaining Scientific Consensus: The Case of Mendelian Genetics by Kyung-Man Kim.” Isis 87 (1): 198–99.
Barnett, A.G. 2016. “Funding by Lottery: Political Problems and Research Opportunities.” mBio 7 (4): e01369-16.
Barrett, J.A., B. Skyrms, and A. Mohseni. 2019. “Self-Assembling Networks.” British Journal for the Philosophy of Science 70 (1): 301–25.
Barron, Alexander T.J., Jenny Huang, Rebecca L. Spang, and Simon DeDeo. 2018. “Individuals, Institutions, and Innovation in the Debates of the French Revolution.” Proceedings of the National Academy of Sciences 115 (18): 4607–12.
Bastian, Mathieu, Sebastian Heymann, and Mathieu Jacomy. 2009. “Gephi: An Open Source Software for Exploring and Manipulating Networks.” In Third International AAAI Conference on Weblogs and Social Media, 361–62. Palo Alto, CA: AAAI Publications.
Bateson, William. 1894. Materials for the Study of Variation, Treated with Especial Regard to Discontinuity in the Origin of Species. London: Macmillan.
Beaney, Michael. 2018. “Ancient Conceptions of Analysis.” In Stanford Encyclopedia of Philosophy, s.v. “Analysis” (summer ed.), edited by Edward N. Zalta.
Bear, J.B., and A.W. Woolley. 2011. “The Role of Gender in Team Collaboration and Performance.” Interdisciplinary Science Reviews 36 (2): 146–53.
Beatty, John H. 2016. “The Creativity of Natural Selection? Part I: Darwin, Darwinism, and the Mutationists.” Journal of the History of Biology 49 (4): 659–84.
Bedau, M.A., and N.H. Packard. 1991. “Measurement of Evolutionary Activity, Teleology, and Life.” In Artificial Life II, edited by C.G. Langton, C. Taylor, J.D. Farmer, and S. Rasmussen, 431–61. Reading, MA: Addison-Wesley.
Begley, C.G., and L.M. Ellis. 2012. “Drug Development: Raise Standards for Preclinical Cancer Research.” Nature 483 (7391): 531–33.
Bergstrom, C.T., J.G. Foster, and Y. Song. 2016. “Why Scientists Chase Big Problems: Individual Strategy and Social Optimality.” arXiv:1605.05822.
Bettencourt, L., D. Kaiser, J. Kaur, C. Castillo-Chavez, and D. Wojick. 2008. “Population Modeling of the Emergence and Development of Scientific Fields.” Scientometrics 75 (3): 495–518.
Betti, Arianna. 2010. “Explanation in Metaphysics and Bolzano’s Theory of Ground and Consequence.” Logique et Analyse 211:281–316.
Betti, Arianna, and Hein van den Berg. 2014. “Modelling the History of Ideas.” British Journal for the History of Philosophy 22:812–35.
Bezuidenhout, Louise M., Sabina Leonelli, Ann H. Kelly, and Brian Rappert. 2017. “Beyond the Digital Divide: Towards a Situated Approach to Open Data.” Science and Public Policy 44:464–75. scw036.
Binder, Jeffrey M. 2016. “Alien Reading: Text Mining, Language Standardization, and the Humanities.” In Debates in the Digital Humanities 2016, edited by Matthew K. Gold and Lauren F. Klein, 201–17. Minneapolis: University of Minnesota Press.
Bishop, D. 2018. “Luck of the Draw.” Nature Index, May 7. https://www.naturein
Bjork, B.C., A. Roos, and M. Lauri. 2009. “Scientific Journal Publishing: Yearly Volume and Open Access Availability.” Information Research: An International Electronic Journal 14 (1): paper 391.
Blei, David M. 2012a. “Probabilistic Topic Models.” Communications of the ACM 55:77–84.
Blei, David M. 2012b. “Topic Modeling and Digital Humanities.” Journal of Digital Humanities 2 (1). eling-and-digital-humanities-by-david-m-blei/.
Blei, David M., and John D. Lafferty. 2006. “Dynamic Topic Models.” In Proceedings of the 23rd International Conference on Machine Learning (ICML’06), 113–20.
Blei, David M., and John D. Lafferty. 2007. “A Correlated Topic Model of Science.” Annals of Applied Statistics 1 (1): 17–35.
Blei, David M., and John D. Lafferty. 2009. “Topic Models.” In Text Mining: Classification, Clustering, and Applications, edited by Ashok N. Srivastava and Mehran Sahami, 71–94. Boca Raton, FL: Chapman and Hall/CRC.
Blei, David M., Andrew Y. Ng, and Michael I. Jordan. 2003. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3 (March): 993–1022.
Blok, Johan. 2016. “Bolzano’s Early Quest for a Priori Synthetic Principles: Mereological Aspects of the Analytic-Synthetic Distinction in Kant and the Early Bolzano.” PhD thesis, University of Groningen.
Blondel, Vincent D., Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008. “Fast Unfolding of Communities in Large Networks.” Journal of Statistical Mechanics: Theory and Experiment 2008 (10): P10008. https://doi.org/10.1088/1742-5468/2008/10/P10008.
Bol, T., M. de Vaan, and A. van de Rijt. 2018. “The Matthew Effect in Science Funding.” Proceedings of the National Academy of Sciences 115 (19): 4887–90.
Bolzano, Bernard. 1803–1817. Bernard Bolzano Gesamtausgabe / Reihe II: Nachlass / Wissenschaftliche Tagebücher / Philosophische Tagebücher 1811–1817 / Philosophische Tagebücher 1803–1810. Erster Teil: Abt. B / Bd. 16. Edited by Jan Berg. Stuttgart: Frommann-Holzboog, 1981.
Bolzano, Bernard. 1810. Beyträge zu einer begründeteren Darstellung der Mathematik. Prague: Caspar Widtmann; BGA I, 1.
Bolzano, Bernard. 1837. Wissenschaftslehre: Versuch einer ausführlichen und grösstentheils neuen Darstellung der Logik mit steter Rücksicht auf deren bisherige Bearbeiter. 4 vols. Sulzbach, Germany: J.E. v. Seidel. 2nd improved ed.: Leipzig: Felix Meiner, 1929, 1929, 1930, and 1931; repr., Aalen, Germany: Scientia, 1970 and 1981; BGA I, 11–14 (12 BGA vols.).
Bordons, M., J. Aparicio, B. González-Albo, and A.A. Díaz-Faes. 2015. “The Relationship between the Research Performance of Scientists and Their Position in Co-authorship Networks in Three Fields.” Journal of Informetrics 9 (1): 135–44.
Börner, K. 2010. Atlas of Science: Visualizing What We Know. Cambridge, MA: MIT Press.
Börner, K., C.M. Chen, and K.W. Boyack. 2003. “Visualizing Knowledge Domains.” Annual Review of Information Science and Technology 37 (1): 179–255.
Börner, K., R. Klavans, M. Patek, A.M. Zoss, J.R. Biberstine, R.P. Light, V. Larivière, and K.W. Boyack. 2012. “Design and Update of a Classification System: The UCSD Map of Science.” PLOS ONE 7 (7): e39464.
Börner, K., J.T. Maru, and R.L. Goldstone. 2004. “The Simultaneous Evolution of Author and Paper Networks.” Proceedings of the National Academy of Sciences 101:5266–73.
Boschini, A., and A. Sjögren. 2007. “Is Team Formation Gender Neutral? Evidence from Coauthorship Patterns.” Journal of Labor Economics 25 (2): 325–65.
Botts, T.F., L.K. Bright, M. Cherry, G. Mallarangeng, and Q. Spencer. 2014. “What Is the State of Blacks in Philosophy?” Critical Philosophy of Race 2 (2): 224–42.
Bourgine, P., N. Brodu, G. Deffuant, Z. Kapoula, J.P. Müller, and N. Peyreiras. 2009. “Formal Epistemology, Experimentation, Machine Learning.” In French Roadmaps for Complex Systems, edited by D. Chavalarias et al., 10–14. HAL Open Science.
Bourne, P.E., J.K. Polka, R.D. Vale, and R. Kiley. 2017. “Ten Simple Rules to Consider Regarding Preprint Submission.” PLOS Computational Biology 13 (5): e1005473.
Bowler, Peter J. 1992. The Eclipse of Darwinism: Anti-Darwinian Evolution Theories in the Decades around 1900. Baltimore, MD: Johns Hopkins University Press.
Box, George E.P., and Norman R. Draper. 1987. Empirical Model-Building and Response Surfaces. Hoboken, NJ: John Wiley and Sons.
Boyd, R., and P.J. Richerson. 1985. Culture and the Evolutionary Process. Chicago: University of Chicago Press.
Boyd, R., and P.J. Richerson. 1992. “Punishment Allows the Evolution of Cooperation (or Anything Else) in Sizable Groups.” Ethology and Sociobiology 13 (3): 171–95.
Boyd, R., and P.J. Richerson. 2002. “Group Beneficial Norms Can Spread Rapidly in a Structured Population.” Journal of Theoretical Biology 215 (3): 287–96.
Boyd-Graber, Jordan, Yuening Hu, and David Mimno. 2017. “Applications of Topic Models.” Foundations and Trends in Information Retrieval 11:143–296.
Braam, R., H.F. Moed, and A.F.J. van Raan. 1991a. “Mapping of Science by Combined Co-citation and Word Analysis; I: Structural Aspects.” Journal of the American Society for Information Science 42:233–51.
Braam, R., H.F. Moed, and A.F.J. van Raan. 1991b. “Mapping of Science by Combined Co-citation and Word Analysis; II: Dynamical Aspects.” Journal of the American Society for Information Science 42:252–66.
Brading, Katherine A., and Thomas A. Ryckman. 2008. “Hilbert’s ‘Foundations of Physics’: Gravitation and Electromagnetism within the Axiomatic Method.” Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics 39:102–53.
Brauer, René, and Mats Fridlund. 2013. “Historizing Topic Models: A Distant Reading of Topic Modeling Texts within Historical Studies.” In Cultural Research in the Context of “Digital Humanities”: Proceedings of International Conference 2–5 October 2013, edited by L.V. Nikiforova and N.V. Nikiforova, 152–63. St. Petersburg, Russia: Herzen State Pedagogical University and Publishing House Asterion.
Braunstein, S.L., C.M. Caves, R. Jozsa, N. Linden, S. Popescu, and R. Schack. 1999. “Separability of Very Noisy Mixed States and Implications for NMR Quantum Computing.” Physical Review Letters 83:1054. PhysRevLett.83.1054.
Briney, Kristin. 2015. Data Management for Researchers. Exeter, UK: Pelagic Publishing.
Brischoux, F., and F. Angelier. 2015. “Academia’s Never-Ending Selection for Productivity.” Scientometrics 103 (1): 333–36.
Bruner, J.P. 2019. “Minority (Dis)Advantage in Population Games.” Synthese 196 (1): 413–27.
Bruner, J.P., and B. Holman. 2019. “Self-Correction in Science: Meta-analysis, Bias and Social Structure.” Studies in History and Philosophy of Science Part A 78:93–97.
Bruner, J.P., and B. Holman. Forthcoming. “Complicating Consensus.” In Expert Disagreement and Measurement: Philosophical Disunity in Logic, Epistemology and Philosophy of Science, edited by L. Garbayo. Dordrecht: Springer.
Budden, A.E., T. Tregenza, L.W. Aarssen, J. Koricheva, R. Leimu, and J.J. Lortie. 2008. “Double-Blind Review Favours Increased Representation of Female Authors.” Trends in Ecology and Evolution 23 (1): 4–6.
Bulmer, Michael. 2004. “Did Jenkin’s Swamping Argument Invalidate Darwin’s Theory of Natural Selection?” British Journal for the History of Science 37 (3): 281–97.
Burian, R.M. 2002. “Comments on the Precarious Relationship between History and Philosophy of Science.” Perspectives on Science 10:398–407.
Cahlík, T., and M. Jiřina. 2006. “Law of Cumulative Advantages in the Evolution of Scientific Fields.” Scientometrics 66 (3): 441–49.
Callebaut, W., and R. Pinxten, eds. 2012. Evolutionary Epistemology: A Multiparadigm Program. New York: Springer Science and Business Media.
Callon, M., J.P. Courtial, and F. Laville. 1991. “Co-word Analysis as a Tool for Describing the Network of Interactions between Basic and Technological Research: The Case of Polymer Chemistry.” Scientometrics 22:155–205.
Callon, M., J.P. Courtial, W. Turner, and S. Bauin. 1983. “From Translations to Problematic Networks—An Introduction to Co-word Analysis.” Social Science Information sur les Sciences Sociales 22:191–235.
Camerer, C.F., A. Dreber, F. Holzmeister, T.H. Ho, J. Huber, M. Johannesson, M. Kirchler, G. Nave, B.A. Nosek, T. Pfeiffer, and A. Altmejd. 2018. “Evaluating the Replicability of Social Science Experiments in Nature and Science between 2010 and 2015.” Nature Human Behaviour 2 (9): 637–44.
Campbell, D.T. 1965. “Variation and Selective Retention in Socio-cultural Evolution.” In Social Change in Developing Areas: A Reinterpretation of Evolutionary Theory, edited by H. Barringer, G. Blanksten, and R. Mack, 19–49. Cambridge, MA: Schenkman.
Campbell, D.T. 1974. “Unjustified Variation and Selective Retention in Scientific Discovery.” In Studies in the Philosophy of Biology, edited by F.J. Ayala and T. Dobzhansky, 139–61. London: Palgrave Macmillan.
Campbell, D.T. 1976. Assessing the Impact of Planned Social Change. Technical report, Public Affairs Center, Dartmouth College, Hanover, NH.
Campbell, L.G., S. Mehtani, M.E. Dozier, and J. Rinehart. 2013. “Gender-Heterogeneous Working Groups Produce Higher Quality Science.” PLOS ONE 8 (10): e79147.
Cartieri, Francis, and Angela Potochnik. 2014. “Toward Philosophy of Science’s Social Engagement.” Erkenntnis 79 (S5): 901–16. s10670-013-9535-3.
Cavalli-Sforza, L.L., and M.W. Feldman. 1981. Cultural Transmission and Evolution: A Quantitative Approach. Princeton, NJ: Princeton University Press.
Centrone, Stefania. 2010. “Der Reziprozitätskanon in den Beyträgen und in der Wissenschaftslehre.” Zeitschrift für Philosophische Forschung 64:310–30.
Chalmers, D.J. 2015. “Why Isn’t There More Progress in Philosophy?” Philosophy 90 (1): 3–31.
Chambers, C. 2017. The Seven Deadly Sins of Psychology: A Manifesto for Reforming the Culture of Scientific Practice. Princeton, NJ: Princeton University Press.
Chang, H. 2011. “Beyond Case-Studies: History as Philosophy.” In Integrating History and Philosophy of Science, edited by S. Mauskopf and T. Schmaltz, 109–24. Dordrecht: Springer.
Chang, Jonathan, Jordan Boyd-Graber, Chong Wang, Sean Gerrish, and David M. Blei. 2009. “Reading Tea Leaves: How Humans Interpret Topic Models.” Proceedings of the 22nd International Conference on Neural Information Processing Systems, Vancouver, BC, 288–96.
Chateauraynaud, F. 2003. Prospéro: Une technologie littéraire pour les sciences humaines. Paris: CNRS Éditions.
Chavalarias, D. 2016. “Reconstruction et Modélisation des Dynamiques Sociales et de l’Évolution Culturelle: Le tournant des Sciences Humaines et Sociales du XXIème siècle.” HDR diss., EHESS, Paris.
Chavalarias, D. 2020. “From Inert Matter to the Global Society: Life as Multi-Level Networks of Processes.” Philosophical Transactions of the Royal Society B: Biological Sciences 375:1796.
Chavalarias, D., and J.P. Cointet. 2013a. “Phylomemetic Patterns in Science Evolution—the Rise and Fall of Scientific Fields.” PLOS ONE 8:e54847.
Chavalarias, D., and J.P. Cointet. 2013b. “Science Phylomemy.” Places and Spaces.
Chavalarias, D., Q. Lobbé, and A. Delanoë. 2021. “Draw Me Science: Multi-Level and Multi-Scale Reconstruction of Knowledge Dynamics with Phylomemies.” Scientometrics 127:545–75.
Chen, C., Y. Chen, M. Horowitz, H. Hou, Z. Liu, and D. Pellegrino. 2009. “Towards an Explanatory and Computational Theory of Scientific Discovery.” Journal of Informetrics 3:191–209.
Choi, J., and Y.S. Hwang. 2014. “Patent Keyword Network Analysis for Improving Technology Development Efficiency.” Technological Forecasting and Social Change 83:170–82.
Cicchetti, D.V. 1991. “The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation.” Behavioral and Brain Sciences 14:119–86.
Cock, A.G., and D.R. Forsdyke. 2008. Treasure Your Exceptions: The Science and Life of William Bateson. New York: Springer.
Cohen, A., S. Pattanaik, P. Kumar, R.R. Bies, A. De Boer, A. Ferro, A. Gilchrist, G.K. Isbister, S. Ross, and A.J. Webb. 2016. “Organised Crime against the Academic Peer Review System.” British Journal of Clinical Pharmacology 81 (6): 1012–17.
Cohen, J. 1962. “The Statistical Power of Abnormal-Social Psychological Research: A Review.” Journal of Abnormal and Social Psychology 65 (3): 145–53.
Cohen Priva, Uriel, and Joseph L. Austerweil. 2015. “Analyzing the History of Cognition Using Topic Models.” Cognition 135:4–9.
Cole, S., and G.A. Simon. 1981. “Chance and Consensus in Peer Review.” Science 214 (4523): 881–86.
Collin, A. 2009. “Multidisciplinary, Interdisciplinary, and Transdisciplinary Collaboration: Implications for Vocational Psychology.” International Journal for Educational and Vocational Guidance 9 (2): 101–10.
Correia, Fabrice. 2010. “Grounding and Truth-Functions.” Logique et Analyse 53 (211): 251–79.
Corti, Louise, Veerle Van den Eynden, Libby Bishop, and Matthew Woollard. 2014. Managing and Sharing Research Data: A Guide to Good Practice. London: SAGE.
Creel, K. 2020. “Transparency in Complex Computational Systems.” Philosophy of Science 87 (4): 568–89.
Cunningham, J.T. 1896. “[Letter of July 30, 1896].” Nature 54 (1396): 295.
Currarini, S., M.O. Jackson, and P. Pin. 2009. “An Economic Model of Friendship: Homophily, Minorities, and Segregation.” Econometrica 77 (4): 1003–45.
Cyranoski, D., N. Gilbert, H. Ledford, A. Nayar, and M. Yahia. 2011. “Education: The PhD Factory.” Nature 472 (7343): 276–79.
Darwin, Charles. 1859. On the Origin of Species. London: Murray and Co.
Darwin, Charles. 1871. The Descent of Man, and Selection in Relation to Sex. London: John Murray.
Dawkins, R. 1976. The Selfish Gene. Oxford: Oxford University Press.
Dear, Peter. 1987. “Jesuit Mathematical Science and the Reconstitution of Experience in the Early Seventeenth Century.” Studies in History and Philosophy of Science Part A 18:133–75.
Deerwester, Scott, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. 1990. “Indexing by Latent Semantic Analysis.” Journal of the American Society for Information Science 41 (6): 391–407.
DeGroot, M.H. 1974. “Reaching a Consensus.” Journal of the American Statistical Association 69 (345): 118–21.
de Jong, W.R. 1995. “Kant’s Analytic Judgments and the Traditional Theory of Concepts.” Journal of the History of Philosophy 33:613–41.
de Jong, W.R. 2001. “Bernard Bolzano, Analyticity and the Aristotelian Model of Science.” Kant-Studien 92:328–49.




de Jong, W.R. 2010. “The Analytic-Synthetic Distinction and the Classical Model of Science: Kant, Bolzano and Frege.” Synthese 174 (2): 237–61. https://doi.org/10.1007/s11229-008-9420-9.
de Jong, W.R., and Arianna Betti. 2010. “The Classical Model of Science: A Millennia-Old Model of Scientific Rationality.” Synthese 174:185–203.
Dennett, Daniel. 1998. “A Conversation with Daniel Dennett by Harvey Blume.” Atlantic, December 9.
Depew, David J., and Bruce H. Weber. 1995. Darwinism Evolving: Systems Dynamics and the Genealogy of Natural Selection. Cambridge, MA: Bradford Books.
Deveugele, M., and J. Silverman. 2017. “Peer-Review for Selection of Oral Presentations for Conferences: Are We Reliable?” Patient Education and Counseling 100 (11): 2147–50.
Dewulf, Fons. 2021. “The Institutional Stabilization of Philosophy of Science and Its Withdrawal from Social Concerns after the Second World War.” British Journal for the History of Philosophy 29 (5): 935–53.
DHPS Consortium. 2013. “Digital HPS Metadata Manual.” Digital HPS Repository.
Dias, L., M. Gerlach, J. Scharloth, and E.G. Altmann. 2018. “Using Text Analysis to Quantify the Similarity and Evolution of Scientific Disciplines.” Royal Society Open Science 5:171545.
Dietrich, M.R., R.A. Ankeny, and P.M. Chen. 2014. “Publication Trends in Model Organism Research.” Genetics 198 (3): 787–94.
DiMaggio, Paul, Manish Nag, and David Blei. 2013. “Exploiting Affinities between Topic Modeling and the Sociological Perspective on Culture: Application to Newspaper Coverage of U.S. Government Arts Funding.” Poetics 41 (6): 570–606.
Donovan, A., and R. Laudan. 2012. Scrutinizing Science: Empirical Studies of Scientific Change. New York: Springer Science and Business Media.
Doris, J.M. 2002. Lack of Character: Personality and Moral Behavior. Cambridge: Cambridge University Press.
Douglas, Heather. 2010. “Engagement for Progress: Applied Philosophy of Science in Context.” Synthese 177 (3): 317–35.
Douglas, Heather. 2016. “Values in Science.” In The Oxford Handbook of Philosophy of Science, edited by Paul Humphreys, 609–30. New York: Oxford University Press.
Druery, C.T., and William Bateson. 1901. “Experiments in Plant Hybridization [Translation of Mendel, J. G., 1865, Versuche über Pflanzenhybriden].” Journal of the Royal Horticultural Society 26:1–32.
Eagly, A.H. 2016. “When Passionate Advocates Meet Research on Diversity, Does the Honest Broker Stand a Chance?” Journal of Social Issues 72 (1): 199–222.
Eklund, A., T.E. Nichols, and H. Knutsson. 2016. “Cluster Failure: Why fMRI Inferences for Spatial Extent Have Inflated False-Positive Rates.” Proceedings of the National Academy of Sciences 113 (28): 7900–7905.
ERC. 2017. “European Research Council Data Management Plan Template.” European Research Council.
Erwin, D.H. 2000. “Macroevolution Is More Than Repeated Rounds of Microevolution.” Evolution and Development 2:78–84.
Falconer, Hugh. 1852. Report on the Teak Forests of the Tenasserim Provinces. Calcutta: F. Carbery, Military Orphan Press.
Fanelli, D. 2012. “Negative Results Are Disappearing from Most Disciplines and Countries.” Scientometrics 90 (3): 891–904.
Fang, F.C., and A. Casadevall. 2016. “Research Funding: The Case for a Modified Lottery.” mBio 7 (2): e00422–16.
Faust, D., and P.E. Meehl. 1992. “Using Scientific Methods to Resolve Questions in the History and Philosophy of Science: Some Illustrations.” Behavior Therapy 23:195–211.
Feldon, D.F., J. Peugh, M.A. Maher, J. Roksa, and C. Tofel-Grehl. 2017. “Time to Credit Gender Inequities of First-Year PhD Students in the Biological Sciences.” CBE-Life Sciences Education 16 (1): ar4.
Ferber, M.A., and M. Teiman. 1980. “Are Women Economists at a Disadvantage in Publishing Journal Articles?” Eastern Economic Journal 6 (3/4): 189–93.
Fine, K. 2012. “Guide to Ground.” In Metaphysical Grounding, edited by F. Correia and B. Schnieder, 37–80. Cambridge: Cambridge University Press.
Fine, K. 2022. “Some Remarks on Bolzano on Ground.” In Bolzano’s Philosophy of Grounding, edited by Stefan Roski and Benjamin Schnieder, 276–300. Oxford: Oxford University Press.
Firth, John R. 1957. “A Synopsis of Linguistic Theory 1930–1955.” In Studies in Linguistic Analysis, edited by John R. Firth, 1–32. Oxford: Blackwell.
Forman, R.R. 2003. “Bochner’s Method for Cell Complexes and Combinatorial Ricci Curvature.” Discrete and Computational Geometry 29 (3): 323–74.
Fortunato, S., C.T. Bergstrom, K. Börner, J.A. Evans, D. Helbing, S. Milojević, A.M. Petersen, et al. 2018. “Science of Science.” Science 359 (6379): eaao0185.
Franco, A., N. Malhotra, and G. Simonovits. 2014. “Publication Bias in the Social Sciences: Unlocking the File Drawer.” Science 345:1502–5.
Franzoni, C., G. Scellato, and P. Stephan. 2011. “Changing Incentives to Publish.” Science 333:702–3.
Freeman, R.B., and W. Huang. 2015. “Collaborating with People Like Me: Ethnic Coauthorship within the United States.” Journal of Labor Economics 33 (S1): S289–S318.
Freese, Jeremy, and David Peterson. 2017. “Replication in Social Science.” Annual Review of Sociology 43:147–65.
French, John R.P., Jr. 1956. “A Formal Theory of Social Power.” Psychological Review 63 (3): 181–94.
Friedman, Michael. 1974. “Explanation and Scientific Understanding.” Journal of Philosophy 71:5–19.
Frodeman, R., J.T. Klein, and R.C.D.S. Pacheco. 2017. The Oxford Handbook of Interdisciplinarity. Oxford: Oxford University Press.
Garfield, E. 1955. “Citation Indexes for Science: A New Dimension in Documentation through Association of Ideas.” Science 122:108–11.
Garfield, E. 2009. “From the Science of Science to Scientometrics: Visualizing the History of Science with HistCite Software.” Journal of Informetrics 3 (3): 173–79.
Gayon, J. 1998. Darwinism’s Struggle for Survival: Heredity and the Hypothesis of Natural Selection. Cambridge: Cambridge University Press.
Gelman, A., and J. Carlin. 2014. “Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors.” Perspectives on Psychological Science 9 (6): 641–51.
Gennatas, E.D., J.H. Friedman, L.H. Ungar, R. Pirracchio, E. Eaton, L.G. Reichmann, Y. Interian, et al. 2020. “Expert-Augmented Machine Learning.” Proceedings of the National Academy of Sciences 117 (9): 4571–77.
Ghaffarzadegan, N., J. Hawley, R. Larson, and Y. Xue. 2015. “A Note on PhD Population Growth in Biomedical Sciences.” Systems Research and Behavioral Science 32:402–5.
Gibbons, C., S. Richards, J.M. Valderas, and J. Campbell. 2017. “Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance with Human-Level Accuracy.” Journal of Medical Internet Research 19 (3): e65.
Gibson, A., M.D. Laubichler, and J. Maienschein. 2019. “Focus: Computational History and Philosophy of Science.” Isis 110 (3): 497–501.
Giere, R.N. 1973. “The History and Philosophy of Science: Intimate Relationship or Marriage of Convenience?” British Journal for the Philosophy of Science 24 (3): 282–97.
Giere, R.N. 2011. “History and Philosophy of Science: Thirty-Five Years Later.” In Integrating History and Philosophy of Science, edited by S. Mauskopf and T. Schmaltz, 59–65. Dordrecht: Springer.
Gigerenzer, G., and U. Hoffrage. 1995. “How to Improve Bayesian Reasoning without Instruction: Frequency Formats.” Psychological Review 102 (4): 684.
Giordan, G., C. Saint-Blancat, and S. Sbalchiero. 2018. “Exploring the History of American Sociology through Topic Modelling.” In Tracing the Life Cycle of Ideas in the Humanities and Social Sciences: Quantitative Methods in the Humanities and Social Sciences, edited by A. Tuzzi, 45–64. Cham, Switzerland: Springer.
Gladkova, A., and A. Drozd. 2016. “Intrinsic Evaluations of Word Embeddings: What Can We Do Better?” In Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, 36–42. Berlin: Association for Computational Linguistics.
Glänzel, W., and A. Schubert. 2004. “Analysing Scientific Networks through Co-authorship.” In Handbook of Quantitative Science and Technology Research, edited by Henk F. Moed, Wolfgang Glänzel, and Ulrich Schmoch, 257–76. Dordrecht: Springer.
Gluckman, P.D., A.S. Beedle, and M.A. Hanson. 2009. Principles of Evolutionary Medicine. Oxford: Oxford University Press.
Glymour, Clark, and Frederick Eberhardt. 2016. “Hans Reichenbach.” In The Stanford Encyclopedia of Philosophy (winter ed.), edited by Edward N. Zalta.
Gneiting, Tilmann, and Adrian E. Raftery. 2005. “Weather Forecasting with Ensemble Methods.” Science 310 (5746): 248–49.
Golub, B., and M.O. Jackson. 2012. “Network Structure and the Speed of Learning: Measuring Homophily Based on Its Consequences.” Annals of Economics and Statistics 107/108:33–48.
Goodin, R., and K. Spiekermann. 2015. “Epistemic Solidarity as a Political Strategy.” Episteme 12 (4): 439–57.
Goodin, R., and K. Spiekermann. 2018. An Epistemic Theory of Democracy. Oxford: Oxford University Press.
Goodman, Alyssa, Alberto Pepe, Alexander W. Blocker, Christine L. Borgman, Kyle Cranmer, Merce Crosas, Rosanne Di Stefano, et al. 2014. “Ten Simple Rules for the Care and Feeding of Scientific Data.” PLOS Computational Biology 10:e1003542.
Gould, S.J. 1985. “Fleeming Jenkin Revisited.” Natural History 94 (6): 14–20.
Gould, S.J., and N. Eldredge. 1977. “Punctuated Equilibria: The Tempo and Mode of Evolution Reconsidered.” Paleobiology 3:115–51.
Griffiths, P.E., and K. Stotz. 2008. “Experimental Philosophy of Science.” Philosophy Compass 3 (3): 507–21.
Griffiths, Thomas L., and Mark Steyvers. 2004. “Finding Scientific Topics.” Proceedings of the National Academy of Sciences 101 (suppl 1): 5228–35. https://doi.org/10.1073/pnas.0307752101.
Grim, P., and D. Singer. 2020. “Computational Philosophy.” In The Stanford Encyclopedia of Philosophy (fall ed.), edited by Edward N. Zalta.
Grim, P., D.J. Singer, A. Bramson, W.J. Berger, J. Jung, and S. Page. 2018. “Representation in Models of Epistemic Democracy.” Episteme 17 (4): 1–21.
Gross, K., and C.T. Bergstrom. 2019. “Contest Models Highlight Inherent Inefficiencies of Scientific Funding Competitions.” PLOS Biology 17 (1): e3000065.
Guldi, Jo. n.d. “The Dangerous Art of Text Mining” (unpublished manuscript).
Guldi, Jo. 2019a. “The Measures of Modernity: Word Counts, Text Mining and the Promise and Limits of Present Tools as Indices of Historical Change.” International Journal for History, Culture and Modernity 7:899–939.
Guldi, Jo. 2019b. “Parliament’s Debates about Infrastructure: An Exercise in Using Dynamic Topic Models to Synthesize Historical Change.” Technology and Culture 60 (1): 1–33.
Guldi, Jo, and Benjamin Williams. 2018. “Synthesis and Large-Scale Textual Corpora: A Nested Topic Model of Britain’s Debates over Landed Property in the Nineteenth Century.” Current Research in Digital History 1. https://dx.doi.org/10.31835/crdh.2018.01.
Habernal, I., and I. Gurevych. 2017. “Argumentation Mining in User-Generated Web Discourse.” Computational Linguistics 43 (1): 125–79.
Haldane, J.B.S. 1964. “A Defense of Beanbag Genetics.” Perspectives in Biology and Medicine 7 (3): 343–60.
Harris, R. 2017. Rigor Mortis: How Sloppy Science Creates Worthless Cures, Crushes Hope, and Wastes Billions. New York: Basic Books.
Harris, Zellig S. 1954. “Distributional Structure.” Word 10:146–62.
Haufe, C. 2016. “Introduction: Testing Philosophical Theories.” Studies in History and Philosophy of Science Part A 59:68–73.
Healy, K. 2017. “Fuck Nuance.” Sociological Theory 35 (2): 118–27.
Heesen, R., and L.K. Bright. 2021. “Is Peer Review a Good Idea?” British Journal for the Philosophy of Science 72 (3): 635–63.
Hegselmann, R., and U. Krause. 2002. “Opinion Dynamics and Bounded Confidence: Models, Analysis and Simulation.” Journal of Artificial Societies and Social Simulation 5 (3).
Heliński, M., M. Kmieciak, and T. Parkoła. 2012. Report on the Comparison of Tesseract and ABBYY FineReader OCR Engines. Poznań, Poland: Poznań Supercomputing and Networking Center.
Helmane, I., and I. Briška. 2017. “What Is Developing Integrated or Interdisciplinary or Multidisciplinary or Transdisciplinary Education in School?” Signum Temporis 9 (1): 7.
Hendricks, V.F. 2006. Mainstream and Formal Epistemology. Cambridge: Cambridge University Press.
Hendricks, V.F. 2010. “Knowledge Transmissibility and Pluralistic Ignorance: A First Stab.” Metaphilosophy 41 (3): 279–91.
Henrich, J., and F.J. Gil-White. 2001. “The Evolution of Prestige: Freely Conferred Deference as a Mechanism for Enhancing the Benefits of Cultural Transmission.” Evolution and Human Behavior 22 (3): 165–96.
Herbelot, Aurélie, Eva von Redecker, and Johanna Müller. 2012. “Distributional Techniques for Philosophical Enquiry.” In Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, edited by Kalliopi Zervanou and Antal van den Bosch, 45–54. Avignon, France: Association for Computational Linguistics.
Hermisson, J., and P.S. Pennings. 2005. “Soft Sweeps: Molecular Population Genetics of Adaptation from Standing Genetic Variation.” Genetics 169:2335–52.
Herrera, M., D.C. Roberts, and N. Gulbahce. 2010. “Mapping the Evolution of Scientific Fields.” PLOS ONE 5 (5): e10355.
Hodge, M.J.S. 1985. “Darwin as a Lifelong Generation Theorist.” In The Darwinian Heritage: A Centennial Retrospect, edited by David Kohn, 207–43. Princeton, NJ: Princeton University Press.
Hofmann, Thomas. 1999. “Probabilistic Latent Semantic Indexing.” Proceedings of the Twenty-Second Annual International SIGIR Conference, 50–57. New York: Association for Computing Machinery.
Holman, B., and J.P. Bruner. 2015. “The Problem of Intransigently Biased Agents.” Philosophy of Science 82 (5): 956–68.
Holman, B., and J.P. Bruner. 2017. “Experimentation by Industrial Selection.” Philosophy of Science 84 (5): 1008–19.
Hooper, P.L., H.S. Kaplan, and J.L. Boone. 2010. “A Theory of Leadership in Human Cooperative Groups.” Journal of Theoretical Biology 265 (4): 633–46.
Howard, Don. 2003. “Two Left Turns Make a Right: On the Curious Political Career of North American Philosophy of Science at Midcentury.” In Logical Empiricism in North America, edited by Gary L. Hardcastle and Alan W. Richardson, 25–93. Minnesota Studies in the Philosophy of Science 18. Minneapolis: University of Minnesota Press.
Hu, Yuening, Jordan Boyd-Graber, Brianna Satinoff, and Alison Smith. 2014. “Interactive Topic Modeling.” Machine Learning 95 (3): 423–69. https://doi.org/10.1007/s10994-013-5413-0.
Hull, D.L. 1988. Science as a Process: An Evolutionary Account of the Social and Conceptual Development of Science. Chicago: University of Chicago Press.
Hull, D.L. 1992. “Testing Philosophical Claims about Science.” PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1992 (2): 468–75.
Huttegger, Simon M., and Brian Skyrms. 2008. “Emergence of Information Transfer by Inductive Learning.” Studia Logica 89 (2): 237–56.
Huttegger, Simon M., Brian Skyrms, and Kevin J.S. Zollman. 2014. “Probe and Adjust in Information Transfer Games.” Erkenntnis 79:835–53.
Huxley, Julian S. 1942. Evolution: The Modern Synthesis. London: Allen and Unwin.
Hyde, K.K., M.N. Novack, N. LaHaye, C. Parlett-Pelleriti, R. Anden, D.R. Dixon, and E. Linstead. 2019. “Applications of Supervised Machine Learning in Autism Spectrum Disorder Research: A Review.” Review Journal of Autism and Developmental Disorders 6 (2): 128–46.
Ioannidis, J.P.A. 2005. “Why Most Published Research Findings Are False.” PLOS Medicine 2 (8): e124.
Jablonski, D. 2000. “Micro and Macroevolution: Scale and Hierarchy in Evolutionary Biology and Paleobiology.” Paleobiology 26:15–52.
Jinha, A.E. 2010. “Article 50 Million: An Estimate of the Number of Scholarly Articles in Existence.” Learned Publishing 23:258–63.
Jockers, Matthew L. 2013. Macroanalysis: Digital Methods and Literary History. Champaign: University of Illinois Press.
Johnson, David, and Merry Bullock. 2009. “The Ethics of Data Archiving: Issues from Four Perspectives.” In The Handbook of Social Research Ethics, edited by Donna Mertens and Pauline Ginsberg, 214–28. Thousand Oaks, CA: SAGE.
Jones, Karen Spärck. 1972. “A Statistical Interpretation of Term Specificity and Its Application in Retrieval.” Journal of Documentation 28:11–21.
Jönsson, M.L., U. Hahn, and E.J. Olsson. 2015. “The Kind of Group You Want to Belong To: Effects of Group Structure on Group Accuracy.” Cognition 142:191–204.
Jost, J. 2017. Riemannian Geometry and Geometric Analysis. Heidelberg: Springer.
Katzav, J., and K. Vaesen. 2017. “On the Emergence of American Analytic Philosophy.” British Journal for the History of Philosophy 25 (4): 772–98.
Kawamura, T., K. Watanabe, N. Matsumoto, S. Egami, and M. Jibu. 2017. “Science Graph for Characterizing the Recent Scientific Landscape Using Paragraph Vectors.” In Proceedings of the Knowledge Capture Conference, 1–8. New York: Association for Computing Machinery.
Kendal, R.L., N.J. Boogert, L. Rendell, K.N. Laland, M. Webster, and P.L. Jones. 2018. “Social Learning Strategies: Bridge-Building between Fields.” Trends in Cognitive Sciences 22 (7): 651–65.
Kim, Kyung-Man. 1994. Explaining Scientific Consensus: The Case of Mendelian Genetics. New York: Guilford Press.
Kimura, M. 1983. The Neutral Theory of Molecular Evolution. Cambridge: Cambridge University Press.
Kindling, Maxi, Heinz Pampel, Stephanie van de Sandt, Jessika Rücknagel, Paul Vierkant, Gabriele Kloska, Michael Witt, et al. 2017. “The Landscape of Research Data Repositories in 2015: A Re3data Analysis.” D-Lib Magazine 23 (3/4).
Kitcher, P. 1989. “Explanatory Unification and the Causal Structure of the World.” In Scientific Explanation, edited by P. Kitcher and W. Salmon, 410–505. Minneapolis: University of Minnesota Press.
Kitcher, P. 1990. “The Division of Cognitive Labor.” Journal of Philosophy 87 (1): 5–22.
Klein, J.T. 2008. “Evaluation of Interdisciplinary and Transdisciplinary Research: A Literature Review.” American Journal of Preventive Medicine 35 (2): S116–S123.
Klev, Ansten. 2011. “Dedekind and Hilbert on the Foundations of the Deductive Sciences.” Review of Symbolic Logic 4:645–81.
Klev, Ansten. 2016. “Carnap on Unified Science.” Studies in History and Philosophy of Science Part A 59:53–67.
Kohlstedt, S.G. 2013. “Innovative Niche Scientists: Women’s Role in Reframing North American Museums, 1880–1930.” Centaurus 55 (2): 153–74.
Koopman, Rob, Shenghui Wang, and Gwenn Englebienne. 2019. “Fast and Discriminative Semantic Embedding.” In Proceedings of the 13th International Conference on Computational Semantics—Long Papers, 235–46. Gothenburg, Sweden: ACL.
Koopman, Rob, Shenghui Wang, and Andrea Scharnhorst. 2017. “Contextualization of Topics: Browsing through the Universe of Bibliographic Information.” Scientometrics 111:1119–39.
Koopman, Rob, Shenghui Wang, Andrea Scharnhorst, and Gwenn Englebienne. 2015. “Ariadne’s Thread: Interactive Navigation in a World of Networked Information.” In Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, 1833–38. New York: ACM.
Kovacs, G., C. Storment, and J. Rosen. 1992. “Regeneration Microelectrode Array for Peripheral Nerve Recording and Stimulation.” IEEE Transactions on Biomedical Engineering 39:893–902.
Kuhn, T.S. 1957. The Copernican Revolution: Planetary Astronomy in the Development of Western Thought, vol. 16. Cambridge, MA: Harvard University Press.
Kuhn, T.S. 1962. The Structure of Scientific Revolutions. Chicago: University of Chicago Press.
Kuhn, T.S., and J. Epstein. 1979. The Essential Tension. College Park, MD: American Association of Physics Teachers.
Kumar, S. 2015. “Co-authorship Networks: A Review of the Literature.” Aslib Journal of Information Management 67 (1): 55–73.
Lakatos, I. 1978. “Science and Pseudoscience.” Philosophical Papers 1:1–7.
Lambert, Ben, Georgios Kontonatsios, Matthias Mauch, Theodore Kokkoris, Matthew L. Jockers, Sophia Ananiadou, and Armand M. Leroi. 2020. “The Pace of Modern Culture.” Nature Human Behaviour 4:352–60. https://dx.doi.org/10.1038/s41562-019-0802-4.
Langford, J., and M. Guzdial. 2015. “The Arbitrariness of Reviews, and Advice for School Administrators.” Communications of the ACM 58 (4): 12–13.
Lankester, E. Ray. 1896. “Are Specific Characters Useful? [Letter of Jul. 16, 1896].” Nature 54 (1394): 245–46.
Lapointe, Sandra. 2010. “Bolzano, a Priori Knowledge, and the Classical Model of Science.” Synthese 174:263–81.
Lapointe, Sandra. 2011. Bolzano’s Theoretical Philosophy: An Introduction. Basingstoke, UK: Palgrave Macmillan.
Largent, Mark A. 2009. “The So-Called Eclipse of Darwinism.” In Descended from Darwin: Insights into the History of Evolutionary Studies, 1900–1970, edited by Joseph Cain and Michael Ruse, 3–21. Philadelphia, PA: American Philosophical Society.
Larivière, Vincent, Chaoqun Ni, Yves Gingras, Blaise Cronin, and Cassidy R. Sugimoto. 2013. “Bibliometrics: Global Gender Disparities in Science.” Nature News 504 (7479): 211.
Latour, B. 1987. Science in Action: How to Follow Scientists and Engineers through Society. Cambridge, MA: Harvard University Press.
Latour, B., and S. Woolgar. 1986. Laboratory Life: The Construction of Scientific Facts. Princeton, NJ: Princeton University Press.
Laubichler, M., and J. Maienschein. 2007. From Embryology to Evo-devo: A History of Developmental Evolution. Cambridge, MA: MIT Press.
Laudan, L. 1989. “Thoughts on HPS: 20 Years Later.” Studies in History and Philosophy of Science Part A 20:9–13.
Laudan, L., A. Donovan, R. Laudan, P. Barker, H. Brown, J. Leplin, P. Thagard, and S. Wykstra. 1986. “Scientific Change: Philosophical Models and Historical Research.” Synthese 69 (2): 141–223.
Lawrence, J., and C. Reed. 2020. “Argument Mining: A Survey.” Computational Linguistics 45 (4): 765–818.
Layritz, Paul Eugen. 1755. Erste Anfangsgründe der Vernunft-Lehre. Sulechów, Poland: Dendeler.
Lee, P.C., and H.N. Su. 2010. “Investigating the Structure of Regional Innovation System Research through Keyword Co-occurrence and Social Network Analysis.” Innovation 12 (1): 26–40.
LeGrand, E.K., and C.C. Brown. 2002. “Darwinian Medicine: Applications of Evolutionary Biology for Veterinarians.” Canadian Veterinary Journal 43 (7): 556–59.
Lehrer, K., and C. Wagner. 1981. Rational Consensus in Science and Society: A Philosophical and Mathematical Study. Dordrecht: D. Reidel.
Leitgeb, H. 2011. “Logic in General Philosophy of Science: Old Things and New Things.” Synthese 179 (2): 339–50.
Leng, K. 2013. “An ‘Elusive’ Phenomenon: Feminism, Sexology and the Female Sex Drive in Germany at the Turn of the 20th Century.” Centaurus 55 (2): 131–52.
Leonelli, S. 2015. “What Counts as Scientific Data? A Relational Framework.” Philosophy of Science 82:810–21.
Leonelli, S. 2016. Data-Centric Biology: A Philosophical Study. Chicago: University of Chicago Press.
Lester, Joseph. 1995. E. Ray Lankester and the Making of Modern British Biology. Edited by Peter J. Bowler. Oxford: British Society for the History of Science.
Levanon, A., P. England, and P. Allison. 2009. “Occupational Feminization and Pay: Assessing Causal Dynamics Using 1950–2000 US Census Data.” Social Forces 88 (2): 865–91.
Levy, O., and Y. Goldberg. 2014. “Neural Word Embedding as Implicit Matrix Factorization.” Advances in Neural Information Processing Systems 27:2177–85.
Leydesdorff, L. 1989. “The Relations between Qualitative Theory and Scientometric Methods in Science and Technology Studies: Introduction to the Topical Issue.” Scientometrics 15 (5–6): 333–47.
Li, E.Y., C.H. Liao, and H.R. Yen. 2013. “Co-authorship Networks and Research Impact: A Social Capital Perspective.” Research Policy 42 (9): 1515–30.
Li, G., W. Ge, J. Zhang, and M. Kwauk. 2005. “Multi-Scale Compromise and Multi-Level Correlation in Complex Systems.” Chemical Engineering Research and Design 83:574–82.
Linden, N., and S. Popescu. 2001. “Good Dynamics versus Bad Kinematics: Is Entanglement Needed for Quantum Computation?” Physical Review Letters 87:047901.
Lindenmayer, A. 1968. “Mathematical Models for Cellular Interactions in Development II: Simple and Branching Filaments with Two-Sided Inputs.” Journal of Theoretical Biology 18:300–315.
Lippi, M., and P. Torroni. 2016. “Argumentation Mining: State of the Art and Emerging Trends.” ACM Transactions on Internet Technology (TOIT) 16 (2): 1–25.
List, Christian, and Philip Pettit. 2004. “An Epistemic Free-Riding Problem?” In Karl Popper: Critical Appraisals, edited by Philip Catton and Graham Macdonald, 128–58. New York: Taylor and Francis.
Liu, B. 2015. Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. New York: Cambridge University Press.
Lobbé, Q., A. Delanoë, and D. Chavalarias. 2021. “Exploring, Browsing and Interacting with Multi-Level and Multi-Scale Dynamics of Knowledge.” HAL Open Science.
Lochmiller, R.L., and C. Deerenberg. 2000. “Trade-Offs in Evolutionary Immunology: Just What Is the Cost of Immunity?” Oikos 88 (1): 87–98.
Lorch, M., and P. Hellal. 2010. “Darwin’s ‘Natural Science of Babies.’” Journal of the History of the Neurosciences 19 (2): 140–57.
Losee, John. 2001. A Historical Introduction to the Philosophy of Science. Oxford: Oxford University Press.
Macbeth, Danielle. 2016. “Frege and the Aristotelian Model of Science.” In Early Analytic Philosophy—New Perspectives on the Tradition, edited by Sorin Costreie, 31–48. Cham, Switzerland: Springer.
MacCallum, C.J. 2007. “Does Medicine without Evolution Make Sense?” PLOS Biology 5 (4): e112.
Machery, E. 2016. “Experimental Philosophy of Science.” In A Companion to Experimental Philosophy, edited by J. Sytsma and W. Buckwalter, 475–90. Chichester, West Sussex: Wiley Blackwell.
Maglaughlin, K.L., and D.H. Sonnenwald. 2005. “Factors that Impact Interdisciplinary Natural Science Research Collaboration in Academia.” In Proceedings of ISSI 2005: 10th International Conference of the International Society for Scientometrics and Informetrics, edited by P. Ingwersen and B. Larsen, 499–508. Stockholm: Karolinska University Press.
Maienschein, Jane, John N. Parker, Manfred Laubichler, and Edward J. Hackett. 2019. “Data Management and Data Sharing in Science and Technology Studies.” Science, Technology, and Human Values 44:143–60. https://doi.org/10.1177/0162243918798906.
Malaterre, Christophe, Jean-François Chartier, and Davide Pulizzotto. 2019. “What Is This Thing Called Philosophy of Science? A Computational Topic-Modeling Perspective, 1934–2015.” HOPOS 9 (2): 215–49. https://doi.org/10.1086/704372.
Malaterre, Christophe, Francis Lareau, Davide Pulizzotto, and Jonathan St-Onge. 2021. “Eight Journals over Eight Decades: A Computational Topic-Modeling Approach to Contemporary Philosophy of Science.” Synthese 199:2883–923.
Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. “The Stanford CoreNLP Natural Language Processing Toolkit.” In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 55–60. Baltimore, MD: Association for Computational Linguistics.
Marcus, Gary, and Ernest Davis. 2019. Rebooting AI: Building Artificial Intelligence We Can Trust. New York: Pantheon Books.
Marcus, Mitchell P., Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. “Building a Large Annotated Corpus of English: The Penn Treebank.” Computational Linguistics 19 (2): 313–30.
Marsh, H.W., U.W. Jayasinghe, and N.W. Bond. 2008. “Improving the Peer-Review Process for Grant Applications: Reliability, Validity, Bias, and Generalizability.” American Psychologist 63 (3): 160.
Martin, B.R., P. Nightingale, and A. Yegros-Yegros. 2012. “Science and Technology Studies: Exploring the Knowledge Base.” Research Policy 41 (7): 1182–204.
McDonald, J.H., and M. Kreitman. 1991. “Adaptive Protein Evolution at the Adh Locus in Drosophila.” Nature 351:652–54.
McDowell, J.M., and J.K. Smith. 1992. “The Effect of Gender-Sorting on Propensity to Coauthor: Implications for Academic Promotion.” Economic Inquiry 30 (1): 68–82.
McElreath, R., and P.E. Smaldino. 2015. “Replication, Communication, and the Population Dynamics of Scientific Discovery.” PLOS ONE 10 (8): e0136088.
McKinsey, J.C.C., A.C. Sugar, and P. Suppes. 1953. “Axiomatic Foundation of Classical Particle Mechanics.” Journal of Rational Mechanics and Analysis 2:253–72.
McKinsey, J.C.C., and P. Suppes. 1953. “Transformations of Systems of Classical Particle Mechanics.” Journal of Rational Mechanics and Analysis 2:273–89.
McKinsey, J.C.C., and P. Suppes. 1955. “On the Notion of Invariance in Classical Mechanics.” British Journal for the Philosophy of Science 5:290–302.
McLellan-Lemal, Eleanor. 2008. “Qualitative Data Management.” In Handbook for Team-Based Qualitative Research, edited by Greg Guest and Kathleen M. MacQueen, 165–87. Lanham, MD: AltaMira Press.
Meehl, P.E. 1967. “Theory-Testing in Psychology and Physics: A Methodological Paradox.” Philosophy of Science 34 (2): 103–15.
Meeks, Elijah, and Scott B. Weingart. 2012. “The Digital Humanities Contribution to Topic Modeling.” Journal of Digital Humanities 2 (1).
Meier, Georg Friedrich. 1762. Vernunftlehre. Halle, Germany: J.J. Gebauer.
Mendel, Gregor. 1866. “Experiments in Plant Hybridization.” Verhandlungen des Naturforschenden Vereines in Brünn 4:3–47.
Mesoudi, A. 2011. Cultural Evolution: How Darwinian Theory Can Explain Human Culture and Synthesize the Social Sciences. Chicago: University of Chicago Press.
Michener, William K. 2015. “Ten Simple Rules for Creating a Good Data Management Plan.” PLOS Computational Biology 11:e1004525. https://doi.org/10.1371/journal.pcbi.1004525.
Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. “Distributed Representations of Words and Phrases and Their Compositionality.” Advances in Neural Information Processing Systems 26 (2): 3111–19.
Mitchell, Melanie. 2019. Artificial Intelligence: A Guide for Thinking Humans. New York: Farrar, Straus, and Giroux.
Mittelstadt, Brent Daniel, and Luciano Floridi. 2016. “The Ethics of Big Data: Current and Foreseeable Issues in Biomedical Contexts.” Science and Engineering Ethics 22:303–41.
Mizrahi, M. 2020. “The Case Study Method in Philosophy of Science: An Empirical Study.” Perspectives on Science 28 (1): 63–88.
Mochales, Raquel, and Marie-Francine Moens. 2011. “Argumentation Mining.” Artificial Intelligence and Law 19 (1): 1–22.
Moens, Marie-Francine. 2018. “Argumentation Mining: How Can a Machine Acquire Common Sense and World Knowledge?” Argument and Computation 9 (1): 1–14.
Mohseni, A., C. O’Connor, and H. Rubin. 2021. “On the Emergence of Minority Disadvantage: Testing the Cultural Red King Hypothesis.” Synthese 198:5599–621.
Montagu, Basil, ed. 1825–1834. The Works of Francis Bacon. London: W. Pickering. Moody, Christopher E. 2016. “Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec.” arXiv:160502019. Moretti, Franco. 2013. Distant Reading. New York: Verso. Morillo, F., M. Bordons, and I. Gómez. 2001. “An Approach to Interdisciplinarity through Bibliometric Indicators.” Scientometrics 51 (1): 203–22. Morris, Steven A., and Betsy Van der Veer Martens. 2008. “Mapping Research Specialties.” Annual Review of Information Science and Technology 42 (1): 213–95. Morscher, Edgar. 2018. “Bernard Bolzano.” In The Stanford Encyclopedia of Philosophy (winter ed.), edited by Edward N. Zalta. archives/win2018/entries/bolzano/. Müller, August Friedrich. 1733. Einleitung in die philosophischen Wissenschaften (Erster Theil). 2nd ed. Leipzig: Breitkopf. Muller, F.A. 2011. “Reflections on the Revolution at Stanford.” Synthese 183:87–114. Mulligan, A., L. Hall, and E. Raphael. 2013. “Peer Review in a Changing World: An International Study Measuring the Attitudes of Researchers.” Journal of the American Society for Information Science and Technology 64 (1): 132–61. Munafò, M.R., B.A. Nosek, D.V. Bishop, K.S. Button, C.D. Chambers, N.P. du Sert, U. Simonsohn, et al. 2017. “A Manifesto for Reproducible Science.” Nature Human Behaviour 1 (1): 0021. Murakami, A., P. Thompson, S. Hunston, and D. Vajn. 2017. “‘What Is This Corpus About?’ Using Topic Modelling to Explore a Specialised Corpus.” Corpora 12 (2): 243–77. Murdock, Jaimie, Colin Allen, Katy Börner, Robert Light, Simon McAlister, Andrew Ravenscroft, Robert Rose, et al. 2017. “Multi-Level Computational Methods for Interdisciplinary Research in the HathiTrust Digital Library.” PLOS ONE 12 (9): e0184188. Murdock, Jaimie, Colin Allen, and Simon DeDeo. 2017. “Exploration and Exploitation of Victorian Science in Darwin’s Reading Notebooks.” Cognition 159:117–26. Murdock, Jaimie, Colin Allen, and Simon DeDeo. 2018. 
“Quantitative and Qualitative Approaches to the Development of Darwin’s Origin of Species.” Current Research in Digital History 1. Murdock, Jaimie, Jiann Zeng, and Colin Allen. 2015. “Towards Cultural-Scale Models of Full Text.” Proceedings of the 2016 International Conference on Computational Social Science, Evanston, Illinois. Mutz, R., L. Bornmann, and H.D. Daniel. 2012. “Heterogeneity of Inter-rater Reliabilities of Grant Peer Reviews and Its Determinants: A General Estimating Equations Approach.” PLOS ONE 7 (10): e48509. Nesse, R.M. 2008a. “Evolution: Medicine’s Most Basic Science.” Lancet 372:S21–S27. Nesse, R.M. 2008b. “The Importance of Evolution for Medicine.” In Evolutionary
Medicine, 2nd ed., edited by W.R. Trevathan, J.J. McKenna, and E.O. Smith, 416–32. New York: Oxford University Press. Nesse, R.M. 2019. “The Smoke Detector Principle: Signal Detection and Optimal Defense Regulation.” Evolution, Medicine, and Public Health 1:1. Nesse, R.M., Carl T. Bergstrom, Peter T. Ellison, Jeffrey S. Flier, Peter Gluckman, Diddahally R. Govindaraju, Dietrich Niethammer, et al. 2010. “Making Evolutionary Biology a Basic Science for Medicine.” Proceedings of the National Academy of Sciences 107 (suppl. 1): 1800–1807. Nesse, R.M., and K.C. Berridge. 1997. “Psychoactive Drug Use in Evolutionary Perspective.” Science 278 (5335): 63–66. Nesse, R.M., and J.D. Schiffman. 2003. “Evolutionary Biology in the Medical School Curriculum.” BioScience 53 (6): 585–87. Nesse, R.M., and S.C. Stearns. 2008. “The Great Opportunity: Evolutionary Applications to Medicine and Public Health.” Evolutionary Applications 1 (1): 28–48. Nesse, R.M., and G.C. Williams. 1994. Why We Get Sick: The New Science of Darwinian Medicine. New York: Times Books. Nesse, R.M., and G.C. Williams. 1997. “Evolutionary Biology in the Medical Curriculum: What Every Physician Should Know.” BioScience 47 (10): 664–66. Newman, David J., and Sharon Block. 2006. “Probabilistic Topic Decomposition of an Eighteenth-Century American Newspaper.” Journal of the American Society for Information Science and Technology 57 (6): 753–67. https://doi.org/10.1002/asi.20342. Newman, M.E.J. 2001. “The Structure of Scientific Collaboration Networks.” Proceedings of the National Academy of Sciences 98 (2): 404–9. Nichols, L.G. 2014. “A Topic Model Approach to Measuring Interdisciplinarity at the National Science Foundation.” Scientometrics 100 (3): 741–54. Nickles, T. 1995. “Philosophy of Science and History of Science.” Osiris 10:138–63. Nicolai, A.T., S. Schmal, and C.L. Schuster. 2015. “Interrater Reliability of the Peer Review Process in Management Journals.” In Incentives and Performance, edited by I.M. 
Welpe, J. Wollersheim, S. Ringelhan, and M. Osterloh, 107–19. Dordrecht: Springer. Nissen, S.B., T. Magidson, K. Gross, and C.T. Bergstrom. 2016. “Publication Bias and the Canonization of False Facts.” eLife 5:e21451. North, D.C. 1990. Institutions, Institutional Change and Economic Performance. Cambridge: Cambridge University Press. Nosek, B.A., and D. Lakens. 2014. “Registered Reports.” Social Psychology 45 (3): 137–41. Nosek, B.A., J.R. Spies, and M. Motyl. 2012. “Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth over Publishability.” Perspectives on Psychological Science 7 (6): 615–31. Noyons, E.C.M., H.F. Moed, and M. Luwel. 1999. “Combining Mapping and
Citation Analysis for Evaluative Bibliometric Purposes: A Bibliometric Study.” Journal of the American Society for Information Science 50:115–31. NSF. 2015. “Thinking about Data Management Planning.” Working Paper. Arizona State University School of Life Sciences, Center for Biology and Society. Nunez, Tyke. 2014. “Definitions of Kant’s Categories.” Canadian Journal of Philosophy 44:631–57. O’Connor, C. 2017. “The Cultural Red King Effect.” Journal of Mathematical Sociology 41 (3): 155–71. O’Connor, C. 2019. “The Natural Selection of Conservative Science.” Studies in History and Philosophy of Science Part A 76:24–29. O’Connor, C., L.K. Bright, and J. Bruner. 2019. “The Emergence of Intersectional Disadvantage.” Social Epistemology 33 (1): 23–41. O’Connor, C., and J. Bruner. 2019. “Dynamics and Diversity in Epistemic Communities.” Erkenntnis 84 (1): 101–19. Okike, K., K.T. Hug, M.S. Kocher, and S.S. Leopold. 2016. “Single-Blind vs Double-Blind Peer Review in the Setting of Author Prestige.” JAMA 316 (12): 1315–16. Okruhlik, K. 1994. “Gender and the Biological Sciences.” Canadian Journal of Philosophy 24 (suppl. 1): 21–42. Olby, Robert. 1989. “The Dimensions of Scientific Controversy: The Biometric-Mendelian Debate.” British Journal for the History of Science 22 (3): 299–320. Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716. Overton, J.A. 2013. “‘Explain’ in Scientific Discourse.” Synthese 190 (8): 1383–405. Painter, D.T. 2019. “Computational Interdisciplinarity: A Study in the History of Science.” PhD diss., Arizona State University. Painter, D.T., B.C. Daniels, and J. Jost. 2019. “Network Analysis for the Digital Humanities: Principles, Problems, Extensions.” ISIS 110 (3): 538–54. Painter, D.T., B.C. Daniels, and M.D. Laubichler. 2021. “Innovations Are Disproportionately Likely in the Periphery of a Scientific Network.” Theory in Biosciences 140 (4): 391–99. Palau, R.M., and M.F. Moens. 2009. 
“Argumentation Mining: The Detection, Classification and Structure of Arguments in Text.” In Proceedings of the 12th International Conference on Artificial Intelligence and Law, 98–107. Barcelona: Association for Computing Machinery. Pang, B., and L. Lee. 2008. “Opinion Mining and Sentiment Analysis.” Foundations and Trends in Information Retrieval 2:1–135. Pasquetto, Irene V., Christine L. Borgman, and Morgan F. Wofford. 2019. “Uses and Reuses of Scientific Data: The Data Creators’ Advantage.” Harvard Data Science Review 1:1–35.
Pearce, Trevor. 2014. “The Origins and Development of the Idea of Organism-Environment Interaction.” In Entangled Life: Organism and Environment in the Biological and Social Sciences, edited by Gillian Barker, Eric Desjardins, and Trevor Pearce, 13–32. Dordrecht: Springer Netherlands. https://doi.org/10.1007/978-94-007-7067-6_2. Pearson, Karl. 1892. The Grammar of Science. 1st ed. London: Walter Scott. Pearson, Karl. 1906. “Walter Frank Raphael Weldon: 1860–1906.” Biometrika 5 (1/2): 1–52. Pearson, Karl. 1908. “On a Mathematical Theory of Determinantal Inheritance, from Suggestions and Notes of the Late W.F.R. Weldon.” Biometrika 6 (1): 80–93. Peirson, B.R. Erick, Erin Bottino, Julia Damerow, and Manfred D. Laubichler. 2017. “Quantitative Perspectives on Fifty Years of the Journal of the History of Biology.” Journal of the History of Biology 50:695–751. https://dx.doi.org/10.1007/s10739-017-9499-2. Peldszus, Andreas, and Manfred Stede. 2013. “From Argument Diagrams to Argumentation Mining in Texts: A Survey.” International Journal of Cognitive Informatics and Natural Intelligence (IJCINI) 7 (1): 1–31. jcini.2013010101. Pence, Charles H. 2011. “‘Describing Our Whole Experience’: The Statistical Philosophies of W.F.R. Weldon and Karl Pearson.” Studies in History and Philosophy of Biological and Biomedical Sciences 42 (4): 475–85. https://doi.org/10.1016/j.shpsc.2011.07.011. Pence, Charles H. 2015. “The Early History of Chance in Evolution.” Studies in History and Philosophy of Science 50:48–58. .shpsa.2014.09.006. Pence, Charles H. 2022. A Pompous Parade of Arithmetic: The Rise of Chance in Evolutionary Theory. London: Academic Press. Pence, C.H., and G. Ramsey. 2018. “How to Do Digital Philosophy of Science.” Philosophy of Science 85 (5): 930–41. Pennington, J., R. Socher, and C.D. Manning. 2014. “GloVe: Global Vectors for Word Representation.” In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–43. 
Doha, Qatar: Association for Computational Linguistics. Pennycook, G., and V.A. Thompson. 2018. “An Analysis of the Canadian Cognitive Psychology Job Market (2006–2016).” Canadian Journal of Experimental Psychology 72 (2): 71–80. Peters, D.P., and S.J. Ceci. 1982. “Peer-Review Practices of Psychological Journals: The Fate of Published Articles, Submitted Again.” Behavioral and Brain Sciences 5:187–255. Phillips, K.W., G.B. Northcraft, and M.A. Neale. 2006. “Surface-Level Diversity and Decision-Making in Groups: When Does Deep-Level Similarity Help?” Group Processes and Intergroup Relations 9 (4): 467–82.
Pinnick, C., and G. Gale. 2000. “Philosophy of Science and History of Science: A Troubling Interaction.” Journal for General Philosophy of Science 31:109–25. Piotrowski, M., and M. Fafinski. 2020. “Nothing New under the Sun? Computational Humanities and the Methodology of History.” In Proceedings of the Workshop on Computational Humanities Research (CHR 2020), November 18–20, 2020, Amsterdam, Netherlands. Edited by Folgert Karsdorp, Barbara McGillivray, Adina Nerghes, and Melvin Wevers, 171–81. Piper, Andrew. 2018. Enumerations: Data and Literary Study. Chicago: University of Chicago Press. Pitt, J.C. 2001. “The Dilemma of Case Studies: Toward a Heraclitian Philosophy of Science.” Perspectives on Science 9:373–82. Poggiolesi, Francesca. 2016a. “A Critical Overview of the Most Recent Logics of Grounding.” In Objectivity, Realism, and Proof: FilMat Studies in the Philosophy of Mathematics, edited by Francesca Boccuni and Andrea Sereni, 291–309. Cham, Switzerland: Springer. Poggiolesi, Francesca. 2016b. “On Defining the Notion of Complete and Immediate Formal Grounding.” Synthese 193:3147–67. Poggiolesi, Francesca. 2018. “On Constructing a Logic for the Notion of Complete and Immediate Formal Grounding.” Synthese 195:1231–54. Poggiolesi, Francesca. 2021. “Grounding Principles for (Relevant) Implication.” Synthese 198:7351–76. Poggiolesi, Francesca, and Nissim Francez. 2021. “Toward a Generalization of the Logic of Grounding.” Theoria 36 (1): 5–24. Popper, K. 1959. The Logic of Scientific Discovery. London: Routledge. Popper, K. 1968. The Logic of Scientific Discovery. New York: Harper. Popper, K. 1979. Objective Knowledge: An Evolutionary Approach. Oxford: Oxford University Press. Porter, A., A. Cohen, J. David Roessner, and M. Perreault. 2007. “Measuring Researcher Interdisciplinarity.” Scientometrics 72 (1): 117–47. Porter, A., and I. Rafols. 2009. “Is Science Becoming More Interdisciplinary? 
Measuring and Mapping Six Research Fields over Time.” Scientometrics 81 (3): 719–45. Poza, D.J., F.A. Villafáñez, J. Pajares, A. López-Paredes, and C. Hernández. 2011. “New Insights on the Emergence of Classes Model.” Discrete Dynamics in Nature and Society 2011: article ID 915279. Price, D.J. de Solla. 1965. “Networks of Scientific Papers.” Science 149 (3683): 510–15. Pritchard, Jonathan K., Matthew Stephens, and Peter Donnelly. 2000. “Inference of Population Structure Using Multilocus Genotype Data.” Genetics 155 (2): 945–59. Provine, William B. 1971. The Origins of Theoretical Population Genetics. Princeton, NJ: Princeton University Press.
Quan, W., B. Chen, and F. Shu. 2017. “Publish or Impoverish: An Investigation of the Monetary Reward System of Science in China (1999–2016).” Aslib Journal of Information Management 69 (5): 486–502. Radhakrishnan, S., S. Erbis, J.A. Isaacs, and S. Kamarthi. 2017. “Novel Keyword Co-occurrence Network-Based Methods to Foster Systematic Reviews of Scientific Literature.” PLOS ONE 12 (3): e0172778. nal.pone.0172778. Radick, Gregory. 2005. “Other Histories, Other Biologies.” Royal Institute of Philosophy Supplement 56 (3–4): 21–47. 0505602X. Radick, Gregory. 2012. “Should ‘Heredity’ and ‘Inheritance’ Be Biological Terms? William Bateson’s Change of Mind as a Historical and Philosophical Problem.” Philosophy of Science 79 (5): 714–24. Ramsey, Grant, and Charles H. Pence. 2016. “EvoText: A New Tool for Analyzing the Biological Sciences.” Studies in History and Philosophy of Biological and Biomedical Sciences 57:83–87. Randolph, Thomas Jefferson. 1829. Memoir, Correspondence and Miscellanies: From the Papers of Thomas Jefferson. Charlottesville, VA: F. Carr and Co. Rappert, Brian, and Louise Bezuidenhout. 2016. “Data Sharing in Low-Resourced Research Environments.” Prometheus 34:207–24. 09028.2017.1325142. Raup, D.M., S.J. Gould, T.J. Schopf, and D.S. Simberloff. 1973. “Stochastic Models of Phylogeny and the Evolution of Diversity.” Journal of Geology 81:525–42. Ravenscroft, Andrew, and Colin Allen. 2019. “Finding and Interpreting Arguments: An Important Challenge for Humanities Computing and Scholarly Practice.” Digital Humanities Quarterly 13 (4). http://www.digitalhumanities.org/dhq/vol/13/4/000436/000436.html. Recchia, Gabriel. 2016. “The Utility of Count-Based Models for the Digital Humanities.” Digital Humanities Congress 2016, session 6. https://www.dhi Reisch, George. 2009. “Three Kinds of Political Engagement for Philosophy of Science.” Science and Education 18 (2): 191–97. s11191-007-9094-6. Reppen, R. 2010. 
“Building a Corpus: What Are the Key Considerations?” In The Routledge Handbook of Corpus Linguistics, edited by Anne O’Keeffe and Michael McCarthy, 31–37. Abingdon, UK: Routledge. Ribeiro, M.T., S. Singh, and C. Guestrin. 2016. “Why Should I Trust You? Explaining the Predictions of Any Classifier.” In KDD ’16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–44. New York: Association for Computing Machinery. Richards, R.J. 1992. “Arguments in a Sartorial Mode, or the Asymmetries of History and Philosophy of Science.” PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1992 (2): 482–89. Richardson, S.S., and H. Stevens. 2015. Postgenomics: Perspectives on Biology after the Genome. Durham, NC: Duke University Press. Richerson, P.J., and R. Boyd. 2005. Not by Genes Alone: How Culture Transformed Human Evolution. Chicago: University of Chicago Press. Rivelli, Luca. 2019. “Antimodularity: Pragmatic Consequences of Computational Complexity on Scientific Explanation.” In On the Cognitive, Ethical, and Scientific Dimensions of Artificial Intelligence: Themes from IACAP 2016, edited by Don Berkich and Matteo Vincenzo d’Alfonso, 97–122. Philosophical Studies Series. Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-030-01800-9_6. Rockwell, Geoffrey, and Stéfan Sinclair. 2016. Hermeneutica. Cambridge, MA: MIT Press. Roe, Glenn, Clovis Gladstone, and Robert Morrissey. 2016. “Discourses and Disciplines in the Enlightenment: Topic Modeling the French Encyclopédie.” Frontiers in Digital Humanities 2 (8). Romero, F. 2016. “Can the Behavioral Sciences Self-Correct? A Social Epistemic Study.” Studies in History and Philosophy of Science Part A 60:55–69. Rosenblatt, M. 2016. “An Incentive-Based Approach for Improving Data Reproducibility.” Science Translational Medicine 8:336ed5. Rosenthal, R. 1979. “The ‘File Drawer Problem’ and Tolerance for Null Results.” Psychological Bulletin 86 (3): 638–41. Roski, Stefan. 2017. Bolzano’s Conception of Grounding. Studies in Theoretical Philosophy. Frankfurt am Main: Vittorio Klostermann. Roski, Stefan. 2019. “Bolzano and Kim on Grounding and Unification.” Synthese 196:2971–99. Roski, Stefan, and Antje Rumberg. 2016. “Simplicity and Economy in Bolzano’s Theory of Grounding.” Journal of the History of Philosophy 54:469–96. Rossi, J.S. 1990. “Statistical Power of Psychological Research: What Have We Gained in 20 Years?” Journal of Consulting and Clinical Psychology 58 (5): 646. 
Rossiter, M.W. 1997. “The Men Move In: Home Economics in Higher Education, 1950–1970.” In Rethinking Home Economics: Women and the History of a Profession, edited by S. Stage and V.B. Vincenti, 96–117. Ithaca, NY: Cornell University Press. Rosvall, M., and C.T. Bergstrom. 2007. “Maps of Information Flow Reveal Community Structure in Complex Networks.” arXiv:0707.0609. Roth, C. 2019. “Digital, Digitized, and Numerical Humanities.” Digital Scholarship in the Humanities 34 (3): 616–32. Rothe, Anselm, Alexander S. Rich, and Zhi-Wei Li. 2018. “Topics and Trends in Cognitive Science.” In Proceedings of the 40th Annual Conference of the Cognitive Science Society, edited by Timothy M. Rogers, Marina Rau, Jerry Zhu, and Chuck Kalish, 979–84. Austin, TX: Cognitive Science Society. Rubin, H., and C. O’Connor. 2018. “Discrimination and Collaboration in Science.” Philosophy of Science 85 (3): 380–402. Rudwick, Martin J.S. 1985. The Great Devonian Controversy: The Shaping of Scientific Knowledge among Gentlemanly Specialists. Chicago: University of Chicago Press. Rumberg, Antje. 2013. “Bolzano’s Concept of Grounding (Abfolge) against the Background of Normal Proofs.” Review of Symbolic Logic 6:424–59. Sá, C.M. 2008. “Interdisciplinary Strategies in US Research Universities.” Higher Education 55 (5): 537–52. Sagan, C. 1996. The Demon-Haunted World: Science as a Candle in the Dark. New York: Ballantine Books. Sahlgren, Magnus. 2008. “The Distributional Hypothesis.” Italian Journal of Linguistics 20:33–53. Sarsons, H. 2017. “Recognition for Group Work: Gender Differences in Academia.” American Economic Review 107 (5): 141–45. Saucan, E., A. Samal, M. Weber, and J. Jost. 2018. “Discrete Curvatures and Network Analysis.” MATCH: Communications in Mathematical and in Computer Chemistry 80:605–22. Schank, J.C., C.J. May, and S.S. Joshi. 2014. “Models as Scaffolds for Understanding.” In Developing Scaffolds in Evolution, Culture, and Cognition, edited by L.R. Caporael, J.R. Griesemer, and W.C. Wimsatt, 147–67. Cambridge, MA: MIT Press. Schickore, Jutta. 2011. “More Thoughts on HPS: Another 20 Years Later.” Perspectives on Science 19 (4): 453–81. Schickore, Jutta, and F. Steinle, eds. 2006. Revisiting Discovery and Justification: Historical and Philosophical Perspectives on the Context Distinction. New York: Springer Science and Business Media. Schillebeeckx, M., B. Maricque, and C. Lewis. 2013. “The Missing Piece to Changing the University Culture.” Nature Biotechnology 31 (10): 938–41. Schmid, Helmut. 1994. 
“Probabilistic Part-of-Speech Tagging Using Decision Trees.” In Proceedings of International Conference on New Methods in Language Processing, 44–49. Manchester, UK. Schneider, Mike D. n.d. “The Non-Epistemic Origins of Research Strongholds.” Unpublished manuscript. Schnieder, B. 2011. “A Logic for ‘Because.’” Review of Symbolic Logic 4 (3): 445–65. Scholl, R. 2018. “Scenes from a Marriage: On the Confrontation Model of History and Philosophy of Science.” Journal of the Philosophy of History 12:212–38. Scholl, R., and T. Räz. 2016. “Towards a Methodology for Integrated History and Philosophy of Science.” In The Philosophy of Historical Case Studies, edited by T. Sauer and R. Scholl, 69–91. Cham, Switzerland: Springer.
Scott, M. 2008. Wordsmith Tools Version 5. Liverpool: Lexical Analysis Software. Sedlmeier, P., and G. Gigerenzer. 1989. “Do Studies of Statistical Power Have an Effect on the Power of Studies?” Psychological Bulletin 105 (2): 309–16. Sepkoski, D., and M. Ruse. 2009. The Paleobiological Revolution: Essays on the Growth of Modern Paleontology. Chicago: University of Chicago Press. Shifman, L. 2013. “Memes in a Digital World: Reconciling with a Conceptual Troublemaker.” Journal of Computer-Mediated Communication 18:362–77. Siebel, Mark. 2011. “It Falls Somewhat Short of Logical Precision: Bolzano on Kant’s Definition of Analyticity.” Grazer Philosophische Studien 82:91–127. Silva, E.B. 1998. “Review: Rethinking Home Economics: Women and the History of a Profession by Sarah Stage, Virginia B. Vincenti.” Work, Employment and Society 12 (3): 570–72. Sinatra, R., D. Wang, P. Deville, C. Song, and A.L. Barabási. 2016. “Quantifying the Evolution of Individual Scientific Impact.” Science 354:aaf5239. Skyrms, Brian. 2010. Signals: Evolution, Learning and Information. Oxford: Oxford University Press. Sloan, Phillip R. 2000. “Mach’s Phenomenalism and the British Reception of Mendelism.” Comptes Rendus de l’Académie des Sciences: Series III: Sciences de la Vie 323:1069–79. Smaldino, P.E. 2014. “The Cultural Evolution of Emergent Group-Level Traits.” Behavioral and Brain Sciences 37 (3): 243–95. Smaldino, P.E. 2015. “A Theoretical Lens for the Reproducibility Project.” Paul E. Smaldino (blog), September 1. -lens-for-the-reproducibility-project/. Smaldino, P.E. 2017a. “Models Are Stupid, and We Need More of Them.” In Computational Social Psychology, edited by R.R. Vallacher, A. Nowak, and S.J. Read, 311–31. New York: Routledge. Smaldino, P.E. 2017b. “On Preprints.” Academic Life Histories (blog), November 29. Smaldino, P.E., and R. McElreath. 2016. “The Natural Selection of Bad Science.” Royal Society Open Science 3:160384. Smaldino, P.E., M.A. Turner, and P.A. Contreras Kallens. 
2018. “Open Science and Modified Funding Lotteries Can Impede the Natural Selection of Bad Science.” Royal Society Open Science 6:190194. Smith, Brian Cantwell. 2019. The Promise of Artificial Intelligence: Reckoning and Judgment. Cambridge, MA: MIT Press. Smocovitis, V.B. 1992. “Unifying Biology: The Evolutionary Synthesis and Evolutionary Biology.” Journal of the History of Biology 25:1–65. Sommers, S.R. 2006. “On Racial Diversity and Group Decision Making: Identifying Multiple Effects of Racial Composition on Jury Deliberations.” Journal of Personality and Social Psychology 90 (4): 597–612. Speakman, R.J., C.S. Hadden, M.H. Colvin, J. Cramb, K. Jones, T.W. Jones, I. Lulewicz, et al. 2018. “Market Share and Recent Hiring Trends in Anthropology Faculty Positions.” PLOS ONE 13 (9): e0202528. Sreejith, R.P., J. Jost, E. Saucan, and A. Samal. 2017. “Systematic Evaluation of a New Combinatorial Curvature for Complex Networks.” Chaos, Solitons and Fractals 101:50–67. Srivastava, Ashok N., and Mehran Sahami. 2009. Text Mining: Classification, Clustering, and Applications. Boca Raton, FL: CRC Press. Stage, Sarah. 1997. “Home Economics: What’s in a Name?” In Rethinking Home Economics: Women and the History of a Profession, edited by Sarah Stage and Virginia Bramble Vincenti, 1–14. Ithaca, NY: Cornell University Press. Stearns, S.C. 1992. The Evolution of Life Histories. Oxford: Oxford University Press. Steel, D., C. Gonnerman, and M. O’Rourke. 2017. “Scientists’ Attitudes on Science and Values: Case Studies and Survey Methods in Philosophy of Science.” Studies in History and Philosophy of Science Part A 63:22–30. Stewart, A.J., and J.B. Plotkin. 2021. “The Natural Selection of Good Science.” Nature Human Behaviour 5:1510–18. Stock, P., and R.J. Burton. 2011. “Defining Terms for Integrated (Multi-Inter-Trans-Disciplinary) Sustainability Research.” Sustainability 3 (8): 1090–113. Strevens, M. 2003. “The Role of the Priority Rule in Science.” Journal of Philosophy 100 (2): 55–79. Stuart, David. 2017. “Data Bibliometrics: Metrics before Norms.” Online Information Review 41:428–35. Stuewer, R.H., ed. 1970. Historical and Philosophical Perspectives of Science. Minnesota Studies in the Philosophy of Science 5. Minneapolis: University of Minnesota Press. Su, Hsin-Ning, and Pei-Chun Lee. 2010. “Mapping Knowledge Structure by Keyword Co-occurrence: A First Look at Journal Papers in Technology Foresight.” Scientometrics 85 (1): 65–79. Sullivan, E. Forthcoming. “Understanding from Machine Learning Models.” British Journal for the Philosophy of Science. Suominen, A., and H. Toivanen. 2016. 
“Map of Science with Topic Modeling: Comparison of Unsupervised Learning and Human-Assigned Subject Classification.” Journal of the Association for Information Science and Technology 67 (10): 2464–76. Swanson, Reid, Brian Ecker, and Marilyn Walker. 2015. “Argument Mining: Extracting Arguments from Online Dialogue.” In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 217–26. Prague: Association for Computational Linguistics. v1/W15-4631. Szucs, D., and J.P. Ioannidis. 2017. “Empirical Assessment of Published Effect Sizes and Power in the Recent Cognitive Neuroscience and Psychology Literature.” PLOS Biology 15 (3): e2000797.
Tabery, James G. 2004. “The ‘Evolutionary Synthesis’ of George Udny Yule.” Journal of the History of Biology 37 (1): 73–101. Tajima, F. 1989. “Statistical Method for Testing the Neutral Mutation Hypothesis by DNA Polymorphism.” Genetics 123:585–95. Tenopir, Carol, Elizabeth D. Dalton, Suzie Allard, Mike Frame, Ivanka Pjesivac, Ben Birch, Danielle Pollock, and Kristina Dorsett. 2015. “Changes in Data Sharing and Data Reuse Practices and Perceptions among Scientists Worldwide.” PLOS ONE 10:e0134826. Thagard, P. 1980. “Against Evolutionary Epistemology.” PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1980 (1): 187–96. Thagard, P. 1989. “Explanatory Coherence.” Behavioral and Brain Sciences 12 (3): 435–502. Thoma, J. 2015. “The Epistemic Division of Labor Revisited.” Philosophy of Science 82 (3): 454–72. Thorén, H. 2015. “History and Philosophy of Science as an Interdisciplinary Field of Problem Transfers.” In Empirical Philosophy of Science, edited by S. Wagenknecht, N.J. Nersessian, and H. Andersen, 147–59. Cham, Switzerland: Springer. Tollefson, J. 2018. “China Declared World’s Largest Producer of Scientific Articles.” Nature 553 (7689): 390. Tomkins, A., M. Zhang, and W.D. Heavlin. 2017. “Reviewer Bias in Single- versus Double-Blind Peer Review.” Proceedings of the National Academy of Sciences 114 (48): 12708–13. Toulmin, S.E. 1958. The Uses of Argument. Cambridge: Cambridge University Press. Trevathan, W.R., E.O. Smith, and J.J. McKenna. 1999. Evolutionary Medicine. Oxford: Oxford University Press. Turchin, P., T.E. Currie, E.A. Turner, and S. Gavrilets. 2013. “War, Space, and the Evolution of Old World Complex Societies.” Proceedings of the National Academy of Sciences 110 (41): 16384–89. Turnbaugh, P.J., R.E. Ley, M. Hamady, C.M. Fraser-Liggett, R. Knight, and J.I. Gordon. 2007. “The Human Microbiome Project.” Nature 449 (7164): 804. Uebel, Thomas. 2005. 
“Political Philosophy of Science in Logical Empiricism: The Left Vienna Circle.” Studies in History and Philosophy of Science Part A 36 (4): 754–73. Underwood, Ted. 2019. Distant Horizons: Digital Evidence and Literary Change. Chicago: University of Chicago Press. Vaesen, Krist, and Joel Katzav. 2019. “The National Science Foundation and Philosophy of Science’s Withdrawal from Social Concerns.” Studies in History and Philosophy of Science Part A 78 (December): 73–82. .shpsa.2019.01.001.
van den Berg, Hein. 2014. Kant on Proper Science: Biology in the Critical Philosophy and the Opus postumum. Dordrecht: Springer. van den Berg, Hein, and Boris Demarest. 2020. “Axiomatic Natural Philosophy and the Emergence of Biology as a Science.” Journal of the History of Biology 53 (3): 379–422. van Dijk, D., O. Manor, and L.B. Carey. 2014. “Publication Metrics and Success on the Academic Job Market.” Current Biology 24 (11): R516–R517. Van Dijk, H., M.L. Van Engen, and D. Van Knippenberg. 2012. “Defying Conventional Wisdom: A Meta-Analytical Examination of the Differences between Demographic and Job-Related Diversity Relationships with Performance.” Organizational Behavior and Human Decision Processes 119 (1): 38–53. Van Eck, N.J., and L. Waltman. 2009. “VOSviewer: A Computer Program for Bibliometric Mapping.” ERIM Report Series Reference No. ERS-2009–005-LIS. Van Noorden, R. 2014. “Global Scientific Output Doubles Every Nine Years.” Nature News Blog, May 7. ic-output-doubles-every-nine-years.html. Van Rijnsoever, F.J., and L.K. Hessels. 2011. “Factors Associated with Disciplinary and Interdisciplinary Research Collaboration.” Research Policy 40 (3): 463–72. van Wierst, Pauline, Arianna Betti, Steven Hofstede, Thom Castermans, Michel Westenberg, Yvette Oortwijn, Shenghui Wang, and Rob Koopman. 2018. “BolVis: Visualization for Text-Based Research in Philosophy.” In 3rd Workshop on Visualization for the Digital Humanities. Berlin: VIS4DH. van Wierst, Pauline, Sanne Vrijenhoek, Stefan Schlobach, and Arianna Betti. 2016. “Phil@Scale: Computational Methods within Philosophy.” CEUR Workshop Proceedings 1681. .pdf. Vermeir, Koen, Sabina Leonelli, Abdullah Shams Bin Tariq, Samuel Olatunbosun, Augustine Ocloo, Ashraful Islam Khan, and Louise Bezuidenhout. 2018. “Global Access to Research Software: The Forgotten Pillar of Open Science Implementation.” Halle, Germany: Global Young Academy, German National Academy of Sciences Leopoldina. 
tent/uploads/2018/03/18013_GYA_Report_GARS-Web.pdf. Vicedo, Marga. 1995. “What Is That Thing Called Mendelian Genetics?” Social Studies of Science 25 (2): 370–82. Vinkers, V., J. Tijdink, and W. Otte. 2015. “Use of Positive and Negative Words in Scientific PubMed Abstracts between 1974 and 2014: Retrospective Analysis.” British Medical Journal 351:h6467. von Oertzen, C. 2013. “Science in the Cradle: Milicent Shinn and Her Home-Based Network of Baby Observers, 1890–1910.” Centaurus 55 (2): 175–95. Vorzimmer, Peter. 1963. “Charles Darwin and Blending Inheritance.” Isis 54 (3): 371–90. Walch, Johann Georg. 1726. Philosophisches Lexicon. Leipzig: Gleditsch.
Waltman, L., N.J. Van Eck, and E.C. Noyons. 2010. “A Unified Approach to Mapping and Clustering of Bibliometric Networks.” Journal of Informetrics 4 (4): 629–35. Wang, Z., A.D. Shah, A.R. Tate, S. Denaxas, J. Shawe-Taylor, and H. Hemingway. 2012. “Extracting Diagnoses and Investigation Results from Unstructured Text in Electronic Health Records by Semi-Supervised Machine Learning.” PLOS ONE 7 (1): e30412. Waring, T.M., S.H. Goff, and P.E. Smaldino. 2017. “The Coevolution of Economic Institutions and Sustainable Consumption via Cultural Group Selection.” Ecological Economics 131:524–32. Warren, Edward W., ed. 1975. Porphyry the Phoenician—Isagoge. Vol. 16. Toronto: PIMS. Wasserstein, R.L., and N.A. Lazar. 2016. “The ASA’s Statement on p-Values: Context, Process, and Purpose.” American Statistician 70 (2): 129–33. Watts, A. 2003. “A Dynamic Model of Network Formation.” In Networks and Groups, edited by B. Dutta and M.O. Jackson, 337–45. Berlin: Springer. Weiner, H. 1998. “Notes on an Evolutionary Medicine.” Psychosomatic Medicine 60 (4): 510–20. Weisberg, M., and R. Muldoon. 2009. “Epistemic Landscapes and the Division of Cognitive Labor.” Philosophy of Science 76 (2): 225–52. Weldon, W.F.R. 1890. “The Variations Occurring in Certain Decapod Crustacea.—I. Crangon vulgaris.” Proceedings of the Royal Society of London 47:445–53. Weldon, W.F.R. 1894. “The Study of Animal Variation [Review of Bateson, W., Materials for the Study of Variation].” Nature 50 (1280): 25–26. https://doi.org/10.1038/050025a0. Weldon, W.F.R. 1901a. “Letter from WFRW to KP [Karl Pearson], 1901-11-25,” November 25, 1901. PEARSON/11/1/22/40.6.3. Pearson Papers, University College London. Weldon, W.F.R. 1901b. “Letter from WFRW to KP [Karl Pearson], 1901-12-01,” December 1, 1901. PEARSON/11/1/22/40.6.4. Pearson Papers, University College London. Weldon, W.F.R. 1902. “Letter from WFRW to KP [Karl Pearson], 1902–07,” July 1902. PEARSON/11/1/22/40.8.1. Pearson Papers, University College London. 
Weldon, W.F.R., Karl Pearson, and C.B. Davenport. 1901. “Editorial: The Scope of Biometrika.” Biometrika 1 (1): 1–2. West, J.D., T.C. Bergstrom, and C.T. Bergstrom. 2010. “The Eigenfactor Metrics™: A Network Approach to Assessing Scholarly Journals.” College and Research Libraries 71:236–44. West, J.D., J. Jacquet, M.M. King, S.J. Correll, and C.T. Bergstrom. 2013. “The Role of Gender in Scholarly Authorship.” PLOS ONE 8 (7): e66212. Whewell, William. 1837. History of the Inductive Sciences. London: John W. Parker.
Wilkenfeld, D.A., and R. Samuels, eds. 2019. Advances in Experimental Philosophy of Science. New York: Bloomsbury.
Wilkinson, Mark D., Michel Dumontier, Ijsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3:160018.
Williams, G.C. 1957. “Pleiotropy, Natural Selection, and the Evolution of Senescence.” Evolution 11 (4): 398–411.
Williams, G.C., and R.M. Nesse. 1991. “The Dawn of Darwinian Medicine.” Quarterly Review of Biology 66 (1): 1–22.
Williamson, T. 2006. “Must Do Better.” In Truth and Realism, edited by Patrick Greenough and Michael P. Lynch, 177–87. Oxford: Clarendon.
Wilson, D.S., and R.M. Nesse. 2016. “Evolutionary Medicine Comes of Age: An Interview with Randolph Nesse.” This View of Life, September 20. https:// -randolph-nesse/.
Wolff, Christian Freiherr von. 1732. Philosophia rationalis sive logica, methodo scientifica pertractata et ad usum scientiarum atque vitæ aptata: Præmittitur discursus præliminaris de philosophia in genere. Frankfurt: Renger.
Woodberry, Owen Grant, Kevin B. Korb, and Ann E. Nicholson. 2009. “Testing Punctuated Equilibrium Theory Using Evolutionary Activity Statistics.” In Australian Conference on Artificial Life, 86–95. Berlin: Springer.
Yates, F. 1934. “Contingency Tables Involving Small Numbers and the χ² Test.” Supplement to the Journal of the Royal Statistical Society 1 (2): 217–35.
Zedler, Johann Heinrich. 1734. Grosses vollständiges Universal-Lexicon aller Wissenschafften und Künste. Vol. 7. Halle, Germany: Zedler.
Zeng, A., Z. Shen, J. Zhou, J. Wu, Y. Fan, Y. Wang, and H.E. Stanley. 2017. “The Science of Science: From the Perspective of Complex Systems.” Physics Reports 714:1–73.
Zitt, Michel. 1991. “A Simple Method for Dynamic Scientometrics Using Lexical Analysis.” Scientometrics 22 (1): 229–52.
Zollman, K.J.S. 2007. “The Communication Structure of Epistemic Communities.” Philosophy of Science 74:574–87.
Zollman, K.J.S. 2010. “The Epistemic Benefit of Transient Diversity.” Erkenntnis 72:17–35.
Zollman, K.J.S. 2018. “The Credit Economy and the Economic Rationality of Science.” Journal of Philosophy 115 (1): 5–33.
Zook, Matthew, Solon Barocas, Danah Boyd, Kate Crawford, Emily Keller, Seeta Peña Gangadharan, Alyssa Goodman, et al. 2017. “Ten Simple Rules for Responsible Big Data Research.” PLOS Computational Biology 13:e1005399.



Colin Allen is distinguished professor in the Department of History and Philosophy of Science at the University of Pittsburgh. His main areas of research concern the philosophical foundations of cognitive science, particularly as these relate to the scientific study of cognition and artificial intelligence. His publications also span topics in the philosophy of neuroscience, philosophy of mind, philosophy of biology, moral machines, and humanities computing.

Arianna Betti is professor and chair of philosophy of language at the University of Amsterdam, Institute for Logic, Language and Computation. After studying historical and systematic aspects of ideas such as axiom, truth, and fact (Against Facts, MIT Press, 2015), they now endeavor to create a sound computational methodology to trace the development of ideas such as these in a strongly interdisciplinary setting. They have been visiting professor at the University of Gothenburg and the University of Torino and the recipient of two ERC grants (2008–2013, 2014–2015) and a Dutch NWO VICI grant (2017–2022).

Jelke Bloem is an assistant professor at the University of Amsterdam, Institute for Logic, Language and Computation. With a background in computational humanities, he has previously worked on applying novel digital methods to the study of language variation. Currently he is working on applications of natural language processing methods to philosophy, with a focus on distributional representations of meaning, and is interested in extending this line of research to other text-based humanities as well.



Justin Bruner is assistant professor in the Department of Political Economy and Moral Science at the University of Arizona. His work in political philosophy and social epistemology draws on computational and empirical methods from the social sciences. His work has been published in Analysis, Journal of the American Philosophical Association, Philosophical Studies, and Evolution and Human Behavior.

Jean-François Chartier holds a PhD in cognitive computer science. He is currently an associate professor at the University of Quebec at Montréal (UQAM), where he conducts research in natural language processing and machine learning applied to social sciences and humanities, including social representation analysis, computational semiotics, and social network analysis. He is also an industrial researcher in data science.

David Chavalarias is director of research at the Centre for Social Analysis and Mathematics (CAMS) of the CNRS and director of the Institut des systèmes complexes de Paris Ile-de-France (ISC-PIF; http://). He studies social and cognitive dynamics from modeling and data-mining perspectives, with an interdisciplinary approach based on cognitive and complex systems sciences. His work on the evolution of science includes new methods for reconstructing the dynamics of science from academic productions, as well as models of the collective dynamics of scientific discovery. He has also developed several software and visualization tools to map knowledge from large corpora: academic digital repositories, online media, or press. His website is http://chav

Julia Damerow is a scientific software engineer at Arizona State University with a degree in computer science and a PhD in computational history and philosophy of science. Her interests are the application of computation to the field of history and philosophy of science and, more generally, software development in the digital humanities. She cofounded and heads the Digital Innovation Group (DigInG) in the Global Biosocial Complexity Initiative at Arizona State University (http://dig). Julia is currently a member of the US Research Software Engineer (US-RSE) Association steering committee and a member of the steering committee of DHTech, an ADHO Special Interest Group.

Andreas De Block is a professor at the Institute of Philosophy, KU Leuven, Belgium. His current research interests revolve around philosophy of sex, science, and values and experimental philosophy of medicine. He likes to collaborate with researchers from other disciplines and has published with legal scholars, psychologists, economists, sport scientists, and biologists.

Steve Elliott is a researcher in the US federal government. He is affiliated with Arizona State University via the Center for Biology and Society, the Center for Gender Equity in Science and Technology, and the Center for Organizational Research and Development. As a philosopher and sociologist of science, he studies the reasoning strategies, economics, and social organization of research. Previously he was editor-in-chief of the Embryo Project Encyclopedia, an open-access publication about the history of biology accessed worldwide by millions of people.

Annapaola Ginammi is a postdoctoral researcher at the University of Amsterdam, Institute for Logic, Language and Computation. After obtaining her PhD in philosophy from the Scuola Normale Superiore in Pisa, Italy, with a thesis on infinite idealizations in physics, she now combines traditional philosophical research in philosophy of mathematics and logic, investigating concepts such as that of concept, explanation, and the mathematical infinite, with research on computational methods for text-based philosophical research.

Bennett Holman is assistant professor of the history and philosophy of science at Yonsei University in Seoul, South Korea, and a senior research associate at the University of Johannesburg. His research is primarily focused on improving scientific epistemology in areas of science heavily influenced by industry funding, especially pharmaceutical research. He has additional interests in political polarization and epistemological systems within a politically fractured polity.

Philippe Huneman is director of research at the Institut d’Histoire et de Philosophie des Sciences et des Techniques (CNRS/Paris I Sorbonne). He has published on the relationships between natural selection and causation, on the roles of organism in evolution, and on the conception of emergence. He is collaborating with David Chavalarias on a project funded by the French National Research Agency (ANR) on quantitative computer-assisted epistemology. His edited books include From Groups to Individuals, with F. Bouchard (MIT Press, 2013), Functions (Synthese Library, 2013), and Challenging the Modern Synthesis (with D. Walsh, Oxford University Press, 2017). He will soon publish Why? (Stanford University Press) and a book on the philosophy of biological death (Palgrave Macmillan).




Rob Koopman is an applied research architect at the OCLC EMEA office in Leiden, Netherlands. His research activities include semantic embedding, data mining, and enhancing data quality. He started to work at OCLC (then Pica) in 1981 and designed and developed major parts of the EMEA systems. Since 2012 he has done applied data science for OCLC.

Manfred D. Laubichler is Global Futures Professor and President’s Professor of Theoretical Biology and History of Biology. He is director of the School of Complex Adaptive Systems and the Global Biosocial Complexity Initiative at Arizona State University. His work focuses on evolutionary novelties from genomes to knowledge systems, the structure of evolutionary theory, and the evolution of knowledge. He is external professor at the Santa Fe Institute and visiting scholar at the Max Planck Institute for the History of Science in Berlin, Germany, external faculty member at the Complexity Science Hub Vienna, and vice chair of the Global Climate Forum.

Kate MacCord is an instructor in the School of Life Sciences at Arizona State University and the program administrator of the McDonnell Initiative at the Marine Biological Laboratory. Kate is a historian and philosopher of science. Her research focuses on four overlapping areas: the intersection of development and evolution, concepts related to the germ line and its regeneration, regeneration as a phenomenon of complex living systems, and establishing collaborations between historians, philosophers, and life scientists. Kate uses history to provide the context for understanding modern concepts in the life sciences and philosophy to systematically parse and clarify these concepts.

Jane Maienschein is university professor, regents professor, and president’s professor at Arizona State University, where she directs the Center for Biology and Society.
She also serves as fellow and directs the History and Philosophy of Science Project at the Marine Biological Laboratory in Woods Hole, Massachusetts. Maienschein specializes in the history and philosophy of biology and the way biology, bioethics, and biopolicy play out in society, especially looking at developmental and cell biology and most recently at regeneration and at synthetic cell biology, while also exploring the impacts of computational history and philosophy of science.

Christophe Malaterre is a professor in the Department of Philosophy at UQAM and Canada Research Chair in the Philosophy of the Life Sciences. His research focuses on the application of computational approaches (topic modeling, clustering, associative rule inference, multilingual corpus analysis) in the context of conceptual and historical analyses in philosophy of science. He also works on the epistemology of origins of life research and astrobiology.

Jaimie Murdock is a senior member of technical staff at Sandia National Laboratories. He earned a joint PhD in cognitive science and complex systems from Indiana University in 2019. His dissertation, “Topic Modeling the Reading and Writing Behavior of Information Foragers,” explores how individuals search for and discover novel concepts in scientific publications. He has extensive experience in digital preservation and information management, formerly working at the Internet Archive and the HathiTrust Research Center.

Cailin O’Connor is a philosopher of science and applied mathematician specializing in models of social interaction. She is associate professor of logic and philosophy of science and a member of the Institute for Mathematical Behavioral Science at the University of California, Irvine. She is currently co-administering the National Science Foundation grant “Consensus, Democracy, and the Public Understanding of Science.” Her book The Misinformation Age, coauthored with James Owen Weatherall, was published in 2019 with Yale Press, and her book The Origins of Unfairness was also published in 2019 with Oxford University Press.

Deryc T. Painter is assistant research professor in the School for Complex Adaptive Systems at Arizona State University, with a degree in biology and a PhD in computational history of science. His interests include analyzing and predicting innovation, scaling relationships in the Great Acceleration, and quantitative analysis of interdisciplinarity. Additionally, he spends time thinking about the intersection of ecology, evolution, and economics.
His future goals involve incorporating machine learning into network analysis to explore conceptual landscapes for potential innovation.

Charles H. Pence is Chargé de cours at the Université catholique de Louvain, in Louvain-la-Neuve, Belgium, where he directs the Center for Philosophy of Science and Societies (CEFISES). He also serves as a coeditor of the journal Philosophy, Theory, and Practice in Biology (PTPBio). His work centers on the philosophy and history of biology, with a focus on the introduction and contemporary use of chance and statistics in evolutionary theory. His lab is also one of the foremost groups integrating this work with methods of the digital humanities and is increasingly engaged in the ethical implications of biological science and technology.

Davide Pulizzotto is a postdoctoral researcher at the University of Sherbrooke in the field of text data mining for humanities. He is also a research associate at Polytechnique Montréal and a member of CIRST (Centre interuniversitaire de recherche sur la science et la technologie) at UQAM. His areas of interest are semiotics and linguistics, text analysis with computational and quantitative methods, machine learning, digital humanities, and computational semiotics.

Thibault Racovski is a postdoctoral researcher associated with the Institute for the History and Philosophy of Science and Technology (CNRS & Paris I Pantheon-Sorbonne University) and a teacher of philosophy and English in secondary education. His doctoral thesis, completed in 2019 at the University of Exeter, focused on philosophical and historical issues surrounding biological research on evolutionary novelty. As a member of the French ANR-funded “EPIQUE” research project, he uses original scientometric tools to address philosophical questions on the evolution of science and to acquire new insights on case studies in the history of biology.

Grant Ramsey is a research professor at the Institute of Philosophy, KU Leuven, Belgium. His work centers on philosophical problems at the foundation of evolutionary biology. He runs the Ramsey Lab, a highly collaborative research group focused on issues in the philosophy of the life sciences.

Hannah Rubin is an assistant professor in the Philosophy Department at the University of Notre Dame, where she is also affiliated with the Reilly Center for Science, Technology, and Values and the Center for Network and Data Science. She specializes in evolutionary game theory, philosophy of biology, and formal models of science.
She is currently principal investigator for a National Science Foundation CAREER grant, titled “Race, Gender, and the Science of Science,” integrating considerations about the effects of researchers’ social identities on inquiry (from philosophy of science) into the study of how to achieve well-functioning science.


Mike D. Schneider is a philosopher of science, primarily focused on the dynamics of theory development at the intersection of contemporary high-energy physics and large-scale cosmology. He is currently a postdoctoral researcher on the Cosmology Beyond Spacetime project, through the Philosophy Department at the University of Illinois at Chicago. Previously, he was a 2020–2021 postdoctoral fellow at the University of Pittsburgh’s Center for Philosophy of Science, having completed his dissertation in the Department of Logic and Philosophy of Science at the University of California, Irvine.

Paul E. Smaldino is associate professor of cognitive and information sciences at the University of California, Merced. Following undergraduate training in physics and a PhD in psychology, he held postdoctoral appointments in departments of medicine, anthropology, political science, and computer science before joining the faculty at UC Merced in 2016. His work focuses on the development of mathematical and computational models of social and cultural phenomena. He has published widely on cultural evolution, cooperation, signaling, metascience, and the use of formal models in the social and behavioral sciences.

Krist Vaesen is associate professor in the philosophy of innovation in the Department of Philosophy and Ethics at Eindhoven University of Technology. Krist has worked on a wide range of topics, including technological normativity, the extended mind hypothesis, the epistemology of cognitive artifacts, the cognitive bases of tool use, and experimental philosophy. His current research interests include theories of cultural and technological evolution, foundational issues in human origins research, the philosophy of scientific models, scientific pluralism, science and research policy, and the history of twentieth-century Anglo-American philosophy.
Shenghui Wang is an assistant professor in the Human-Media Interaction (HMI) group of the University of Twente and a research scientist at OCLC, Leiden, Netherlands. After obtaining her PhD in computer science at the University of Manchester in 2007, she has worked on various projects applying natural language processing and semantic technologies to improve semantic interoperability in the domains of cultural heritage and digital humanities. Her current research interests include natural language semantics, text and data mining, concept modeling, and human-computer interaction.



Index

Note: Page references in italics indicate figures and tables.

agent-based models, 11, 12–13, 42, 72. See also models Alcock, J., 209 Al-Doulat, A., 125–29 Allen, C., 33, 113 Ambrosino, A., 123, 124 American Home Economics Association, 235n17 American Journal of Sociology, 123 The American Naturalist, 148 American Statistical Association (ASA), 30 Amgen, 19, 231n1 (Ch. 1) Analytica Posteriora (Aristotle), 187 Ankeny, R.A., 10, 151 anthropomorphism, 118 Antonovics, J., 9 argument mining, 129–30, 184 Ariadne, 193–94, 196, 201 Aristotle, 187, 240n4 Ayala, Francisco, 113 Bachelard, Gaston, 92 Baker-Brown for General American English corpus (AmE06), 222 Bala, V., 42, 43 Baldwin, James Mark, 70 Baldwin, Melinda, 160

bandit problem, 42–43 Barabási, A.-L., 215, 216 bargaining: and academic collaboration, 58; diverse collaborations, 57–59 Barrett, J.A., 44–45, 46 Bateson, William, 149–50, 151–53, 155, 156–58, 162 Bayesian model of science, 22 Bayes’ Theorem, 21, 103 Beatty, John H., 150 Bergstrom, C.T., 6, 222 Bernard-Bolzano-Gesamtausgabe (BGA), 189 Betti, Arianna, 187, 189, 191, 240n2 between-group collaborations, 61–67, 63, 64 bias, 4, 33, 36, 80, 98: and network formation, 50–51; and pooling/aggregation models, 50–51; publication, 35, 38, 40; see also discrimination big data revolution, 7 “biometrical school,” 150–51 Biometrika, 159–60 biometry, 158–60 biometry-Mendelism debate, 149–63 blending inheritance, 149 BolVis, 189; exact query 1, 200–202; query 1, 196–97; query 2, 197–98;




query 3, 198; query 4, 198–99; query 5, 199–200; traditional theory of concepts with, 191–93 Bolzano, Bernard, 186, 191–93 Bordons, M., 215 Börner, K., 225 Botts, T.F., 59 Box, George, 114 Boyack, K.W., 225 Boyd, R., 11 Bright, L.K., 233n1, 235n16 Bruner, J.P., 43–44, 46, 232n9, 233n4 Callon, M., 85 Campbell, L.G., 27, 28, 56 Carlin, J., 231n2 (Ch. 1) Carnap, Rudolf, 187 case studies, 4, 14; dilemma of, 80; “theory-ladenness” of, 80 Cavalli-Sforza, L.L., 11 Chalmers, David, 4 Chartier, Jean-François, 117, 129 Chavalarias, D., 86–88, 90, 94 Chen, C.M., 225 Chen, P.M., 10 Chomsky, Noam, 232n9 Classical Model of Science (CMS), 189–91 close reading, 9, 14, 115, 116, 188 cloud storage, 139 co-authorship networks, 215, 217 co-citation analysis, 83 Cognition, 237n3 Cohen, Jacob, 30–31 Cointet, J.P., 94 communication structure: cost-minimizing, 43; and myopic individuals, 41; and network formation, 43–45; and scientific inquiry, 40–41 contagion of disrespect: child study, 69–70; and diverse collaborations, 68–71; home economics, 70; inequity, 71 context modeling, 105 Contreras Kallens, P.A., 35 count-based modeling, 194 Courtial, J.P., 85

co-word analysis, 83–84 crisis: scientific communities in, 160–63 Cultural Evolution Society, 37 “cultural red king effect,” 12 Cultural Transmission and Evolution: A Quantitative Approach (CavalliSforza and Feldman), 11 Culture and the Evolutionary Process (Boyd and Richerson), 11 Cunningham, J.T., 153, 156 Daniels, B.C., 207 Darbishire, A.D., 151 Darwin, Charles, 20, 108, 111, 112–14, 117, 149, 238n4 Darwinism, 12, 28, 78, 81, 98, 150 data: accessing, 139–40; collecting and organizing, 136–39; collection, 136–39; further resources, 143; management, 132–43; recognizing what counts as, 134–36; sharing, 140–42; storing, 139–40 Data Curation Centre, 143 data-driven research, 75–77 data-entry: data-mining, 82; data preprocessing, 169; data repositories, 139 data management plan (DMP): creating, 133–34; diverse in appearance, 134; as living document, 134; using, 133–34 Davenport, Charles B., 159 Dawkins, R., 89 “Dawn of Darwinian Medicine” (Williams and Nesse), 209 DeDeo, Simon, 113 DeGroot, M.H., 42 de Jong, W.R., 189, 191, 192, 201, 240n4 Delanoë, A., 86–88, 90 Dennett, Daniel, 5 Descartes, 5, 187 The Descent of Man (Darwin), 20 diachronic topic analysis, 170 Dietrich, M.R., 10 digital analysis, 152–53 Digital HPS community repository, 140 Digital HPS Consortium, 143 discrimination: vs. between-group

payoffs, 66; diverse collaborations, 57–59; and homophily, 67; instances of discrimination, 65; and payoffs for between-group collaborations, 64; proportion of discrimination, 65. See also bias distant reading, 9 Distant Reading (Moretti), 115 distributional semantic models (DSMs), 194; evaluation for philosophy, 202–3 diverse collaborations, 54–72; bargaining, 57–59; contagion of disrespect, 68–71; discrimination, 57–59; epistemic benefits of, 68; importance of, 55–57; modeling increased credit, 61–63, 63; special grants for, 60–61 diversity: in academic communities, 55; within collaborations, 55–56; initiatives, 59–61; minority representation, 60 DMPTool, 143 Donovan, Arthur, 79 Dryad (open access repository), 140 Dublin Core Metadata Initiative, 138–39 Dublin Core standards, 137, 143 Eastern Paralyzed Veterans Association, 93 ECHO (digital data repository), 140 Eldredge, N., 235n1 Ellis, Havelock, 57 Embryo Project, 133, 138, 140 engaged philosophy of science, 173–83; methodological strengths and limitations, 183–85; and topic modeling, 170–73; topics with keywords related to socially, 171 Enumerations (Piper), 115 epistemic social dilemma, 51–53 epistemology: and knowledge, 40; network, 41–43; reorientation of, 40 European Research Council, 133 EvMed Review, 211 Evolution, Medicine, and Public Health, 211 evolutionary biology / biologists, 6, 11,

26, 97, 206, 210–14, 216, 217, 222, 228–30; concept of signature in, 82; phylogeny in, 74 evolutionary game-theoretic models, 54 evolutionary medicine, 15, 204–30; author background and publication location, 212; corpus, 207–9; founder effect, 209–10; genotype, 214–22; growth and development, 209–28; initial growth, 210–11; initial reflections based on analyses, 228–30; migration, 212–13; phenotype, 222–28; population dynamics, 211–12; selection, 213–14 Evolutionary Medicine Network (EvMedNetwork), 208, 209, 211, 213, 214, 217 experimental philosophy, 5; of science, 6 Explaining Scientific Consensus: The Case of Mendelian Genetics (Kim), 151 “Explanatory Coherence” (Thagard), 12 fact finding: hypothesis selection and investigation, 22, 23; “hypothesis testing” 20–22, 21; population dynamics of hypotheses, 24, 24–26; and science, 20 FAIR principles, 142 Falconer, Hugh, 112 false facts, 20, 25–26; and variation, 29, 30 Feldman, M.W., 11 Feldon, D.F., 59 Forman, R.R., 216 Forman-Ricci curvature, 216, 218–21 Fortunato, S., 6, 231n2 (intro) founder effect, 209–10 Freeman, R.B., 233n2 Frege, Gottlob, 186, 187 French, John R.P., Jr., 42 Future and Emerging Technologies (FET Open) funding scheme, 90 Galton, Francis, 150 Garfield, Eugene, 231n1 (Introduction) Gelman, A., 231n2 (Ch. 1) general philosophical theories, 77–78




Genêt, Edmond-Charles, 110 genotype, 206, 214–22 Giere, Ronald, 3 Giordan, G., 123 GitHub, 140 Glänzel, W., 215 Global Environmental Change, 124 Goodin, R., 44 Goodman, Alyssa, 143 Gould, S.J., 235n1 Goyal, S., 42, 43 Griffiths, Thomas L., 169 Grim, P., 7 Gross, K., 6 Haldane, J.B.S., 151 Hammond, George, 110 Haraway, Donna J., 173 Harding, Sandra, 173 Hegselmann, R., 233n5 heritability, 28–32 Hermeneutica (Rockwell and Sinclair), 115 history and philosophy of science (HPS), 4, 13–14, 164; computational, 7; as science of science, 6–8; topic modeling in, 164–85; traditional, 7 history of science, 3, 77–81 History of the Inductive Sciences (Whewell), 113 Holman, B., 43–44, 46, 232n9 homophily, 64, 68, 72; and agent-based models, 72; collaborative, 60; and discrimination, 67; inbreeding, 63, 63, 66 Howard, Don, 165, 167–68, 170, 181–82 HPS Repository, 133, 138–39 Huang, W., 233n2 Hull, David, 11, 28, 78, 97 “human ecology,” 70 Huttegger, Simon M., 41, 43, 232n1 Huxley, Thomas Henry, 156 hypothesis: formulations of, 38; investigation, 22, 23; population dynamics of, 24, 24–26; and replication, 31; selection, 22, 23 “hypothesis testing,” 12, 20–22, 21; replication, 31; Type 1 errors, 21; Type 2 errors, 21 Ideengeschichtelich, 188 incommensurability, 148 influence: and network formation, 45–49, 48; and reliability, 45–49, 48 interdisciplinarity, 204–5 interest: and network formation, 49–50; and pooling/aggregation models, 49–50 International Society for Evolution, Medicine, and Public Health (ISEMPH), 207 interpretive pluralism, 115 Ioannidis, J.P.A., 22, 31 Jefferson, Thomas, 109–10 Jinha, A.E., 236n3 Jockers, Matthew, 115 Johannsen, Wilhelm, 153 Jost, J., 207 Journal of Digital Humanities, 103, 105 Journal of the History of Biology, 237n3 JSTOR, 136, 169 Kant, I., 187, 188–89 Katzav, J., 125 keyword co-occurrence networks (KCNs), 223–24, 225, 227–29 Kim, Kyung-Man, 148, 151–53, 155, 158, 159 Kovacs, G., 92 Krause, U., 233n5 Kuhn, Thomas, 4, 74, 78, 147–48, 160, 163, 239n13 Kuhnian crisis, 160 Kullback-Leibler (KL) divergence, 112, 117 Kumar, S., 215 Lakatos, I., 96 Lankester, E. Ray, 153, 155, 156, 158 latent Dirichlet allocation (LDA), 13, 103–19, 121–23, 166, 168; as data-reduction technique, 125; robust modeling practices, 114–19; topics as
lication, 31; Type 1 errors, 21; Type 2 errors, 21 Ideengeschichtelich, 188 incommensurability, 148 influence: and network formation, 45–49, 48; and reliability, 45–49, 48 interdisciplinarity, 204–5 interest: and network formation, 49–50; and pooling/aggregation models, 49–50 International Society for Evolution, Medicine, and Public Health (ISEMPH), 207 interpretive pluralism, 115 Ioannidis, J.P.A., 22, 31 Jefferson, Thomas, 109–10 Jinha, A.E., 236n3 Jockers, Matthew, 115 Johannsen, Wilhelm, 153 Jost, J., 207 Journal of Digital Humanities, 103, 105 Journal of the History of Biology, 237n3 JSTOR, 136, 169 Kant, I., 187, 188–89 Katzav, J., 125 keyword co-occurrence networks (KCNs), 223–24, 225, 227–29 Kim, Kyung-Man, 148, 151–53, 155, 158, 159 Kovacs, G., 92 Krause, U., 233n5 Kuhn, Thomas, 4, 74, 78, 147–48, 160, 163, 239n13 Kuhnian crisis, 160 Kullback-Leibler (KL) divergence, 112, 117 Kumar, S., 215 Lakatos, I., 96 Lankester, E. Ray, 153, 155, 156, 158 latent Dirichlet allocation (LDA), 13, 103–19, 121–23, 166, 168; as datareduction technique, 125; robust modeling practices, 114–19; topics as

contexts, 109–14; use and abuse of topic models, 106–9 latent semantic analysis (LSA), 166 Latour, B., 83 Laubichler, M.D., 207 Laudan, L., 9, 79 Laudan, Rachel, 79 Laville, F., 85 Lawrence, J., 130 Lee, M., 125–29 Lehrer, K., 42 Leibniz, Gottfried Wilhelm, 186 Leng, K., 57 Leonelli, Sabina, 135 Li, E.Y., 215 Liao, C.H., 215 Linden, Noah, 91 Lindenmayer, A., 75 Linguistic Inquiry and Word Count, 126 List, Christian, 52 Lobbé, Q., 86–88, 90 low- and middle-income countries (LMICs), 141–42 L-systems, 75–76 Machery, Edouard, 5–6 machine learning: predictive models using, 241n15; supervised machine learning (SML), 121, 126–31; techniques, 130–31; untapped potential of, 125–31 Macroanalysis (Jockers), 115 Malaterre, Christophe, 117, 129 MALLET, 104 Marine Biological Laboratory, Woods Hole, Massachusetts, 133, 138 Marine Biological Laboratory (MBL) History Project, 133, 134–36, 138 Materials for the Study of Variation (Bateson), 150–51 McCarthyism, 167 McElreath, Richard, 29, 31 McLellan-Lemal, Eleanor, 143 Meehl, Paul, 30 Meeks, Elijah, 103–5, 111–12, 118 Mehler, D.M.A., 33 Mendel, Gregor, 151

Mendelism, 151, 158–60 metadata, 84, 126, 137–40, 184–85, 208 Michener, William K., 143 migration, 212–13 Modern Synthesis, 149, 151, 238n3 Mohseni, A., 44–45, 46 Moretti, Franco, 115 Morgan, Robin, 235n17 Murakami, A., 123–24 Murdock, Jaimie, 113 Named Entity Recognizer (NER), 153 Nash demand game, 58–59, 62–63 National Endowment for the Humanities, 133 National Institute of Health, 10 National Science Foundation, 133 naturalism, 3; in philosophy of science, 4–6 Natural Language Processing Toolkit, 126 natural philosophy, 5 Nature, 19–20, 27, 148, 152–53, 156, 157, 212, 238n2, 239n11 Neale, M.A., 56 negative results, publishing, 21, 24, 25, 32–33, 35, 90, 92 Nesse, Randolph M., 205, 209–11, 222, 228 network epistemology, 41–43 network formation: and bias, 50–51; and communication structure, 43–45; and influence, 45–49, 48; and interest, 49–50; models, 45–51, 48; and reliability, 45–49, 48 network of discourse, 153–57; in Nature, from 1895 to 1899, 156; in Nature, from 1900 to 1904, 157 neuroprosthesis, 92 Newman, M.E.J., 216 Newton, Isaac, 5, 187 Nissen, S.B., 232n4 (Ch. 1) Northcraft, G.B., 56 Noyons, E.C., 225 Obaidat, I., 125–29




O’Connor, C., 232n9, 233n4, 234n5 Okruhlik, K., 55 Olby, Robert, 161 On the Origin of Species (Darwin), 112–13, 149 Open Science Collaboration, 19 “Open Science” movement, 32, 37 Otte, W., 27 Overton, J. A., 10 Painter, D.T., 207, 216 “paradigm articulators” 151, 152 Pearl, Raymond, 151, 157 Pearson, Karl, 150, 152–53, 156, 157, 158, 163 Pettit, Philip, 52 phenomenological reconstruction, 75, 76–77, 77, 84, 86, 87, 88, 96; automated, 81–82; to study scientific evolution, 81–82 phenotype, 222–28 Phillips, K.W., 56 Philosophical Review, 125–26 philosophy and history of science (HPS), 4, 13–14, 164; computational, 7; as science of science, 6–8; topic modeling in, 164–85; traditional, 7 philosophy of science, 3, 74, 77–81; as descriptive discipline, 3; experimental, 6; naturalism in, 4–6; and normative questions, 3; socially engaged, 166–68 philosophy of science, engaged, 173–83; methodological strengths and limitations, 183–85; and topic modeling, 170–73; topics with keywords related to socially, 171 Philosophy of Science (journal), 14, 129, 164, 166–69, 170, 173, 181–82, 184–85, 237n3; evolution of cumulated topic-probability, 180; evolution of topic probability in, 180 Philosophy of Science Association (PSA), 169, 183 PhilSci Archive, 140 phylomemetic approach, 13, 74; data-driven research, 75–77; and

dynamics of science, 73–99; theorydriven research, 75–77 phylomemetic network: concept of, 88; defined, 86; multiscale organization of, 87 phylomemy reconstruction: concept of, 88; described, 86; examples of, 89–95, 90, 93, 95; level of observation, 86–87; levels vs. scales, 87; and quantum computing, 91; workflow of, 88, 88 Piper, Andrew, 115 PLOS ONE, 27 PNAS (Proceedings of the National Academy of Sciences), 212 Poggiolesi, Francesca, 188 pooling/aggregation models, 42, 51–53; and bias, 50–51; and influence, 45–49, 48; and interest, 49–50; and reliability, 45–49, 48 Popescu, Sandu, 91 Popper, K., 28, 78 population dynamics: in evolutionary medicine, 211–12; of hypotheses, 24, 24–26 probabilistic latent semantic analysis (pLSA), 166 probabilistic topic model, 122 Proceedings of the Cognitive Science Society, 237n3 Proceedings of the National Academy of Sciences, 27 Provine, William, 148 “Publication Trends in Model Organism Research” (Dietrich, Ankeny, and Chen), 10 Pulizzotto, Davide, 117, 129 p-values, 30 quantitative data-mining approach, 82 quantum computing, 84, 85; phylomemetic branch of, 90; and phylomemy reconstruction, 91 Radick, Gregory, 150 random genetic drift, 82 reading: close, 9, 14, 115, 116, 188; distant, 9

Reed, C., 130 rehabilitation engineering, 93 reliability: and influence, 45–49, 48; and network formation, 45–49, 48 Report on the Teak Forests of the Tenasserim Provinces (Falconer), 112 Reppen, R., 207 research data, defined, 135 Research Data Alliance, 143 research productivity, 33–37 retinal prosthesis, 93 Richards, Ellen Swallow, 70 Richerson, P.J., 11 robust modularity, 155 Rockwell, Geoffrey, 115 Romanes, George J., 156, 158 Romero, F., 232n4 (Ch. 1) Rosen, J., 92 Rosenblatt, M., 31 Roski, Stefan, 186 Rossiter, M.W., 70 Rosvall, M., 222 Royal Society, 159 Rubin, H., 234n4 Sagan, Carl, 19 Saint-Blancat, C., 123 Saucan, E., 216 Sbalchiero, S., 123 Schön, Jan, 28 Schubert, A., 215 Schuster, Edgar, 151 Science, 6, 10, 19, 27, 148, 212, 237n3 science and technology studies (STS), 6 Science as a Process (Hull), 11 science of science, 121, 133, 231n1 (Introduction): defined, 6; HPS as, 6–8; papers published under, 7 scientific communities in crisis, 160–63 Scientific Data, 140 scientific evolution: testing theories of, 77–81; through text mining, 82–95; using phenomenological reconstruction to study, 81–82 scientific inquiry, 40; and communication structure, 40–41 scientometrics analysis, 6, 82–84; co-word analysis, 83; overview, 82

selective sweep, 97 Shinn, Milicent, 69 Shull, George, 151 Siebel, Mark, 192–93 simulations, 11–12 Sinclair, Stéfan, 115 Singer, D. J., 7 single nucleotide polymorphisms (SNPs), 22 Skyrms, Brian, 41, 43, 44–45, 46, 232n1 Smaldino, Paul, 35, 114 social dilemma, 51–53 social structure, 12, 40 Spencer, Herbert, 150, 158 Spiekermann, K., 44 Sreejith, R.P., 216 Stanford Natural Language Processing (NLP) project, 153 Stapel, Diederik, 28 “Statement on p-Values,” 30 Stearns, Stephen C., 211, 222 Steyvers, Mark, 169 Storment, C., 92 Structure of Scientific Revolutions (Kuhn), 147 Stuewer, Roger, 3 supervised machine learning (SML), 121, 126–31 Suppes, Patrick, 187 Szucs, D., 31 Tabery, James G., 151 text mining: macro-structures of science, 84–89; phylomemy reconstruction examples, 89–95; scientific evolution through, 82–95; scientometrics analysis, 82–84 TextSTAT, 126 Thagard, Paul, 11–12 theoretical reconstruction, 75–76, 77 theory-driven research, 75–77 “theory-ladenness” of case studies, 80 Theory of Science (Bolzano), 186 theory of the germ plasm, 150 Thiselton-Dyer, W.T., 156 Tijdink, J., 27 topic-modeling: algorithms, 165–66; assessing engaged philosophy of




science, 170–73; corpus retrieval and cleaning, 169; data preprocessing, 169; diachronic topic analysis, 170; in HPS 164–85; limits of, 123–25; methodology, 168–70; philosophy of, 165–66; and socially engaged philosophy of science, 166–68; strengths of, 121–23; topic interpretation, 170 Toulmin, Stephen, 130 traditional theory of concepts, 189–91, 193–96 Turner, M.A., 35 two-way reasoning, 78–81 Type 1 errors, 21 Type 2 errors, 21 Tyson, Neil deGrasse, 19 Underwood, Ted, 115–16 untapped potential of machine learning, 125–31 The Uses of Argument (Toulmin), 130 US National Science Foundation’s Social Science program, 167 Vaesen, K., 125 Van Dijk, H., 56 Van Eck, N.J., 225 Van Engen, M.L., 56 Van Knippenberg, D., 56 variation, 28–32 Vicedo, Marga, 161, 238n5, 239n12 Vinkers, V., 27 Virginia Polytechnic Institute (VPI) project, 79–80

von Oertzen, C., 69 Vorzimmer, Peter, 238n4 VOSviewer, 225 Wagner, C., 42 Waltman, L., 225 Wansink, Brian, 28 Watts, A., 62 Web of Science (WoS), 84, 85, 208, 210, 211, 222, 228 Weierstrass, Karl, 186 Weingart, Scott, 103–5, 111–12, 118 Weismann, August, 150, 158 Weldon, W.F.R., 150–53, 155, 156–58, 162–63 Whewell, William, 113 “Why Most Published Research Findings Are False” (Ioannidis), 22 Why We Get Sick: The New Science of Darwinian Medicine (Nesse and Williams), 209–10 Williams, George C., 205, 209–10, 222, 228 Williamson, Timothy, 5 Wilson, David Sloan, 209–10 Woolgar, S., 83 WordSmith Tools, 222 Yen, H.R., 215 Yule, George Udny, 151 Zollman, K.J.S., 11, 41, 42–43, 46, 232n1