The Measurement of Information Integrity: An Information Science Perspective 9780367565732, 9780367565695, 9781003098942


English Pages [181] Year 2021



THE MEASUREMENT OF INFORMATION INTEGRITY

Arguing that there never was a time when politicians did not prevaricate and when some communities did not doubt conclusions that others considered to be facts, The Measurement of Information Integrity puts the post-truth era in context and offers measures for integrity in the modern world. Incorporating international examples from a range of disciplines, this book provides the reader with tools that will help them to evaluate public statements – especially ones involving the sciences and scholarship. It also provides intellectual tools to those who must assess potential violations of public or academic integrity. Many of these tools involve measurement mechanisms, ways of putting cases into context, and a recognition that few cases are simple black-and-white violations. Demonstrating that a binary approach to judging research integrity fails to recognise the complexity of the environment, Seadle highlights that even flawed discoveries may still contain value. Finally, the book reminds its reader that research integrity takes different forms in different disciplines and that each one needs separate consideration, even if the general principles remain the same for all. The Measurement of Information Integrity will help those who want to do research well, as well as those who must ascertain whether results have failed to meet the standards of the community. It will be of particular interest to researchers and students engaged in the study of library and information science. Michael Seadle was long the Director of the Berlin School of Library and Information Science, Humboldt-Universität zu Berlin, Germany, and Dean of Humanities. His current research areas include information integrity and digital archiving, and he currently serves as Executive Director of the international iSchools organisation.

THE MEASUREMENT OF INFORMATION INTEGRITY

Michael Seadle

First published 2022 by Routledge 2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN and by Routledge 605 Third Avenue, New York, NY 10158 Routledge is an imprint of the Taylor & Francis Group, an informa business © 2022 Michael Seadle The right of Michael Seadle to be identified as author of this work has been asserted by him in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988. All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library Library of Congress Cataloging-in-Publication Data A catalog record has been requested for this book ISBN: 978-0-367-56573-2 (hbk) ISBN: 978-0-367-56569-5 (pbk) ISBN: 978-1-003-09894-2 (ebk) DOI: 10.4324/9781003098942 Typeset in Bembo by Taylor & Francis Books

To my wife Joan

CONTENTS

List of tables
Preface
Acknowledgements

1 Introduction and approach
Introduction
A note on vocabulary
Problem statement
Methodology
Literature review
Summary

2 Context and society
Introduction
Historical context
Evolution of the concepts
Incentives and disincentives
Summary

3 Context and institutions
Introduction
Institutions and infrastructure
Technology and tools
Role of law
Economic context
Summary

4 Disciplines
Introduction
Natural sciences
Social sciences
Humanities
Professional schools
Summary

5 Measurement
Introduction
What is measurement?
Training in measurement
Institutions and measurement
Measurement failures
Summary

6 Actors
Introduction
Investigators
Judges
Violators
Victims
Summary

7 Conclusion and consequences
Introduction
Scholarly fraud
Fraud in the wider world

Bibliography
Index

TABLES

4.1 Number of retractions by reason
4.2 Relative size of subject areas as measured by the number of journals
4.3 Number of natural science journals by discipline
4.4 Number of social science journals by discipline
4.5 Humanities journals
4.6 Journals for professional school disciplines

PREFACE

Context

The year 2020 was unusual because of the sheer quantity of information integrity topics. Three major events helped to shape the information landscape for the European and North American public. One event was the election of Donald Trump as the US President in 2016 and his defeat in 2020. A second was the COVID pandemic throughout the world, and particularly in the US and Europe, resulting in a disproportionately high number of deaths in some countries. The third event was the British departure from the European Union (Brexit), and the subsequent negotiations. Other salient events included the crash of the Boeing 737 Max in 2019, which resulted in an unusual amount of information about how respected corporations may cover up their own bad judgment, and the ongoing development of the Retraction Watch Database as scholars rushed to bring out pandemic-related articles, a number of which had to be retracted by prestigious journals because of fatal flaws.

No one should read this book as a history of the year 2020, but the year plays a role because it demonstrated the importance of having intellectual and technical tools to measure the integrity of information as a basis for making informed judgments about conflicting claims. In a different year the examples might have been different, but the need for measurement would have remained the same.

Calculating the overall quantity of misinformation in 2020 is not a goal of this book. Not only would that task be enormous, but the experience of Retraction Watch has shown that a systematic categorisation of information integrity violations based on journal retraction notices is complicated enough, and adding up all possible forms would be like calculating all the possible shades of grey in a black-and-white photograph. This grayscale metaphor is relevant at multiple levels throughout this book, because information integrity issues are never so simple as black or white, right or wrong, true or false. A grayscale range almost always plays a role and is precisely what makes measurement processes important, because all of the nuances of error and accuracy play an important role in how we can understand the information landscape in which we live.

The year is not the only context issue that plays a role in the structure of this book. Location, language, cultural background and preferred news sources all influence the selection of issues and examples. An author who believed in QAnon or in other conspiracy theories would likely have picked different sources and have come to quite different conclusions, even while still talking about measurement. Measurement is itself a tool, and the output from any measurement process will very much depend on the information inputs that are fed into it.

This book draws heavily from the Retraction Watch Database, from the New York Times, the Guardian, and the public German news programme ARD. A feature that all of the sources have in common is their respect for the scientific method and for the results that come from scholarly investigation. Measurement itself is an integral part of how science is done in the modern world. What is new here is the attempt to apply measurement techniques to information integrity itself.

How to read this book

There are two obvious ways to read this book. It was written from start to finish, chapter after chapter, over a period of about seven months, and reading it from start to finish is the ideal approach to have a sense of how the ideas developed and interact. It is not a story with a historical beginning or ending. The first chapters set the social and institutional context, the middle chapters discuss measurement techniques and examples, and the final chapters look at how the actors involved with discovering and judging information integrity make use of the tools and techniques from earlier chapters.

An alternative way to read the book is primarily for people who are seeking specific kinds of information or techniques. For example, those readers whose primary interest is in how the various scholarly disciplines compare with each other in terms of retractions due to problems with plagiarism, data falsification, image manipulation, unreliable data, or analysis problems can go directly to Chapter 4 on “Disciplines,” which uses the Retraction Watch Database to provide a rich set of detail, including comparisons from discipline to discipline.

Readers with a special interest in guidelines should look first at Chapter 3 on “Context and institutions,” which discusses the complex landscape of institutional, agency, and national rules, which often say the same thing in different words, and which are generally so abstract that students seriously interested in avoiding problems may have no clear idea about what to do after reading them. These rules are written at a high level of abstraction for a good reason: no rule-making body has the technical expertise to cover every situation that might arise, especially as scientific approaches evolve over time.

Readers who want to learn measurement techniques that they can apply on their own may want first to look at Chapter 5 on “Measurement,” which takes the reader through the core issues of classification and categorisation, without which there is no basis for the kind of counting that is at the heart of measurement techniques. The chapter also looks at concrete examples of where measurement has played a role with issues of public interest.

Readers who care in particular about how humans interact with information integrity cases and violations can begin with Chapter 6 on “Actors,” which discusses four categories: investigators (who are sometimes called hunters and who look for violations), judges (who must decide on the validity of accusations and assign penalties, when appropriate), violators (the people who commit integrity breaches, either intentionally or unwittingly), and victims (who may be those who suffer from the consequences of false information, but may under some circumstances also be the violators themselves).

The conclusion, Chapter 7, attempts to pull together many of these themes, but is by no means a substitute for reading what has gone before.

ACKNOWLEDGEMENTS

Let me first acknowledge the help and support of my wife, Joan Luft, who helped with proofreading and made invaluable suggestions about clarity and content. I would also like to acknowledge the help of Melanie Rügenhagen, who helped proofread some of the numerical data. Let me also acknowledge Elke Greifeneder, who sparked my interest in this topic years ago when we worked together as co-editors. The author would like to thank colleagues at Humboldt-Universität zu Berlin, the HEADT Centre, and iSchools for their advice and support. Any errors in this work are of course entirely mine.

1 INTRODUCTION AND APPROACH

DOI: 10.4324/9781003098942-1

Introduction

Untruths, exaggerations and outright lies have all existed as long as humans have communicated with each other. The problem today is not merely that a great deal of fake or false information influences people on life-and-death issues such as the ongoing COVID-19 pandemic, but that people need tools for distinguishing between factual information that meets demonstrable standards for reliability and claims that may seem attractive or plausible but lack demonstrable reliability and pose risks. Individual listeners or readers may judge on the basis of personal factors such as tone or style, or a preference for particular outcomes, or on any number of other subjective factors. Subjective factors represent an efficient mental shortcut compared to analysing every statement, and everyone uses them. Nonetheless, as J. R. R. Tolkien (1961) famously wrote, “Shortcuts make long delays,” and delays can be fatal. Sometimes tools are needed to cut through the brambles to find the right path, and this book is about measurement tools that can help to determine the integrity of information.

Information is an essential commodity in human society and arguably has been important ever since the first humanoid species learned how to pass on techniques for hunting, gathering, making fire and making tools. In a complex society information directly or indirectly governs everything that people make and do. The very process of writing and publishing a book like this builds on centuries of experience with the development of language, writing, and printing, as well as the intellectual development of the structure of a book with chapters, references, indexes and headings. The ability to read it depends on information transmitted during the educational process.

In May 2020 when work on this book began, one of the key information problems was how soon the COVID-19 pandemic’s rate of replication would have slowed sufficiently to allow more people to return to their workplaces. The measure of the replication rate is called R0, which is “an epidemiologic metric used to describe the contagiousness or transmissibility of infectious agents” (Delamater et al., 2019). The R0 measure has a life-and-death aspect, because the more a pandemic spreads, the more people die. But the longer people are kept from their workplaces, the more a national economy suffers, which affects the quality of life and has indirect life-and-death aspects too, depending on the local social safety net. The quality and integrity of the information that scientists use to make these decisions need to be correct beyond any reasonable doubt.

This book will look at all aspects of the integrity of information in public discourse and in scholarly works. Scholarly works matter especially because they stand at the core of any well-informed public discourse. Without the reliability of scientific information, the public discourse degenerates easily into the equivalent of a shouting match where facts fail and the biggest, loudest, and most insistent voice wins. With reliable information, lawmakers and national leaders have at least a reasonable chance of making the best possible decisions in the interests of their citizens, but explaining the science can still be challenging.

The German Chancellor Angela Merkel, herself a trained scientist, made a public broadcast explaining why social distancing can save lives:

Merkel’s explanation of the scientific basis behind her government’s lockdown exit strategy, a clip of which has been shared thousands of times on social media, had all the calm confidence expected of a former research scientist with a doctorate in quantum chemistry… (Oltermann, 2020)

Bhalla (2020) of Vox News claimed that the broadcast went “viral.” Relatively few countries have trained scientists in their leadership, which makes it important that the quality of information that reaches citizens is as reliable as possible. Nothing can guarantee that citizens or national leaders will listen to high-quality information, but some effort can be made to measure whether false or misleading information is labelled appropriately, especially when it risks lives.
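
As a rough illustration of why the R0 measure discussed above matters so much, the following sketch shows how expected case numbers change from one generation of infection to the next. It is a deliberately simplified toy model, not the method of any cited source: it assumes a constant reproduction number, no immunity, and no interventions, and the function name and parameters are hypothetical.

```python
def expected_new_cases(reproduction_number, generations, initial_cases=1.0):
    """Return the expected new cases in each generation of infection.

    Assumption (toy model): every case causes `reproduction_number` new
    cases on average, and nothing changes from generation to generation.
    """
    return [initial_cases * reproduction_number ** g
            for g in range(generations + 1)]


# A value above 1 means each generation is larger than the last.
growing = expected_new_cases(2.0, 3)
# A value below 1 means the outbreak shrinks generation by generation.
shrinking = expected_new_cases(0.5, 3)

print(growing)    # [1.0, 2.0, 4.0, 8.0]
print(shrinking)  # [1.0, 0.5, 0.25, 0.125]
```

Under these assumptions the threshold value of 1 separates growth from decline, which is why small errors in the underlying data can lead to very different conclusions about when workplaces can reopen.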

A note on vocabulary

The words “science” and “scientist” appear frequently in this book, and these terms have somewhat different shades of meaning in modern English, German, and French. Contemporary English often uses the word “science” as meaning “natural science,” while the meaning of the German word Wissenschaft or the French term la science embraces the social sciences and humanities as well. When this book uses the word science in its broader continental European sense, it will be put in italics to distinguish it from the more limited meaning that applies only to the natural sciences.

The phrase “information integrity” needs clarification as well. In this book it serves as an umbrella term that includes both ends of the scale, from the negative (such as “information fraud,” “fake information” and “false information”) to the positive (such as “reliable information”). This book generally avoids simple abstract dichotomies like “lies” or “truth” to emphasise the degree to which information integrity represents a broad continuum and not merely the extremes.

Problem statement

The problem statement that this work addresses is simple: to what degree can measurement and the tools involving measurement assist in determining the integrity of information, both in the context of public discourse and in academic research? As the problem statement implies, there is no simple answer to most cases involving the integrity of information. Often broad grey-zones exist between reliable science as we currently understand it and an overly simplistic untruth that contradicts known facts. The grey-zone metaphor comes from digital photography and is meant to give readers a visual sense of range between the pure whiteness of “truth” and the pure blackness of “lies.” While few scholars believe that integrity is a simple black-and-white matter, public announcements about the results of integrity investigations often err in a black-or-white direction to make a point, because serious discussion about integrity problems quickly becomes too complex to make good headlines.

This book focuses on an intellectual tool called “measurement” that has existed at least as long as lies have existed. At one level the word “measurement” is simple. It can mean the number of grams (or ounces) of a particular ingredient in a recipe, or the number of kilometres (or miles) to the nearest store, but even at this level the word measurement shows its inherent complexity. Grams and ounces have precise meanings in modern times, as do kilometres and miles. The older terms such as ounces and miles grew out of traditional usage that varied from place to place and did not necessarily require absolute precision for every purpose. The metric measurements came from the reforms during the French Revolution and had precise international values from the start, which made them attractive for engineering and natural science applications. Precision in measurement depends in part on the details of categories and classification. A simple example is the distance from one building to another, which depends on whether the measurement starts at a particular door or the nearest outer wall or in some countries from parking lot to parking lot.

Counting

Counting is an essential element of measurement, and counting is also ancient. The earliest clay tablets from Mesopotamia contained accounting records that included the numbers of animals someone owned. The British Domesday Book represented a count of people and property in the newly conquered Norman possessions. Such counts represent essential social and economic tools from ancient times onward, especially for taxation. Counting was not always precise and sometimes had legal quirks. For example, Article 1, Section 2, Clause 3 of the US Constitution before the US Civil War included a rule that counted slaves as only three-fifths of a person for determining congressional representation. Definitions matter and affect the accuracy of the measurement.

Reliable measurements of many aspects of society became common in the nineteenth century. By 1820, France, Prussia, the UK and the US all had some form of census with a range of questions that variously asked about social and economic status as well as collecting simple population numbers to determine representation in legislative assemblies. The data were not perfect, but were sufficiently reliable for the needs of the time. Regional, state, and municipal governments also began gathering social and economic data on a more frequent basis. Karl Marx spent long hours in the British Library gathering statistics about the social conditions of the working class for his books and manifestos.

In the later nineteenth and early twentieth centuries Sidney and Beatrice Webb became famous for their use of official British records to support their arguments for “Fabian socialism.” The British records were largely truthful, if not always comprehensive. When the Webbs travelled to the Soviet Union in the 1930s and applied the same techniques using government records, they mistakenly believed that Soviet records were equally reliable. The book that resulted from these records, Soviet Communism: A New Civilization? (Webb, 1935), proved to be an embarrassment, because the Soviet data were politically skewed to the point of untruth. Without reliable data, measurement fails and the results become meaningless.

Statistics

One of the important mathematical tools for measurement is statistics. At one time the word statistics primarily meant data, and what is today called descriptive statistics sticks closely to the original meaning with minimal abstraction in its counts, percentages, graphs and other forms of visualisations. Inferential statistics uses mathematical logic expressed in rigorous theorems as the basis for tests that draw conclusions with a degree of mathematical probability, by evaluating how likely observed sample data would be if a given hypothesis is true. For example, when researchers testing a new medication report p < 0.05, it means that if the medication is in fact useless, at least nineteen samples out of twenty of the size and characteristics the researchers have used would indicate uselessness. (If p < 0.01, then at least 99 samples out of a hundred would indicate the medication is useless.) The reliability of the claim depends also on a wide variety of factors, including the degree to which a sample accurately reflects the population being studied, the quality (reliability) of the data in the sample, and how the so-called “null hypothesis” was stated because that is what is being tested.

Statistics is a sophisticated and highly complex science that is integral to a wide range of disciplines. The fact that statistical results rely on assumptions about data and their distribution does not mean that statistical results are flawed or unreliable, only that a reader needs to understand what they mean in order to interpret the results correctly. Statistics can be highly misleading if the samples were skewed to produce a particular result. The mere existence of numbers does not guarantee that any serious measurement took place or that the results are correct. Fake news often provides statistics that lack integrity. Goodwin (2020) writes:

Faced with arguments underpinned by numbers, we need to cultivate statistical alertness so we can spot the falsehoods but also read authentic statistics with shrewdness… Common sense should be our first line of defence.

One of Goodwin’s examples was a survey that the Daily Express published in 2015 that “showed 80% of Britons wanted to quit the EU.” The survey had a low response rate and a self-selection problem.

Reliable statistics play important roles in business, and the insurance industry offers a classic example. Most western countries keep reliable records about death rates including the age at death and the causes of death. An actuarial analysis of, for example, a non-smoking middle-class male with no history of cardiovascular illness suggests statistically that he is likely to live longer than a poor male smoker with a history of heart trouble. While the conclusion may seem obvious, the important question for insurance companies is how much greater is the likelihood of an early death, which will in turn affect the pricing for insurance coverage. This form of measurement provides a rational basis for the pricing, but of course it works only if the information an applicant provides on the forms is true. An actuarial analysis also provides a tool for judging claims that do not fit the model, such as a 40-year-old male smoker applicant with known cardiovascular problems who claims that family members with a similar profile always live to 90. Such individual cases could be true, but the statistical likelihood may be low, except in cases where new drugs or certain kinds of lifestyles may make a difference. The point is that statistical analysis is not simple, but that it can provide a logical basis for establishing the plausibility of unlikely claims.

A pandemic like COVID-19 affects all forms of data about death rates, and thus changes the expectations on which insurance and many other medical businesses depend. The year 2020 experienced many models about replication rates, death rates, and the likelihood that certain members of society are more at risk (older men, for example) than others. It is a clear instance where separating probable truth from potential false claims takes on life-and-death importance. The pandemic has functioned almost like an experiment in which the different countries have played out different experimental conditions for handling the infection and death rates. Over time it will become clearer whether countries that put a priority on lives via social distancing, extensive testing and medical care fare better economically in the long run than countries that made choices to keep the economy functioning at the expense of additional deaths. While a pandemic might affect everyone equally, the current evidence suggests that the poor and old die at higher rates than working-age middle-class people. An unpleasantly cruel choice could be to let the poor and the old die because their deaths free society from the burden of caring for them longer in old age and thus benefit the economy as a result. No leader has ever admitted to wanting that, though some appear to have preferred it. That is, however, an ethical question, and this book is about measurement, not ethics.
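
The p < 0.05 example in the Statistics section can be made concrete with a small simulation. The sketch below is illustrative only: it assumes a "useless" medication (treated and control groups drawn from the same normal distribution), uses a simple large-sample z-test for the difference in means, and all names and parameter values are hypothetical. It estimates how often a sample would nevertheless fall below the 0.05 threshold.

```python
import random
import statistics
from math import erf, sqrt


def two_sided_p(sample_a, sample_b):
    """Approximate two-sided p-value for a difference in means (z-test)."""
    se = sqrt(statistics.variance(sample_a) / len(sample_a)
              + statistics.variance(sample_b) / len(sample_b))
    z = (statistics.mean(sample_a) - statistics.mean(sample_b)) / se
    # Standard normal cumulative distribution, computed with erf.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return 2 * (1 - cdf)


def false_positive_rate(trials=1000, n=50, alpha=0.05, seed=1):
    """Share of samples with p < alpha when the medication has no effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Assumption: both groups come from the same distribution,
        # which is what "the medication is useless" means here.
        treated = [rng.gauss(0, 1) for _ in range(n)]
        control = [rng.gauss(0, 1) for _ in range(n)]
        if two_sided_p(treated, control) < alpha:
            hits += 1
    return hits / trials


rate = false_positive_rate()
print(f"share of samples wrongly suggesting an effect: {rate:.3f}")
```

With these assumptions the share should come out close to one in twenty, which matches the reading given above: roughly nineteen samples out of twenty correctly point to uselessness, and about one does not. The simulation says nothing about skewed samples or unreliable data, which remain separate threats to the result.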

Other tools

Measurement is certainly not the only tool for judging the integrity of information, but other tools often build on the data from measurement. One example is the model that the blog FiveThirtyEight uses for predicting election results. Nate Silver (2020) describes this model in detail, both in terms of the polls that get included and other factors such as demographic and economic data and an “uncertainty index”:

When it comes to simulating the election – we’re running 40,000 simulations each time the model is updated – the model first picks two random numbers to reflect national drift (how much the national forecast could change) and national Election Day error (how off our final forecast of the national popular vote could be)…

One old and standard tool for judging information quality is its source. The reasoning is that if the source is known to be reliable, the information is likely to be reliable too. For example, it is generally reasonable to believe statements about medical issues by a person with a medical degree from a respectable university who specialises in the disease being discussed and who has years of experience. During the COVID-19 crisis people generally trusted the advice of specialists like Dr. Anthony Fauci, director of the US National Institute of Allergy and Infectious Diseases. A good expert, like Dr. Fauci, may give warnings and complex answers that do not serve the goals of the political leadership, which makes it hard for political leaders to accept the advice. Ignoring expertise can, however, be dangerous in the longer term.

Accepting the advice of experts like Dr. Fauci is to some extent a cultural issue. Germany has a longstanding respect for learning and technical expertise: many federal and local cabinet members have doctorates, and a person with an academic credential is generally accepted as a reliable source. Other countries such as Japan and Korea share this tendency. In contrast, people in the US and the UK often express scepticism about academic expertise. The idea of “muddling through” (UK) or self-reliance (US) leaves people sceptical about claims that academics know more. Members of some churches and religious sects would rather trust the statements of their pastors or leaders than outside specialists. For them the pastor or leader is the source with the highest reliability because that person speaks the word of God. Measurement plays no role there.

A problem with relying too heavily on any single source is that the source could include hidden assumptions that undermine any simple claims. By and large statistics from government agencies in western democracies are considered reliable, but problems may emerge when looking beneath the surface. For example, the state COVID-19 death rate in Florida was under dispute. The Miami Herald reported: “The health department is also excluding some snowbirds and other seasonal residents, along with visitors who died in Florida, from its count. The medical examiners are including anyone who died in Florida” (McGrory and Woolington, 2020). In some sense both ways of reporting could be true and reasonable, but excluding seasonal residents served a local political purpose of reducing the state’s death count. The problem had other ramifications, since the Florida out-of-state deaths appear never to have been reported to people’s home states, meaning that their counts were also under-reported. The problem is that putting trust in even official sources is itself not automatically reliable, except in cases where the sources are highly regulated and carefully defined, and even then the nature of the regulation and the meaning of the definition need to be transparent.

Historians have grappled over the centuries with the problem of which sources to trust. Many historical sources have reflected the bias or blindness of past eras, and people need to be no less willing to question the authority of sources in the contemporary world. The quality of a source itself needs measurement to determine the degree to which it should be trusted. This does not mean that all sources should be distrusted, only that sources themselves need to be measured using critical judgment to examine what they really say, as examples in later chapters will demonstrate. The process is far from perfect, but a willingness to examine details brings readers one step closer to something like the truth.
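
The Florida example above shows how the same underlying events can yield different totals depending on the counting definition. The following sketch makes that point with invented records. The data, class, and function names are hypothetical, and the residency rule is a simplification of the reported practice, which excluded only some seasonal residents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeathRecord:
    died_in_state: bool
    residency: str  # "permanent", "seasonal", or "visitor"


def count_all_deaths_in_state(records):
    """Definition A: count anyone who died in the state."""
    return sum(1 for r in records if r.died_in_state)


def count_permanent_residents_only(records):
    """Definition B (simplified): count only permanent residents."""
    return sum(1 for r in records
               if r.died_in_state and r.residency == "permanent")


# Invented illustration data, not real figures.
records = (
    [DeathRecord(True, "permanent")] * 3
    + [DeathRecord(True, "seasonal")]
    + [DeathRecord(True, "visitor")]
)

count_a = count_all_deaths_in_state(records)
count_b = count_permanent_residents_only(records)
print(count_a, count_b)  # 5 3
```

Neither total is wrong given its own definition, which is why a reported count is only interpretable when the definition behind it is stated openly.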

Ethics

Many books on information integrity put a strong focus on ethics, and it is almost impossible not to encounter ethical considerations when writing on the subject. A trained philosopher might reasonably argue that many aspects of measurement have ethical components, since measurements involve choices about what is worth measuring. Ethical considerations are too important to ignore completely, but the focus of this book will remain on the technical and scholarly aspects of how to measure, what to measure, and what the results can tell about the degree to which an information integrity violation has occurred. Ethical analysis by itself cannot expose the facts about whether an integrity violation has occurred; it can only indicate how to react to an ethical violation. The methodological approach, which the next chapter will discuss, emphasises how people understand and apply the mechanisms of measurement. No analysis is of course ethically neutral, as will become clear when talking about those whose job it is to hunt and to judge violations. The focus of this book is, however, not on the ethics for hunting or judging per se, but on mechanisms for evaluating the results. This book offers tools that anyone can use in judging information integrity issues, including those who commit violations.

Methodology

This book’s methodology is fundamentally the methodology of a trained historian (which the author is). Academic history uses a very wide set of disciplines to gather data, including economics, political science, and in more recent times ethnography and computing. The means of collecting the data is not the defining aspect of historical methodology as much as the ways of looking at the data, which typically emphasise factors such as change over time and the temporal and social context.

Historical sources are often but not exclusively text-based. Traditionally historians draw on published works and archival sources. Such sources are often “open access” in the sense that others could visit archives and read published works in libraries. Some sources are of course harder to find than others and some archives and libraries are not completely open to the public. The sources in this work are almost all available to other researchers and in most cases are freely available on the internet. One reason is that the writing has taken place during multiple COVID-19 shutdowns when libraries were at least partly closed and travel was limited. Another reason is to provide maximum transparency for the sources.

One dilemma that typically confronts the historian is how much to reference and what to treat as general knowledge. What seems like general knowledge to a historian may not be general knowledge for the general reader. The reference to the French Revolution in the section above has, for example, no external reference because the author has a reasonable expectation that readers know about it. There is also no reference to the definition of the gram and metre, even though these may not be universally known, because it is general information easily found in encyclopedias including Wikipedia. The importance of an element of information plays a role in the use of references. The definitions of the gram and metre were used as examples, but statistics about, for example, COVID-19 deaths will of course be referenced.

Not all quotations will be from original English language sources. The author will generally make the translation himself from German, French, and other western European languages. The original can be found in a footnote for multilingual readers.

Trust

One of the methodological questions that a historian must consider is which sources to use when writing about trust and the integrity of data. The usual rule is that peer-reviewed articles and books have a high measure of trust, but that is clearly too simplistic in a discussion of cases where exactly such works are under examination for possible integrity violations. Retraction Watch (retractionwatch.com) is a source that dates from 2010 and reports on a wide range of article retractions in peer-reviewed sources. Retraction Watch belongs to the Center for Scientific Integrity, which “is a 501(c)3 non-profit. Its work has been funded by generous grants from the John D. and Catherine T. MacArthur Foundation, the Laura and John Arnold Foundation, and the Leona M. and Harry B. Helmsley Trust” (Retraction Watch, n.d.). The reputation of Retraction Watch is relatively high among scholars, except of course among those whom Retraction Watch targets. An example of such criticism comes from Teixeira da Silva (2016): “Retraction Watch is a distinctly anti-science blog whose primary objective is to smear scientists who hold errors or retractions to their names.” In fact the reporting (in blog format) focuses primarily on factual information about retractions from academic journals, and the sources are almost always transparently available.

Integrity issues are highly sensitive in the scholarly world because they affect reputations and future prospects. In political discourse it is relatively common to accuse opponents of false statements with little or no evidence. Many listeners and readers automatically discount such claims, but the question becomes more problematic when health issues are at stake. The anti-vaccine campaign was built on a now retracted study of 12 children by Wakefield et al. (1998) that linked autism with a vaccine for measles, mumps and rubella. The reasons for the anti-vaccine campaign appear to have less to do with medical science or with the Wakefield paper than with personal beliefs and a private distrust in science, at least according to a 2017 WHO pamphlet called “How to Respond to Vocal Vaccine Deniers in Public.” The debate can take on an almost religious dimension that sounds like a clash of fundamental beliefs. While this clash is real, the real issue ought not to be one of belief, but of the evidence that the scientific method offers.

Facts matter. It is of course not enough to trust the mere claim of a researcher or scholar that a result is based on solid scientific evidence. The claim also needs a transparent grounding in reproducible facts.
In chemistry this may mean mixing two reagents together in the same strength under the exact same conditions and getting the same result. In economics it may mean running the same analysis against the same data for the same result. For historians and other humanities scholars proof is more complicated and means going to the same sources in the same context. The interpretations of the words or economic analysis or chemical results may vary, but the underlying facts should remain. Take away those facts and the basis for trust is gone. A reader of Retraction Watch may legitimately believe, as da Silva wrote, that its objective is to smear scientists, but the retraction notices in the journals are there for anyone to see. The facts remain, despite disagreement about their meaning, and integrity thrives on the measurable existence of facts.

Time

Truth is not timeless. Creation stories were credited as facts in pre-modern times because they had become integrated into people’s thinking over hundreds, even thousands, of years, and the very existence of the world seemed like something that needed a story to explain it. Before Copernicus it seemed like a reasonable fact that the sun and planets orbited the earth, since people could easily observe the sun and moon and planets rising and setting. The facts of the movements remained when Copernicus offered a new mathematical explanation based on the earth orbiting the sun. His advantage was that he offered a solution that was mathematically simpler and more reliable, and that had economic benefits for navigation. In a sense Einstein’s general relativity turned the tables again by allowing any point (even an accelerating body) to serve as the central point for making observations and taking measurements. This did not mean that Ptolemy was right and Copernicus wrong, only that both were mathematically reasonable representations of the world. In schools children still learn that the earth orbits the sun. It works as a fact, even though over time scientists have learned to express the meaning more cautiously.

The meanings of words change over time as well. This is one of the points that Michel Foucault makes in Folie et Déraison: Histoire de la Folie à l’Âge Classique (1961) (in English Madness and Civilization: A History of Insanity in the Age of Reason). How people have understood insanity has altered over time, which has led to different ways of treating people whom society considers insane. The same sort of evolution in consciousness applied to other scientific topics in the twentieth century. At one time medical studies of heart disease focused largely on men. There were plausible reasons for this, since men were more prone to heart attacks than women, and in common usage the word “men” applied to both men and women (e.g. the much quoted phrase from the US Declaration of Independence that “all men are created equal”). It was not until the women’s suffrage movement in the late nineteenth and early twentieth centuries and more importantly the women’s liberation movement in the mid-twentieth century that society became cautious about situations where the word “men” did not apply generically.
Today physicians are more conscious of, for example, the protective role of estrogen in women, and do not assume that studies of male health automatically apply equally to women. The research emphasis changed in part because social meanings had evolved. There were other reasons too, of course, including the fact that many more women themselves became physicians and medical researchers as time moved on. The argument here is not that truth changes over time, but that the perception of how to evaluate and analyse facts is not identical from century to century or even decade to decade.

This kind of historical change affects many issues involving information integrity, including plagiarism. Copying the words of others was not regarded as problematic in the years before or even for many years after Gutenberg developed a printing technology based on the wine press. By the time of the Statute of Anne in 1710, publishers had become aware of the economic value of protecting expression. Before 1710, copying was legal if not quite fair game. It was not until the US joined the Berne Convention in 1988 that the same copyright protection applied to most western countries. Before the 1980s concerns about plagiarism affected mainly large chunks such as whole paragraphs. Since 2000, computing tools have made it possible to detect plagiarism at a finer granular level, and this awareness has broadened the definition of plagiarism. Today paraphrasing may be considered plagiarism, even when some form of reference is present. Paraphrasing was common among well-read academics in the mid-twentieth century and was used to express information less intrusively than with an exact quote. Today the expectation has changed. This book avoids paraphrasing to avoid controversy and thus includes far more exact quotes than would have been normal at one time.

Technology

Technology in this context has two important meanings. One is engineering-related and refers to tools such as Turnitin (or its variant iThenticate) that can process information more quickly than people can on their own. The other meaning is more abstract and refers to intellectual tools such as statistics. In both cases results from the technology need interpretation. These technologies have evolved in ways that help to detect integrity violations, but, due to their complexity, can under certain circumstances cause doubt and confusion. One of the dangers is that people take results at face value without looking deeper at what they are measuring and the quality of the data.

Information technology of the engineering sort has made the spread of both information and misinformation easier. Social media grew out of information technology developments and often gets blamed for not controlling the lies spread by extremist groups. While the criticism is justified, the fact remains that exercising control over an open system is hard. Wikipedia is a similar form of new and socially shared media that made a serious effort early on to constrain those who wanted to use the platform for propaganda purposes. Wikipedia’s approach is a combination of warning texts and human volunteers who help police entries, especially on sensitive topics. Twitter and Facebook have recently adopted similar approaches.

Using computing technology to detect false statements can be more multifaceted, and perhaps more fragile, than direct (“brute-force”) comparisons between files to detect potential plagiarism. An example of this form of technology comes from Fatemeh Torabi Asr at Simon Fraser University, who wrote a program that analyses words that characterise fake news (Shariatmadari, 2019). The use of linguistics to detect fraud is also not new.
The article by Markowitz and Hancock (2014) on “Linguistic Traces of a Scientific Fraud: The Case of Diederik Stapel” was an earlier application of technology to information integrity violations. They claimed, for example, that Stapel’s “… writing style also matched patterns in other deceptive language, including fewer adjectives in fraudulent publications relative to genuine publications.”

The Stapel case also motivated scholars to look at statistical analysis for fraud detection. Martin Enserink (2012) referenced “an unpublished statistical method to detect data fraud” in his article in Science, and statistical analysis is often used today to look at results that seem too perfect to be true. Statistical results are of course probabilities and a moderately improbable result is not impossible, depending on the data and the way an experiment was carried out and the assumptions about how representative the data are. What matters here is not the engineering side of information technology (which statistical package was used to calculate the results) but an intellectual understanding of what the results may or may not mean.

An example of the need to understand statistics was visible in the push to get out results about COVID-19. Gangelt, a small town in North Rhine-Westphalia, is one of the “most heavily affected areas in Germany”1 and has generated many headlines because one study on the town estimated that there could be 1.8 million infected people in Germany (Streeck et al., 2020). The real number was far less precise according to later comments, because of assumptions that the death rate in Gangelt was the same as in the rest of Germany: “‘One must naturally be a bit careful, it is only an estimate’ conceded Streeck”2 (Tagesschau, 2020). The point is neither that the study was fundamentally untrue nor that the figure in the headlines can be taken at face value, but rather that a reasonable understanding of statistics is an important intellectual tool for judging what such results really mean.
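The dependence of such a headline figure on one assumption can be made concrete. The following is a minimal sketch, not the method of Streeck et al.: it assumes, as a simplification, that a national infection estimate is obtained by dividing a count of deaths by a fatality rate measured in one town. The function name `estimate_infections` and all numbers are hypothetical and chosen only to show the sensitivity.

```python
def estimate_infections(reported_deaths: int, assumed_fatality_rate: float) -> float:
    """Scale a death count up to an infection estimate: deaths / rate.

    The result is only as good as the assumption that the fatality rate
    measured in one place also holds everywhere else.
    """
    if not 0 < assumed_fatality_rate <= 1:
        raise ValueError("fatality rate must be in the interval (0, 1]")
    return reported_deaths / assumed_fatality_rate


# Hypothetical inputs for illustration only; these are not the study's data.
REPORTED_DEATHS = 1_000
for rate in (0.002, 0.004, 0.008):
    estimate = estimate_infections(REPORTED_DEATHS, rate)
    print(f"assumed fatality rate {rate:.1%}: about {estimate:,.0f} infections")
```

Under these invented inputs, each halving of the assumed rate doubles the estimate, so a modest uncertainty about how representative one town is translates directly into a wide range for the national figure.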

Culture

The cultural context plays an important role in the methodology, because cultural norms and expectations can determine what people understand to be an information integrity problem and how to address it. While one can reasonably sometimes talk about US culture or German culture, the concept of culture should not be thought of as coterminous with national boundaries. One reason is that many countries share the same cultural norms. Germany is, for example, known for valuing punctuality, but the same is true for the Nordic countries. Another reason is that multiple cultures (sometimes called micro-cultures) exist within a society as well as across societies. A simple example is the computing community, which shares a language and sometimes metaphors that make sense within the community, but the references may seem opaque to people with minimal technology background. An example is the term “brute-force” in the previous section, which sounds bad but is a standard computing term for an approach that relies on computing power rather than performance shortcuts.

Political cultures, scholarly cultures, and the cultures of people in different economic classes all influence how quickly people recognise an information integrity problem, how urgent it seems, and how they judge it. Legal cultures vary as well, although treaties and international cooperation have brought them closer together. The common law systems in the UK and the US are, for example, fundamentally different from the civil law systems of most of Europe and South America.

A good example of where culture matters is the attitude toward copyright violations. For those financially dependent on income from royalties for books, movies, music, or any other form of intellectual property, breaches of copyright seem serious because they affect these individuals’ economic status. Not everyone who creates intellectual property feels the same kind of financial dependence, though.
Professors in certain disciplines (historians and economists for example) may gain non-trivial supplements to their salaries from books that sell well, but the majority of scholars care more about reaching as broad an audience as possible to build their reputation, because it is reputation that will determine whether they get raises or better positions. This represents economic thinking from a culturally different perspective where open access is a benefit, not a threat.

The culture of natural scientists has traditionally also put value on sharing information and data, with the exception of certain kinds of chemical and medical developments that can lead to valuable patents. Typically physicists, mathematicians, biologists and geologists see information sharing as a positive value. One example of information sharing is arXiv, which “is a free distribution service and an open-access archive for 1,697,568 scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics” (ArXiv, n.d.). Even though arXiv makes clear that it has no peer-review process, notable papers occasionally appear only in arXiv without being published elsewhere, such as the paper by Grigori Perelman solving the Poincaré conjecture, for which he was offered and “rejected a $1 million prize from the Clay Mathematics Institute in Cambridge, Mass.” (Overbye, 2010).

The success of arXiv, the Social Science Research Network (SSRN) and other preprint sites has made preprint servers popular and with them the concept of open peer review. Open peer review is not a single approach, but is one that typically makes papers available for comment, and when there are enough favourable comments, the paper moves to the status of a published work.
An example of this comes from Wellcome Open Research,3 which has a four-step process that includes pre-publication checks, immediate publication so that the paper can be cited, open peer review by selected expert reviewers, and official publication, which means only that the “awaiting peer review” label is removed and the article is given to “PubMed, PubMed Central, Europe PMC, Scopus, Chemical Abstract Service, British Library, CrossRef, DOAJ and Embase” (Wellcome Open Research, n.d.). While Wellcome Open Research and other open peer-reviewed journals make an effort to distinguish peer-review approved works from those not yet approved, or not approved at all, the fact remains that they enable access to papers that may contain false information or integrity problems. The hope is that greater transparency will mitigate the effect of false results. This openness also provides an opportunity for science-sceptics to seize on results that fit a political agenda, even if the results turn out to be flawed.

Open data has itself become a topic for information integrity discussions on the principle that when research data are publicly and freely available, anyone can check the validity of the results. Some natural science institutions follow this practice, such as CERN in Geneva, Switzerland: CERN (2019) “… has now provided open access to 100% of its research data recorded in proton–proton collisions in 2010, in line with the collaboration’s data-release policy.” The culture of sharing data is broad and includes European Union funded projects like OpenAIRE. The US National Science Foundation has also pushed for data from publicly funded research projects to be openly available for retesting and re-examination. Thus far open data have not become a factor in the spread of false information, perhaps because the data themselves are often too complex for non-scientists and non-scholars to understand and use.

Censorship

Censorship is a possible means of shielding the world from false information based on flawed scientific results by instituting more rigorous pre-publication policies or an aggressive policy of removing access to papers whose results do not stand up to replication. Censorship is, however, a politically charged cultural issue and the prospect of explicit censorship is rarely openly discussed. The reasons include the political history of censorship in the twentieth century, since no one wants to be accused of emulating either the Nazis or the Soviets. The reality is also that it is almost impossible to make a work vanish from the internet once it becomes available. Transparency and exposure seem like the best long-term solutions, despite the continuing use some politicians make of disproven results. The problem is that transparency and exposure take time, sometimes decades, to be effective.

Nonetheless indirect forms of control are openly discussed to reduce hate speech and false information. Twitter has rules against posts that incite violence, terrorism, or child sexual exploitation, and Twitter’s new rule on targeted harassment says “You may not engage in the targeted harassment of someone, or incite other people to do so” (Twitter, n.d. a). A section on hateful conduct says: “You may not promote violence against, threaten, or harass other people on the basis of race, ethnicity, national origin, caste, sexual orientation, gender, gender identity, religious affiliation, age, disability, or serious disease” (Twitter, n.d. a). In May 2020 Twitter decided to add tags to identify false information. The Twitter blog states (Roth, 2020): “During active conversations about disputed issues, it can be helpful to see additional context from trusted sources. Earlier this year, we introduced a new label for Tweets containing synthetic and manipulated media. Similar labels will now appear on Tweets containing potentially harmful, misleading information related to COVID-19. This will also apply to Tweets sent before today.”

According to a chart on the blog, severely misleading information will be removed. The definition of misleading is: “statements or assertions that have been confirmed to be false or misleading by subject-matter experts, such as public health authorities” (Roth, 2020). The Twitter blog does not define when misleading information becomes severe. It is also not clear whether the controls are applied uniformly to, for example, rants by a US president. What matters here is an attempt to control the free flow of false information on social media. It is a limitation on free speech in an attempt to save lives.

Literature review

This section will look at the literature on information integrity and how to address integrity problems including fake news. The goal is not to produce a comprehensive bibliography, if that were even possible, but to discuss particular issues in the literature to establish a broad overview of the current thinking on the topic. Perhaps needless to say, the discussion has evolved significantly over time, not only because of the pandemic, which has raised new questions about reliability and the public perception of the reliability of all forms of science in the broadest sense, but also because of the US election in 2020 and related concerns about the role of social media. The misuse of information during the pandemic was not unique for a crisis situation. What was unusual was the degree to which both the misuse of information and the awareness of false information became a scholarly and a public issue. Social media was partly responsible for initiating discourse, as well as being at least partly responsible for the spread of false information.

Normally a literature review would focus on scholarly publications, but the scope of this review needs to be broader because the discourse has continued to evolve in the course of writing this book. This review starts with literature about the impact of COVID-19, because it has created new conditions and because it has the potential for changing how scholars think about other aspects of information integrity. Guidelines, training programmes, tools and legal issues need consideration as well as the broader categories of plagiarism, image manipulation, and data falsification.

Literature on COVID-19

There is reason to think that the pandemic has been driving changes in how scholarly publishing is handled and especially in how the evaluation process works. Lees (2020) writes in the Economist: “As Stuart Taylor, publishing director of the Royal Society, Britain’s top scientific academy, observes, moves towards more open science, preprints and faster dissemination of results were under way before the covid-19 pandemic. But these events will heighten those changes and probably make them permanent. Scholarly communication seems to be at an inflection point. Like many other things until recently taken for granted, it may never return to the way it was before sars-cov-2 came along.”

Making research results available more quickly essentially without any peer review opens the prospect of uncontrolled false information as well as other integrity problems, including plagiarism. This sounds alarming, but a relevant question is how completely traditional peer review has succeeded in filtering out results that were flawed but not fraudulent. Evidence that traditional peer review has not always prevented false results is available in the Retraction Watch Database.4 Nonetheless the number of problem articles is small compared to publishing as a whole (see Chapter 7).

Preprints have become an established way to publish in advance of peer review. In an analysis of publications from the preprint servers bioRxiv (bioarchive) and medRxiv (medarchive), Lees (2020) notes the following study:

“For those who question the quality of science contained in preprints, there is reassurance in a recent study by researchers in Brazil (itself posted as a preprint), in which the authors used a questionnaire to score the quality of preprints on bioRxiv, and also the subsequent peer-reviewed-journal versions of these papers. They found that the journal papers were indeed of higher quality. But the difference was, on average, only 5%.”

A reasonable question is, of course, whether this preprint study (which has apparently not been peer-reviewed) provides accurate information. Time will tell. There is plenty of discussion about the fact that peer-reviewed papers contain flaws. “The distracting focus on hydroxychloroquine as a potential covid-19 treatment was, for example, partly stimulated by a peer-reviewed paper in the International Journal of Antimicrobial Agents that was published on March 20th by French scientists. That paper now has question-marks over its rigour and reliability” (Lees, 2020).

Peer review is not new. It started in Scotland in 1731, but remained largely advisory with editors making the real decisions until after the Second World War. Reviewers do not always catch errors, and there are plenty of stories of fake papers or papers with deliberate errors that peer reviewers do not catch (see Hadas, 2014). Some disciplines favour single-blind peer reviewing where the reviewers know who the author is, and others prefer double-blind, on the theory that it puts more famous authors on an equal footing with everyone else. Open peer review may favour established authors, but also exposes them to potential embarrassment that normal peer reviewing hides within the editorial revision process.

The pandemic has not been the only factor driving changes to the review process. The Retraction Watch blog has been documenting errors in scholarly publications since 2010 and has a large number of retraction notices in its database.
For example, a search of the Retraction Watch database on 15 May 2020 turned up 750 entries for plagiarism alone.5 With tools like iThenticate serious publishers can in theory catch plagiarism cases before publication, even without peer review, but this obviously does not always happen. The category “falsification/fabrication of image” gets 376 hits, and “falsification/fabrication of data” gets 1207 hits. Normal peer review has not prevented these kinds of problems and it is not even clear that it could.

An ordinary reviewer may arguably be able to recognise plagiarism without machine assistance when it involves whole chapters or pages or paragraphs, assuming that the reviewer has a good memory for text and knows the literature of the field with the necessary intensity. Most reviewers do not have that kind of memory. Data problems and image problems are even harder for reviewers to detect unless they have access to the dataset or to the original set of images and are ready to take the time to do an additional analysis. People on the review boards of journals are often notable people in their field, which means that they can exercise good judgment about the analysis or the argument, but notable scholars are often notably busy and may not have the time or inclination to dig deeply enough to find integrity violations.

Open peer review does mean that the range of people who might take the time to look for integrity problems is greater, which is what appears to be happening with COVID-19 related papers on preprint servers. Such papers are, however, a special case, because of the public interest. There is no guarantee that papers on other topics would receive comparable public scrutiny. The point is not that open peer review is better, but that the COVID-19 crisis has highlighted its benefits in ways that may change the reviewing process more broadly.

Plagiarism literature

Detection

Plagiarism is as old as writing itself and the literature on plagiarism is large. Sundby (1952) wrote about “A Case of Seventeenth-Century Plagiarism” that discussed dictionary makers who “unblushingly copied long lists of words from their predecessors,” and articles discussing plagiarism go back to the eighteenth century. Once more texts became available in digital formats and computer-based plagiarism detection systems began to appear, the attention to plagiarism increased because it was easier to discover. None of the systems is perfect, though. Köhler and Weber-Wulff (2010) wrote a review of 26 systems, including Turnitin, and the highest grade any of them received was a C-. One of the reasons for the low grades was a failure to detect all cases that could be plagiarism in the view of the authors.

Defining plagiarism precisely and completely is far from easy “since any definition is inevitably open for interpretation” (Weber-Wulff, 2016). One of the most detailed attempts is the discussion of types of plagiarism on the VroniPlag Wiki (VroniPlag Wiki, n.d. b: “Plagiatsarten”). Some of the types presented in this discussion are straightforward, such as complete plagiarism. Others are less common, such as “pawn sacrifice,”6 where there is a reference to part of the original text, but where there are also larger sections that contain similar or identical text without additional references (VroniPlag Wiki, n.d. a: “BauernOpfer”). The issue is how far a single reference suffices for multiple noncontiguous sentences of overlapping content. Another form of plagiarism is called “concealment”7 where content is paraphrased or reformulated without a clear and direct reference (VroniPlag Wiki, n.d. d: “Verschleierung”). Paraphrasing has a long history, and has not always been seen as plagiarism. Much depends, of course, on how complete the reference is.
A more complex category is called “shake-and-paste” for cases where phrases or sentences come from multiple sources without clear referencing (VroniPlag Wiki, n.d. c: “ShakeAndPaste”). These categories define plagiarism at a much more detailed level than copying pages or paragraphs. Sometimes a reference is present, but insufficiently attached to particular sets of words. The measurement issues with these categories will be discussed in later sections.

The broader literature on plagiarism breaks broadly into two categories: detection (including avoidance), and ethics (including consequences). The former will be discussed in this section, and some ethical issues in the next. Detection is especially important because it involves how measurement is employed. Some relatively recent papers on the topic include Eissen and Stein (2006), who spoke about “Intrinsic Plagiarism Detection” at the European Conference on Information Retrieval, and Potthast et al. (2011), who wrote about “Cross-Language Plagiarism Detection” in Language Resources and Evaluation. These works show how the sophistication of plagiarism detection algorithms has grown to try to detect copying across languages and to recognise aspects of style that indicate possible plagiarism. The computing issues involved in this kind of detection are interesting and sophisticated, and such work matters if one accepts the need to detect plagiarism at a very granular level. Such applications find a market in the vendors of plagiarism detection tools. Since most plagiarism detection companies do not make their algorithms available, only actual tests can determine what they are likely to find. Turnitin was not the first company to offer plagiarism detection software, but it has been one of the most successful and is widely used in universities. Turnitin began in 1998 and became a company in 2000. It uses a comparison mechanism to test whether the words and phrases in a particular article match words and phrases in other articles in their database. The more complete the database, the greater is the likelihood that the system will detect copying, and perhaps the most important feature of Turnitin was the agreement with CrossRef in 2007 to have access to a database that includes most of the large academic publishers (Turnitin, 2020). 
As with the algorithms, companies do not make public all the sources they use, though most are open about using sources like Wikipedia, partly because so many students use it.
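
The comparison mechanism described above can be illustrated with a short sketch. Since vendors such as Turnitin do not publish their algorithms, the Python code below is not their method; it is a minimal illustration that assumes matching on overlapping word sequences (n-grams). The function names, the example sentences and the choice of four-word sequences are assumptions made here for illustration.

```python
import re


def word_ngrams(text, n=5):
    """Return the set of n-word sequences in text (lower-cased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_share(submission, source, n=5):
    """Share of the submission's n-grams that also occur in the source (0.0 to 1.0)."""
    sub = word_ngrams(submission, n)
    if not sub:
        return 0.0
    return len(sub & word_ngrams(source, n)) / len(sub)


# Hypothetical texts, written for this illustration only.
source = "Dictionary makers unblushingly copied long lists of words from their predecessors."
copied = "Early dictionary makers unblushingly copied long lists of words from older books."
fresh = "Compilers of early word lists borrowed heavily from one another."

print(overlap_share(copied, source, n=4))  # high share: long runs of identical words
print(overlap_share(fresh, source, n=4))   # no shared four-word sequence
```

A larger comparison corpus raises the chance that a copied sequence is found, which is the point made above about the completeness of the database. The share itself says nothing about whether a reference was given, so it can only flag a text for human review.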

Ethical issues

Cheating and plagiarism are closely related insofar as intentional plagiarism involves academic cheating. The degree to which unintentional plagiarism is an ethical issue is more complex and the boundaries are not always clear. Roig and Caso (2005) wrote:

That the use of fraudulent excuses is related to self-reported measures of cheating and of plagiarism should not be surprising. After all, the three measures constitute major forms of academic dishonesty. It is worth noting, however, that the strength of these associations is modest at best, which suggests some degree of independence between these constructs.

While this article is 15 years old, there is no reason to suppose that the situation has changed greatly. Newer studies are often more focused, such as the article by Chankova (2019) that concludes:

It thus seems that, for the Bulgarian context, little has changed in the two decades after Hunt’s suggestions: in terms of new knowledge models that emphasize the connection between information and skills with a practical setting of application, none have emerged to replace the one that seems to persist in the students’ perceptions. … Plagiarism and cheating in the Bulgarian context are symptomatic of a rift in the students’ involvement in their learning.

Chankova also quotes Pecorari (2015):

If factors inherent in some cultures cause a predisposition to plagiarize, then plagiarism does not violate a universal academic value; it violates a belief locally situated in the English-speaking world.

Pecorari (2015) goes on to add:

If international students from country X do not understand what plagiarism is, then they need an explanation. At a minimum, they need to know what it is and what penalties it incurs. Because the penalties can be very severe, they also need to understand how very seriously plagiarism is regarded.

The point here is that plagiarism does not necessarily equal cheating outside of western European and North American cultures and may not even correlate with cheating in all of those contexts. The relationship is too complex to reduce to black-and-white categories. At the least a grey-scale range must be applied instance by instance.

Tools

Tools are particularly important when discussing measurement, because the size and complexity of contemporary texts and contemporary datasets exceed what people can easily or efficiently handle on their own. No tool is comprehensive enough to address all problems with information integrity. The tools break down into three broad classes: those for plagiarism, data falsification, and image manipulation. The widely used plagiarism detection tools have already been discussed in the section above. Turnitin and its variant iThenticate focus on text in articles and student papers and originally focused primarily on English language works since the initial market was Anglo-American. Many continental European universities have long had concerns about the legality of submitting student works without explicit permission, and European Data Protection Regulations have increased the concern, which may have limited the size of the non-English language market. The primary language for academic journal publication is also English, which means that a temptation exists for non-native English speakers to search for correct English-language phrases to make their writing sound more native and more professional. Whether that should count as cheating may be questionable, but it is plagiarism and the extent of the problem is too large to ignore.

Text is not the only form of content where plagiarism plays a role. Hage et al. (2010) write about source code plagiarism tools: The main goal of this paper is to assist lecturers and teaching assistants with an overview of the current state of the art in plagiarism detection for source code, to highlight the features of each tool and provide measurements of their performance, in order to make an informed decision which tool to use for plagiarism detection. Source code plagiarism is itself a complex problem. The reuse of code is common among professional programmers, and those who learned to write programs 50 years ago (this author included) were encouraged to reuse source code. Source code has no established tradition of academic-style references, and even though it is good coding practice to indicate sources in the comments, all too few professional programmers provide enough comments to make maintenance easy. Changing the borrowing practices of younger programmers is a worthwhile goal, but may be hard in a culture where sharing is common. The need for reliable image manipulation detection tools is a hot topic today, and one of the persons working on developing tools is Daniel Acuna (Acuna et al., 2018), who did a large-scale comparison of images: We analyzed 760,036 articles from the PMOS repository obtained in early 2015. This repository provides the full text of the articles, PDFs, images and datasets associated with each. There are 2,628,959 images contained in these articles. In conclusion Acuna et al. (2018) write: The analysis relies on a copy-move detection algorithm and then a classification of the potential matches into biological or non-biological. After that, a panel of reviewers reviews the context in which those matches occur. Overall, our results suggest that around 0.59% of the articles in PubMed Open Access would be unanimously considered fraudulent by a panel of three scientists. We now discuss some problems with our analysis. 
The fact that the system requires a set of reviewers to determine whether the data are fraudulent makes the automation less than complete, but it represents a start. The paper is published in a preprint archive, which may indicate the increasing importance of preprint archives for academic publishing. At Harvard Medical School there is also research moving forward to develop tools for image manipulation detection. Lucy Goodchild van Hilten (2018) writes:

Dr. Walsh started working with colleagues in other departments to develop a tool that could provide statistical analysis showing the degree of similarity between two images. Providing an objective quantitative value can help show whether the degree of similarity is too high to be a random coincidence, supporting people in a position of making formal decisions about research integrity.

The emphasis here is clearly on “objective” measurement, which is harder for those engaged in visual identification of similarities. van Hilten (2018) continues:

The second-generation IDAC tool uses machine learning – specifically, training on a well-defined set of pairs of duplicate images (positive controls) and pairs of distinct samples (negative controls). This enables the algorithm to generate probability scores indicating whether two test images are too similar to be derived from different samples, without any further input needed from the user.

As of May 2020 this Harvard project started collaborating with another Elsevier-funded project at Humboldt-Universität zu Berlin.8

Tools to detect false data in general are harder to design because false data, manipulated data and falsely analysed data are so various that any generic approach to the problem will have limited success. One interesting approach is to look at the language used to describe the data. The idea here is that people who use fraudulent data discuss it differently in their articles. An early example comes from Markowitz and Hancock (2014), who analysed the works of Diederik Stapel:

The analysis revealed that Stapel’s fraudulent papers contained linguistic changes in science-related discourse dimensions, including more terms pertaining to methods, investigation, and certainty than his genuine papers. His writing style also matched patterns in other deceptive language, including fewer adjectives in fraudulent publications relative to genuine publications. Using differences in language dimensions we were able to classify Stapel’s publications with above chance accuracy.
Beyond these discourse dimensions, Stapel included fewer co-authors when reporting fake data than genuine data, although other evidentiary claims (e.g., number of references and experiments) did not differ across the two article types. This research supports recent findings that language cues vary systematically with deception, and that deception can be revealed in fraudulent scientific discourse. Tompkins (2019) also uses linguistic tools for “Disinformation Detection”: A common characteristic of fake news articles is emotionally-charged, inflammatory language. Positive or negative sentiment, high occurrences of personal pronouns, and sensationalist (click-bait) language are all examples of psycholinguistic features of fake news.

Linguistic analysis offers only warnings that the data may be false, and does not – perhaps cannot – measure the degree of truthfulness.
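
One of the features Tompkins mentions, a high occurrence of personal pronouns, can be counted directly. The Python sketch below is not the method of Tompkins or of Markowitz and Hancock; it assumes a short fixed pronoun list and uses exclamation marks as a crude stand-in for sensationalist language. The function name, the pronoun list and both example sentences are assumptions made here for illustration.

```python
import re

# A short, assumed list of English personal pronouns (not an exhaustive lexicon).
PERSONAL_PRONOUNS = {
    "i", "me", "my", "we", "us", "our", "you", "your",
    "he", "him", "his", "she", "her", "they", "them", "their",
}


def cue_rates(text):
    """Return simple surface cues: pronoun share and exclamation marks per 100 words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {"pronoun_rate": 0.0, "exclamations_per_100_words": 0.0}
    pronouns = sum(word in PERSONAL_PRONOUNS for word in words)
    return {
        "pronoun_rate": pronouns / len(words),
        "exclamations_per_100_words": 100 * text.count("!") / len(words),
    }


# Hypothetical sentences, written for this illustration only.
charged = "You won't believe what they did to our town!"
neutral = "The council approved the budget on Tuesday."

print(cue_rates(charged))
print(cue_rates(neutral))
```

Such rates can only rank texts for closer reading. They are warnings in the sense described above, not a measure of how true a statement is.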

Guidelines

Formal guidelines for research integrity abound, and there is a wealth of recommendations for the broad category of information integrity, which includes everyday information sources such as newspapers and broadcasters. These guidelines tend to enunciate broad principles and to avoid any discussion of how to measure violations, perhaps because measurement is too complex, too discipline-dependent, and too variable over time, since the tools and the topics change. Most countries have their own guideline statements for government-funded research projects, as does the European Union. The European Federation of Academies of Sciences and Humanities (ALLEA, 2020a) offers a good example of the breadth of the missions:

ALLEA’s mission revolves around three focal areas:

• Serving European academies and strengthening the role of science in society
• Providing a trusted source of knowledge for policymakers and the general public
• Shaping the framework conditions for science and research in Europe.

Country-based integrity guidelines include those of the German Research Foundation (Deutsche Forschungsgemeinschaft, 2019b):

Scientific integrity forms the basis for trustworthy research. It is an example of academic voluntary commitment that encompasses a respectful attitude towards peers, research participants, animals, cultural assets, and the environment, and strengthens and promotes vital public trust in research. The constitutionally guaranteed freedom of research is inseparably linked to a corresponding responsibility.

The US National Science Foundation (2020a) offers a similar example:

The responsible and ethical conduct of research involves not only a responsibility to generate and disseminate knowledge with rigor and integrity, but also a responsibility to:

a. conduct peer review with the highest ethical standards;
b. diligently protect proprietary information and intellectual property from inappropriate disclosure; and
c. treat students and colleagues fairly and with respect.

Universities in most countries have their own guidelines, often based loosely on those of national agencies. It is typical of the guidelines that they are at a very high level of abstraction. This author was one of the contributors to guidelines for his own university, and is aware of how difficult it is to write more detailed guidelines that apply to all university disciplines and all possible forms of malpractice. The real problem comes when malpractice commissions must judge specific violations and have only vague guidelines as a base. Training programmes for avoiding information integrity problems often do better than the guidelines in being clear, in part because they address specific situations or types of integrity violations. Training programmes to prevent plagiarism are the most common type. Oxford University (2020) offers both an online tutorial and clear explanations of various types of plagiarism, and makes a distinction between unintentional, intentional and reckless (grossly negligent) plagiarism: Plagiarism may be intentional or reckless, or unintentional. Under the regulations for examinations, intentional or reckless plagiarism is a disciplinary offence. This distinction is important because it does not list “unintentional” plagiarism as a disciplinary offence. Some years ago the Academic Senate of the author’s own university came to the same conclusion when discussing the difference between cases of negligence and those of gross negligence (Fahrlässigkeit and grobe Fahrlässigkeit), which is an important distinction in most legal systems. The problem with the distinction is how to measure an author’s intention. A potential crude measurement is quantity, on the principle that a large amount of plagiarism is probably not accidental. While journal editors typically set their own trigger limits, details matter, such as whether the plagiarism was a paraphrasing problem in the literature review. 
Oxford University (2020) offers definitions as part of its training programme that cover some, but not all, of the elements in the Vroniplag definition. It has, for example, a clear and detailed explanation about paraphrasing, but does not include “pawn sacrifice” or “shake-and-paste”: Paraphrasing the work of others by altering a few words and changing their order, or by closely following the structure of their argument, is plagiarism if you do not give due acknowledgement to the author whose work you are using. A passing reference to the original author in your own text may not be enough; you must ensure that you do not create the misleading impression that the paraphrased wording or the sequence of ideas are entirely your own. Even with this clear definition, precise measurement is difficult.

Data and images

The literature on data fabrication and image manipulation is substantial regarding specific cases, and one of the best sources for finding case-by-case literature is
Retraction Watch, whose database lists the kinds of problems, the individual cases, the commentary, and the consequences. In general images for research purposes are just a specialised subset of the broader category of research data that are often machine-generated during lab tests and which depend on the reliability of the machine settings as well as on how they are presented. The natural sciences make heavy use of images of this sort, especially biology and medicine. Shen (2020) writes about the system-wide problem: Many – including Bik – argue that combating image manipulation and duplication requires system-wide changes in science publishing, such as greater pre-screening of accepted manuscripts. “My preference is not to have to clean up the published literature, but to do it beforehand,” says Rossner. He helped to introduce universal image pre-screening of accepted manuscripts at the Journal of Cell Biology nearly 20 years ago. At the EMBO Press, says Pulverer, journals have pre-screened accepted papers for faulty images since 2013. But most journals still do not pre-screen or (as with Nature) spot-check only a subset of papers before publication. “Image screening is not common right now,” says Chris Graf, director of research integrity at the publisher Wiley. The academic literature talks less about the falsification of other forms of images. Authors can manipulate graphical representations of large-scale data more easily and more accidentally than machine-generated images by selecting subsets that demonstrate their arguments. Manipulation takes place also in ordinary photographic images, such as those used in contests. Zhang (2018) writes, for example, about the controversy about whether the photographer Marcio Cabral faked a prize-winning photograph by using a prop. The problem of data in the broader sense has less written about it, because the instances of data falsification tend strongly to be article- and discipline-specific. 
Nurunnabi and Alam Hossain (2019) confirm this lack of general literature on data falsification:

Due to lack of literature, the objective of this study is to evaluate data falsification and academic integrity. Accordingly, the study presents the academic misconduct (Falsification/Fabrication of data and Concerns/Issues About Data) case of Professor James E. Hunton, a former top ranked accounting professor from Bentley University.

It is typical that the authors shift immediately to the example of a specific well-known case (the Hunton case will be discussed in Chapter 4), which virtually confirms the observation that they find data falsification hard to discuss in the abstract. An example of how one discipline copes with data falsification can be seen in Tugwell and Knottnerus (2017), who describe a statistical approach in the Journal of Clinical Epidemiology:

These analyses look at flags/indicators suggesting that data may be fabricated, such as (a) distributions of fabricated observations of continuous measures being different from the observations of other centres, (b) data values being on average too low or too high, (c) variability of readings being too low in fabricated data because fraudulent investigators either choose to refrain from fabricating extreme values to avoid triggering attention or simply underestimate the true variability, (d) fabricated data being “too perfect” with relatively few missing values, (e) too constant a rate of recruitment as a result of inclusion of either ineligible patients and/or phantom subjects, (f) too many patient visits taking place during weekends, (g) too many similar first, second, or last digits in lab results or vital recordings of pulse and blood pressure. The statistical approach Tugwell describes here has been applied elsewhere, notably to the case of Diederik Stapel (which will be discussed in Chapter 4) and to other cases in social psychology. Statistical analysis is a measurement tool and is arguably the most effective indicator of possible falsification. It does not, however, prove falsification, but only indicates a need for further careful examination.
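
Two of the flags in the list above, (c) unusually low variability and (g) too many similar last digits, lend themselves to a short illustration. The Python sketch below is not the procedure that Tugwell and Knottnerus set out in detail; it assumes integer readings, a simple chi-square comparison of last digits against an even spread, and the 5% critical value for nine degrees of freedom. The function names and all readings are invented for illustration.

```python
from collections import Counter
from statistics import stdev

# 5% critical value of the chi-square distribution with 9 degrees of freedom.
CHI2_CRIT_DF9 = 16.919


def last_digit_chi2(readings):
    """Chi-square statistic comparing last digits of integer readings with an even 0-9 spread."""
    counts = Counter(abs(int(r)) % 10 for r in readings)
    expected = len(readings) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


def variability_ratio(centre, other_centres):
    """Standard deviation of one centre divided by that of the pooled other centres."""
    pooled = [x for c in other_centres for x in c]
    return stdev(centre) / stdev(pooled)


# Hypothetical blood pressure readings, invented for this illustration only.
rounded = [120, 125, 130] * 20          # every value ends in 0 or 5
spread = list(range(100, 160))          # each last digit occurs equally often
print(last_digit_chi2(rounded) > CHI2_CRIT_DF9)  # flag (g) raised
print(last_digit_chi2(spread) > CHI2_CRIT_DF9)   # flag (g) not raised

flat_centre = [120, 121, 120, 121, 120, 121]
others = [[110, 135, 122, 148, 99, 127], [105, 140, 118, 131, 152, 96]]
print(variability_ratio(flat_centre, others))    # far below 1: flag (c)
```

A raised flag is only an indicator. Honest rounding by a nurse or a device can also produce many readings ending in 0 or 5, which is why such results call for further examination rather than a verdict.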

Summary

The scholarly literature on information integrity is far from exhaustive. Even though a substantial amount has been written about plagiarism, there is no clear scholarly consensus about a single comprehensive definition. One reason is the variety of potential categories, with direct copying of pages or paragraphs as the minimal definition, and various aspects of paraphrasing which may involve relatively standard phrases at the extreme. Good measurement tools exist in the form of software like Turnitin/iThenticate, which itself offers controls that let the user adjust rules for matching by selecting the number of matches necessary in a particular segment. Clear metrics for how much copying indicates what level of integrity violation on a grey-scale do not exist. Beyond plagiarism the literature on information integrity splits into discipline-based discussions that offer no general measurement guidelines, even in the guidelines from government agencies or universities. That is not all bad, since coming up with overly specific measurements that establish black-and-white rules for violations would strip away much of the flexibility of those organisational units and judicial bodies that must make decisions which, at least for academics, could destroy careers. In practice, though, a large number of inconsistent decisions are made by people forced into a simple black-or-white (guilty or not guilty) decision because no intermediate options exist. Outside the academic world there is practically no discussion of how to measure false information. Various crude counting mechanisms are of course possible. The number of verifiable untruths that a prime minister or a president has uttered in (for example) a six-month period could serve as a measure of falsity in social discourse, but measuring the mix of fact and falsity in any specific statement is often
more complex. Politics generally does not rely on facts, and voters may prefer to trust specific individuals rather than scientific evidence in the broadest sense. An interesting historical question is when the number of untruths that a society accepts begins to make a difference in affecting social outcomes. That is one of the topics for the next chapter.

Notes

1 German original: “einer der am stärksten von COVID19 betroffenen Orte Deutschlands.”
2 German original: “Das muss man natürlich immer ein bisschen mit Vorsicht genießen, es ist eine Schätzung”, räumte Streeck ein.
3 https://wellcomeopenresearch.org/about
4 http://retractiondatabase.org/
5 http://retractiondatabase.org/RetractionSearch.aspx
6 Bauernopfer
7 Verschleierung in the original German
8 https://headt.eu/

2 CONTEXT AND SOCIETY

DOI: 10.4324/9781003098942-2

Introduction

Context is a common term in both social science and historical scholarship. “Context” means the whole cultural, linguistic, and physical world that surrounds human actors at any one time, and it therefore shapes what people accept as true or untrue, plausible or implausible. Data reliability is an element of social context. When citizens and scholars have limited access to data, verification becomes hard. An important part of the history of science in the twenty-first century has been making more data more available for broader testing. The idea is that greater transparency represents progress toward building a stable base on which others can reliably build. Having data with testable and measurable integrity is core to all forms of science. This book mostly discusses the measurement of information integrity in the context of democratic societies, but even the definition of a democratic society is very broad. A common definition of a democratic society is one where citizens may vote for their government, but who counts as a voter varies, and in the 2020 US election the question of what kinds of votes should be counted also arose. Poorer citizens living in disadvantaged neighbourhoods with jobs that rigidly control their working hours may not be able to vote in states or countries where voting is only possible on a workday and the nearest voting place is distant, as is sometimes the case in southern US states. Education is another important context for determining how people make decisions. According to the World Population Review (2020), 44.29% completed their upper secondary education in the US, 45.74% in the UK, and 57.94% in Germany. This suggests that a substantial portion of the people in these relatively rich western countries may lack key academic tools for measuring information integrity and for understanding what measurement means. For example, training in
statistics is minimal in the average high school, and understanding statistics has become key to understanding many forms of measurement. Trust in institutions is another context element that affects measurement and how people react to it. During the COVID-19 epidemic the published death rates have varied substantially, and early rumours suggested that some sources were reporting an overcount for political purposes. The fact that multiple sites present different numbers fed the conspiracy theories. In fact most scholars believe that the official numbers represent an undercount. One form of evidence for this is the difference between the normal or average death rate for a region and the number of deaths during the COVID-19 period. Pappas (2020) writes in Scientific American: Perhaps, the best clue as to whether COVID-19 deaths have been undercounted or overcounted is excess mortality data. Excess mortality is deaths above and beyond what would normally be expected in a given population in a given year. CDC data shows a spike of excess mortality in early 2020, adding up to tens of thousands of deaths. She goes on to explain why the undercount is likely, and how the counting process varies because it depends on inconsistent cause-of-death decisions. She even explains circumstances in which an overcount could theoretically be possible, depending on the definitions of what should be counted. Scientific American is itself almost an institution for science news reporting, and counts for many as a reliable source. Nonetheless rumours about overcounts persisted on some talk shows and in some social media posts. For example, Steven Crowder claimed: “Do you not think for a second that some leftist activist who also happens to work in a hospital in New York is not going to take every single opportunity possible to try and label this a COVID-19 death, if they are not required to actually show any proof of a positive test?” he said. 
(Richardson, 2020) A willingness to believe that people will conspire to make the statistics worse for political purposes is a disruptive factor in the information context that cannot be ignored. Measurement ceases to be an effective tool when trust in the data is lost.
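
The excess mortality comparison that Pappas describes is simple arithmetic: observed deaths minus the deaths expected for the same period. The Python sketch below assumes that the expected figure is the mean of the same weeks in earlier years, which is a simplification; statistical agencies use more elaborate models. The function name is an assumption, and all numbers are invented for illustration and are not real mortality data.

```python
from statistics import mean


def excess_deaths(observed, baseline_years):
    """Observed minus expected deaths per period.

    Expected deaths are taken as the mean of the same period across the
    baseline years (an assumption made for this sketch).
    """
    expected = [mean(period) for period in zip(*baseline_years)]
    return [obs - exp for obs, exp in zip(observed, expected)]


# Hypothetical weekly death counts for one region (invented for illustration only).
baseline = [
    [1000, 1010, 990, 1005],   # earlier year 1, weeks 1 to 4
    [1020, 1000, 1010, 995],   # earlier year 2
    [980, 990, 1000, 1000],    # earlier year 3
]
observed = [1010, 1150, 1400, 1300]

excess = excess_deaths(observed, baseline)
print(excess)       # weekly excess over the baseline mean
print(sum(excess))  # total excess for the four weeks
```

The appeal of the measure is that it does not depend on cause-of-death decisions or on who was tested. Its weakness is that the result depends on how the expected baseline is chosen, so the choice of baseline years should always be reported with the figure.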

Historical context

The history of information is a popular subject today, though it often focuses primarily on information technology. Gleick (2011), for example, begins with the 1948 Bell Labs announcement of an electronic semiconductor, and he is certainly correct that this changed how information is processed today. The history of information begins much earlier, however, probably in some sense with the first communicative grunts of humanoid ancestors, or arguably even with the pre-human
signals that animals use to cooperate in herds or insects in colonies. The history of flawed information must be equally old, given the capacity of all creatures to misunderstand and misinterpret. For the purposes of this book, the discussion will focus on human information sources and largely on the last several centuries because they are most immediately relevant to the contemporary world. The history of information has three important aspects. One is the data behind the information, whose quality determines reliability. A second is the set of concepts used to understand the information, which includes language and mental constructs as well as physical tools. The third is the transmission medium, which may be highly transient such as speech or relatively fixed over longer periods such as the written word. Each of these aspects has changed substantially over time and each exposes the integrity of information to different forms of risks. They will be considered separately below.

Data

Some of the earliest surviving written works are accounts on clay tablets, which are presumed today to represent taxable assets. A count of the number of sheep that a family owned depended not only on the honesty of the person counting, but on when the count took place, for example before new lambs were born or after some sheep were slaughtered or lost to predators. The data were certainly not completely reliable but were better than anything else available and there is a reasonable expectation that the rulers of the time relied on them for taxation. Data quality was less of an issue in pre-modern times than getting any data at all. Detailed personal observation was (and still is) ranked highly as a reliable source. A good example is Thucydides’s history of the Peloponnesian war, which was largely a first-hand account that explained the events of the war from 431 to 404 BC. Notable historians like J.B. Bury (Bury and Meiggs, 1975) have accepted Thucydides’s account as largely true, partly because there is no better source, even if the author may have selected his facts to tell a particular story – a criticism that can apply to more modern works as well. The boundary between imagination and fact was often porous in earlier eras. Jean-Jacques Rousseau’s Social Contract, originally published in 1762, imagines a state of nature that pre-existed human society. This state of nature likely never existed. Nonetheless Rousseau bases much of his argument on his utopian fiction, which had great influence on the social thought of the time. The nineteenth century German historian Leopold von Ranke argued that history should be written “wie es eigentlich gewesen [ist],” which is variously translated as “as it really was.” The precise meaning of this phrase has become the subject of debate, but there is little doubt that it was meant to distinguish his own rigorous documentation from prior works that approached historical writing more imaginatively.
This emphasis on data and documentation was possible partly because more and more written and factual sources became available when he was writing, and libraries were growing.

Data required reliable sources. The growth of universities in the nineteenth century, combined with the establishment of more rigorous scientific methods in all branches of scholarship, meant that more reliable data of all sorts was becoming available. The US census of 1790 was by no means the first census in the world, but it demonstrated the political need for factual information about inhabitants, as well as the need to gather those data at regular intervals. The UK followed with a census bill in 1800, as did other countries. Like earlier census-taking, these early modern censuses focused initially on property questions and the number of people in a household, but the range of questions about education and social circumstances grew steadily and became an important source for the social sciences that had not existed previously. Sharing complex large-scale data, especially raw data, from the natural sciences was difficult before computers because the data were trapped in paper formats that could not be processed directly. Lab books for recording the details of experiments and their results have long existed, and played an especially important role in patent law under the older first-to-invent rules, but a lab book represents only a process record, such as might be published in an academic paper. Although key data may be recorded there, the traditional paper format limited how much could be transcribed and even modern electronic lab books are not generally good vehicles for recording actual data. Researchers have long tended to regard data as their own personal property lest someone steal their work, and even today the push for open data meets resistance.

Concepts

The concept of counting is presumably as old as the ability of humans to communicate, and variations on the concept are complex. This section discusses two aspects of counting. One is classification, so that only things which are reasonably and appropriately similar get listed in the same category. The other is the quantification of data elements in ways that allow systematic measurement, such as with statistics in contemporary society. In some ways the two aspects are so intertwined that they are hard to separate.

In a world where a herd of sheep formed an important part of the economic base, the distinctions between lambs and grown sheep and rams and ewes mattered, as well as age and breeds. Counting them involved definitions and categories whose importance likely changed over time. The count of military-age males mattered for those building armies. The US Constitution originally made a distinction in the census between free persons and slaves, who counted as three fifths of a human for electoral purposes, but did not count at all for military recruitment.

The distinction between citizens and temporary residents is another counting concept that made a difference in 2020 in the COVID-19 statistics. The earlier example of Florida during the pandemic illustrates how the exclusion of temporary residents can change the meaning of the death rate. A broader debate involves the question of which kinds of deaths appropriately belong to the COVID-19 death


rate count. Everyone acknowledges that assigning a single cause of death can be difficult, since many short- and long-term factors may contribute to a death. Because of this, the concept “excess deaths” has increasingly become an important measure. The US Centers for Disease Control and Prevention define “excess deaths” as:

Excess deaths are typically defined as the difference between the observed numbers of deaths in specific time periods and expected numbers of deaths in the same time periods. (CDC, 2020a)

Voce et al. (2020) explain why this concept matters:

During the early phase of reporting on Covid-19 a lot of focus was placed on the deaths attributed to coronavirus, as represented by the red bars in the charts. However, the figures capture different things – some count deaths in care homes and the community, some only include deaths verified with a test and others have long time lags, making comparisons difficult. Due to the wide variation in how Covid-19 deaths are counted, excess deaths are the most accurate way of quantifying the impact of the crisis in different countries.

Without a clear concept of what should be counted, even a simple counting can be misleading to the point of being potentially false.

One of the standard conceptual tools for measurement is statistics and the history of statistics is the subject of many books. Those interested in a good overview should consult Ian Hacking’s The Taming of Chance (1990). He writes:

Society became statistical. A new type of law came into being, analogous to the laws of nature but pertaining to people. These new laws were expressed in terms of probability. They carried with them the connotations of normalcy and of deviations from the norm. (Hacking, 1990, p. 1)

The term statistics has multiple meanings, depending on the era and the context. The oldest and broadest meaning refers to the data themselves.
For example, the government “blue books” of nineteenth century Britain were called statistics by those who used them, including Karl Marx and Sidney and Beatrice Webb. Another more precise meaning is “descriptive statistics,” which includes counts, percentages, and graphs. Elementary or high school students typically learn this form of statistics as part of their basic training in mathematics. The goal is, as the name implies, to describe the data in aggregate form. Descriptive statistics are typically the form of statistics found in newspapers and non-academic journals and they are easily the most widely used form of mathematical measurement. “Inferential statistics” represent a more sophisticated mathematical approach to handling data. The goal of inferential statistics is not to describe the data, but to


enable researchers to draw conclusions. The development of this form of statistics builds on the laws of probability and chance. Ideas about probability are as old as games of dice and cards, but systematic thought about probability arguably began with Blaise Pascal and Pierre de Fermat in the seventeenth century. In other words, statistics became a science that not only described data, but enabled people to draw plausible conclusions from them – under certain conditions, if the samples were large and representative enough, and if the data were reliable. Another different and somewhat controversial way of measuring the impact of data is with a “generative model” as Spinney (2020) explains: Our approach, which borrows from physics and in particular the work of Richard Feynman, goes under the bonnet. It attempts to capture the mathematical structure of the phenomenon – in this case, the pandemic – and to understand the causes of what is observed. Since we don’t know all the causes, we have to infer them. But that inference, and implicit uncertainty, is built into the models. That’s why we call them generative models, because they contain everything you need to know to generate the data. As more data comes in, you adjust your beliefs about the causes, until your model simulates the data as accurately and as simply as possible. The important fact here is that the concepts about how to use data continue to grow and evolve. Generative models are a long way from the simple counting of sheep or people, but the degree of abstraction does not reduce the need for information integrity – quite the opposite. No model is really robust against unreliable input.
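The CDC definition of excess deaths quoted earlier in this section is, at bottom, a subtraction carried out period by period. The following is only a minimal sketch of that arithmetic, not the CDC's actual estimation method (which must also model the expected counts); the function name and the weekly figures are hypothetical and invented purely for illustration.

```python
def excess_deaths(observed, expected):
    """Return excess deaths per period: observed minus expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must cover the same periods")
    return [o - e for o, e in zip(observed, expected)]


# Hypothetical weekly death counts, invented for illustration only.
observed_weekly = [1050, 1200, 1430, 1380]
expected_weekly = [1000, 1010, 1005, 995]

weekly_excess = excess_deaths(observed_weekly, expected_weekly)
total_excess = sum(weekly_excess)

print(weekly_excess)  # excess deaths for each week
print(total_excess)   # excess deaths over the whole span
```

The sketch makes the dependence on reliable input visible: the result is only as trustworthy as the observed counts and, even more, as the expected counts, which are themselves an estimate.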

Transmission media

The focus in this section is on changes in the ways of transmitting information from one place to another, because transmission affects the quality (if it causes distortions), the quantity (if it limits the amount), and the availability (if it limits access). Transmission is a more limited concept than communication, which implies some degree of mutual understanding. Cuneiform marks on a clay tablet transmit data, but they enable communication only when both parties can read them.

Transmission is not necessarily physical. Diagrams of the internet protocol layers take various forms with various names, but the base is always the most basic layer, labelled the “link,” “physical,” or “transportation” layer. The diagrams then describe the other levels that communicate destinations within the network (the “internet” layer and “host-to-host” layer) and within the host machine (the “application” layer). Some diagrams break the description down into more layers. Transmitting information from one place to another is not only complex, but potentially error-prone. This is one reason why internet packets contain a checksum, so that the integrity of the


packets can be checked at each stage of the journey. The problem is not hacking, but ordinary bit-deterioration (bit-rot) over time and across media.

Transmission problems are often a forgotten part of the information context. They are not unique to modern digital technologies. Written works can, for example, be equally problematic. Pre-modern hand-written texts typically contained errors, especially when scribes had to copy text after text; and if multiple people were copying texts, the number of errors or even intentional changes (intended as improvements) could grow substantially. Verbal transmission is so notoriously change-prone that there is a children’s game called Telephone in US English, Chinese Whispers in British English and Stille Post in German where the goal is to see how recognisable the original message is when it has been around a full circle of verbal transmission. The important fact here is the fallibility of transmission.

The problem that the game illustrates is that human transmission of messages from one person to another has a substantial error rate. The problem is not merely verbal, but one where transmitting (explaining) the cultural context matters. In the Australian film Ali’s Wedding (Walker, 2016), the eponymous hero has to explain why threats the US authorities found on his smartphone were only part of a football rivalry and not meant as actual threats against people.

Social media tends to magnify messages, especially ones that seem entertaining or fit personal preferences. One of the persistent sources of fake news is the report that grows out of proportion when circulated among an ever-widening crowd. The process is nothing new. When newspapers were the chief source of outside information, the tabloid newspapers and so-called yellow press were notorious for sensational headlines and inflammatory stories whose economic purpose was to sell more copies of the newspaper.
Sometimes the publishers wanted to inflame a situation such as during the weeks before the Spanish-American war of 1898. The famous Hearst newspaper drawing of male Spanish authorities strip-searching a woman, implicitly a US citizen, was successfully inflammatory but was wrong in several respects: the woman was Cuban, not American, and the searchers were women, not men (Campbell, 2016). Accuracy was not the newspaper’s goal any more than it is in some media posts today.
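The packet checksum mentioned earlier in this section can be illustrated with a short sketch. It follows the 16-bit ones'-complement scheme described in RFC 1071, which is used for IPv4 headers; the payload below is hypothetical, and the single flipped bit merely simulates the kind of accidental deterioration described above.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071."""
    if len(data) % 2:  # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:  # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical payload, invented for illustration only.
message = b"count of sheep: 42"
sent_checksum = internet_checksum(message)

# Simulate one bit flipped in transit.
corrupted = bytes([message[0] ^ 0x01]) + message[1:]

intact_ok = internet_checksum(message) == sent_checksum
corrupted_ok = internet_checksum(corrupted) == sent_checksum
print(intact_ok, corrupted_ok)
```

A checksum this simple catches any single flipped bit, but it is not proof against every distortion: because addition ignores order, two 16-bit words that swap places produce the same sum. Like the measures discussed in this chapter, it detects some failures of integrity and is blind to others.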

Evolution of the concepts

Ideas about what is true, what is false and how to recognise the difference have evolved over time, much as ideas about sources, causation, and historical determinism have adapted to new discoveries and new circumstances. It is wrong to imagine, for example, that plagiarism has always been seen as an integrity problem since the very idea of plagiarism depends on concepts of intellectual property that did not exist before modern times. This section will look at how four key concepts evolved to have their current meanings: plagiarism, data falsification, image manipulation and discovery mechanisms, which include processes like peer review and mechanisms like measurement.


The starting point could be 1900, when the basic principles of modern natural science and of many of today’s social sciences were taking shape, but looking back only 120 years gives a false sense of permanence, and with it perhaps the illusion that those concepts which science in the broadest sense currently accepts have long been and are now fully established for the long run. In fact scientific concepts are constantly evolving and a goal of this section is to make readers understand that what people measure today, and how they measure, will likely change in meaningful ways in the next 120 years. The temptation to predict or at least foreshadow change is ever-present and usually turns out badly. Nonetheless it makes sense to indicate some of the ways in which these concepts could evolve given current trends and certain sets of conditions.

Plagiarism and authorship

If plagiarism is fundamentally just copying text from another author without fully acknowledging the source, then it is as old as the written word and was born free of the negative connotations it has today. The very concept of an “author” is something that had to develop over time. Early legends and creation myths generally had no named author. Homer was the name given to the author of the Iliad and the Odyssey, and there is debate today whether Homer was an actual person or simply a name given to a set of authors, though the majority opinion favours the collective (Graziosi, 2002, p. 15). Some hundreds of years later, Herodotus and soon after him Thucydides wrote histories to which their names as authors were clearly attached. The Mediterranean world of western classical antiquity developed a well-established tradition of authorship, and men as philosophically different as Cicero and later Augustine of Hippo became well-known for their works.

The idea of authorship was not attached exclusively to the written word, but to the words and ideas of particular named persons. Cicero was primarily an orator, and his writings are often copies of what he said. Socrates is not known to have written anything on his own and became known through the writings of others, primarily Plato, who gave full credit to his intellectual contribution. The exact words seem to have mattered less than the ideas that the person expressed, which is perhaps unsurprising in a world where few could read and even fewer could write, but everyone could listen. Oral tradition mattered.

When the western half of the Roman Empire ceased to be a cohesive entity, the classical world’s concept of authorship largely continued in the eastern or Byzantine Empire, but faded in the west. Learning and the recognition of authorship by no means ceased in the west.
Names like Venantius Fortunatus, the sixth-century poet and hymn writer, and Alcuin of York, an eighth-century scholar and poet, were recognised as respected authors, but the social context in which authorship and the attribution of authorship mattered shrank. Authors did not play a major role in politics, as Cicero had. The “Church” was the domain of written works and church life, especially in monasteries, was strongly collective. Monks copied texts as part of their daily work, often Biblical texts, but somewhat at random other works


as well, including occasional ancient authors and newer legends. The copying was often not reliably accurate, and the reading public was vanishingly small outside the church. The idea of authorship was by no means lost, but its societal importance had diminished. Gutenberg’s press and his printing of the Vulgate (Latin) Bible in the 1450s contributed to increasing literacy in Europe by making reading materials more abundantly and more cheaply available than was possible before. Luther’s 95 theses, the Reformation, and the various translations of the Bible into vernacular languages accelerated the process and changed the nature of authorship by increasing the importance of the written word, which could reach a far wider public than speech. The author’s name was generally – though not inevitably – clearly visible somewhere in the text. Copying part or all of another’s works and publishing successful pamphlets or books without consent or attribution was a common practice. Authors had no special rights to their own works. Copying became an economic issue for printers, who wanted to protect their investment, and agitation for protection led to the first copyright law in England in 1710, known as the Statute of Anne. There were various other partial protections earlier, and in other countries, but this statute is generally regarded as the formal beginning of copyright protection. It is important to remember that the Statute of Anne only controlled the right to copy, and did little for author’s rights. The primary goal was to protect printers and publishers. Authors themselves had no special protection. There was also no protection across political boundaries. The US was notorious in the nineteenth century for publishing popular British works without compensation to either the publishers or the authors. 
Charles Dickens complained frequently about being deprived of income because of the lack of protection under US law and called for international copyright agreements (Hudon, 1964). The US did not join the international copyright treaty, called the Berne Convention, until 1988. US copyright law did not recognise any “moral rights” until the Visual Artists Rights Act of 1990, and then only for visual art. These moral rights included the author’s right to have his or her name on the work, a right that had long been part of European laws. US law had a work-for-hire provision that undermined some rights:

If a work is made for hire, an employer is considered the author even if an employee actually created the work. The employer can be a firm, an organization, or an individual. (US Copyright Office, 2020)

Under this provision, human authors have no legal rights to the work they created as an employee. Authorship evolved from being purely personal to recognising corporate identities as creators. European law generally vests authorship rights in the employee who wrote the work for an organisation, but the situation is complex:

France, Germany, and China, all civil law countries, vest initial ownership of such a work in the employee, but Chinese law provides for a mandatory two-


year license to the employer by the employee. German courts also often will imply a contractual transfer of ownership to the employer based upon an employment agreement. In most cases, moral rights are retained by the employee author as they are inalienable once originally vested in the employee. All three nations have specific exceptions for software, ownership of which vests in the employer. France and Germany have similar exceptions for audio-visual works, and France and China have exceptions for collective works. (Sutherland and Brennan, 2004) There are many sorts of collective works, including articles written by lab groups with more than half a dozen authors, where the head of the lab is sometimes listed as an honorific author. Today the rules at many universities discourage honorific authorship, though not always with complete success. The authorship of a deliberately anonymous work, such as Wikipedia, is also in some sense corporate, since no one has special rights to any article. The point here is that the concepts of both authorship and plagiarism have evolved over time in complex ways involving both economic issues and individual rights. Copying a block of someone else’s text verbatim without attribution clearly breaks the rules of contemporary academic scholarship. In the context of collectivist authorship, however, it may not be wrong, for example, for a person writing a Wikipedia article on one subject to copy passages from another equally anonymous Wikipedia entry. In the world of professional computing, copying source code within a company is standard practice and may even be encouraged. From a highly individualistic viewpoint, such copying may seem like plagiarism, but not all societies are equally individualistic, and anyone judging a particular situation should take the trouble to understand the context.

Information reliability

This section will discuss three issues. One is that the distinction between data and information is a modern artefact of the computing era. Another is that measurement of the integrity or falsity of information was harder in pre-modern times for lack of sources and tools. And the third is that images were never completely accurate and are vulnerable today to many forms of manipulation.

Data fabrication and data manipulation are as ancient as plagiarism-related integrity issues, and are in fact closely related, because data in the ancient world were typically expressed in verbal form, as is still true in much of the humanities. The definition of data has evolved significantly over time. Today many people distinguish between data, which are by nature raw and unprocessed, and information, which has context and structure. For example, Young (2020) writes:

The difference between data and information is subtle but important and sets open data apart from prior transparency regimes. Information consists of data that have been given structure and meaning.


This distinction is relatively modern. Harper (2020) on his website “etymonline.com” gives the following etymology for the word “data”:

1640s, “a fact given or granted,” classical plural of datum, from Latin datum “(thing) given,” neuter past participle of dare “to give” (from PIE root *do- “to give”). In classical use originally “a fact given as the basis for calculation in mathematical problems.” From 1897 as “numerical facts collected for future reference.” Meaning “transmittable and storable information by which computer operations are performed” is first recorded 1946.

In other words historically data are fundamentally facts, and it was not until the twentieth century that the meaning of data became associated with computing operations. Since early computing systems were not good with words, early programmers had to find other solutions. When insurance companies converted their contracts from paper to machine-based records, they encoded the information in ways that had more precise meanings than ordinary words in order to make actuarial analysis more efficient.1 Computers did not become good at handling words until the later 1980s and even now they struggle to get meaning from natural language. The split between human language as a representation for information and numbers or symbols as a representation of data enabled new kinds of processing, but of course did not change the fundamental need for both information and data, whether verbal or numerical, to be measurably true.

Measuring the reliability of information was especially a problem in pre-modern times because no clear standards existed and there was too little reliable data (facts) to make comparisons. Personal observation and travellers’ tales were long the best and most reliable source of information about the outside world.
Herodotus, the fifth-century BC historian, was known for his cautious comparison of sources, but nonetheless reported dubious stories, such as the claim that cats in Egypt had a “supernatural impulse” to throw themselves into a conflagration (Herodotus, 1824, p. 149). The truth of this observation is questionable based on modern experience with cats, and Herodotus might have wondered why cats in his hometown of Halicarnassus did not throw themselves into fires, but Egypt at the time was sufficiently exotic and distant that anything might have been possible. Despite this and other possibly questionable reports, such as that “lower class Lydian women often worked as prostitutes to gather a dowry before marriage” (Middleton, 2015, p. 557), Herodotus remains the best source of facts about the Greco-Persian wars of the fifth century BC, partly because other sources corroborate many of his claims.

For the military the ability to measure the degree to which the available data represent facts has never ceased to be important. An early use of airplanes in the First World War was to see what troops on the other side of the trenches were doing, but of course the accuracy of this information depended heavily on how carefully the pilots interpreted and reported what they saw – an example of the


distinction between the raw data of observation and the interpreted information transmitted to superiors. Spreading false information also played an important role. Operation “Fortitude” represented an attempt to spread false rumours about the D-Day landing in Normandy to make it less likely that the allied troops would meet the full force of the German army. Deliberate false information is nothing new. Images are a form of information that many people treat as factual. The earliest images date back to Cro-Magnon cave paintings from 30,000 BC or earlier, and essentially all ancient societies had image representations of some sort. It is hard to say how much the earliest images were meant to be accurate representations of particular people, places, or animals, and how much they were stylised and generalised representations. The life-like statues of ancient Greece and the portrait busts of the Roman Empire were arguably meant to be accurate representations. The portrait tradition revived in the Renaissance, but accuracy depended on the artist’s skill, and forgery and imitation depended on market conditions. Many Flemish paintings showed mountains, for example, that were not part of the local landscape even if the themes were local. Mountain scenes had a popular market. Pieter Bruegel the Elder had enough success that his sons and grandsons copied his style and de facto some of his works. The workshops of Rubens and Rembrandt also became major producers of similar works, and it is only recently that museums have applied new standards and have re-labelled works as coming from their workshop and not the masters themselves. None of this was plagiarism in any legal sense, but primarily an authorship question, if only because there was no systematic law regulating copying and no clear sense that it was wrong. The advent of photographs reinforced the idea of images as facts. 
Yet even an unmanipulated photograph can de facto alter the look of a face or of a landscape depending on the light and the angle. The development process enabled a broad range of other choices, and darkroom techniques enabled Soviet photographers to add or remove faces from formal photos, depending on the political whims of the time. Digital image manipulation tools made the transformations easier, and at the same time far more scientific machines began to produce digital images as part of their output. These images inherited the social belief that automatically ascribed accuracy to images, despite their actual malleability. The context has changed in recent years and the more respectable scholarly journals increasingly treat images with the same healthy scepticism as other forms of data.

Incentives and disincentives

The incentives and disincentives for breaking the social rules regarding copying text, falsifying data, and manipulating images have evolved over the centuries as the rules, the incentives, and the disincentives themselves have changed. Sometimes these changes have taken place within the lifetime of living people and sometimes they are different from country to country. Applying contemporary rules to people from the past is helpful only to understand how much the world has changed. The


same might be true in applying rules from one culture to another, except for those spheres where cultures interact, such as the sciences. Then incentives that enable falsification matter.

This book will concentrate primarily on western culture, mainly continental Europe and the Anglo-American world, but changes in the rest of the world cannot be ignored. China especially has seen radical changes from extreme collectivism during the “Cultural Revolution” (1966–76) to individual wealth accumulation and relatively open discourse on non-political subjects. This change matters because Chinese authors increasingly play a significant role in scholarly publication. The US National Science Foundation (2018) lists China as number two worldwide with 426,165 scholarly publications, after the European Union with 613,774 publications and ahead of the US with 408,985 publications. India also plays a significant role with 110,320 scholarly publications and Japan with 96,536. Overall the world of scholarly publications today is very international, and these statistics do not even include South America, the Middle East or Africa.

Measuring incentives and disincentives for non-scholarly publications is much harder, partly because there are so many more non-scholarly publications and because the criteria for veracity are much vaguer. Twitter only started fact-checking in 2020 by issuing warnings about falsehoods. Facebook followed suit. The incentive structure for social media seems often to resemble an elementary school playground where the goal is primarily to get attention within a particular peer-group, and it is not clear that a warning message about false statements is so much a disincentive to more outrageous writers, as a signal that encourages their disaffected peer-groups, especially when the original message is allowed to remain.

Before citation indexes

External incentives for writing in the pre-modern world mostly had to do with social factors such as fame rather than any source of monetary income. In the ancient world writing was the only way to communicate with a larger number of people than would fit in a space where a voice could be heard. Letters were important because they could transport thoughts across empires and over time. The poet Virgil was popular in the time of Emperor Augustus, but it is far from clear whether his popularity provided any income or even a place at court. Augustine of Hippo taught grammar and won a post as professor of rhetoric, but rhetoric was primarily an oral skill. Augustine’s major written works appeared only after he became Bishop of Hippo in northern Africa, well away from the centre of Roman culture, and wrote to spread his ideas, not to win a new job or to make money.

Economic incentives for writing grew in early modern times once printing became established, but for a long time popular works benefited the printers financially more than the authors, who might get a nominal sum or, more importantly, might get a salaried position as a result. The poet John Milton, for example, received a government appointment in 1649 as a kind of propagandist for the Commonwealth. A century later, after the Statute of Anne established copyright


protection, Edward Gibbon did sell the copyright for his very popular Decline and Fall of the Roman Empire, but while writing he depended on his father’s largess and eventually on his estate as well as on a job at “the Board of Trade and Plantations at a salary of £750,” mainly for his silent loyalty as a Member of Parliament (Womersley, 2006).

In the early nineteenth century authors finally began to make enough money from writing that the financial reward became an incentive beyond mere fame to keep writing. Charles Dickens is a well-known example of someone who managed to live off his income as a writer in the UK. Heinrich Heine in Germany came from a wealthy family, but needed an income since he had given up both business and law, and in the end shifted to a pragmatic form of remunerative writing and became a newspaper correspondent as well as a poet. Another contemporary, Victor Hugo in France, did live mainly on his novels. Living from writing was not easy, but it had become possible.

Fiction writing has never pretended to be factual in any strict sense, but a common theme for these early novelists was the social condition of their society. Dickens in Oliver Twist and Hugo in Les Misérables remain classic examples of the novelist as social critic, and they are sometimes treated today as de facto ethnographers of the slums of their times.

The market for factual scholarly writing grew as well, and with it a new emphasis on the importance of accuracy and sources. Leopold von Ranke, as noted earlier, was a leader in this in historical writing, but the natural sciences also began to emphasise the importance of laboratory books to document processes. Precise observation was a hallmark of the journals of Charles Darwin, and the ability to reproduce results became an important factor in success in the natural sciences. This is not to suggest that fake information was not propagated.
Belief in the Biblical creation story remained strong, especially in the US, fuelled in part by the Piltdown man hoax, and Lamarckism persisted well into the twentieth century. Trofim Lysenko argued for a vaguely Lamarckian form of genetic theory that was based more on Soviet politics than on scientific evidence. False information grew especially in political climates that favoured particular theories.

Depending on the institution, professors in the twentieth century had an increasing amount of social pressure to write books and articles. US colleges that focused on undergraduate education generally had the lowest expectations, and in the 1950s even colleges with strong regional reputations did not require that professors publish, though increasingly they did require new hires to have a doctorate. The better universities in the later twentieth century did develop explicit publication expectations to move from assistant to associate to full professor, and with that social pressure came financial reward in terms of higher salary and tenure for life. The pressure was only modestly effective, though. A person could reach associate level at a research university and stop bothering to publish as late as the 1980s and early 1990s.

The situation in the UK was equally various. The famous Austrian-born philosopher Ludwig Wittgenstein became a professor at Cambridge University and published

Context and society 41

nothing during his lifetime except the Tractatus Logico-Philosophicus in 1921. His intellectual work continued, and he wrote substantial amounts, but did not suffer from any publication pressure. British universities did not until recently actually require a professor to have a doctorate as long as the person had enough publications and someone who would attest that they qualified without the degree.²

In Germany a doctorate was and remains absolutely required for a position as professor at a university, and until 2002 the equivalent of a second doctorate, the Habilitation, was required as well. The pay rates under the pre-2002 “C” scale were largely set for life, giving no economic incentive to do more research. Nonetheless habit and social expectation played an important role. Before 2002 most people did not become a professor until they were in their mid-40s and were already so accustomed to publishing that they continued as a matter of course (Steghaus-Kovac, 2002).

As in the UK before the Polytechnics were renamed as universities, Germany still has two levels of higher education institutions, the Fachhochschulen (called in English “Universities of Applied Sciences”) and the universities themselves. The Fachhochschulen may not grant doctoral degrees, though there is some political pressure to change that, and professors at Fachhochschulen have double the teaching load. A doctoral degree has not always been required for a professorship at a Fachhochschule, though mostly it is today, and the research and publication expectations are significantly lower than at universities. The incentive to publish remains almost entirely social. Little has changed in that respect since the 1950s.

After citation indexes

Citation indexes arguably began in the nineteenth century when Frank Shepard first annotated US legal decisions, but the modern form of citation indexes for academic publishing began with Eugene Garfield, who combined a degree in library science with a doctorate in linguistics and established the Institute for Scientific Information (now part of Clarivate Analytics). The idea was to follow the citation chain from article to article. The intellectual value of this is clear in establishing the prior basis on which scholars built newer research. Nonetheless the citation index also suffered early on from misuse, as Garfield (1998) noted in a letter to the editor of the German language journal Der Unfallchirurg (The Trauma Surgeon):

I have found that in order to shortcut the work of looking up actual (real) citation counts for investigators the journal impact factor is used as a surrogate to estimate the count. I have always warned against this use.

An impact factor based on citations represents a kind of popularity contest and is not a guarantee that a scholar who cites the work views it positively. Negative references also count as citations. Nonetheless academic administrators slowly began to regard the citation indexes as a quick and factual way of ranking faculty
when giving out rewards and penalties, especially in the form of money. The trend towards the use of citation indexes to distribute rewards grew gradually and unevenly; it was perhaps most popular with less prestigious universities that wanted to encourage faculty to publish more. Publication rates at top universities were already high and those faculty rarely needed encouragement to keep on publishing. An example of the role of citation data can be found in the recent announcement by the UK Research Excellence Framework 2021: The UK’s four higher education (HE) funding bodies have awarded Clarivate Analytics’ Institute for Scientific Information (ISI) a contract to provide Research Excellence Framework (REF) 2021 assessment panels with citation information. (Higher Funding Council of England, 2020) In fact the use of publication data to count research productivity in the UK dates back to the Research Assessment Exercise of the mid-1980s when Margaret Thatcher was in office. The goal of these assessments was to have a reliable measurement of the research production of British universities, which was in principle a reasonable idea, but as often happens in establishing measurement standards, flaws could be found in both the measurement mechanisms and in the interpretation of results. Criticism of the results has been harsh. Martin Trow (1998) discussed the situation in Britain 22 years ago from a US perspective: It is not the outcomes that are so disturbing; it is all the things that universities and departments and academics do to try to influence the next set of assessments that are so costly to the academic enterprise. Among these are: (1) The preference for short-term publication as against long-term scholarship, and the frenzy to hasten publication to get in under a publication deadline, a frenzy that also affects decisions about where to publish. 
(2) The effort to squeeze research out of people and departments that have no training, aptitude or inclination for research, with the resulting proliferation of bad and useless research. … These two points are particularly relevant for assessing information integrity. Simply put, the pressure to publish may have damaged the overall quality of research output. Such conclusions need, however, to be kept in perspective. Poor quality research is not necessarily driving out better work. It may be finding new venues for publication. The publication pressure has de facto created economic incentives for establishing new academic journals, including predatory publications. Whatever the cause, the number of scholarly journals began to grow as did the subscription costs for journals from established publishers. Librarians began speaking about the “serials crisis” at the turn of the century. Estimates of the number of journal titles vary widely depending on the definition. Ware (2015) did a study for the International Association of Scientific, Technical and Medical Publishers and concludes:

There were about 28,100 active scholarly peer-reviewed English-language journals in late 2014 (plus a further 6450 non-English-language journals), collectively publishing about 2.5 million articles a year. The number of articles published each year and the number of journals have both grown steadily for over two centuries, by about 3% and 3.5% per year respectively, though there are some indications that growth has accelerated in recent years.

With prices rising for the more traditional journals, librarians began looking at new publication models, including open access journals with “golden road” funding models using article processing charges. Faculty also wanted more venues for their articles. A new factor in the publishing market was the rise of online “mega-journals” that promised faster turnaround time and open access to maximise impact and readership:

There has also been an internal shift in market shares. PLOS ONE, which totally dominated mega-journal publishing in the early years, currently publishes around one-third of all articles. (Björk, 2018)

Björk (2018) also notes that:

Between 2010 and 2016 the overall number of articles indexed in Scopus grew by 28%, to around 2,170,000.

The mega-journals are not predatory publications, but predatory publishers have built on their success. The predatory publications offer minimal and mostly fake peer reviewing and make their money from article processing fees, not from subscriptions. The number of predatory publishers is hard to determine, partly because determining who is predatory depends on data that are generally unavailable. Jeffrey Beall was forced to stop publishing his list of predatory journals with thousands of names because of threats from one of the publishers. The important fact is simply that the number of possible publication venues grew enormously in the first twenty years of the century.
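
The growth rates quoted in this section lend themselves to a quick back-of-the-envelope check. The following Python sketch assumes constant compound growth, which is a simplification introduced here and not a claim made by Ware (2015) or Björk (2018); the function names are purely illustrative. It converts the annual rates reported by Ware into doubling times and the six-year Scopus increase reported by Björk into an equivalent annual rate.

```python
import math


def doubling_time(annual_rate: float) -> float:
    """Years needed for a quantity to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


def annualised_rate(total_growth: float, years: int) -> float:
    """Constant annual rate that produces `total_growth` over `years` years."""
    return (1 + total_growth) ** (1 / years) - 1


# Figures quoted from Ware (2015): articles ~3% and journals ~3.5% per year.
articles_doubling = doubling_time(0.03)
journals_doubling = doubling_time(0.035)

# Figure quoted from Björk (2018): Scopus articles grew 28% from 2010 to 2016.
scopus_annual = annualised_rate(0.28, 6)

print(f"Articles double roughly every {articles_doubling:.1f} years")
print(f"Journals double roughly every {journals_doubling:.1f} years")
print(f"Scopus growth 2010-2016 is about {scopus_annual:.1%} per year")
```

Under these assumptions the long-run rates imply a doubling roughly every two decades, while the Scopus figure corresponds to a little over 4% per year, which is at least compatible with the remark that growth may have accelerated.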
Several changes came at once that did not necessarily cause poor quality research and false data, but which did put pressure on the peer reviewing process. The time it takes to review manuscripts varies widely from field to field. For a high-quality journal in a field where the expectation is that reviewers have a close look at the data and the analysis, not just at the writing and logic, reviewing can become a significant time commitment that detracts from the scholar’s own ability to do new research and to write new articles. It is unsurprising that reviewers engage in satisficing to protect their own time in a world where the economic rewards emphasise publication, and where credit for doing thorough and high-quality reviews is close to non-existent. Publishers also have an interest in limiting the article turnaround time, both for the sake of the authors and for understandable economic reasons in
wanting to have articles ready for upcoming issues. Adding to this is the fact that indexing services often require on-time publication for journals if they are to be listed. The result puts pressure on the quality assurance process for a variety of reasons that are not in and of themselves bad but tend together to undermine peer review. The increase in the number of journals has increased the number of reviews needed. Even top-quality journals may face the problem of finding good reviewers. In addition the time pressure to publish high-impact content quickly, for example studies related to COVID-19 during the 2020 pandemic, has led to less thorough reviews that allowed false information to get through. Checking the quality of the data is hard. Journals like the Lancet and the New England Journal of Medicine came under fire in June 2020 because they published two articles that had to be retracted very quickly. As Dr. Richard Horton, the editor in chief of The Lancet, explained in an interview in the New York Times: …peer review was never intended to detect outright deceit, he said, and anyone who thinks otherwise has “a fundamental misunderstanding of what peer review is.” “If you have an author who deliberately tries to mislead, it’s surprisingly easy for them to do so,” he said. (Rabin, 2020) This is in effect an admission that publishers have no effective mechanisms in place to measure or catch false information. It is not fair to say that this is a direct consequence of the reliance on citation indexes, but it is one of the collateral consequences of the increased pressure to publish.

Summary

Each of the measurement issues discussed in this chapter has different values in different societies at different times. Today the very concept of “society” continues to have a national element for many readers, but the movement of people and especially of students has been substantial. Anderson and Svrluga (2019) report:

There were more than 360,000 Chinese students in the United States in the 2017–2018 school year, according to the Institute of International Education, more than triple the count from nine years earlier.

The number of students from India at US universities is also substantial. The US Embassy in India (2019) reports:

The number of Indians studying in the United States increased by almost three percent over the last year to 202,014, according to the 2019 Open Doors Report on International Educational Exchange, released today. This marks the
sixth consecutive year that the total number of Indians pursuing their higher education in the United States has grown.

The numbers add up to just under 3% of the twenty million US students (US Department of Education, 2019). These international students disproportionately attend top schools. It is no invasion, but a significant international presence, nonetheless. Similar increases have occurred in the UK for both Chinese students and those from India. Over time this has meant that Anglo-American academic culture has been exported to the students’ home countries, while the local institutions themselves grow more diverse. National origin still matters, but less than it did once.

German and other European universities have seen fewer Chinese and Indian students because of the language barrier, but the increasing emphasis on English language courses and programmes is changing that. The number of German professors who have studied at elite US universities has also grown over the decades, as has the reverse flow of Germans looking for opportunities abroad. This does not mean, of course, that national origin no longer plays a role at elite universities, but it does mean that the social and historical context discussed in this chapter applies internationally if somewhat inconsistently. For less-travelled and less-educated levels of society, the influence of computing tools and broadcast media has certainly also played a role in making the social context less local, but this is hard to measure formally or reliably.
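
The share of Chinese and Indian students estimated earlier in this summary can be reproduced with simple arithmetic. The sketch below uses only the rounded figures quoted in the text; since the Chinese figure is given as “more than 360,000,” the result is a lower bound and should be read as an approximation only. The variable names are illustrative.

```python
# Rounded figures quoted in this summary.
chinese_students = 360_000       # "more than 360,000" (Anderson and Svrluga, 2019)
indian_students = 202_014        # US Embassy in India (2019)
us_students_total = 20_000_000   # "twenty million" (US Department of Education, 2019)

combined = chinese_students + indian_students
share = combined / us_students_total

print(f"Combined: {combined:,} students, or {share:.1%} of the total")
```

With these inputs the combined share comes to roughly 2.8%, the same order of magnitude as the estimate given in the text.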

Notes

1 Personal experience. The author once worked for insurance companies.
2 Note: This is based on personal experience as part of the hiring process at a Russell Group university. No names are mentioned lest it cause embarrassment.

3 CONTEXT AND INSTITUTIONS

DOI: 10.4324/9781003098942-3

Introduction

Institutions represent a key part of the context for measuring information integrity. The first section of this chapter will discuss formal institutions that have an explicit role in maintaining information quality at the national and international levels. In none of these cases is either measurement or information quality the core purpose of the institutions, but rather a factor in their operations that affects how others view them.

Tool builders are also part of the institutional infrastructure. As a general rule, individuals lack the resources to build adequate measurement tools on their own, at least not ones that work on a scale large enough to matter. Some private companies have grasped market opportunities for tool development, which focus on particular forms of measurement and on special kinds of problems where they can make money. Plagiarism has long been the favourite because the measurement problem is technically simpler to solve than are data manipulation and falsification issues. While a number of companies have taken an interest in the image and data problems, the market has relatively few serious players because the data measurement issues involve far more variables. Most buyers would prefer a general solution, not dozens of solutions for dozens of potential problem types.

Guidelines for information integrity come from a wide array of institutions with considerable overlap in their overall intention but with little or no coordination with respect to the details of how to measure integrity issues. As noted earlier in the literature review, bland all-embracing statements are common and make the task of building measurement tools harder because of their poor specificity.

Law is a form of institution that should provide more context for the measurement of information integrity, but most statutes do little to define problematic practices. A person might in theory face financial punishment for large-scale plagiarism of copyright-protected works, but putting forward wrong information about medicine that could cause hundreds of deaths is less plainly punishable legally.

The economics of information integrity represent an element of institutional context that takes on many forms. The final section of this chapter gives an overview of the economic situation. More details will follow in succeeding chapters.

Institutions and infrastructure

National organisations

The US has a federal Office of Research Integrity (ORI) that belongs to the Department of Health and Human Services and will be 30 years old in 2022. The organisational structure suggests a closer relationship to medical issues than to broader natural science issues or to integrity problems in scholarly works generally. Nonetheless its definition of research misconduct is relatively broad:

Research misconduct (formerly called scientific misconduct) is a narrowly defined set of actions that call into question the scientific integrity of a body of work. Under the regulations that articulate ORI’s statutory authority, research misconduct is defined as “fabrication, falsification or plagiarism in proposing, performing or reviewing research, or in reporting research results. … Research misconduct does not include honest error or differences of opinion” (42 C.F.R. Part 93). (ORI, 2020)

In other words, the scope is not formally limited to health issues. Nonetheless its role in investigating potential misconduct in any scientific field is clearly limited:

Allegations of research misconduct also may come directly to ORI. In this scenario, ORI will assess the allegation to determine whether it falls within its jurisdiction. If it does, ORI then forwards the case to the institution where the alleged research misconduct took place for its subsequent inquiry and investigation (as warranted). ORI always is available to assist the institution with its inquiry and investigation to ensure that it follows the regulatory requirements. ORI has no direct involvement in the decision-making by an institution. When an institution completes its investigation, ORI reviews the institution’s findings and process and then make[s] its own independent findings. (ORI, 2020)

The ORI acts as a conduit for accusations but does not become directly involved with investigations themselves.
Private conversations with ORI staff suggest that the emphasis is strongly on process and not on any issues involving measurement or other forms of analysis. The UK’s Research Integrity Office (UKRIO) was founded in 2006, more than a decade after the US ORI, and is not part of any ministry, but is rather a registered charity. Under “Role and Remit” UKRIO (n.d. a) states:

UKRIO is an advisory body. We are not a regulatory organisation and have no formal legal powers. UKRIO fills gaps between jurisdictions, where no overall regulation might apply, and helps to direct researchers, organisations and the public to regulatory bodies when issues fall within their jurisdiction.

UKRIO (n.d. b) also offers training, as explained under “Our Work: Education and Training”:

We can provide training that emphasises the good practice that runs across all research disciplines or sessions that focus on a single discipline.

In particular it offers training for members of research ethics commissions. It also offers a page of references on a broad range of topics including image manipulation and GDPR (General Data Protection Regulation).

Germany’s Deutsche Forschungsgemeinschaft (DFG) is roughly equivalent to the US National Science Foundation and maintains an office to coordinate the work of the Ombudsman offices at German universities:

The German Research Ombudsman (Ombudsman für die Wissenschaft) is a committee appointed by the German Research Foundation (DFG) that assists all scientists and researchers in Germany when it comes to questions and conflicts related to good scientific practice (GSP) and scientific integrity. The Ombuds Committee is supported in its deliberations and conflict mediation by an office that is located in Berlin. (Deutsche Forschungsgemeinschaft, 2019a)

The connection with the local ombudsman offices at the universities encourages a more formal coordination structure than their Anglo-American equivalents, but in the end their role is also largely advisory. Key decisions about retractions and withdrawing degrees are made locally.

Universities

Universities are the prime actors for handling academic information integrity issues, both in terms of trying to prevent violations and in making judgments about the consequences. The different administrative and legal systems result in a variety of approaches.

Most US universities have set up so-called “institutional review boards” (IRBs) following the 1974 National Research Act to prevent abuses such as those that occurred in the Tuskegee study of syphilis, which had allowed African-American males with syphilis to go untreated in order to observe the results. The IRBs initially focused primarily on medical research, but increasingly broadened their scope to cover most forms of research involving human subjects. The idea was to protect people from harm, to ensure that there was informed consent, and to monitor how researchers carried out experiments. Over time universities sometimes split the IRB role between
medical research and behavioural research on human subjects, partly because physicians often played a dominant role in the review process and had little understanding of the consequences of other forms of human subject research. The focus was not on the integrity of the information that resulted or on how to measure it. The IRBs did not monitor outcomes. It is not quite fair to imply that the IRBs played no role in preventing integrity violations. Many encouraged and set up training programmes on good scientific practice as well as programmes to help people design experiments to comply with human subjects regulation. Over time these programmes grew to include a wider range of issues, especially as the university leadership became more aware of issues like plagiarism, though typically programmes to prevent plagiarism focused heavily on undergraduate students, where the sheer quantity of plagiarism seemed to reach epidemic proportions. Universities actively update their websites to provide information and examples for their students. Princeton University offers a concise and unusually clear pamphlet called the “Academic Integrity Booklet 2018–19” (Princeton University, Office of the Dean of the College, 2018). The pamphlet’s explanation of paraphrasing is, for example: Paraphrase is a restatement of another person’s thoughts or ideas in your own words, using your own sentence structure. A paraphrase is normally about the same length as the original. Although you don’t need to use quotation marks when you paraphrase, you absolutely do need to cite the source, either in parentheses or in a footnote. Paraphrase your source if you can restate the idea more clearly or simply, or if you want to place the idea in the flow of your own thoughts – though be sure to announce your source in your own text… (Princeton University, Office of the Dean of the College, 2018, p. 
7) This definition fits standard scholarly practice for paraphrasing, but offers no usable metrics about, for example, the number of words from the original that may be reused in the paraphrase. A reasonable explanation might be that the number depends on the context, but that exposes students and writers to software that matches words and may label a paraphrased segment as plagiarised. This is a risk with the VroniPlag definitions, as discussed earlier. The pamphlet also addresses the question of common knowledge directly: You may have heard that it’s not necessary to cite a source if the information it provides is “common knowledge.” In theory, this guideline is valid…. However, when you’re doing sophisticated original work at the college level, perhaps grappling with theories and concepts at the cutting edge of human knowledge, things are seldom so simple. This guideline can often lead to misunderstanding and cases of potential plagiarism.

The warning is valid, but again there is no measurement rule, or even a definition about what is a fact (e.g. that normal human body temperature is 37 degrees Celsius) and what might be another form of general knowledge that could be found in encyclopaedias but may not be so widely known. Even though the focus is on plagiarism, the pamphlet does make it clear that: “Failing to acknowledge one’s sources isn’t the only form of academic dishonesty” (Princeton University, Office of the Dean of the College, 2018, p. 19):

Fabricating or falsifying data of any kind is also a serious academic violation. Rights, Rules, Responsibilities defines false data as: “The submission of data or information that has been deliberately altered or contrived by the student or with the student’s knowledge, including the submission for re-grading of any academic work under the jurisdiction of the Faculty-Student Committee on Discipline.” (Princeton University, Office of the Dean of the College, 2018, p. 19)

This appears under the heading of “Misrepresenting Original Work” and as usual there are no metrics for what counts as fabrication or falsification. Genuine data problems can be complex: for example, data dropped from an analysis, or a source deliberately not used. Image manipulation is not mentioned.

Many universities also offer training programmes about all forms of information integrity violations. The topics and the quality vary widely. There is also a sense in which many statistics courses and statistics labs are training programmes in how to avoid integrity problems when interpreting numeric data. Training programmes are important for doctoral students especially, since some may be shy about asking their advisors, as questions might suggest they do not know what they are doing.
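
The earlier point in this section about paraphrase and word-matching software can be made concrete with a small illustration. The Python sketch below is hypothetical: it does not reproduce the algorithm of any actual plagiarism-detection product or of VroniPlag, and the two sentences, the function names and both measures were chosen here only to show how different counting rules can judge the same paraphrase very differently.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def shared_word_ratio(original: str, candidate: str) -> float:
    """Share of the candidate's distinct words that also occur in the original."""
    original_words = set(tokenize(original))
    candidate_words = set(tokenize(candidate))
    if not candidate_words:
        return 0.0
    return len(original_words & candidate_words) / len(candidate_words)


def shared_ngrams(original: str, candidate: str, n: int = 3) -> set[tuple[str, ...]]:
    """Word sequences of length n that appear in both texts."""
    def ngrams(tokens: list[str]) -> set[tuple[str, ...]]:
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return ngrams(tokenize(original)) & ngrams(tokenize(candidate))


# Invented example sentences (not taken from any real case).
original = ("The pressure to publish may have damaged the overall "
            "quality of research output.")
paraphrase = ("Research quality overall may have suffered because "
              "scholars feel pressure to publish.")

ratio = shared_word_ratio(original, paraphrase)
trigrams = shared_ngrams(original, paraphrase)

print(f"Shared-word ratio: {ratio:.2f}")
print(f"Shared three-word sequences: {sorted(trigrams)}")
```

In this invented case two-thirds of the paraphrase's distinct words also occur in the original, which a simple word matcher might flag, while only a single three-word sequence is shared. Which measure and which threshold, if any, should count as plagiarism is exactly the kind of metric that the guidelines leave undefined.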

Publishers

Publishers play a particularly salient role in information integrity issues, since they are typically (though no longer exclusively) the ones who make the information available to a broader market. Publishing today is diverse and has at least four components: traditional academic peer-reviewed journals, repositories and academic open access publishing, commercial publishing including newspapers, and social media tools. Two of the four have historical paper analogues, but this section will focus primarily on the contemporary digital versions. There are of course also nuances within each category and differences across cultures. Special focus will be put on whether and how each category measures the integrity of information.

Traditional academic publishers have generally built their reputations by providing high-quality information, and the best of them care about making sure that no plagiarism or false data get into their journals because their economic model depends on reliability. The standard quality-assurance tool for academic articles is peer review, a process where scientists in the same field read, comment on, and ultimately approve an article for publication. Typically, two or more reviewers
comment and should agree. The precise definition of who counts as an “academic peer” lacks rigour. Titles such as “professor” or “Dr” play a role in the definition, but reasonable doubt may exist that an assistant professor at a US community college is a peer of a Regius professor at Oxford University, even if the community college assistant professor may in fact be a better (or at least more motivated) reviewer. Social distinctions play a non-trivial role in the scholarly world, but that is only one reason why peer review is not a perfect measurement tool for judging the quality of articles. Journals compete for notable persons to sit on their peer review boards, on the reasonable theory that a person with recognised expertise in a discipline has the necessary basis for judging new articles. These are, however, often people who have many demands on their time, and the problem is that checking on the logic and the analysis, as well as on the quality of the data, takes time. As a general rule, journals expect reviewers to look carefully at the logic, to make some judgment about whether the analysis is reasonable without redoing it themselves, and to accept the data unless some obvious problem is visible. The data may not be available anyway, depending on the size of the dataset, privacy issues and potential commercial considerations. When peer reviewers disagree, the editor must make a judgment call, which can involve asking yet another person to review a paper. Some journals boast about quick turnaround time for article reviews as a sign of efficiency, but this may show nothing more than management pressure to get new articles out quickly. In rare cases an editor or reviewer may be praised for catching problems and preventing an embarrassing publication, but more often speed, not care, is rewarded. 
Reforming peer review is a popular topic and a number of variants of peer review exist, most recently open peer review, where a broad readership is given a chance to comment rather than (just) a hand-picked set of experts. One might also argue that hurrying a timely paper through the review process and then retracting it after people find errors, as has happened with a number of COVID-19 related papers, is itself a form of open peer review where the publisher tolerates the risk of embarrassment in order to get an article out quickly. The exact measurement processes that reviewers use in different fields are too various to discuss but are usually at least mentioned in a good peer review. The imperfections of the peer review process are widely known, but discussions typically come back to the conclusion that the academic world has no better solution. The important fact is that traditional academic publishers do make a serious effort to control the quality of the information they make available.

Repositories claim no formal peer-review type process, but there are often controls about who may post and under what circumstances. One of the oldest open access repositories, dating back to 1997, is the document server (Dokumentenserver) of Humboldt-Universität zu Berlin (Humboldt-Universität zu Berlin, 2001). The rules for publication are stated in part in the “Guidelines” (Leitlinien) page of the repository (Humboldt-Universität zu Berlin, 2018). The rules are also partly specified by the various faculties that allow their doctoral
students to publish there. A related project is the Networked Digital Library of Theses and Dissertations (NDLTD), which started at Virginia Tech and on which a number of international universities (including Humboldt-Universität zu Berlin) have board seats. Background work on NDLTD began in 1987. The idea was to make available in electronic formats US dissertations that had previously been available only as microfilm copies from either University Microfilms in Ann Arbor, Michigan, or from the libraries of non-participating universities (notably the University of Chicago and Harvard University). Quality control remained with the universities.

arXiv (formerly often known as the Los Alamos National Laboratory Archive) offers preprint publication with few restrictions:

arXiv is a free distribution service and an open-access archive for 1,723,249 scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. Materials on this site are not peer-reviewed by arXiv. (arXiv, n.d.)

Even though anyone can post to arXiv, there is a form of quality control via the comments. A comments field is strongly recommended, which makes it almost a form of open peer review.

Commercial publishing is itself a complex and multifaceted world where generalisations are risky. Nonetheless one clear difference between commercial and academic publishers is that the former do not in general have any external quality control equivalent to peer review. They do, however, have multiple editorial layers to ensure quality and consistency, and changes in the top editorial leadership – or in the ownership – can bring about changes in perspective, which may mean using different measurement standards for information integrity.
The most respectable contemporary national newspapers such as the New York Times or the Guardian in the UK are careful about fact checking and following up on sources. At the other end of the spectrum are the tabloids whose economic model is to cater to a public that wants sensationalism, often with a heavy mix of sex-oriented photographs that once seemed daring but are now mild compared to what can be found for free online. For these tabloids, fact checking seems to matter little. They have a long tradition of spreading news about alien invaders, unknown monsters, and sensational murders. The politics of the news in these tabloids depends on the preferences of the publisher, but many have a long history of nationalist and right-wing orientation. Left-wing sensationalism is not unknown, but unsurprisingly lacks the financial resources for large-scale impact.

Social media began as a means for personal communication, in some sense like the earliest email lists. Those involved with email in the 1980s will remember how lists sometimes degenerated into angry arguments called “flame wars.” Moderators had to set guidelines and slowly a culture developed in which extreme statements
were no longer acceptable. Social media companies today have a far wider reach than the old email lists, and have corporate managers who can in theory set rules. In late May 2020 Twitter became the first major social media platform to flag blatantly untrue statements, even by the US president (Walker, 2020). A month later big advertisers like Unilever and Coca-Cola began to pull advertisements from Facebook because of hate speech (Murphy, 2020). Facebook finally also set rules. Nonetheless, social media continues to be a major conduit for misinformation. At present the control mechanisms only react to untruths of the more extreme sort, and have no means for preventing them from being posted.

Broadcast news, both television and radio, is closely related to commercial publishing, but has historical similarities also to social media. Citizens Band (CB) radio was popular among truck drivers in the US long before they had any access to the internet. It was largely unregulated and allowed people to share stories and rumours uncensored. Since there was no recording mechanism (and none wanted), much of that world has vanished into obscurity. Early radio in the 1920s and 1930s, like early television in the 1950s, had costs resembling those of publishers and had a corporate management structure even when state-owned. The British Broadcasting Corporation (BBC) and its North American counterparts, CBC (Canadian Broadcasting), CBS (Columbia Broadcasting), ABC (American Broadcasting) and NBC (National Broadcasting), competed for stories but had editorial policies that kept the news side of the broadcasting strongly factual and evidence-based. Broadcast news had nonetheless to compete with the entertainment programmes for advertising funding, which created economic pressures to win more viewers. As the number of broadcast networks grew, the competition grew as well. The Corporation for Public Broadcasting and PBS (Public Broadcasting) began in 1969.
Ted Turner started Cable News Network (CNN) in 1980 and Rupert Murdoch established Fox News in 1996. Non-commercial “public” broadcasting stations existed in many regions in the US as early as the 1950s, often in connection with universities. The problem was that facts and in-depth reporting no longer sufficed to attract an audience – a perspective became necessary, a point of view or background story that provided a context in which the news made sense. Increasingly viewers could choose their brand of information based on politics and social opinions. The critical measurement for these news programmes involved ratings and income, not the integrity of the information on display.

Technology and tools

Technology is part of the infrastructural context for measuring information integrity, and it plays a particularly important role in enabling some forms of reliable counting. This section will discuss primarily digital tools, but of course others exist too. In the pre-computer world counting by hand using sampling mechanisms and estimates was common. The real complexity with almost all forms of measurement is not found in the mechanics, but in what to count, how to define it and how to recognise it in reproducible ways.


Plagiarism tools

As noted earlier, plagiarism detection tools are among the most frequently used and most accurate, because they match discrete groupings of possible variations of a limited number of symbols, mainly letters using ASCII encoding. The theoretical number of variations depends on an upper limit on word length, and the actual number of words in a language is far smaller than that theoretical maximum. The size of the English vocabulary, for example, varies from dictionary to dictionary depending on the inclusion of archaic words, technical concepts, species names, and words borrowed from other languages. Word counts in German are more complex than in English, because German readily combines words to make new ones. This is one reason why German universities and publishers usually specify the lengths of articles or essays in characters including spaces, not in words. German has multiple dialects too, with their own spelling variants, and there have been several spelling reforms. Nonetheless in practical terms, plagiarism tools and plagiarism measurement concentrate on English language texts, since the international scientific community has largely adopted English as the common communication medium. This makes the matching process for plagiarism far more tractable than any equivalent measurement process with data, where the number of combinations and permutations is essentially unlimited.

Words are not all alike. James Pennebaker (2011) distinguishes “function words” from words that carry meaning:

Pronouns (such as I, you, they), articles (a, an, the), prepositions (to, of, for), auxiliary verbs (is, am, have), and a handful of other common word categories are called function words. On their own, function words have very little meaning. In English, there are fewer than 500 function words yet they account for more than half of the words we speak, hear, and read every day.
Plagiarism checking software does not typically distinguish between function words that serve structural purposes in a sentence but have no inherent meaning, and content words, meaning words with specific meanings or significance. This approach is reasonable when trying to measure whether someone has copied a large block of text verbatim, but for decisions about paraphrasing, the distinction could matter. At some level paraphrasing probably must reuse many of the key content words in order to discuss the same topic in a meaningful way. How many or what percent of those content words may safely appear in a paraphrase will depend on the content and the context, and a general rule may be hard to define except to say that 100% is likely too much and 0% likely unworkable. Whether the reuse of function words should count at all seems less relevant. Context matters as well. Someone paraphrasing a description of a particular chemical process must reuse many of the same technical terms. This is true for many technical fields, including the description of the outcome of statistical tests. Even paraphrasing a description of a landscape must probably reuse many content words, if the passage includes specific flora and fauna and geographic terms. There


comes a point where an exact quotation in quotation marks with an exact reference makes more sense. The legitimacy of a paraphrase depends on its added value, generally the added clarity of the formulation, and if the count of reused content words exceeds 50% of the original total, there may be a basis for doubt. A reliable measurement analysis considers both count and context.

Facts should not count as plagiarism, but facts are particularly hard to distinguish in a text. Some facts are fairly obvious. The 12-word sentence: “Mecklenburg-Vorpommern is a federal state in the north east of Germany” was once flagged as plagiarism in a case in Germany (the author’s name is withheld because of confidentiality, even though the attack was made publicly in a newspaper). It is no surprise that other writers have used the same phrase. Other ways to express the fact exist, but the number is limited and looking for a unique way to express a fact has little intellectual value. The challenge is to tell a discovery system how to recognise that such a string is a fact.

Length can matter too. Here is an example. A search in Google for the six-word exact phrase “Richmond is the capital of Virginia” finds “about 152,000 results.”1 Adding three more words (“the state of”) reduces the results to 10,600. Adding one more word and searching for “Richmond is the capital of the southern state of Virginia” gets only one result from an advertising website called “Healing Law: Connecting You With The Perfect Lawyer” (Healing Law, 2020). A very strict construction of plagiarism could conclude that using the phrase without acknowledging the website is a violation, but such an argument seems tendentious. Even if the phrase has only one result in a Google search, the combination of the two names (Richmond and Virginia) with two facts (capital and state) and a common adjective (southern) hardly creates a unique concept or description even when the phrase is measurably (for Google) unique.
The US Supreme Court ruling in Feist Pubs., Inc. v. Rural Tel. Svc. Co., Inc., 499 U.S. 340 (1991) determined that factual data organised in an unoriginal way lacks the creativity to be copyright protected (US Supreme Court, n.d.). While the ruling applies only to US law, the uniqueness principle is relevant for plagiarism decisions. Nonetheless it could arguably represent plagiarism if someone copied the whole paragraph from the website: Richmond is the capital of the southern state of Virginia. Which is located in the United States. Richmond, is unsurprisingly the center of the Richmond Metropolitan Statistical Area (MSA). The city of Richmond is also the center of the Greater Richmond Region. Incorporated in the year of 1742, Richmond has been an independent city since 1871. Everything in the paragraph is a fact, but at some point even the combination of common facts in a particular constellation borders on a unique expression even if nothing except the word “unsurprisingly” shows any opinion. It is not impossible to set rules for complex boundary cases like this in a computer algorithm, but existing systems currently (and perhaps rightly) leave such judgments to humans.


Standard phrases also ought not to be counted as plagiarism. The clearest examples come from statistical output. A good example of a standard phrase can be found in Julie Pallant’s SPSS Survival Manual (2007, p. 133). She writes: The results of the above example using Pearson’s correlation could be presented in a research report as follows. … The relationship between perceived control of internal states (as measured by the PCOISS) and perceived stress (as measured by the Perceived Stress Scale) was investigated using Pearson product-moment correlation coefficient. Preliminary analyses were performed to ensure no violation of the assumptions of normality, linearity and homoscedasticity. … The language is relatively inflexible because of how the test works and what the requirements are for normality, linearity and homoscedasticity. Even more important is the textbook’s recommendation to use this language in a research report, a strong indication that copying is expected. Statistics are not the only area where standard phrases exist. Standardised language is less common in the humanities, where uniqueness of expression matters, than in the natural sciences, where language is only a tool to convey results and where uniqueness actually has a negative value because it can cloud the description of precisely how a test was carried out. Standard phrases are less problematic than facts for computer-based plagiarism tools because there are a number of signals for what counts as a standard phrase (including the mere fact of describing a statistical result), and the number of standard phrases is probably modest enough that a database of standard phrases could be set up.
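The counting ideas in this section (separating function words from content words, and asking what share of an original's content words reappears in a paraphrase) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how any existing plagiarism tool works: the function-word list is a deliberately tiny, hand-picked sample, and the names `FUNCTION_WORDS`, `content_words` and `reused_share` are hypothetical.

```python
import re

# Assumption: a very small, hand-picked sample of English function words.
# A realistic list would hold several hundred entries.
FUNCTION_WORDS = {
    "a", "an", "the", "i", "you", "he", "she", "it", "we", "they",
    "to", "of", "for", "in", "on", "at", "by", "with", "from",
    "is", "am", "are", "was", "were", "be", "been", "have", "has", "had",
    "and", "or", "but", "that", "this", "which", "as",
}


def content_words(text):
    """Return the distinct lower-cased words that are not function words."""
    # Letters only, allowing internal hyphens (e.g. place names).
    words = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())
    return {word for word in words if word not in FUNCTION_WORDS}


def reused_share(original, paraphrase):
    """Share of the original's distinct content words found in the paraphrase."""
    source = content_words(original)
    if not source:
        return 0.0
    return len(source & content_words(paraphrase)) / len(source)


original = "Richmond is the capital of the southern state of Virginia."
close_paraphrase = "Virginia is a southern state, and its capital is Richmond."
loose_paraphrase = "The seat of government of Virginia is Richmond."

print(reused_share(original, close_paraphrase))
print(reused_share(original, loose_paraphrase))
```

A share above the 50% mark mentioned in the text would only be a signal for a human reader to look more closely. As the section argues, facts, technical terms and standard phrases can legitimately push the share high, so the count needs to be read together with the context.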

Data tools

There are no tools for discovering data falsification or manipulation in the same way that there are tools for plagiarism. As noted above, measuring plagiarism is at least a relatively tractable problem because the number of words is finite. The same cannot be said for data. This section will look at tools for detecting three types of data problems: text-based humanities sources, data visualisation, and image manipulation. There are many other forms of data integrity problems that need some means of discovery and measurement, but which lack any well-established tools.

Text-based humanities sources are perhaps the oldest form of relatively reliable systematic data. There is a degree to which humanities information has long been open access, since much of the information comes from books and a good scholarly library will generally make those books available to recognised scholars. The coverage is incomplete because not every library has every published book and not every scholar has access, though in the modern world scholarly networks are relatively effective in getting people the information they want. Archival data is less available. Not all archives are open or even have an organisational structure that makes discovery easy. Digitisation has helped, but complete digitisation has a long


way to go. A primary way in which libraries (and to some extent archives) can function as a discovery tool for false information is through scholars following up footnotes and references. The referencing structure for scholarly texts is old, and may well date from the sixteenth century according to Chuck Zerby in The Devil’s Details: A History of the Footnote (cited in Kennedy, 2019). Referencing is not a digital tool per se, but in the contemporary world such references increasingly include links to sources, which makes follow-up easier. Seamless access to the sources of (textual) data is no panacea for preventing manipulation or falsification, but its transparency helps. There is nothing to prevent fake or manipulated sources from being put on the internet, and some extremist organisations have done that already. Nonetheless the amount of outright falsification in humanities sources is relatively low, perhaps because the monetary value of having falsified sources is not worth the effort.

Non-scholarly publishers with only a few exceptions make little or no use of any type of references. Fact checkers can help to address information integrity issues for these publications, which range from ordinary newspapers to social media. Fact checkers are numerous in the US, and some politicians have raised concerns about whether the fact checkers themselves are sufficiently politically neutral to be reliable. Middlebury College offers a list of five “non-partisan fact checking sites,” including: FactCheck.org (from the Annenberg Public Policy Center), FactChecker2 (from the Washington Post), PolitiFact.com (from the St. Petersburg Times), Snopes.com (an independent service), and PunditFact (from the Poynter Institute). These sites make a real effort to be balanced.
For example, the lead story on Snopes on 5 July 2020 was “Did Trump Say Operation Desert Storm Took Place in Vietnam?” and the analysis went on to explain: “… that claim was false, based on Trump’s fumbling the delivery of a passage in his speech in a way that left some listeners confused” (Mikkelson, 2020). Correctiv.org and Volksverpetzer.de are the German equivalent fact checking sites. Among the fact checking sites in the UK are the BBC and separate services each for Scotland and Northern Ireland. A worldwide list can be found in Wikipedia under “list of fact-checking websites.”

Systems to discover the manipulation or falsification of facts for the humanities and more narrative-oriented social sciences do not really exist, but for quantitative sciences and those social sciences that use tables and graphs, some statistical tools can check whether the data are in some sense too perfect. Uri Simonsohn (2013) wrote in the journal Psychological Science:

In this article, I illustrate how raw data can be analyzed for identifying likely fraud through two case studies. Each began with the observation that summary statistics reported in a published article were too similar across conditions to have originated in random samples, an approach to identifying problematic data that has been employed before (Carlisle, 2012; Fisher, 1936; Gaffan & Gaffan, 1992; Kalai, McKay, & Bar-Hillel, 1998; Roberts, 1987; Sternberg & Roberts, 2006).

Simonsohn, Leif Nelson and Joe Simmons established a blog called “Data Colada: Thinking about Evidence and Vice Versa” in 2013 and continue to update it. This


is not in any sense a tool for analysing the reliability of data in the social and natural sciences, but it represents a portion of the infrastructure by promoting discussion about analytical methods. The articles appear mainly to be by the three founders and the site is apparently not meant to be a forum for other authors, but a public place for personal reflection. Whether tools will grow from it and from other similar sites remains to be seen.

Images are a form of data that have received increasing attention because of the manipulation of medical and biological images. These bio-medical images offer a compact representation of a large number of data points and there is a lingering sense that photographic style images are an accurate representation of information. Historically bio-medical images were rare even in the mid- to late-twentieth century, because the imaging equipment was expensive and not available in every lab. Digital imaging techniques made a difference in the availability of machine-generated images, and tools like Photoshop or GIMP made manipulation easier than with chemical film. One of the discovery problems with these bio-medical images is their complexity. If one is not an expert in the kind of content in the image, it is difficult to see the nuances that can show possible manipulation. A few experts have trained themselves to do this reliably. One is Dr. Elisabeth Bik, a Dutch microbiologist who has founded a blog called “Science Integrity Digest.” She explains her method in the FAQ on her blog:

I find these by just looking at them. I don’t use any software to find them; I just flip through PDFs. I do use Mac’s Preview to make an image darker or lighter to bring out some details, if needed. Occasionally, I make use of FotoForensics.com or Forensically to confirm my suspicions, but I always find them by eye first. Preview is also used to “decorate” the image with colored boxes to point out the areas that look similar to each other.
(Bik, 2019)

There is a scaling problem with fraud detection using human eyes alone. Specific domain knowledge presumably helps to find problems, but may not be absolutely necessary. Another notable detection expert is Jana Christopher from Heidelberg, who left EMBO Press in 2015 to establish her own “Image Integrity” service that “offers a comprehensive service helping scientists and journals to protect and safeguard the published record” (Christopher, n.d.). She is not a trained microbiologist, as she explains in the “About” section of her website:

I have a background in theatre arts, and worked in theatres and opera houses in London’s West End for several years. I obtained an MA in Bilingual Translation from Westminster University in 2002, before returning to Germany and starting a new career in publishing. (Christopher, n.d.)

The idea of finding an automated detection mechanism for image manipulation is popular, and a number of researchers and a few companies are working on


machine-learning tools to make detection easier. Among those are a team at Harvard Medical School led by Mary C. Walsh:

Dr. Walsh is responsible for the forensic analysis of scientific research relevant to issues of potential academic misconduct at HMS. Her current responsibilities include the application of advanced forensic analysis techniques and tools approved by the federal Office for Research Integrity throughout the course of an academic misconduct review. … (Walsh, 2020)

The Harvard team has the goal of building machine-learning tools and has a partnership with the Berlin-based HEADT Centre, which is providing retracted manipulated images for testing.3 Other relevant research projects include the work of Daniel Acuna at Syracuse (see the literature review in Chapter 2) and the Japanese company LPIXEL (2019):

LPIXEL is a University of Tokyo startup that specializes in providing advanced image analysis and processing technology for the life sciences. Ever since its inception, LPIXEL has been developing solutions tailored to detecting inappropriately manipulated and duplicated images used in scientific papers.

None of these initiatives has created a tool so reliable that it has won a significant market among leading publishers or universities. It is easy to underestimate how complex the problem is. As a general rule, computers “see” images by creating mathematical representations of the boundaries of lines and shapes within the image. Too much detail can make boundary identification difficult and too little can leave out important details. That is only the beginning. A detection algorithm must also understand what the image should look like and what the elements of the image mean.
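The point about boundaries and the amount of detail can be made concrete with a toy example. The sketch below is not how any of the tools named here work; it assumes a greyscale image stored as a small grid of brightness values, and the function name `edge_map`, the sample grid and the thresholds are all hypothetical choices made for illustration.

```python
def edge_map(image, threshold):
    """Mark each pixel whose brightness differs from its right-hand or lower
    neighbour by more than `threshold` (1 = boundary, 0 = no boundary)."""
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Differences to the neighbours; pixels on the last row or
            # column have no neighbour in that direction.
            right = abs(image[r][c] - image[r][c + 1]) if c + 1 < cols else 0
            down = abs(image[r][c] - image[r + 1][c]) if r + 1 < rows else 0
            if max(right, down) > threshold:
                edges[r][c] = 1
    return edges


# A bright 2x2 square on a dark, slightly noisy background (invented values).
image = [
    [10, 12, 11, 10],
    [11, 200, 198, 12],
    [10, 201, 199, 11],
    [12, 10, 11, 10],
]

for row in edge_map(image, threshold=50):
    print(row)
```

With a moderate threshold only the outline of the bright square is marked. A threshold of 0 marks nearly every pixel, because the background noise also counts as a boundary, and a very high threshold marks nothing at all, which is a small-scale version of the too much and too little detail described above.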
Many other tools exist as well, including ImageJ: …the image editing program ImageJ, developed by Wayne Rasband under the auspices of the United States National Institute of Health, offers a wide range of image editing possibilities, as well as a wealth of tools and extensions that can be used for measuring and analyzing images. (Beck, 2019) More tools are also in development around the world.
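Returning to the statistical checks mentioned earlier in this section, the idea that summary statistics can be “too similar across conditions to have originated in random samples” can be illustrated with a small simulation. The sketch below is not Simonsohn’s procedure. It assumes normally distributed data, equal group sizes and a common population standard deviation, and the function name `share_as_similar` and the example figures are invented for illustration.

```python
import random
import statistics


def share_as_similar(reported_sds, n_per_condition, simulations=2000, seed=1):
    """Estimate how often random samples produce standard deviations that
    are at least as similar to one another as the reported ones."""
    rng = random.Random(seed)
    # Assumption: all conditions share one population SD, estimated by pooling.
    pooled_sd = (sum(sd ** 2 for sd in reported_sds) / len(reported_sds)) ** 0.5
    observed_spread = statistics.pstdev(reported_sds)
    hits = 0
    for _run in range(simulations):
        simulated_sds = []
        for _condition in reported_sds:
            sample = [rng.gauss(0.0, pooled_sd) for _ in range(n_per_condition)]
            simulated_sds.append(statistics.stdev(sample))
        # Count runs whose SDs lie as close together as the reported ones.
        if statistics.pstdev(simulated_sds) <= observed_spread:
            hits += 1
    return hits / simulations


# Hypothetical reported standard deviations for three conditions of 15 people.
print(share_as_similar([25.09, 24.58, 25.65], n_per_condition=15))
print(share_as_similar([10.0, 25.0, 40.0], n_per_condition=15))
```

A very small share for the first set of figures would mean that genuinely random samples rarely agree so closely, which is a reason to ask for the raw data rather than proof of fraud. The second set varies so much that almost any simulated run is at least as similar, so it raises no such flag.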

Role of law

Three aspects of law can play a formal role in information integrity issues, depending on the circumstances and type of problem. The law most often mentioned is copyright law and has to do with plagiarism, but Jonathan Bailey (2013) makes an important distinction: “plagiarism is an ethical construct and copyright


infringement is a legal one. Though they have a lot of overlap, they are not the same and can never really be the same.” Copyright law is difficult to apply to plagiarism cases, in part because the law is significantly different in different countries. In the US, for example, there is the “fair use” clause (17 USC 107), which applies four key tests:

In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include –

1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
2. the nature of the copyrighted work;
3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
4. the effect of the use upon the potential market for or value of the copyrighted work.

(Cornell Legal Information Institute, 2020)

In plagiarism cases the first two tests play at most a minor role. Traditionally scholarly uses are acceptable under the first fair use test and scholarly works also fit fair use for the second test. The third and fourth tests are potentially relevant for plagiarism. They have been applied not only to cases where substantial amounts of text have been taken, but also to cases such as “Harper & Row, Publishers, Inc., et al. v. Nation Enterprises, et al.” where the issue was a relatively small amount of copying. Reth (1985) explains in the summary:

…the Court held that although the excerpts constituted a quantitatively insubstantial portion of the memoirs, they represented “the heart of the book” and were qualitatively substantial in view of their expressive value and their key role in the infringing work.

This suggests that a plagiarism case that took the “heart” of a work without proper attribution could be considered a copyright violation under US law. Something less than the heart of a work, however, might well count in the US as a form of fair use for copyright purposes.

European law has no equivalent to “fair use.” There are exceptions for certain kinds of uses, including teaching. For example the German BMBF (Federal Ministry of Education and Research) advises based on paragraph 60a of the Copyright law (§ 60a UrhG) that 15% of a work can be used for teaching without permission (BMBF, 2020). It also advises that paragraph 60c of the Copyright law (§ 60c UrhG) allows copying up to 75% of a work for personal research (paragraph 2) and up to 15% of a work for a limited number of others (paragraph 1). Germany also has a citation law (§ 51 UrhG), which specifies that a citation is allowed as part of a single scholarly work that uses it as an example or explanation (“Erläuterung”)


(dejure.org, n.d.). It is clear from the law that a reference is necessary. The exact form of the reference is not specified. The UK copyright law has the concept of “fair dealing,” as do Canada and other parts of the Commonwealth. UK law has had to conform to European law in a broad sense, but that will likely change with Brexit in 2021. Nonetheless it is unlikely to affect copyright law substantially in the near future, and specific copyright issues rarely arise in actual cases of plagiarism, even in Europe where the law is more stringent. Law is often cited in connection with ethical reasons for avoiding plagiarism, but plagiarism only rarely leads to court cases. One example is Star Wars vs Battlestar Galactica, which claimed plagiarism in its lawsuit (Clark, 2015), but plagiarism played no role in the appellate court decision, probably because it is not defined in law (US Court of Appeals for the Ninth Circuit, 1983). Arguably copyright law has a positive role in discouraging plagiarism, since more extreme forms of plagiarism could reasonably be called copyright violations, but some other forms of legal protection can actually play a negative role in information integrity issues because they may give an incentive for keeping data secret, which makes falsification or manipulation harder to discover. Patent law, for example, offers protection to certain kinds of chemical, bio-chemical, medical, or other processes once they are registered. Registration means describing processes that could give others ideas, and registration often takes longer than peer review. Data protection (in Europe the General Data Protection Regulation or GDPR) is a part of the legal context that discourages providing access to original datasets. The principle of GDPR is to protect the privacy of individuals by limiting the exposure of data about them. 
Each member of the European Union has its own codification, some of which are stricter than others, and any data involving EU citizens is also supposed to follow GDPR regardless of the country. Strictly speaking, data published in the US involving only US citizens need not follow GDPR, but in today’s highly international scholarly world, social science data involving behavioural research is so likely to involve EU citizens that most people are cautious. US universities have developed their own data privacy rules too in connection with the institutional review board process. The end result is that original data is often not available to scholars who want to do extra checking. At the same time funding agencies, including the US National Science Foundation and the German DFG have established policies requiring archiving plans so that data will be kept for (in general) at least ten years.

Economic context

There is no specific institution or set of institutions that tracks or oversees the economics of information integrity. The economics arise from a mosaic of many thousands of researchers, universities, publishers, and companies offering discovery and measurement solutions. The economic issues are so ubiquitous and so complex that they are easily overlooked among the other ethical, technical, political, historical and structural issues involved in information integrity. It might be simpler if


there were a clear economic advantage to falsification or its opposite, which includes both hunting for integrity violations and choosing simply to follow the path of virtue. This section will look broadly at some of the economic issues. The following chapters will consider some aspects in greater detail.

Historically information falsification has not been an obvious long-term economic win for perpetrators. Over time, lies and fakes in technical fields tend to come to light for the simple reason that the results are not reproducible. This seems not to discourage people any more than robbers seem discouraged by the prospect of being caught, which is of course more likely in some countries and cities than in others. Psychology plays as much of a role here as economics in short-term decision-making. Sharot (2011) writes about the human tendency towards optimism:

Humans, however, exhibit a pervasive and surprising bias: when it comes to predicting what will happen to us tomorrow, next week, or fifty years from now, we overestimate the likelihood of positive events, and underestimate the likelihood of negative events.

Inventing fake data may, for example, seem attractive in the short term, because it saves time and effort and because the likelihood of discovery seems minimal. There are other attractions too, such as producing results so good that they silence critics, including peer reviewers and editors. It is human to prefer praise to grudging acceptance. No one enjoys having to defend flawed genuine results, and skilfully created fake data can bypass many problems and give the impression of extraordinary scholarly competence. For a single occurrence the risk of discovery may seem and may be minimal, but carelessness over time can expose the fake and cost the author considerable embarrassment and possible job loss. Diederik Stapel and James Hunton are two well-known examples that will be discussed later in Chapter 4.
Complete falsification represents an extreme and occurs relatively rarely. Data and image manipulation offer more complex scenarios and a broader range of problems. Outliers in a behavioural sample are by no means unusual, and there are legitimate reasons for throwing them out, especially if the reasons are clearly and openly explained. Explanations can be difficult, though. In a hypothetical case where a couple of participants may have misunderstood the instructions of an experiment, the temptation to save time and aggravation by making the awkward data points go away has an economic attraction. Self-justification in such cases is easy and by no means dishonest, if there is empirical evidence that several participants did indeed misunderstand the instructions. People do in fact misunderstand instructions all the time and mindlessly including such answers would also be wrong. Anomalies of all sorts need explanation. Of course, a well-designed experiment would ideally build in a control element that shows how well the instructions were understood, but it is hard for researchers to anticipate all possible situations in advance. The point here is that avoiding criticism, and avoiding having to explain problems to justify results, is an economic issue as important as money. Anyone who fears that admitting a flaw in an experiment means having to repeat it could have


financial concerns, since many behavioural experiments involve some form of financial incentive and natural science experiments may involve non-trivial costs for materials, staff and equipment use. A young researcher with a limited budget may lack the means for repeating an experiment, which makes it urgent to make the original results acceptable. Academics do not typically make money directly from publications, but make money indirectly by getting positions, promotions and salary increases. Assistant professors and other scholars in comparable entry-level jobs often have a limited period for qualifying for a permanent (tenured) position, and many universities have guidelines that lay down exactly how many publications of what sort a person needs in order to move to higher positions. The British Research Excellence Framework (REF) is an example of one such set of guidelines. It is unsurprising that such requirements put pressure on young scholars to avoid risk-taking, including anything that might make reviewers or editors require them to do substantial revisions. Whether the guidelines have a dumbing-down effect on scholarship will only be measurable over longer periods, as the generation growing up with the REF and similar guidelines matures, but the pressure to publish is explicit because jobs, salaries and futures are at stake. The temptation to make minor problems go away is palpable; most people resist, but not everyone can. Young scholars are by no means the only ones who feel the pressure to publish. A number of the best known cases (again Stapel and Hunton are examples) were professors with permanent positions, good reputations, and substantial salaries. Temptation rather than risk played the key role in their wanting even more acclaim for more papers from more colleagues. 
The competitive nature of modern academic life plays a non-trivial role in pressuring people with safe positions to risk what may seem like minor manipulations to data or results to build their reputations further. As in the Sharot quote above, these scholars misjudge what they really want, and misjudge the risk of the consequences. The economics of non-scholarly publishing is linked to the popularity of articles and the revenue the stories generate in either sales or advertising. One of the effects of online publishing has been free access to content, with publishers’ income entirely dependent on advertiser fees. This model came from radio in the early part of the twentieth century, and later from television broadcasting, where everyone with a receiver could listen or watch. The model enables large advertisers to support the publication of almost any information or misinformation they favour. Publishing outlets have a theoretical responsibility to monitor the quality and truthfulness of information that they publish, but they lack any equivalent external mechanism such as peer review. The US Federal Communications Commission has guidelines: The FCC is prohibited by law from engaging in censorship or infringing on First Amendment rights of the press. It is, however, illegal for broadcasters to intentionally distort the news, and the FCC may act on complaints if there is documented evidence of such behavior from persons with direct personal knowledge. (FCC, 2011)

Enforcement is, however, lax due both to free speech and free press regulations and because of a lack of political consensus about how to measure the truth of a claim. Even when big advertisers withdraw from shows, continuing viewer or reader popularity may persuade the publisher or broadcaster to continue publishing the content. An example is Walt Disney Company cancelling its advertising on Tucker Carlson’s Fox News show because of his monologues that were racist, outrageous, and often untrue. Viewers, however, are tuning in. “Tucker Carlson Tonight” was seen by 4.2 million people on Monday, making it the most-watched television program in the country that night, ahead of entertainment fare on the major networks. (Grynbaum and Hsu, 2020) The distinction between news and entertainment is unclear in part because no broadly accepted measurement mechanism exists to define the boundary between genuine factual statements and opinion-based distortions that deviate significantly from that. If short-term popularity is the de facto measure of the economic value of untruths, information integrity may find it hard to compete.

Summary

This chapter has looked at the institutional framework for the measurement of information integrity. While there are certainly institutions that accept a measure of responsibility, including major universities and the more reputable publishers, these institutions lack both the tools and the resources to do a systematic job of policing all the forms of information that exist in modern society. Excessive policing could also evolve into a restraint of free speech and the freedom of the press. From the viewpoint of some leaders and some governments, criticism is itself a form of false information. The old German concept of “Sitzredakteur” from the Wilhelmine era meant a member of the publication staff whose job was essentially to go to jail whenever the authorities felt an article was too critical of the government (Hall, 1974). Today the person who goes to prison is more often the journalist, not the editor. A recent example is the Turkish prison sentence for the German journalist Deniz Yücel (Tagesschau, 2020). The following chapter will look at a different form of institutional context: the various disciplines.

Notes

1 Date of the search: 3 July 2020.
2 www.washingtonpost.com/news/fact-checker/
3 The HEADT Centre is the Humboldt-Elsevier Advanced Data and Text Centre. The author of this book is the director.

4 DISCIPLINES

DOI: 10.4324/9781003098942-4

Introduction

Sociologist Andrew Abbott (2001) describes disciplines as follows:

The American system of disciplines thus seems uniquely powerful and powerfully unique. There are alternatives: the personalism of nineteenth century Germany, the French research cluster, the ancient British system with its emphasis on small communities of common culture. But because of their extraordinary ability to organize individual careers, faculty hiring, and undergraduate education, disciplinary departments are the essential and irreplaceable building blocks of American universities. (p. 128)

Yet in fact the magnitude of centripetal disciplinary forces remains enormous. Initial canons are still taught in most departments in most disciplines, and even while their content changes, the intersections between them, across disciplines, do not grow appreciably larger. (p. 152)

While Abbott is talking primarily about disciplines in North American universities, this disciplinary model has spread internationally in western Europe and increasingly in Asia. The original meaning of the word “discipline” has evolved from mainly implying punishment (to “discipline” someone) to a more complex range of concepts. The etymology website “etymonline” (2020) explains the changing meanings of the word over time:

Meaning “branch of instruction or education” is first recorded late 14c. Meaning “system of rules and regulations” is from mid-14c. Meaning “military

training” is from late 15c., via the notion of “training to follow orders and act in accordance with rules;” that of “orderly conduct as a result of training” is from c. 1500. Sense of “system by which the practice of a church is regulated, laws which bind the subjects of a church in their conduct” is from 1570s. These earlier meanings are not, however, completely lost in discussing modern academic disciplines, since the disciplines include sets of rules for how to conduct research, including rules to prevent certain kinds of information integrity abuses. Often newer disciplines have more explicit rules to define their practices than older ones, since they needed to differentiate themselves from existing fields of study. Such rules can enable some forms of measurement, as will be discussed below. Outside the formal academic disciplines, regulation of information often depends on professional standards. In medicine the standards are relatively explicit in most western countries for those with medical licenses. There are, of course, people who give medical advice or sell pseudo-medicinal products for which regulation is limited and whose activities are sometimes in a grey-zone between the legal and illegal. Journalism also has professional codes of behaviour that students learn in academic programmes, but for which there is almost no effective enforcement. Anyone may claim to be a journalist or may publish articles with no formal credentials. The degree to which any rules apply depends on the willingness of the publishing platform to enforce them, which is one of the key issues in putting pressure on social media outlets like Twitter and Facebook to enforce restraint on more outrageous counterfactual claims. This chapter will use data from the Retraction Watch Database (n.d. a) to provide a neutral means for measuring how the different disciplines handle information integrity problems. No reader should suppose that the Retraction Watch Database is a complete measure. 
It records only publicly available journal (and some monographic) retractions, and the results are modified periodically, presumably because of new information and occasionally because of reclassification of the disciplines or reasons for the retractions. The founders, Ivan Oransky and Adam Marcus, have a particular interest in medical issues, which could potentially skew the results but has not done so in any obvious way. The Retraction Watch Database has its own thesaurus and uses Boolean logic to facilitate searches, and includes items going back into the 1950s. In order to provide a baseline for comparison and measurement, searches for particular kinds of integrity problems were done against all disciplines in the database. Note that all Retraction Watch Database searches were done between 26 and 30 January 2021. The results can vary over time as updates and revisions occur. The results are shown in Table 4.1. These are only 29% of all the possible reasons for retractions in the Retraction Watch Database, but they are the ones that most matter across disciplines. Other reasons are more process-oriented and do not necessarily mean that the information is false. Some reasons, such as contamination, will be discussed separately under specific disciplines. The intention of these Retraction Watch searches is to allow a comparison of the relative size of the information integrity problems for particular

kinds of violations in each of the fields. It is important to note that an article might have more than one reason for its retraction.

TABLE 4.1 Number of retractions by reason

CATEGORY              ALL SUBJECTS  SEARCH COMMAND
Data falsification    1261 hits     “Falsification/Fabrication of Data OR Falsification/Fabrication of Results”
Image falsification   520 hits      “Falsification/Fabrication of Image OR Unreliable Image”
Manipulated results   597 hits      “Unreliable Data OR Manipulation of Results”
Errors                2097 hits     “Error in Analyses OR Error in Data OR Error in Image OR Error in Materials”
Plagiarism            2503 hits     “Plagiarism of Article OR Plagiarism of Text OR Plagiarism of Image OR Plagiarism of Data”

The size of a discipline is important to consider when measuring the degree to which a discipline has integrity problems such as falsification, manipulation or plagiarism. In an even distribution, larger disciplines should have a larger number of problems than smaller ones. The question of how to measure the size of a discipline is itself non-trivial. For particular countries, the number of members of a national association could serve as a measure, but not all countries have such associations and not all active researchers join. A reasonable proxy measure could be the number of journals in a field. In the natural sciences journal publication has become dominant over monographs, and Retraction Watch focuses primarily on journal retractions, which means that the number of retractions in a field and the number of journals should bear some rough relationship. No single international list of research-oriented journals by discipline exists, but the catalogues of major university libraries do offer subject classifications that roughly match the academic disciplines and that allow searching by journal as a publication type. Two of the largest and best-funded academic research libraries are those of Harvard and Oxford. Although they clearly favour English language publications, scholarly research today is so overwhelmingly in English that scholars who are not native English speakers publish heavily in English when writing for a non-local audience. Looking at both a US and a UK library is important, because the subject classification is not always identical, and their collection policies, while fairly comprehensive, vary as well. In order to get current journals, a limit of the last 20 years was used. That includes all journals that are currently active.
The numbers for specific disciplines will be given in the sections below for the natural sciences, social sciences and humanities. An important factor to keep in mind is the relative size of the subject areas as measured by the number of journals, as shown in Table 4.2. A reasonable expectation is that larger subject areas will have more retractions. With all of the statistics below the reader should remember that Retraction Watch

may assign more than one category per retraction, which means that the percentages of retractions will not always add up to 100%.

TABLE 4.2 Relative size of subject areas as measured by the number of journals

DISCIPLINE         HARVARD   %        OXFORD    %
Natural sciences   21,832    11.8%    15,590    12.3%
Social sciences    16,889    9.1%     12,456    9.8%
Humanities         71,307    38.6%    56,483    44.6%
Professions        74,831    40.5%    42,043    33.2%
TOTAL              184,859   100.0%   126,572   100.0%
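The effect of assigning several categories to one retraction can be illustrated with a minimal sketch. The four records below are invented purely for illustration and are not taken from the Retraction Watch Database; only the reason labels reuse names from Table 4.1. The helper names (`matches_any`, `reason_percentages`) are likewise hypothetical.

```python
from collections import Counter

# Hypothetical retraction records, invented for illustration only.
# Each retraction may carry more than one reason.
retractions = [
    {"id": "r1", "reasons": {"Falsification/Fabrication of Data", "Unreliable Image"}},
    {"id": "r2", "reasons": {"Plagiarism of Text"}},
    {"id": "r3", "reasons": {"Error in Data", "Error in Analyses"}},
    {"id": "r4", "reasons": {"Plagiarism of Text", "Error in Data"}},
]


def matches_any(record, terms):
    """Boolean OR: true if the record carries at least one of the terms."""
    return bool(record["reasons"] & set(terms))


def reason_percentages(records):
    """Percentage of records that carry each reason."""
    counts = Counter(reason for r in records for reason in r["reasons"])
    return {reason: 100 * n / len(records) for reason, n in counts.items()}


percentages = reason_percentages(retractions)
total = sum(percentages.values())

# An OR search in the style of the "Errors" command in Table 4.1.
error_hits = [r["id"] for r in retractions
              if matches_any(r, ["Error in Analyses", "Error in Data"])]

print(percentages)
print("sum of percentages:", total)
print("error hits:", error_hits)
```

Because seven reason labels are spread over four records, the percentages sum to more than 100%, which is the same effect that appears in the per-discipline statistics below.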

Natural sciences

The natural sciences today contain a wide range of disciplines and methods. In a very broad sense the natural sciences can trace their methods back to Aristotle (precise observation, for example) and to Pythagoras (mathematical description). Much has changed since the fifth century BCE, but the role of mathematics and the need for precision remain. This section will separate the natural science disciplines into two groups: mathematical sciences (including computer science), and laboratory sciences including medicine. Overlap between the groups is considerable and essentially all of the natural sciences rely to some degree on mathematical methodologies. The number of journals for each of the disciplines is given below along with the percent of all the journals for this set of natural science disciplines. Table 4.3 is important to keep in mind throughout the chapter, since the relative size of a field affects the absolute number of retractions.

TABLE 4.3 Number of natural science journals by discipline¹

DISCIPLINE         HARVARD   %        OXFORD   %
Mathematics        2,045     9.4%     1,658    10.6%
Computer science   1,077     4.9%     755      4.8%
Physics            1,697     7.8%     1,494    9.6%
Chemistry          2,088     9.6%     2,000    12.8%
Biology            4,338     19.9%    3,481    22.3%
Medicine           10,587    48.5%    6,202    39.8%
TOTAL              21,832    100.0%   15,590   100.0%

Unsurprisingly Harvard has more natural science journals and probably a larger budget than Oxford. The numbers show that journals devoted to medical topics represent close to half of the total for Harvard, and medicine plus biology is over 60% for both. Classification differences may matter here, but money likely influences the size differences between the medical and biological journals and all others, especially in terms of journal financing. It is important to remember that these percentages are only for journals in the natural sciences, while the Retraction Watch statistics in Table 4.1 are for all subjects. No one should make a direct comparison, but the relative size matters. Smaller disciplines should have fewer retractions than larger ones. The relative size of retractions in the natural sciences compared to retractions in all subjects gives an important measure of where problems occur.

• 1047 instances of data falsification (83.0%),
• 503 instances of image falsification (96.7%),
• 520 instances of manipulated results (87.1%),
• 1772 instances of errors in the analysis (84.5%) and
• 1725 instances of plagiarism (68.9%).

(Note that the percentage reflects the relative number of retractions for all subjects and that categories in the Retraction Watch are not exclusive, meaning that there can be multiple reasons for a retraction.) These numbers may be a reflection of the emphasis on natural sciences in Retraction Watch, but they suggest that many of the information integrity problems can be found in these technical fields, which might also be the ones best able to set up internal measures to avoid problems in the future.
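The percentages above follow directly from the counts reported in the text. As a minimal sketch of the arithmetic, the snippet below divides each natural science count by the corresponding all-subjects count from Table 4.1. The dictionary and function names are illustrative choices, not part of the Retraction Watch Database.

```python
# Counts as reported in the text: all subjects (Table 4.1)
# and the natural sciences.
all_subjects = {
    "data falsification": 1261,
    "image falsification": 520,
    "manipulated results": 597,
    "errors": 2097,
    "plagiarism": 2503,
}
natural_sciences = {
    "data falsification": 1047,
    "image falsification": 503,
    "manipulated results": 520,
    "errors": 1772,
    "plagiarism": 1725,
}


def share(part, whole):
    """Percentage of `whole` accounted for by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


shares = {cat: share(natural_sciences[cat], all_subjects[cat])
          for cat in all_subjects}

for cat, pct in shares.items():
    print(f"{cat}: {pct}%")
```

The same calculation, with the natural science counts as the denominator, underlies the per-discipline percentages in the sections that follow.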

Mathematical sciences

Pure and applied mathematics

Mathematics is far from a unitary discipline and an overview cannot do justice to its range and variation. It is like language in being the basis of communication for the vast range of thought and method necessary to the natural sciences. Pure mathematics is pure logic: the proofs, theorems, and corollaries of mathematical thinking that have developed over multiple millennia. None of the natural sciences has an equally long history of continuous development. Defences against false information in mathematics are an integral part of the logic of proofs, because the “truth” of a proof comes under immediate examination when someone presents the proof for a new theorem. Not every mathematician tests every new theorem, and not every mathematician has the background to follow each proof, but the rigor of the logic is so well-embedded in the field that it tends to be self-correcting. This is not to say that pure mathematics has a perfect record. People make mistakes in all fields and some proofs are eventually discovered to be wrong, but deliberate falsification is rare, perhaps because the existing standards for measuring false claims work well.

Applied mathematics takes several different forms, the best known of which is statistics. Proofs in inferential statistics are part of pure mathematics, but the

application of inferential statistics to actual data is prone both to human error and deliberate falsification. The theorems in inferential statistics often require the data to follow certain patterns or distributions, and that the data sample be chosen randomly, at least within subgroups. For the fields of mathematics and statistics the Retraction Watch Database lists the following retractions.     

• 4 instances of data falsification (0.4%),
• 0 instances of image falsification (0%),
• 2 instances of manipulated results (0.4%),
• 50 instances of errors in the analysis (2.9%),
• 125 instances of plagiarism (7.5%).

(Note that the percentage reflects the relative number of retractions per category for all of the natural sciences.) Existing prevention mechanisms in the discipline seem to work well by this measure. The reasons for the retractions are various and often involve other fields that are only using mathematical tools. Intentional fraud does not appear to be among them. A number of companies have created statistical software, among them SAS Institute and SPSS (now owned by IBM). These packages eliminate the need for researchers to program or to calculate tests on their own. A researcher must just know to use the right test and the right data. These programs also provide warnings for datasets whose distributions do not meet the requirements for a particular test. In that sense they have a built-in measurement process that helps prevent errors and certain kinds of information integrity problems. Users can, of course, manipulate the data to get around distribution problems, and the programs cannot prevent something so deliberate. Statistical tools are very widely used in all fields except in the humanities and have become a core part of modern research. Unfortunately, researchers using these tools may never have taken a basic statistics course or understand the proofs. When using these packages, those without formal training in statistics and without a basic understanding of the theories behind the tests have a greater likelihood of making the kind of errors that generate false information – not deliberately but through inadvertence.

Computer science

Computer science is a modern discipline that began at many universities in the mathematics department and did not establish itself as a separate entity until the 1960s and 1970s. Many early professional-level computing jobs required a degree in mathematics for lack of any good alternative. This made sense since mathematical logic was and still is important to machine operations. The scheduler for operating systems is, for example, an algorithm that mathematically determines which tasks to process in what order, often depending on how long they take. In

simple terms, purely computational tasks tend to be fast and those requiring data input tend to be slow because the data transfer rate is slower than computational processing. Mathematical algorithms play a key role in most contemporary computing activities, including computer visualisation. Deliberate falsification in computer science research (as distinct from computing applications) is rare enough that it is not a significant issue. Errors occur, of course, but one of the key tests of any computer science development is whether it works, and that provides a form of measurement for weeding out deliberate fakes. The more complex question for computing is what actually constitutes a fake. Computing systems generate information that has integrity problems all the time, but the causes tend to be the data or the programming choices. Building systems that are deliberately fake could mean building systems that deliberately give wrong answers. The frequency of unintended error-prone developments has led to systematic testing, which is far from 100% reliable but is a systematic form of integrity measurement. A common kind of deliberate falsification in applications is to insert a bug in a program so that it will give wrong answers or behave in a malicious way. One of the earliest examples is the so-called Morris worm: The incident in question was the internet worm, which was released on 2 November 1988 and written by Cornell University student Robert Tappan Morris. If you’re unfamiliar with the details, the motivation behind its creation and release has never been expressly stated by its author, but it was believed to have been exploratory, and it didn’t have an explicitly destructive payload. However, flaws in the code caused replication and reinfection to occur at a far higher rate than intended, bogging down the infected systems to the extent that they became unusable. 
(Furnell and Spafford, 2019)

The frequency of worms and viruses has increased dramatically over the decades and has given rise to a new specialty called computer security. Generally, the goal of a virus or worm is to extract or manipulate information. Security experts worry about the potential for falsification too, especially involving elections. Here again testing is an established if imperfect tool. For computer science research (as distinct from applications using computing tools) the Retraction Watch Database lists:

• 3 instances of data falsification for computer science (0.3%),
• 0 instances of image falsification (0%),
• 2 instances of manipulated results (0.4%),
• 33 instances of errors in the analysis (1.9%), and
• 162 instances of plagiarism (9.4%).

(Note that the percentage reflects the relative number of retractions per category for all of the natural sciences.)

Existing prevention mechanisms in the discipline seem effective by this measure, though the number for plagiarism is high. Of these, only 12 (0.7%) come from majority English speaking countries, which may suggest that the authors had language problems.
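The scheduling example mentioned earlier in this section, in which an algorithm decides the order of tasks depending on how long they take, can be sketched in a few lines. The policy shown is "shortest job first", which is only one simple scheduling rule and is used here as an assumption for illustration. Real operating-system schedulers are considerably more elaborate, and the task names and time estimates are invented.

```python
from typing import NamedTuple


class Task(NamedTuple):
    name: str
    estimated_time: float  # hypothetical estimate, in arbitrary time units


def shortest_job_first(tasks):
    """Return the tasks ordered by estimated run time, shortest first."""
    return sorted(tasks, key=lambda t: t.estimated_time)


def average_waiting_time(ordered):
    """Mean time each task waits before it starts, for a given run order."""
    waited, clock = 0.0, 0.0
    for task in ordered:
        waited += clock              # this task waited for everything before it
        clock += task.estimated_time
    return waited / len(ordered)


# Invented tasks: a slow data-input job and two quick computations.
tasks = [
    Task("read input file", 8.0),
    Task("add two numbers", 1.0),
    Task("sort small list", 3.0),
]

order = shortest_job_first(tasks)
print([t.name for t in order])
print("average wait, submitted order:", average_waiting_time(tasks))
print("average wait, shortest first: ", average_waiting_time(order))
```

Running the quick computational tasks before the slow input task lowers the average waiting time, and the result can be checked simply by running the code, which is the "does it work" test described above.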

Laboratory sciences

Chemistry and physics

The laboratory sciences include aspects of most of the natural sciences, particularly chemistry, physics and parts of biology (which will be discussed below). The laboratory sciences have no inherent measurement mechanisms equivalent to the testing of mathematical proofs or computer science testing, but deliberate falsification may be harder in a laboratory setting where many people are looking at the processes and at the results – at least in theory. The facts show that this is only a limited protection against integrity violations. For chemistry the Retraction Watch Database lists:

• 152 instances of data falsification (14.5%),
• 40 instances of image falsification (8.0%),
• 84 instances of manipulated results (16.2%),
• 284 instances of errors in the analysis (16.0%), and
• 173 instances of plagiarism (10.0%).

(Note that the percentage reflects the relative number of retractions per category for all of the natural sciences.) The figure for errors in analysis seems particularly high. Human error is a fact of life in all fields, and anyone who has worked in a chemistry laboratory is aware of how sensitive tests are to all kinds of inadvertent human mistakes. While errors contribute to false information, they are not the same as deliberate falsifications and can occur even in the best regulated lab. Measures to check for errors may need improvement. Errors in analysis are not only a problem for non-western countries. A search of instances for majority English speaking countries plus the Netherlands and Germany (countries with active research programmes that often publish in English language journals) shows 74 instances (43.5% of the total for chemistry) for less than 7.3% of the world’s population (Worldometer, 2020). Sloppiness can be a problem anywhere. In cultures with strong social hierarchies even mild complaints about perceived irregularities within a lab group may be dangerous to career prospects and can become an economic incentive for silence. Lab tasks may also be divided in such a way that responsibilities do not overlap, and constant time pressure can mean there is little cross-checking. A lab member who lacks skills or gets wrong results could be tempted to falsify some portion of the data to disguise the failure. Ideally the head of the lab would check everything, but that takes time, and signs of distrust can undermine cooperation.

For physics the Retraction Watch Database lists:     

• 21 instances of data falsification (2.0%),
• 12 instances of image falsification (2.4%),
• 30 instances of manipulated results (5.8%),
• 120 instances of errors in the analysis (6.8%), and
• 105 instances of plagiarism (6.1%).

(Note that the percentage reflects the relative number of retractions per category for all of the natural sciences.) The number for errors in the analysis is noticeably higher than for false data. One of the differences between physics and the other lab sciences is the reliance on large-scale hardware such as the Large Hadron Collider at CERN, where machine-generated data are shared among many research groups. The scale and the amount of sharing make falsification harder. This difference can explain the relatively low amount of data and image falsification in physics as compared to other lab sciences.

A factor that offers some protection against falsification in the natural sciences is the expectation that results must be reproducible. One experiment often builds on others in the natural sciences. Replication is widely discussed as a test for reliability and it can serve as an effective measurement tool when multiple replications take place. Replication is, however, not all that simple. In a chemistry experiment, for example, temperature, the exact purity of ingredients, precise amounts and the timing of sequences, plus an array of other factors can all play a role in whether results come out as expected. This is one of the reasons why lab notebooks are especially important, since space limitations in a journal article may not allow enough detail for exact replication.
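The idea that larger disciplines should have more retractions can be made concrete with a rough sketch that sets each discipline's share of natural science data-falsification retractions against its share of natural science journals. The counts come from the text and from the Harvard column of Table 4.3. The ratio itself (`relative_rate`) is a hypothetical indicator introduced only for illustration and is not a measure proposed in this chapter. It also assumes that the Retraction Watch subject categories and the library classifications are comparable, which is only approximately true (for example, "mathematics and statistics" versus "Mathematics").

```python
# Totals reported in the text.
NATURAL_SCIENCE_DATA_FALSIFICATION = 1047   # retractions, natural sciences
HARVARD_NATURAL_SCIENCE_JOURNALS = 21832    # Table 4.3, Harvard total

# discipline: (data falsification retractions, Harvard journal count)
disciplines = {
    "mathematics": (4, 2045),
    "computer science": (3, 1077),
    "physics": (21, 1697),
    "chemistry": (152, 2088),
}


def relative_rate(retractions, journals):
    """Retraction share divided by journal share (hypothetical indicator).

    A value near 1 would mean retractions roughly in proportion to size.
    """
    retraction_share = retractions / NATURAL_SCIENCE_DATA_FALSIFICATION
    journal_share = journals / HARVARD_NATURAL_SCIENCE_JOURNALS
    return retraction_share / journal_share


rates = {name: relative_rate(r, j) for name, (r, j) in disciplines.items()}

for name, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{name}: {rate:.2f}")
```

On these figures chemistry is the only one of the four disciplines whose share of data-falsification retractions exceeds its share of journals, which is consistent with the contrast drawn above between the laboratory and the mathematical sciences.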

Biology

The Retraction Watch Database includes four biology categories: general, cellular, molecular, and cancer. It also has biochemistry, genetics, microbiology, plant biology (botany) and zoology. This makes it hard to offer a single set of statistics for all of biology, and the overlap with medicine complicates the problem further. The statistics for biology will consistently exclude cases that are also marked as involving medical research using the Boolean commands “AND NOT Medicine*.” For biology in its most generic form (“bio* and not Medicine*”), which includes biostatistics and biochemistry, the Retraction Watch Database lists:

• 291 instances of data falsification (23.1%),
• 257 instances of image falsification (49.4%),
• 137 instances of manipulated results (6.7%),
• 489 instances of errors in the analysis (19.5%), and
• 256 instances of plagiarism (10.2%).

(Note that the percentage reflects the relative number of retractions per category for all of the natural sciences.) The numbers are in general a good bit higher than in physics or chemistry, and the figures for image falsification are particularly high, but readers should realise that images play a more important role in biomedical research than in many other fields. Other than human observation (see Chapter 3) and a few tools like InspectJ and ImageJ, biology has no systematic means for measuring image manipulation. The importance of replication is one reason why laboratory books play a key role in the natural sciences, and especially in chemistry and biology. Researchers are supposed to note down exactly what they are doing when they are doing it. Lab books were traditionally written in ink on paper with the idea that nothing can be or should be changed. If someone writes a number wrong, the correction should be clear and the original should not be erased. Over time electronic lab books have grown in acceptance, especially for student work, perhaps because students are more accustomed to using electronic tools than older researchers: … it appeared that the students were generally more diligent in annotating gel photographs in the wiki than previous students had been when using paper copies and their methods were more clearly recorded from the very beginning of the project. (Badge and Badge, 2009) Electronic lab books have, however, also met some resistance: Practicality was one objection to an electronic lab book: “it is hard to type with gloves on” (B-S1) and there are concerns about contamination from spills. A-S3 did not want electronic lab books because writing on paper is done at the same speed as thought whereas typing is not. There is also a concern about a lack of traceability if files can be altered (comment: time stamps would help deal with this). Currently the ELNs on the market will not be accepted by courts in IP cases but with the change to U.S. 
patent law it “perhaps might be a good time to look at them again” (A-IPM). (Calvert, 2015) At one time there was a concern that electronic records would be hard to preserve, but today the concern about the fragility of paper is at least as great, and digital lab books are easier to make public. Anything that enhances transparency can help to protect against false information.

Medical sciences

Retraction Watch focuses heavily on the medical sciences, and integrity problems with medical information can have life-threatening consequences. Medical science belongs logically to the natural sciences, but organisationally remains separate in most universities because of different oversight rules and because of a very different financial structure. The training can be different too. In North America students earning a Medical Doctor degree are not traditionally required to write a research-based thesis, though about 138 programmes now offer joint MD/PhD degrees according to the Association of American Medical Colleges. In other words, some physicians doing medical research may not have formal research training. In the UK the structure is more complicated. The Bachelor of Medicine (MB) degree is the normal degree for a physician in the UK: “To receive an MD rather than an MB, students must complete a thesis and receive some additional training (e.g., research training) over and above what is required for the MB” (SquareMouth, 2004). In practice only those with research training are likely to want to write academic articles, but there is no guarantee that others will not try to publish and may succeed in the less prestigious journals. In Germany students earning a doctorate in medicine must write a thesis based on research. Some are, of course, more enthusiastic about doing research than others.

The number of cases of retractions of pure medical research in all branches – excluding biology – in the Retraction Watch Database was 3072 or 14% of all the cases listed for all subjects. Nonetheless it may make better sense with medical research not to exclude biology from the search, since biological processes are often integral to medical research, while the reverse (medical research being used to support biology) is less likely to be true. For “medicine*” (including all medical subfields) the Retraction Watch Database lists:

• 603 instances of data falsification (47.8%),
• 211 instances of image falsification (40.6%),
• 285 instances of manipulated results (47.7%),
• 862 instances of errors in the analysis (41.1%), and
• 952 instances of plagiarism (44.9%).

(Note that the percentage reflects the relative number of retractions per category for all of the natural sciences.) All of these numbers are high compared to the other natural sciences. The reasons are likely complex and may be due partly to the fact that medicine is both a profession and one of the branches of the natural sciences, and separating the two aspects is not always easy, especially since many pharmaceutical companies employ medical researchers to help develop and to test new products.

The exact number of physicians in the pharmaceutical industry is difficult to ascertain. In 2013, there was an estimated 2000 pharmaceutical physicians in the UK [7]. A similar number was reported for Germany in 2000, amounting to merely 0.7% of the total number of physicians in the workforce at the time… (Sweiti et al., 2019)

In some ways the situation is no different than for chemists or other natural scientists working for industry, except for a strong public expectation that professional ethics and legal regulation will make them even more concerned to act in the public interest when developing or testing new drugs. That expectation can be in direct opposition to the (short-term) economic interests of those working for pharmaceutical companies. No company wants a drug whose side effects will hurt its reputation, but many want desperately to be the first to market with a new product. The pressure to get out an effective COVID-19 vaccine is only one recent example of cases in which there is social and political pressure to make some form of the medicine available quickly.

One notorious example of a drug that caused serious damage was Thalidomide from the German firm Chemie Grünenthal. Thalidomide was used to ease some side effects of pregnancy but was found to cause birth defects.

Grünenthal put Thalidomide on the market in 1957 as an over the counter drug. The company claimed its sedative was “completely safe” and aggressively promoted their “non-toxic” drug. Between 1957 and 1962 Grünenthal and its licenced distribution companies sold the “wonder drug” in more than 47 countries under different trade names. Thalidomide soon became the company’s bestseller. What the public did not know is that Grünenthal had no reliable evidence to back up its claims that the drug was safe. They also ignored the increasing number of reports coming in about harmful side-effects as the drug was being widely used. (Zaritsky, 2016)

Thalidomide is certainly an extreme example, and is now over 60 years in the past, but it remains one of the reasons for rigorous testing. Many new drugs in fact come onto the market with few side effects and a high degree of effectiveness, a fact that may mask the risk. Changes in US law have allowed medical advertising to become more common in recent decades.
Pharmaceutical sales staff visit physicians’ offices to recommend new drugs and give out free samples. Physicians have no reason not to believe that the controls for testing and drug approval are adequate, and it is hard for them to make informed judgments about the tests unless they take the time to read all of the literature.

Testing standards for medical products are far from simple, even when the results are published in academic journals. John Ioannidis (2018) wrote in JAMA Network, a publication of the American Medical Association:

P values and accompanying methods of statistical significance testing are creating challenges in biomedical science and other disciplines.

He refers to an article that casts doubt on the validity of “many of the claims,” but the link to that article is unfortunately not available to the public. Ioannidis (2018)

goes on to report a recommendation for “lowering the routine P value threshold for claiming statistical significance from .05 to .005 for new discoveries.” Many medical researchers already use the more stringent P value standard, of course, and that alone does not eliminate false information.

Social prejudice can also play a role in false results. For decades testing was done primarily on male subjects, especially for cardiovascular problems since men appeared to be especially vulnerable. Today there is a greater awareness of the need for broader testing because of hormonal differences and the seemingly obvious fact that women’s bodies are different.

A further reason why the measurement of medical information integrity is hard is that trained and licensed members of the medical profession tend to play a limited role in the testing process. When medical professionals see problems with new drugs, they typically need to convince corporate administrators about possible risks with a product that may already have cost millions in resources and personnel investment. The administrators have different priorities and may lack the scientific knowledge to evaluate the risks reliably. It might help if the results of more trials were available, but the evidence suggests that many are not published.

Medicine has many specialties in the Retraction Watch Database, and they are worth investigating individually because the numbers of problem instances vary considerably. The following specialties tend to be high-money areas where external pressures could play a role. Here are the totals for instances in each subfield of the five primary integrity categories for investigation (data falsification, image falsification, manipulated results, errors in the analysis or plagiarism) plus a percentage based on the combined total of all medical retractions (“Medicine*”) for the five categories being investigated (2773 instances):

490 instances in Oncology (17.7%), 449 instances in Cardiology OR Cardiovascular medicine (16.2%), 226 instances in Anaesthesia/Anaesthesia (8.2%), 199 instances in Drug Design (7.2%), and 168 instances in Infectious Disease (6.1%).

(Note that the percentage reflects the relative number of retractions per category for all of the categories in medicine.) It is clear that the first two specialties have the highest number of retractions. They may also have the highest number of publications and the most journal space devoted to them, so the numbers need to be viewed with caution. It is worth noting that for oncology, the percentage of retractions with unreliable images and image falsification is high compared to other specialties: 20.8%. This suggests that researchers and journals in this specialty may want to focus on checking images more carefully. Anaesthesia/Anaesthesia has the next highest percentage, and the reason for that is a very large number of retractions involving data problems: 178 instances (30.9%). Other percentages for the specialties are low. Only a person with subject knowledge can offer a reliable explanation for why this number is high. It may

involve problems with machine-produced data rather than be a sign that researchers in the specialty are unusually prone to creating fake information. This metric is in any case a signal for journals and researchers to check the data carefully.
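The subfield percentages quoted above are simply each specialty's count divided by the combined medical total of 2773. A minimal Python sketch of that arithmetic follows; the counts and the total are taken from the text, while the function and variable names are illustrative assumptions rather than anything used by Retraction Watch.

```python
# Subfield counts and the combined "Medicine*" total as quoted in the text.
# Names such as MEDICINE_TOTAL and share() are hypothetical, for illustration.
MEDICINE_TOTAL = 2773

subfield_counts = {
    "Oncology": 490,
    "Cardiology OR Cardiovascular medicine": 449,
    "Anaesthesia": 226,
    "Drug Design": 199,
    "Infectious Disease": 168,
}


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


subfield_shares = {
    name: share(count, MEDICINE_TOTAL) for name, count in subfield_counts.items()
}

for name, pct in subfield_shares.items():
    print(f"{name}: {pct}%")
```

As the text cautions, such shares say nothing about how many articles each specialty publishes, so a high share is not in itself evidence of a higher rate of misconduct.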

Social sciences

The term social sciences did not exist before the nineteenth century, even though thinkers have discussed social science ideas since Aristotle wrote his Politics. There is also no universal consensus about which fields belong to the social sciences. The word science in the case of the social sciences had the same meaning originally as the French term “la science” which included not only the natural sciences, as in modern English, but all forms of knowledge as in its Latin root “scientia.” The social sciences aim fundamentally at examining all forms of “knowledge” about human society. The term “social science” acquired ideas and elements from the methodologies of the natural sciences and aimed over time at the same level of precision and reproducibility that disciplines like chemistry and physics were demonstrating.

For the purposes of this chapter, the following disciplines will be classed with the social sciences: sociology, political science, economics, (social) anthropology including ethnography, geography, and psychology. The first two are almost universally considered social sciences. Economics arguably belongs logically to the social sciences, even though many economics departments want to maintain their distance from what they consider softer and less exact fields. Anthropology and ethnography likewise fundamentally examine human society, but some branches of physical anthropology have more to do with biology and some aspects of ethnography take on a humanities flavour (especially those using the methods of Clifford Geertz). Geography builds on geology, which is a natural science, and psychology has a strong relationship to biology and medicine, even though one of its founding figures, Sigmund Freud, wrote broadly about social issues: e.g. Das Unbehagen der Kultur (Civilization and its Discontents) (1930).
The list also leaves out disciplines that are sometimes classified as social sciences, in particular history, which addresses many of the same questions as political science, sociology, and even economics, though with different methods and perspectives. The number of journals for each of the social science disciplines is given below along with the percent of all the journals for this set of disciplines. While the number of economics journals is larger than for most other disciplines, there is no dominance equivalent to medicine in the natural sciences. Collection and classification differences are more striking among these social science fields, with large variance in anthropology/ethnography, political science and psychology. A search for “any language” in political science and geography resulted in 3,305 and 4,171 journals respectively. Since these numbers were significantly higher than the Harvard totals, some further investigation was done. Oxford included topics like disaster studies, refugee studies, or migration studies under geography, and management and economics was often included under political science, depending on the language. These kinds of discrepancies in data are common enough in

TABLE 4.4 Number of social science journals by discipline

DISCIPLINE                  HARVARD       %    OXFORD       %
Anthropology/ethnography      2,381   14.1%     1,048    8.4%
Political science             2,243   13.3%    1,344*   10.8%
Geography                     1,767   10.5%    1,571*   12.6%
Psychology                    2,670   15.8%     1,892   15.2%
Sociology                     2,591   15.3%     2,461   19.8%
Economics                     5,237   31.0%     4,140   33.2%
TOTAL                        16,889  100.0%    12,456  100.0%

(* Note: This is a summary of all languages, not a search for “any language”)
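The percentage columns in Table 4.4 follow directly from the journal counts. As a minimal sketch in Python (the variable and function names are chosen here for illustration only), the shares can be recomputed and checked against the stated totals:

```python
# Journal counts per discipline as (Harvard, Oxford), taken from Table 4.4.
JOURNALS = {
    "Anthropology/ethnography": (2381, 1048),
    "Political science": (2243, 1344),
    "Geography": (1767, 1571),
    "Psychology": (2670, 1892),
    "Sociology": (2591, 2461),
    "Economics": (5237, 4140),
}

# Column totals, recomputed rather than copied from the table.
harvard_total = sum(h for h, _ in JOURNALS.values())
oxford_total = sum(o for _, o in JOURNALS.values())


def percent(count: int, total: int) -> float:
    """Share of total as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {
    name: (percent(h, harvard_total), percent(o, oxford_total))
    for name, (h, o) in JOURNALS.items()
}

for name, (h_pct, o_pct) in shares.items():
    print(f"{name:<26} Harvard {h_pct:>5}%   Oxford {o_pct:>5}%")
```

Recomputing the totals from the rows is a cheap consistency check of the kind that is useful whenever catalogue searches with loose encoding rules are being compared.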

databases where there is wide latitude in the encoding rules. Oddly enough, searching specifically for each language in the Oxford catalogue excluded many of the broader categories from the result set. The numbers presented in Table 4.4 used this search strategy. Since the number of social sciences journals is only about three quarters as large as for the natural sciences, looking at the total number of retractions helps to keep perspective. For all of the social science disciplines in Table 4.4, the Retraction Watch Database lists:     

95 instances of data falsification (7.8%), 4 instances of image falsification (0.8%), 34 instances of manipulated results (1.6%), 143 instances of errors in the analysis (7.3%), and 223 instances of plagiarism (10.5%).

(Note that the percentage reflects the relative number of retractions per category for all subjects.)

Sociology

Sociology, like many social sciences, has many founders. An early and defining work was Auguste Comte’s multivolume Système de politique positive, ou Traité de sociologie, instituant la religion de l’humanité (often translated as System of Positive Polity, or Treatise on Sociology, Instituting the Religion of Humanity) (1851–1854). Nonetheless, this does not exactly describe the field as understood today:

All of Comte’s work aims at the foundation of a discipline in which the study of society will finally become positive, scientific. His idea of sociology is not quite the one we are used to today… (Bourdeau, 2020)

For Comte, the disciplinary boundaries were unclear, perhaps even problematic: If sociology merges at places with philosophy, it is also closely related to history. Comte was thus led to take a stand on a question that deeply divides us today: how should the relations among philosophy of science, history of science, and sociology of science be seen? In the Course, history is at once everywhere and nowhere: it is not a discipline, but the method of sociology. (Bourdeau, 2020) Comte’s mid-nineteenth century view explains something of the tendency in modern sociology to overlap intellectually and methodologically with other disciplines, which complicates having any simple means for measuring integrity violations. Even so, the level of retractions is low relative to the natural sciences. For “sociology” the Retraction Watch Database lists:     

26 instances of data falsification (27.7%), 1 instance of image falsification (25.0%), 12 instances of manipulated results (36.4%), 38 instances of errors in the analysis (28.4%), and 64 instances of plagiarism (31.1%).

(Note that the percentage reflects the relative number of retractions per category for all of the social sciences.) It is worth remembering that Retraction Watch classifies some topics with sociology that also belong to other fields, which means that the numbers will not add to 100% across all social science fields. A retraction in the Journal of Personality and Social Psychology is classified as sociology, as well as a retraction from the Journal of Organisational Ethnography. Removing retractions co-classified as psychology from the search reduces the number of retractions for most searches and probably gives a truer picture for sociology:     

3 instances of data falsification (3.2%), 1 instance of image falsification (25.0%), 7 instances of manipulated results (21.2%), 38 instances of errors in the analysis (21.6%), and 58 instances of plagiarism (26.7%).

(Note that the percentage reflects the relative number of retractions per category for all of the social sciences.) The important fact here is that the overlap between fields in the social sciences is considerable. The most notable change is in the number of instances of data falsification, which falls from 26 to 3.

Political science

Political science can claim roots back to philosophical works like Plato’s Republic, and political ideas played a role in philosophical and historical writing up to and through the nineteenth century. Its existence as a separate academic field was established earlier in North America than in Europe. The Political Science Department, University of Minnesota (n.d.) claims for example:

The University of Minnesota is home to one of the first political science departments in the United States: the Department of Political Science, founded in 1879, is almost as old as the University itself.

The term did not come into common use in Germany until after the Second World War, and the “Deutsche Vereinigung für die Wissenschaft von der Politik” (“German Association for the Science of Politics”) was first established in 1952 (Bleek, 2020), even though discussions of political ideas and structures were common among both legal scholars and historians. Based on a common methodological approach, political science is sometimes classified with sociology. For “political science” the Retraction Watch Database lists:

1 instance of data falsification (1.1%), 1 instance of image falsification (25.0%), 1 instance of manipulated results (3.0%), 2 instances of errors in the analysis (1.5%), and 30 instances of plagiarism (14.6%).

(Note that the percentage reflects the relative number of retractions per category for all of the social sciences.) Interestingly, removing sociology from the search for plagiarism cases in political science reduced the number only to 22 instances. Of those cases, only four come from institutions in primarily English speaking countries (US, UK, Canada, Australia, or New Zealand), which suggests that part of the plagiarism problem could be with authors who are not native English speakers and may have looked for phrases or texts to supplement their language deficit. A comparable search for sociology (and not psychology) gives 11 instances from the US and UK out of the 54 total. Today there is no reason for publishers not to run a plagiarism check on articles before publication, and yet a number of leading publishers are on the list.

Economics

Discussions of economic issues, as with other social sciences, reach back at least to the ancient Greek philosophers, and of course numerous writers contributed substantially to modern economic thought, including well-known names like Adam Smith, David Ricardo, Karl Marx, John Maynard Keynes and Milton Friedman.

Two significant developments helped to define the academic discipline. One was the founding of the London School of Economics (LSE) in 1895 and the other was the establishment of the University of Chicago School of Business (today the Booth School of Business) in 1898. The philosophical origins of these institutions diverged widely. Fabian Socialists including Sidney and Beatrice Webb were among the founders of LSE, while the Chicago School had a more explicit business orientation. While the Booth School is separate from the Economics Department of the University of Chicago, which dominates the list of Nobel Prize winners in Economics, some influence is likely. As a discipline economics is strongly mathematical in its orientation but draws nonetheless on other aspects of the social sciences, especially behavioural economics. For “economics” the Retraction Watch Database lists:     

8 instances of data falsification (8.5%), 0 instances of image falsification (0%), 3 instances of manipulated results (9.1%), 20 instances of errors in the analysis (14.9%), and 54 instances of plagiarism (26.2%).

(Note that the percentage reflects the relative number of retractions per category for all of the social sciences.) As in other social science fields, about 70% of plagiarism instances come from authors who are probably not native English speakers, based on the country of record in the Retraction Watch Database. The amount of data falsification is otherwise low, perhaps because this highly mathematical field benefits from the standard use of publicly available data sources, which greatly increases transparency. Replication or at least partial replication studies also take place regularly.

Anthropology

One of the problems with discussing anthropology as a field is that the definition varies widely from country to country and specialty to specialty. Physical anthropology is close to biology and medicine in its focus on human bodies and often comes into popular consciousness when dealing either with hominid ancestors or with medical forensics. Social anthropology focuses on culture and aspects of culture such as language. There is no broad agreement on founders or founding institutions. For many in Europe, the linguist Ferdinand de Saussure laid foundations that Claude Lévi-Strauss and others later used to build the field. In the English speaking world Franz Boas (who worked mostly in the US) and Bronislaw Malinowski (in the UK) played leading roles. One of the dominant names in contemporary social anthropology (also called ethnography) is Clifford Geertz, who emphasised careful observation and what he called “thick description.” Despite this lack of unity in the field, it is reasonable to say that observation and description play a strong role in data gathering, which means that the data

themselves are often unique and hard to measure against multiple sources, since each observation is in a sense unique. Quantitative analysis plays a secondary role. This is also a reason why the field is often classified with the humanities. For “anthropology” the Retraction Watch Database lists:     

1 instance of data falsification (1.1%), 0 instances of image falsification (0%), 0 instances of manipulated results (0%), 1 instance of errors in the analysis (1.5%), and 9 instances of plagiarism (4.4%).

(Note that the percentage reflects the relative number of retractions per category for all of the social sciences.) The list of retractions suggests that Retraction Watch mostly focuses on physical anthropology and that they classify social anthropology as a branch of sociology.

Geography

Geography is likewise a complex field because it has a strong relationship to the natural science field of geology on the one hand and to social anthropology on the other. The Oxford Library Catalogue included many topics that have a broadly geographic component, but are not a classic component of geography, as noted above. Geography itself does not appear as a topic in the Retraction Watch Database, though geology does. In order to include any geographic elements, the search term “Geo*” was used. The results are as follows:

7 instances of data falsification (7.4%), 0 instances of image falsification (0%), 2 instances of manipulated results (6.1%), 18 instances of errors in the analysis (13.4%), and 33 instances of plagiarism (16.0%).

(Note that the percentage reflects the relative number of retractions per category for all of the social sciences.) Only 10% of the instances of plagiarism came from countries where English is the majority language.

Psychology

Psychology belonged to philosophy for many decades at many European universities. The ancient philosophers speculated about human thought, and in modern times Immanuel Kant’s works on reason and perception formed a base. Similar philosophical considerations in Asia went back to the time of Confucius. Any philosophic system of thinking is arguably related to psychology in the broadest sense. In the twentieth

century psychology developed several specialty branches, including social psychology, which borders on sociology, clinical psychology – with a relationship to its medical cousin psychiatry – and cognitive psychology. Many modern psychologists also take an interest in how the brain functions, which brings them nearer to the bio-medical world. The common element in all these branches is how thinking processes work rather than any common methodological base or source of information. For “psychology” the Retraction Watch Database has (remember that psychology and sociology often have duplicate classifications):

81 instances of data falsification (86.2%), 3 instances of image falsification (75%), 20 instances of manipulated results (63.6%), 95 instances of errors in the analysis (70.9%), and 49 instances of plagiarism (23.8%).

(Note that the percentage reflects the relative number of retractions per category for all of the social sciences.) The high numbers and high percentages are striking, but of the 80 instances of data falsification, 52 (55.3% of the social sciences) came from Diederik Stapel, who admitted doing fraud on a massive scale. Another five came from James Hunton, a professor of accounting who used enough social psychology methods for Retraction Watch to categorise him there. Removing these two outliers from the total brings the number of instances of data fabrication down to 23 (24.5% of the social sciences), which is still high for a discipline with less than 16% of the journals in the social sciences but is more plausible. There is no clear reason why psychology should have a greater ongoing problem with data falsification than related fields. The softness of observational data collection may contribute, plus the fact that the datasets used for analysis are not typically publicly available, sometimes for data privacy reasons. Anonymisation should solve such concerns for individuals (though perhaps not for companies), but many scholars distrust the reliability of anonymisation, and anonymisation takes time and effort without bringing any direct rewards. Statistical analysis of seemingly overly perfect results can help, but the conditions for data gathering can be complex, and implausible results can in fact occur. Since the Stapel scandal the Netherlands has increased its scrutiny of the field. Only one retraction for data falsification occurred in the Netherlands in 2012, and none since then. Before 2012 there were 57. This suggests that increased checking and caution on the part of those in a discipline can make a difference.
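The effect of removing the two outliers can be checked with simple arithmetic. The following Python sketch uses the counts given in the paragraph above. The denominator of 94 social science instances is an assumption inferred from the stated percentages (52 instances being 55.3%) rather than a figure given directly, and the variable names are illustrative only.

```python
# Counts from the text. SOCIAL_SCIENCE_TOTAL is an assumption inferred from
# the stated percentages (52 instances = 55.3%), not a figure quoted directly.
SOCIAL_SCIENCE_TOTAL = 94
PSYCHOLOGY_INSTANCES = 80  # data falsification instances, as used in the prose
OUTLIERS = {"Diederik Stapel": 52, "James Hunton": 5}

# Instances left once the two named authors are set aside.
remaining = PSYCHOLOGY_INSTANCES - sum(OUTLIERS.values())
remaining_pct = round(100 * remaining / SOCIAL_SCIENCE_TOTAL, 1)
stapel_pct = round(100 * OUTLIERS["Diederik Stapel"] / SOCIAL_SCIENCE_TOTAL, 1)

print(f"Stapel alone: {stapel_pct}% of social science instances")
print(f"Without outliers: {remaining} instances ({remaining_pct}%)")
```

Reporting a figure both with and without known outliers, as the text does, makes it easier to judge whether a discipline has a systemic problem or a small number of prolific offenders.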

Humanities

The humanities are ancient as scholarly subjects, philosophy and history especially, and works from the ancient Greek world continue to be used as examples (rather than merely as sources) in western programmes. Historical writing in China goes

back at least to the second century BCE, and the philosophical works of Confucius to the fifth century BCE. For political and cultural reasons these early Chinese authors are not well-regarded today, but are not completely without some subtle influence in contemporary Chinese culture. The humanities were the original home of many of the ideas of the social sciences long before any notion of social science existed, and those intellectual roots remain. Today humanities fields also borrow from the social sciences, but generally without the same emphasis on method. An important characteristic of most humanities fields is that the data, often texts, are publicly available. Another characteristic is that commercialisation plays a minimal role, except for history books, which continue to have a broad reading market. For the purposes of this chapter, the following disciplines will be classed with the humanities: philosophy, history, literature, linguistics and the arts in a broad sense (including architecture, music, and, in modern times, film studies). The definition of these fields has remained generally relatively stable over time. Universities have had departments of each of these humanities since modern universities began to take form in western Europe. Religion was also once a part of the humanities, often in conjunction with philosophy, but today even where religion remains part of the university curriculum, it is typically housed in a separate faculty or school. Retraction Watch has a category for religion, but the meaning of falsification in religious studies is too challenging for this book and the cases appear mostly to involve other disciplines that are studying religions. 
The number of humanities journals is significantly higher than the number of social science and natural science journals, largely because of the very large number of history journals, which alone account for 34.5% of the total 110,028 in the sample so far from Harvard and 40.5% in the sample of 84,529 from Oxford. The reader should remember that the categories are not comprehensive and were chosen to match categories in the Retraction Watch Database and do not represent every possible journal in the collections of the two libraries. The number of journals for each of the humanities disciplines is given below, along with the percent of all the journals for this set of disciplines. Since the number of humanities journals is large relative to the 16,889 (Harvard) or 12,456 (Oxford) social science journals, a larger number of retractions might reasonably be expected, but the total number in the Retraction Watch Database is only 27.6% as large as that for the social sciences (130 vs 471). For all of the humanities disciplines in Table 4.5, the Retraction Watch Database lists:

6 instances of data falsification (0.5%), 0 instances of image falsification (0%), 1 instance of manipulated results (0.17%), 6 instances of errors in the analysis (0.3%), and 125 instances of plagiarism (5.0%).

(Note that the percentage reflects the relative number of retractions per category for all subjects.)

TABLE 4.5 Humanities journals

DISCIPLINE                  HARVARD       %    OXFORD       %
History                      37,950   53.2%    34,261   60.7%
Literature and linguistics   20,089   28.2%    11,210   19.9%
Philosophy                    4,970    7.0%     4,384    7.8%
Arts                          8,301   11.6%     6,628   11.7%
Architecture                  2,810    3.9%     2,281    4.0%
Music                         2,818    4.0%     2,856    5.1%
TOTAL                        71,307  100.0%    56,483  100.0%
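The comparison between retraction counts and journal counts can be made explicit with a short calculation. The Python sketch below takes figures quoted in this section (130 humanities and 471 social science retractions, and the Harvard journal totals of 71,307 and 16,889) and expresses them as retractions per 1,000 journals. The per-1,000 scaling and the function name are illustrative assumptions, not a measure that the text itself defines, and journal counts are only a rough proxy for the volume of publication.

```python
# Figures quoted in the text; journal totals are the Harvard columns of
# Tables 4.4 and 4.5. The per-1,000 scaling is an illustrative assumption.
RETRACTIONS = {"humanities": 130, "social sciences": 471}
HARVARD_JOURNALS = {"humanities": 71_307, "social sciences": 16_889}


def per_thousand_journals(retractions: int, journals: int) -> float:
    """Return the number of retractions per 1,000 journals."""
    return 1000 * retractions / journals


rates = {
    field: per_thousand_journals(RETRACTIONS[field], HARVARD_JOURNALS[field])
    for field in RETRACTIONS
}

# Raw ratio used in the text: humanities retractions as a percentage of
# social science retractions.
raw_share = 100 * RETRACTIONS["humanities"] / RETRACTIONS["social sciences"]

for field, rate in rates.items():
    print(f"{field}: {rate:.1f} retractions per 1,000 journals")
print(f"humanities / social sciences (raw counts): {raw_share:.1f}%")
```

Normalising by the number of journals widens the gap that the raw counts already show, since the humanities have the larger journal base and the smaller number of retractions.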

Reasons why these numbers are low vary somewhat from field to field.

Philosophy

Philosophy is in some sense like mathematics because it relies more on logic – one of the specialties in many philosophy departments – than on empirical data, which can be falsified or distorted. In principle a philosopher can discuss anything. Philosophy is not a field with methodological preferences in the usual sense of standard ways of gathering and analysing data, but one with a strong emphasis on ways of thinking systematically and clearly. Language plays a particularly important role in philosophic works because language is the primary tool for expressing ideas. Some languages adapt better to levels of abstraction. In German, for example, it is easy to turn a verb into an abstract noun, which makes German philosophers such as Immanuel Kant hard to translate and sometimes hard even for native readers to follow. Ethics is the part of philosophy that most often appears in discussions of integrity issues and often appears in the deliberations of other disciplines. Professional philosophers sometimes despair at how untrained philosophers use concepts from their field, but it is also in some sense a sign of its importance. For “philosophy” the Retraction Watch Database has:

1 instance of data falsification (16.7% of the humanities), 0 instances of image falsification, 0 instances of manipulated results, 1 instance of errors in the analysis (25% of the humanities), and 23 instances of plagiarism (19.2% of the humanities).

(Note that the percentage reflects the relative number of retractions per category for all of the humanities.) The one instance of data falsification is from an article by Diederik Stapel called “Terror Management and Stereotyping: Why do People Stereotype When Mortality is Salient?” in the Personality and Social Psychology Bulletin (2008). The co-authors

appear not to have any affiliation with a university-based philosophy department. Retraction Watch appears simply to have made an error in its attribution to philosophy. It should be unsurprising that philosophy has no real data falsification problems, since it is more like mathematics in being non-empirical. The single instance of an analysis error is by someone who is actually in philology, which is broadly related to philosophy, but is a different discipline in the modern world. Plagiarism is more of a problem for philosophy, but a search of countries where English is natively spoken finds only 4 instances of plagiarism out of the 22. The numbers do not change if European countries are included where the use of English is widespread, such as Ireland, the Netherlands and Germany. This raises the possibility again that non-native speakers are seeking examples in correct English.

History

As the number of journals suggests, history is itself a very large field with many local history journals and likely some authors who do not belong to academic departments but write articles about local history for the fun of it. Some are fairly good, but no discipline can be held responsible for the errors of amateurs. The history “profession” goes well back into ancient times in the west and in Asia. Historians were in a real sense the original social scientists, since they took an interest in the whole of the society and how it worked. Historians have no clear method or set of methods, as is typical of modern social sciences. Historians borrow whatever tools fit the issue and are conveniently at hand. What unites historians is their perspective on the past.

The boundaries of the field are often extremely porous. An example is Time on the Cross: The Economics of American Negro Slavery (1974) by Robert Fogel and Stanley Engerman. Fogel is often described today as an economist and trained as one, but he was originally a member of the history department at the University of Chicago and eventually came back to the University’s School of Business. He won the Nobel Prize in Economics in 1994 along with Douglass North. Fogel was a founder of “cliometrics” and a primary proponent of using quantitative methods in history. There are disputes about whether Fogel and Engerman did their statistics correctly and concerns about their data (Kolchin, 1992). The question was not one of fraud or manipulation, but of interpretation and selection.

Typically, the information sources for history are in libraries and archives. Depending on the availability of works in a library and access to the archives, the key data sources are publicly available. This does not mean that every historian has access to every information source, especially for contemporary history where data protection can play a role. Historians may also fail to categorise data correctly.
Nonetheless the degree of transparency is generally high. For “history*” (including all subfields) the Retraction Watch Database has:  

2 instances of data falsification (33.3%), 0 instances of image falsification, 0 instances of manipulated results, 2 instances of errors in the analysis (33.3%), and 53 instances of plagiarism (42.4%).

(Note that the percentage reflects the relative number of retractions per category for all of the humanities.) The one instance of an error in the analysis comes from a journal called Renewable and Sustainable Energy Reviews, and the author is from a department of “Mechanical Engineering,” which might reasonably have excluded it from history as a discipline. The two cases of data falsification are more relevant. One is in a journal called Canadian-American Slavic Studies by an author from Porto, and the second is in a journal called Ethnic and Racial Studies by an author from the Netherlands. Both journals represent narrow specialty topics. The plagiarism statistics are roughly proportional to the number of journals in the discipline. Of the 51 instances, only 7 (13.7%) come from countries that are predominantly English speaking. The plagiarism cases in history include some that seem unlikely, such as Advances in Structural Engineering, Journal of Seismology, Annals of General Psychiatry, Journal of Markets and Morality, American Journal of Public Health, Bordering Biomedicine, American Association of Neurological Surgeons, Marine and Petroleum Geology and Sport in Society. These nine journals probably included some references to historical information by non-historians. Another 13 of the plagiarism retractions in history came from a single source, a journal called Al-Masaq Journal of the Medieval Mediterranean, all by a single author. Removing these 22 from the total still leaves 29 (24% of the humanities), which still suggests a problem, but a manageable one for such a large field.

Literature and linguistics Literature itself is old, but the study of literature in contemporary languages is comparatively recent. Philology and the study of classical and pre-modern texts were common in medieval times. As recently as the twentieth century J.R.R. Tolkien emphasised philology in his role as a professor of Anglo-Saxon at Oxford. In medieval times the study of “rhetoric” played a role in the “trivium,” and the concepts of rhetoric continue to be part of modern literary studies, e.g. The Rhetoric of Fiction by Wayne C. Booth (1961). Philology emphasised the historical basis and development of language, while scholars of literature today look much more broadly at issues like social impact, internal structure, and the meaning of a work. Linguistics inherited some of the historical emphasis of philology but looks more at the internal structure of language (in some sense, an inheritance from the medieval study of grammar). Sociolinguistics examines the social meaning of language. Computational linguistics uses computational tools as an element of the analysis, somewhat analogous to cliometrics. Literature is an empirical field insofar as the core data are the literary works themselves, and often the commentaries about them, most of which are findable in libraries. Interviews, letters, and archival materials often play a role as well, and may be less publicly available, in part because of modern data protection issues. For literature and linguistics, the Retraction Watch Database includes:

2 instances of data falsification (33.3%), 0 instances of image falsification, 0 instances of manipulated results, 0 instances of errors in the analysis (0%), and 48 instances of plagiarism (38.4%).

(Note that the percentage reflects the relative number of retractions per category for all of the humanities.) Of the 2 instances of data falsification, one is in a journal called Psychological Science, and another in a journal called Cognition. Neither journal would be a plausible venue for most literary scholars. Most likely the problem articles only involved references to literature. The percentage of plagiarism is high, but not completely out of proportion to the number of journals. In other fields, the number of instances of plagiarism among native English speakers has been low, but that seems to be less true for literature and linguistics, where 28 instances (58.3%) came from countries that are predominantly English speaking. This does not suggest that the problem is primarily people addressing their own language deficit by seeking and copying phrases.

Arts The arts are a relatively new area for academic study, and not every university has programmes in the arts. Berlin has a Universität der Künste (UdK or University of the Arts) as one of its four research universities. UdK shares some physical resources with the Technical University (TU-Berlin) and offers a very wide-ranging programme that includes music, visual arts, architecture, theatre, dance, teacher training, and even “design and computation” (Universität der Künste, n.d.). This is unusual, but gives some sense of the range of the academic research in the arts. Traditional research in some of these areas is found at many universities, such as music history and art history, but some other aspects are found only in specialty schools that may or may not have direct links to higher education. Retraction Watch recognises a number of arts areas, including architecture, film studies, music, and the arts in general. For these subjects the Retraction Watch Database has:     

1 instance of data falsification (16.7% of the humanities), 0 instances of image falsification, 0 instances of manipulated results (0% of the humanities), 3 instances of errors in the analysis (50% of the humanities), and 15 instances of plagiarism (12.0% of the humanities).

(Note that the percentage reflects the relative number of retractions per category for all of the humanities.) The categorisation is again somewhat flawed. All three instances of errors in the analysis have to do with medical or psychological articles that appear to use music or images and are also classified with other categories. The single instance of data falsification is actually about photographs but focuses on how they were used in newspaper reporting. The number of plagiarism instances is not large, but the discipline taken together with all of its branches is small. A search of the Retraction Watch Database for plagiarism in these disciplines in English speaking countries yields no instances, even though several appear to have US roots. One author, Robert James Cardullo, accounts for 6 of the 13 instances of plagiarism, mostly related to film studies.

Professional schools The professional schools have grown in size and importance since the Second World War. Medicine is traditionally one of the professional schools, but in terms of research fits better with the natural sciences. Business could also be classified with economics, but contemporary business schools cover far more than economic issues and draw on many disciplines. Communication and education both have a strong relationship with the social sciences but have kept their own identity. Law is traditionally independent, while drawing intellectually on most disciplines, depending on the legal issues involved. Library and information science (LIS) could be included in this group, but it is not a category in the Retraction Watch Database and a search for journals with “*libr*” in the title turns up no relevant journals. Presumably Retraction Watch would not deliberately exclude retractions in the field from the database, which suggests that no LIS articles were flagged or that they were classified under other subjects depending on the topic or methodology. The number of journals for each of the professional school disciplines is given below, along with the percentage of all the journals for this set of disciplines. For all of the professional schools in Table 4.6, the numbers of instances are:

132 instances of data falsification (10.5% of all subjects), 17 instances of image falsification (3.3% of all subjects), 52 instances of manipulated results (8.7% of all subjects), 225 instances of errors in the analysis (10.7% of all subjects), and 446 instances of plagiarism (17.8% of all subjects).

(Note that the percentage reflects the relative number of retractions per category for all subjects.) The striking figure here is the percentage of errors in analysis. This could be a training issue, or could be a measure of complexity.

TABLE 4.6 Journals for professional school disciplines

| DISCIPLINE | HARVARD | % | OXFORD | % |
|---|---|---|---|---|
| Business | 9,737 | 13.0% | 9,047 | 21.5% |
| Engineering | 5,437 | 7.3% | 5,452 | 13.0% |
| Education | 18,337 | 24.5% | 10,837 | 25.8% |
| Law | 37,915 | 50.7% | 14,196 | 33.8% |
| Communications / journalism | 3,405 | 4.6% | 2,511 | 6.0% |
| TOTAL | 74,831 | 100.0% | 42,043 | 100.0% |

Business Business programmes existed in the first half of the twentieth century in the US but were not a clear ticket to a well-paid job except at a few elite universities. Harvard established its Business School in 1908, a decade after Chicago and 13 years after the London School of Economics, which has many of the characteristics of a business school programme. Dartmouth’s programme goes back to 1900. A few programmes are older, such as Wharton at the University of Pennsylvania (1881), but with few exceptions business was not accepted as a serious academic and research subject until well into the second half of the twentieth century, when the programmes began incorporating social science methods and became more rigorous and more quantitative. In continental Europe business programmes per se are mostly grouped with economics, even though MBA (Master of Business Administration) programmes are gaining popularity. The Retraction Watch Database shows the following numbers for all business subjects (“business*”):     

42 instances of data falsification (31.8%), 2 instances of image falsification (11.8%), 15 instances of manipulated results (28.8%), 78 instances of errors in the analysis (33.8%), and 119 instances of plagiarism (26.7%).

(Note that the percentage reflects the relative number of retractions per category for all of the professional school disciplines.) Since business has at most 21.5% of the professional school journals in the Oxford library catalogue (and only 13.0% at Harvard), these numbers seem disproportionately high in every area except image falsification. In this case it is not predominantly a problem in universities outside the English speaking world: 78.6% of the data falsification in business is from English speaking countries, as are 70.6% of manipulated data and 47.5% of the errors in analysis. In other words, the English speaking world cannot use the excuse that the problem comes from abroad. The actual cause of the problem is harder to determine, and in at least one area – data falsification – the numbers are also misleading, because one person, James Hunton, accounts for 26 instances (61.9% of the total for business). This is not so obviously true for other integrity issues. Since the academic discipline of business has no single research method, and no single public source of data, it is hard to find reliable mechanisms for measuring information integrity. One tool that may be at least partly effective is the preprint server SSRN (formerly known as the Social Science Research Network), which allows researchers to post draft papers and get comments. It would be interesting to know whether papers posted on SSRN are retracted at a different rate from those not posted there for comment.

Engineering Engineering as a profession is ancient beyond recording, but as an academic field it is comparatively new. The industrial revolution in England built on the work of people tinkering with machines and power sources like water and later steam. These tinkerers were in fact engineers in a world that gave them no professional recognition or credentials. The English language Wikipedia article on “Engineering” (Wikipedia, 2021b) credits Josiah Willard Gibbs as receiving the first doctorate in engineering from Yale in 1863, based on a book by Wheeler (1951). Whether or not this attribution is entirely correct, it does show that engineering began to be a recognised academic field in the later nineteenth century. Engineering is generally highly mathematical, but nonetheless extremely broad since the term covers everything from physical engineering and computer engineering to chemical and sometimes biological engineering. No single method is dominant, and no single source of information enables comparison, measurement or transparency. The Retraction Watch Database shows the following numbers for all engineering subjects (“engineering*”):     

86 instances of data falsification (65.2%), 14 instances of image falsification (82.4%), 34 instances of manipulated results (65.4%), 132 instances of errors in the analysis (58.7%), and 223 instances of plagiarism (50.0%).

(Note that the percentage reflects the relative number of retractions per category for all of the professional school disciplines.) Since engineering accounts for only 13.0% of the professional school journals at Oxford and 7.3% at Harvard, these numbers seem disproportionately high. Neither Oxford nor Harvard has an engineering school, but the number of engineering journals in the Purdue University catalogue is no higher (4,175), and Purdue is famous for engineering. Language likely plays a role in the number of instances of plagiarism, which drops to 25 for majority English speaking countries. Image falsification also drops to 1 and errors in analysis to 19 for majority English speaking countries. The reason for this may have to do with training, but it is unclear. The number of instances of data falsification for majority English speaking countries remains high with 64 of the 85 instances.
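The comparison made in the business and engineering sections, a discipline's share of retractions set against its share of journals, can be sketched in a few lines of Python. The counts and journal shares below are copied from the lists and from Table 4.6; the function names and the "disproportion" ratio are assumptions introduced only for illustration and are not a measure used by Retraction Watch or in the chapter.

```python
# Retraction counts per category for all professional schools (from the chapter).
professional_totals = {
    "data falsification": 132,
    "image falsification": 17,
    "manipulated results": 52,
    "errors in the analysis": 225,
    "plagiarism": 446,
}

# Retraction counts per category for engineering (from the chapter).
engineering = {
    "data falsification": 86,
    "image falsification": 14,
    "manipulated results": 34,
    "errors in the analysis": 132,
    "plagiarism": 223,
}

# Engineering's share of professional school journals (Table 4.6).
journal_share = {"Harvard": 0.073, "Oxford": 0.130}


def category_shares(discipline, totals):
    """Return the discipline's fraction of the group total for each category."""
    return {name: discipline[name] / totals[name] for name in totals}


def disproportion(retraction_share, journal_fraction):
    """Hypothetical ratio: values above 1 mean more retractions than the
    discipline's share of journals alone would suggest."""
    return retraction_share / journal_fraction


shares = category_shares(engineering, professional_totals)
for name, share in shares.items():
    ratio = disproportion(share, journal_share["Oxford"])
    print(f"{name}: {share:.1%} of retractions, "
          f"{ratio:.1f} times the Oxford journal share")
```

A ratio of this kind says nothing about causes. It only makes explicit what "disproportionately high" means when the two percentages are compared.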

Education Teacher training as a formal subject began in the nineteenth century, perhaps first in Prussia, where the government began to make school legally mandatory in the eighteenth century, in the “Generallandschulreglement” of 1763. Teacher training was also formalised early and became part of higher education at the universities or the Fachhochschulen, which are officially translated into English as “Universities of Applied Sciences” (but without some of the rights of a normal university). In North America, the Northwest Ordinance of 1787 encouraged the establishment of public universities and schools in what eventually became five midwestern states. In France, public schooling began after Napoleon, but the educational system developed slowly due to conflicts with the church. England did not begin actual public schooling by law until the Elementary Education Act of 1870, though so-called “public schools” like Eton had been around since the fifteenth century. These slow developments meant that professional training came late. In the US formal teacher training was not universally required until after the Second World War. Educational research relies perhaps most heavily on methods from sociology and psychology. For secondary education the teachers need subject training, which in most countries is typically provided by the various academic disciplines themselves. Research at formal schools of education typically focuses on topics about teaching and learning. The Retraction Watch Database shows the following numbers for education:     

2 instances of data falsification (1.5%), 0 instances of image falsification (0%), 3 instances of manipulated results (5.8%), 18 instances of errors in the analysis (8.0%), and 63 instances of plagiarism (19.5%).

(Note that the percentage reflects the relative number of retractions per category for all of the professional school disciplines.) Education is a large field in terms of the number of journals, and the instances of integrity violations in the Database are comparatively few. The instances of plagiarism are even fewer if only majority English speaking countries are considered (12 out of 50, a quarter of the total), which could mean language problems for some authors. Either education polices its research well, or the incentive to look for integrity violations involving data, images, results, or plagiarism is comparatively low. There is no measurable problem here of consequence.

Law Law goes back nearly to the beginning of written records. The Roman study of rhetoric included law, and legal reasoning was part of the canon law of the church in medieval times. In most countries, practitioners must take an exam that qualifies them to give legal advice and to appear as advocates for others in court. Law schools typically play a role in preparing students for these exams, but legal research often seems almost irrelevant for passing them. Sometimes people at the best law schools have more problems with the local “bar” exam because the elite schools focus less on statutes and cases and put more focus on preparing students to think broadly about judicial issues. Legal research itself draws on all of the social sciences, especially sociology and economics, as well as history and the logic elements of philosophy. The number of law journals is very large, since each country and often region is likely to have its own. By no means are all research-oriented. The Retraction Watch Database shows the following numbers for law:     

1 instance of data falsification (0.8%), 0 instances of image falsification (0%), 2 instances of manipulated results (3.8%), 5 instances of errors in the analysis (2.2%), and 41 instances of plagiarism (9.2%).

(Note that the percentage reflects the relative number of retractions per category for all of the professional school disciplines.) Considering the number of law journals, the amount of plagiarism is impressively low. About half of the plagiarism instances come from majority English speaking countries, but the language of many of the journals is probably the local legal language rather than an international language like English. Since law is not a strongly empirical field, the low number of data falsification instances is unsurprising.

Journalism Journalism, along with its more modern name “communications studies,” is a comparatively recent academic discipline. Writing, reporting, and communicating are of course ancient, but the idea of studying journalism as an academic subject first arose at the end of the nineteenth century:

In 1892 newspaper publisher Joseph Pulitzer offered Columbia University’s president Seth Low $2 million to set up the world’s first school of journalism. However, the university initially turned down the money, and on September 8, 1908 the first actual journalism school opened its doors at the University of Missouri… (Normal, 2018)

Communications studies and variations on the theme came even later, after television and radio became established sources of news distinct from journalistic reporting, and the term “media studies” grew in popularity (especially as computing grew). Methodologically these fields relied on the social sciences, especially sociology and political science. The Retraction Watch Database shows the following numbers for journalism and communications:     

3 instances of data falsification (2.3%), 1 instance of image falsification (5.9%), 1 instance of manipulated results (1.9%), 13 instances of errors in the analysis (5.8%), and 41 instances of plagiarism (9.2%).

(Note that the percentage reflects the relative number of retractions per category for all of the professional school disciplines.) The methodological overlap means some of these instances belong fundamentally to other fields. Of the three instances of data falsification, for example, one involved James Hunton and economics, and another was marketing-related. Only one actually involved newspaper reporting. Classification is often difficult where the disciplinary and methodological dividing lines are unclear. In this case it seems reasonable to say that journalism as an academic discipline has few integrity problems beyond plagiarism, which is also not large.

Summary This chapter has been rich in numerical data. A key question is: to what degree are these data reliable, or are they in some sense flawed information themselves? The data have the advantage of transparency, since the numbers come from the Retraction Watch Database, which is publicly accessible. A colleague of the author has also checked many of the numbers in order to use them for a presentation and found at least one transcription error. Multiple pairs of eyes are always better than one. Many of the searches are complex, but the search terms have been described and should be valid as long as the Retraction Watch Database does not change its search algorithm or categories. Retraction Watch does in fact update the Database regularly. The staff regularly add new cases and very occasionally reclassify old ones. That is normal and appropriate. It is a living database, not one frozen in time. The completeness and accuracy of the Database could be an issue. Retraction Watch relies on public information and the staff have a stronger personal interest in medical issues, which could theoretically skew the data, but there is no evidence of that. For social and economic reasons some forms of scholarly literature are more carefully scrutinised than others, in part because false results in those fields have more immediate consequences. Medicine is a good example. One of the problems with measurement is often having something to measure results against. A zealous researcher with time and resources could comb through all journals to create a competing Database, with perhaps different classifications and different totals. In principle that kind of dual process is the best way to ensure accuracy and guard against fakes, frauds, or simple error. In practice it is rare and the measurement of information integrity is forced to rely on spot checking and plausibility. Below is a list in ascending order of instances of falsification, manipulation, analysis errors and/or plagiarism by disciplinary grouping:

138 humanities, 499 social sciences, 872 professional schools, 2,481 natural sciences (excluding medicine), and 3,086 medicine.
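The list above can be tallied with a short Python sketch that also shows the kind of plausibility check the chapter relies on. The five grouping totals and the professional school category counts are copied from the chapter; the variable names are illustrative, and the shares are simple arithmetic, not figures reported by Retraction Watch.

```python
# Instances per disciplinary grouping, as listed in the chapter.
instances = {
    "humanities": 138,
    "social sciences": 499,
    "professional schools": 872,
    "natural sciences (excluding medicine)": 2481,
    "medicine": 3086,
}

# Category counts given earlier for the professional schools.
professional_categories = {
    "data falsification": 132,
    "image falsification": 17,
    "manipulated results": 52,
    "errors in the analysis": 225,
    "plagiarism": 446,
}

total = sum(instances.values())

# Share of all listed instances for each grouping, in ascending order.
for grouping, count in sorted(instances.items(), key=lambda item: item[1]):
    print(f"{grouping}: {count} ({count / total:.1%})")

# Medicine as a fraction of the natural sciences when medicine is included.
natural_with_medicine = (
    instances["natural sciences (excluding medicine)"] + instances["medicine"]
)
medicine_share = instances["medicine"] / natural_with_medicine
print(f"medicine within natural sciences: {medicine_share:.1%}")

# Spot check: the grouping total should equal the sum of its categories.
professional_sum = sum(professional_categories.values())
print("professional schools consistent:",
      professional_sum == instances["professional schools"])
```

The final comparison is a spot check in the sense used in this summary: it confirms that one total matches the sum of its parts, but it cannot show that the underlying classifications are correct.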

This summary list suggests that the humanities have few information integrity problems. The reasons could be that no one cares to check, or that problems emerge only slowly over time, or these old disciplines may in fact have effective control mechanisms. The social sciences also have relatively few instances of retractions. Preprint servers like SSRN may play a role in catching problems before articles are formally submitted (data on this has been requested from Elsevier, which now owns SSRN, but is not yet available). The professional school disciplines have noticeable problems with plagiarism and errors in the analysis. Plagiarism is an ethical issue but does little damage to the scholarly record as long as the copied passages are true and not taken out of context. Any journal that wants to catch plagiarism has effective tools at its disposal. Medicine seems to have the largest problem, with over half of the instances of retractions in all of the natural sciences. Nonetheless that figure is also complicated because of the overlap between medicine, biology and chemistry. Medical articles are also more likely to be checked for accuracy, since lives may be at stake, and economic interests may play a non-trivial role. The natural sciences (including medicine) account for the majority of the retractions in the Database, with some exceptions such as mathematics and computer science. The natural sciences are also areas where integrity measurements are perhaps most possible, since new studies often need to build on old results, resulting in a long tradition of replication. It is dangerous to oversimplify the root causes for why scholars commit integrity violations. Pressure to publish is a popular notion, as well as the competition for jobs. Greed can also play a role in disciplines where commercialisation matters. Nonetheless finding plausible reasons behind deliberate falsification needs more reliable data and less speculation.
Simple error can also be a contributor, since even top scholars can make mistakes. It may be a good sign that many problems are now being caught and so many articles are being retracted, because it shows a degree of academic self-policing, even at the cost of embarrassment. The fact that Retraction Watch is making so many retractions public may encourage disciplines and journals to measure the risk of falsification better in the future.

Note 1 As of August 2020.

5 MEASUREMENT

DOI: 10.4324/9781003098942-5

Introduction This chapter offers a theoretical look at the concepts of measurement and how to apply them broadly to information integrity issues. The chapter will look at examples from well-known fraud cases in the academic world for what they can tell us about measurement, but will focus equally on contemporary non-academic information falsification and manipulation with the goal of understanding how measurement techniques can help to distinguish reality from half-truths and plausible lies. The approach will be that of a historian trying to understand change over time, as well as that of an ethnographic observer living at a time and in a world where people are experiencing, as Geoffrey Kabaservice (2020) writes, “an alternate reality.”

When Democrats held their national convention in Chicago in 1968, the radical Youth International Party – better known as the Yippies – promised to send all of the delegates into a psychedelic trance by dumping LSD into the city’s water supply. Watching the Republican National Convention take place this week in and around Washington DC, I wondered if the Yippies had finally pulled off their prank. Some of the speakers’ descriptions of Donald Trump, his presidency, and the state of the country were so far removed from reality that you’d have to be in the grip of powerful hallucinogens to believe them.

From the viewpoint of the producers of the Republican convention, Kabaservice’s depiction is presumably itself the flawed and drug-induced alternate reality. A number of notable international news sources, the Guardian and the New York Times included, try to offer balance by including commentators from multiple viewpoints. If truth were a Hegelian synthesis, the approach might suffice to distinguish reality from falsehood. For a historian what will matter in the long run are the facts and a rational interpretation of them that can hold up over time. Today without the hindsight of decades or centuries, the best one can do is to measure reality using established facts and intellectual tools. How to do that is the subject of this chapter.

What is measurement? Most discussions of the concept of “measurement” focus on the common units of length, weight, and volume, but a more general theory of measurement was addressed in the twentieth century. An example is a book by Fred S. Roberts entitled Measurement Theory (1985). The abstract explains:

This book provides an introduction to measurement theory for non-specialists and puts measurement in the social and behavioural sciences on a firm mathematical foundation. Results are applied to such topics as measurement of utility, psychophysical scaling and decision-making about pollution, energy, transportation and health.

As the abstract implies, the theory begins with the social sciences and generalises from there. Another more recent work is Measurement: Theory and Practice: The World through Quantification by David Hand (2004), which begins with a reference to Lord Kelvin, who is quoted as saying:

… when you can measure what you are speaking of and express it in numbers you know something about it, but when you cannot express it in numbers, your knowledge of it is meager and unsatisfactory; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the stage of science, whatever that may be. (Cited in Ecob, n.d.)

Lord Kelvin was certainly right in associating measurement with quantification and the implicit need for numbers, but may have gone too far in suggesting that nonquantitative measurement fails to reach the level of science in its broadest sense. A claim, for example, that the earth is flat defies centuries of human experience in circumnavigating the globe and is scientifically wrong without needing any specific numerical measurement. A measure of truth and falsity can grow out of historical and observational records that do not necessarily rely on numbers.

Classification A reliable definition of “measurement” needs to include three aspects: classification (or its virtual synonym, categorisation), interpretation, and, of course, counting.

Classification is important in order to know what one is counting. In English people talk about not mixing apples and oranges (in German apples and pears) as a metaphor for two related but dissimilar categories because the distinctions matter when counting. Apples and oranges (or pears) could be combined in a count of fruits generally, or further distinctions could be made between types of apples (or oranges or pears). A wrong label and a false categorisation lead to flawed counts. Interpretation is often part of classification, because understanding the shades of difference and the rules of classification matters for accurate results. This is especially true in economics. The point is simply that the way authors count their data becomes meaningful only when measured in a well-defined context. This is presumably obvious but needs to be remembered when looking at information integrity issues. Numbers can seem to offer a deceptive measure of reliability and credibility, even when they are partly true. It is the categorisation and interpretation that matter. US President Donald Trump sent a tweet on 1 Sept 2020 saying that only about 9,000 people in the US had died from COVID-19 because the Centers for Disease Control and Prevention said that only 6% of the death certificates for those who had died from the virus listed no other cause of death. Twitter deleted the post as violating its misinformation policy (Shammas and Kornfield, 2020). One might quantify the tweet as 94% false, because it excluded 94% of the actual deaths, but what was really wrong in Trump’s post was not the number, but the interpretation that a single cause on a death certificate indicated the real reason for a death, and that multiple causes on the death certificate meant that a single cause such as COVID-19 was not a significant factor.
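How much a count depends on the classification rule can be shown with a small Python sketch. The five "certificates" below are invented toy records, not real data, and the two rule functions are hypothetical names chosen for illustration. Only the figures of about 9,000 and 6% come from the text above, and the total they imply is plain arithmetic.

```python
# Figures from the text: about 9,000 deaths were said to be 6% of the total.
sole_cause_deaths = 9_000
sole_cause_fraction = 0.06
implied_total = round(sole_cause_deaths / sole_cause_fraction)
print("total implied by the two figures:", implied_total)

# Invented toy records: each list holds the causes named on one certificate.
certificates = [
    ["COVID-19"],
    ["COVID-19", "pneumonia"],
    ["COVID-19", "diabetes", "respiratory failure"],
    ["heart disease"],
    ["COVID-19", "hypertension"],
]


def only_cause(causes, condition):
    """Narrow rule: count the record only if the condition stands alone."""
    return causes == [condition]


def any_cause(causes, condition):
    """Broad rule: count the record if the condition is listed at all."""
    return condition in causes


def count(records, rule, condition="COVID-19"):
    """Count the records that a given classification rule accepts."""
    return sum(1 for causes in records if rule(causes, condition))


narrow = count(certificates, only_cause)
broad = count(certificates, any_cause)
print("narrow rule:", narrow, "broad rule:", broad)
```

The records are identical under both rules. Only the category definition changes, and with it the number that gets reported.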

Classification / categorisation tools

People writing about behavioural topics in the social sciences often collect interviews rather than numerical data. The same is true for journalists who need to interpret statements at press conferences or during talk shows. There are several ways in which measurement can take place in such cases. In the academic world today there is typically a transcript of the relevant parts of the interview, and tools like MAXQDA allow the mark-up and classification of the transcript, which makes it possible to count the number of instances of a particular category. It takes time and practice to set up categories and do a thorough analysis, and of course the categories will determine what gets counted and whether the counts have any meaning. This will be looked at more in the following section on “Training in measurement.”

One of the most complex areas of measurement, and one that generates significant amounts of misleading or only partly reliable information, is voter polling. Many organisations call voters to ask about their preferences. Some calls are computer-generated, others are made by live callers. The results are clearly quantitative, and the method matters. Evaluating the meaning of the numbers involves many decisions. One of the first factors to decide before the survey begins is the means for getting a random sample, which is necessary for the statistics to be valid. There are multiple approaches using phone numbers or electronic access or even door-to-door queries (which are rare now). With a large enough sample of respondents, the likelihood of it being reliably random can increase, but larger samples require larger resources and become quite expensive. The difference between a “scientific” poll and asking a group of friends or neighbours is the degree to which the “sample” (the set of people asked) resembles the target population. A simple example is daytime calling, which could give misleading results if not adjusted for the potential overrepresentation of retired or unemployed people, who are more likely to be at home during the day.

In the United States participation in elections historically tends to be low but can be quite variable. Before the 2020 presidential election generally only about 56% of all possible voters voted, but an unexpectedly high 66.7% voted in the 2020 election (Statista, 2020). In the UK 63% of eligible voters cast ballots in the last major election. In Germany just under 70% of the eligible population cast ballots (Desilver, 2018). The variability matters because questions about whether someone is registered and whether they intend to vote can make a big difference in the results. Most polling organisations have models to help predict the likelihood that someone will actually vote. The models include factors for education, age, and past voting, and the models build on past experience. This is one reason for the variability of answers in what could seem like a straightforward measurement.
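
The daytime-calling adjustment mentioned above can be illustrated with a minimal weighting sketch. All numbers here are hypothetical, and the two-group split and the function names are assumptions made for illustration; real polling models use many more factors (education, age, past voting) and are not described in enough detail in the text to reproduce.

```python
# Hypothetical daytime phone sample in which retirees are over-represented.
# group: (respondents, respondents favouring candidate A)
sample = {
    "retired": (500, 300),
    "working_age": (500, 200),
}

# Assumed share of each group in the target population (illustrative only).
population_share = {"retired": 0.20, "working_age": 0.80}


def raw_support(sample):
    """Support for candidate A if every respondent counts equally."""
    respondents = sum(n for n, _ in sample.values())
    favourable = sum(k for _, k in sample.values())
    return favourable / respondents


def weighted_support(sample, population_share):
    """Support for candidate A after weighting each group to its population share."""
    return sum(population_share[g] * k / n for g, (n, k) in sample.items())


raw = raw_support(sample)
adjusted = weighted_support(sample, population_share)
print(f"unadjusted: {raw:.1%}, adjusted: {adjusted:.1%}")
```

With these invented inputs the unadjusted and adjusted figures differ by several percentage points, which is why the resemblance between sample and target population matters more than the raw count of respondents.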

Transparency

The models need to be understood, which also means that they must be described transparently. Nate Silver’s Fivethirtyeight (https://fivethirtyeight.com/) site gives an extensive explanation of how it calculates results:

…we use a formula that weights polls according to our pollster ratings, which are based on pollsters’ historical accuracy in forecasting elections since 1998 and a pair of easily measurable methodological tests:

Whether the pollster participates in professional initiatives that seek to increase disclosure and enforce industry best practices – the American Association for Public Opinion Research’s transparency initiative, for example.
Whether the pollster usually conducts live-caller surveys that call cellphones as well as landlines.

Polls are also weighted based on their sample size, although there are diminishing returns to bigger samples. (Silver, 2020)

Weighting in practical terms revises the actual results with the goal of increasing their predictive power. Silver (2020) explains more about this:

Polls are adjusted for house effects, which are persistent differences between the poll and the trend line. Rasmussen Reports, for example, has consistently shown much better approval ratings for Trump than other pollsters have, while Gallup’s have been slightly worse. The house effects adjustment counteracts these tendencies…. After adjusting for house effects, therefore, these polls – which had seemed to be in considerable disagreement with each other – are actually telling a fairly consistent story. The house effects adjustment is more conservative when a pollster hasn’t released very much data.

For those with sufficient mathematical and statistical training, the explanation makes a logical case, even though there is room for scholarly differences about how the “house effects” adjustments work and whether they are needed. For those without sufficient background to engage in such a discussion, the means of measurement has to shift to issues of trust. It is important to remember that even a logical and well-explained model can give flawed results when behaviours change. The fivethirtyeight.com prediction that Biden would win turned out to be true, but the margins were smaller than expected.

For most people and in most circumstances, measuring most forms of information is not really possible: people often lack access to the source data, or the analysis tools, or the necessary background and training to do the analysis, or simply the time and inclination to engage in an intensive measurement exercise. The reality is that the form of measurement that counts the most is deciding which sources to trust. Trust is not a simple binary state, but a grey-scale range that varies from reasonable doubt to general acceptance with many stages in between.

“Identity politics” also plays a role that influences measurement judgments. Identity politics as an explicit concept dates back to the 1960s. In referring to identity politics, Haider (2020) writes:

In order to generate a political understanding of these words and sequences, which date back to the 1960s, we have to begin with the prism through which the memory of the sixties was viewed.

In fact, opinions based on personal and social identity go back much further in time than the 1960s. Medieval peasants were aware that they were different from the noble landowners, and Protestants were aware that they were different from Catholics. Class, religion, education, and race have long played a role in determining how people interpreted information, and those personal interests and identities play a role in trust.

Trust in science

It is usual for scholars at elite universities to trust scientific results from other elite institutions because they share a common understanding of the importance of good scientific practice and accept the reliability of outcomes, even in fields not their own, because they have a reasonable understanding of the origin of the information. The same is broadly true for people whose university training instilled in them a respect for how science works. This need not mean that they always accept everything that researchers write, but they are more likely to measure the value of a source as trustworthy because they understand the mechanisms behind it. This fundamental respect for science is one of the reasons for contemporary concerns about attacks on it. The US Union of Concerned Scientists has developed a database of “Attacks on Science” with categories that include censorship, anti-science rules, and rolling back data collection or access (Union of Concerned Scientists, n.d.). The argument is often used that scientists disagree, but there is also a broad consensus within the science community on topics such as climate change or COVID-19 numbers.

Trust regarding the more political issues in news reports and in social media depends on personal judgments about the sources. In the US the New York Times has a positive reputation among academics and generally gives its sources for reports, except when they are anonymous in order to protect the identities of informants. This is true generally also for the Guardian in the UK and for the public broadcaster ARD in Germany. Sources are not a sufficient criterion for trust but contribute to any systematic measurement of reliability.

The kind of source matters, of course. In western democracies government sources have generally been seen as reliable, though government information can also be manipulated. George Romney, governor of Michigan and a candidate for US president, offers a historical example. He had trusted the generals who had told him about the progress of the Vietnam War.
When he realised that these government officials had not told him the whole truth, “he told Detroit television newsman Lou Gordon that he had been ‘brainwashed’ by American generals into supporting the Vietnam war effort while touring Southeast Asia in 1965” (Sabato, 1998). The comment undermined his campaign, but offers an example of the risk of trusting a source (in this case the military) that favoured a particular viewpoint. In retrospect it is clear that the war was going badly for the US military but admitting that would have undermined support at home for resources that the generals believed they needed. In other words, the military had a clear self-interest in promoting a favourable view. As someone who was scrupulously honest himself, Romney did not think of this. Other examples of potentially flawed government data are not hard to find. The decennial US Census data are a key source of information for determining how many representatives each state gets in Congress, and concerns about undercounting have surfaced year after year. In 2020 the intensity of the concern grew: Covid-19 and rising mistrust of the government on the part of hard-to-reach groups like immigrants and Latinos already had made this census challenging. But another issue has upended it: an order last month to finish the count a month early, guaranteeing that population figures will be delivered to the White House while President Trump is still in office. (Wines, 2020a)

It remains to be seen what consequences this will have, but it is an example of where people should have reasonable doubt about a traditionally reliable factual source.

Training in measurement

Training in measurement begins typically in elementary school when students learn about units like grams and metres or, in the US, pounds and feet. Fractions and percentages are part of measurement too. Every person with a minimal education learns some aspects of measurement extremely early. The measurement of information integrity shares many of the same principles as measuring heights and weights, but how to measure and the meaning of the measurements grow in complexity with each variation on the topic. This section does not pretend to be a tutorial in measurement, but will look instead at a number of complex examples.

COVID-19 statistics offer a good example because there are so many numbers, charts, and even maps purporting to describe the same information. A popular starting place is the Wikipedia page “COVID-19 Pandemic by Country and Territory” (Wikipedia, 2021). Many librarians and scholars have a healthy suspicion of Wikipedia because anyone can in theory update information, but ever since Jim Giles (2005) published his comparison between Wikipedia and Britannica, the acceptance of Wikipedia as a reliable source has grown, especially for current information. Wikipedia has also introduced controls on updates to timely and politically sensitive articles, and Wikipedia typically includes source references. The sources themselves are an important part of any measurement process before ever accepting the proffered numbers.

Wikipedia (2021) lists 425,249 deaths in the United States (as of 26 January 2021). For the same date, the New York Times lists 421,003 deaths. Looking at the sources explains the difference. The New York Times used “State and local health agencies and hospitals. Population and demographic data from Census Bureau,” while Wikipedia’s (2021) source is:

“COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU).” ArcGIS. Johns Hopkins University. Retrieved 26 January 2021.

The Johns Hopkins site explains “Cases and Death counts include confirmed and probable (where reported)” (CSSE, 2021). As noted earlier, official state reporting on COVID-19 deaths varies, and the CSSE site includes more probable cases. This does not mean that any one source is more true than others. The data sources and means of counting are just different.

Another way to count the impact is the rate of excess deaths. As Viglione (2020) explained in the journal Nature:

…there were bound to be deaths from COVID-19 going unreported, thought Noymer, a demographer at the University of California, Irvine. “I just remember thinking, ‘this is going to be really difficult to explain to people,’” he recalls. And in March and April, when national statistics offices began to release tallies of the number of deaths, it confirmed his suspicions: the pandemic was killing a lot more people than the COVID-19 figures alone would suggest.

The US Centers for Disease Control and Prevention offer statistics on the number of deaths in excess of what was predicted (CDC, 2020). The number of predicted deaths depends of course on the reliability of past statistics and on a model of what might reasonably be expected. The advantage of using the excess death rate is that it compensates for imperfect reporting about any particular cause of death. A simple takeaway is that multiple measures confirm a high death rate and all of the numbers come from sources traditionally considered reliable.
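
The two comparisons in this section, between sources and between observed and expected deaths, can be sketched in a few lines. The two death counts are the figures quoted above for 26 January 2021; the inputs to the excess-death function are invented placeholders, since the text gives no observed or expected totals, and the function name is an illustrative assumption.

```python
# US COVID-19 death counts for 26 January 2021, as quoted in the text.
wikipedia_count = 425_249       # Johns Hopkins CSSE figure cited by Wikipedia
new_york_times_count = 421_003  # state and local health agencies and hospitals

difference = wikipedia_count - new_york_times_count
relative_difference = difference / new_york_times_count
print(f"difference between sources: {difference:,} ({relative_difference:.1%})")


def excess_deaths(observed, expected):
    """Deaths from all causes above what a baseline model predicted."""
    return observed - expected


# Hypothetical values for illustration only; they are not CDC figures.
print(excess_deaths(observed=3_300, expected=2_800))
```

The difference between the two sources is about one per cent, small enough to be explained by the different counting rules described above.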

Intensity

A measure of the intensity of the pandemic in each country is the number of deaths per million. As of 26 January 2021 the rates varied widely among countries where the reporting could be considered accurate (Wikipedia, 2021). The UK had one of the highest rates in the world with 1482 deaths per million, and Italy 1421. The US was close behind with 1314 deaths per million. Spain had 1203, France 1097, Mexico 1086, Sweden 1081, Portugal 1043, and Brazil 1037. The rates in Germany (641), Norway (103), Japan (40), South Korea (26) and New Zealand (5) were significantly lower. The rate per million is valuable because it adjusts the impact of COVID-19 based on a country’s population and gives a measure of how effective containment measures were. Any comparison of this sort must assume that reporting of deaths used a comparable basis, and that each country had roughly the same potential for becoming infected. The figures also suggest that the theory that warmth reduces virulence is not completely true, since Mexico and Brazil have high numbers, and Norway and Germany relatively low numbers. Remember that such numbers represent only a single date. This will always be the case when conducting measurements during an event like the pandemic.

The New York Times and other publishers (for example, Tagesspiegel in Berlin) use maps to show the local impact of COVID-19. The maps are visually effective measurement tools that typically use colours to show the intensity of the infection. Red usually signals high numbers, orange fewer, and grey a stable situation. The numbers only make sense, however, in connection with the detail in the note “About this data,” which includes explanations for all four map displays (Hot spots, Total cases, Deaths, and Per capita):

The hot spots map shows the share of population with a new reported case over the last week. Parts of a county with a population density lower than 10 people per square mile are not shaded. Data for Rhode Island is shown at the state level because county level data is infrequently reported. For total cases and deaths: The map shows the known locations of coronavirus cases by county. Circles are sized by the number of people there who have tested positive or have a probable case of the virus, which may differ from where they contracted the illness. For per capita: Parts of a county with a population density lower than 10 people per square mile are not shaded. (New York Times, 2020)

The definition of Hot spots is based on “Average daily cases per 100,000 people in the past week” and the definition of Per capita is “Share of population with a reported case.” The two other categories have no special explanation, probably because the labels (“Total cases” and “Deaths”) seem self-explanatory. All of the maps give readers the ability to check statistics by county, and three of the maps give the total number of cases and the share of the population for each county. As an example, on 25 January 2021 three of the maps gave the following statistics for Wayne County, Michigan (which includes Detroit): “Cases: 95,207, share of population: 1 in 18; Deaths: 3,936, share of population: 1 in 444” (New York Times, n.d.). The fourth map, Hot spots, gives figures for the average daily cases (146), per 100,000 (8.3) and a line graph showing the 14 day trend. As a measurement tool these maps provide clear and useful data, but only for those who take the trouble to read all the notes to know what exactly the numbers and colours mean. Training here means learning to look for such details.
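
The per-capita figures quoted for Wayne County can be checked with a short sketch of population-adjusted rates. The counts come from the text; the county population of roughly 1.75 million is an assumption that does not appear in the text, and the helper names are illustrative.

```python
# Assumed population of Wayne County, Michigan (not stated in the text).
ASSUMED_POPULATION = 1_749_000

# Counts quoted from the New York Times maps for 25 January 2021.
cases = 95_207
deaths = 3_936
average_daily_cases = 146


def one_in(count, population):
    """Express a count as '1 in N' of the population, rounded to a whole number."""
    return round(population / count)


def per_100k(count, population):
    """Rate per 100,000 people, as used for the hot spots map."""
    return count / population * 100_000


def per_million(count, population):
    """Rate per million people, as used for the country comparison."""
    return count / population * 1_000_000


print(f"cases: 1 in {one_in(cases, ASSUMED_POPULATION)}")
print(f"deaths: 1 in {one_in(deaths, ASSUMED_POPULATION)}")
print(f"daily cases per 100,000: {per_100k(average_daily_cases, ASSUMED_POPULATION):.1f}")
print(f"deaths per million: {per_million(deaths, ASSUMED_POPULATION):.0f}")
```

Under the assumed population the three published shares are mutually consistent, which is the kind of cross-check that reading the notes makes possible.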

Categorisation tools

Measurement is possible with non-quantitative information too. This section will use a behavioural research tool called MAXQDA that many scholars apply to analyse interview and text materials. Most of the examples could also be done without MAXQDA or any other tool. The software merely makes it easier, much as statistical packages like SAS and SPSS make it easier to run statistical tests.

A very simple example can use a recipe by Yotam Ottolenghi (2020) from the Guardian. The test article offers three somewhat complex Persian dishes and gives the preparation time and cooking time. A reasonable question to ask about any new recipe is whether the preparation time is an accurate reflection of the actual time it would take to prepare a dish. A simple encoding of the ingredients gives one metric:

Smashed carrots: 15 ingredients, preparation time 25 minutes,
Spinach borani: 10 ingredients, preparation time 15 minutes,
Chicken kofta: 14 ingredients, preparation time 25 minutes.
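
The three entries listed above can be turned into a rate with a short sketch. The counts and times are taken from the list; the dictionary layout and variable names are illustrative. Note that the ratio can be read in either direction, as ingredients per minute or as minutes per ingredient.

```python
# dish: (number of ingredients, stated preparation time in minutes)
recipes = {
    "Smashed carrots": (15, 25),
    "Spinach borani": (10, 15),
    "Chicken kofta": (14, 25),
}

# Per-dish preparation time for each ingredient.
for dish, (ingredients, minutes) in recipes.items():
    print(f"{dish}: {minutes / ingredients:.2f} minutes per ingredient")

# Overall ratio across the three dishes, in both directions.
total_ingredients = sum(ingredients for ingredients, _ in recipes.values())
total_minutes = sum(minutes for _, minutes in recipes.values())
ingredients_per_minute = total_ingredients / total_minutes
minutes_per_ingredient = total_minutes / total_ingredients

print(f"overall: {ingredients_per_minute:.1f} ingredients per minute")
print(f"overall: {minutes_per_ingredient:.2f} minutes per ingredient")
```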

This suggests a broad relationship between the number of ingredients and the preparation time: about 0.6 ingredients per minute, or roughly 1.7 minutes per ingredient. Individual experience will of course vary. This modest amount of time per ingredient likely works for a professional chef. For ordinary household cooks with less experience, it may be too skimpy to be entirely accurate. One other factor that might affect the preparation time measurement is how unusual the ingredients are. The chicken kofta has two ingredients that may not be in a typical kitchen: smoked paprika and cascabel chilli. That implies some additional but unspecified preparation time for securing the ingredients. The point of this example is that simple categorisation can give useful information for analysis, not only about a particular recipe, but potentially about other recipes by the same author.

A more complex example comes from the news reports about the “Social Progress Index.” It was in the news in September 2020 because it showed that the US had seen a steep decline from 19th to 28th in the world standing. A relevant question is: how reliable is the information in these reports? The results were reported in both the New York Times (Kristof, 2020) and in the politically more conservative Dow Jones paper Market Watch (Pesce, 2020), which suggests that the data are accepted across a broad range of political opinion. The headlines in both papers emphasised the drop: “We’re No. 28! And Dropping!” in the Times and “The U.S. Dropped Majorly on the Index that Measures Well-Being” in Market Watch. The Social Progress Index itself is available online (Social Progress, 2020) with detailed information about how the information was compiled. This means that the accuracy of the reporting about the Social Progress Index can be checked against the source. That does not guarantee the integrity of the information per se. Verifying must ultimately be a two-step process, one involving the two newspaper reports, the other involving the source.

Coding example

A very simple coding system was used to examine the two articles. In each case phrases involving “source information,” “factual information,” and “opinion” were coded. Neither article was long and only relevant phrases (not links to related stories) were included in the coding. The category “source” information could include links, persons, or institutions. “Factual information” was anything presented as a fact in the articles. And opinion was any form of commentary. The results were:

New York Times:

7 phrases involving source information,
17 phrases involving factual information,
13 phrases involving opinion.

Market Watch:

5 phrases involving source information,
23 phrases involving factual information,
6 phrases involving opinion.
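
The counts above are easier to compare as shares of each article's coded phrases. The sketch below uses only the six counts reported here; the category keys and the helper function are illustrative and do not reflect how MAXQDA itself stores or reports codes.

```python
# Coded phrase counts for the two articles, as reported above.
coded_phrases = {
    "New York Times": {"source": 7, "factual": 17, "opinion": 13},
    "Market Watch": {"source": 5, "factual": 23, "opinion": 6},
}


def category_shares(counts):
    """Return each category's share of all coded phrases in one article."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


shares = {article: category_shares(counts) for article, counts in coded_phrases.items()}

for article, article_shares in shares.items():
    summary = ", ".join(f"{category} {share:.0%}" for category, share in article_shares.items())
    print(f"{article} ({sum(coded_phrases[article].values())} phrases): {summary}")
```

Expressed this way, opinion makes up roughly a third of the coded phrases in one article and under a fifth in the other, a difference that the raw counts alone show less clearly.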

It is worth remembering that the Kristof article was labelled as opinion and the author is described as an opinion columnist. The Market Watch article is simply a news report. That could explain the much larger number of phrases involving opinion in the New York Times article. Both articles give source information, and Market Watch references the New York Times as one of several sources. Market Watch pulls in a few additional supportive facts from other sources. On balance both articles report in a scholarly way that is heavy on sources and facts, and the sources are academically respectable, such as the Harvard Business School Professor who chairs the Social Progress Index advisory panel, though a critical reader could reasonably object that he hardly offers an independent viewpoint.

If the reporting is fair, what about the data themselves in the Social Progress Index? With all indexes and rankings – and the news is full of them today – the categories and the data sources can tell a lot about the integrity of the information. The Social Progress Index uses three major categories: Basic Human Needs, Foundations of Wellbeing, and Opportunity. Under each major category are four more subcategories and under each subcategory are three to five more specific indicators. There are too many categories to look at each in detail here, but examining just a few can show potential issues for some readers. The subcategory Personal Safety includes indicators for:

traffic deaths (per 100,000) (source: Institute for Health Metrics and Evaluation),
political killings and torture (0 = low freedom, 1 = high freedom) (source: Varieties of Democracy (V-Dem) Project),
perceived criminality (1 = low, 5 = high) (source: Institute for Economics and Peace Global Peace Index),
homicide rate (per 100,000) (source: UN Office on Drugs and Crime).

Of these indicators, two require complex measurement. The source for political killings and torture describes itself as follows:

The V-Dem Institute is an independent research institute and the Headquarters of the project based at the Department of Political Science, University of Gothenburg, Sweden. The Institute was founded by Professor Staffan I. Lindberg in 2014. (V-Dem Institute, 2020)

And the source for perceived criminality comes from a site called “Vision of Humanity” that is part of the Institute for Economics and Peace (2017):

Vision of Humanity is a guide to global peace and development for people who want to see change take place. Now reaching over 1 million unique users per year, we publish intelligent data-based global insight with articles and the latest research published every week.

Those with right-leaning political preferences could regard both sources as suspect even though the results appear to be framed neutrally. The ranking for these topics is in any case not a simple publicly available statistic like homicides or traffic deaths. Out of a total of 51 indicators, 11 come from the V-Dem institute according to the links in the downloadable Social Progress Index spreadsheet, and 15 from the “Global Health Data Exchange,” which describes itself as:

The Institute for Health Metrics and Evaluation (IHME) is an independent global health research center at the University of Washington. The Global Health Data Exchange (GHDx) is a data catalog created and supported by IHME. (Institute for Health Metrics and Evaluation, n.d.)

Multiple other sources of data include agencies of the United Nations including: the World Health Organisation, the Food and Agriculture Organisation, the United Nations Office on Drugs and Crime, United Nations Human Development Reports and UNESCO (United Nations Educational, Scientific and Cultural Organization). In other words, the validity of the data in the Social Progress Index depends on data from a wide range of international and academic sources. Those with a general aversion towards international organisations may be sceptical, and those with a detailed knowledge of some aspects of the data collection processes could have quibbles with some numbers. In the end the sources are ones that most scholars would find reliable.

What is important here is the process for determining the integrity of the information. Anyone can look up the sources and follow the references to get to the original data in order to verify its validity. It is time consuming, but doable. Anyone who wants to train themselves to verify the integrity of information must train themselves to follow the chain of references to their ultimate sources.
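
Following a chain of references to its ultimate sources can be pictured as walking a small graph. The sketch below is a simplification built only from names mentioned in this section: the mapping lists a handful of the sources discussed (not all 51 indicators or every source the articles cite), and the function name is an illustrative assumption.

```python
# Who cites whom, simplified from the example discussed in this section.
references = {
    "Market Watch (Pesce, 2020)": ["Social Progress Index", "New York Times (Kristof, 2020)"],
    "New York Times (Kristof, 2020)": ["Social Progress Index"],
    "Social Progress Index": [
        "V-Dem Institute",
        "Institute for Health Metrics and Evaluation",
        "Institute for Economics and Peace",
        "UN Office on Drugs and Crime",
    ],
}


def ultimate_sources(item, references, seen=None):
    """Follow references from an item until reaching sources that cite nothing further."""
    seen = set() if seen is None else seen
    if item in seen:            # already visited; also guards against circular citations
        return set()
    seen.add(item)
    cited = references.get(item, [])
    if not cited:               # nothing further to follow: an ultimate source
        return {item}
    found = set()
    for reference in cited:
        found |= ultimate_sources(reference, references, seen)
    return found


print(sorted(ultimate_sources("Market Watch (Pesce, 2020)", references)))

# Share of the 51 indicators that come from the two largest sources named above.
indicators_total = 51
indicators_from_two_sources = 11 + 15   # V-Dem plus the Global Health Data Exchange
print(f"{indicators_from_two_sources / indicators_total:.0%} of indicators from two sources")
```

The second calculation shows that about half of the indicators rest on two providers, so a judgment about those two matters disproportionately when assessing the index as a whole.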

Institutions and measurement

The previous chapter discussed measurement in academic disciplines. The institutions under discussion in this section are not part of the university structure, though often there are indirect connections. Three kinds of institutions will be examined: corporations that advertise information about their products, news organisations, and government agencies. As in previous chapters, the focus will be primarily on institutions in the US, UK, and Germany, where information is freely available. Other countries, such as China and India, are increasingly open with information, but are culturally less comparable.

Enron

Marketing plays a significant role in the information that corporations make available, and while marketing is not expected to be entirely neutral, most western countries regard information that is plainly false as unacceptable. One legally important aspect of marketing involves the value of the firm itself. Economists have sought neutral sources for measuring the wealth and success of corporations by analysing the stock and related finance markets, but sometimes this data only indirectly addresses or corrects false statements about the financial health of a company. One of the biggest scandals in recent financial history was the collapse of the energy company Enron in 2001, and the subsequent dissolution of the major accounting firm Arthur Andersen for failing to uncover the financial malpractice. The case against Enron was about lying:

The 12 jurors and three alternates, who all agreed to talk to about 100 reporters at a news conference following the verdict, said they were persuaded – by the volume of evidence the government presented and Mr. Skilling and Mr. Lay’s own appearances on the stand – that the men had perpetuated a far-reaching fraud by lying to investors and employees about Enron’s performance. (Barrionuevo, 2006)

The audit firm Arthur Andersen appears to have contributed to the fraud by illegally destroying documents (Eichenwald, 2002). A consequence of this was that the audit company itself ceased to do business soon after in 2002. The principal aim of an audit is to ensure the accuracy of the financial statements, and auditors have a professional and legal obligation to exercise due care in uncovering error, insofar as discovery is possible. Why they failed is a complicated story. Pressure to support a major client certainly played a role, and, as noted with the Diederik Stapel case, clever lies can be hard to detect.

In the years following Ronald Reagan’s presidency and Margaret Thatcher’s term as Prime Minister, the idea became popular that corporations had an obligation to make as much money as possible, and sometimes the ends seemed to justify the means. The economic philosophy that justified this belief came from the Nobel Prize winning economist Milton Friedman. Strine and Zwillinger (2020) write:

Instead of operating in a manner that treated all stakeholders fairly, Mr. Friedman argued, every corporation should seek solely to “increase its profits within the rules of the game.” Not only that, Mr. Friedman sought to weaken the rules of the game by opposing basic civil rights legislation, unions, the minimum wage and other measures that protected workers, Black people, and the environment.

Friedman did not actually call for weakening audit procedures, but the climate of profit-maximisation made corporations like Enron seem like heroes and made anyone who criticised their success a pariah. In effect, this undermined the efforts of auditors to uncover fraud in the financial records and made any neutral measurement of the company’s assets impossible.

Tobacco

Tobacco companies faced problems with information integrity as soon as health officials began to document the deleterious effects of smoking. Warning labels began to appear on cigarettes in the 1960s, and the companies slowly began to accept the labels as a protection against lawsuits (Hiilamo et al., 2014). As the scientific evidence against smoking grew stronger, countries insisted on blunter and more aggressive labels. Some of the bluntest warnings come from Germany: “Rauchen ist tödlich” (“smoking is deadly”). The tobacco industry disputed the scientific results and also the risks of environmental smoking. Denial was one approach, but more dangerous was a systematic attempt to undermine belief in scientific results by arranging “for some of their consulting scientists to write articles on how its opponents politicised science” (Drope and Chapman, 2001). The effort to undermine the scientific results had large-scale funding:

…the industry quickly began to build a massive international network of scientists who would spread the industry’s message that ETS [Environmental Tobacco Smoke] was an insignificant health risk. (Drope and Chapman, 2001)

The tobacco industry had plenty of money to fund its campaign, and one of the indirect effects of the campaign was to undermine public belief in scientific results, the consequences of which can be seen today in claims that the scientific evidence about climate change is inadequate and that scientists disagree. For example in a television interview on 4 November 2018:

President Trump asserted: “I can also give you reports where people very much dispute that. You know, you do have scientists that very much dispute it.” (Sabin Center for Climate Change Law, 2020)

The National Oceanic and Atmospheric Administration’s climate information website itself poses the question: “Isn’t there a lot of disagreement among climate scientists about global warming?” and answers it directly in an article by Herring and Scott (2020):

No. By a large majority, climate scientists agree that average global temperature today is warmer than in pre-industrial times and that human activity is the most significant factor.

Some prestigious science-oriented journals have long made it a policy not to engage in politics or to endorse candidates for office, but the campaign against science-based results, a propaganda war that many companies and politicians find unwelcome, finally pushed the 175-year-old magazine Scientific American to break with its own tradition and endorse Joe Biden for president.

The magazine has taken the line because, it says, “Donald Trump has badly damaged the US and its people – because he rejects evidence and science.” (Belam, 2020)

At one time the public and most corporations and news programmes appeared at least to accept the truth of scientific reports. This appears less and less to be the case.

Other cases Another well-publicised case of corporate misinformation involved the Boeing 737 Max. In 2019 Boeing’s chief executive admitted that they “made mistakes and got some things wrong” (Kollewe and agencies, 2019) Eleven months later a US Congressional report criticised the competition with Airbus that stepped up production schedules and made “cost-cutting a higher priority than safety” (Chokshi, 2020). Boeing apparently ignored employees with documented concerns about the software that was supposed to compensate for problems because of the engine placement. The airline industry has many internal and external controls (the Federal Aviation Administration in the case of the US). These controls are a form of measurement designed to catch safety problems, but the evidence suggests that information about risks took second place to corporate interests. Government oversight was especially weak after decades of pressure to free companies from government rules. By no means do all businesses engage in active disinformation campaigns, but industries that felt a threat to their existence and to their profits, and ones that had the financial resources to fight against scientific results, seem prepared to do so. Measurement certainly involves comparisons and scepticism toward simple answers. There are cases where the evidence is ambiguous, such as the controversy about whether resveratrol in red wine is heart-healthy, but in this case industry seems to have support from notable medical centres, including the Mayo Clinic (Mayo Clinic Staff, 2019): The potential heart-healthy benefits of red wine and other alcoholic drinks look promising. Those who drink moderate amounts of alcohol, including red wine, seem to have a lower risk of heart disease. However, it’s important to understand that studies comparing moderate drinkers to nondrinkers might overestimate the benefits of moderate drinking because nondrinkers might already have health problems. 
News organisations vary widely. Some make a serious effort to discover whether the news that they report is accurate. Such effort costs money and the time pressure to get a story out can undermine even the best intentions to check the facts thoroughly. Interpretation also plays a non-trivial role in how and when to report facts. When the well-known reporter Bob Woodward interviewed US President Trump for his book “Rage,” one of the revelations that was released in September 2020 (before publication), was that Trump knew how dangerous COVID-19 was and had deliberately played it down to prevent panic (Ballhaus, 2020). Fox News, which has often been friendly to Trump, raised the question in turn of whether

Woodward was remiss in not releasing his tapes earlier. Wulfsohn (2020) published the following answer:

According to [Washington Post media critic Erik] Wemple, when asked why he did not go public with Trump’s comments about the virus being “deadly,” Woodward explained he “didn’t know where Trump was getting his information, whether it was true, and so on.” It took him three months to nail down all the reporting about what Trump knew about coronavirus…

The situation mirrors that of early scholarly reports about COVID-19, where some scientists felt a responsibility to expose information quickly before full peer review was completed. There is no simple measurement model for completing all the possible fact and source checks on politically sensitive information.

News and government agencies

News programmes do not have any exact equivalent to the scholarly peer review process, but there are some similarities, nonetheless. Ultimately an editor in either publication type must take responsibility for the quality of the content, and no editor in either situation can have all of the information necessary to measure the integrity of the content in a reliable way. A certain amount of trust for known authors or reporters is needed. Nonetheless the standards of truth may vary. Many broadcasters and publishers have niche audiences that may put more credibility in tempting rumours than in official government reports. There is also a degree to which quickly publishing news stories becomes itself a form of open peer review, where readers comment and the publishers sometimes correct statements, though not all news sources issue corrections.

Social media represents a new form of publication unconstrained by editors or peer review. QAnon was born on social media. Spring and Wendling (2020) describe QAnon as follows:

QAnon’s main strand of thought is that President Trump is leading a fight against child trafficking that will end in a day of reckoning with prominent politicians and journalists being arrested and executed.

Such conspiracy theories are nothing new. The pre-modern world was full of ideas about witches and devil worship and either impending doom or coming retribution for evildoers. Twitter and Facebook long tried to avoid exercising any editorial restrictions, but ultimately gave in to social and political pressure at least to curb the worst instances of disinformation.

Government agencies in the western world have a tradition of providing factual information that is unimpeachably reliable and scholars generally trust and use UK, US, and German government statistics in the expectation that they do not reflect

political bias. It has never been the case that all government information is totally accurate. An example is the population of the city of Berlin. In Germany anyone officially living in a city must register with the local district office (Bürgeramt), and this registration is the legal basis for determining population. Berlin is, however, also the capital of Germany and many people who officially live elsewhere have a pied-à-terre in the city. The number of tourists in the city in pre-COVID-19 times was also an uncounted population that put a demand on city services without being reflected in official statistics. This mismatch between the official and actual numbers was in no way false information, but gave a false measurement for some purposes, the demand for city services and housing among them.

The situation with the US census is very different. The US has never required people to register where they live, unless they want to vote or have a driver’s license, neither of which is used as a population measure. Instead the US Census Bureau, a government agency, conducts a decennial population count whose accuracy has increasingly been called into question, especially for those living in poorer neighbourhoods. The 2020 census may be especially flawed because of the pandemic, which has meant that “more than one in three people hired as census takers have quit or failed to show up” (Wines, 2020a). Other problems have loomed as well:

A federal court on Thursday rejected President Trump’s order to exclude unauthorized immigrants from population counts that will be used next year to reallocate seats in the House of Representatives, ruling that it was so obviously illegal that a lawsuit challenging the order need not go to a trial. (Wines, 2020b)

The accuracy of the census is hard to measure, as Wines (2020a) notes:

No census is perfect, and many have been marred by incidents like fires in 1890 and 1980, lost records and even skulduggery. But none has been rejected as fatally flawed. Indeed, no metric for a flawed census exists.

Doubt about the reliability of government information in both the US and the UK grew in 2020 as the political leadership pressed for results more favourable to their own policies. A study of scientists in US government service was done in 2018:

The nonprofit Union of Concerned Scientists, working with Iowa State University’s Center for Survey Statistics & Methodology, has conducted nine such surveys every 2–3 years since 2005 to assess the status of policies and practices at agencies relating to scientific integrity and to identify problem areas that might not otherwise be made public. … (Goldman et al., 2020)

Their conclusion states:

Our results indicate that federal scientists perceive losses of scientific integrity under the Trump Administration, given responses to key questions on the 2018 survey and comparison to surveys conducted prior to 2016. (Goldman et al., 2020)

This means that government data, which might ordinarily be a benchmark against which other data could be checked, may have lost some of their reputation for reliability. Nonetheless readers should remember that no source of information has ever been completely perfect. This section has emphasised areas of concern as a reminder of the need for systematic measurement in all institutions.

Measurement failures

No single or simple explanation accounts for why measurement sometimes fails to catch information integrity problems, but a number of factors play a role and will be discussed below in turn. The simple failure to take the time to do a serious integrity measurement is one of the most common reasons. Others include a failure to apply the tools systematically or flawed measurement processes that give wrong or incomplete results. Results that are hard to interpret are an especially important factor. A final factor can be measurement processes whose results are for a variety of reasons simply invalid.

One of the leitmotivs of this book is that measurement is hard, and its difficulty makes measurement daunting for many people and institutions. It is reasonable to argue that there are occasions where not everything can or should be measured. Some cooks use gut feeling to measure out ingredients, rather than a scale or measuring cups, and that result is not necessarily bad. In fact such cooks are probably making unconscious measurements based on vision, feel and past experience. Scholars often use the same approach with data, but it can be risky.

A simple example comes from survey data. People who do surveys professionally or who work at research institutes that rely heavily on questionnaires will typically do multiple tests to measure whether a sample is sufficiently random to be representative. A good selection process is one way to achieve a representative sample, but no selection process involving human subjects is so completely random that measuring it against known characteristics of the population is unnecessary. The comparison data may also be flawed, of course, but unless the flaws have a systematic bias, the comparison can help to show the degree to which a sample matches the population. Nonetheless for less experienced scholars, including many doctoral students, this is easy to overlook.
And merely collecting the comparison data can become a project in itself. Measurement costs money, and that is one of the reasons why corporations sometimes choose not to take additional steps or to get extra tools for measurement. Plagiarism is a good example. With tools like iThenticate and Turnitin a publisher should in theory be able to reduce the risk of plagiarism to almost zero. These tools are not cheap, however, and purchasing their use will either cut into

profits or mean an increase in subscription or APC (Article Publishing Charges) costs. Simply licensing the tools is not the only cost. Publishers need people who can interpret the results, which are complex enough that serious training is required.
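
The comparison of a survey sample against known population characteristics, described earlier in this section, can be illustrated with a chi-square goodness-of-fit statistic. What follows is a minimal sketch under stated assumptions, not a description of any particular institute's procedure: the age groups, population shares and sample counts are hypothetical values invented for illustration, and 7.815 is the standard 5% critical value for three degrees of freedom (four groups minus one).

```python
# Hypothetical population shares (for example, from official statistics).
POPULATION_SHARES = {"18-29": 0.20, "30-44": 0.25, "45-64": 0.35, "65+": 0.20}

# Hypothetical counts of respondents per age group in a survey sample.
SAMPLE_COUNTS = {"18-29": 260, "30-44": 270, "45-64": 310, "65+": 160}

# Chi-square critical value at the 5% level for 3 degrees of freedom.
CRITICAL_VALUE_5PCT_DF3 = 7.815


def chi_square_statistic(sample_counts, population_shares):
    """Sum of (observed - expected)^2 / expected over all groups."""
    total = sum(sample_counts.values())
    statistic = 0.0
    for group, share in population_shares.items():
        expected = total * share
        observed = sample_counts.get(group, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic


def looks_representative(sample_counts, population_shares, critical_value):
    """True if the sample does not deviate significantly from the population."""
    return chi_square_statistic(sample_counts, population_shares) <= critical_value


if __name__ == "__main__":
    stat = chi_square_statistic(SAMPLE_COUNTS, POPULATION_SHARES)
    ok = looks_representative(SAMPLE_COUNTS, POPULATION_SHARES, CRITICAL_VALUE_5PCT_DF3)
    print(f"chi-square = {stat:.2f}, representative on age: {ok}")
```

With these invented numbers the youngest group is over-represented and the oldest under-represented, so the statistic exceeds the critical value. A check of this kind only covers the characteristic tested, and it inherits any flaws in the comparison data.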

Plagiarism

Many of the wealthiest publishers, and universities with a concern about plagiarism, use plagiarism software to check routinely for potential problems. Unfortunately, this does not mean that plagiarism has ceased to be a problem. The Retraction Watch Database records the following instances of retractions due to “Plagiarism of Article OR Plagiarism of Text” (27 January 2021):

476 (21.2%) for Elsevier OR “Elsevier – Cell Press,”
462 (20.6%) for Springer OR “Springer – Biomed Central (BMC)” OR “Springer – Nature Publishing Group” OR “Springer Publishing Company,”
165 (7.3%) Taylor & Francis,
96 (4.3%) Wiley OR “Wiley-Blackwell.”

(The percentage in parentheses gives each publisher’s share of the plagiarism instances in the Retraction Watch Database.) iThenticate lists all of these publishers as customers. The curious thing about these figures is that they add up to over half of all the plagiarism cases. The publishers in this set have sufficient resources to license iThenticate and several have said that they use it regularly for plagiarism testing. They are also publishers with a reputation for caring about quality.

Whether these statistics are reliable depends in part on what proportion of the overall market these publishers represent. If, for example, they publish over half of the articles, then having 53.4% of the total number of plagiarism cases is proportionally not quite so bad. Finding reliable statistics about publishing is not simple. Gaille (2018) wrote that:

PLOS ONE was ranked as the top journal in terms of total publications in 2016, featuring over 21,000 content items. Scientific Reports came in second, the only other journal to top the 20,000-item mark.

PLoS had, however, only 13 instances of plagiarism in the Retraction Watch Database. Scientific Reports is part of Springer Nature and separate from the other Springer group, and had only 16 instances of plagiarism. Scopus (n.d.) indicates the following percentages for publisher content:

10% Elsevier,
8% Springer,
5% Wiley Blackwell,
5% Taylor & Francis.

Assuming that these numbers are a reasonable representation of scholarly publishing, then publishers that license iThenticate and produce 28% of the content appear to account for 53.4% of the plagiarism retractions. In other words, the availability of an effective tool for discovering plagiarism has not eliminated the problem. There are a number of possible reasons for the failure. One could be flaws in iThenticate itself. Debora Weber-Wulff (2016), a well-known plagiarism researcher, has not tested iThenticate itself, but offers comments on software-based plagiarism detection that could help to explain the numbers:

Software cannot accurately determine plagiarism; it can only indicate potential plagiarism. The decision whether or not a text parallel indeed constitutes plagiarism can only be determined by a person, as has often been stated in this chapter and elsewhere in this handbook. The interpretation of the reports generated by such systems is not an easy task. Training is required in order to be able to use the results to arrive at a conclusion. Basing a decision or a sanction only on a number produced by an unknown algorithm is irresponsible, as this indicates a lack of true understanding of the meaning of the numbers.

The interpretation problem in detecting plagiarism is serious and the training required to understand the reports from systems like iThenticate is non-trivial. Since the decisions are often made by journal editors who have academic obligations at their home institutions, a real possibility is that these people simply overlook warning signs or make mistakes or possibly do not bother to check. Any and all of these reasons are possible.

The question is then: why are there not more instances of plagiarism in the Retraction Watch Database for the large number of other publishers? A plausible answer is that more plagiarism remains undiscovered because those articles generate less public attention than ones published by more prestigious entities.
Without a systematic examination of the articles in other journals, there is no reliable metric for answering the question. Incentive certainly plays a role in discovering plagiarism and all other forms of information integrity. Predatory publishers have no particular incentive to do much checking. Their income comes mainly from APCs, which means they have no pressure from subscribers. Authors who publish in predatory or potentially predatory journals are arguably less concerned with publisher reputation than with getting a publication on their vita, which could imply that they have a preference for not knowing about possible integrity violations in the journal, since the discovery could lower the value of their articles even further.
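
The comparison made earlier in this section between each publisher's share of plagiarism retractions and its share of published content can be worked through explicitly. The sketch below uses only the percentages quoted above from the Retraction Watch Database and from Scopus. The function names are illustrative, and the calculation rests on the assumption, already hedged in the text, that the two sources describe comparable populations of articles.

```python
# Share of plagiarism retractions (%), as quoted from the Retraction Watch Database.
PLAGIARISM_SHARE = {
    "Elsevier": 21.2,
    "Springer": 20.6,
    "Taylor & Francis": 7.3,
    "Wiley": 4.3,
}

# Share of published content (%), as quoted from Scopus.
CONTENT_SHARE = {
    "Elsevier": 10.0,
    "Springer": 8.0,
    "Taylor & Francis": 5.0,
    "Wiley": 5.0,
}


def combined_shares(plagiarism_share, content_share):
    """Return (total plagiarism share, total content share) in percent."""
    return sum(plagiarism_share.values()), sum(content_share.values())


def over_representation(plagiarism_share, content_share):
    """Ratio of plagiarism share to content share; above 1 means over-represented."""
    return {name: plagiarism_share[name] / content_share[name]
            for name in plagiarism_share}


if __name__ == "__main__":
    plagiarism_total, content_total = combined_shares(PLAGIARISM_SHARE, CONTENT_SHARE)
    print(f"{plagiarism_total:.1f}% of plagiarism retractions "
          f"from {content_total:.0f}% of content")
    for name, ratio in over_representation(PLAGIARISM_SHARE, CONTENT_SHARE).items():
        print(f"{name}: ratio {ratio:.2f}")
```

On these figures the four publishers together have roughly twice the share of plagiarism retractions that their share of content would suggest, although the ratio differs considerably from one publisher to the next. Because retractions record only discovered plagiarism, the ratios measure detection as much as occurrence.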

COVID-19

The case of the hydroxychloroquine study in the Lancet using data from the US company Surgisphere is a classic example of how fraud can slip past even highly regarded review processes. In this case the fake data appeared at a point when there was strong public demand for such information, and at a point when time pressure intensified the demand. Since then, the Lancet has implemented a new approach that “requires that more than one author on a paper must directly access and verify the data reported in the manuscript” (Davey, 2020). This assumes, of course, that one author is not in any way colluding with the source of false information. In fact reporters caught the Surgisphere problem using standard techniques:

…days after the paper was published, Guardian Australia revealed issues with the Australian data in the study. Figures on the number of Covid-19 deaths and patients in hospital cited by the authors did not match up with official government and health department data. Senior clinicians involved in Covid-19 research told Guardian Australia they had never heard of the Surgisphere database. (Davey, 2020)

The mundane process of checking on the sources would have allowed either the authors or the editors to avoid this embarrassing problem. Measurement failed in this case as in many others simply because no one bothered to check.

Examples of the complexity of interpreting presumably true but complex data are rife when dealing with COVID-19 information. An urgent question in the autumn of 2020 was whether countries faced a second wave of the pandemic, as ultimately they did. There was historical data from the 1918 flu that made a second wave plausible. Some countries seemed to be experiencing a second wave, especially in Europe; others, such as China, seemed not to be. There had been hopeful speculation that summer heat would reduce the number of cases, which seemed reasonable since a typical flu season generally occurs in the late autumn and winter but eases in the summer. Nonetheless the evidence was inconsistent.
Warmer African countries at one time seemed less affected, but the data from many of those countries was probably unreliable. Eventually it became clear that heat was not going to end the pandemic, since warm-weather countries like Mexico, Ecuador, Peru, and Brazil had increasingly high death rates, and in Europe warmer southern countries like Spain and Italy have continued to suffer more than Germany and Norway. A possible cultural explanation was that more self-disciplined countries handled COVID-19 better, but stereotypes do not explain why Belgium, Sweden and the Netherlands rank higher in deaths per million than many eastern or southern European countries where the stereotypes suggest less disciplined societies. Population density could be a contributing factor, which would explain the death rates in the Netherlands and Belgium, which have a high density, but not the low death rates in, for example, South Korea or Japan, which also have high densities. Some combination of statistics and models is likely needed that has not yet been discovered. One of the factors that make measurement hard is that all of the relevant data require context information and when that information is missing, the interpretation

becomes challenging. In the case of COVID-19, a key piece of context is the reliability of the infection and death rates. As noted earlier, even statistics from individual US states may be imperfect or incomplete. If reporting from some of the less developed countries is flawed, as it may be reasonable to suspect, then their data could be meaningless. Missing but highly relevant data is equally problematic. For example, statistics on face-mask wearing could be highly relevant. Statista (n.d.) provides some figures from a 28 June 2020 online survey that found that Spain, Italy, Germany, and France had much higher rates of mask-wearing than the UK, Netherlands, Norway or Sweden, which suggests that any correlation between mask-wearing and per capita deaths is (or was) weak. One risk in drawing such a conclusion is, however, the assumption that the mask-wearing statistics are correct, and the reliability of a onetime online survey with about one thousand participants may also be too poor to be meaningful. The other risk is the lack of a time-series that would show the relationship between when mask-wearing began and the growth or decline of COVID-19 in those locations. The point here is that trying to interpret measurement results against single information sources that may themselves be flawed is one of the ways in which measurement can fail. Failing to find all of the relevant information sources is a common cause for measurement failure, and even when the effort is made, the search takes time and money and constant checking. When time is short, the temptation to use shortcuts is strong but “short cuts make long delays” as Tolkien (1961) wrote, and it is perhaps more true for analyses than for travellers.
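
The doubt raised above about a one-time online survey with about one thousand participants can be given a rough scale by computing the conventional margin of error for a proportion. This is a sketch under assumptions that the survey probably does not meet: the formula presumes a simple random sample, which an online panel is not, so the result is best read as a lower bound on the uncertainty. The source does not say whether the thousand participants were per country or in total, so the split across the eight countries named above is a hypothetical case.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion.

    proportion=0.5 gives the widest (most cautious) interval;
    z=1.96 is the normal quantile for 95% confidence.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


if __name__ == "__main__":
    # About one thousand participants, as mentioned in the text.
    print(f"n=1000: +/- {margin_of_error(1000) * 100:.1f} percentage points")
    # Hypothetical: the same thousand split evenly across eight countries.
    print(f"n=125:  +/- {margin_of_error(125) * 100:.1f} percentage points")
```

Sampling error of this kind is only one component of reliability. Self-selection in online surveys and differences in how questions are understood across countries are not captured by the formula at all.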

Summary

The core argument of this chapter is that measurement is hard and measuring correctly and completely is harder, which may explain why many reputable publishers and good companies sometimes get it wrong. There is no shame in admitting that measurement has failed as long as lessons are learned and the next effort is better. A number of publishers have learned lessons and changed rules. Whether other companies have learned from their errors is harder to say because their rules for quality control are generally less transparent.

It would be easy for a casual reader to get an impression from the examples in this chapter that the measurement of information integrity often fails, but that is far from the case. The number of actual known instances is small compared to the immense amount of information currently available in the world in published form or online. What is necessary is constant checking, combined with a healthy scepticism about the data sources, even presumably reputable data sources, that are used for confirmation. This does not mean doubting everything, but it emphasises the importance of making careful classifications about which data are reliable in what ways. Classification of plagiarism should, for example, differentiate between the copying of whole paragraphs, which most people would agree is plagiarism, and paraphrasing content, where reasonable people can differ about the seriousness of the problem. Another classification example comes from COVID-19 statistics.

Most of the statistics from developed countries could be classified as reliable, but a more nuanced classification might mark statistics from some US states like Florida as containing flaws that suggest an undercount. This kind of classification takes place mentally all the time and is so routine that people may forget to make it explicit when drawing conclusions.

6 ACTORS

Introduction

This chapter looks at the people involved in measurement processes and considers them from the viewpoint of the role they take on. The focus will be on four types. The “investigators” are those who actively use measurement tools to look for potential violations of information integrity. There are other labels that could be used, including “hunters” as in “plagiarism hunters” or “integrity police.” Each of those labels has its own set of positive and negative connotations. The term “investigators” is a relatively neutral term, although it may fail to imply the urgency that makes some people successful when they look for integrity violations. The second type includes the “judges,” who are people who must decide whether an integrity violation is serious enough to pursue and ultimately serious enough to warrant some form of punishment. It is a hard role that many would rather avoid. A third role is that of the “violators” or potential violators. These are people who range from those who consciously decided to commit information fraud to those who violate the integrity of information through inadvertence or negligence or ignorance or simple stupidity. It is easy to label these people as bad, but that reduces everyone to a single level of guilt that the data may not justify. A fourth role includes the “victims.” This category is routinely overlooked and can be divided into three sub-classes: those who suffer directly because of an integrity violation, those who suffer indirectly including some who are falsely or unfairly accused, and those victims who are invisible and may not even realise that information fraud affected them. These categories can include institutions such as publishers, universities, and companies, whose reputation suffers and who must sometimes pay financial penalties of a scale that affects everyone in the institution.

DOI: 10.4324/9781003098942-6

Investigators

This section will look at investigations into three types of information integrity violations: plagiarism, which is the most investigated; image manipulation, which is a growing problem; and data falsification, which is especially difficult to investigate because it is so various.

Plagiarism

Some people look for integrity violations voluntarily and even eagerly out of a belief that rooting out violations is an act of virtue, which it could well be. One of the best public examples of this comes from the VroniPlag Wiki, which is entirely a volunteer organisation. VroniPlag has established its own set of definitions for the different kinds of plagiarism, as noted in Chapter 1, and it documents everything it considers a violation so that others can check. This transparency is not only admirable but is better than many official investigations. The VroniPlag investigators adhere to their definitions and gain no obvious personal advantage from the work, which must take a significant portion of their free time.

Thus far VroniPlag has focused on doctoral dissertations in open access repositories at German universities. The initiative grew out of the scandal over the heavily plagiarised dissertation of the German Minister of Defence, Karl-Theodor zu Guttenberg, and the subsequent scandal over the dissertation of German Education Minister Annette Schavan, where the case was considered privately by many to be less clear.1 A similar scandal might not have caused the same level of political turmoil in other countries, in part because the voters in other countries often do not regard a doctorate as an important criterion for high political office. For investigators the satisfaction of forcing a high-level politician to resign can have a certain appeal, and VroniPlag is not the only investigator to pursue politicians, though it has been the most credible and systematic.

Plagiarism is probably a less common information integrity violation among politicians in Germany and other countries than factual falsification and manipulation, for which examples will be discussed later in this chapter and have been mentioned elsewhere in the book. The advantage of plagiarism accusations is that the evidence for them can be displayed in ways that are objective and easy to understand. Germany is not the only country where plagiarism accusations have occurred. Journalists are among the most active investigators, and in 2016 and again in 2018 journalists accused US President Donald Trump’s wife Melania of plagiarism. In both cases the blame was deflected to speech or ghost writers (BBC News, 2018). The accusations made no real impact and were quickly forgotten. Joseph Biden was also accused of plagiarising British Labour politician Neil Kinnock:

It wasn’t the first time Biden had used elements of Kinnock’s speech, Dowd noted, but on other occasions he had cited Kinnock as their source… (Emery, 2019)

Teachers and professors are obligated to investigate information integrity breaches of all types in work their students produce. Sometimes it is obvious when a student has made up information because the result is either implausible or missing references, and a professor can always question the source. Establishing evidence of plagiarism, in contrast, can be harder for teachers and professors if no automated tools are available to provide clear evidence, because there are too many possible sources to check. A search engine search is one way of checking on suspect passages, especially if more than one search engine is used, since the indexing can vary. A few universities such as King’s College London give students a chance to submit their papers to Turnitin as proof they have not plagiarised (Centre for Technology Enhanced Learning, 2020). Tools like Turnitin are not always an option for teachers and professors for a variety of reasons, among them cost and privacy rights issues.

Image manipulation

With image manipulation cases, academic investigators have the advantage that the students often make obvious mistakes. Tools like GIMP or Photoshop require significant expertise to manipulate images in ways that look genuine. Students presenting a graph of data that looks too perfect can easily arouse suspicion in a professor who is aware of the variability of most sets of data. Students generally lack the experience to forge data successfully. Those who have successfully forged information, such as Diederik Stapel, have had enough experience with real research that they can make plausible judgments about how the data curve should look. Students generally do not. This does not mean that students cannot escape detection, but a well-versed professor has a reasonable chance of catching attempts. Professors can also make the investigation process easier by insisting on source transparency.

Professors are sometimes also called on to act as investigators in more challenging contexts than teaching, even though, as has been noted earlier, academic investigators have less success in detecting expertly falsified data. Validating the genuineness of paintings is a high-money area where scholars are often called in as experts to pronounce on a new discovery. An example of this was the “Sidereus Nuncius,” which was purportedly a set of Galileo watercolours of the Earth’s moon. The story of how the bookseller Richard Lan asked the noted art historian Horst Bredekamp to authenticate the work has been told elsewhere in greater detail (Seadle, 2016). The brief version is that Bredekamp first called a symposium of experts, who agreed that the work was genuine, and then called a new symposium of experts to revise the authentication, after Nick Wilding from Georgia State University raised reasonable doubts based on solid factual information. The important fact here is how readily a good fake could fool not merely one but a whole group of experts.
A reasonable desire on the part of even the best of experts to be part of a new and exciting discovery can influence their judgment. Experts are only human in sometimes seeing what they want to see and in being blind to complicating details. This makes detection hard when it does not involve solid metrics.

Detecting manipulation in medical images is difficult for other reasons. Elisabeth Bik is an example of someone who trained herself to see manipulations in certain kinds of images, and has turned her skill into a business (Marcus and Oransky, 2019). She is not the only person who does this, but is probably the best known. Seeing revealing nuances in manipulations with only human eyes is not a scalable means of investigation, since the number of people who are genuinely skilled is small, and the learning time seems to be long. For some years there has been talk about the need for a computer-based tool that uses machine-learning techniques to flag potential manipulations. Chris Graf and Mary Walsh (2020) write:

Wiley and Lumina are working together to support the efforts of researchers at Harvard Medical School to develop and test new machine learning tools and artificial intelligence (AI) software that can identify discrepancies in research image data.

Harvard also has a contract with the Elsevier-funded HEADT Centre to use its database of images, and negotiations are ongoing with other universities regarding tools for discovering image manipulation.

Information integrity investigations are important for all academic journals that want to maintain their credibility, and large publishers have staff whose jobs are to ensure the integrity of published articles. Typically, these units lack the resources to do sufficient systematic study of all articles. The expectation is that editors and the peer reviewers do at least a first level investigation and would involve publisher staff only after identifying a problem. This makes the peer reviewers the front-line integrity investigators, a role for which they are often poorly equipped. One of the major problems for peer reviewers is easy access to the data on which articles have based their conclusions. The push to make research data publicly accessible has gained strength over the years.
The US National Science Foundation (2020a) has the following policy: Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing. See Proposal & Award Policies & Procedures Guide (PAPPG) Chapter XI.D.4. The German Research Foundation (Deutsche Forschunugsgemeinschaft, 2019b) has similar expectations: The long-term archiving and accessibility of research data contributes to the traceability and quality of scientific work and enables researchers to carry on work begun by others. The Alliance of Science Organisations in Germany voiced its support for the long-term archiving of research data, open access to

it and compliance with the conventions of individual disciplines in the “Principles for the Handling of Research Data,” adopted in 2010. The “Guidelines on the Handling of Research Data” put the framework stipulated by the Principles into a concrete form in the DFG’s funding arrangements. Readers should note that neither of these statements includes an absolute requirement for the data to be openly accessible. Phrases like “… are expected to …” and “… voiced its support for…” give researchers the option not to make data available. Nor is the principle that research data should be publicly accessible uncomplicated. Any research data with personal data must be redacted under the GDPR (General Data Protection Regulation). GDPR does not prevent open access, but the redaction may make particular details difficult to discover, for example tracing responses in interview data back to the source in order to find out whether people actually said what was claimed.

Data falsification

Data that have a commercial value can also be problematic, especially in medical and bio-chemical areas where a lot of money may be involved. Even in straightforward cases where the research data are available and well-documented, the extra time needed to test whether the data are genuine and actually say what authors claim they do is a cost that falls directly on the reviewers with no direct compensation other than the respect they earn from the community for taking their job as a reviewer seriously. Some fields and some journals put more time pressure on the reviewers than others. Higher teaching loads at some institutions disadvantage those who strive to take an active and responsible role in the reviewing process. There are no statistics about how often and how thoroughly reviewers examine the sources behind articles when those data are available. Any guess is just speculation. A reasonable assumption may be that checking is less frequent than would be ideal.

At the larger and better funded newspapers and news programmes, elite reporters are investigators by profession. One classic example is Bob Woodward, who made his reputation with the Watergate Scandal in 1972, where he and his colleague Carl Bernstein exposed lies by then US President Richard Nixon. His investigative record was not perfect. He failed to expose false statements about the existence of Iraqi weapons of mass destruction in the run-up to the 2003 Iraq War. In 2020 he exposed the fact that President Donald Trump knew how dangerous the COVID-19 virus was at a time when Trump was publicly discounting it. Woodward is, however, an exception in having the time and resources to pursue complex investigations and the fame to help get access to levels of leadership normally closed to most people.
Political scandals attract more attention than ordinary lies that involve some form of mismanagement, and the more complicated the story, the less likely it is to get headlines, even when it may be important in uncovering serious problems. A few investigative organisations do their best. ProPublica is a small New York-based investigative journal whose avowed mission is:

To expose abuses of power and betrayals of the public trust by government, business, and other institutions, using the moral force of investigative journalism to spur reform through the sustained spotlighting of wrongdoing. (ProPublica, 2020) Funding for ProPublica comes from a mix of grants ($10,686,215 as of 31 December 2019) and donations ($7,943,025) (O’Connor Davis Accountants and Advisors, 2020). It lists all of its staff on its website including its office manager for full transparency. As an organisation dedicated to investigative reporting, transparency is also a defence against critics. A typical ProPublica story is one involving the health care firm Prospect Medical: Prospect has a long history of breaking its word: It has closed hospitals it promised to preserve, failed to keep contractual commitments to invest millions in its facilities and paid its owners nine-figure dividends after saying it wouldn’t. Three lawsuits assert that Prospect committed Medicare fraud at one of its facilities. And ProPublica has learned of a multiyear scheme at a key Prospect operation that resulted in millions of dollars in improper claims being submitted to the government. (Elkind and Burke, 2020) The investigation cites publicly available statistics and facts as part of its story: All but one of Prospect’s hospitals rank below average in the federal government’s annual quality-of-care assessments, with just one or two stars out of five, placing them in the bottom 17% of all U.S. hospitals. The concerns are dire enough that on 14 occasions since 2010, Prospect facilities have been deemed by government inspectors to pose “immediate jeopardy” to their patients… (Elkind and Burke, 2020) ProPublica also let Prospect respond to its criticisms: Leonard Green and Prospect responded to ProPublica’s questions in written statements through Sitrick and Company, a crisis PR firm jointly retained on their behalf. 
They maintain that they’ve kept their commitments, abided by the law, provided good patient care and invested hundreds of millions of dollars, saving many failing hospitals and preserving thousands of jobs. (Elkind and Burke, 2020) The firm’s response fails to address directly the facts the story cites. The real question is whether exposing such problems will have any real effect on the health care that the firm provides or on the firm itself. Sometimes there is a slow effect when the situation is bad enough that government regulators actually intervene, but

internal corporate structures, corporate lobbying, and corporate donations to key political figures can play a role in delaying action, especially in a political climate that is hostile to regulation. Investigation is important, but not always immediately effective.

Judges

Judges play a key role, not only in establishing whether information integrity has been violated, but also in deciding whether punishment is warranted and what form it should take. In rare cases the judges may be law court judges who must make a decision and pass a sentence based on the evidence that the investigators collected. More often, however, the judges are people with little or no legal training and sometimes little subject expertise.

Universities

Virtually every university has some form of committee structure that makes decisions or at least recommendations about academic malpractice. The names of these committees vary and the standards for choosing people to fulfil these roles are diverse. One good example comes from the “Kommission zur Überprüfung von Vorwürfen wissenschaftlichen Fehlverhaltens (KWF)” (“Commission for Reviewing Accusations of Research Malpractice”) at Humboldt-Universität zu Berlin. The university has not consistently had a committee to handle these cases, but one was established in the early 2000s. The original KWF included a professor from the natural sciences, one from the humanities, and a post-doc (Mitarbeiter) from the social sciences. The idea was for the committee to have scholarly breadth as well as representation from multiple research-oriented status groups. The committee was later reconstituted to include four professors and two post-docs. One of the professors had to come from the law faculty in order to provide a measure of legal expertise for the committee. In addition, the current committee includes one member from Computer Science, one from Literature, one from Biology, and two post-docs, one each from Chemistry and Philosophy. The disciplinary spread makes it more likely that at least one of the members has a methodological understanding of the field in which an alleged violation took place. The Vice-President for Research nominates the members, and the Academic Senate approves them. The process offers both breadth and transparency. Currently the former chair of the KWF acts in an advisory role to provide continuity. The university has also named two people to fill the Ombudsman role. These people, a woman and a man, both professors, work closely with the KWF but play less of a judicial role. The KWF only handles cases involving doctoral-level work. Cases involving bachelor’s and master’s students are handled in each faculty (college).
At the faculty (college) level the people who handle information integrity issues have little or no

training or experience with judging such cases, and the personal reactions range from a desire to set a harsh example for other students to a desire to forgive otherwise good students who made a stupid mistake. This is only one example. The possible models at universities across Europe and North America are numerous and vary widely. There is no systematic collection of examples of how the judging processes work or what the range of possible decisions is.

For people convicted of a violation in their doctoral dissertation, the most common punishment is for them to lose their degree. For academics in many countries and depending on the circumstances, that might mean the loss of a position. In other countries and at some universities the loss of a degree may matter little. The range of other possible punishments is ill-defined. Most decisions are treated as black-and-white, guilty or not guilty. Criminal law offers a wider range of choices than is typical for university processes, though occasionally there are milder punishments where the person must remove the passage or fix the error or is censured in some fashion. An example can be found in the case of the Family Affairs Minister, Franziska Giffey. The Free University of Berlin (FU-Berlin) decided on a reprimand for possible plagiarism and not to withdraw her doctoral degree:

Nachdem sich das Prüfungsgremium intensiv mit der Angelegenheit befasst hatte, schlug es dem Präsidium der Freien Universität Berlin vor, Frau Dr. Giffey eine Rüge zu erteilen und den 2010 vom Fachbereich Politik- und Sozialwissenschaften verliehenen Grad, “Doktorin der Politikwissenschaft” (Dr. rer. pol.) nicht zu entziehen. (Freie Universität Berlin, 2019)

After the examination body considered the situation in detail, it proposed that the Presidium of the Free University of Berlin give Dr. Giffey a reprimand and that the title “Dr.
of Political Science” from the Department of Political and Social Science should not be taken from her. (Author’s translation) The decision met with some criticism: Ein Plagiatsjäger von VroniPlag Wiki hat Unverständnis über die Entscheidung der Freien Universität geäußert, Bundesfamilienministerin Franziska Giffey (SPD) nicht den Doktortitel zu entziehen, sondern sie lediglich zu rügen. (Der Spiegel, 2019) A plagiarism hunter from VroniPlag Wiki expressed his disagreement with the Free University’s decision not to take away Franziska Giffey’s (SPD) doctoral degree, but only to reprimand her. The person the article referred to was Prof. Dr. Gerhard Dannemann, a noted legal scholar, whose concern was specifically that the choice of a reprimand lacked

legal justification (Der Spiegel, 2019). The Free University reopened the case in 2020, and in November 2020 Franziska Giffey announced that she would voluntarily cease using her Doctor title to avoid making it more of a political issue (Clement, 2020). This is a sign of how complex the issues and the options are. Strictly speaking, the case for withdrawing her title remains unproven.2 One of the advantages of judging plagiarism in an academic context is that rules and regulations exist, though sometimes there are too many rules with too little clarity about their meaning or their consequences for integrity violations. Universities do have staff and faculty who can advise on technical and legal issues, even if not everyone agrees with their advice. Nonetheless a substantial infrastructure exists to guide the process. For many other organisations, this kind of infrastructure may be missing.

Publishers

Publishers face integrity violation cases at least as often as universities, and those that do not ignore the problem must judge what to do with articles against which some form of accusation has been made. Most publishers have fewer rules to guide their decision-making process. People often accuse publishers of moving too slowly to withdraw articles. The argument for quick action is that readers should be warned as quickly as possible about false information. The argument in favour of slower and more deliberate decision-making is that withdrawing an article is a dramatic undertaking that should not be done without serious consideration and due process. A responsible journal is not necessarily one that reacts without its own serious investigation.

There is no clear universal rule for who acts as the final decision-maker in cases of integrity violations in journals. Probably the simplest scenario is for journals published by academic associations, where the association as a body has appointed someone as the editor (or editor-in-chief). In such cases there is no established hierarchical authority above the editor except for the association as a whole, which rarely if ever gets involved. For academic associations, the decision-making processes tend to mirror similar processes in universities. The editor could, for example, set up a group of sub-editors or reviewers to make a recommendation, but the final decision likely resides with the editor. Legal advice, if requested, is probably handled through the association.

The decision-making at commercial journals may follow a somewhat similar pattern, with the editor taking the chief responsibility for making a judgment about the academic aspects of an integrity violation. Integrity violations can also have commercial consequences, depending on the degree of scandal involved. An ordinary plagiarism case with a little-known author might have few ramifications beyond the individual.
A case such as happened with the fake Surgisphere data was a blow to the New England Journal of Medicine and the Lancet, which had both published articles based on these data. The former is a publication of the Massachusetts Medical Society and the latter is an Elsevier journal. The Lancet acted first

but the New England Journal of Medicine followed quickly. The case was, however, simple in its way because the source data turned out to be fake. No grey areas were involved. As a general rule it is probably safe to conclude that high-quality commercial publishers handle integrity violations in ways similar to association journals and universities. The broad set of less elite publishers offers fewer publicly available instances of integrity violations for comparison. This may be because no one bothers to check the quality of the articles because they are not taken seriously. Articles from predatory publishers are even less likely to be examined for integrity problems. For predatory publishers there is a clear economic incentive to ignore problems as much as possible, since more negative publicity risks future income. An interesting study might be a systematic examination of articles from less elite and from predatory publishers to be able to compare the occurrence of integrity violations in them with those of elite journals. The Retraction Watch Database shows primarily instances from journals that care enough to issue a retraction when someone discovers a problem. It is an imperfect basis for comparing the number of integrity violations across all academic publications, even though it is the best that is currently available.

Newspapers and social media

Newspapers and commercial journals have a staff hierarchy for judging integrity problems. It is not unusual for high-quality publications like the New York Times to issue corrections and updates. It is extremely rare that an article would be retracted completely, and if that happens, it is generally done quietly. The controversy over hydroxychloroquine as a treatment for COVID-19 generated plenty of untrue news stories, but the untruth mostly resulted from genuine unclarity and conflicting reports, not from fake stories by reporters. The reports, even when wrong, represent part of the historical record and for that reason should probably be kept. In the era of physical paper-based publications, removing a false story was not possible in any case.

Social media companies have had to develop their own judging process. In the beginning they avoided any kind of comment or retraction of false statements on the grounds that they were not responsible for what people posted, but the outcry against conspiracy theories, racist or divisive comments, and outright untruths persuaded both Twitter and Facebook to adopt explicit terms and conditions allowing them to label or remove content. The Facebook statement is very broad:

We also can remove or restrict access to your content, services or information if we determine that doing so is reasonably necessary to avoid or mitigate adverse legal or regulatory impacts to Facebook… (Mihalcik, 2020)

Facebook has in fact become relatively aggressive in enforcing its new rules:

Facebook, facing criticism that it hasn’t done enough to curb a fast-growing, fringe conspiracy movement, said on Tuesday that it would remove any group, page or Instagram account that openly identified with QAnon. The change drastically hardens earlier policies outlined by the social media company. (Frenkel, 2020a) Twitter began correcting misleading posts in March 2020. Its policy distinguishes between misleading claims, which will be removed, disputed claims, which get a warning label, and the broad category of unverified claim, which is left in place unmarked. Deciding which category to apply is in part a political judgment. When President Trump tweeted that people should not be afraid of COVID-19, objections came from medical as well as political sources: But Facebook and Twitter did nothing about Mr. Trump’s post, even though the companies have publicized their coronavirus misinformation policies. … Facebook has said it does not allow coronavirus posts that can lead to direct physical harm, and will redirect people to a Covid-19 information center. Twitter also removes only posts that contain demonstrably false information with the “highest likelihood of leading to physical harm.” (Frenkel, 2020b) Precisely who in Facebook and Twitter actually made the judgments is not clear, but it seems likely that judgments involving someone in high political office like the President would be made by the corporate leadership. Many corporations that are not involved in publishing put out marketing information that may be false. These corporations generally have vague policies or none whatsoever about information integrity, and they have fewer rules for judging. Whistle-blower laws exist in many countries with the goal of offering protection to people within organisations who uncover serious problems that the corporate chain of command ignores. Investigations into the Boeing 737 Max after two planes crashed in 2018 and 2019 exposed a number of safety concerns. 
A Boeing engineer who last year lodged an internal ethics complaint alleging serious shortcomings in development of the 737 MAX has written to a U.S. Senate committee asserting that systemic problems with the jet’s design “must be fixed before the 737 MAX is allowed to return to service.” (Gates, 2020) The same engineer suggested that Boeing had been covering up wider-ranging problems: “However, given the numerous other known flaws in the airframe, it will be just a matter of time before another flight crew is overwhelmed by a design flaw known to Boeing and further lives are senselessly lost.” He goes on to

suggest similar shortcomings in the flight-control systems may affect the safety of Boeing’s forthcoming 777X widebody jet. (Gates, 2020) To be fair one must recognise how difficult it is for a company to come to terms with accusations that say, in effect, that the whole organisation has been making false claims about safety for a long time and that new safety claims for company products may also be false. If engineering accusations proved true, and if multiple flaws in planes had put passengers at risk, it could lead to the bankruptcy of the last large-scale US airplane builder and to a serious economic downturn in the Seattle area, where the planes are built. No one in the firm could want that. That is why judging a situation like this takes on a different character than plagiarism cases at a university, because of the number of lives at stake. Even judging where the primary responsibility for the lies originated is far from simple in a situation where the technology itself is so complex that many levels of the corporate management may genuinely not understand the information they are given. A common tendency is to single out a few lower-level technical people to blame, rather than looking higher at the corporate leadership itself. Any judgment ought to involve the whole cycle of decision-making and testing, which appears so far to have put the priority on short-term profits in order to rush out a flawed product. A judgment that condemns an organisational philosophy would involve a greater transformation than just punishing a few people, and such transformations take time.

Violators

Explanations of why people choose to violate information integrity belong really to the field of psychology, which is outside the scope of this book. Nonetheless a historian can look at instances where people intentionally engage in falsification and can categorise them in ways that help to make sense of the larger phenomenon. It is important to remember that the culture of a particular time and place influences the kind of misinformation that has an effect on society. For example, a contemporary radio commentator who claimed that the ancient Greek god Zeus had caused an airplane crash would be laughed at in modern society, because lies work only when a sufficient subset of the population believe the story. Lies are on some level a form of fiction and a good fiction writer must know what is credible.

Misinformation superspreaders

This section will look at integrity violators in a range of circumstances, beginning with large-scale public lies whose aim is to influence political decisions. These violators face no hierarchy of standards and no judges who can (or will) inflict punishments, though some may limit their ability to broadcast a fake message. The new Facebook and Twitter rules were met with outrage in some quarters because

they took away a comfortable platform for publishing fantasies. The rules did not stop QAnon, for example, from spewing out conspiracy theories. They only forced QAnon believers to find new outlets. Some of the most popular outlets for conservative commentators in the US are the radio talk shows:

Like single-issue voters, talk radio fans are able to exercise outsize influence on the political landscape by the intensity of their ideological commitment. Political scientists have long noted the way in which single-issue voters can punch above their numerical weight. (Matzko, 2020)

An example can be seen below:

Take talk radio’s role in spreading Covid denialism. At each stage of the backlash against government recommendations for fighting the pandemic, talk radio hosts prepared the way for broader conservative resistance. (Matzko, 2020)

One example of a conservative radio talk host in the article was Rush Limbaugh, who had earlier promoted the fake claim that Barack Obama was not a US citizen and therefore was ineligible for the presidency:

The influential right wing talk show host, Rush Limbaugh, told listeners last week that the president “has yet to have to prove that he’s a citizen.” (McGreal, 2009)

McGreal (2009) explains the facts:

At the heart of the supposed conspiracy is Obama’s failure to produce a paper version of his birth certificate because Hawaii digitalised its original records some years ago and now provides a print out of the electronic record. That print out shows he was born in Honolulu in 1961, a fact that was verified again today by the state’s health director, Dr. Chiyome Fukino.

The motivation for spreading false claims about Obama’s birth came presumably from a desire to discredit a popular black Democratic politician, and a measure of latent racism could also have played a role.
Lies have long been common in political campaigns in all countries, and this particular untruth found resonance among people who disliked the idea of a black American president. As a simple political dirty trick, it was effective in giving a number of people a reason not to vote for Obama, and the fact that the claim was factually wrong made little difference, especially to those who were already inherently suspicious about digital information. Rush Limbaugh went on to receive the Presidential Medal of Freedom from

Donald Trump in what for many served as confirmation of Limbaugh’s statements (Vigdor, 2020).

Another example of untruth as part of a political struggle was Boris Johnson’s claim that “the U.K. pays £350 million per week to the EU” (Dallison, 2019). Dallison’s article goes on to elaborate:

The NHS cash claim – which is false – was plastered all over a campaign bus that toured Britain in the run-up to the Brexit referendum in 2016, promising to spend that money on the U.K.’s National Health Service instead of on EU membership.

It is hard to know with any certainty whether the untruth was a deliberate election tactic or merely grew out of Boris Johnson’s relaxed attitude toward facts and numbers. Max Hastings (2019), Johnson’s former boss at the Daily Telegraph, offered an explanation:

I have known Johnson since the 1980s, when I edited the Daily Telegraph and he was our flamboyant Brussels correspondent. I have argued for a decade that, while he is a brilliant entertainer who made a popular maître d’ for London as its mayor, he is unfit for national office, because it seems he cares for no interest save his own fame and gratification.

It may well be that people who initiate lies on public media are in some sense entertainers, whose primary goal is to get attention, and that a reason why the lies work is that some portion of the public willingly believes the story instead of what may be a more complicated truth. Hugh Trevor-Roper (1973, p. xxxi) wrote about Hitler as a “terrible simplifier” who disdained complex problems. Simple answers have a broad appeal for a portion of the public who do not understand the nuances of policy issues and want explanations that help make issues more comprehensible and more relevant to their lives. Repeating a lie is a strategy that seems to work to make it more acceptable.
The “stab in the back” legend (“Dolchstosslegende”) in post-First World War Germany began with soldiers who believed the wartime propaganda that Germany would win, grew stronger because the Versailles treaty treated the Armistice as if it were a surrender, and kept growing as nationalist groups, including the Nazis, repeated it as a justification for their disdain for the Weimar Republic. This is only one well-known example of the phenomenon of an untruth gaining acceptance. Joseph McCarthy’s anti-Communist campaign built on widespread fears in the US, and films that showed the UK struggling against Nazi aggression in the Second World War helped to reinforce a sense in the UK that Europe was the enemy, a sense that Boris Johnson invoked indirectly in his references to Winston Churchill. The anti-vaccination movement in the US and the disdain for wearing masks during the COVID-19 pandemic are contemporary examples. Untrue claims that have achieved a broad degree of popularity are hard to eradicate once they take hold,

even when, as with vaccinations or masks, the lives of the believers are at stake. Fighting fears with facts is hard in an atmosphere where the scientific process for establishing fact has itself lost public trust. In late November 2020 when Donald Trump continued his demonstrably false claim that he had not lost the US presidential election, Bittner (2020) cited the “Dolchstosslegende” and wrote that Trump’s claim: … should be seen as what it is: an attempt to elevate “They stole it” to the level of legend, perhaps seeding for the future social polarization and division on a scale America has never seen. It seemed not to matter to many Trump followers that the courts have thrown out case after case claiming false results as lacking evidence.

Followers

Those who intentionally invent untruths are probably a small subset of the population, partly because it takes a certain kind of skill and a degree of showmanship to fashion a widely acceptable falsehood. A very much larger population segment that is involved with information integrity violations does nothing more than spread the fake news. These people come in many types, the most common of which could well be those who really believe the falsehood and pass it on to friends and relatives as true. Trust is one of the key criteria for evaluating information, and trust tends to be hierarchical. Members of a religious congregation have a predisposition to trust their pastor or priest. Sermon-giving is an established means for explaining what to believe, and an individual pastor or priest is more likely to accept the word of their peers or church hierarchy than to express independent scepticism based on scientific facts. This is only human and explains how rumours and untruths spread within communities somewhat in the way that COVID-19 superspreaders transmit the virus to people with whom they are in close contact.

The adherents of talk show personalities function also as superspreaders in society. They hear an explanation they like and share it in their workplace or with family. It is entirely possible that they hear no counterarguments, because information communities also tend to segregate into groups with particular points of view and particular criteria that convince those within the community while ignoring those outside of it. Most information communities are, however, not exclusive, and how much a community shapes criteria for judging varies. Not every professor at every university accepts the same scholarly conclusions outside of the natural sciences, where scientific logic follows strict rules and requires that results are physically demonstrable. The rules of logic in other scholarly branches offer more flexibility and room for dispute.
Nonetheless long years of formal education and training tend to shape people to share expectations for certain kinds of evidence and certain forms of logic. Outsiders sometimes protest that universities are not open to diverse opinions, but there is evidence that the opposite is true. Lee Bollinger (2019), the former president of Columbia University, wrote:

According to a 2016 Knight Foundation survey, 78 percent of college students reported they favor an open learning environment that includes offensive views. President Trump may be surprised to learn that the U.S. adult population as a whole lags well behind, with only 66 percent of adults favoring uninhibited discourse. Listening to multiple viewpoints is normal in academic discourse. An important factor is that members of the university community have scientific criteria for judging claims that outsiders may not share. Measuring the number of people believing and potentially transmitting untrue information is hard because reliable categorisation fails without precise definitions, which do not as yet exist. Nonetheless some broad categories have become measurable because of public awareness and concern. One such category is a distrust of vaccines in the US: A recent online survey of more than 2,000 U.S. adults, conducted by The Harris Poll on behalf of the American Osteopathic Association, revealed that more than 2 in 5 American adults (45 percent) say something has caused them to doubt vaccine safety. (American Osteopathic Association, 2019) Doubt is a vague survey criterion that probably makes the number of positive responses larger than if the survey asked specifically whether people would decline vaccination, but even so it gives a sense of how widespread the concern about vaccines has become. Examples exist of vaccines that have had dangerous consequences, such as the 1955 polio vaccine: It was soon discovered that some lots of Cutter and Wyeth polio vaccine were insufficiently inactivated with formalin leading to live polio virus in more than 100,000 doses. In fact, 16 lots of Cutter polio vaccine were retested and the first six lots produced were positive for live polio virus. These incidents demonstrated the lack of oversight and safeguards put into place before the vaccine was made so widely available. 
(Juskewitch et al., 2010) The problem was, as noted here, a lack of testing, and this experience is one of several reasons why standards for testing have become much stricter. Nonetheless public memory is long and new doubts about vaccines arose during the controversy over whether some vaccines could cause autism. Gerber and Offit (2009) wrote: On 28 February 1998, Andrew Wakefield, a British gastroenterologist, and colleagues [1] published a paper in The Lancet that described 8 children whose first symptoms of autism appeared within 1 month after receiving an MMR vaccine.

One of the problems in the Wakefield study was a lack of control subjects “which precluded the authors from determining whether the occurrence of autism following receipt of MMR vaccine was causal or coincidental.” Gerber and Offit (2009) wrote that in the end “[t]wenty epidemiologic studies have shown that neither thimerosal nor MMR vaccine causes autism.” Once the concern reached public attention, however, the scientific evidence failed to convince everyone that vaccines were safe.

Spreading untrue information happens at some level in most organisations as rumours spread in lunchrooms or on coffee breaks. There is no reliable measure of how many staff at, for example, Boeing continued to say that the 737 Max was safe after the two crashes, but it is reasonable to expect that many or even most employees continued to believe company claims, partly out of loyalty to the firm, partly because decent people would not want to think that a product they worked on directly or indirectly caused people to die. Even once the evidence became clear that the Maneuvering Characteristics Augmentation System (MCAS) software played a significant role in causing the crashes, some in the company defended it. Several factors hampered the software developers, including outdated computers. Boeing initially chose to minimise risk by choosing hardware that “had proven to be safe” and therefore stuck with a model “first built in 1996” that used “a pair of single-core, 16-bit processors” (Campbell, 2020). Staff in the company continued to believe that software could fix the problems with the 737 Max:

In June 2019, Boeing submitted a software fix to the FAA for approval, but subsequent stress-testing of the Max’s computers revealed more flaws than just bad code. They are vulnerable to single-bit errors that could disable entire control systems or throw the airplane into an uncommanded dive. They fail to boot up properly. They may even “freeze” in autopilot mode even when the airplane is in a stall, which could hamper recovery efforts in the middle of an in-flight emergency. (Campbell, 2020)

The Federal Aviation Administration (FAA) has not insisted on more computers (the Airbus Neo has seven) and felt political pressure to certify the fix. Complex situations like this make it easy for employees to rally around company claims, especially when their jobs are at stake. The jobs of the directors who are responsible for risky decisions and untrue claims generally remain safe.

Victims

Victims of information integrity problems come in many types. This section will discuss three categories of victims: 1) those who are easily identified and measured, which is the rarest type; 2) those affected mainly indirectly by integrity issues and whose numbers are hard to measure; and 3) the invisible victims whose cases count theoretically, but who may not know or care about what happened and cannot easily be counted at all.

Immediate victims

Situations in which the immediate victims are obvious are few. One example is those who died in the 737 Max crashes because the software failed to adjust accurately for airframe design problems. The same is true for people who die because of false information about vaccine safety, either because they were afraid to take an immunisation that might have saved their lives or, in rarer cases, because of quality control or testing problems as with the 1955 polio vaccine. Some non-trivial number of the deaths from COVID-19 can also be attributed to misinformation about masks and the need for social distancing, including those who believed false statements about COVID-19 being no worse than ordinary flu. The excess deaths can be counted, even though no measure tells exactly how many deaths can be ascribed to misinformation. That number can only be estimated by, for example, looking at people who went to indoor public events without wearing masks and later died from COVID-19. Without better mechanisms for tracking the initial spreaders, however, any such estimate is open to dispute.

The victims in these cases are not necessarily more gullible than others. In general, they are people who choose to believe the wrong authority. A significant theme in the 2020 US Presidential campaign was how Donald Trump downplayed the seriousness of the pandemic for personal political reasons. Many people automatically believe that the chief executive of a country (or of a company) represents an authority who has access to reliable information. Believing an authority figure is not unreasonable given social expectations that a chief executive should behave responsibly and will act in the best interests of the people or the organisation, and there are instances where a political leader has in fact sacrificed himself and his party in an attempt to save lives. Robert Peel’s decision to repeal the “Corn Laws” in 1846 to help to alleviate the Irish famine is an example.
Such cases are uncommon and abiding trust in the good intentions of leaders may itself count as a form of misinformation.

Indirect victims

The indirect victims of false information are hard to identify because the consequences often do not become known for years. Trust in the principles of science and the scientific method has itself become an indirect but important victim of the growth of integrity violations. In stress situations like a pandemic, science is not always perfect, especially when genuine scientists feel pressured into making findings available quickly because of the urgency of the situation and fail to do all of the normal testing and reviewing of results. Having genuine scientists rush to publish wrong results only serves to sow more doubt about the reliability of science in general among the broader public. What precisely a “belief” in science means is itself a complex issue. Dagnall et al. (2019) write:

Acknowledging the potentially important role that secular beliefs play in modern society, Farias et al. (2013) developed the Belief in Science Scale (BISS). The BISS is a 10-item research tool, which measures the degree to which individuals endorse the legitimacy of the scientific approach… Accordingly, the scale recognizes differences in attitudes toward science. These range from rejection of the scientific approach, through acceptance of science as a reliable but fallible source of knowledge, to the conviction that science provides exclusive, veridical insights into reality. The important point here is that a belief or trust in scientific results is a grey-scale range with some variations that include conditions that might, under certain circumstances, be called a healthy scepticism toward counterintuitive results. Readers should remember that what counts as counterintuitive is a culturally based construct. An example may be the degree to which people in more individualistic western societies tend “intuitively” to regard the requirement to wear masks as a restriction on their personal liberty, while people in east Asian countries more willingly accept the scientific advice that mask-wearing protects against COVID-19 transmission. Funk (2020) wrote about the results of a Pew Research survey in an article entitled “Key Findings about Americans’ Confidence in Science and Their Views on Scientists’ Role in Society.” The data came from five surveys that were done between March 2016 and October 2019. All surveys were conducted using the American Trends Panel (ATP), an online survey panel that is recruited through national, random sampling of residential addresses. This way nearly all U.S. adults have a chance of being selected. The survey is weighted to be representative of the U.S. adult population by gender, race, ethnicity, education and other categories. 
(Funk, 2020) One result was that “[a]bout three-quarters of Americans (73%) say science has, on balance, had a mostly positive effect on society.” Nonetheless there is also substantial scepticism about the reliability of results: Overall, a 63% majority of Americans say the scientific method generally produces sound conclusions, while 35% think it can be used to produce “any result a researcher wants.” People’s level of knowledge can influence beliefs about these matters, but it does so through the lens of partisanship, a tendency known as motivated reasoning. (Funk, 2020) The statistic that a third of Americans believe that scientists can get any result they want suggests that those people may be vulnerable to fake news and its consequences for health and behaviour. Such scepticism weakens the impact of

scientific results generally. When a non-trivial proportion of the population becomes sceptical toward science, the doubt can itself become contagious. The Pew study looked at the attitudes of people in the US. Other studies in Germany and the UK give somewhat different results: Results from both Germany and the UK suggest that public trust in science and researchers appears to have soared during the coronavirus pandemic. The proportion of Germans who said that they trust science and research “wholeheartedly” shot up to 36 per cent in mid-April. This is four times the proportion recorded in the same survey in 2019 and substantially higher than in earlier years. Another 37 per cent said that they were “likely” to trust science and research. A fifth were undecided, while six per cent were lacking in trust. (Matthews, 2020) It is important to remember that these surveys are not directly comparable, so the 20% undecided and the 6% sceptical (26% total) are not answering the same questions as the 35% from the US Pew Research survey who claim to believe science can produce any results a researcher wants.

Invisible victims

While the lost trust in science is an abstract if important topic, the monetary costs to publishers and universities of discovering and dealing with integrity violations of all sorts are more concrete. They are nonetheless still hard to measure with any precision, since the financial numbers are mostly not publicly available. These costs make publishers and universities into victims as well since they would far rather not have to spend money to deal with the problem. Plagiarism discovery software is one of the clearest costs. In the case of iThenticate, the software pricing is transparent on its website for submissions of up to 25,000 words (US $100) and 75,000 words (US $300). A 2012 article by iThenticate staff reported:

A Fall 2011 survey conducted by iThenticate, which sampled nearly 200 of its customers across five industry types (government, non-profit, publishing, research, and scientific, technical and medical (STM)), found that 60 percent of respondents who dealt with an incident reported the total value of capital losses for the organization to cost up to $10,000. Ten percent reporting damages up to $50,000.

This article builds on a study by Michalek et al. (2010) that explains the basis of some of the costs:

Costs of the investigation may be divided into personnel (committee membership, witnesses, and support staff), material costs, and consultant costs. The most expensive component of any investigation is faculty time. Faculty members

engaged in our reviews are usually associate or full professors. Faculty members on investigation committees spend considerable time both in and out of the formal committee meetings. Time spent outside formal meetings is directed at reviewing materials, securing additional information, reanalysis of data, writing, and other preparatory activities. Our experience is that faculty spend anywhere from three to ten times more time working outside of meetings than they do in meetings. While these studies are old, the efforts to address integrity problems at universities and publishers have grown rather than diminished, which means that the costs today are likely substantially higher than in 2010. Some of these are opportunity costs rather than out-of-pocket, which are real but not readily visible on anyone’s bottom line, which makes measurement harder. The line between direct and indirect victims is often fuzzy. In the case of Diederik Stapel, his doctoral students became direct victims when he gave them fake data for their dissertations. In fact all students who read articles with false or manipulated data count as victims if they believe and act on the results. In the natural sciences a failure to be able to reproduce fake results in a lab experiment could discourage some students from continuing their studies. This scenario is likely rare but not impossible and illustrates how hard it is to determine boundaries for the consequences of false information. For the public, the victims of false information are even harder to identify. Historically there are cases where the acceptance of false claims stopped abruptly, such as at the end of the Cold War when the propaganda about economic and social successes in the eastern bloc and failures in the west lost credibility. 
It is far too simplistic to claim that a rejection of fake news brought down the Soviet Empire, since too many other factors contributed, but the change in attitude demonstrates the limits of lies once the fear of retaliation is removed. The number of people who believed the propaganda was probably in the millions and some may believe elements of it still. Lies are not always caught in the end, but genuine facts tend in the long run to undermine false information and to show its believers to be victims. Smoking is an example where some level of disbelief in the dangers persists, even while the number of smokers has grown smaller. Not everyone dies from smoking, but the long-term health effects are facts that few now deny. Trump supporters have also persisted in claiming fraud in the 2020 presidential election despite the lack of evidence. How long that belief continues remains to be seen and could affect future elections.

The indirect victims of information integrity problems may deny that they are victims at all. This includes the thousands who have attended events during the pandemic without precautions such as masks or social distancing. For people who enjoy the tightly packed atmosphere of a pub or party or political event, the salient issue may well not be the truth of the scientific advice to wear masks or maintain social distance but rather the ordinary human desire to do what they have always done for comfort and pleasure. Some people may not be able to resist the temptation even when they know the risk, as with smoking. They are less victims of false information than of human weakness.

Proponents of plagiarism-checking are correct in saying that it is illegal to reuse copyright-protected materials, and certainly there can be substantial costs if a case goes to court. Despite that, there are few articles with reliable data about the monetary cost of plagiarism, especially plagiarism in academic works. Music piracy (the industry term) means unauthorised use and is not strictly speaking identical with plagiarism, though the two are clearly related. The financial losses from piracy are substantial according to the RIAA (Recording Industry Association of America) (2020): Music theft also leads to the loss of $2.7 billion in earnings annually in both the sound recording industry and in downstream retail industries. Book plagiarism was much discussed before international treaties went into effect. Some book plagiarism still exists, often in the form of digital copies on websites, but a reliable quantification of the losses is hard to find. At the article level a quantification of the financial losses is hard, if only because it is difficult to determine whether publishers lose customers because content is available elsewhere. For the theft of whole articles, some such calculation may be possible, but much of the concern about academic plagiarism today is at the paragraph, sentence, or phrase level where tracking financial losses is essentially absurd. iThenticate (2020) has a blog where it talks in general terms about monetary costs: Many recent news reports and articles have exposed plagiarism by journalists, authors, public figures, and researchers. In the case where an author sues a plagiarist, the author may be granted monetary restitution. In the case where a journalist works for a magazine, newspaper or other publisher, or even if a student is found plagiarizing in school, the offending plagiarist could have to pay monetary penalties. The use of the subjunctive “could” suggests a lack of real data. 
The same blog discusses legal repercussions: The legal repercussions of plagiarism can be quite serious. Copyright laws are absolute. One cannot use another person’s material without citation and reference. An author has the right to sue a plagiarist. Some plagiarism may also be deemed a criminal offense, possibly leading to a prison sentence. Evidence of anyone going to prison for plagiarism in western countries is hard to find. Legal costs and fines are real, but are rarely applied to academic cases. Certainly, copyright holders are in some sense victims of plagiarism, but the degree to which they care enough to take action seems limited. The real victims of plagiarism may be the plagiarists themselves.

Summary

The boundaries between the types of actors in information integrity cases are often fuzzy. One person may play investigator and simultaneously act as judge, depending on the circumstances. Violators can become victims themselves, for example in plagiarism cases where the copying resulted from the accused’s own carelessness rather than from an actual intention to plagiarise. Victims can also inadvertently cause new integrity problems when they pass on false information in good faith. Many information integrity problems do not stay neatly in boundaries, and they can infect whole societies with doubt about which claims to trust. The erosion of trust in science is one example.

Aggressive unfounded claims can lead to an erosion of trust in institutions. This can be seen at a national level not only in campaign claims about “draining the swamp” to end corruption in Washington, but also in claims that mail-in voting could lead to widespread election fraud in the US. In a country that has long suffered from social, economic, and racial divisions, trust in the basic integrity of the voting system has acted as a safety valve. Serious evidence that widespread fraud occurred in the 2020 election seems not to exist, and the claim now appears to be no more than a campaign tactic to cast doubt on an unfavourable outcome. Yet it has had an unsettling effect. Less than two weeks before the election, news stories about Russian and Iranian attempts to influence the election appeared in the New York Times:

The Trump administration’s announcement that a foreign adversary, Iran, had tried to influence the election by sending intimidating emails was both a stark warning and a reminder of how other powers can exploit the vulnerabilities exposed by the Russian interference in 2016. But it may also play into President Trump’s hands. For weeks, he has argued, without evidence, that the vote on Nov. 3 will be “rigged,” that mail-in ballots will lead to widespread fraud and that the only way he can be defeated is if his opponents cheat. (Barnes and Sanger, 2020)

The timing of the government announcement, as the article points out, played into one of the themes of the Trump campaign. Nineteenth-century Britain had a long history of people attacking the fairness of the electoral system, but the issue there was an imbalance of power in voting rights. The same was true in the US before the Voting Rights Act of 1965. Those criticisms had substantial data behind them. The impact of Russian and Iranian influence remains to be measured, and evidence of widespread mail-in ballot fraud is nil. Lies have always been part of electoral campaigns, but that hardly justifies them. The long-term damage from the campaign lies of 2020 will only be measurable in future decades.

Notes
1 Personal communications to the author.
2 As of 11 May it appears that the Free University will withdraw Franziska Giffey’s doctorate (Wehner, 2021).

7 CONCLUSION AND CONSEQUENCES

DOI: 10.4324/9781003098942-7

Introduction

The consequences of information integrity violations have been a recurrent theme in the previous chapters. This conclusion puts them in a broader social and historical context and looks at the role that measurement plays across lifetimes and across centuries. The first topic is information fraud in the scholarly and scientific community, and the second is fraud in the wider world.

Scholarly fraud

Plagiarism is the largest category of integrity fraud in academic journals. At the individual level an intelligent person who consciously plans to plagiarise could attempt to find out the statistical likelihood of being caught and the quantity of copying that might on average be ignored. One approach might, for example, be to study the Retraction Watch Database to find journals that retract articles due to plagiarism and to avoid them, especially when they appear to use plagiarism checkers. Such a person might also look at public plagiarism cases to see how much and what kind of copying is tolerated. The website of Editage (2018) suggests there is a convention that “a text similarity below 15% is acceptable by the journals…” The number will certainly vary depending on the journal and the kind of copying. For many investigators, 15% might be much too high. Individuals who are inclined deliberately to risk plagiarism could research which journals fail to do serious checking in order to minimise their likelihood of being caught, but the benefit may not (and ideally should not) justify the risk.

It may be possible to calculate a figure for known cases of plagiarism as a percentage of all publications. The denominator for the ratio would be the total number of articles. Altbach and de Wit (2018) write:

No one knows how many scientific journals there are, but several estimates point to around 30,000, with close to two million articles published each year.

Two million articles per year provides a plausible base and the number matches the estimate in the STM Report for 2018 (Johnson et al., 2018). According to a footnote, the information comes from Ulrich’s Web Directory, which is a standard source. The numerator for the calculation could come from the Retraction Watch Database, which lists 2535 plagiarism-related retractions as of 4 February 2021. This number represents only known plagiarism cases and is certainly low. It leaves out cases VroniPlag has flagged, for example, which do not count as journal publications in any case. Assuming that the Retraction Watch cases all came from a single year, as an extreme compensation for missing cases, the amount of plagiarism in scientific journals would still only be 0.13% of Altbach’s estimate. A search of the database for plagiarism-related retractions in the year 2018 alone (the year of Altbach’s numbers) yields only 127 instances, so the percentage of plagiarism for 2018 could be as low as 0.0064%. The important point here is that the amount of known plagiarism may well be less than one percent of scholarly publications.

The damage that plagiarism does to science in the broadest sense should also be kept in perspective. Plagiarism is an ethical problem and could be a copyright issue in some cases, but plagiarism per se does not undermine the ability to build on existing scientific results unless the plagiarised content itself has other flaws. In the scholarly sphere, data falsification and image manipulation do direct damage to science, because any new results based on those articles build on a false base and cannot be trusted. As with plagiarism, measurement is possible but not easy. The total number of retractions in the Retraction Watch Database as of 4 February 2021 was 23,491.
Deducting the total number of plagiarism cases gives 20,956 or 1.05% of Altbach’s two million articles, assuming the total were for one year in order to compensate for missing cases. The total number of retractions for 2018 (the year of Altbach’s number) in the Database is 1348, and deducting the 127 plagiarism cases gives 1221, which is 0.06% of all scientific publications. While these estimates in fact prove nothing, their goal is to give readers a plausible perspective on the amount of genuinely damaging information fraud in scientific publishing. Damaging information fraud does exist and should not be trivialised. It needs to be uncovered and prevented, but the scale of the fraud should not cause people to lose faith in the integrity of the scientific community or the scientific method. That is the key point here.
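
The percentages in the preceding paragraphs can be reproduced with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: the counts are those quoted in the text, the figure of 1348 is read as the total number of retractions dated 2018 (an interpretation, since the 127 plagiarism cases are then deducted from it), and the constant and function names are hypothetical.

```python
# Counts quoted in the text; all names here are illustrative only.
ARTICLES_PER_YEAR = 2_000_000   # Altbach and de Wit (2018) estimate
PLAGIARISM_ALL = 2_535          # plagiarism retractions, all years, as of 4 Feb 2021
RETRACTIONS_ALL = 23_491        # all retractions, all years, as of 4 Feb 2021
PLAGIARISM_2018 = 127           # plagiarism retractions dated 2018
RETRACTIONS_2018 = 1_348        # assumed: all retractions dated 2018


def percent_of_articles(cases: int, articles: int = ARTICLES_PER_YEAR) -> float:
    """Return `cases` as a percentage of one year's article output."""
    return 100 * cases / articles


def estimates() -> dict:
    """Shares of yearly output: all years treated as one year, and 2018 alone."""
    return {
        "plagiarism, all years as one": percent_of_articles(PLAGIARISM_ALL),
        "plagiarism, 2018 only": percent_of_articles(PLAGIARISM_2018),
        "other retractions, all years as one": percent_of_articles(
            RETRACTIONS_ALL - PLAGIARISM_ALL
        ),
        "other retractions, 2018 only": percent_of_articles(
            RETRACTIONS_2018 - PLAGIARISM_2018
        ),
    }


if __name__ == "__main__":
    for label, value in estimates().items():
        print(f"{label}: {value:.4f}%")
```

After rounding, these values agree with the 0.13%, 0.0064%, 1.05% and 0.06% given in the text. Treating the retractions of all years as if they belonged to a single year deliberately overstates the rate, so the first and third figures are best read as generous upper bounds on known cases rather than as measurements.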

Fraud in the wider world

Information fraud outside of the scholarly world is harder to measure, partly because of its integration into popular myths. Public information fraud is dangerous because it can and does lead to deadly and damaging policy decisions. For example, the previous chapters discussed information fraud involving topics like the

environment and COVID-19. There is evidence to suggest that the failure to control the pandemic may have caused over a million deaths worldwide. Of course, not all deaths are due to policies based on false information, and measurement is hard. Excess mortality figures offer some information, but not a model to explain the effect policy decisions might have had. The economist Jeffrey Sachs (2020) and three colleagues directly addressed the policy issue in October 2020 in an attempt at a calculation: Through comparative analysis and applying proportional mortality rates, we estimate that at least 130,000 deaths and perhaps as many as 210,000 could have been avoided with earlier policy interventions and more robust federal coordination and leadership. Their method was to compare the US numbers with six other “more proactive high-income countries…,” namely South Korea, Japan, Australia, Germany, Canada and France. The justification for choosing these countries is vague and the differences among them and with the US are too significant to ignore as factors. The model is simplistic, but at least attempts to address the policy consequences. It is even harder to measure how false information in social media or broadcast news contributed to COVID-19 deaths, even though the number of people attending parties or events without masks and without social distancing appears to have a direct connection to false information. It may be equally true that the false information merely justifies risky behaviour that people would follow anyway without some strict legal and social enforcement. The number is not currently measurable. Looking at false information from a broader historical perspective, it is nothing new. Social illusions that count today as false information have embedded themselves in many cultures and still serve as reasons for racism, violence, or aggression. 
There is a reasonable argument that Brexit builds on a widespread illusion that the UK can go it alone, as it did in the early months of the Second World War. Free market claims from the Thatcher-Reagan era that reduced regulation would spread prosperity seem less credible today than in the 1980s. The most serious damage that false information seems to have done is perhaps to reduce popular trust in scientific and scholarly results. In the 1950s in the midst of the Cold War and the space race many ambitious students wanted to study the natural sciences. The interest today has declined for a variety of social and economic reasons unconnected with information falsehoods, but a consequence of this social shift may be one of the reasons why the respect for science has also declined. Scientific fraud only makes the situation worse and allows commentators speaking in their own self-interest to claim that any scientific results they dislike are invalid. Even if the measurable amount of information fraud coming from scientific journals is objectively small, it acts as a seed that spreads doubt throughout society. Measurement is only one of many tools, but reliable measurements mean reliable data and that is the starting point for science as we know it.

Some factors that led to distrust in scientific results have receded, at least for the moment, and the political leadership in key countries such as the United States has reaffirmed the importance of building policies on a solid scientific basis. This is an important victory, but complacency is dangerous and readers of this book should not forget how important it is to defend the integrity of information in all of its forms.

BIBLIOGRAPHY

AAMC (Association of American Medical Colleges). n.d. ‘MD-PhD Degree Programs by State.’ https://students-residents.aamc.org/applying-medical-school/article/mdphd-degreeprograms-state/.
Abbott, Andrew. 2001. Chaos of Disciplines. Chicago: University of Chicago Press.
Acuna, Daniel E., Paul S. Brookes, and Konrad P. Kording. 2018. ‘Bioscience-Scale Automated Detection of Figure Element Reuse.’ BioRxiv. https://doi.org/10.1101/269415.
ALLEA. 2020a. ‘ALLEA in Brief.’ https://allea.org/allea-in-brief/.
ALLEA. 2020b. ‘The European Code of Conduct for Research Integrity.’ https://ec.europa.eu/research/participants/data/ref/h2020/other/hi/h2020-ethics_code-of-conduct_en.pdf.
Altbach, Philip, and Hans de Wit. 2018. ‘Too Much Academic Research Is Being Published.’ University World News. www.universityworldnews.com/post.php?story=20180905095203579.
American Osteopathic Association. 2019. ‘45 Percent of Surveyed American Adults Doubt Vaccine Safety.’ Infection Control Today, 24 June. www.infectioncontroltoday.com/view/45-percent-surveyed-american-adults-doubt-vaccine-safety.
Anderson, Nick, and Susan Svrluga. 2019. ‘Universities Worry about Potential Loss of Chinese Students.’ Washington Post, 4 June, sec. Education. www.washingtonpost.com/local/education/universities-worry-about-potential-loss-of-chinese-students/2019/06/03/567044ea-861b-11e9-98c1-e945ae5db8fb_story.html.
ArXiv. n.d. ArXiv.org e-Print Archive. https://arxiv.org/.
Badge, J. L., and R. M. Badge. 2009. ‘Online Lab Books for Supervision of Project Students.’ Bioscience Education 14 (1): 1–4. https://doi.org/10.3108/beej.14.c1.
Bailey, Jonathan. 2013. ‘The Difference between Copyright Infringement and Plagiarism.’ Plagiarism Today (blog). www.plagiarismtoday.com/2013/10/07/difference-copyrightinfringement-plagiarism/.
Ballhaus, Rebecca. 2020. ‘Trump, in Bob Woodward Interview, Said He Played Down Coronavirus’s Severity.’ Wall Street Journal, 10 September, sec. Politics. www.wsj.com/articles/trump-says-he-played-down-coronaviruss-severity-in-bob-woodward-interview11599675374.

Barnes, Julian E., and David E. Sanger. 2020. ‘Iran and Russia Seek to Influence Election in Final Days, U.S. Officials Warn.’ The New York Times, 21 October, sec. U.S. www.nytimes.com/2020/10/21/us/politics/iran-russia-election-interference.html.
Barrionuevo, Alexei. 2006. ‘Enron Chiefs Guilty of Fraud and Conspiracy.’ The New York Times, 25 May, sec. Business. www.nytimes.com/2006/05/25/business/25cnd-enron.html.
BBC News. 2018. ‘Melania Trump Faces New Plagiarism Row.’ BBC News, 8 May, sec. US & Canada. www.bbc.com/news/world-us-canada-44038656.
Beck, Thorsten. 2019. ‘HEADT Centre – How to Detect Image Manipulations Part 5.’ HEADT Centre, 2 April. https://headt.eu/How-to-Detect-Image-Manipulations-Part-5/.
Belam, Martin. 2020. ‘Prestigious US Science Journal to Back Biden in First Endorsement in 175-Year History.’ The Guardian, 16 September, sec. US news. www.theguardian.com/us-news/2020/sep/16/prestigious-us-science-journal-breaks-with-tradition-to-back-biden.
Bhalla, Jag. 2020. ‘Coronavirus in Germany: Angela Merkel Explains the Risks of Loosening Social Distancing Too Fast.’ www.vox.com/2020/4/17/21225916/coronavirus-in-germany-angela-merkel-lifting-lockdown.
Bik, Elisabeth. 2019. ‘Science Integrity Digest FAQ.’ Science Integrity Digest (blog). 23 September. https://scienceintegritydigest.com/frequently-asked-questions/.
Bittner, Jochen. 2020. ‘1918 Germany Has a Warning for America.’ The New York Times, 30 November, sec. Opinion. www.nytimes.com/2020/11/30/opinion/trump-conspiracygermany-1918.html.
Björk, Bo-Christer. 2018. ‘Evolution of the Scholarly Mega-Journal, 2006–2017.’ PeerJ 6 (February): e4357. https://doi.org/10.7717/peerj.4357.
Bleek, Wilhelm. 2020. ‘Politikwissenschaft.’ bpb.de. www.bpb.de/nachschlagen/lexika/handwoerterbuch-politisches-system/202090/politikwissenschaft.
BMBF-Internetredaktion. n.d. ‘Was Forschende und Lehrende wissen sollten.’ Bundesministerium für Bildung und Forschung (BMBF). www.bmbf.de/de/was-forschende-und-lehrendewissen-sollten-9523.html.
BMBF-Internetredaktion. 2020. ‘Gesetzliche Erlaubnis und Zitatrecht.’ Bundesministerium für Bildung und Forschung – BMBF Digitale Zukunft. 2020. www.bildung-forschung.digital/de/gesetzliche-erlaubnis-und-zitatrecht-2651.html.
Bollinger, Lee C. 2019. ‘Free Speech on Campus Is Doing Just Fine, Thank You.’ The Atlantic, 12 June. www.theatlantic.com/ideas/archive/2019/06/free-speech-crisis-campus-isnt-real/591394/.
Booth, Wayne C. 1961. Rhetoric of Fiction. Chicago: University of Chicago Press.
Bourdeau, Michel. 2020. ‘Auguste Comte.’ In The Stanford Encyclopedia of Philosophy, edited by Edward N. Zalta. Stanford: Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/fall2020/entries/comte/.
Bretag, Tracey, ed. 2016. Handbook of Academic Integrity. Singapore: Springer Singapore. https://doi.org/10.1007/978-981-287-098-8.
Bundesministerium für Bildung und Forschung. n.d. ‘Urheberrecht in der Wissenschaft.’ www.bmbf.de/upload_filestore/pub/Handreichung_UrhWissG.pdf.
Bury, J. B., and Russell Meiggs. 1975. History of Greece. 4th edn. New York: St. Martin’s Press.
Calvert, Philip. 2015. ‘Should All Lab Books Be Treated as Vital Records? An Investigation into the Use of Lab Books by Research Scientists.’ Australian Academic & Research Libraries 46 (4): 291–304. https://doi.org/10.1080/00048623.2015.1108897.
Campbell, Darryl. 2020. ‘The Ancient Computers in the Boeing 737 Max Are Holding up a Fix.’ The Verge. 9 April. www.theverge.com/2020/4/9/21197162/boeing-737-max-software-hardware-computer-fcc-crash.


Campbell, W. Joseph. 2016. ‘No, “Politico” – Hearst Didn’t Vow to “Furnish the War.”’ Media Myth Alert (blog). 18 December. https://mediamythalert.com/2016/12/18/no-politico-hearst-didnt-vow-to-furnish-the-war/.
CDC (US Centers for Disease Control and Prevention). 2020. ‘Excess Deaths Associated with COVID-19.’ 29 May. www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm.
Center Watch Staff. 2017. ‘Publication Bias and Clinical Trial Outcomes Reporting.’ 12 June. www.centerwatch.com/articles/13512.
Centre for Technology Enhanced Learning, King’s College London. 2020. ‘Submitting a Turnitin Assignment.’ www.kcl.ac.uk/teachlearntech/assets/submitting-a-turnitin-assignment.pdf.
CERN. 2019. ‘CMS Releases Open Data for Machine Learning.’ https://home.cern/news/news/knowledge-sharing/cms-releases-open-data-machine-learning.
Chankova, Mariya. 2019. ‘Teaching Academic Integrity: The Missing Link.’ Journal of Academic Ethics. https://doi.org/10.1007/s10805-019-09356-y.
Chen, Xiaotian. 2019. ‘Scholarly Journals’ Publication Frequency and Number of Articles in 2018–2019: A Study of SCI, SSCI, CSCD, and CSSCI Journals.’ Publications 7 (3): 58. https://doi.org/10.3390/publications7030058.
Chokshi, Niraj. 2020. ‘House Report Condemns Boeing and F.A.A. in 737 Max Disasters.’ The New York Times, 16 September, sec. Business. www.nytimes.com/2020/09/16/business/boeing-737-max-house-report.html.
Christopher, Jana. n.d. ‘Image Integrity.’ https://image-integrity.com/index.html.
Clark, Mark. 2015. Star Wars FAQ: Everything Left to Know About the Trilogy That Changed the Movies. Milwaukee: Hal Leonard Corporation.
Clement, Kai. 2020. ‘Giffey verzichtet auf Doktortitel: Flucht nach vorn?’ Tagesschau. 13 November. www.tagesschau.de/inland/giffey-doktortitel-105.html.
Colquhoun, David. 2007. ‘How to Get Good Science.’ PN Soapbox. http://dcscience.net/colquhoun-goodscience-jp-version-2007.pdf.
Cook, John, Dana Nuccitelli, Sarah A. Green, Mark Richardson, Bärbel Winkler, Rob Painting, Robert Way, Peter Jacobs, and Andrew Skuce. 2013. ‘Quantifying the Consensus on Anthropogenic Global Warming in the Scientific Literature.’ Environmental Research Letters 8 (2): 024024. https://doi.org/10.1088/1748-9326/8/2/024024.
Cook, John, Naomi Oreskes, Peter T. Doran, William R. L. Anderegg, Bart Verheggen, Ed W. Maibach, J. Stuart Carlton, et al. 2016. ‘Consensus on Consensus: A Synthesis of Consensus Estimates on Human-Caused Global Warming.’ Environmental Research Letters 11 (4): 048002. https://doi.org/10.1088/1748-9326/11/4/048002.
Cornell Legal Information Institute. 2020. ‘17 U.S. Code § 107 – Limitations on Exclusive Rights: Fair Use.’ LII / Legal Information Institute. www.law.cornell.edu/uscode/text/17/107.
CSSE (Center for Systems Science and Engineering at Johns Hopkins University). 2021. ‘Coronavirus COVID-19 (2019-NCoV).’ 26 January. https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6.
Dagnall, Neil, Andrew Denovan, Kenneth Graham Drinkwater, and Andrew Parker. 2019. ‘An Evaluation of the Belief in Science Scale.’ Frontiers in Psychology 10 (April). https://doi.org/10.3389/fpsyg.2019.00861.
Dallison, Paul. 2019. ‘Bid to Prosecute Boris Johnson over Brexit Bus Claim Thrown out by Court.’ Politico. 7 June. www.politico.eu/article/boris-johnson-brexit-bus/.
Data Colada. n.d. ‘About.’ Data Colada (blog). http://datacolada.org/about.
Davey, Melissa L. 2020. ‘The Lancet Changes Editorial Policy after Hydroxychloroquine Covid Study Retraction.’ The Guardian. 22 September. www.theguardian.com/world/2020/sep/22/the-lancet-reforms-editorial-policy-after-hydroxychloroquine-covid-study-retraction.
dejure.org. n.d. Zitate. Urheberrechtsgesetz. § 51. https://dejure.org/gesetze/UrhG/51.html.
Delamater, Paul L., Erica J. Street, Timothy F. Leslie, Y. Tony Yang, and Kathryn H. Jacobsen. 2019. ‘Complexity of the Basic Reproduction Number (R0).’ Emerging Infectious Diseases 25 (1): 1–4. https://doi.org/10.3201/eid2501.171901.
Department of Political Science, College of Liberal Arts. 2020. ‘History.’ https://cla.umn.edu/polisci/about/history.
Der Spiegel. 2019. ‘Franziska Giffeys Doktortitel: Plagiatsjäger Kritisiert Entscheidung der FU Berlin.’ www.spiegel.de/lebenundlernen/uni/franziska-giffeys-doktortitel-plagiatsjaeger-kritisiert-entscheidung-der-fu-berlin-a-1294425.html.
Desilver, Drew. 2018. ‘U.S. Voter Turnout Trails Most Developed Countries.’ Pew Research Center (blog). www.pewresearch.org/fact-tank/2018/05/21/u-s-voter-turnout-trails-most-developed-countries/.
Deutsche Forschungsgemeinschaft. n.d. ‘DFG, German Research Foundation – Handling of Research Data.’ www.dfg.de/en/research_funding/proposal_review_decision/applicants/research_data/index.html.
Deutsche Forschungsgemeinschaft. 2019a. ‘The German Research Ombudsman.’ https://ombudsman-fuer-die-wissenschaft.de/?lang=en.
Deutsche Forschungsgemeinschaft. 2019b. ‘Guidelines for Safeguarding Good Research Practice Code of Conduct.’ www.dfg.de/download/pdf/foerderung/rechtliche_rahmenbedingungen/gute_wissenschaftliche_praxis/kodex_gwp_en.pdf.
Drope, J., and S. Chapman. 2001. ‘Tobacco Industry Efforts at Discrediting Scientific Knowledge of Environmental Tobacco Smoke: A Review of Internal Industry Documents.’ Journal of Epidemiology and Community Health 55 (8): 588–594. https://doi.org/10.1136/jech.55.8.588.
Ecob, Russel. n.d. ‘Reviews.’ https://rss.onlinelibrary.wiley.com/doi/pdfdirect/10.1111/j.1740-9713.2005.00123.x.
Editage. 2018. ‘What Is the Acceptable Percentage of Plagiarism Report?’ Editage Insights. 5 April. www.editage.com/insights/what-is-the-acceptable-percentage-of-plagiarism-report.
Eichenwald, Kurt, and Floyd Norris. 2002. ‘Enron’s Collapse: The Auditor; Enron’s Auditor Says It Destroyed Documents.’ The New York Times, 11 January, sec. Business. www.nytimes.com/2002/01/11/business/enron-s-collapse-the-auditor-enron-s-auditor-says-it-destroyed-documents.html.
Eissen, Sven Meyer zu, and Benno Stein. 2006. ‘Intrinsic Plagiarism Detection.’ In Advances in Information Retrieval, edited by Mounia Lalmas, Andy MacFarlane, Stefan Rüger, Anastasios Tombros, Theodora Tsikrika, and Alexei Yavlinsky. Berlin, Heidelberg: Springer. https://doi.org/10.1007/11735106_66.
Elkind, Peter, and Doris Burke. 2020. ‘Investors Extracted $400 Million From a Hospital Chain That Sometimes Couldn’t Pay for Medical Supplies or Gas for Ambulances.’ ProPublica. 30 September. www.propublica.org/article/investors-extracted-400-million-from-a-hospital-chain-that-sometimes-couldnt-pay-for-medical-supplies-or-gas-for-ambulances.
Ellis, Brenda. 2020. ‘Guides: Internet News, Fact-Checking, & Critical Thinking: Non-Partisan Fact Checking Sites.’ https://middlebury.libguides.com/internet/fact-checking.
Emery, David. 2019. ‘Did Joe Biden Drop Out of the ‘88 Presidential Race After Admitting to Plagiarism?’ Snopes.com. www.snopes.com/fact-check/joe-biden-plagiarism/.
Enserink, Martin. 2012. ‘Fraud-Detection Tool Could Shake Up Psychology.’ Science 337 (6090): 21–22. https://doi.org/10.1126/science.337.6090.21.
etymonline. 2020. ‘Discipline | Origin and Meaning of Discipline by Online Etymology Dictionary.’ www.etymonline.com/word/discipline.


FCC (Federal Communications Commission). 2011. ‘Broadcasting False Information.’ 4 May. www.fcc.gov/consumers/guides/broadcasting-false-information.
Fogel, Robert, and Stanley Engerman. 1974. Time on the Cross: The Economics of American. New York: Norton.
Freie Universität Berlin. 2019. ‘Freie Universität Berlin beschließt, Dr. Franziska Giffey für ihre Dissertation eine Rüge zu erteilen – der Doktorgrad wird nicht entzogen.’ 30 October. www.fu-berlin.de/presse/informationen/fup/2019/fup_19_320-dissertation-franziska-giffey1/index.html.
Frenkel, Sheera. 2020a. ‘Facebook Amps Up Its Crackdown on QAnon.’ The New York Times, 6 October, sec. Technology. www.nytimes.com/2020/10/06/technology/facebook-qanon-crackdown.html.
Frenkel, Sheera. 2020b. ‘Tracking Viral Misinformation Ahead of the 2020 Election.’ The New York Times, 7 October, sec. Technology. www.nytimes.com/live/2020/2020-election-misinformation-distortions.
Funk, Cary. 2020. ‘Key Findings about Americans’ Confidence in Science and Their Views on Scientists’ Role in Society.’ Pew Research Center (blog). 12 February. www.pewresearch.org/fact-tank/2020/02/12/key-findings-about-americans-confidence-in-science-and-their-views-on-scientists-role-in-society/.
Furnell, Steven, and Eugene H. Spafford. 2019. ‘The Morris Worm at 30.’ ITNOW 61 (1): 32–33. https://doi.org/10.1093/itnow/bwz013.
Gaille, Brandon. 2018. ‘17 Academic Publishing Industry Statistics and Trends.’ BrandonGaille.com, 18 November. https://brandongaille.com/17-academic-publishing-industry-statistics-and-trends/.
Garfield, Eugene. 1998. ‘The Impact Factor and Using It Correctly.’ Eugene Garfield Library, University of Pennsylvania. http://garfield.library.upenn.edu/papers/derunfallchirurg_v101%286%29p413y1998english.html.
Gates, Dominic. 2020. ‘Boeing Whistleblower Alleges Systemic Problems with 737 Max.’ The Seattle Times. 18 June. www.seattletimes.com/business/boeing-aerospace/boeing-whistleblower-alleges-systemic-problems-with-737-max/.
Gerber, Jeffrey S., and Paul A. Offit. 2009. ‘Vaccines and Autism: A Tale of Shifting Hypotheses.’ Clinical Infectious Diseases: An Official Publication of the Infectious Diseases Society of America 48 (4): 456. https://doi.org/10.1086/596476.
Giles, Jim. 2005. ‘Internet Encyclopaedias Go Head to Head.’ Nature 438 (7070): 900–901. https://doi.org/10.1038/438900a.
Glasziou, Paul P., Sharon Sanders, and Tammy Hoffmann. 2020. ‘Waste in Covid-19 Research.’ BMJ 369 (May). https://doi.org/10.1136/bmj.m1847.
Gleick, James. 2011. The Information: A History, a Theory, a Flood. New York: Vintage.
Goldman, Gretchen T., Jacob M. Carter, Yun Wang, and Janice M. Larson. 2020. ‘Perceived Losses of Scientific Integrity under the Trump Administration: A Survey of Federal Scientists.’ PLOS ONE 15 (4): e0231929. https://doi.org/10.1371/journal.pone.0231929.
Goodwin, Paul. 2020. ‘Without Learning to Think Statistically, We’ll Never Know When People Are Bending the Truth.’ The Guardian. 31 October. www.theguardian.com/commentisfree/2020/oct/31/think-statistically-truth-falsehood.
Graf, Chris, and Mary Walsh. 2020. ‘Software to Improve Reliability of Research Image Data: Wiley, Lumina, and Researchers at Harvard Medical School Work Together on Solutions.’ 10 June. www.wiley.com/network/featured-content/software-to-improve-reliability-of-research-image-data-wiley-lumina-and-researchers-at-harvard-medical-school-work-together-on-solutions.
Graziosi, Barbara. 2002. Inventing Homer: The Early Reception of Epic. Cambridge: Cambridge University Press.


Grynbaum, Michael M., and Tiffany Hsu. 2020. ‘Advertisers Are Fleeing Tucker Carlson. Fox News Viewers Have Stayed.’ The New York Times, 18 June, sec. Business. www.nytimes.com/2020/06/18/business/media/tucker-carlson-advertisers-ratings.html.
Foucault, Michel. 2016. Histoire de la folie à l’âge classique. Paris: Gallimard.
Hacking, Ian. 1990. The Taming of Chance. Cambridge: Cambridge University Press.
Hadas, Shema. 2014. ‘The Birth of Modern Peer Review.’ Trends in Biotechnology (blog). 19 April. https://blogs.scientificamerican.com/information-culture/the-birth-of-modern-peer-review/.
Hage, Jurriaan, Peter Rademaker, and Nikè Van Vugt. 2010. ‘A Comparison of Plagiarism Detection Tools.’ Technical Report UU-CS-2010–2015. Utrecht: Department of Information and Computing Sciences Utrecht University. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.178.1043&rep=rep1&type=pdf.
Haider, Asad. 2020. ‘IdentityWords and Sequences.’ History of the Present 10 (2): 237–255. https://doi.org/10.1215/21599785-8351841.
Halffman, Willem, and Serge P. J. M. Horbach. 2020. ‘What Are Innovations in Peer Review and Editorial Assessment For?’ Genome Biology 21 (1): 87. https://doi.org/10.1186/s13059-020-02004-4.
Hall, Alex. 1974. ‘By Other Means: The Legal Struggle against the SPD in Wilhelmine Germany 1890–1900.’ The Historical Journal 17 (2): 365–386.
Hand, David J. 2004. Measurement: Theory and Practice: The World through Quantification. London: Hodder Arnold. https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.1740-9713.2005.00123.x.
Harper, Douglas. 2020. ‘Data | Origin and Meaning of Data by Online Etymology Dictionary.’ www.etymonline.com/word/data.
Hastings, Max. 2019. ‘I Was Boris Johnson’s Boss: He Is Utterly Unfit to Be Prime Minister.’ The Guardian. 24 June. www.theguardian.com/commentisfree/2019/jun/24/boris-johnson-prime-minister-tory-party-britain.
Healing Law. 2020. ‘Healing Law: Connecting You with the Perfect Lawyer.’ Healing Law (blog). https://healinglaw.com/lawyers/richmond/.
Herodotus. 1824. The History of Herodotus. Oxford: Talboys and Wheeler.
Herring, David, and Michon Scott. 2020. ‘Isn’t There a Lot of Disagreement among Climate Scientists about Global Warming?’ Climate.gov; Science & Information for a Climate-Smart Nation. 3 February 2020. www.climate.gov/news-features/climate-qa/isnt-there-lot-disagreement-among-climate-scientists-about-global-warming.
Higher Funding Council of England. 2020. ‘Clarivate Analytics Will Provide Citation Data during REF 2021.’ www.ref.ac.uk/news/clarivate-analytics-will-provide-citation-data-during-ref-2021/.
Hiilamo, Heikki, Eric Crosbie, and Stanton A. Glantz. 2014. ‘The Evolution of Health Warning Labels on Cigarette Packs: The Role of Precedents, and Tobacco Industry Strategies to Block Diffusion.’ Tobacco Control 23 (1). https://doi.org/10.1136/tobaccocontrol-2012-050541.
Hudon, Edward G. 1964. ‘Literary Piracy, Charles Dickens and the American Copyright Law.’ American Bar Association Journal 50 (12): 1157–1160.
Humboldt-Universität zu Berlin. 2001. ‘Dokumentenserver der Humboldt-Universität zu Berlin.’ 7 September. https://web.archive.org/web/20010907092650/http://edoc.hu-berlin.de/.
Humboldt-Universität zu Berlin. 2018. ‘Leitlinien Für Den Edoc-Server.’ 2018. https://edoc-info.hu-berlin.de/de/nutzung/nutzung_leitlinien.
Humboldt-Universität zu Berlin. 2020. ‘Vorwürfe wissenschaftlichen Fehlverhaltens – Kommissionen.’ 30 June. https://gremien.hu-berlin.de/de/kommissionen/fehlverhalten.


Institute for Health Metrics and Evaluation, University of Washington. n.d. ‘About the GHDx.’ http://ghdx.healthdata.org/about-ghdx.
Ioannidis, John P. A. 2018. ‘The Proposal to Lower P Value Thresholds to .005.’ JAMA 319 (14): 1429–1430. https://doi.org/10.1001/jama.2018.1536.
iThenticate. 2012. ‘iThenticate Misconduct Report 2012.’ iThenticate White Papers: True Costs of Research Misconduct. www.ithenticate.com/resources/papers/research-misconduct.
iThenticate. 2020. ‘6 Consequences of Plagiarism.’ www.ithenticate.com/resources/6-consequences-of-plagiarism.
Johnson, Rob, Anthony Watkinson, and Michael Mabe. 2018. The STM Report: An Overview of Scientific and Scholarly Publishing. 5th edn. The Hague: International Association of Scientific, Technical and Medical Publishers. www.stm-assoc.org/2018_10_04_STM_Report_2018.pdf.
Juskewitch, Justin E., Carmen J. Tapia, and Anthony J. Windebank. 2010. ‘Lessons from the Salk Polio Vaccine: Methods for and Risks of Rapid Translation.’ Clinical and Translational Science 3 (4): 182–185. https://doi.org/10.1111/j.1752-8062.2010.00205.x.
Kabaservice, Geoffrey. 2020. ‘The Republican Convention Depicted an Alternate Reality. Will Americans Buy It?’ The Guardian, 30 August, sec. Opinion. www.theguardian.com/commentisfree/2020/aug/30/the-republican-convention-depicted-an-alternate-reality-will-americans-buy-it.
Kahneman, Daniel, and Richard Thaler. 2006. ‘Anomalies: Utility Maximization and Experienced Utility.’ Journal of Economic Perspectives 20 (1): 221–234.
Kennedy, Amelia. 2019. ‘A Brief History of the Footnote – from the Middle Ages to Today.’ Quetext (blog). 1 September. www.quetext.com/blog/a-brief-history-of-the-footnote.
King’s College London. 2020. ‘Submitting a Turnitin Assignment.’ www.kcl.ac.uk/teachlearntech/assets/submitting-a-turnitin-assignment.pdf.
Köhler, Katrin, and Debora Weber-Wulff. 2010. ‘Plagiarism Detection Test 2010.’ https://plagiat.htw-berlin.de/wp-content/uploads/PlagiarismDetectionTest2010-final.pdf.
Kolchin, Peter. 1992. ‘More Time on the Cross? An Evaluation of Robert William Fogel’s Without Consent or Contract.’ The Journal of Southern History 58 (3): 491–502. https://doi.org/10.2307/2210165.
Kollewe, Julia, and agencies. 2019. ‘Boeing Chief to Admit Company Made Mistakes over 737 Max.’ The Guardian, 29 October, sec. Business. www.theguardian.com/business/2019/oct/29/boeing-mistakes-737-max.
Kristof, Nicholas. 2020. ‘“We’re No. 28! And Dropping!”’ The New York Times, 9 September, sec. Opinion. www.nytimes.com/2020/09/09/opinion/united-states-social-progress.html.
Lees, Nathalie. 2020. ‘Covid-19 – Scientific Research on the Coronavirus Is Being Released in a Torrent.’ The Economist, 7 May, sec. Science and Technology. www.economist.com/science-and-technology/2020/05/07/scientific-research-on-the-coronavirus-is-being-released-in-a-torrent.
LPIXEL. 2019. ‘LPIXEL Announces Upgrade of ImaChek, the Automated Image Manipulation and Duplication Checking System for Scientific Papers.’ 7 November. https://lpixel.net/en/news/press-release/2019/9100/.
Marcus, Adam, and Ivan Oransky. 2019. ‘Eye for Manipulation: A Profile of Elisabeth Bik.’ The Scientist Magazine. www.the-scientist.com/news-opinion/eye-for-manipulation–a-profile-of-elisabeth-bik-65839.
Markowitz, David M., and Jeffrey T. Hancock. 2014. ‘Linguistic Traces of a Scientific Fraud: The Case of Diederik Stapel,’ edited by Daniele Fanelli. PLoS ONE 9 (8): e105937. https://doi.org/10.1371/journal.pone.0105937.


Matthews, David. 2020. ‘Public Trust in Science “Soars Following Pandemic.”’ Times Higher Education (THE). 7 May. www.timeshighereducation.com/news/public-trust-science-soars-following-pandemic.
Matzko, Paul. 2020. ‘Talk Radio Is Turning Millions of Americans Into Conservatives.’ The New York Times, 9 October, sec. Opinion. www.nytimes.com/2020/10/09/opinion/talk-radio-conservatives-trumpism.html.
Mayo Clinic Staff. 2019. ‘The Truth about Red Wine and Heart Health.’ 22 October. www.mayoclinic.org/diseases-conditions/heart-disease/in-depth/red-wine/art-20048281.
McGreal, Chris. 2009. ‘Anti-Barack Obama “Birther Movement” Gathers Steam.’ The Guardian, 28 July, sec. US news. www.theguardian.com/world/2009/jul/28/birther-movement-obama-citizenship.
McGrory, Kathleen, and Rebecca Woolington. 2020. ‘Florida Halts Coronavirus Death Data.’ Miami Herald. www.miamiherald.com/news/state/florida/article242369266.html.
McInnes, Rod. 2020. ‘General Election 2019: Turnout.’ https://commonslibrary.parliament.uk/insights/general-election-2019-turnout/.
MDR.de. n.d. ‘Trianon: Ein Friedensvertrag stiftet Unfrieden.’ www.mdr.de/zeitreise/trianon-ungarn-friedensvertrag-geschichte-100.html.
Michalek, Arthur M., Alan D. Hutson, Camille P. Wicher, and Donald L. Trump. 2010. ‘The Costs and Underappreciated Consequences of Research Misconduct: A Case Study.’ PLOS Medicine 7 (8): e1000318. https://doi.org/10.1371/journal.pmed.1000318.
Middleton, John. 2015. World Monarchies and Dynasties. Abingdon: Routledge.
Mihalcik, Carrie. 2020. ‘Facebook Notifies People It Can Remove or Block Content to Avoid Regulatory Risk.’ CNET. 1 September. www.cnet.com/news/facebook-notifies-people-it-can-remove-or-block-content-to-avoid-regulatory-risk/.
Mikkelson, David. 2020. ‘Did Trump Say Operation Desert Storm Took Place in Vietnam?’ Snopes.com. 5 July. www.snopes.com/fact-check/trump-vietnam-desert-storm/.
Murphy, Hannah. 2020. ‘Unilever Pulls Facebook and Twitter Ads on Hate Speech Concerns.’ 26 June. www.ft.com/content/5e9624fa-d121-44a9-ba84-d456385e50ab.
National Science Foundation. 2020a. ‘Dissemination and Sharing of Research Results.’ www.nsf.gov/bfa/dias/policy/dmp.jsp.
National Science Foundation. 2020b. ‘Quality Standards.’ www.nsf.gov/policies/infoqual.jsp.
National Science Foundation. 2020c. ‘Responsible and Ethical Conduct of Research (RECR).’ www.nsf.gov/od/recr.jsp.
National Science Foundation. 2018. ‘Academic Research and Development; Publication Output by Country.’ National Science Board; Science and Engineering Indicators. www.nsf.gov/statistics/2018/nsb20181/report/sections/academic-research-and-development/outputs-of-s-e-research-publications#publication-output-by-country.
NDLTD. n.d. ‘A Short Document on NDLTD for Its 20th Year Celebration.’ https://drive.google.com/file/d/0B6G_HvAHMBqFUGIzZVZFVzI0b3c/view?usp=embed_facebook.
NDLTD. n.d. ‘Networked Digital Library of Theses and Dissertations.’ www.ndltd.org/.
New York Times. n.d. ‘Coronavirus World Map: Tracking the Global Outbreak.’ www.nytimes.com/interactive/2020/world/coronavirus-maps.html.
New York Times. 2020. ‘Coronavirus in the U.S.: Latest Map and Case Count.’ The New York Times, 20 July, sec. U.S. www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html.
Norman, Jeremy. 2018. ‘Joseph Pulitzer Offers to Found the First School of Journalism at Columbia University, but the Journalism School at the University of Missouri Precedes It.’ History of Information. www.historyofinformation.com/detail.php?id=4228.
Nurunnabi, Mohammad, and Monirul Alam Hossain. 2019. ‘Data Falsification and Question on Academic Integrity.’ Accountability in Research 26 (2): 108–122. https://doi.org/10.1080/08989621.2018.1564664.


O’Connor Davis Accountants and Advisors. 2020. ‘2019 Financial Statements for Pro Publica Inc.’ 14 July. https://assets-c3.propublica.org/pdf/reports/2019-Financial-Statements-for-Pro-Publica-Inc.PDF.
Oltermann, Philip. 2020. ‘Angela Merkel Draws on Science Background in Covid-19 Explainer.’ The Guardian, 16 April, sec. World News. www.theguardian.com/world/2020/apr/16/angela-merkel-draws-on-science-background-in-covid-19-explainer-lockdown-exit.
Oreskes, Naomi. 2004. ‘The Scientific Consensus on Climate Change.’ Science 306 (5702): 1686–1686. https://doi.org/10.1126/science.1103618.
Oreskes, Naomi. 2018. ‘The Scientific Consensus on Climate Change: How Do We Know We’re Not Wrong?’ In Climate Modelling, edited by Elisabeth A. Lloyd and Eric Winsberg. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-65058-6_2.
ORI (Office of Research Integrity). 2020. ‘Frequently Asked Questions.’ https://ori.hhs.gov/content/frequently-asked-questions.
Ottolenghi, Yotam. 2020. ‘Smashed Carrots and Chicken Koftas: Yotam Ottolenghi’s Recipes for Sharing.’ The Guardian, 5 September, sec. Food. www.theguardian.com/food/2020/sep/05/smashed-carrots-borani-yoghurt-dips-chilli-chicken-koftas-yotam-ottolenghi-recipes-for-sharing.
Overbye, Dennis. 2010. ‘Perelman Declines $1 Million for Poincaré Proof.’ New York Times, 2 July, sec. Science. www.nytimes.com/2010/07/02/science/02math.html.
Oxford University. 2020. ‘Plagiarism.’ www.ox.ac.uk/students/academic/guidance/skills/plagiarism?wssl=1.
Pallant, Julie. 2007. SPSS Survival Manual. 3rd edn. New York: McGraw Hill.
Pappas, Stephanie. 2020. ‘How COVID-19 Deaths Are Counted.’ Scientific American. 19 May. www.scientificamerican.com/article/how-covid-19-deaths-are-counted1/.
PayPal. n.d. ‘Restore Account.’ www.paypal.com/restore/dashboard.
Pecorari, Diane. 2015. ‘Plagiarism, International Students and the Second-Language Writer.’ In Handbook of Academic Integrity, edited by Tracey Bretag. Singapore: Springer. https://doi.org/10.1007/978-981-287-079-7_69-2.
Pennebaker, James W. 2013. The Secret Life of Pronouns: What Our Words Say about Us. New York: Bloomsbury Press.
Pesce, Nicole Lyn. 2020. ‘The U.S. Dropped Majorly on the Index That Measures Well-Being – Here’s Where It Ranks Now.’ MarketWatch. www.marketwatch.com/story/the-us-drops-to-no-28-on-this-global-well-being-index-2020-09-10.
Piller, Charles, and Kelly Servick. 2020. ‘Two Elite Medical Journals Retract Coronavirus Papers over Data Integrity Questions.’ Science, 4 June. www.sciencemag.org/news/2020/06/two-elite-medical-journals-retract-coronavirus-papers-over-data-integrity-questions.
Political Science Department, University of Minnesota. n.d. ‘History: Earliest Beginnings.’ Political Science | College of Liberal Arts. https://cla.umn.edu/polisci/about/history.
Potthast, Martin, Alberto Barrón-Cedeño, Benno Stein, and Paolo Rosso. 2011. ‘Cross-Language Plagiarism Detection.’ Language Resources and Evaluation 45 (1): 45–62. https://doi.org/10.1007/s10579-009-9114-z.
Princeton University, Office of the Dean of the College. 2018. ‘Academic Integrity Booklet 2018–2019.’ https://odoc.princeton.edu/sites/odoc/files/Academic%20Integrity%20Booklet%202018-19.pdf.
ProPublica. 2020. ‘About Us.’ www.propublica.org/about.
Rabin, Roni Caryn. 2020. ‘The Pandemic Claims New Victims: Prestigious Medical Journals.’ The New York Times, 14 June, sec. Health. www.nytimes.com/2020/06/14/health/virus-journals.html.
Reddy, Palagati Bhanu Prakash, Mandi Pavan Kumar Reddy, Ganjikunta Venkata Manaswini Reddy, and K. M. Mehata. 2019. ‘Fake Data Analysis and Detection Using Ensembled Hybrid Algorithm.’ In 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), 890–897. https://doi.org/10.1109/ICCMC.2019.8819741.
Retraction Watch. n.d. a. ‘Retraction Watch Database.’ http://retractiondatabase.org/RetractionSearch.aspx?AspxAutoDetectCookieSupport=1.
Retraction Watch. n.d. b. ‘The Center for Scientific Integrity.’ https://retractionwatch.com/the-center-for-scientific-integrity/.
Retraction Watch. 2018. ‘Retraction Watch Database User Guide.’ Retraction Watch (blog), 23 October. https://retractionwatch.com/retraction-watch-database-user-guide/.
RIAA. 2020. ‘The True Cost of Sound Recording Piracy to the U.S. Economy | IPI.’ RIAA. www.riaa.com/reports/the-true-cost-of-sound-recording-piracy-to-the-u-s-economy/.
Richardson, Ian. 2020. ‘Fact Check: COVID-19 Death Toll Likely Undercounted, Not Overcounted.’ USA Today. 17 April. www.usatoday.com/story/news/factcheck/2020/04/17/fact-check-covid-19-death-toll-likely-undercounted-not-overcounted/2973481001/.
Roberts, Fred S. 1985. Measurement Theory. Cambridge: Cambridge University Press.
Roig, Miguel, and Marissa Caso. 2005. ‘Lying and Cheating: Fraudulent Excuse Making, Cheating, and Plagiarism.’ The Journal of Psychology 139 (6): 485–494. https://doi.org/10.3200/JRLP.139.6.485-494.
Roth, Yoel, and Nick Pickels. 2020. ‘Updating Our Approach to Misleading Information.’ 11 May. https://blog.twitter.com/en_us/topics/product/2020/updating-our-approach-to-misleading-information.html.
Sabato, Larry. 1998. ‘George Romney’s “Brainwashing” – 1967.’ www.washingtonpost.com/wp-srv/politics/special/clinton/frenzy/romney.htm.
Sabin Center for Climate Change Law. 2020. ‘Scientific Consensus on Climate Change Questioned by President Trump.’ https://climate.law.columbia.edu/content/scientific-consensus-climate-change-questioned-president-trump-0.
Sachs, Jeffrey D. 2020. ‘130,000–210,000 Avoidable COVID-19 Deaths – and Counting – in the U.S.’ 22 October. www.jeffsachs.org/reports/https/wwwjeffsachsorg/blog-page-url/new-post-title-2.
Scopus. n.d. ‘Scopus: Content Coverage Guide.’ www.elsevier.com/?a=69451.
Seadle, Michael. 2016. ‘Quantifying Research Integrity.’ Synthesis Lectures on Information Concepts, Retrieval, and Services. Morgan & Claypool. www.morganclaypool.com/doi/abs/10.2200/S00743ED1V01Y201611ICR053.
Shammas, Brittany, and Meryl Kornfield. 2020. ‘Twitter Deletes Claim Minimizing Coronavirus Death Toll, Which Trump Retweeted.’ Washington Post, 31 August. www.washingtonpost.com/technology/2020/08/31/trump-tweet-coronavirus-death-toll/.
Shariatmadari, David. 2019. ‘Could Language Be the Key to Detecting Fake News?’ The Guardian, 2 September, sec. Opinion. www.theguardian.com/commentisfree/2019/sep/02/language-fake-news-linguistic-research.
Sharot, Tali. 2011. ‘The Optimism Bias.’ Current Biology 21 (23): R941–R946.
Shen, Helen. 2020. ‘Meet This Super-Spotter of Duplicated Images in Science Papers.’ Nature 581 (7807): 132–136. https://doi.org/10.1038/d41586-020-01363-z.
Silver, Nate. 2017. ‘How We’re Tracking Donald Trump’s Approval Ratings.’ FiveThirtyEight (blog). 2 March. https://fivethirtyeight.com/features/how-were-tracking-donald-trumps-approval-ratings/.
Silver, Nate. 2020. ‘How FiveThirtyEight’s 2020 Presidential Forecast Works – And What’s Different Because Of COVID-19.’ FiveThirtyEight (blog). 12 August. https://fivethirtyeight.com/features/how-fivethirtyeights-2020-presidential-forecast-works-and-whats-different-because-of-covid-19/.


Simonsohn, Uri. 2013. ‘Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone.’ Psychological Science 24 (10): 1875–1888. https://doi.org/10.1177/0956797613480366.
Simonsohn, Uri, Lief Nelson, and Joe Simmons. n.d. ‘About.’ Data Colada (blog). http://datacolada.org/about.
Social Progress. 2020. ‘2019 Social Progress Index.’ www.socialprogress.org/.
Spinney, Laura. 2020. ‘Covid-19 Expert Karl Friston: “Germany May Have More Immunological ‘Dark Matter’.”’ The Observer, 31 May, sec. World News. www.theguardian.com/world/2020/may/31/covid-19-expert-karl-friston-germany-may-have-more-immunological-dark-matter.
Spring, Marianna, and Mike Wendling. 2020. ‘What Links Coronavirus Myths and QAnon?’ BBC News, 3 September, sec. BBC Trending. www.bbc.com/news/blogs-trending-53997203.
SquareMouth. 2004. ‘US and British Medical Degrees Explained.’ 3 December. www.squaremouth.com/travel-advice/us-and-british-medical-degrees-explained/.
Statista. n.d. ‘Masks in Europe 2020.’ www.statista.com/statistics/1114375/wearing-a-face-mask-outside-in-european-countries/.
Statista. 2020. ‘Presidential Election: Voter Turnout Rate U.S. 2020.’ 4 November. www.statista.com/statistics/1184621/presidential-election-voter-turnout-rate-state/.
Steghaus-Kovac, Sabine. 2002. ‘Junior Professors on the Rise.’ Science. 2 August. www.sciencemag.org/careers/2002/08/junior-professors-rise.
Streeck, Hendrik, Gunther Hartmann, Martin Exner, and Matthias Schmid. 2020. Vorläufiges Ergebnis Und Schlussfolgerungen Der COVID-19 Case-Cluster-Study (Gemeinde Gangelt). Bonn: Universitätsklinikum Bonn. www.krankenhaushygiene.de/ccUpload/upload/files/2020_03_31_DGKH_Einl.
Strine, Leo E. Jr, and Joey Zwillinger. 2020. ‘What Milton Friedman Missed About Social Inequality.’ The New York Times, 10 September, sec. Business. www.nytimes.com/2020/09/10/business/dealbook/milton-friedman-inequality.html.
Sundby, Bertil. 1952. ‘A Case of Seventeenth‐Century Plagiarism.’ English Studies 33 (1–6): 209–213. https://doi.org/10.1080/00138385208596881.
Sutherland, Asbill, and Brennan, L. L. P. 2004. ‘Analysis of International Work-for-Hire Laws.’ 2004. https://us.eversheds-sutherland.com/portalresource/lookup/poid/Z1tOl9NPluKPtDNIqLMRV56Pab6TfzcRXncKbDtRr9tObDdEuS3Dr0!/fileUpload.name=/WorkforHireLaws.pdf.
Sweiti, Hussein, Frank Wiegand, Christoph Bug, Martin Vogel, Frederic Lavie, Ivo Winiger-Candolfi, and Maximilian Schuier. 2019. ‘Physicians in the Pharmaceutical Industry: Their Roles, Motivations, and Perspectives.’ Drug Discovery Today 24 (9): 1865–1870. https://doi.org/10.1016/j.drudis.2019.05.021.
Tagesschau. n.d. ‘Türkei: Yücel in Abwesenheit zu Haftstrafe Verurteilt.’ www.tagesschau.de/ausland/yuecel-urteil-101.html.
Tagesschau. 2020. ‘Heinsberg-Studie Zu Coronavirus: 1,8 Millionen Infizierte in Deutschland?’ www.tagesschau.de/inland/heinsberg-studie-101.html.
Teixeira da Silva, Jaime A. 2016. ‘The Blasé Nature of Retraction Watch’s Editorial Policies and the Risk to Sinking Journalistic Standards.’ Mediterranean Journal of Social Sciences 7 (6). https://doi.org/10.5901/mjss.2016.v7n6p11.
The Institute for Economics and Peace. 2017. ‘About.’ Vision of Humanity (blog). http://visionofhumanity.org/about/.
Thorpe, Vanessa. 2020. ‘Whodunnit? Did Agatha Christie “Borrow” the Plot for Acclaimed Novel?’ The Observer, 17 May, sec. Books. www.theguardian.com/books/2020/may/17/whodunnit-did-agatha-christie-borrow-the-plot-for-acclaimed-novel.

Tolkien, J. R. R. 1961. Fellowship of the Ring. London: Allen & Unwin.
Tompkins, Jillian. 2019. ‘Disinformation Detection: A Review of Linguistic Feature Selection and Classification Models in News Veracity Assessments.’ ArXiv. http://arxiv.org/abs/1910.12073.
Trevor-Roper, Hugh R. 1973. ‘The Mind of Adolf Hitler’: Hitler’s Table Talk. London: Weidenfeld & Nicolson.
Trow, Martin. 1998. ‘American Perspectives on British Higher Education under Thatcher and Major.’ Oxford Review of Education 24 (1): 111–129. https://doi.org/10.1080/0305498980240109.
Tugwell, Peter, and J. André Knottnerus. 2017. ‘How Does One Detect Scientific Fraud – but Avoid False Accusations?’ Journal of Clinical Epidemiology 87 (July): 1–3. https://doi.org/10.1016/j.jclinepi.2017.08.013.
Turnitin. 2020. ‘About Us, Our History.’ www.turnitin.com/about.
Twitter. n.d. a. ‘The Twitter Rules.’ https://help.twitter.com/en/rules-and-policies/twitter-rules.
Twitter. n.d. b. ‘Updating Our Approach to Misleading Information.’ https://blog.twitter.com/en_us/topics/product/2020/updating-our-approach-to-misleading-information.html.
UKRIO (UK Research Integrity Office). n.d. a. ‘UKRIO: Role and Remit’. https://ukrio.org/about-us/role-and-remit/.
UKRIO (UK Research Integrity Office). n.d. b. ‘UKRIO: Education and Training’. https://ukrio.org/our-work/education-and-training/.
Union of Concerned Scientists. n.d. ‘Attacks on Science.’ www.ucsusa.org/resources/attacks-on-science.
Universität der Künste. n.d. ‘Studium – Universität Der Künste Berlin.’ www.udk-berlin.de/studium/.
US Copyright Office. 2020. ‘Works Made for Hire.’ www.copyright.gov/circs/circ09.pdf.
US Court of Appeals for the Ninth Circuit. 1983. ‘Twentieth Century-Fox Film Corporation, et al., Plaintiffs-Appellants, v. MCA, Inc., et al., Defendants-Appellees, 715 F.2d 1327 (9th Cir. 1983).’ Justia Law. https://law.justia.com/cases/federal/appellate-courts/F2/715/1327/405049/.
US Department of Education. 2019. ‘The NCES Fast Facts Tool Provides Quick Answers to Many Education Questions (National Center for Education Statistics).’ National Center for Education Statistics. https://nces.ed.gov/fastfacts/display.asp?id=372.
US Embassy in India. 2019. ‘Number of Indian Students in the United States Surpasses 200,000 for First Time.’ U.S. Embassy & Consulates in India, 18 November. https://in.usembassy.gov/number-of-indian-students-in-the-united-states-surpasses-200000-forfirst-time/.
US Supreme Court. n.d. a. ‘Feist Pubs., Inc. v. Rural Tel. Svc. Co., Inc., 499 U.S. 340 (1991).’ Justia Law. https://supreme.justia.com/cases/federal/us/499/340/.
US Supreme Court. n.d. b. ‘Harper & Row v. Nation Enterprises, 471 U.S. 539 (1985)’. Justia Law. https://supreme.justia.com/cases/federal/us/471/539/.
van Hilten, Lucy Goodchild. 2018. ‘At Harvard, Developing Software to Spot Misused Images in Science.’ Elsevier Connect. www.elsevier.com/connect/at-harvard-developing-software-to-spot-misused-images-in-science.
V-Dem Institute. 2020. ‘About V-Dem.’ www.v-dem.net/en/about/about-v-dem/.
Vigdor, Neil. 2020. ‘Rush Limbaugh Awarded Presidential Medal of Freedom at State of the Union.’ The New York Times, 15 April, sec. U.S. www.nytimes.com/2020/02/04/us/politics/rush-limbaugh-medal-of-freedom.html.
Viglione, Giuliana. 2020. ‘How Many People Has the Coronavirus Killed?’ Nature 585 (7823): 22–24. https://doi.org/10.1038/d41586-020-02497-w.

Voce, Antonio, Seán Clarke, Caelainn Barr, Niamh McIntyre, and Pamela Duncan. 2020. ‘Coronavirus Excess Deaths: UK Has One of Highest Levels in Europe.’ The Guardian, 29 May. www.theguardian.com/world/ng-interactive/2020/may/29/excess-deaths-uk-has-onehighest-levels-europe.
VroniPlag Wiki. n.d. a. ‘BauernOpfer.’ https://vroniplag.wikia.org/de/wiki/Kategorie:BauernOpfer.
VroniPlag Wiki. n.d. b. ‘Plagiatsarten.’ https://vroniplag.wikia.org/de/wiki/Kategorie:Plagiatsarten.
VroniPlag Wiki. n.d. c. ‘ShakeAndPaste.’ https://vroniplag.wikia.org/de/wiki/Kategorie:ShakeAndPaste.
VroniPlag Wiki. n.d. d. ‘Verschleierung.’ https://vroniplag.wikia.org/de/wiki/Kategorie:Verschleierung.
Wakefield, A. J., S. H. Murch, A. Anthony, J. Linnell, D. M. Casson, M. Malik, M. Berelowitz, et al. 1998. ‘Retracted: Ileal-Lymphoid-Nodular Hyperplasia, Non-Specific Colitis, and Pervasive Developmental Disorder in Children.’ Lancet 351 (9103): 637–641. https://doi.org/10.1016/S0140-6736(97)11096-0.
Walker, Jeffery. 2016. Ali’s Wedding (film). Matchbox Production.
Walker, Tim. 2020. ‘First Thing: Twitter Has Started Fact-Checking Trump’s Tweets.’ The Guardian, 27 May, sec. US news. www.theguardian.com/us-news/2020/may/27/firstthing-twitter-has-started-fact-checking-trumps-tweets.
Walsh, Mary. n.d. ‘Software to Improve Reliability of Research Image Data: Wiley, Lumina, and Researchers at Harvard Medical School Work Together on Solutions.’ www.wiley.com/network/featured-content/software-to-improve-reliability-of-researchimage-data-wiley-lumina-and-researchers-at-harvard-medical-school-work-together-on-solutions.
Walsh, Mary. 2020. ‘Harvard Medical School | Academic Research and Integrity | Mary Walsh.’ https://ari.hms.harvard.edu/faculty-staff/mary-walsh.
Ware, Mark. 2015. ‘The STM Report: An Overview of Scientific and Scholarly Journal Publishing.’ 20 February. www.stm-assoc.org/2015_02_20_STM_Report_2015.pdf.
Weber-Wulff, Debora. 2016. ‘Plagiarism Detection Software: Promises, Pitfalls, and Practices.’ In Handbook of Academic Integrity, edited by Tracey Bretag. Singapore: Springer. https://doi.org/10.1007/978-981-287-098-8_19.
Webb, Sidney, and Beatrice Webb. 1936. Soviet Communism: A New Civilization? New York: C. Scribner’s Sons.
Wellcome Open Research. n.d. ‘About Wellcome Open Research | How It Works | Beyond A Research Journal’. https://wellcomeopenresearch.org/about.
Wehner, Markus. 2021. ‘Prüfung der Dissertation: Giffey soll angeblich Doktortitel verlieren’. Frankfurter Allgemeine, 11 May 2021. https://www.faz.net/aktuell/politik/inland/franziska-giffey-soll-angeblich-doktortitel-verlieren-17336644.html.
Wheeler, Lynde Phelps. 1951. Josiah Willard Gibbs: The History of a Great Mind. Woodbridge: Ox Bow Press.
WHO/Europe. 2017. ‘How to Respond to Vocal Vaccine Deniers in Public.’ www.euro.who.int/pubrequest.
Widener, Andrea. 2020. ‘Science Editor in Chief Holden Thorp on the Role of Science during the COVID-19 Pandemic.’ Chemical and Engineering News 98 (16). https://cen.acs.org/people/profiles/Science-editor-chief-Holden-Thorp/98/i16.
Wikipedia. 2021a. ‘COVID-19 Pandemic by Country and Territory.’ https://en.wikipedia.org/w/index.php?title=COVID-19_pandemic_by_country_and_territory&oldid=977061949.
Wikipedia. 2021b. ‘Engineering’. https://en.wikipedia.org/w/index.php?title=Engineering&oldid=1015712506.

Wines, Michael. 2020a. ‘As Census Count Resumes, Doubts About Accuracy Continue to Grow.’ The New York Times, 24 August, sec. U.S. www.nytimes.com/2020/08/24/us/census-bureau.html.
Wines, Michael. 2020b. ‘Federal Court Rejects Trump’s Order to Exclude Undocumented From Census.’ The New York Times, 10 September, sec. U.S. www.nytimes.com/2020/09/10/us/census-undocumented-trump.html.
Womersley, David. 2006. ‘Gibbon, Edward (1737–1794), Historian.’ Oxford Dictionary of National Biography. https://doi.org/10.1093/ref:odnb/10589.
World Population Review. 2020. ‘Voter Turnout by Country 2020.’ https://worldpopulationreview.com/countries/voter-turnout-by-country/.
Worldometer. n.d. ‘Coronavirus Update (Live): 43,023,518 Cases and 1,155,938 Deaths from COVID-19 Virus Pandemic.’ www.worldometers.info/coronavirus/.
Worldometer. 2020. ‘Population by Country (2020).’ www.worldometers.info/world-population/population-by-country/.
Wulfsohn, Joseph. 2020. ‘Woodward Dismisses Claims He Could Have Saved Lives by Publishing Trump’s Coronavirus Remarks Sooner.’ Fox News, 9 September. www.foxnews.com/media/bob-woodward-dismisses-publishing-trump-coronavirus-remarks.
Young, Matthew M. 2020. ‘Implementation of Digital-Era Governance: The Case of Open Data in U.S. Cities.’ Public Administration Review 80 (2): 305–315. https://doi.org/10.1111/puar.13156.
Zaritsky, John. 2016. ‘Sales at All Costs.’ No Limits (blog). http://thalidomidestory.com/story/about-grunenthal/company-history/sales-at-all-costs/.
Zhang, Michael. 2018. ‘A Closer Look at the Stuffed Anteater Photo Contest Scandal.’ PetaPixel. 30 April. https://petapixel.com/2018/04/30/a-closer-look-at-the-stuffed-anteater-photocontest-scandal/.

INDEX

737 Max x, 112, 131, 137, 138
Abbott, Andrew 65
ABC (American Broadcasting) 53, 37
Actuarial 5
Acuna, Daniel 20, 59
Airbus Neo 137
Airbus 112
Alcuin of York 34
Ali’s Wedding 33
Altbach, Philip 144–5
Anaesthesia/Anaesthesia 77
Anesthesia/Anaesthesia 77
Annenberg Public Policy Center 57
Anonymisation 84
Anthropology 78, 82–3
Aristotle 68, 78
Arthur Andersen 110
Article Publishing Charges 116
Arts 58, 89
arXiv 13, 52
Association of American Medical Colleges 75
Augustine of Hippo 34, 39
Augustus 39
Autism 9
Bailey, Jonathan 59
BBC 53, 57, 122
Beall, Jeffrey 43
Bell Labs 28
Berlin 48, 51, 52, 59, 89, 105, 114, 127, 128
Berne Convention 10, 35
Bernstein, Carl 125
Biden, Joseph
Bik, Elisabeth 24, 124
Biology 13, 24, 52, 68, 72–5, 78, 82, 96
bioRxiv (bioarchive) 15–6
bit-deterioration (bit-rot) 33
blue books 31
BMBF (Federal Ministry for Research and Education) 60
Boeing 737 Max 10, 112, 131
Bollinger, Lee 135
Bredekamp, Horst 123
British Broadcasting Corporation (BBC) 53
Bruegel, Pieter the Elder 38
Bury, J. B. 29
Byzantine 34
Cable News Network 53
Cabral, Marcio 24
canon law 94
Cardiology 77
Cardiovascular medicine 77
Cardiovascular 5, 77
Carlson, Tucker 64
Censorship 14, 63
Census data 103
Census 4, 30, 103–4, 114
Centers for Disease Control and Prevention, US 31, 100, 105
CERN 13, 73
check-sum 32
Chemie Grünenthal 76
Chemistry 2, 9, 72–3, 78, 96
Chemists 76
Chicago, University of 52, 82, 87, 91
China 109, 118, 35–6, 39, 84
Chinese Whispers 33
Christopher, Jana 58
Churchill, Winston 134
Cicero 34
Citation indexes 41
Citizens Band (CB) radio 53
Civil War, US 4
Cliometrics 87
Coca Cola 53
Cold War 141, 146
Communication 63, 90–1
communications studies 95
Computer linguistics 88
Computer science 68, 70
Confucius 83, 85
Constitution, US 30
Copyright 10, 12, 35, 39, 40, 46, 55, 59, 60–1, 142, 145
Corporation for Public Broadcasting 53
Correctiv.org 57
COVID-19 x, 1, 5–8, 12, 14–5, 17, 26, 28. 390, 44, 51, 76, 103–5, 112–4, 117–9, 125, 130–1, 134–5, 138–9, 146
Creation stories 9
Cro-Magnon 38
CrossRef 13, 18
Cultural Revolution 39
Cuneiform 32
Cutter polio vaccine 136
da Silva, Teixeira 9
Daily Express 5
Dannemann, Gerhard 128
Dartmouth 91
Darwin, Charles 40
data fabrication 16, 23, 36, 47, 50, 54, 92
data falsification 11, 15, 19, 24, 33, 56, 69, 70–3, 75, 77, 79–95, 122, 145
Data Protection Regulations, European 19
Declaration of Independence, US 10
descriptive statistics 4, 31
Deutsche Forschungsgemeinschaft (DFG) 48, 61
Dickens, Charles 35, 40
Dolchstosslegende 134–5
Domesday Book 3
Drug Design 77
Economics 78–9, 81–2, 87, 90–1
Education 27, 44–5, 48, 60, 91, 93, 109, 122

Einstein, Albert
Electronic lab books 74
Elsevier 21, 64, 116, 124, 129
Embro Press 58
Engerman, Stanley 87
Engineering 88, 91–2, 104
Enron 110
Enserink, Martin 11
Ethics 21, 86
Ethnographers 40
Ethnography 8, 78–9, 82
Eton 93
European Federation of Academies of Sciences and Humanities 22
excess deaths 31, 104
Fabian socialism 4
Fabian Socialists 82
Facebook 11, 39, 53, 66, 113, 130–2
Fachhochschule 41, 9
Fact checkers 57
FactCheck.org 57
FactChecker 57
fair dealing 61
fair use (17 USC 107) 60
Fauci, Anthony 6
Federal Aviation Administration (FAA) 112, 137
Federal Communications Commission, US 63
Feist Pubs., Inc. v. Rural Tel. Svc. Co 55
Fermat, Pierre de 32
Feynman, Richard 32
First World War 37, 134
Fivethirtyeight 6, 101–2
flame wars 52
Flemish 38
Florida 7, 30, 120
Fortunatus, Venantius 34
Foucault, Michel 10
Fox News 53, 64, 112
France 4, 35–6, 40, 93, 105, 119, 146
Friedman, Milton 110
function words 54
Galileo 10, 123
Gangelt 12
Garfield, Eugene 41
GDPR (General Data Protection Regulation) 41, 48, 61, 125
Generallandschulreglement of 1763 93
generative model 32
Geography 78–9, 83
German Research Foundation 22, 48, 124
Germany 6, 12, 27, 35, 36, 40, 41, 48, 55, 60, 65, 72, 75, 81, 87, 101, 103, 105, 109, 110, 134, 140, 146
Gibbon, Edward 40
Gibbs, Josiah Willard 92
Giffey, Franziska 128–9
Gimp 58, 123
grey-zones 17
Guardian xi, 52, 98, 103, 106, 118
Guttenberg, Karl-Theodor zu 122
Habilitation 41
Hacking, Ian 31
Halicarnassus 37
Harvard Medical School 20, 59, 124
Harvard University 21
Hastings, Max 134
HEADT Centre xiii
Hearst 33
Heine, Heinrich 40
Herodotus 34, 37
History 89, 94, 126, 143
Hitler, Adolf 134
Homer 24
honorific authorship 36
Hugo, Victor 40
Humboldt-Universität zu Berlin xii, 21, 51–2, 127, 153
Hunton, James E. 24, 62–3, 84, 92, 95
Hydroxychloroquine 16, 117, 130
image manipulation xi, 15, 19, 20, 23–4, 33, 52, 48, 50, 56, 58, 62, 74, 122–4, 145, 149
ImageJ 59, 74
Indian 44–5
Infectious Disease 20, 77
inferential statistics 4, 31, 69, 70
InspectJ 74
Institute for Economics and Peace 108
Institute for Scientific Information 41–2
institutional review boards 48
internet protocol layer diagrams 32
investigators 12, 25, 41, 121–5, 127, 144
iThenticate 11, 16, 19, 25, 115–7, 140, 142
JAMA Network 76
Japan 20, 39, 59, 105, 118, 146
Johns Hopkins 104
Johnson, Boris 134
Journal of Clinical Epidemiology 24
Journalism 66, 91, 94–95, 126
judge violations 7
judges viii, xii, 121, 127, 132

Kabaservice, Geoffrey 98
Kelvin, Lord 99
Kant, Immanuel 83
King’s College London 123
Kinnock, Neil 122
Kommission zur Überprüfung von Vorwürfen wissenschaftlichen Fehlverhaltens (KWF) 127
Korea 6, 105, 118, 146
Lab books 30, 74
Lamarckism 40
Lan, Richard 123
Lancet 44, 117–8, 129, 136
Large Hadron Collider 73
Library and information science (LIS) 90
Limbaugh, Rush 133–4
Linguistic analysis
Linguistics 11, 21–2, 27, 41, 85–6, 88–9
Literature 14–18, 23–25, 46, 59, 76, 85–6, 88–9, 95, 127
London School of Economics (LSE) 82
LPIXEL 59
Luther 35
Lysenko, Trofim 40
Maneuvering Characteristics Augmentation System (MCAS) 137
Marcus, Adam 66, 124
Market Watch 107–8
Markowitz 11, 21
Marx, Karl 4, 31, 81
Mathematics 13, 31, 52, 68–70, 85, 87, 96
MAXQDA 100, 106
Mayo Clinic 112
McCarthy, Joseph 134
Mecklenburg-Vorpommern 55
media studies 95
medical sciences 74
medRxiv (medarchive) 15
mega-journals 43
Merkel, Angela 2
Mesopotamia 3
Miami Herald 7
micro-cultures 12
Middlebury College 57
Milton, John 39
Morris worm 71
Music piracy 142
National Science Foundation, US 22, 39, 48, 61, 124
natural language 37
Nazis 14, 134
Netherlands 72, 84, 87–88, 118–9
Networked Digital Library of Theses and Dissertations (NDLTD) 52
New England Journal of Medicine 22, 129–30
New York Times xi, 44, 52, 98, 103–8, 130, 143
Nixon, Richard 125
Nobel Prize in Economics 87
Normandy 38
Northwest Ordinance of 1787 93
Obama, Barack 133
Ombudsman 48, 127
Oncology 77
open data 13, 30, 36
open peer review 13, 16–7, 51–2, 113
OpenAIRE 13
Oransky, Ivan 66, 124
(ORI) Office of Research Integrity 47
Ottolenghi, Yotam 106
Oxford University
Oxford 23, 51, 67–8, 78–9, 83, 85–6, 88, 91, 92
Pandemic x, 1–2, 5, 15–6, 30, 32, 44, 104–5, 114, 118, 133–4, 138, 140–1, 146
Paraphrasing 10–1, 17, 23, 25, 49, 54, 119
Pascal, Blaise 32
PBS (Public Broadcasting) 53
Peel, Robert 138
Peer review 13, 15–7, 22, 33, 44, 50–2, 61, 63, 113
Pennebaker, James 54
Perelman, Grigori 13
Pew Research 139–40
pharmaceutical companies 75–6
Philology 67, 88
Philosophy 80, 83–7, 94, 110, 127, 132
Photographs 13, 38, 52, 74, 90
Photoshop 58, 123
Physics 68, 32, 52, 72–4, 78
Plagiarism xi, 10–1, 15–20, 23, 25, 33–4, 36, 38, 46–7, 49–50, 54–6, 59–61, 67, 69–73, 75, 77, 79–96, 115–117, 119, 121–3, 128–9, 132, 140, 142–5
Plato 34
PLOS ONE 43, 116
Poincaré conjecture 13
Political science 8, 76, 78–9, 81, 95, 108, 128
PolitiFact.com 57
Poynter Institute 57
predatory publications 42–3
Predatory publishers 43, 117, 130
Princeton University 49–50
Programmers 20, 37
ProPublica 125–6
Prospect Medical 126
Prussia 4, 93
Psychiatry 84, 88
Psychology 25, 62, 78–81, 83–6, 93, 132
Ptolemy 10
Purdue University 92
Pythagoras 68
Qanon xi, 113, 131, 133

R0 2
Ranke, Leopold von 29, 40
Religion 79, 85, 102
Rembrandt 38
Renaissance 38
Replication 1, 2, 5, 14, 71, 73–4, 82, 96
Research Excellence Framework (REF) 42, 63
Research Integrity Office (UKRIO) 47–8
Resveratrol 112
Retraction Watch blog 16
Retraction Watch Database x, xi, 15–6, 66, 70–73, 75, 77, 79–87, 89–95, 116–7, 130, 144–5
Rhetoric 39, 88, 94
RIAA (Recording Industry Association of America) 142, 157
Roberts, Fred S. 99
Romney, George 103
Rousseau, Jean-Jacques 29
Rubens, Peter Paul 38
Murdoch, Rupert 53
Sachs, Jeffrey 146
SAS Institute 70
Satisficing 43
Schavan, Annette 122
Science Integrity Digest 58
Scientific American 28, 111
Scopus 13, 43, 116
Shepard, Frank 41
Sidereus Nuncius 123
Webb, Sidney and Beatrice 4, 31, 42
Silver, Nate 6, 101
Simonsohn, Uri 57
Sitzredakteur 64
Snopes.com 57
Social anthropology 78, 82–3
Social media 15, 28, 33, 39, 50–2, 57, 66, 103
Social Progress index 107–109
Social Science Research Network (SSRN)
social sciences 2, 30, 34, 57, 67–8, 78–85, 87, 90, 94–6, 99–100, 127
Sociology 78–81, 83–4, 93–5
Socrates 34
Source code plagiarism 20
source code 36
Soviet photographers 38
Soviet Union 18
Soviets 14
Spanish-American war 33
SPSS 56, 70, 106
SSRN (Social Science Research Network) 13, 92, 96, 110
St. Petersburg Times 57
Stapel, Diederik 11, 21, 25, 62–3, 84, 88, 110, 123, 141
Statistics 4–5, 7–8, 11–13, 23, 28, 30–2, 39, 50, 56–7, 67, 69–70, 73, 87–8, 101, 104–6, 113–4, 116, 118–20, 125–6
Statute of Anne 10, 35, 39
Stille Post 33
STM Report for 2018 145
Superspreaders 132, 135
Surgisphere 117–8, 129
Technical University (TU-Berlin) 89
Turner, Ted 53
Telephone 33
Thalidomide 76
Thatcher, Margaret 42, 146
thick description 82
Thucydides 34
Tobacco 110–1
Tolkien, J.R.R. 1, 88
Torabi Asr, Fatemeh 11
Trevor-Roper, Hugh 134
Trump, Donald x, 57, 98, 100, 102–3, 111–3, 115, 122, 125, 131, 134–6, 138, 141, 143
Trump, Melania 122
Turnitin 17–9, 25, 115, 123
Tuskegee study of syphilis 48
Twitter 11, 14, 39, 53, 66, 100, 113, 130–2
Ulrich’s Web Directory 145
Unilever 53
Union of Concerned Scientists 103, 114
United Nations Educational, Scientific and Cultural Organization (UNESCO) 109
Universität der Künste (UdK or University of the Arts) 89
University Microfilms 52
University of Chicago School of Business 82
University of Chicago 52, 82, 87
University of Pennsylvania 91
National Science Foundation, US (NSF) 124
Versailles treaty 134
Victims xii, 121, 137–8, 140–3
Violators xii, 121, 132, 143
Virgil 39
Virginia Tech 52
Vision of Humanity
Visual Artists Rights Act of 1989 35
Volksverpetzer.de 57
von Ranke, Leopold 29, 40
Voting Rights Act 1965 143
Vox News 2
VroniPlag Wiki 17, 23, 49, 122, 128, 145
Vulgate (Latin) Bible 35
Wakefield study 136–7
Walsh, Mary C. 20, 59, 124
Washington Post 57, 113
Watergate Scandal 125
Wayne County, Michigan 106
Webb, Sidney and Beatrice 4, 31, 82
Weimar Republic 134
Wellcome Open Research 13
Wikipedia 8, 11, 18, 36, 57, 92, 104–5
Wilding, Nick 123
Wittgenstein, Ludwig 40
Woodward, Bob 112–3, 125
World Health Organisation 109
Yücel, Deniz 64
Yale 92
yellow press 33