Corpus-Based Analysis of Ideological Bias
Corpus-Based Analysis of Ideological Bias presents research combining a range of corpus-linguistic techniques which are employed to analyse how migration discourse is (re)constructed in the contemporary British press. Two specialised corpora containing 1,000 news reports, editorials, and opinion pieces from five major national British newspapers were collected and annotated for this research. The event separating these two corpora is the 2016 referendum on Britain’s membership of the European Union (EU). In its analysis, this book:
• employs both quantitative and qualitative analytical methods, with four case studies offering a broad perspective on how the topical socio-political issues of migration and asylum seeking are represented by left- and right-wing British newspapers;
• explores how newspapers reveal their political orientation and promote their political agenda by employing specific linguistic patterns and discursive strategies – in this case, in the representation of the key social actors within migration discourse;
• provides case studies that place a particular focus on the discourses surrounding European migrants and migration within the EU, which proved to be a very popular topic in the British press both before and after the 2016 EU membership referendum; and
• offers a comparative corpus analysis that seeks to ascertain whether media discourse regarding EU migration has changed in the wake of the referendum.
This book is a useful source not only for students of English, linguistics, and media studies but also for researchers in the fields of applied corpus linguistics, critical discourse studies, contemporary media analysis, and metaphor research. Anna Islentyeva is a research associate and senior lecturer in sociolinguistics in the Department of English Language and Literature at the Freie Universität Berlin, Germany. Her scientific interests include semantics, conceptual metaphor, discourse analysis, and critical discourse studies. Anna is interested in research that places a special emphasis on the social aspects of language and examines ideologically relevant linguistic patterns in different types of discourses. Her current research projects address media discourses surrounding European migration following the EU referendum, as well as representations of Europe and Britain in the British media. Her recent publications include The Europe of Scary Metaphors (2019) and The Undesirable Migrant in the British Press (2018).
Routledge Applied Corpus Linguistics Series Editor: Michael McCarthy and Anne O’Keeffe Series Co-Founder: Ronald Carter Editorial Panel: IVACS Michael McCarthy is Emeritus Professor of Applied Linguistics at the University of Nottingham, UK, Adjunct Professor of Applied Linguistics at the University of Limerick, Ireland and Visiting Professor in Applied Linguistics at Newcastle University, UK. He is co-editor of the Routledge Handbook of Corpus Linguistics, editor of the Routledge Domains of Discourse series and co-editor of the Routledge Corpus Linguistics Guides series. Anne O’Keeffe is Senior Lecturer in Applied Linguistics and Director of the InterVarietal Applied Corpus Studies (IVACS) Research Centre at Mary Immaculate College, University of Limerick, Ireland. She is co-editor of the Routledge Handbook of Corpus Linguistics and co-editor of the Routledge Corpus Linguistics Guides series. Ronald Carter (1947–2018) was Research Professor of Modern English Language in the School of English at the University of Nottingham, UK. He was also the co-editor of the Routledge Corpus Linguistics Guides series, Routledge Introductions to Applied Linguistics series and Routledge English Language Introductions series. IVACS (Inter-Varietal Applied Corpus Studies Group), based at Mary Immaculate College, University of Limerick, is an international research network linking corpus linguistic researchers interested in exploring and comparing language in different contexts of use. The Routledge Applied Corpus Linguistics Series is a series of monograph studies exhibiting cutting-edge research in the field of corpus linguistics and its applications to real-world language problems. Corpus linguistics is one of the most dynamic and rapidly developing areas in the field of language studies, and it is difficult to see a future for empirical language research where results are not replicable by reference to corpus data. 
This series showcases the latest research in the field of applied language studies where corpus findings are at the forefront, introducing new and unique methodologies and applications which open up new avenues for research. Other titles in this series Interdisciplinary Research Discourse Corpus Investigations into Environment Journals Paul Thompson and Susan Hunston Native and Non-Native Teacher Talk in the EFL Classroom A Corpus-informed Study Eric Nicaise
For more information about this series, please visit: www.routledge.com/series/RACL
Corpus-Based Analysis of Ideological Bias Migration in the British Press
Anna Islentyeva
First published 2021 by Routledge 2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN and by Routledge 52 Vanderbilt Avenue, New York, NY 10017 Routledge is an imprint of the Taylor & Francis Group, an informa business © 2021 Anna Islentyeva The right of Anna Islentyeva to be identified as author of this work has been asserted by her in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988. All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library Library of Congress Cataloging-in-Publication Data Names: Islentyeva, Anna, author. Title: Corpus-based analysis of ideological bias : migration in the British press / Anna Islentyeva. Description: London ; New York : Routledge, 2020. | Series: Routledge applied corpus linguistics | Includes bibliographical references and index. Identifiers: LCCN 2020022121 (print) | LCCN 2020022122 (ebook) | ISBN 9780367207168 (hardback) | ISBN 9780429263064 (ebook) Subjects: LCSH: Journalism—Objectivity—Great Britain—History— 21st century. | Journalism—Great Britain—Language. | Emigration and immigration—Press coverage—Great Britain—Case studies. | Corpora (Linguistics) | Discourse analysis—Political aspects—Great Britain. 
Classification: LCC PN5124.O24 I85 2020 (print) | LCC PN5124.O24 (ebook) | DDC 070.4/49304841—dc23 LC record available at https://lccn.loc.gov/2020022121 LC ebook record available at https://lccn.loc.gov/2020022122 ISBN: 978-0-367-20716-8 (hbk) ISBN: 978-0-429-26306-4 (ebk) Typeset in Times New Roman by Apex CoVantage, LLC
Contents
List of figures
List of tables
Acknowledgements

Introduction: language corpora and media discourse on migration
  The scope of research
  European migration and asylum seeking: how the problem is constructed in the press
  Major research goals and research methods
  Structure of the book

1 Ideology in the contemporary media: bias in the British press
  1.1 Approaches to discourse and key goals in critical discourse analysis
  1.2 Media discourse as an instrument of power
  1.3 The British press: political allegiances and ideological interests
  1.4 Summary: discourse as a form of social practice

2 Corpus-based discourse analysis: data collection and corpus construction
  2.1 Central principles in the study of language
  2.2 Corpus linguistics as a methodology
  2.3 Data collection, types of corpora, and newspaper corpora
  2.4 Corpus linguistics techniques
  2.5 Advantages of combining critical discourse analysis and corpus linguistics

3 Discursive representations of refugees, asylum seekers, immigrants, and migrants (RASIM) prior to the EU referendum
  3.1 Introducing RASIM in media discourse
  3.2 Quantitative analysis: frequency analysis and quantification techniques
  3.3 Collocational analysis and typical contextual domains
  3.4 Discursive representations of EU migrants in the British press
  3.5 Asylum seekers and refugees
  3.6 Summary: ideological differences in the use of RASIM terms

4 Discursive representations of migrants after the EU referendum
  4.1 The EU membership referendum: political and media discourse
  4.2 Quantitative analysis of the post-referendum corpus: shifting the focus
  4.3 Collocational analysis and salient contextual domains
  4.4 Discursive representations of EU migrants in the British press
  4.5 Summary: the transformation of media discourse surrounding EU migration

5 The English garden: a metaphor for English society
  5.1 The English garden metaphor and English national identity
  5.2 The Telegraph editorial: the conservative ideal of a hierarchical society
  5.3 Literary and cultural allusions and visual analysis
  5.4 The Guardian opinion piece: subverting the English garden metaphor
  5.5 Summary: the ideological basis of the English garden

6 The metaphorical motif of war in the British press
  6.1 The semantics of war: the core sense of war and its definition
  6.2 The semantic prosody of war and the construction of the metaphorical motif
  6.3 The war motif in the British press: a quantitative analysis
  6.4 Analysis of articles from left-wing newspapers
  6.5 Analysis of articles from right-wing newspapers
  6.6 Summary: politics is war

Conclusion: principles of cohesion and evaluation in newspaper texts
  Theoretical and methodological implications
  Major findings and results of the corpus-based studies
  Outlook and further research

Appendix
  Multi-cultivars
  A passport to my lovely garden? Dream on, you wretched souls

Index
Figures
3.1 Distribution of migrant(s) and immigrant(s) in the Pre-Referendum Corpus
3.2 Distribution of REFUGEE and ASYLUM SEEKER in the Pre-Referendum Corpus
4.1 Distribution of MIGRANT in the Pre- and Post-Referendum Corpora
4.2 Distribution of IMMIGRANT in the Pre- and Post-Referendum Corpora
4.3 Distribution of REFUGEE in the Pre- and Post-Referendum Corpora
5.1 Illustration by David Foldvari from The Guardian
6.1 Distribution of the lemmas from the source domain of WAR in the Pre-Referendum Corpus
Tables
1.1 Newspaper support for political parties in the UK general elections
1.2 Daily circulation figures for British national newspapers in 2013 and 2014
1.3 Daily circulation figures for British national newspapers in 2016 and 2018
1.4 Average monthly print and digital readership for November 2018 (in 1,000 copies)
2.1 Basic types of corpus annotation
2.2 Number of text types in each subcorpus of the Pre-Referendum Corpus (2013–2014)
2.3 Number of text types in each subcorpus of the Post-Referendum Corpus (2016–2018)
2.4 Total number of tokens in each subcorpus
2.5 Metadata for the Pre-Referendum Corpus
2.6 Metadata for the Post-Referendum Corpus
3.1 Raw frequencies of RASIM terms in the Pre-Referendum Corpus (2013–2014)
3.2 Normalised frequencies of RASIM terms per 10,000 tokens in the Pre-Referendum Corpus
3.3 Left-hand collocates attracted to MIGRANT in the Pre-Referendum Corpus
3.4 Left-hand collocates attracted to IMMIGRANT in the Pre-Referendum Corpus
3.5 Right-hand collocates attracted to MIGRANT in the Pre-Referendum Corpus
3.6 Right-hand collocates attracted to IMMIGRANT in the Pre-Referendum Corpus
3.7 Corpus-based definitions of immigrant and migrant
3.8 KWIC concordance for the construction adjective + EU migrants in the Pre-Referendum Corpus
3.9 Most frequent nouns appearing in the construction noun + of MIGRANT
3.10 KWIC concordance for the collocation Romanian and/or Bulgarian migrants
3.11 Left-hand collocates attracted to REFUGEE in the Pre-Referendum Corpus
3.12 Left-hand collocates attracted to ASYLUM SEEKER in the Pre-Referendum Corpus
4.1 Raw frequencies of RASIM terms in the Post-Referendum Corpus (2016–2018)
4.2 Normalised frequencies of RASIM terms per 10,000 tokens in the Post-Referendum Corpus
4.3 Normalised frequencies of RASIM terms per 10,000 tokens in the Pre-Referendum Corpus (2013–2014)
4.4 Most frequent left-hand collocates of MIGRANT in the Pre- and Post-Referendum Corpora
4.5 Left-hand collocates attracted to MIGRANT in the Post-Referendum Corpus (2016–2018)
4.6 Left-hand collocates attracted to MIGRANT in the Pre-Referendum Corpus (2013–2014)
4.7 Most frequent left-hand collocates of IMMIGRANT in the Pre- and Post-Referendum Corpora
4.8 KWIC concordance for the construction adjective + EU migrants in the Post-Referendum Corpus
4.9 Most frequent nouns appearing in the construction (high)-skilled + noun in the Post-Referendum Corpus
4.10 Most frequent noun collocates of EU in the Post-Referendum Corpus
4.11 Right-hand collocates attracted to EU in the left-wing subcorpus of the Post-Referendum Corpus
4.12 Right-hand collocates attracted to EU in the right-wing subcorpus of the Post-Referendum Corpus
6.1 Corpus-based definition of the word war
6.2 KWIC concordance for the collocation war damage from the BNC (1994)
6.3 KWIC concordance for the collocation war damage from the COCA (2008)
6.4 Absolute frequencies of the lemmas from the source domain of WAR in the Pre-Referendum Corpus (2013–2014)
6.5 Observed and expected frequencies of the WAR metaphor employed by the left- and right-wing press in the Pre-Referendum Corpus (2013–2014)
Acknowledgements
This book is the result of a five-year period of research that has been full of new, exciting experiences and interesting discoveries. I would like to thank my colleagues and students in the Department of English Language and Literature at the Freie Universität Berlin, as well as my family and close friends who have supported me over the years. I am particularly grateful to Igor Tolochin from St Petersburg State University, who introduced me to English semantics and the British National Corpus during my time there as a BA student. Professor Tolochin has been my academic supervisor and constant supporter for the past 12 years. I would like to acknowledge the unfaltering encouragement of Anatol Stefanowitsch from the Freie Universität Berlin, who stimulated my profound interest in corpus linguistics and the cognitive approach to metaphor analysis. I thank Professor Stefanowitsch for his help in annotating the British newspaper corpora that I use for the research featured in this book. I also owe a great deal to my student assistants, Mihera Abdel Kafi, Torben Scheffler, and Willi Werner, who helped me gather corpus data for the Post-Referendum Corpus. To the thoughtful, engaged students who took part in my Discourse Analysis: Ideology and Identity seminars in 2018 and 2019: thank you for your stimulating discussions and vital insights on the English garden metaphor featured in Chapter 5 of this book. Very special thanks to my student and friend Louise Pain for her invaluable comments on the previous versions of the manuscript. I would like to acknowledge the constant support and assistance of the Routledge editors for the Applied Corpus Linguistics Series, Anne O’Keeffe and Michael McCarthy, as well as the editorial assistants for English Language and Linguistics, Elizabeth Cox and Adam Woods. Above all else, I thank my family. 
I dedicate this work to my parents, Marina Islentyeva and Vladimir Islentyev, who made it possible for me to study and carry out my research in Berlin. I thank them for their love and support throughout my life, especially during the period in which I worked on this book.
To my parents
Introduction Language corpora and media discourse on migration
The scope of research
The study of language use in the news media has come to be of central importance to a broad range of academic disciplines including media and communication studies, political studies, sociology, and linguistics. The increasing relevance of media language analysis can be explained by the omnipresence of the news media. The mass media serves as one of the primary and ostensibly most trustworthy sources of information for the public, even in this era marked by the increasing popularity of social networks. Furthermore, the media acts as a lens that is able to focus on specific issues while defocusing others, thus producing dominant discourses and constructing the identities of different social actors. In social studies, there is an increasing demand for applied research that requires more empirically oriented discourse-analytical studies. This book presents a corpus-based linguistic analysis of the modern migration discourse produced by the British national press. The analysis combines methods developed in cognitive linguistics, specifically Conceptual Metaphor Theory (CMT), with Critical Discourse Analysis (CDA). Each of these approaches offers a diverse set of tools for analysing the evaluative nature of meaning in newspaper texts; in combination, they provide a solid framework for interpreting ideological bias on both a structural and a semantic level. In contemporary linguistics, it is sometimes assumed that discourse-analytical methods are incompatible with corpus linguistics – the former being applied to the study of individual texts (Lee 2008) and employed in the critical investigation of socio-political themes, while the latter chiefly focuses on the study of large bodies of electronically stored texts and is designed to facilitate searches for specific lexical and grammatical patterns (Baker et al. 2008: 273–276). This assumption is rather misleading (cf.
Taylor & Marchi 2018); the use of corpora is rapidly becoming an integral part of linguistic analysis. Although the combination of corpus-linguistic methods with CDA is not a new practice, the overall number of such studies is rather low compared to the number of studies in corpus linguistics or CDA alone (Baker et al. 2008: 274). This book focuses on migration – one of the recurrent themes in the contemporary media – and seeks to fill the aforementioned methodological gap by providing a multi-level, corpus-based discourse analysis of 1,000 articles from
five British national newspapers, thereby revealing how these two approaches to the study of media discourse can be interrelated and combined in various ways. It should be emphasised that corpus linguistics is seen not as a branch of linguistics, but rather as a set of practices and methods. More broadly speaking, it is an approach to the study of language (Lee 2008: 87; McEnery & Hardie 2012: 6, 147), making corpus-based linguistics a more accurate term to use. Linguists who use corpora are just as interested as other linguists in key aspects of language like semantics, syntax, morphology, and language change (Lee 2008: 87). However, corpus linguists aim to achieve more systematic ways of studying language, one of which involves employing large collections of computerised texts and statistical methods. Indeed, the term corpus-based approach is used widely in corpus linguistics. McEnery & Hardie (2012: 6) refer to the differences between corpus-based and corpus-driven approaches, explaining that corpus-based studies employ corpora to explore a theory or hypothesis, whereas corpus-driven linguistics (Sinclair 1991, 2004; Tognini-Bonelli 2001) uses the corpus itself as the sole source of hypotheses about language. Linguists working within the corpus-based approach define corpus linguistics as a method. The difference between corpus-based and corpus-driven approaches is discussed in more detail in Section 2.2, and Section 2.5 introduces some other methodologies that combine discourse analysis and corpora. Media discourse on migration is the focus of the research presented in this book. The term discourse – as a complex concept – is understood in various ways within the academic cultures of different countries. Generally, a distinction can be drawn between the theoretical approach to discourse which originated in France and the tradition of empirically oriented discourse research that was developed in the UK and the USA (Angermuller et al. 2014: 2). 
Chapter 1 will elaborate on a number of approaches to discourse analysis and outline the key objectives of CDA. It is widely acknowledged that the contemporary mass media is central to the construction of discourses and the promotion of ideologies and value systems. The media is able to focus on particular issues while recontextualising others. The general public uses this mediated information as a source of knowledge about world events, and the mass media plays a pivotal role in terms of shaping public opinion. As a rule, the public does not always have direct access to primary texts such as governmental or parliamentary reports. Consequently, most primary texts and the values they promote are mediated via secondary sources such as news reports, editorials, opinion pieces, and other forms of media. Secondary texts act as a lens through which primary texts are reified, but may also be recontextualised (Koller 2004: 45). The central research question of this book involves how newspapers with specific political orientations apply various linguistic patterns and structures in their reporting in order to promote their political agendas.
European migration and asylum seeking: how the problem is constructed in the press
Immigration is a highly topical socio-political issue and the major theme that unites the 1,000 newspaper articles that were collected and annotated for this research. These articles make up two newspaper corpora: one that contains 500 articles from the years 2013 to 2014, and another that contains 500 articles from the years 2016 to 2018. The event separating these two corpora is the referendum on British membership of the European Union (EU), held in the UK on 23 June 2016. Throughout this book, these two collections of newspaper articles will be referred to as the Pre-Referendum Corpus and the Post-Referendum Corpus, respectively. Immigration and asylum seeking are among the most topical issues discussed in the British media, and, as such, attract a great deal of public attention in contemporary Britain. Hart (2010: xii; emphasis added) notes: “Immigration remains a contentious issue in the UK and one which is largely fuelled by the media”. Between 2005 and 2020, a number of bills restricting immigration and the right to seek asylum were passed by the UK Parliament, including the Immigration, Asylum and Nationality Act 2006; the UK Borders Act 2007; and the Borders, Citizenship and Immigration Act 2009. Moreover, immigration became one of the central issues in the 2005 general election campaigns of the political right (Charteris-Black 2006), while the 2010 and 2015 election campaigns of both the Conservative and Labour parties were centred around the issue of European migration. This issue was so pivotal, in fact, that the then prime minister and leader of the Conservative Party, David Cameron, made an election promise to hold an EU membership referendum. Migration within the EU and the UK-EU relationship are two topics that have attracted a great deal of public and media attention in Britain in the past 15 years.
The year 2004 saw the most significant expansion of the EU in its history in terms of territory and population, with five Central and Eastern European countries (Czech Republic, Hungary, Poland, Slovakia, and Slovenia), three Baltic States (Estonia, Latvia, and Lithuania) – collectively referred to as A8 countries or the EU8 – and two Mediterranean countries (Malta and Cyprus) joining the Union in that year (cf. Leonard & Taylor 2016). Free movement of people is one of the four freedoms of the EU, along with free movement of goods, services, and capital, and would naturally have applied to these countries, but due to concerns about migration from the newly acceded countries to the older EU states (EU15), some transitional restrictions were put in place for up to seven years, until 2011. Romania and Bulgaria joined the EU in 2007 as part of the so-called “Eastern enlargement”. Once again, the UK, along with a number of other EU states, imposed transitional employment restrictions on Romanian and Bulgarian workers – restrictions that were not lifted until 2014. The Pre-Referendum Corpus contains articles that focus on the end of these restrictions on Bulgarians and Romanians in January 2014. The enlargement of the Union and the increased mobility of its citizens within its member states made European migration to Britain one of the most hotly debated issues in the British media. The mobility of EU citizens within the Union inevitably raised questions not only of national identity (Bayley & Williams 2012), but also of British exceptionalism, a concept that stands in opposition to the notion of a united Europe. Spiering (2015: 2) describes the UK-EU relationship as a continuously troubled one “marked by party divisions, media hostility, opt-outs and opt-ins, and persistent calls for an
in-out referendum”. This troubled relationship is deeply rooted in a perceived cultural division between two opposing entities. Spiering (2015: 3–5) points out that we are dealing here not with an essential quality of Britishness, but rather with a perception or a constructed idea of extra-Europeanness that is not eternal or innate; instead, the relationship is determined by strong cultural notions of differentness. The Britain versus Europe dichotomy is additionally complicated by the fact that different nations, the English, the Welsh, the Scottish, and the Irish, inhabit the British Isles, making them a multination state. In a way, Britain can be seen as a cultural unit: the nations that inhabit these isles have a shared history, a shared language, and shared values, but, on the other hand, being English or Scottish is not equivalent to being British. According to Braber’s study (2009: 307), some Scottish people, for example Glaswegians, do not feel that “a sense of Britishness forms a strong part of their identity as [it] has English connotations” (emphasis added). Evidently, national identities are not fixed entities, but are relational concepts (Braber 2009: 308). Spiering (2015: 6) also asserts that national identities are relational; in order to define who we are, we need to distinguish ourselves from others. In their quest to forge their own unique identity, the English tend to define themselves against the Scots, the Welsh, and the Irish, and often claim to hold a special position within Britain. Immigrants and other ethnic and religious minorities represent another big out-group, followed by mainland Europeans, who function as “significant others in the English quest for the national self” (Spiering 2015: 6). The labels English and British are often used interchangeably (cf. Spiering 2015: 4). In many cases, this is a conscious ideological decision made by an author who does not wish to maintain a clear distinction between the two groups. 
As mentioned previously, the English often claim to have a special position within Britain, which is why English is often used in a broader sense to denote or represent a certain cultural tradition that primarily exists in the form of the English Language, the Queen of England, English breakfast tea, English weather, and the English rose. In this context, Englishness is the essence of Britishness at the level of stereotypes. The pair English/British represents an element of the socio-historical tradition; a dichotomy that will be discussed in more detail in Chapter 5 with an exploration of the English garden as a cultural construct and ideological metaphorical motif. Building on questions of national identity and the constructed dichotomy British versus European, Chapters 3 and 4 will analyse media representations of Europeans, in particular EU migrants in the UK. Finally, it was not only EU migration to the UK that gained public and media attention between 2013 and 2018. The growing number of people arriving in the EU from across the Mediterranean Sea raised serious concerns both in Europe and Britain. Ongoing armed conflicts in Afghanistan, Iraq, and Syria; human rights violations; and unstable political climates and social problems, along with environmental issues, have led to the constant movement of people and, as a result, to a growing number of forcibly displaced people, identified as asylum seekers and refugees. Since 2011, the number of asylum seekers and refugees worldwide has increased by 40 per cent. According to the United Nations High Commissioner
for Refugees (UNHCR), by the end of 2016, this number reached 65.6 million worldwide, which is the highest number since the Second World War. In 2016, which is often seen as the peak of the refugee crisis, the asylum seekers in the EU originating from Syria, Afghanistan, and Iraq constituted 50 per cent of the total number of those who applied for asylum in the EU. The asylum issue, which is generally treated as part of a wider immigration issue, raises fundamental questions regarding the obligations that nation states have to non-citizens (Schuster 2003: 2). Those who claim to be refugees, but who have not yet been recognised as such, are referred to as asylum seekers (Schuster 2003: 3). It should be emphasised that in both political discourse and the media, it is often assumed that a group of asylum seekers consists of a very small sub-group of “genuine” asylum seekers – those who will ultimately be recognised as refugees. The rest of the group is thus considered to consist of a much larger contingent of “bogus” asylum seekers, who are referred to as economic migrants wishing to migrate, settle, and work in the host state (ibid.). Furthermore, much of the contemporary media and political discourse treats all asylum seekers as economic migrants in disguise and, as such, as a threat to the nation state that would receive them. As a result, European states try to filter economic migrants through the asylum process by using a definition of refugees that distinguishes between political and economic motives for flight (ibid.). However, distinctions between political and economic causes of migration, as well as differences between voluntary and involuntary migration, are difficult to ascertain. The linguistic analysis of the authentic data presented in this book aims to reveal the discursive ways in which these groups are presented in the contemporary British press.
Major research goals and research methods
Issues such as national sovereignty, immigration, free movement within the EU, border control, security, and the increasing number of asylum seekers, as well as the idea of national identity and British exceptionalism, all contributed to the decision to hold the EU membership referendum in 2016, which led to the UK’s withdrawal from the EU – commonly known as Brexit. The linguistic analysis of the two newspaper corpora featured in this book aims to trace the correspondence between these complex socio-political events and the media’s construction of the identities of the key social actors within migration discourse. This analysis pursues three key objectives in particular:
1. In Chapter 3, the focus is placed on the linguistic differences and similarities in the discursive representation of refugees, asylum seekers, immigrants, and migrants (RASIM) by left- and right-wing broadsheets and tabloids in the years 2013 to 2014, on the threshold of the general election of 2015 and the EU membership referendum of 2016. The term RASIM was first introduced in Baker (2007) and later applied in Gabrielatos & Baker (2008), Baker et al. (2008), Taylor (2014), Islentyeva (2018) and others. From a linguistic perspective, word pairs such as refugee–asylum seeker and migrant–immigrant are semantically related; there are, however, some subtle differences in their meaning that can be identified by means of a systematic corpus analysis.
2. Chapter 4 traces the transformation of migration discourse between 2016 and 2018, following the EU membership referendum. The focus is placed on the representation of European migrants, as the analysis aims to identify shifts in the representation of Europeans after the decision was made to leave the EU.
3. Chapters 5 and 6 analyse recurrent metaphorical motifs that are employed in migration discourse, with a particular focus on their evaluative and ideological functions.
A multi-method approach that includes two types of language data analysis – quantitative and qualitative – is employed in the case studies outlined in this book. In its broadest sense, the quantitative analysis of the newspaper corpora includes concordance and collocational/collexeme analyses, the extraction of ideologically relevant grammatical and metaphorical patterns from the corpora, as well as a statistical analysis that shows to what extent the findings are significant. The qualitative analysis involves contextual and intertextual analyses of the newspaper articles and serves as a further step in identifying the discursive strategies that are responsible for producing and reproducing ideology within a narrative. The ultimate goal of this book is to identify a set of systematic linguistic principles in the construction of dominant discourses, thereby revealing the nature of meaning in ideologically charged discourses on migration and asylum seeking by combining corpus linguistics with CDA. The case studies presented in this book aim to demonstrate how corpus techniques and discourse-analytical methods can contribute to a deeper understanding of the ideological processes at work.
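The concordance and collocation steps named above can be illustrated with a minimal Python sketch. It is not the tooling used for this research, which this section does not specify. The function names (`tokenize`, `concordance`, `collocates`), the four-word span, and the sample sentences are assumptions made purely for illustration, and the sentences are invented rather than drawn from the newspaper corpora.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens.

    Hyphens and apostrophes are kept inside words (e.g. "right-wing").
    """
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def concordance(tokens, node, width=4):
    """Return key-word-in-context lines as (left context, node, right context)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def collocates(tokens, node, span=4):
    """Count every word occurring within `span` tokens either side of the node.

    Simplification (assumption): the window ignores sentence boundaries.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


# Invented toy sentences, NOT taken from the newspaper corpora.
sample = (
    "Illegal migrants were stopped at the border. "
    "The minister said illegal migrants would be returned. "
    "Skilled migrants fill vacancies in hospitals."
)
tokens = tokenize(sample)

for left, node, right in concordance(tokens, "migrants"):
    print(f"{left:>30} [{node}] {right}")

print(collocates(tokens, "migrants").most_common(5))
```

Raw co-occurrence counts of this kind are only a starting point: in a study like the one described here, they would normally be weighted by an association measure and then checked against the concordance lines before any claim about a discursive pattern is made.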
Structure of the book
Chapter 1 introduces different approaches to discourse analysis, with a special focus on the key goals of Critical Discourse Analysis (CDA) and critical discourse studies (van Dijk 1995, 2015). In CDA, discourse is primarily seen as a form of social practice (Fairclough 2015), while the socio-political order is largely (re)constructed through a range of discursive practices that are capable of shaping the attitudes and points of view of the general public. Chapter 1 elaborates on the concepts of ideology and power in line with CDA, establishing a vital link between power relations, language in general, and discursive practices within different genres of writing. Mass media is of great interest for critical investigation as it constitutes one of the public’s primary sources of information. Chapter 1 focuses on the contemporary British print media and provides a brief historical overview of the five newspapers – namely The Sun, The Daily Mail, The Daily Telegraph, The Daily Mirror, and The Guardian. The articles from these newspapers serve as data for the corpora employed in the case studies featured in this book. A special focus is given to the newspapers’ political allegiances and the concept of ideological bias.
Chapter 2 introduces the fundamental principles of modern language study, with a focus on the systematicity of analysis and frequency of occurrence as the two basic principles of corpus-based research (McEnery & Wilson 2001), and provides a brief overview of corpus annotation. Section 2.2 defines the key criteria involved in constructing a balanced language corpus and lists the general corpora that are employed as reference corpora in this research. Section 2.3 provides a detailed overview of the process involved in designing and constructing the two comparable newspaper corpora analysed in this book. Finally, Chapter 2 elaborates on key corpus techniques, such as concordance and collocation analysis, and illustrates the advantages of employing a mixed-method approach that combines CDA and corpus linguistics when studying discourses (Baker 2006; Baker et al. 2008; Taylor & Marchi 2018).
Chapter 3 focuses on media representations of refugees, asylum seekers, immigrants, and migrants (RASIM) in the period from 2013 to 2015, referred to as the pre-referendum period in the book. Frequency, concordance and collocational analyses help to identify the linguistic differences and similarities in the representation of RASIM by different British press outlets. The high frequency of the word migrant(s) indicates that European migrants were a main focus of attention in the British press prior to the 2016 EU membership referendum. Chapter 3 demonstrates how a systematic application of corpus linguistic methods can assist in identifying recurrent patterns that are employed in the representation of RASIM, with a focus on the differences between how left- and right-wing newspapers employ these words.
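A frequency comparison of the kind described for Chapter 3 can be sketched as follows. The statistical test used in the book is not named in this section, so the log-likelihood statistic shown here is a swapped-in measure that is widely used in corpus linguistics, not necessarily the author's choice. The function names, the sub-corpus sizes, and all word counts are hypothetical.

```python
import math


def per_million(freq, corpus_size):
    """Normalise a raw frequency to occurrences per million tokens."""
    return freq / corpus_size * 1_000_000


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood for one word.

    freq_a, freq_b: raw frequency of the word in corpus A and corpus B
    size_a, size_b: total number of tokens in each corpus
    """
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Hypothetical figures, invented for illustration only.
size_right, size_left = 250_000, 240_000
counts = {"migrants": (420, 310), "refugees": (95, 140)}

for word, (f_right, f_left) in counts.items():
    ll = log_likelihood(f_right, size_right, f_left, size_left)
    print(
        f"{word}: right-wing {per_million(f_right, size_right):.0f} pmw, "
        f"left-wing {per_million(f_left, size_left):.0f} pmw, LL = {ll:.2f}"
    )
```

With one degree of freedom, a value above 3.84 corresponds to p < 0.05 on the chi-squared distribution. A significant difference in frequency only flags a word for closer reading; how the word is actually used still has to be established from the concordance lines.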
Chapter 4 compares the linguistic patterns and discursive strategies identified in Chapter 3 with the patterns employed in the post-referendum period (2016–2018) in order to ascertain whether media discourse regarding EU migration has changed in the wake of the 2016 EU membership referendum. In particular, the analysis aims to trace the differences and similarities between the representation of European migrants and migration within the EU after the 2016 referendum and compare these representations with the discourses constructed around EU migrants in 2013 and 2014. The comparative analysis thus brings to light the linguistic mechanisms that ensure that the ideologies promoted by different newspapers remain adaptable to the changing socio-political environment of contemporary Britain.
Chapter 5 presents a qualitative corpus-assisted analysis of two newspaper articles from The Daily Telegraph and The Guardian, both of which employ the English garden metaphor in their discussion of immigration and asylum seeking in the UK. The analysis identifies the different ways in which left- and right-wing newspapers use figurative language to discuss this complex socio-political issue. Chapter 5 also demonstrates the advantages of employing an approach that combines close reading with intertextual analyses (for example, referring to Rudyard Kipling’s (1911) poem The Glory of the Garden and Jonathan Swift’s (1729) Modest Proposal) and visual analyses (examining the accompanying photographs and illustrations). Chapter 5 concludes that the metaphor of the English garden is a politically charged figurative tool that represents the conservative ideal of a hierarchical society based on predominant values from Britain’s imperial past.
Chapter 6 shifts the focus from the representation of social actors (Chapters 3 and 4) and society (Chapter 5) to the representation of the political processes in the British press. The recurring use of words like attack, battle, defend, enemy, and fight in reference to politics suggests that the semantic domain of WAR is systematically transferred onto the domain of POLITICS. The focus of the analysis is the evaluative function of metaphor (Charteris-Black 2004), which shows how the ideologically charged motif POLITICS IS WAR is employed in the representation of different political positions. In terms of methodology, Chapter 6 employs both qualitative and quantitative techniques. The latter includes identifying metaphors employed in the Pre-Referendum Corpus (2013–2014), subsequently extracting relevant words from the corpus and assessing whether these words are used metaphorically in the target domain. The quantitative analysis is complemented by a close reading of a selection of articles from the respective newspaper corpus.
The conclusion explains that the research presented in this book goes beyond a mere linguistic analysis in that it focuses on the complex relationship between language, the media, and politics in terms of how meanings and identities are constructed in contemporary Britain. The conclusion provides an overview of the key findings, explains their significance for linguistic research, and highlights methodological innovations. The benefits of combining corpus-linguistic techniques and discourse-analytical methods are outlined in terms of how this mixed-method approach can help clarify the semantics of RASIM terms, as well as contribute to critical research on contemporary media discourse.
References
Angermuller, Johannes, Dominique Maingueneau & Ruth Wodak (eds.). 2014. The Discourse Studies Reader: Main Currents in Theory and Analysis. Philadelphia & Amsterdam: John Benjamins.
Baker, Paul. 2006. Using Corpora in Discourse Analysis. London & New York: Continuum.
Baker, Paul. 2007. Discourses of Refugees and Asylum Seekers in the UK Press, 1996–2006: Full Research Report. ESRC End of Award Report, RES-000-22-1381. Swindon: ESRC.
Baker, Paul, Costas Gabrielatos, Majid Khosravinik, Michal Krzyzanowski, Tony McEnery & Ruth Wodak. 2008. A Useful Methodological Synergy? Combining Critical Discourse Analysis and Corpus Linguistics to Examine Discourses of Refugees and Asylum Seekers in the UK Press. Discourse and Society, 19 (3), 273–306.
Bayley, Paul & Geoffrey Williams (eds.). 2012. European Identity: What the Media Say. Oxford: Oxford University Press.
Braber, Natalia. 2009. I’m Not a Fanatic Scot, But I Love Glasgow: Concepts of Local and National Identity in Glasgow. Identity: An International Journal of Theory and Research, 9 (4), 307–322.
Charteris-Black, Jonathan. 2004. Corpus Approaches to Critical Metaphor Analysis. Basingstoke: Palgrave Macmillan.
Charteris-Black, Jonathan. 2006. Britain as a Container: Immigration Metaphors in the 2005 Election Campaign. Discourse and Society, 17 (6), 563–582.
Fairclough, Norman. 2015. Language and Power. 3rd ed. London & New York: Routledge.
Gabrielatos, Costas & Paul Baker. 2008. Fleeing, Sneaking, Flooding: A Corpus Analysis of Discursive Constructions of Refugees and Asylum Seekers in the UK Press, 1996–2005. Journal of English Linguistics, 36 (1), 5–38.
Hart, Christopher. 2010. Critical Discourse Analysis and Cognitive Science: New Perspectives on Immigration Discourse. Basingstoke: Palgrave Macmillan.
Islentyeva, Anna. 2018. The Undesirable Migrant in the British Press: Creating Bias Through Language. Neuphilologische Mitteilungen, 119 (2), 419–442.
Koller, Veronika. 2004. Metaphor and Gender in Business Media Discourse: A Critical Cognitive Study. Basingstoke: Palgrave.
Lee, David. 2008. Corpora and Discourse Analysis: New Ways of Doing Old Things. In Vijay Bhatia, John Flowerdew & Rodney H. Jones (eds.) Advances in Discourse Studies. London & New York: Routledge, 86–99.
Leonard, Richard L. & Robert Taylor. 2016. The Routledge Guide to the European Union. London & New York: Routledge.
McEnery, Tony & Andrew Hardie. 2012. Corpus Linguistics: Method, Theory and Practice. Cambridge: Cambridge University Press.
McEnery, Tony & Andrew Wilson. 2001. Corpus Linguistics. 2nd ed. Edinburgh: Edinburgh University Press.
Schuster, Liza. 2003. The Use and Abuse of Political Asylum in Britain and Germany. London: Frank Cass.
Sinclair, John. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Sinclair, John. 2004. Trust the Text: Language, Corpus and Discourse. London: Routledge.
Spiering, Menno. 2015. A Cultural History of British Euroscepticism. Basingstoke: Palgrave Macmillan.
Taylor, Charlotte. 2014. Investigating the Representation of Migrants in the UK and Italian Press: A Cross-Linguistic Corpus-Assisted Discourse Analysis. International Journal of Corpus Linguistics, 19 (3), 368–400.
Taylor, Charlotte & Anna Marchi (eds.). 2018. Corpus Approaches to Discourse: A Critical Review. Oxon & New York: Routledge.
Tognini-Bonelli, Elena. 2001. Corpus Linguistics at Work. Amsterdam: John Benjamins.
van Dijk, Teun A. 1995. Aims of Critical Discourse Analysis. Japanese Discourses, 1 (1), 17–27.
van Dijk, Teun A. 2015. Critical Discourse Analysis. In Deborah Schiffrin, Deborah Tannen & Heidi E. Hamilton (eds.) The Handbook of Discourse Analysis. 2nd ed. London: Blackwell, 466–485.
1 Ideology in the contemporary media: Bias in the British press
1.1 Approaches to discourse and key goals in critical discourse analysis
Approaches to discourse
Since the 1960s, there has been an explosion of interest in the notion of discourse and, more specifically, discourse analysis or discourse studies. The term discourse is used to define a concept whose meaning and application tend to vary within the academic cultures of different countries. A fundamental distinction can be drawn between discourse and more traditional linguistic terms such as language, sentence, text, and genre (Angermuller et al. 2014: 3). First, the distinction between sentence and discourse is more or less transparent. Formally, discourse usually constitutes a series of sentences. Stubbs (1983: 1) defines discourse as language above the sentence or the clause. Discourse is thus synonymous with order on a transphrastic level, such as in Harris (1952), who first introduced the term discourse analysis. Secondly, discourse is not equivalent to language, which de Saussure (1916) defines as an abstract system of signs with a rigid structure. In contrast, discourse concerns how language is used in specific contexts; in this sense, discourse can be defined as language use in both speech and writing. Thirdly, a differentiation is sometimes made between written and oral modes, but discourses are neither written nor oral texts. Widdowson (2004) argues that a discourse should be clearly distinguished from both written and oral texts; discourse is the process of meaning negotiation, while a text is its final product. Discourses do not necessarily represent written data; they can comprise written, oral, and even non-verbal data such as visual data, gestures, facial expressions, and other types of body language. Finally, the term discourse is often associated with different types of language use, such as political (Chilton 2004), media (Fairclough 1995b), advertising, legal, medical, educational, and environmental discourses.
In this sense, discourse represents a conceptualisation that can be linked to such classic linguistic terms as genre or text type (cf. Baker 2006: 3). Angermuller et al. (2014: 2) state that the term discourse can be understood in terms of micro-sociological and macro-sociological approaches:
1. On a micro-sociological level, discourse is understood as language in use, while discourse analysis implies the process and practice of contextualising texts, the situated production of speech acts, as well as turn-taking practices (Gumperz 1982; Brown & Yule 1983).
2. On a macro-sociological level, discourse can be defined as a form of verbal and non-verbal practices within social communities (Foucault 1972; Fairclough 1992, 1995a, 2015).
Similar to the distinction between these micro and macro levels are the ideas of James Paul Gee (1999), which were published in his foundational book on discourse analysis. Gee distinguishes between discourse and Discourse (with a capital D): the former can be defined quite simply as any stretch of language in use, while the latter combines language in use with a variety of social practices within a specific group (teachers or lawyers), such as actions, interactions, values, customs, and perspectives. The relationship between these two concepts can be described as hierarchical: different historically formed Discourses determine a broader context for analysing discourses. The methods of discourse analysis are frequently employed in linguistics, sociology, and political, communication, and media studies, but the actual ways in which discourse analysis is applied within these disciplines involve a range of approaches and methodologies. Broadly speaking, a differentiation can be drawn between the French theoretical approach to discourse and the tradition of empirically oriented discourse analytical research that was developed primarily in the UK and USA. The former approach emerged from post-structuralism and was inspired by philosophers such as Deleuze, Derrida, Lacan, Foucault, and Althusser (cf. Williams 1999). The key interests of the French discourse analytical school are relations between language, power, and ideology – specifically, the question of how conventional uses of language are created by conventional modes of thought (Bayley & Williams 2012: 13). For discourse theorists working in line with Foucault, Lacan, and Derrida, social relations are constructed in discursive practices, while the concepts of knowledge, power, and subjectivity constitute the triangle of discourse theory. 
The Anglo-American tradition concentrates on the empirical analysis of language in use, with a primary focus on two major issues: first, how the social and discursive roles of speakers and addressees are enacted in texts, and secondly, how personal identities are constructed in different text types (Bayley & Williams 2012: 14). This tradition includes both large-scale quantitative corpus analysis and more qualitative, micro-sociological studies (Angermuller et al. 2014: 5). Empirical discourse studies often include findings from the field of social semiotics (Halliday 1978, 2007; Kress & van Leeuwen 1996), as well as sociolinguistics (Stubbs 1983); in addition, discourse studies comprise a number of approaches, including speech act theory (Widdowson 2007) and corpus-based methods (Sinclair 2004; Baker 2006).
Discourse as social practice
The approach that combines elements of both the French and Anglo-American traditions is known as Critical Discourse Analysis (CDA) or Critical Discourse Studies (CDS). This critical approach to discourse analysis – which is the main theoretical framework upon which this book is based – was born in response to concerns regarding the power of language to exercise control within society. CDA constitutes a vast, multifold research field that derives from a variety of theoretical backgrounds and is designed to be used to critically engage with different types of data. The roots of CDA lie in classical rhetoric and text linguistics, as well as sociolinguistics and pragmatics (Martin & Wodak 2003: 4). Most importantly, Teun van Dijk, one of the founders of CDA, emphasises the critical orientation of this type of discourse analysis by stating that, as an approach, CDA emerges from “critical linguistics, critical semiotics and in general from a sociopolitically conscious and oppositional way of investigating language, discourse and communication” (van Dijk 1995: 17; emphasis added). Historically, CDA developed out of the fields of functional linguistics and semiotics (Halliday 1985). In the UK, linguists such as Roger Fowler, Robert Hodge, Gunther Kress, and Tony Trew (Fowler et al. 1979; Fowler 1991, 1996) started formulating the basic principles of critical linguistics; their work was primarily based on the systemic-functional and social-semiotic linguistics of Halliday, whose methodology is still crucial to CDA practices. Fairclough defines clear linguistic categories for analysing the relations between discourse and social meaning (Blommaert 2005: 23). Apart from systemic-functional linguistics, British cultural studies have also profoundly influenced CDA.
The Birmingham Centre for Contemporary Cultural Studies, headed by Stuart Hall, has systematically addressed such relevant social, cultural, and political issues as neo-liberalism, the New Right, Thatcherism, racism, and the end of the welfare state (Blommaert 2005: 23). Deriving from a variety of theoretical backgrounds, CDA represents a complex network of principles, aims, and methods. Blommaert (2005: 21) argues that CDA is not just a school of thought; rather, it comprises a network of scholars from various theoretical backgrounds who address similar social and political issues, applying similar principles of analysis and employing some of the institutional tools developed within this framework. Political, media, advertising, educational, and various institutional discourses belong to the key areas of critical research. Norman Fairclough (with a background in systemic-functional linguistics), Ruth Wodak (focusing on identity politics and the study of racist talk and texts), Teun van Dijk (specialising in text linguistics and cognitive linguistics), and Paul Chilton (with a focus on linguistics, semiotics and communication studies) can be called both the pioneers and leading scholars of CDA. Furthermore, these researchers are often associated with three distinct approaches within CDA that are labelled:
1. the socio-semiotic approach (Fairclough 1992, 1995a, 2010, 2015),
2. the discourse-historical approach (Wodak 1996, 1991, 2001), and
3. the socio-cognitive approach (van Dijk 1993, 1996, 1998, 2001).
These three approaches are the most firmly established and widely recognised and as such are often referred to as mainstream CDA (Hart 2010: 14). Norman Fairclough’s socio-semiotic approach analyses the relationships between semiosis and various elements of social practices and focuses on both structure and function (Angermuller et al. 2014: 378). Fairclough broadly defines discourse as a form of social practice (2015: 55). Discourse and society are closely connected in at least three interrelated ways. First, discourse is a part of society, and the relationship between discourse and society is thus internal and dialectical, however asymmetrical, which means that linguistic phenomena are always social phenomena, whereas social phenomena are only in part linguistic (Fairclough 2015: 56). Linguistic phenomena are social in the sense that whenever we produce text or talk, we do so in ways that are socially determined and have social effects. Conversely, social phenomena are linguistic in the sense that any discourse production is not merely a reflection of social processes and practices, but rather is a part of those processes and practices. Secondly, Fairclough (2015: 57) argues that discourse is a social process, whereas a text, both written and spoken, is a product of the process of text production. In addition to the production of a text, the process also includes the act of interpretation, for which the text operates as a resource. Text analysis therefore constitutes only one part of discourse analysis; a broader discourse analysis also includes the analysis of productive and interpretative processes. The formal properties of a text are in turn regarded as traces of the productive process and as cues in the process of interpretation. Thirdly, according to Fairclough (2015: 57), discourse is a socially conditioned process, which means that discourse is constrained by other non-linguistic elements of society.
Such social conditions constitute three hierarchical levels of organisation:
1. the immediate social situation or environment,
2. the social institution, and
3. society as a whole.
Fairclough concludes that discourse analysis involves a complex analysis of texts and of the processes of production and interpretation that should be conducted in view of the social conditions both of the immediate situational environment, as well as those at a higher institutional and societal level. For Fairclough (2015: 60), discourse is always determined by social conditions and structures. Order of discourse, a term taken from Foucault (1971, 1972), corresponds to the highest level of how society is organised: social order. Orders of discourse embody particular ideologies and constrain subordinate types of discourse; types of discourse are diverse and include, for example, the discourse of job interviews, university seminars and lectures, police interrogations, etc. that correspond to specific types of (social) practice. Finally, actual social practices correspond to actual discourses, which are, in turn, governed by particular types of discourse. These hierarchical and parallel correspondences can be schematically represented as follows (Fairclough 2015: 61):
1. social order – order of discourse,
2. types of practice – types of discourse, and
3. actual practices – actual discourses.
The discourse-historical approach (DHA), primarily developed by Wodak (2001), focuses on the interdisciplinary analysis of texts, as well as intertextuality and the interdependence of historical development and discursive practices, while the socio-cognitive approach, established by van Dijk (1993), places emphasis on the relationships between society, discourse, and social cognition. For instance, van Dijk (1993: 251) argues that in order to understand the reproduction of hierarchies of power and dominance, it is important to analyse the role social representation plays in the minds of social actors. Furthermore, social cognition represents the necessary theoretical and empirical link between discourse and dominance; it connects the micro-level of textual structure to the macro-level of social relations (Hart 2010: 16). According to van Dijk (1993: 251), the fact that social cognition has been disregarded as the crucial link that connects text and social relations is one of the main theoretical shortcomings of most of the work conducted in critical discourse studies.
Fundamental concepts such as ideology, power, control, and hierarchical relations are treated as particularly relevant to the interpretation and explanation of different modes of communication, especially text and talk. The notion of power is one of the central concepts of CDA. It is worth bearing in mind that discursive differences are typically negotiated in texts. A text is almost never the product of any one individual person. The processes of creating a text are determined by power imbalances, which are in part encoded in and determined by discourse and genre. Texts thus constitute metaphorical sites of struggle in that they show traces of divergent discourses and ideologies that struggle for dominance (Martin & Wodak 2003: 6).
Since text and talk are grounded in ideology, critical discourse analysts are particularly interested in exposing the structural relationships of dominance, discrimination, power, and control that manifest in different types of discourse. The contemporary mass media, television, radio, the Internet, as well as social networks function as key agents in producing and reproducing these structural relationships. The general public obtains its knowledge of certain social and political realities not from first-hand experience, but rather from the media, which has been widely studied in CDA (cf. Hart 2010: 16).
Aims of critical discourse studies
Understanding discourse as an inalienable part of society and analysing the ways the social and political order is constructed and reconstructed in contemporary discursive practices are of crucial importance for the research featured in this book. Discourse is understood as the social production of meaning through various kinds of communication. Discourse, as discussed previously, is seen as a form of social practice by CDA scholars (Fairclough & Wodak 1997; Wodak 2001; van Dijk 2001, 2006). Discourse is socially constitutive, as well as socially conditioned; this means
that it constitutes events, the actors of these events, and their actions. Significantly, discursive practices may have an ideological impact; for instance, they can produce and reproduce unequal power relations between social classes, genders, or ethnic and cultural majorities and minorities, in particular through the ways in which discourse represents these social actors (Fairclough & Wodak 1997: 258). Teun van Dijk, the founder of the socio-cognitive approach, summarises the basic principles and aims of CDA, providing a solid methodological and conceptual framework for the field. Most importantly, van Dijk (1995: 24) defines CDA as an approach that stands in solidarity with oppressed groups and in opposition to those groups and institutions that abuse their power. As a special approach in discourse analysis, CDA primarily focuses on the analysis of discursive conditions, components, and consequences of power abuse by dominant or elite groups and institutions. Critical discourse analysts aim to examine patterns of access and control over text and talk, as well as the discursive strategies of mind control. The ways forms of inequality are expressed, legitimated, and eventually reproduced in discourses are of particular interest to CDA. Teun van Dijk’s research itself is mainly focused on the study of the discursive reproduction of racial practices by the so-called “symbolic elites”, namely politicians, journalists, and academics, in the British and Dutch contexts; his main research interest lies in theories of ideology. For a better understanding of the theoretical framework upon which the research featured in this book is based, van Dijk’s (1995: 17–18; van Dijk’s emphasis) key principles, practices, and aims, which are considered integral to CDA, are outlined as follows:
1. CDA is seen as an explicitly critical approach, position or stance of studying text and talk.
2. CDA is a problem-oriented or issue-oriented, rather than paradigm-oriented approach, i.e., it is appropriate as long as it is able to effectively study relevant social problems such as sexism, racism, and other forms of social inequality.
3. CDA is an inter- or multidisciplinary approach, focusing on the relations between discourse and society, including social cognition, politics, and culture.
4. CDA can be referred to as a part of a broader spectrum of critical studies in the humanities and the social sciences.
5. CDA studies (may) analyse all levels and dimensions of discourse: grammar, style, rhetoric, speech acts, pragmatic strategies, etc. In addition, many studies in CDA focus on other semiotic dimensions of communicative events such as pictures, film, sound, music, gestures, etc.
6. CDA pays special attention to the relations of power, dominance, and inequality, and the ways these are reproduced or are resisted by group members through text and talk. The focus of attention is put on the strategies of dominance and resistance in social relationships of class, gender, ethnicity, race, sexual orientation, language, religion, age, nationality, or world-region.
7. CDA aims at identifying ideologies that play a role in the reproduction of dominance and inequality as well as opposing ideologies.
8. CDA focuses on the strategies of manipulation, legitimation, and the manufacture of consent, as well as other discursive ways that can influence the minds and, consequently, the actions of people in the interest of the powerful.
It is also important to bear in mind that not all research conducted within the framework of CDA can be characterised according to the aforementioned criteria (van Dijk 1995: 19). Nonetheless, these principles outline the key aims of the critical approach and distinguish CDA from other approaches to and methods of discourse analysis that are primarily descriptive, observational, and explanatory. It is of crucial importance for CDA that scholarly enterprise constitutes part of social and political life. This means that all theories, methods of analysis, and data selection in critical discourse studies should be of social and political importance. Finally, van Dijk (1995: 18; van Dijk’s emphasis) clearly defines the practical aims of CDA:
1. CDA aims to uncover the discursive means of mental control and social influence, which is why it acts as a critical and oppositional stance against the powerful and the elites, and especially those who abuse their power.
2. Studies in CDA try to formulate an overall perspective of solidarity with dominated groups by formulating strategic proposals for the enactment and development of counter-power and counter-ideologies in practices of challenge and resistance.
It should be emphasised that in modern Western societies, ideologies and their means of social control and influence can be quite subtle and thus difficult to detect. One of the reasons for this is that recurrent patterns and structures of dominant discourses are deeply rooted in the minds of controlled and dominated groups. Social practices in general and discourses in particular are the primary ways of expressing and reproducing ideologies: ideologies are (re)produced through social practices, and, more specifically, acquired, confirmed, changed, and perpetuated through discourse (van Dijk 2006: 115). Therefore, the initial step that critical discourse analysts should take is to uncover how linguistic patterns and structures and discursive strategies are employed in the construction of discourses in order to reveal underlying dominant discourses. There have been some criticisms levelled at CDA – in particular, accusations of confirmation bias on the part of critical discourse analysts, that they analyse a limited number of texts in order to confirm a hypothesis (cf. Widdowson 2000). The advent of large text corpora has allowed for a systematic analysis of data, enabling linguists to achieve “greater soundness and greater breadth” of research (Taylor & Marchi 2018: 5), while reducing cognitive biases by applying corpus-linguistic methods to discourse analysis (cf. Baker 2006: 11). Corpus analysis makes researchers less selective and allows for greater distance between the observer and the data. A systematic corpus analysis enables researchers to identify the recurring structures and functions that are characteristic of specific types of discourses, and thus the ideologies that underlie them
(for further discussion of this methodological synergy, refer to Section 2.5).
This subchapter has outlined the key aims of critical discourse studies. We will continue discussing the terms ideology and power in line with CDA in Section 1.2, where we will explore the crucial link between power relations and the contemporary mass media.
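As a rough illustration of what identifying recurring structures can look like in practice, the following Python sketch counts the words that occur near a chosen node word, which is a very simple form of collocation analysis. The sentences are invented for this illustration and are not drawn from the corpora described in this book; the function name `collocates` and the window of three tokens are likewise assumptions, not the procedure used in the case studies.

```python
import re
from collections import Counter

# Invented toy sentences, NOT taken from the corpora described in this book.
toy_corpus = [
    "Migrants are flooding into the country, ministers warned.",
    "The report says migrants contribute to the economy.",
    "A wave of migrants arrived at the border.",
    "Critics claim migrants are flooding local services.",
]


def collocates(texts, node, window=3):
    """Count words occurring within `window` tokens of the node word."""
    counts = Counter()
    for text in texts:
        # Crude tokenisation: lower-case alphabetic strings only.
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, token in enumerate(tokens):
            if token == node:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts


counts = collocates(toy_corpus, "migrants")
# Print the five most frequent neighbours of the node word.
print(counts.most_common(5))
```

A real study would work with a far larger corpus, a proper tokeniser, and a statistical association measure, but even this raw count shows how a recurrent pattern can be surfaced without the analyst selecting individual texts by hand.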
1.2 Media discourse as an instrument of power
Access to and control over discourse
Language is a powerful tool that has the capacity to influence the way in which we perceive reality; different kinds of discourses that are subject to constant change are capable of altering our perception and understanding of the world by shaping our point of view. Contemporary discourses are constantly being reconstructed, especially by those who possess access to the creation of discourse. Teun van Dijk (1995: 20) sees preferential access to discourses as an instrument of dominance and resource of power; he sees access to discourses as comparable to access to social resources such as education, knowledge, wealth, income, and status. The so-called elite groups and institutions have access to – and what’s more, control over – a great number of public and institutional discourses in terms of which topics are foregrounded and which are backgrounded or omitted entirely. In addition, the elites can often define the extent and frequency of (media) coverage. For example, politicians have control over governmental and parliamentary discourses, as well as preferential access to the mass media, while journalists have control over media discourse, as well as preferential access to other forms of official text and talk such as press conferences, press releases, and official reports. Similarly, scientists and professors have control over academic discourse in the form of publications, textbooks, and lectures, while corporate managers and judges define business and courtroom discourses, respectively. According to van Dijk (1995: 21), there is a direct correlation between preferential access to various forms of public and institutional discourses and the methods of examining social power and social abuse; he considers patterns of discourse control to be closely associated with the direct enactment of social power.
The symbolic elites have control over, or at least preferential access to, the most influential types of discourse in the contemporary world. Such access can be defined in terms of their social position or institutional function, and vice versa, their control over and access to specific forms of public discourse sustain and reproduce their power in specific situations. Since CDA aims to detect different forms of power abuse, critical studies specifically focus on morally or legally illegitimate forms of control and access.
Discourse types and common sense
Contemporary societies develop rapidly. Likewise, modern discourses are not static entities; they constantly evolve over time. Baker (2006: 14) identifies two key types of discourses:
1. hegemonic, also referred to as dominant or common sense, and
2. resistant, also known as opposing or counter-discourse.
Baker (2006) stresses the importance of corpus linguistic methods in identifying both hegemonic and resistant discourses. Section 2.5 discusses the advantages of employing a mixed-method approach that combines CDA and corpus linguistics in the study of different types of discourses. The analysis of large reference corpora can fulfil two major functions in relation to these types of discourses. First, corpus data identifies recurrent patterns of language use that demonstrate evidence of particular hegemonic discourses or “common sense” ways of viewing the world. Secondly, corpora can also help to reveal the presence of counterexamples typical of opposing discourses, whose presence would be difficult to uncover using smaller-scale discourse studies (ibid.). It is important to mention that neither type of discourse is a fixed entity; for example, a hegemonic discourse might come to be viewed as a resistant or unacceptable discourse over the course of time. Recurrent linguistic patterns and structures, which tend to be the main focus of a corpus-based analysis, are usually indicative of dominant discourses; however, the high frequency of linguistic patterns does not necessarily imply underlying hegemonic discourses (Baker 2006: 19).
Fairclough (2015: 101) expands on the notion of dominant discourse, arguing that the effectiveness of (dominant) ideology depends on it being merged with the idea of common sense. An ideology is most effective when its ideological assumptions are embedded in features of discourse that are usually taken for granted and rarely examined or questioned. This process is known as the naturalisation of discourse. Since ideology is closely connected to power relations, Fairclough (2015: 107) believes that ideological common sense sustains unequal power relations.
For example, the assumption that everybody enjoys freedom of speech is taken for granted and can be classified as commonsensical, but it also “disguises and helps to maintain the actuality of barriers to speech of various sorts for most people” (Fairclough 2015: 107). Crucially, ideologies are the most powerful and most effective when their influence is least visible. Fairclough (2015: 126) also asserts that ideology is not inherently commonsensical; particular ideologies acquire this dominant status in the course of ideological struggles. These struggles are often manifested linguistically in the form of a battle between ideologically opposing discourse types. For instance, genres that are widely available to the general public, such as political, media, educational, and advertising discourses, often reproduce specific discursive patterns that come to be accepted as commonsensical over the course of time, thus contributing to the promotion of a single ideology and perpetuating unequal power relations. The mass media is considered to be particularly influential in terms of constructing ideologies and identities, and is the primary focus of the analysis outlined in this book.
The media as an ideological tool: selection, transformation, and recontextualisation
The mass media, in particular radio, television, and the press, and more recently, the Internet and social media, can easily reach a vast audience and therefore play a
crucial role in terms of forming and spreading (mostly dominant) discourses. CDA views the mass media as one of the most powerful instruments for constructing and reconstructing meaning in contemporary societies, and thus, as a form of social control. Roger Fowler (1991: 1), one of the pioneers of the critical approach to the study of language, contends that the press employs language with the intention of influencing public ideas and beliefs; their language is therefore not neutral, but a highly constructive mediator. Fowler (ibid.) also argues that the content of newspapers represents not facts about the world, but rather ideas in a general sense. He also uses words like beliefs, values, propositions, and ideology partly interchangeably. For van Dijk (2006: 115), ideology also constitutes a system of ideas. He views ideologies as socio-cognitively defined; they can be defined as shared representations of social groups, and more specifically, the “axiomatic” principles of such representations.
It is widely acknowledged that news is a socially constructed phenomenon. First, news is not a direct reflection of important events, but rather the result of careful selection. Secondly, the news that is selected is subject to the further processes of transformation according to the modes of publication (the press and the Internet) or transmission (radio and television). The processes of both selection and transformation are “guided by reference, generally unconscious, to ideas and beliefs” (Fowler 1991: 2). Finally, the production of news is an industry with its own commercial interests; therefore, any news represents a discourse that is far from neutral in terms of how it reflects reality.
The media often focuses on particular issues, the so-called preferred topics, and the recontextualisation of others, thus tactically shaping popular perception and opinion; as a result, public opinion, beliefs, and attitudes are influenced by this mediated information. The media not only shapes public opinion; it also articulates the opinions of the public and thereby plays an important role in policy-making. For example, consumers of news tend to refer to the news sources that are most in line with their values and interests; therefore, media outlets have a financial interest in reporting on the issues that are of most interest to their readers or listeners. In sum, the relationship between the media and public can be described as bidirectional and dynamic (Gabrielatos & Baker 2008: 9).
The primary foci of the case studies featured in the upcoming chapters are the linguistic patterns, structures, and metaphors employed by British left- and right-wing broadsheets and tabloids in the promotion of their political agendas. Richardson (2007: 9) points out that it is not only the content of mediated information that should be analysed; the linguistic tools and discursive strategies that help construct the argument are arguably of even greater importance. While it is essential that we understand what is written (about the working class or minority groups, for example), it is even more important that we examine how journalistic discourse is produced – in particular, how arguments are constructed and promoted, and how newspaper texts can lead to the production of social inequalities.
This book represents a corpus-based linguistic analysis of the modern migration discourse produced by the British national press in the period prior to the EU membership referendum (2013–2014) and after (2016–2018). According to Freedom House, an independent watchdog organisation, the UK has an open and pluralistic media environment:
The country has a strong tradition of editorially independent public broadcasting. Private outlets generally maintain their independence from political pressure and convey a range of views, but a few large companies control a disproportionate number of these outlets.
In addition, British national newspapers can be classified according to their allegiance to the established political parties and may thus be seen as representative of the ideologies these parties seek to promote. What follows in Section 1.3 is a brief overview of the development of the British press, with a focus on major national newspapers and how they relate to the established political parties and their corresponding ideologies.
1.3 The British press: political allegiances and ideological interests
Political allegiances
Historically, the UK has had a well-established national press based in its capital, London. The modern history of the British press began in 1855 when the government abolished taxes on newspapers (Conboy 2011: 8). The abolition of newspaper stamp duty allowed, for instance, for the daily publication of The Manchester Guardian at a greatly reduced price, making it one of the key moments in the history of the British press. However, the abolition also meant that only commercially successful newspapers could survive, which gave rise to competition between newspaper owners. Conboy (ibid.) states:
The abolition of taxes on newspapers in 1855 had begun to release the full force of competition into newspaper production. . . . Henceforth newspapers would survive as commercial concerns or not at all. They would do so by maximizing their profits through targeting a topical miscellany aimed at specific readerships that were to be addressed with increasing efficiency.
From a contemporary standpoint, the British press, which provides the data for the research outlined in this book, can be classified according to a series of different parameters. Gabrielatos & Baker (2008: 8) identify the following major criteria:
• frequency of publication: dailies, weeklies and Sunday editions
• coverage: national or regional
• political stance: conservative/right-wing and liberal/left-wing
• style or genre: broadsheets/qualities and tabloids/populars
Traditionally, national British newspapers have tended to express their political allegiances in every general election by backing one of the major British political parties in their editorials and opinion pieces. Table 1.1 provides an overview of a selection of election years, indicating which political parties have been supported by The Sun, The Daily Mail, The Daily Telegraph, as well as The Guardian and The Daily Mirror in general elections since 1945 (Butler & Butler 2000: 536).

Table 1.1 Newspaper support for political parties in the UK general elections

| General elections | 1945 | 1970 | 1987 | 2001 | 2010 | 2015 | 2017 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Winner | Labour | Con | Con | Labour | Con/Lib Dem | Con | Con |
| The Sun | Labour | Labour | Con | Labour | Con | Con | Con |
| The Daily Mail | Con | Con | Con | Con | Con | Con | Con |
| The Daily Telegraph | Con | Con | Con | Con | Con | Con | Con |
| The Guardian | Labour/Lib Dem | Labour | Labour | Labour | Lib Dem | Labour | Labour |
| The Daily Mirror | Labour | Labour | Labour | Labour | Labour | Labour | Labour |

Source: Butler & Butler (2000: 536), expanded: years 2001–2017

The corpus data for the studies featured in this book comprise 1,000 newspaper articles taken from these five British newspapers, which all have a nationwide circulation. Gabrielatos & Baker (2008) also stress the fact that most national newspapers do not try to conceal their ideological bias. The political stance of a newspaper can also be revealed by its reporting style and the choice of which issues are reported on. The overall style of editorials and opinion pieces is of particular interest here. In this respect, Gabrielatos & Baker (2008: 8) point out that most British newspapers make no attempt to be impartial. Instead, they reveal their political stance in a range of both explicit and subtle ways: first, through the issues they choose to report on and the readers’ letters they choose to publish; and secondly, through decisions pertaining to language use, such as specific lexical and grammatical structures, as well as metaphorical patterns. It is also essential to remember that it is not only the writing style of The Sun that differs from that of The Independent, but that the readerships of these two newspapers are also very distinct from one another in terms of socio-economic status (cf. Fowler 1991: 4). The correlation between linguistic form and the socio-economic status of the target audience is of crucial importance, which is why both stylistic and social co-variation should be taken into consideration when analysing ideological bias. Referring to Halliday, Fowler (1991) also argues that enquiries into the functions of sociolinguistic variety can delimit social groups and encode the different ideologies that these groups align themselves with. As mentioned previously, the major British newspapers express their allegiances to the established political parties in every general election.
As Table 1.1 clearly shows, both The Daily Telegraph and The Daily Mail have supported the Conservatives in every general election since the end of the Second World War, whereas The Sun’s political allegiance appears to have shifted; until the 1980s, the newspaper had supported Labour, but then it began endorsing the Conservatives, with an exception in 2001, when the newspaper supported Tony Blair’s Labour Party. Nonetheless, The Sun has backed the Conservative Party since 2010. The political stance of The Daily Mirror is clearly left-oriented – since 1945, the tabloid has exclusively endorsed the Labour Party. Likewise, the political allegiance of The Guardian is left-leaning and liberal; in different decades, the newspaper has supported Labour, the Liberal Democrats (in 2010), or their coalition (in 1945 and 2005). In terms of genre, The Guardian and The Telegraph represent the so-called quality press or broadsheets, while The Mirror, The Mail, and The Sun are widely known as popular press or tabloids. The Telegraph, The Mail, and The Sun’s consistent support for the Conservatives, as well as The Mirror and The Guardian’s endorsement of the Labour Party, was the main reason these national newspapers were selected to be the data source for the case studies presented in this book. In the following paragraphs, a brief historical overview is provided of each newspaper in order to frame the British print media within a broader socio-political context.
Historical background and political stance
The Guardian and The Observer
The Guardian has traditionally supported Labour and the Liberal Democrats in UK general elections and represents left-leaning, left-wing, or left-of-centre politics – these three adjectives are used interchangeably in this book. The newspaper was founded as a weekly broadsheet in 1821 in Manchester and was known as The Manchester Guardian until 1959 (Butler & Butler 2000: 529). The paper was originally associated with classical liberalism, which advocates civil liberties with an emphasis on economic freedom – an ideology expressed by the Whigs and later by the Liberal Party. After the Second World War, The Guardian aligned with Labour and the political left in general. To this day, the paper has maintained its left-leaning stance in the form of editorials, while its coverage includes national and international news, with an emphasis on British and EU politics and economics, as well as culture and sport.
Its Sunday sister paper, The Observer, founded in 1791 (ibid.), is also on the left of the political spectrum. In 2013, the print edition of The Guardian had an average daily circulation of 187,000 copies, placing it behind The Daily Telegraph. This can be seen in Table 1.2, which provides daily sales figures for the five newspapers in question in 2013 and 2014 – the publication period of the articles featured in the Pre-Referendum Corpus. Although The Guardian has been published in a tabloid format since 2018, the paper remains an example of the quality press.
The Daily Telegraph
The Daily Telegraph was first published in 1855 in London as The Daily Telegraph and Courier. The broadsheet was launched in the same year that the British government abolished taxes on British newspapers (Conboy 2011: 8). By the late 1870s, the newspaper claimed to already have the largest circulation in the world (ibid.). The newspaper’s sister Sunday edition was first published in 1961. The broadsheet is generally known for its conservative approach to news coverage; its major sections include national and international politics, business, banking,
investment, lifestyle, sport, and culture. In January 2018, The Daily Telegraph had an average daily circulation of 385,346 copies, making the paper the second-most widely read national broadsheet after The Times (refer to Table 1.3). Regarding its political allegiance, The Telegraph editorial board has endorsed the Conservative Party in every general election since 1945, as shown in Table 1.1. In this book, the adjectives conservative, right-wing, and right-leaning are used interchangeably in reference to this newspaper.
The Daily Mail
The Daily Mail was launched as a mass-market newspaper by Alfred Harmsworth in 1896, and the paper enjoyed immediate commercial success (Conboy 2011: 17). By 1900, The Mail had already reached its first million in daily sales, becoming “the first truly mass circulation paper” (Conboy 2011: 8). Today, the tabloid remains one of the most widely read in the UK. Table 1.2 shows that in 2014, the paper had an average daily circulation of 1,673,579 copies, making it the second-best-selling UK national newspaper after The Sun, with more than 2 million copies sold per day. The Daily Mail reports national and international news, with an emphasis on British politics and economics. The paper also focuses its attention on a number of themes that are characteristic of the tabloid genre: lifestyle, show business, celebrity news, and the British royal family. Like The Telegraph, The Mail has exclusively supported the Conservatives in every general election after the Second World War. In 2015, the newspaper expressed its loyalty to the UK Independence Party (UKIP) in several constituencies, which means the paper is located on the right of the political spectrum.
The Daily Mirror
The Daily Mirror was founded in 1903 as a conservative, middle-class mass-market newspaper; its readers were predominantly the metropolitan middle class (Conboy 2011: 111).
By the late 1930s, however, The Mirror had turned into a working-class newspaper, filling the vacant position in the market for a paper that would appeal to a working-class audience (ibid.). Since then, The Daily Mirror has been known as a left-wing popular newspaper (refer to Table 1.1). It is the only tabloid that has exclusively supported Labour in every general election since 1945. At present, the newspaper is still one of the largest British tabloids, with daily sales amounting to more than one million copies in 2013. Nonetheless, these sales figures leave The Mirror trailing behind the right-wing tabloids The Sun and The Mail, as is evidenced in Table 1.2 and Table 1.3.
The Sun
The Sun was founded in 1964 as the successor to The Daily Herald. In 1969, the newspaper was purchased by Rupert Murdoch’s News Corp and became a tabloid that has been the best-selling newspaper in the UK since the late 1970s, with more than 2 million copies sold daily in 2013 and more than 1.5 million copies sold
in 2018 – refer to Table 1.2 and Table 1.3, which outline the daily sales in 2013 and 2014, as well as in 2016 and 2018, respectively. Throughout its history, the newspaper has changed its political allegiance several times. Until the 1980s, it had supported Labour, but then it started endorsing the Conservatives, and in the late 1990s and 2000s, the paper supported Tony Blair’s Labour Party. Since 2010, The Sun has been backing the Conservative Party.
Press popularisation and circulation figures
The popular press has had a major impact on the development of British journalism in the 20th century. While much of this development has been positive, the popularisation of early-20th-century mass-market journalism also had a negative cultural effect (Leavis 1932, cited in Conboy 2011: 19). The mass-market papers caused – or at least increased – a cultural division, creating mass working-class readerships. Before the development of the popular press, the working-class readership had been marginalised within a lower public sphere of weekly newspapers. The popular daily press saw the working class as an economically attractive mass readership and a market for advertising. What’s more, the majority of the subsequent developments within journalism have been structured around mass-market journalism as the dominant and defining model; even the quality press orient their style and language towards the news values of the tabloids (Conboy 2011: 20, 109).
British print media has been experiencing a constant decline in circulation in the 21st century. Table 1.2 provides the average daily sales figures for the five aforementioned newspapers in June 2013 and June 2014, giving a percentage of the annual shift in daily sales. These five newspapers provided the data for the Pre-Referendum Corpus of 2013–2014.
As Table 1.2 clearly shows, all five newspapers have experienced a decline in their sales; The Sun has been the most significantly impacted, suffering a 9.37% decrease in sales. Likewise, the sales figures of The Daily Mirror, The Daily Mail, and The Daily Telegraph have declined by 7.71%, 7.36%, and 5.94%, respectively. The left-wing broadsheet The Guardian, however, has experienced an insignificant fall in sales amounting to less than 1%.

Table 1.2 Daily circulation figures for British national newspapers in 2013 and 2014

| British daily newspapers | June 2013 | June 2014 | Per cent change (%) |
| --- | --- | --- | --- |
| The Sun | 2,243,903 | 2,033,606 | −9.37% |
| The Daily Mail | 1,806,569 | 1,673,579 | −7.36% |
| The Daily Mirror | 1,038,753 | 958,674 | −7.71% |
| The Daily Telegraph | 547,106 | 514,592 | −5.94% |
| The Guardian | 187,000 | 185,313 | −0.9% |

Source: Audit Bureau of Circulations (UK)

A similar tendency towards a steady decline in circulation can be seen in the figures from 2016 and 2018. Table 1.3 provides the average daily sales of these newspapers in January 2016 and January 2018, as well as the percentage of change in daily sales between these years. The data for the Post-Referendum Corpus of 2016–2018 was retrieved from these newspapers.

Table 1.3 Daily circulation figures for British national newspapers in 2016 and 2018

| British daily newspapers | January 2016 | January 2018 | Per cent change (%) |
| --- | --- | --- | --- |
| The Sun | 1,787,096 | 1,545,594 | −13.5% |
| The Daily Mail | 1,589,471 | 1,343,142 | −15.5% |
| The Daily Mirror | 809,147 | 583,192 | −27.9% |
| The Daily Telegraph | 472,033 | 385,346 | −18.4% |
| The Guardian | 164,163 | 152,714 | −7% |

Source: Audit Bureau of Circulations (UK)

The sales figures show that the right-wing tabloids, The Sun and The Mail, are still the two best-selling newspapers in the UK and have a much higher circulation in comparison to the left-wing tabloid The Daily Mirror. Likewise, the circulation figures for the conservative broadsheet The Daily Telegraph are higher than those of the left-wing The Guardian. The clear dominance in terms of circulation leads us to conclude that the right-wing press ultimately reaches a far broader audience than the left-wing newspapers, thereby playing a key role in terms of forming and spreading dominant discourse. Similarly, Hart (2010: 18–19), referring to the average daily sales from 2000 to 2006, notes that right-wing publications have the highest circulation out of all the newspaper formats in Britain (tabloids, broadsheets, and mid-market publications); it is this right-wing voice which is therefore most widely heard.
Finally, it can be argued that the pivotal role the print media once played is now rapidly declining in the modern world of information technologies. The British press is gradually losing its discursive control over the public due to its constant decline in sales, leading it to yield to more popular forms of contemporary media, such as TV broadcasting, the Internet, and social networks like Facebook. However, Gabrielatos & Baker (2008: 9) argue that the ever-increasing availability of online newspaper articles has expanded the role of the press in contemporary society. What’s more, newspapers are ultimately able to increase their influence over a younger generation of readers who generally prefer the Internet over newspapers and television as their main source of information (cf. Coleman et al. 2002).
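The percentage changes reported in Tables 1.2 and 1.3, and the circulation gap between the right- and left-leaning titles, can be checked with a few lines of Python. This is only a sketch: the figures are copied from the two tables, the stance labels follow the classification used in this chapter, and the names `SALES`, `pct_change`, and `totals_by_stance` are introduced here purely for illustration.

```python
# Average daily sales copied from Tables 1.2 and 1.3.
# Tuple layout (an illustrative choice):
# (stance, June 2013, June 2014, January 2016, January 2018)
SALES = {
    "The Sun":             ("right", 2_243_903, 2_033_606, 1_787_096, 1_545_594),
    "The Daily Mail":      ("right", 1_806_569, 1_673_579, 1_589_471, 1_343_142),
    "The Daily Mirror":    ("left",  1_038_753,   958_674,   809_147,   583_192),
    "The Daily Telegraph": ("right",   547_106,   514_592,   472_033,   385_346),
    "The Guardian":        ("left",    187_000,   185_313,   164_163,   152_714),
}


def pct_change(old, new):
    """Relative change from old to new, in per cent."""
    return (new - old) / old * 100


def totals_by_stance(column):
    """Sum one sales column (1 to 4, in tuple order) per political stance."""
    totals = {"left": 0, "right": 0}
    for stance, *figures in SALES.values():
        totals[stance] += figures[column - 1]
    return totals


for title, (stance, jun13, jun14, jan16, jan18) in SALES.items():
    print(f"{title:20} {pct_change(jun13, jun14):7.2f}% {pct_change(jan16, jan18):7.1f}%")

jan18_totals = totals_by_stance(4)
share_right = jan18_totals["right"] / sum(jan18_totals.values()) * 100
print(f"Right-leaning share of January 2018 sales: {share_right:.1f}%")
```

The share computed at the end covers only the five titles in the tables, so it illustrates the imbalance within this sample and should not be read as a figure for the British press as a whole.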
According to the National Readership Survey (NRS), the digital readership of all five newspapers in question greatly exceeds the number of print copies sold. In November 2018, The Sun’s print and online versions combined reached a monthly readership of 29.286 million, making The Sun the UK’s most widely read national newspaper. Another right-wing tabloid, The Daily Mail, together with the Mail Online, had a combined monthly reach of 29.280 million. The online and print versions of The Guardian combined reached an average monthly readership of more than 25 million, making it the most-read quality newspaper and putting it ahead of the conservative Telegraph with a readership of 22.7 million. Table 1.4 provides the average monthly print and digital readership of the five newspapers in question for November 2018.
Table 1.4 Average monthly print and digital readership for November 2018 (in thousands)

| British daily newspapers | Print | Digital: mobile | Digital: tablet | Digital: desktop | Total |
| --- | --- | --- | --- | --- | --- |
| Sun titles & website | 8,250 | 21,174 | 3,209 | 3,210 | 29,286 |
| The Mail titles & website | 7,821 | 19,841 | 3,134 | 5,123 | 29,280 |
| The Guardian & The Observer & their website | 3,476 | 16,920 | 2,860 | 8,179 | 25,210 |
| Mirror titles & website | 4,246 | 17,904 | 2,305 | 2,963 | 23,963 |
| The Telegraph titles & website | 3,303 | 14,988 | 2,814 | 5,970 | 22,741 |

Source: Pamco (formerly the National Readership Survey)
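To make the scale of the digital audience more concrete, the short Python sketch below computes how many mobile readers each title has per print reader, using the figures from Table 1.4. The assignment of the extracted numbers to the columns print, mobile, tablet, desktop, and total is an assumption based on the order of the column headings, and the names `READERSHIP` and `mobile_to_print` are illustrative only.

```python
# Monthly readership in thousands, copied from Table 1.4 (November 2018).
# Assumed tuple order: (print, mobile, tablet, desktop, total).
READERSHIP = {
    "Sun titles & website": (8_250, 21_174, 3_209, 3_210, 29_286),
    "The Mail titles & website": (7_821, 19_841, 3_134, 5_123, 29_280),
    "The Guardian & The Observer & their website": (3_476, 16_920, 2_860, 8_179, 25_210),
    "Mirror titles & website": (4_246, 17_904, 2_305, 2_963, 23_963),
    "The Telegraph titles & website": (3_303, 14_988, 2_814, 5_970, 22_741),
}


def mobile_to_print(row):
    """Number of mobile readers per print reader for one title."""
    print_reach, mobile, _tablet, _desktop, _total = row
    return mobile / print_reach


ratios = {title: mobile_to_print(row) for title, row in READERSHIP.items()}

# List the titles from the most to the least mobile-heavy readership.
for title, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{title:45} {ratio:.1f}x")
```

Under this reading of the table, every title has more mobile than print readers, which is in line with the observation that digital readership now outweighs print.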
1.4 Summary: discourse as a form of social practice
Section 1.1 introduced modern approaches to discourse analysis, primarily focusing on the development of Critical Discourse Analysis (CDA), which is the main theoretical framework upon which this book is based. CDA scholars define discourse as a form of social practice that reproduces unequal power relations between classes, genders, and ethnic and cultural groups through the discursive representation of these social groups, and thus has an ideological impact. CDA is an oppositional and critical way of investigating language, discourse and communication and is primarily interested in exposing the structural relationships of dominance, discrimination, and control that are manifested in different forms of language. Section 1.2 explained how social power is enacted by the elites and maintained through preferential access to and control over various forms of public and institutional discourses (parliamentary debates, governmental reports, academic and educational discourses), as well as how dominant discourses being subject to a process of naturalisation (becoming common sense) contributes to the perpetuation of unequal power relations. The media is one of the most powerful tools in terms of shaping and disseminating discourses and, as such, is of particular interest to critical investigation. Section 1.3 outlined the relationships between five major national British newspapers and the political parties in the UK and how these newspapers represent and disseminate specific ideologies within the establishment. Chapter 2 will illustrate the advantages of combining CDA and corpus-linguistic methods, while the corpus-based case studies outlined in Chapters 3 to 6 will identify the differences in the linguistic patterns and discursive strategies employed in the representations of the key social actors in migration discourse, which are indicative of ideological bias.
References Angermuller, Johannes, Dominique Maingueneau & Ruth Wodak (eds.). 2014. The Discourse Studies Reader: Main Currents in Theory and Analysis. Philadelphia & Amsterdam: John Benjamins.
Ideology in the contemporary media 27 Baker, Paul. 2006. Using Corpora in Discourse Analysis. London & New York: Continuum. Bayley, Paul & Geoffrey Williams (eds.). 2012. European Identity: What the Media Say. Oxford: Oxford University Press. Blommaert, Jan. 2005. Discourse: A Critical Introduction. Cambridge: Cambridge University Press. Brown, Gillian & George Yule. 1983. Discourse Analysis. Cambridge: Cambridge University Press. Butler, David & Gareth Butler. 2000. Twentieth Century British Political Facts 1900–2000. 8th ed. Basingstoke: Macmillan Press. Chilton, Paul. 2004. Analysing Political Discourse. London & New York: Routledge. Coleman, Stephen, Barry Griffiths & Eleanor Simmons. 2002. Digital Jury – The Final Verdict. London: Hansard Society. Conboy, Martin. 2011. Journalism in Britain: A Historical Introduction. Los Angeles: Sage. de Saussure, Ferdinand. (1916) 1959. Course in General Linguistics. Charles Bally (ed.). New York: Philosophical Library. Fairclough, Norman. 1992. Discourse and Social Change. Cambridge: Polity Press. Fairclough, Norman. 1995a. Critical Discourse Analysis. London: Longman. Fairclough, Norman. 1995b. Media Discourse. London: Hodder Arnold. Fairclough, Norman. 2010. Critical Discourse Analysis: The Critical Study of Language. Harlow, Munich: Longman. Fairclough, Norman. 2015. Language and Power. 3rd ed. London & New York: Routledge. Fairclough, Norman & Ruth Wodak. 1997. Critical Discourse Analysis. In Teun van Dijk (ed.) Discourse as Social Interaction. London: Sage. Foucault, Michel. 1971. L’ordre du Discours. Paris: Gallimard. Foucault, Michel. 1972. The Archaeology of Knowledge and the Discourse on Language. New York: Pantheon Books. Fowler, Roger. 1991. Language in the News: Discourse and Ideology in the Press. London: Routledge. Fowler, Roger. 1996. On Critical Linguistics. In Rosa Caldas-Coulthard & Malcolm Coulthard (eds.) Texts and Practices: Readings in Critical Discourse Analysis. London: Routledge. 
Fowler, Roger, Robert Hodge, Gunter Kress & Tony Trew. 1979. Language and Control. London: Routledge. Gabrielatos, Costas & Paul Baker. 2008. Fleeing, Sneaking, Flooding: A Corpus Analysis of Discursive Constructions of Refugees and Asylum Seekers in the UK Press 1996–2005. Journal of English Linguistics, 36 (1), 5–38. Gee, James Paul. 1999. An Introduction to Discourse Analysis: Theory and Method. London & New York: Routledge. Gumperz, John. 1982. Discourse Strategies. Cambridge: Cambridge University Press. Halliday, Michael Alexander Kirkwood. 1978. Language as Social Semiotic. London: Edward Arnold. Halliday, Michael Alexander Kirkwood. 1985. An Introduction to Functional Grammar. London: Edward Arnold. Halliday, Michael Alexander Kirkwood. 2007. Language as Social Semiotic: Towards a General Sociolinguistic Theory. In Jonathan Webster (ed.) Michael Alexander Kirkwood Halliday, Language and Society. London: Continuum, 169–202. Harris, Zellig S. 1952. Discourse Analysis. Language, 28, 1–30. Hart, Christopher. 2010. Critical Discourse Analysis and Cognitive Science: New Perspectives on Immigration Discourse. Basingstoke: Palgrave Macmillan.
Kress, Gunther & Theo van Leeuwen. 1996. Reading Images: The Grammar of Visual Design. London & New York: Routledge. Leavis, Queenie Dorothy. 1932. Fiction and Reading Public. London: Bellew. Martin, James R. & Ruth Wodak. 2003. Re/reading the Past: Critical and Functional Perspectives on Time and Value. Amsterdam & Philadelphia: John Benjamins. Richardson, John E. 2007. Analysing Newspapers: An Approach from Critical Discourse Analysis. London: Palgrave Macmillan. Sinclair, John. 2004. Trust the Text: Language, Corpus and Discourse. London: Routledge. Stubbs, Michael. 1983. Discourse Analysis: The Sociolinguistic Analysis of Natural Language. Chicago: Chicago University Press. Taylor, Charlotte & Anna Marchi (eds.). 2018. Corpus Approaches to Discourse: A Critical Review. Oxon & New York: Routledge. van Dijk, Teun A. 1991. Racism and the Press. London: Routledge. van Dijk, Teun A. 1993. Principles of Critical Discourse Analysis. Discourse and Society, 4 (2), 249–283. van Dijk, Teun A. 1995. Aims of Critical Discourse Analysis. Japanese Discourses, 1 (1), 17–27. van Dijk, Teun A. 1996. Discourse, Power and Access. In Rosa Caldas-Coulthard & Malcolm Coulthard (eds.) Texts and Practices: Readings in Critical Discourse Analysis. London: Routledge, 84–104. van Dijk, Teun A. 1998. Ideology: A Multidisciplinary Approach. London: Sage. van Dijk, Teun A. 2001. Critical Discourse Analysis. In Deborah Schiffrin, Deborah Tannen & Heidi E. Hamilton (eds.) The Handbook of Discourse Analysis. London: Blackwell, 352–371. van Dijk, Teun A. 2006. Ideology and Discourse Analysis. Journal of Political Ideologies, 1 (2), 115–140. Widdowson, Henry G. 2000. On the Limitations of Linguistics Applied. Applied Linguistics, 21 (1), 3–25. Widdowson, Henry G. 2004. Text, Context, and Pretext: Critical Issues in Discourse Analysis. Malden, MA: Blackwell. Widdowson, Henry G. 2007. Discourse Analysis. Oxford: Oxford University Press. Williams, Glyn. 1999. 
French Discourse Analysis: The Method of Post-Structuralism. London & New York: Routledge. Wodak, Ruth. 1996. Disorders of Discourse. London: Longman. Wodak, Ruth. 2001. The Discourse-Historical Approach. In Ruth Wodak & Michael Meyer (eds.) Methods of Critical Discourse Analysis. London: Sage, 1–13.
2
Corpus-based discourse analysis Data collection and corpus construction
2.1 Central principles in the study of language
Contemporary linguistics has undergone considerable changes in recent decades, with developments in computer technologies providing the strongest impetus for the growth and evolution of language corpora. Computer-assisted studies, computational methods and access to large collections of electronically stored texts have considerably changed the methods of modern linguistic analysis. Furthermore, the focus of analysis has shifted to studying the use of real language in written and spoken discourse with reference to the functions of language in social institutions (Stubbs 1993: 1). As a result, the textual analysis of naturally occurring language data has become a major trend in linguistics. In an introductory article on British traditions in text analysis, Stubbs presents nine central principles of the study of language formulated in British linguistics in recent decades and further developed by corpus linguists in concrete ways. Seven of these principles, which are of particular relevance to the case studies featured in this book, are listed here and then expanded upon in some detail, as they serve as a starting point for the construction of the newspaper corpora used in said studies and their subsequent linguistic analysis:
1 Language should be studied in actual, attested, authentic instances of use, not as intuitive, invented, isolated sentences.
2 Language should be studied as whole texts, not as isolated sentences or text fragments.
3 Texts and text types must be studied comparatively across text corpora.
4 Linguistics is concerned with the study of meaning; form and meaning are inseparable.
5 There is no boundary between lexis and syntax; lexis and syntax are interdependent.
6 Much language use is routine.
7 Language in use transmits culture.
(Stubbs 1993: 2)
In his first point, Stubbs criticises traditional methods of linguistic analysis that are primarily based on the analysis of invented sentences and points out that past
analyses of these sentences have also not been particularly extensive. Stubbs stresses the fact that very little authentic linguistic data has been analysed in the most influential literature in 20th-century linguistics (Stubbs 1993: 8). For instance, in Course in General Linguistics, proposing to turn to the observation of speech, de Saussure (1916) does not analyse any textual data and gives very few examples of authentic language use. Bloomfield (1933: 22) constructs his argument on an entirely hypothetical story involving Jack, Jill, and an apple. Likewise, Chomsky (1957, 1965) analyses a number of sentences that are all invented by the author, and even Austin (1962) and Searle (1969) base their speech act theory on invented examples. However, it is worth bearing in mind that linguists working in the first half of the 20th century did not have access to large electronically stored collections of authentic language data that could be analysed with computer-assisted software. Contemporary linguistics, which employs corpora, overcomes these issues by following the central principle in the study of language, which states that language should be studied in actual, attested, authentic instances of use. That much of the language data that has been analysed has been invented or analysed as isolated sentences also contradicts Firth’s contextual theory of meaning, which states that “the complete meaning of a word is always contextual, and no study of meaning apart from a complete context can be taken seriously” (Firth 1957: 7; emphasis added). Firth’s contextual theory of meaning corresponds to the second principle mentioned by Stubbs (1993), which states that the unit of linguistic analysis should be a whole text. We communicate not by means of isolated words or sentences, but by means of texts, which consist of words, clauses, and sentences, which is why a text is a basic unit of meaning. A text is a semantic unit; not a unit of form, but of meaning (cf. 
Halliday 1994). The second principle described by Stubbs (ibid.) was used as the basis for constructing the newspaper corpora employed in this book and their subsequent analysis; this principle therefore justifies the combination of qualitative and quantitative methods. Stubbs (1993: 11) mentions that this principle was not fully followed in the construction of first-generation corpora such as Brown and Lancaster-Oslo-Bergen (LOB), which contain 500 text samples of only about 2,000 words each. Nowadays, linguists tend to include whole texts when constructing modern-language corpora; the Bank of English (BoE) is one of the monitor corpora that follows this principle. The third principle asserts that texts and text types must be studied comparatively across language corpora, and that it is only in this way that corpus linguists will be able to identify certain patterns that are typical of different text types and genres, and thus be able to systematically trace linguistic variations. Sinclair (1991: 4) argues that “language in use is characterised by spectacular regularities of pattern with endless variation”. A comparative corpus approach helps linguists identify the varying frequency of lexical and grammatical features, allowing them to draw conclusions about what is systematic and what is random in certain types of texts and genres. The newspaper corpora analysis featured in this book was conducted in accordance with this principle; the five newspapers selected all correspond to different types of British media outlets and are comparatively analysed
in order to identify linguistic differences and similarities typical for left- and right-wing broadsheets and tabloids. The third principle is also closely related to the notion of intertextuality, which assumes that most texts make intertextual references and that most (newspaper) texts are consequently influenced or even shaped by prior texts. This book addresses the issue of intertextuality in Chapters 5 and 6, providing an intertextual analysis along with an outline of the primary texts that form the basis of the newspaper articles discussed. The fourth principle posits that grammar cannot be studied independent of meaning; form and meaning are therefore inseparable. While contemporary linguistics assumes that grammar and semantics are not autonomous linguistic entities, this point of view was not always prevalent. For instance, Chomsky (1957: 17) stated the opposite in his early works, and his viewpoint was dominant in much of US linguistics throughout the second half of the 20th century. The fifth principle emphasises the interdependence of meaning and grammar, establishing a close interrelationship between lexis and syntax. A simple test justifies the argument that the two are interdependent: if syntax and lexis were independent systems, one would expect that a given selection of lexemes would be evenly distributed over different grammatical positions in the clause such as subject, object, adjunct, etc. Using the COBUILD corpus, Francis (1991) shows that the distribution of different lemmas in the same syntactic positions is highly uneven. Francis (ibid.) and other studies (Sinclair 1991, 2004: 17, 31) in corpus linguistics were able to identify new relationships between the co-patterning of different forms and senses of words and syntactic patterns. Stubbs (1993: 16) points out that even different forms of a single lemma have different grammatical distribution. 
This means that any syntactic structure will restrict the lexis that occurs in it, and, conversely, any lexical item can be specified in terms of the structures in which it occurs (Francis 1991). Following this principle, Section 2.4 introduces the method of distinctive collexeme analysis (Gries & Stefanowitsch 2004), and in Chapter 3, this method is applied to constructions that are functionally and structurally similar to one another (collocations of the word migrant and immigrant) in order to identify the distributional differences between the members of synonymous word pairings. The sixth principle discussed by Stubbs (1993) asserts that much language use is routine. This principle implies the assumption that language consists of a large number of set patterns, frames, and structures that are recurrent, and therefore often predictable. This hypothesis corresponds to the core principles of corpus analysis which aim to identify recurrent linguistic phenomena. Finally, the seventh principle posits that language in use transmits culture. Language in general and lexicalisation in particular are seen as ways of articulating and thereby transmitting human experience. Referring to prior research on English and German lexis, Stubbs (1993: 21; emphasis added) states that the study of lexis eventually provides clear insight into culture: The study of fixed expressions, idioms, clichés and recurrent wordings can therefore be given a cultural interpretation through a study of how the culture is expressed in lexical patterns. Keywords, in the sense of fixed expressions,
are both linguistic and cultural units, where the interplay of language and ideology can be studied.
To expand on Stubbs’s argument, it is necessary to mention that it is not only lexical patterns that are capable of expressing cultural meaning. In this book, the linguistic analysis applied to the study of the modern-day British press shows that both syntactic and semantic levels can express ideology and should thus be studied in combination. Metaphorical motifs are ideologically charged and act as the primary evaluative mechanisms employed by the press to promote their ideological positions. The aforementioned principles of language study are of crucial importance to this line of research and are followed systematically in both the construction of the newspaper corpora referred to in this book and in their subsequent detailed analysis.
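The distinctive collexeme analysis mentioned under the fifth principle can be illustrated with a minimal sketch. It assumes that the association between a collocate and one of two near-synonyms is measured with a Fisher exact test on a 2 x 2 table, here computed one-sidedly as a simplifying assumption. The function name and all counts are invented for illustration and are not taken from the corpora analysed in this book.

```python
from math import comb


def fisher_attraction_p(a, b, c, d):
    """One-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]].

    a: collocate with word 1        b: collocate with word 2
    c: other items with word 1      d: other items with word 2
    Returns the probability of seeing a or more co-occurrences in cell a
    if the collocate were distributed independently of the word choice.
    """
    row1 = a + b          # all occurrences of the collocate
    col1 = a + c          # all occurrences of word 1
    col2 = b + d          # all occurrences of word 2
    total = col1 + col2
    denominator = comb(total, row1)
    upper = min(row1, col1)
    numerator = sum(
        comb(col1, k) * comb(col2, row1 - k) for k in range(a, upper + 1)
    )
    return numerator / denominator


# Invented toy counts: a hypothetical collocate occurs 40 times with
# "immigrant" and 10 times with "migrant"; each noun occurs 1,000 times.
p_value = fisher_attraction_p(40, 10, 960, 990)
print(f"p = {p_value:.3g}")
```

A small p-value in this sketch would suggest that the collocate is attracted to the first word rather than evenly shared between the two synonyms, which is the kind of distributional difference the method is designed to detect.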
2.2 Corpus linguistics as a methodology
Frequency and systematicity
Corpus linguistic methods involve a relatively new approach to the study of language and have become one of the most prevalent methodologies in modern linguistics. It is, however, important to understand that corpus linguistics is not a “type” of linguistics in the same sense as cognitive, generative, or functional linguistics are; neither is it an aspect of language like grammar, lexis, or syntax. McEnery & Wilson (2001: 2; emphasis added) state that corpus linguistics is “a methodology that may be used in almost any area of linguistics, but it does not truly delimit an area of linguistics itself”. Importantly, a corpus-based approach can even challenge some existing linguistic theories. McEnery & Gabrielatos (2006: 33) argue that the analysis of corpora has not only reinforced the findings of descriptive linguistics, but has also enhanced theoretically oriented linguistic research. Contemporary corpus linguistics represents a heterogeneous set of methods that can be combined with other methods of analysis. Crucially, corpus linguistics represents an empirical approach to the study of language and involves the observation of naturally occurring or authentic language data that is collected, annotated, and put together to form a corpus. Broadly speaking, a language corpus is a collection of texts in electronic form that are selected according to specific criteria relevant to a particular line of linguistic enquiry. A classic definition from Sinclair (2004), one of the pioneers of corpus linguistics, states:
A corpus is a collection of pieces of language text in electronic form, selected according to external criteria to represent, as far as possible, a language or language variety as a source of data for linguistic research.
However, not all researchers see corpus linguistics as a mere methodological tool. There are at least two distinct methodological approaches to conducting corpus
linguistics, known as corpus-based and corpus-driven approaches to the study of language (Hardie & McEnery 2010; McEnery & Hardie 2012: 6, 147). While the corpus-based approach uses corpus data to explore a theory or hypothesis, corpus-driven linguistics and its adherents (Baker et al. 1993; Sinclair 1991, 2004: 12, 191–193), sometimes referred to as “neo-Firthians”, see the corpus itself as the only source for forming hypotheses about language. Tognini-Bonelli (2001: 84–85) states:
Examples are normally taken verbatim, in other words they are not adjusted in any way to fit the predefined categories of the analyst; recurrent patterns and frequency distributions are expected to form the basic evidence for linguistic categories; the absence of a pattern is considered potentially meaningful.
Linguists working in the tradition of the corpus-driven approach turn to large corpora in order to formulate hypotheses about language, rather than to confirm existing hypotheses that stem from traditional pre-corpus research. Sinclair (2004: 191) even argues in favour of processing raw texts rather than pre-tagged texts in order to be able to observe linguistic patterns without referring to traditional categories of analysis. It is, however, important to note that there is substantial overlap, not only of practice, but also of conceptual apparatus, between these two traditions (Hardie & McEnery 2010: 389). Nowadays, language researchers adopt methodologies from both traditions; initially, it was empirical data that united these approaches and caused them to coexist. The corpus-based approach to the study of newspaper texts referred to in this book is used to confirm a number of hypotheses, but at the same time, both the analysis of expanded concordances and close reading of full texts are in line with a corpus-driven method that often provides new insights into the research of modern discourse.
Needless to say, corpus linguists require special software to enable them to work with large collections of texts. In addition, language data should be in a machine-readable format that will allow researchers to perform a quick and reliable analysis. Concordance software was designed to perform several different tasks. It displays the words or phrases to be analysed together with a given amount of context around – that is, preceding and following – these words and phrases; the resulting displays are commonly referred to as concordances. Modern concordancers can also facilitate more advanced analyses, such as the production of word frequency and keyword lists. Outlining the functions that modern concordancers are able to perform, McEnery & Hardie (2012: 2; emphasis added) describe how two seemingly opposed methods of linguistic analysis can be combined: “Concordances and frequency data exemplify respectively the two forms of analysis, namely qualitative and quantitative, that are equally important to corpus linguistics”. Generally, a corpus-based method seeks to provide a more systematic approach to the study of language. First, large amounts of data help analysts to formulate hypotheses about certain tendencies; the analysis of significant amounts of data usually shows which tendencies are normal or typical in everyday or “real-life”
language use. Second, a corpus-based analysis can reveal instances of rare or even exceptional cases of language use that would otherwise be almost impossible to find and that might also be of relevance to a specific line of research. Rare cases of use cannot be identified by means of qualitative analysis, which tends to focus on single passages or texts, rather than large collections of texts. However, even corpora that contain hundreds of millions of words are, by their nature, incomplete, and this can be attributed to the nature of human language itself.
Language is non-enumerable and, hence, no finite corpus can adequately represent language. Corpora are ‘skewed’. Some sentences are in the corpus because they are frequent constructions, some by sheer chance. (McEnery & Wilson 2001: 10)
Consequently, corpora can help researchers to identify both regularities and exceptions or non-standard uses in language, and some rare linguistic structures and patterns can only be identified with the help of a corpus. This leads us to two key principles of corpus linguistics: the frequency of occurrence and the systematicity of linguistic analysis. McEnery & Wilson (2001: 15) stress the importance of the frequency of occurrence of language patterns and structures:
Human beings have only the vaguest notion of the frequency of a construct or word. Natural observation of data seems the only reliable source of evidence for such features as frequency. [C]orpora are sources of quantitative information beyond compare.
Language corpora are thus reliable sources for conducting a quantitative analysis, which makes them very powerful tools from a scientific perspective. Although frequency of occurrence and statistical tests constitute its fundamental principles, corpus linguistics is far from a mere quantitative approach to the study of language.
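The two basic outputs of a concordancer described above, a word frequency list and a set of concordance lines, can be sketched in a few lines of code. This is a simplified illustration rather than a description of any particular concordance program: the tokeniser is deliberately crude, the function names are hypothetical, and the sample sentence is invented.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def frequency_list(tokens, top=5):
    """Return the most frequent word forms with their raw counts."""
    return Counter(tokens).most_common(top)


def kwic(tokens, node, span=3):
    """Return (left context, node, right context) for every hit of the node word."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append((left, token, right))
    return lines


# Invented sample text, used only to show the shape of the output.
sample = ("Migrants arrive in Britain. The migrants seek work, "
          "and the press reports on migrants daily.")
tokens = tokenize(sample)

print(frequency_list(tokens, top=3))
for left, node, right in kwic(tokens, "migrants", span=2):
    print(f"{left:>20} [{node}] {right}")
```

The frequency list corresponds to the quantitative side of the analysis, while the aligned concordance lines are what an analyst would read qualitatively.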
The rapid development of computer technologies has given rise to the creation of advanced language corpora; Leech (2013: 1) notes that “as the power and capacity of computers have increased, corpora have increased in size, variety and ease of access”. Refining Leech’s point, McEnery & Wilson (2001: 17; emphasis added) state:
[T]he interest in the computer for the corpus linguist comes from the ability of the computer to carry out the processes of searching for, retrieving, sorting and calculating linguistic data, whether that be textual (most common) or digitized speech (increasingly common).
Corpus annotation
Further developments in IT have made it possible to not only store and process large collections of texts, but also to annotate them – to provide language corpora
with additional linguistic and meta-linguistic information in order to specify the grammatical characteristics of a text, a function that is generally performed by part-of-speech (pos) tagging software. Leech (2013: 1) defines annotation as “the practice of adding interpretative, linguistic information to an electronic corpus of spoken and/or written language data”. The term annotation can also refer to the end product of this process. Linguistic annotation that involves the attachment of special codes to words is often referred to as tagging, while the codes that are assigned are known as tags. There are several types of annotation at different levels of linguistic analysis: phonetic, grammatical, syntactic, and semantic, among others. Table 2.1 is an abridged table adapted from Leech (2013: 12) that provides a list of the most common types of linguistic annotation. Part-of-speech, also known as grammatical annotation, is the most basic type of annotation that can be applied to modern language corpora. Part-of-speech annotation assigns a tag indicating the part of speech (for example, noun, pronoun, lexical verb, or auxiliary verb) to each lexical unit in a corpus. This type of annotation is the most widespread and has been applied to many corpora of different languages. One of the reasons for this is that pos-tagging can be accomplished by a computer to a high degree of accuracy with minimal manual intervention. Grammatical tagging thus makes the search for a word class much easier, and the linguistic analysis gains accuracy. The first corpus to be tagged according to word classes was the Brown Corpus. This corpus contains roughly one million words and was compiled from written works published in the USA in 1961. 
The name of the tagger used was TAGGIT, which created a tagset of 77 different word-class labels that helped to identify not only major parts of speech (noun, verb, adjective, etc.), but also values defining subclasses, such as singular and plural nouns, and comparative and superlative adjectives (Leech 2013: 8). A second major tagging project was the tagging of the Brown Corpus’s British counterpart, the LOB, in 1979–1982. The tagger used was CLAWS, and the success rate of automatic tagging leapt from 77 per cent – the outcome of the first automatic tagging of the Brown Corpus – to 96.7 per cent. Today, similar degrees of success are being achieved by most teams of linguists who are tagging their corpora. However, part-of-speech tagging is still not an entirely automated process; manual intervention – human post-editing – is still an unavoidable part of any linguistic annotation.

Table 2.1 Basic types of corpus annotation
Linguistic level                    Annotation carried out so far
Orthographic                        This is generally considered part of a “mark-up”
Phonetic/Phonemic                   Widespread in speech science; typically collected in laboratory situations
Prosodic                            Two or three prosodically annotated corpora are available for widespread use
Part-of-speech/Grammatical tagging  The most widespread type of corpus annotation; has been applied to many languages
Syntactic/Partial parsing           This is the second most widespread type of corpus annotation, and it is rapidly developing
Semantic                            Some exists, and more is developing
Discoursal                          Little exists, but some is developing
Pragmatic/Stylistic                 Little exists, but some is developing

One subtype of the part-of-speech tagging includes lexeme or lemma annotation, also known as lemmatisation. A lemma is a group of words that are related to the same base word and differ only by inflection. For example, resettled, resettling, and resettles all belong to the verb lemma RESETTLE. Lemmatisation is the process whereby the words in a corpus are reduced to their respective lexemes – a very useful tool, as it allows the user to search for all forms of a word without having to input all of its possible variants. This type of annotation is especially useful when dealing with languages that have a highly inflected morphology, such as Slavic languages (Russian, Czech, or Polish, for example). Other forms of annotation are more complex and involve syntax, semantics, discourse, and pragmatics. Syntactic or partial parsing is the second-most widespread type of annotation, and the technology behind it is developing rapidly. Leech (2013: 19) also mentions that grammatical tagging can be seen as “a specification of the leaves (or pre-terminal nodes) of the phrase-structure (PS) tree which is a favoured model for syntactic annotation”. McEnery & Wilson (2001: 54) add that corpora that have been parsed are referred to as treebanks, with the term alluding to the tree diagrams taken from syntax textbooks.
Two broad types of semantic annotation are: (1) marking items in a corpus according to their semantic roles, such as agent, patient, theme, etc.; and (2) marking the semantic features of a word in a corpus in order to allow researchers to identify how that word relates to a specific theme and, consequently, to a specific semantic field. Discoursal or text-level annotation, as well as pragmatic or stylistic types of annotation, are the least frequently encountered types of linguistic annotation. There are two possible reasons for this. First, these types of annotation are less evolved due to the intense manual work required to construct them which makes the whole process extremely time-consuming. For instance, anaphoric annotation is a subtype of discoursal annotation that indicates the pronoun reference and thus identifies some of the basic methods of cohesion in a text. This is a type of annotation that can only be carried out by human analysts. Secondly, linguistic categories at the level of text are generally context-dependent, and linguistic classifications are often a source of dispute among linguists. The corpus without annotation – the so-called “raw corpus” – presents an authentic collection of texts. Annotations are superimposed onto texts, providing corpora with “added value” or, as Leech (2013: 5) notes, “it adds overt linguistic information, which can then be used for a multitude of purposes”. However, not all corpus linguists share this point of view; Sinclair (2004: 191), for example, insists on the analysis of raw language data.
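The practical value of part-of-speech tags and lemmas can be seen in a small sketch of how an annotated corpus might be stored and queried. The `form/TAG/lemma` notation, the simplified tag labels, and the function names are assumptions made for this example; they do not reproduce the TAGGIT or CLAWS tagsets, and the sample sentence is invented.

```python
from collections import namedtuple

# One annotated token: the word form as written, its tag, and its lemma.
Token = namedtuple("Token", ["form", "tag", "lemma"])


def parse_annotated(line):
    """Parse whitespace-separated 'form/TAG/lemma' triples into Token records."""
    return [Token(*item.split("/")) for item in line.split()]


def forms_of(tokens, lemma):
    """Return every word form in the text that belongs to the given lemma."""
    return [t.form for t in tokens if t.lemma == lemma]


def with_tag(tokens, tag):
    """Return every word form that carries the given part-of-speech tag."""
    return [t.form for t in tokens if t.tag == tag]


# Invented, hand-annotated sample sentence with hypothetical tag labels.
annotated = ("The/DET/the council/NOUN/council resettled/VERB/resettle "
             "refugees/NOUN/refugee and/CONJ/and is/AUX/be "
             "resettling/VERB/resettle more/ADJ/more")
tokens = parse_annotated(annotated)

print(forms_of(tokens, "resettle"))   # one query retrieves all inflected forms
print(with_tag(tokens, "NOUN"))       # searching by word class
```

A single lemma query retrieves the inflected forms without the user listing each variant, which is the advantage of lemmatisation noted above; a raw corpus, by contrast, would only support searches on the surface forms.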
2.3 Data collection, types of corpora, and newspaper corpora
Principles of data collection
The principles of data collection, as well as the processes involved in the subsequent construction and annotation of a corpus are among the major issues of concern in contemporary corpus linguistics. Ideally, data should be systematically collected and annotated in order to ensure that a corpus is representative of a particular language, language variety, or genre, such as political or media discourse. Both corpus construction and annotation are therefore laborious and time-consuming processes, which usually include the following steps:
1 Designing a corpus in accordance with a research question;
2 collecting relevant language data, for example, scientific texts, political speeches, parliamentary debates, newspaper editorials, or literary texts;
3 encoding the corpus;
4 assembling and storing metadata, such as information about the author(s), the publication date, or the genre of a text; and
5 marking up the texts and adding linguistic annotation, for example, part-of-speech tagging and syntactic and semantic annotations.
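The role of metadata in the steps listed above can be sketched as follows. The record layout, the field names, and the `select` helper are hypothetical and serve only as an illustration; the newspaper names and texts are placeholders. The only fact taken from outside the example is the date of the EU membership referendum, 23 June 2016, which is used here to separate two sub-corpora.

```python
from dataclasses import dataclass
from datetime import date

REFERENDUM = date(2016, 6, 23)  # date of the EU membership referendum


@dataclass(frozen=True)
class Article:
    """One corpus text together with its stored metadata."""
    newspaper: str
    published: date
    genre: str
    text: str


def select(articles, start=None, end=None, genre=None):
    """Return articles with start <= published < end and, optionally, one genre."""
    chosen = []
    for article in articles:
        if start is not None and article.published < start:
            continue
        if end is not None and article.published >= end:
            continue
        if genre is not None and article.genre != genre:
            continue
        chosen.append(article)
    return chosen


# Placeholder records; names, dates, and texts are invented.
articles = [
    Article("Paper A", date(2016, 3, 1), "editorial", "placeholder text 1"),
    Article("Paper B", date(2016, 5, 17), "news report", "placeholder text 2"),
    Article("Paper A", date(2017, 2, 9), "opinion piece", "placeholder text 3"),
]

pre_referendum = select(articles, end=REFERENDUM)
post_referendum = select(articles, start=REFERENDUM)
print(len(pre_referendum), len(post_referendum))
```

Storing the publication date and genre with each text is what makes it possible to split a collection into comparable sub-corpora after the fact, without re-reading the texts themselves.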
The term corpus (from Latin: body) was once used to refer to any collection of texts; within the context of modern corpus linguistics, however, the term generally implies the following specific criteria identified by McEnery & Wilson (2001: 29; authors’ emphasis):
1 A corpus needs to be representative of a language, language variety, or genre; thus, it should be sampled accordingly to act as a standard reference point for what can be seen as typical within language in general or in the language variety or genre being researched.
2 The term corpus tends to imply a body of texts that is finite in size. However, there are some exceptions to this rule: monitor corpora, such as the Bank of English, or the Corpus of Contemporary American English (henceforth COCA), are updated on a weekly, monthly, or yearly basis and are therefore open-ended.
3 Modern language corpora are usually large collections of machine-readable texts that are often annotated with additional meta-information and grammatical information such as part-of-speech tagging.
4 A corpus is supposed to represent a standard reference and it should be readily available to other researchers in order for them to be able to carry out replica studies, thus minimising the possibility of different results due to the differing language data.
Summarising the aforementioned criteria, McEnery & Wilson (2001: 32) define the prototypical corpus as “a finite-sized body of machine-readable text, sampled
Corpus-based discourse analysis
in order to be maximally representative of the language variety under consideration”. Most language corpora are carefully designed in accordance with a specific research question, which is why modern corpora vary considerably in size, methods of sampling, and degree and type of linguistic annotation. Different types of modern-language corpora are described in the following section, as well as the corpora that are employed as reference corpora in the case studies featured later in this book. Types of corpora A fundamental distinction can be drawn between general or reference corpora and specialised corpora (Hunston 2002: 14–15). A general corpus is a prototypical corpus that can contain up to hundreds of millions of words drawn from a wide range of text types and genres. As such, reference corpora aim to represent the general principles of a language or language variety – for example, British, Australian, or US English – whereas the parameters of specialised corpora depend on the source and scope of the linguistic research. Hunston (2002: 14) notes that there is no limit to the degree of specialisation involved in the construction of a corpus. A specialised corpus may be restricted to a specific time frame, particular text types, or a specific topic. Corpora of general English can be divided into three generations (Butterfield 2008). The Brown Corpus (Francis & Kučera 1964), published in 1961 and containing about 1 million words of written Standard American English (SAE), belongs to the first generation. The Brown Corpus is famous for its balanced sampling frame, containing samples comprising roughly 2,000 words each from 500 different text sources (press, general prose, learned writing, and fiction). The Brown Corpus’s British counterpart, known as the LOB (Lancaster-Oslo-Bergen), was created between 1979 and 1982, and contains British writings from 1961. 
The Brown sampling frame was used as a model to create the new corpora that now comprise the Brown Family of corpora. Today, there are eight one-million-word corpora that are collectively referred to as the Brown Family (cf. Baker 2017: 6–11). Four of the corpora are representative of written US English from 1930 (Before-BROWN), 1961 (BROWN), 1992 (Freiburg-BROWN), and 2006 (AmE06); the other four corpora are representative of written British English from 1931 (B-LOB), 1961 (LOB), 1991 (FLOB), and 2006 (BE06). The Brown Family of corpora provides linguists with an opportunity to investigate the differences between British and US English varieties, while controlling their sampling frame and taking into account the effects of diachronic change. The British National Corpus (BNC1994) belongs to the second generation of general corpora. The BNC (Aston & Burnard 1998) is a 100-million-word collection of samples of written (90%) and spoken language (10%) drawn from a wide range of sources, designed to represent a broad cross section of British English from the early 1990s. The successor to the original BNC, a 100-million-word corpus called the British National Corpus 2014 (BNC2014) is currently
being compiled by Lancaster University and Cambridge University Press. The first stage of this project has already been completed, with the spoken part of the BNC2014 (Spoken BNC2014) released in the autumn of 2017 on CQPweb’s Lancaster server (Love et al. 2017). The Spoken BNC2014 contains transcripts of conversations collected from members of the UK public between 2012 and 2016. A team of researchers at Lancaster University, led by Vaclav Brezina and Tony McEnery, is currently working on the written part of the BNC2014, which will contain text samples drawn from fiction writing, academic journals, the press, and blogs and will enable diachronic comparisons with the original BNC (BNC1994). The Bank of English (BoE), which currently contains 650 million words (as of April 2020), also belongs to the larger, second generation of corpora. Corpora that exceed the one-billion-word mark are referred to as third-generation corpora, such as the Cambridge International Corpus with 1 billion words measured in 2007 and the Oxford English Corpus with 2 billion words measured in 2006 (Baker 2010: 12). The Internet is also considered by some to be a special type of reference corpus. Lew (2009) estimates that it contains about five trillion (5,000,000,000,000) word tokens, making it about 50,000 times larger than the BNC1994. The English Web Corpus, also known as enTenTen Corpora, is a corpus that is made up of texts collected from the Internet. For example, English Web Corpus 2015 (enTenTen15) contains more than 15 billion words. In many ways, the Web is similar to a monitor corpus that is updated on a monthly or yearly basis, whereby web search engines such as Google and Yahoo can be seen as equivalent to a query processor used to search a corpus. However, the Web as a corpus raises a number of issues.
Firstly, in contrast to most corpora that usually consist of carefully selected and annotated texts, the content of the Web is “an undifferentiated mass” that would require “a great deal of processing to sort into meaningful groups of texts” (McEnery & Hardie 2012: 7), and the data retrieved from the Internet would not automatically be divided into their respective genres or text types. Secondly, Internet data is prone to containing various kinds of errors and inaccuracies; hence, the data retrieved from the Internet often requires additional processing in order to be usable. Nonetheless, McEnery & Hardie (2012: 8) still see the potential utility of using the Internet as a corpus, concluding that it “does undoubtedly provide a substantial volume of data which can be selected and prepared to produce corpora suitable for a wide variety of purposes”. Likewise, Baker (2010: 13; emphasis added) argues: The web is . . . a potentially useful electronic ‘corpus’, but we should not view it as particularly balanced or representative of other types of language use, nor should we abandon projects that aim to create smaller, more carefully constructed reference corpora. Another distinction is commonly drawn between the two existing broad data collection approaches: the monitor corpus approach (Sinclair 1991: 24–26;
Sinclair 2004: 189, 193) and the balanced corpus or sample corpus approach (Biber 1993). A monitor corpus is a type of corpus that grows continually; new texts are added over time so that it continues to represent the most recent state of the language, as well as earlier periods. The BoE is one example of a monitor corpus (Hunston 2002); this project began in the 1980s as a partnership between the University of Birmingham and Collins-Cobuild publishers. Nowadays, the BoE is a subset of 650 million words from the Collins Corpus – a database of English with over 4.5 billion words (as of April 2020), updated monthly and containing various types of written and spoken data. The Corpus of Contemporary American English (COCA) is a genre-balanced monitor corpus representative of US English (Davies 2010). The corpus currently contains more than 560 million words (as of August 2019) and its data is evenly divided between spoken and written types of texts, including fiction, press, and academic texts. However, the earlier version of the COCA, containing 450 million words from 1990 to 2012 (Davies 2008), was used as a reference corpus for the case studies carried out for the research presented in this book. Contemporary corpora can be multilingual as well as monolingual, containing subcorpora in several – often related – languages. To be included in a multilingual corpus, texts in the different language subcorpora should be sampled according to the same principles and criteria in order to ensure they are comparable. If a multilingual corpus contains the same texts, for example, translations of James Joyce’s novels into German, Dutch, and Danish, it is referred to as a parallel corpus. In order to bring the parallel subcorpora into a specific relationship with each other they need to be aligned. Aligning is the process of making an explicit link between those elements that are a mutual translation of one another (McEnery & Wilson 2001: 70). 
Sketch Engine provides access to a number of reference corpora in a variety of languages which are publicly available (Arabic, Hebrew, German, Russian, Turkish, Japanese, and many others). However, it is important to bear in mind that most linguists and discourse analysts usually approach corpora with a specific research question and hypothesis that can be tested with the help of specialised corpora constructed in accordance with a specific sampling frame. Such corpora therefore contain diachronic and synchronic restrictions in terms of which language data may be included. These specialised corpora do not aim to be representative of a whole language, but rather are designed to reflect a language variety or genre at a given point in time. Newspaper corpora employed in this book The remaining part of this subchapter provides a detailed account of the process involved in the construction of two comparable newspaper corpora employed for research into migration discourse in the contemporary British press. The research outlined here pursues two major goals: first, the comparative analysis aims to trace the linguistic differences and similarities (Taylor 2013, 2018) between the construction of migration discourse in the British left- and right-wing press; secondly, the analysis aims to identify if there are (diachronic) changes in the representation
of European migrants and migration within the EU after the 2016 referendum (2016–2018) and compare them with the discourses constructed around migrants in the pre-referendum period (2013–2015). Corpus data: general information In order to achieve the aforementioned aims, two specialised monogeneric and monolingual corpora with a similar sampling frame were compiled and annotated. The articles included in the corpora were taken from five British newspapers with a nationwide circulation. The Guardian (and The Observer) and The Daily Mirror represent left-wing ideology, while The Daily Telegraph, The Daily Mail, and The Sun represent right-wing or conservative ideology. Additionally, these newspapers belong to two different genres: the so-called quality or broadsheet newspapers, and the popular or tabloid press. Section 1.3 explains the choice of corpus data and explores the political allegiances of the newspapers in question. Time frame The Pre-Referendum Corpus consists of 500 newspaper articles from the years 2013–2014, while the Post-Referendum Corpus is comprised of 500 articles from the years 2016–2018. The first period encompasses the two-year period between 2013 and 2014 and is thus representative of the media’s attitude towards both the EU and migration within the Union prior to the referendum of 2016, at a time when Brexit was not yet being extensively discussed. The Guardian subcorpus of the Pre-Referendum Corpus also contains two articles from June and August 2015. These texts contain the two leading metaphorical motifs of WATER and GARDEN, which will be explored in detail in Chapters 3, 4, and 5. The second corpus encompasses the six months before and the two and a half years after the EU referendum that was held on 23 June 2016. The point that separates the two corpora is not the referendum itself, since issues like the referendum and Brexit were extensively discussed from the beginning of 2016 onwards. 
Types of newspaper articles The newspaper articles collected for both corpora represent different types of texts, including editorials, opinion pieces, and news reports focusing on British and European news, as well as world news. In the Pre-Referendum Corpus, news reports comprise approximately 65–75 per cent of the corpus data, while the rest is made up of editorials and opinion pieces, depending on the newspaper. In the Post-Referendum Corpus, the division into text types is slightly more balanced; around 50 per cent of the data is comprised of news reports, whereas the other 50 per cent is divided evenly between editorials and opinion pieces. There is one exception in The Daily Mail subcorpus; due to the lack of editorials and opinion pieces on migration published between 2016 and 2018, news reports comprise 70
per cent of this subcorpus data. For more specific, detailed information pertaining to the breakdown of text types in each subcorpus, refer to Tables 2.2 and 2.3.

Table 2.2 Number of text types in each subcorpus of the Pre-Referendum Corpus (2013–2014)

Newspaper                                      Editorials   Opinion pieces   News reports   Total
The Sun                                        25           0                75             100
The Daily Mail & The Mail on Sunday            17           10               73             100
The Daily Telegraph & The Sunday Telegraph     21           4                75             100
The Daily Mirror & The Sunday Mirror           0            22               78             100
The Guardian & The Observer                    27           8                65             100

Table 2.3 Number of text types in each subcorpus of the Post-Referendum Corpus (2016–2018)

Newspaper                                      Editorials   Opinion pieces   News reports   Total
The Sun                                        25           25               50             100
The Daily Mail & The Mail on Sunday            18           12               70             100
The Daily Telegraph & The Sunday Telegraph     21           28               51             100
The Daily Mirror & The Sunday Mirror           24           21               55             100
The Guardian & The Observer                    25           25               50             100

Corpus size

The articles used to construct the newspaper corpora were retrieved from the official websites of the aforementioned newspapers, downloaded as plain text files and sorted according to the newspaper from which they were extracted, with one file for each newspaper. Table 2.4 provides the number of tokens for each subcorpus in both corpora. The Sun subcorpus, which features articles from 2013 to 2014, is smaller in size due to the shorter length of the articles published by the newspaper during that period.

Table 2.4 Total number of tokens in each subcorpus

British Newspapers                             Pre-Referendum Corpus (2013–2014)   Post-Referendum Corpus (2016–2018)
The Sun                                        32,729                              69,011
The Daily Mail & The Mail on Sunday            89,267                              83,148
The Daily Telegraph                            79,882                              68,990
The Daily Mirror & The Sunday Mirror           59,986                              69,668
The Guardian & The Observer                    74,042                              95,576
TOTAL                                          335,906                             386,393

Corpus annotation and software

Both corpora were annotated with the help of a part-of-speech tagger called TreeTagger (Schmid 1994), which is a tool for annotating texts with part-of-speech and lemma information. The tagger was developed by Helmut Schmid at the Institute for Computational Linguistics at the University of Stuttgart. Both corpora were lemmatised, which enables users to search for all forms of a word by inputting its lemma. The analysis of the corpora was conducted with the help of the software package cqp@fu (corpus query processor at the Freie Universität Berlin in Germany).

Key themes

The topics for the corpora were restricted to the most salient themes within the broader topic of migration: European migration, illegal migration, and asylum policies. The articles used were extracted from the following newspaper sections: Immigration and Asylum, Politics, Welfare, Education (left-wing newspapers), Immigration, Politics, Crime, Terrorism in the UK, and Finance (right-wing newspapers). The titles of these sections were chosen by the newspapers themselves; the fact that the right-wing newspapers assign some articles on migration to sections like Crime and Terrorism is indicative of ideological bias, which will be discussed in detail in the forthcoming chapters. The articles in the Pre-Referendum Corpus were written at the time of the implementation of the Immigration Act 2014, which restricted people without legal status in the UK from obtaining driving licences and bank accounts. This corpus also contains articles that focus on the lifting of transitional employment restrictions for Bulgarians and Romanians in January 2014, as well as the ongoing armed conflict in Syria which has been raging since 2011.
The political and social changes that occurred in 2016 (attributable in large part to the EU membership referendum) shifted the focus of attention and, as such, the Post-Referendum Corpus primarily contains articles regarding migration within the EU and Europe more generally, negotiations between the UK and the EU, and Brexit. Interestingly, in the Pre-Referendum Corpus, the lemma Brexit occurs just twice, whereas in the Post-Referendum Corpus, the same lemma has 1,608 hits. This indicates that the ongoing Brexit negotiations and the UK’s relationship with the EU became the primary focus of both media and public attention after the referendum. In general, the two corpora help trace the interrelatedness of a number of primarily migration-related sociopolitical events and their representation in a selection of different UK newspapers.

Meta-information

Tables 2.5 and 2.6 provide detailed meta-information for the Pre-Referendum and Post-Referendum Corpora, respectively.
Table 2.5 Metadata for the Pre-Referendum Corpus

Time period:         2013–2014
Corpus size:         335,906 tokens
Newspapers:          left-wing: The Guardian & The Observer, The Daily Mirror; conservative/right-wing: The Daily Telegraph, The Daily Mail, The Sun
Major themes:        EU migration, illegal immigration, asylum policies
Types of articles:   Editorials, opinion pieces, news reports
Newspaper sections:  UK news, European and EU news, politics, immigration & asylum, business & finances

Table 2.6 Metadata for the Post-Referendum Corpus

Time period:         2016–2018
Corpus size:         386,393 tokens
Newspapers:          left-wing: The Guardian & The Observer, The Daily Mirror; conservative/right-wing: The Daily Telegraph, The Daily Mail, The Sun
Major themes:        EU migration and Brexit, illegal immigration, asylum policies
Types of articles:   Editorials, opinion pieces, news reports
Newspaper sections:  UK news, European and EU news, politics, EU referendum, immigration & asylum, business & finances
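Because the two corpora differ in size, the raw hits reported for the lemma Brexit (2 versus 1,608) are easier to compare once they are normalised. The following Python sketch is an illustration only and not part of the original workflow, which relied on cqp@fu: it converts the reported counts into hits per million tokens and shows how lemma counts could be read from tab-separated tagger output. The three-column layout (token, tag, lemma), the tag labels, and the function names are assumptions made for this example.

```python
from collections import Counter

# Token totals and raw hits for the lemma "Brexit", as reported in the text.
CORPUS_SIZES = {"pre_referendum": 335_906, "post_referendum": 386_393}
BREXIT_HITS = {"pre_referendum": 2, "post_referendum": 1_608}


def per_million(hits, corpus_size):
    """Relative frequency normalised to one million tokens."""
    return hits / corpus_size * 1_000_000


def count_lemmas(tagged_lines):
    """Count lemmas in 'token<TAB>tag<TAB>lemma' lines.

    The three-column layout is an assumption about how the tagged
    files are stored; adjust the parsing if the files differ.
    """
    counts = Counter()
    for line in tagged_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 3:
            counts[parts[2].lower()] += 1
    return counts


normalised = {
    name: per_million(BREXIT_HITS[name], CORPUS_SIZES[name])
    for name in CORPUS_SIZES
}
for name, value in normalised.items():
    print(f"{name}: {value:.1f} hits per million tokens")

# Invented sample lines: two word forms share the lemma "migrant".
sample = ["migrants\tNNS\tmigrant", "Migrant\tNN\tmigrant", "arrive\tVVP\tarrive"]
print(count_lemmas(sample)["migrant"])
```

Normalising in this way keeps the comparison independent of the roughly 50,000-token difference between the two corpora.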
2.4 Corpus linguistics techniques Collocations The importance of context as one of the basic principles in the study of language was discussed in Section 2.1. Firth (1957, 1968) was one of the first linguists to draw attention to the context-dependent nature of word meaning, stating that “the complete meaning of a word is always contextual, and no study of meaning apart from a complete context can be taken seriously” (Firth 1957: 7; emphasis added). Firth also describes collocation as “actual words in habitual company” (1957: 14, 1968: 182). McIntosh (1961), Sinclair (1966, 1991, 2003, 2004), Halliday (1966) and later other so-called neo-Firthian linguists combined Firth’s ideas with corpus linguistic methods. Since then, collocation has become a key term in corpus linguistics. Collocation has been defined in a number of different ways by linguists, but the basic idea still remains that a word’s meaning is not contained within the word itself, but rather subsists in the associations that the word participates in, such as other words and structures with which it frequently co-occurs (McEnery & Hardie 2012: 122–123). The term collocation can be defined as a sequence of words that tend to co-occur more often than they would be expected to by chance. A collocate is a word that regularly occurs next to or close to a word that is under investigation (often referred to as a node word). McEnery &
Hardie (2012: 123) consider the study of patterns of co-occurrence in a corpus of authentic language data to be “the only way to reliably identify the collocates of a given word or phrase”. The simple definition of a collocation is “co-occurrence patterns observed in corpus data” (ibid.). Collocations primarily function on a semantic level (Sinclair 1991: 115–116). A thorough investigation of repetitive patterns or frequent collocates can help linguists define the meaning of a word more accurately. In Chapters 3 and 4, an analysis of the most frequent right- and left-hand collocates of the words migrant and immigrant helps to clarify the nuances of their meaning and overcome certain problems and inconsistencies found in existing dictionaries. The process of identifying collocations involves a number of criteria, with the following three distinct criteria traditionally used: (1) distance, (2) frequency of co-occurrence, and (3) exclusivity (Brezina et al. 2015: 140). Distance refers to the span between a node word and its collocates; a span of five words to the left and to the right (+/− 5 words) of a node word seems to be one of the most typical spans for identifying collocates. Alternatively, this span can be set at the boundaries of a sentence. Many corpus linguists also set a minimum frequency threshold for words to count as collocates. In the case studies featured in Chapters 3 and 4 of this book, the main focus is placed on the adjectives and nouns that immediately precede or follow the words under investigation: RASIM terms. For example, the word jobless was identified as an adjective that often directly precedes the node word migrant and, in this instance, would be referred to as its direct left-hand collocate. The noun community immediately follows the node word immigrant, making it its right-hand collocate (based on raw frequencies of co-occurrence). 
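The distance and frequency criteria can be illustrated with a minimal Python sketch. It is not the procedure used in the case studies, which were run with a corpus query processor; the function name, its parameters, and the toy sentence are all invented for illustration. The sketch counts the words that fall inside a window around each occurrence of a node word and keeps those that reach a minimum frequency. It ignores sentence boundaries, which a fuller implementation would need to respect.

```python
from collections import Counter


def collocates(tokens, node, left=5, right=5, min_freq=1):
    """Count words co-occurring with `node` inside a left/right window."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - left)
        # Words before the node plus words after it; the node itself is skipped.
        window = tokens[start:i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    return {word: freq for word, freq in counts.items() if freq >= min_freq}


# Invented toy data, lower-cased and whitespace-tokenised for illustration.
text = ("the jobless migrant arrived . a jobless migrant spoke . "
        "the immigrant community met the migrant")
tokens = text.split()

# Direct left-hand collocates: a span of one word to the left only.
direct_left = collocates(tokens, "migrant", left=1, right=0)
# Direct right-hand collocates of a second node word.
direct_right = collocates(tokens, "immigrant", left=0, right=1)
# The wider +/- 5 span with a minimum frequency threshold of 2.
wide = collocates(tokens, "migrant", left=5, right=5, min_freq=2)

print(direct_left)
print(direct_right)
```

Setting `left=1, right=0` or `left=0, right=1` corresponds to the direct left- and right-hand collocates discussed above, while the default values correspond to the +/− 5 span.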
In order to take into account more general associations and thus constructed representations of these node words, the span is often expanded to one sentence. The second criterion, frequency of use, identifies the typicality or salience of a co-occurrence and will be one of the leading criteria used to identify the salient collocates of RASIM. Exclusivity is a criterion that cannot be applied to every collocation since the relationship between words is rarely exclusive. In addition to the aforementioned criteria, Gries (2013) discusses three other criteria: (4) directionality, (5) dispersion, and (6) type-token distribution among collocates. Directionality refers to the strength of the attraction between two words, which Gries shows is rarely symmetrical. For instance, word A has a stronger relationship with word B than word B with word A because word B co-occurs with other words more often than word A does. Dispersion describes the general distribution of the node and its collocates in the corpus (cf. Gries 2008). Some collocations can therefore be evenly distributed across the corpus, while others might occur in a specific type of text that appears in a corpus. The type-token distribution takes into account not only the strength of a given collocational relationship, but also the level of competition for the slot(s) around the node word from other collocate types. Brezina et al. (2015: 141) also identify a seventh criterion: (7) the connectivity between individual collocates. Brezina et al. (ibid.) argue that collocates of words “do not occur in isolation, but are part of a complex network of semantic relationships which ultimately reveals their meaning and the semantic structure of a
text or corpus”. It is therefore important to stress the point that collocates, like individual words, do not function in isolation; instead, they form complex systems of semantic relationships that help to reveal the typical contextual domains in which words and their frequent collocates are used. Stubbs (1996: 172) also describes how collocations can have an ideological impact that may prime the reader to think of particular words in certain ways. A typical example of this phenomenon taken from migration discourse is the strong collocational relationship between illegal and immigrant, which often leads readers and listeners to think of illegality, even if they encounter the word immigrant on its own. This point is especially interesting in light of the analysis presented in this book; the collocations illegal immigrant(s) and illegal migrant(s) will be analysed in Chapters 3 and 4. Collostructional analysis: distinctive collexeme analysis Access to corpora allows linguists and discourse analysts to trace repetitive patterns of language use. However, in order to identify whether a particular collocate is salient, or reveal subtle differences between seemingly synonymous words or constructions, researchers have to resort to statistical methods. Corpus linguistic methods generally rely heavily on frequency of word or construction occurrence, as well as the frequency of the co-occurrence of words and constructions. Collostructional analysis is a family of statistical methods developed jointly by Anatol Stefanowitsch and Stefan Gries that is designed to measure the degree of attraction or repulsion that words exhibit in relation to various constructions. At present, collostructional analysis comprises three different methods: simple collexeme analysis (Stefanowitsch & Gries 2003), distinctive collexeme analysis (Gries & Stefanowitsch 2004), and covarying collexeme analysis (Stefanowitsch & Gries 2005). 
Stefanowitsch (2006: 257–258) explains the basic principle of collostructional analysis, stating that it was developed against the backdrop of a psychological interpretation of associations between words and constructions. In the case of simple collexeme analysis, such associations are produced by comparing the frequencies of each of the words that occur in a given construction with their frequencies in the corpus as a whole. The result of this analysis provides information on which words are typical for a given construction (Stefanowitsch 2006: 258). In the case of distinctive collexeme analysis, “such associations are arrived at by comparing the frequency of all words occurring in a given construction with their frequencies in other, comparable constructions” (Stefanowitsch 2006: 258). It is important to mention that distinctive collexeme analysis “is specifically geared to investigating pairs of semantically similar grammatical constructions and the lexemes that occur in them” (Gries & Stefanowitsch 2004: 97). In investigating the collocates of RASIM, distinctive collexeme analysis was applied to the direct collocates of the word pairs migrant–immigrant and refugee–asylum seeker, which are treated as near-synonyms by modern English dictionaries (e.g. Longman Dictionary of Contemporary English). This analysis helps to ascertain which collocates/collexemes exhibit a strong preference for migrant as opposed to immigrant, or refugee as opposed to asylum seeker. Regarding terminology, it is essential to point out that left- and right-hand collocations are seen
as constructions, hence the use of the term collostructional analysis. For example, left-hand collocations of the word migrant follow the pattern adjective + migrant or noun phrase + migrant. Similarly, right-hand collocations of the same word follow the structure migrant + noun phrase. One of the main advantages of distinctive collexeme analysis is that “it can reveal subtle differences between seemingly synonymous constructions, many of which are difficult to identify on the basis of more traditional approaches” (ibid.). It should be mentioned that a distinctive collexeme analysis can only be applied to functionally and structurally similar constructions in synchronic data (Stefanowitsch 2006: 260). In the case of the studies performed in Chapters 3 and 4, the analysis was applied to the left-hand (e.g., European migrant, illegal immigrant) and right-hand (e.g., migrant worker, immigrant child) collocations that appeared the most frequently in the two newspaper corpora analysed. The statistical analysis measured the strength of these collocations and thus identified whether a particular collocate or collexeme was more salient for the word migrant or immigrant, or for refugee or asylum seeker. Collostructional strength was measured with the help of the log-likelihood test, which compares the observed and expected values for two datasets (cf. Oakes 1998: 42). Overall, this kind of analysis allows us to identify the distributional differences between the members of each seemingly synonymous word pairing, as well as to identify broader contextual domains that are typical for the words under investigation.
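To make the comparison of observed and expected values concrete, the following Python sketch computes the log-likelihood statistic (G2) for a two-by-two table of the kind that a distinctive collexeme analysis works with. It is a minimal illustration under stated assumptions, not the implementation used for the case studies: the function names and the counts are hypothetical, and the table layout (one collocate versus all other collocates, one node word versus the other) is one common way of setting up the test.

```python
import math


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for a 2x2 table.

                        with migrant   with immigrant
    collocate X              a               b
    all other words          c               d
    """
    total = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            if observed[i][j] > 0:  # a zero cell contributes nothing
                g2 += observed[i][j] * math.log(observed[i][j] / expected)
    return 2 * g2


def preferred_node(a, b, c, d, nodes=("migrant", "immigrant")):
    """Name the node word the collocate occurs with more often than expected."""
    expected_a = (a + b) * (a + c) / (a + b + c + d)
    return nodes[0] if a > expected_a else nodes[1]


# Hypothetical counts, invented for illustration only.
g2 = log_likelihood(40, 5, 460, 495)
print(round(g2, 2), preferred_node(40, 5, 460, 495))
```

A value above 3.84, the critical value of the chi-square distribution with one degree of freedom at p < 0.05, is conventionally read as a significant difference; the direction of the preference has to be read off separately, which is what the second function does.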
Stefanowitsch & Gries (2003: 237; emphasis added) argue:

Since collostructional analysis goes beyond raw frequencies of occurrence, it identifies not only the expressions which are frequent in particular constructions’ slots; rather, it computes the degree of association between the collexeme and the collostruction, determining what in psychological research has become known as one of the strongest determinants of prototype formation, namely cue validity, in this case, of a particular collexeme for a particular construction. That is, collostructional analysis provides the analyst with those expressions which are highly characteristic of the construction’s semantics and which, therefore, are also relevant to the learner.

This means that a distinctive collexeme analysis will provide a list of right- and left-hand collocates that are highly characteristic of the semantics of the words under investigation, namely the word pairs migrant–immigrant and refugee–asylum seeker.
2.5 Advantages of combining critical discourse analysis and corpus linguistics The final part of this chapter sheds light on the advantages of combining corpus linguistics with discourse analysis by referring to the work that has been accomplished in recent decades, as well as by explaining how this methodological synergy can be particularly useful for the analysis of migration discourse. The combination of corpora and discourse methods has become increasingly popular in the past 20 or so years; a growing number of studies and publications
(Baker 2006; Partington et al. 2013; Taylor & Marchi 2018) have strived to build a bridge between the two different fields and reflect on what this methodological combination has achieved. The mixed-method approach includes “a diverse range of kin approaches that go under different names” (Marchi & Taylor 2018: 5). These kin approaches are known under a number of labels: corpus-assisted discourse studies (CADS) (e.g. Partington 2004), corpus-based critical discourse studies (CDA) (e.g. Baker et al. 2008), corpus-driven discourse studies, and corpora and discourse studies (e.g. Baker & McEnery 2015). Language corpora have also been successfully employed in the areas of stylistics, pragmatics, and sociolinguistics (e.g. Baker 2010). What unites these diverse methodological combinations is a genuine desire to bridge a gap between qualitative and quantitative methods of analysis in order to provide a broader picture of how language functions in various socio-political contexts and how multiple discourses are constructed. However, it is essential to point out that discourse analysis and corpus methods are not fully synonymous with qualitative and quantitative methods, respectively. Discourse analysis cannot simply be reduced to the close reading of pre-selected texts; as a linguistic discipline, it employs a set of quantifiable techniques that enables the interpretation and categorisation of language data (Marchi & Taylor 2018: 2). Likewise, corpus linguistics is not a purely quantitative approach; even the most basic type of analysis – concordance analysis – requires a thorough reading of expanded lines in order to identify recurrent patterns of use, which implies a qualitative dimension at each stage of the analysis. It can thus be concluded that both approaches employ techniques that are broadly referred to as qualitative and quantitative methods; corpora and discourse studies are, therefore, complementary. 
In general, empirical evidence provided by large text corpora enables linguists to achieve “greater soundness and greater breadth” (Marchi & Taylor 2018: 5) of research, while reducing cognitive bias, conscious or subconscious (cf. Baker 2006: 11), by employing corpora in discourse analysis. Cognitive bias includes (1) confirmation bias: researchers tend to search for patterns that confirm their hypotheses rather than those that contradict them; and (2) the primacy effect: people usually focus more on information they encounter at the beginning of an activity, for instance, during a close reading. In addition, classic discourse analysis of selected texts has often been criticised for cherry-picking language material that supports the proposed argument while ignoring opposing points of view. By analysing a corpus, linguists are able to trace the overall recurrent linguistic patterns and discursive trends in the data. In particular, keyword analysis, collocational analysis, and a close reading of concordances provide insight into recurrent patterns that cannot always be identified by close reading of a restricted number of selected texts. Corpora help us to identify not only reoccurring patterns, but also reveal rare cases of use that are less likely to be found using smaller-scale studies. Corpus analysis thus makes researchers less selective and should allow greater distance between the observer and the data (Partington 2006: 268). The analysis of large amounts of data should, in turn, reduce the researcher’s bias, at least to some extent. It is true, however, that each and every corpus is a collection of selected texts or even parts of texts; nonetheless, the criteria that regulate corpus
design and text selection aim to make a corpus as balanced and as representative of a language or a genre as possible, thus reducing the aforementioned biases. In any case, it is essential to remember that while corpora and computer software do not necessarily make research methods more objective, they can provide “greater accuracy and accountability” (Marchi & Taylor 2018: 7), as well as allow studies to be easily replicated, thus making research methods more transparent. In order to engage adequately with the topic under investigation, it is necessary to combine discourse-analytical and corpus-linguistic methods. Doing so makes it possible to overcome the limitations of each individual approach by uniting “the quantitative rigor of corpus linguistics with the social perspective of qualitative approaches to discourse analysis” (Marchi & Taylor 2018: 4; emphasis added). A thorough discourse analysis requires insight into various levels of discourse structure, as well as in-depth knowledge of the current social and political climate. When a researcher decides to focus on a pertinent socio-political issue, such as immigration, they will necessarily have to consider a broad variety of factors that contribute to the construction of modern migration discourse. For example, when it comes to research regarding the modern British press, it is important for the researcher to understand the social and economic context in which the texts were produced, in particular the type of newspaper (broadsheet, tabloid), its political stance, date of publication, and article type (report, editorial, opinion piece), as well as whether the position presented is an official statement from the government, the opposition, or the opinion of the editorial staff. 
The current political situation (for example, campaigns before a general election or referendum) should also be taken into consideration more broadly in order to understand why some social actors get more attention and media coverage, whereas others are excluded from discourse; why some actors are presented as powerful, while others are voiceless. For example, in the pre-referendum data (2013–2014), European and EU migrants were the focus of media attention, whereas refugees and asylum seekers were barely mentioned in the right-wing press (refer to Chapters 3 and 4). In addition, a diachronic perspective can contribute to a better understanding of how modern discourses evolve and change, and why counter-discourses sometimes become dominant and dominant discourses lose their power. Corpus linguistic methods help to describe to what extent the identified trends are generalisable. This methodological synergy thus brings together “social relevance and statistical relevance” (Marchi & Taylor 2018: 4). Researchers can expand their analysis into the different domains that will be illustrated in the upcoming chapters. For example, a researcher can test an existing hypothesis by searching a corpus for evidence that originates from a prior study; this can be defined as a corpus-based approach. Alternatively, a researcher could employ a bottom-up or corpus-driven approach and encounter new patterns while looking through a large amount of data. Both approaches are combined in the research outlined in Chapters 3 and 4. Yet another alternative would involve a researcher conducting a close reading of a dozen texts in order to identify interesting patterns and constructions, and then checking the salience of these linguistic trends against a larger set of data. A corpus-assisted approach will be presented
in Chapter 5 that provides a more traditional, in-depth discourse analysis of two newspaper articles, but refers to the general corpus (the BNC1994) in order to (1) identify the semantics of the selected patterns more accurately, and (2) show the historic and cultural significance of the metaphorical motif of the English garden employed in the analysed newspaper articles. In order to increase the scope of an analysis of a discourse, a researcher must combine the following three dimensions in their approach: analysis of recurrent patterns and trends, detailed analysis of the broader contexts in which these patterns and trends occur, and analysis of the social context in which the discourse has been constructed. The research presented in this book aims to go beyond a classic descriptive approach in order to interpret the analysed data and position it within the broader socio-political context of contemporary Britain. Such an approach would explain the current trends in the representation of migration, while focusing on the issue of media polarisation and bias. In Chapters 3 to 6, the combined-methods approach is systematically applied in order to demonstrate that corpus linguistic methods and discourse-analytical methods are complementary, not mutually exclusive, and that it is only in combination that they can contribute to a fuller understanding of the ideological processes at work in the media in contemporary Britain. A number of studies also apply methods stemming from cognitive linguistics, in particular Conceptual Metaphor Theory (CMT), in combination with a critical discourse-analytical approach (cf. Charteris-Black 2004). Each of these approaches to the study of language offers a number of analytical tools for understanding the evaluative nature of meaning in texts. When combined, they provide a solid framework for interpreting ideological bias on both a structural and a semantic level.
References
Aston, Guy & Lou Burnard. 1998. The BNC Handbook: Exploring the British National Corpus with SARA. Edinburgh: Edinburgh University Press. Austin, John. 1962. How to Do Things with Words. Cambridge, MA: Harvard University Press. Baker, Mona, Gill Francis & Elena Tognini-Bonelli (eds.). 1993. Text and Technology: In Honour of John Sinclair. Philadelphia & Amsterdam: John Benjamins. Baker, Paul. 2006. Using Corpora in Discourse Analysis. London & New York: Continuum. Baker, Paul. 2010. Sociolinguistics and Corpus Linguistics. Edinburgh: Edinburgh University Press. Baker, Paul. 2017. American and British English: Divided by a Common Language? Cambridge: Cambridge University Press. Baker, Paul, Costas Gabrielatos, Majid Khosravinik, Michal Krzyzanowski, Tony McEnery & Ruth Wodak. 2008. A Useful Methodological Synergy? Combining Critical Discourse Analysis and Corpus Linguistics to Examine Discourses of Refugees and Asylum Seekers in the UK Press. Discourse and Society, 19 (3), 273–306. Baker, Paul & Tony McEnery (eds.). 2015. Corpora and Discourse: Integrating Discourse and Corpora. London: Palgrave Macmillan. Biber, Douglas. 1993. Representativeness in Corpus Design. Literary and Linguistic Computing, 8 (4), 243–257. Bloomfield, Leonard. 1933. Language. Chicago: University of Chicago Press.
Brezina, Vaclav, Tony McEnery & Stephen Wattam. 2015. Collocations in Context: A New Perspective on Collocation Networks. International Journal of Corpus Linguistics, 20 (2), 139–173. Butterfield, Jeremy. 2008. Damp Squid: The English Language Laid Bare. Oxford: Oxford University Press. Charteris-Black, Jonathan. 2004. Corpus Approaches to Critical Metaphor Analysis. Basingstoke: Palgrave Macmillan. Chomsky, Noam. 1957. Syntactic Structures. The Hague & Paris: Mouton. Chomsky, Noam. 1965. Aspects of the Theory of Syntax. Cambridge, MA: The MIT Press. Davies, Mark. 2008. The Corpus of Contemporary American English: 450 Million Words, 1990–2012. Available at: corpus.byu.edu/coca. Davies, Mark. 2010. The Corpus of Contemporary American English as the First Reliable Monitor Corpus of English. Literary and Linguistic Computing, 25 (4), 447–464. de Saussure, Ferdinand. (1916) 1959. Course in General Linguistics. Charles Bally (ed.). New York: Philosophical Library. Firth, John Rupert. 1957. Papers in Linguistics 1934–1951. London: Oxford University Press. Firth, John Rupert. 1968. Selected Papers of J. R. Firth: 1952–59. Frank Palmer (ed.). London: Longman. Francis, Gill. 1991. Nominal Group Heads and Clause Structure. Word, 42 (2), 144–156. Francis, Nelson & Henry Kučera. 1964. A Standard Corpus of Present-Day Edited American English. Providence: Brown University. Gries, Stefan Thomas. 2008. Dispersions and Adjusted Frequencies in Corpora. International Journal of Corpus Linguistics, 13 (4), 403–437. Gries, Stefan Thomas. 2013. 50-something Years of Work on Collocations: What Is or Should Be Next. International Journal of Corpus Linguistics, 18 (1), 137–166. Gries, Stefan Thomas & Anatol Stefanowitsch. 2004. Extending Collostructional Analysis: A Corpus-Based Perspective on ‘Alternations’. International Journal of Corpus Linguistics, 9 (1), 97–129. Halliday, Michael Alexander Kirkwood. 1966. Lexis as a Linguistic Level. In C. E. Bazell, J. C. Catford, M. A. K. 
Halliday & R. H. Robins (eds.) In Memory of J. R. Firth. London: Longmans, 148–162. Halliday, Michael Alexander Kirkwood. 1994. An Introduction to Functional Grammar. 2nd ed. London: Edward Arnold. Hardie, Andrew & Tony McEnery. 2010. On Two Traditions in Corpus Linguistics, and What They Have in Common. International Journal of Corpus Linguistics, 15 (3), 384–394. Hunston, Susan. 2002. Corpora in Applied Linguistics. Cambridge: Cambridge University Press. Leech, Geoffrey. 2013. Introducing Corpus Annotation. In Geoffrey Leech, Roger Garside & Tony McEnery (eds.) Corpus Annotation: Linguistic Information from Computer Text Corpora. 2nd ed. London & New York: Routledge. Lew, Robert. 2009. The Web as Corpus Versus Traditional Corpora: Their Relative Utility for Linguists and Language Learners. In Paul Baker (ed.) Contemporary Corpus Linguistics. London & New York: Continuum, 289–300. Love, Robbie, Claire Dembry, Andrew Hardie, Vaclav Brezina & Tony McEnery. 2017. The Spoken BNC2014: Designing and Building a Spoken Corpus of Everyday Conversations. International Journal of Corpus Linguistics, 22 (3), 319–344. Marchi, Anna & Charlotte Taylor. 2018. Introduction: Partiality and Reflexivity. In Charlotte Taylor & Anna Marchi (eds.) Corpus Approaches to Discourse: A Critical Review. Oxon & New York: Routledge.
McEnery, Tony & Costas Gabrielatos. 2006. English Corpus Linguistics. In Bas Aarts & April McMahon (eds.) The Handbook of English Linguistics. Oxford: Blackwell, 33–71. McEnery, Tony & Andrew Hardie. 2012. Corpus Linguistics: Method, Theory and Practice. Cambridge: Cambridge University Press. McEnery, Tony & Andrew Wilson. 2001. Corpus Linguistics. 2nd ed. Edinburgh: Edinburgh University Press. McIntosh, Angus. 1961. Patterns and Ranges. Language, 37 (3), 325–337. Oakes, Michael. 1998. Statistics for Corpus Linguistics. Edinburgh: Edinburgh University Press. Partington, Alan. 2004. Corpora and Discourse, a Most Congruous Beast. In Alan Partington, John Morley & Louann Haarman (eds.) Corpora and Discourse. Bern: Peter Lang, 11–20. Partington, Alan. 2006. Metaphors, Motifs and Similes Across Discourse Types: Corpus-Assisted Discourse Studies (CADS) at Work. In Anatol Stefanowitsch & Stefan Thomas Gries (eds.) Corpus-Based Approaches to Metaphor and Metonymy. Berlin & New York: Mouton de Gruyter, 267–304. Partington, Alan, Alison Duguid & Charlotte Taylor. 2013. Patterns and Meanings in Discourse: Theory and Practice in Corpus-Assisted Discourse Studies (CADS). Amsterdam & Philadelphia: John Benjamins. Schmid, Helmut. 1994. Probabilistic Part-of-speech Tagging Using Decision Trees. Proceedings of International Conference on New Methods in Language Processing. Manchester, UK. Searle, John. 1969. Speech Acts. Cambridge: Cambridge University Press. Sinclair, John. 1966. Beginning the Study of Lexis. In C. E. Bazell, J. C. Catford, M. A. K. Halliday & R. H. Robins (eds.) In Memory of J. R. Firth. London: Longmans, 410–430. Sinclair, John. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press. Sinclair, John. 2003. Reading Concordances. London: Longman. Sinclair, John. 2004. Trust the Text: Language, Corpus and Discourse. London: Routledge. Stefanowitsch, Anatol. 2006. Distinctive Collexeme Analysis and Diachrony: A Comment. 
Corpus Linguistics and Linguistic Theory, 2 (2), 257–262. Stefanowitsch, Anatol & Stefan Thomas Gries. 2003. Collostructions: Investigating the Interaction Between Words and Constructions. International Journal of Corpus Linguistics, 8 (2), 209–243. Stefanowitsch, Anatol & Stefan Thomas Gries. 2005. Covarying Collexemes. Corpus Linguistics and Linguistic Theory, 1 (1), 1–46. Stubbs, Michael. 1993. British Traditions in Text Analysis – From Firth to Sinclair. In Mona Baker, Gill Francis & Elena Tognini-Bonelli (eds.) Text and Technology: In Honour of John Sinclair. Amsterdam & Philadelphia: John Benjamins, 1–37. Stubbs, Michael. 1996. Text and Corpus Analysis. London: Blackwell. Taylor, Charlotte. 2013. Searching for Similarity Using Corpus-Assisted Discourse Studies. Corpora, 8 (1), 81–113. Taylor, Charlotte. 2018. Similarity. In Charlotte Taylor & Anna Marchi (eds.) Corpus Approaches to Discourse: A Critical Review. Oxon & New York: Routledge. Taylor, Charlotte & Anna Marchi (eds.). 2018. Corpus Approaches to Discourse: A Critical Review. Oxon & New York: Routledge. Tognini-Bonelli, Elena. 2001. Corpus Linguistics at Work. Amsterdam: John Benjamins.
3
Discursive representations of refugees, asylum seekers, immigrants, and migrants (RASIM) prior to the EU referendum
3.1 Introducing RASIM in media discourse
Immigration is a highly topical socio-political issue that receives a lot of attention in the contemporary British and European media. The high frequency of words like refugee(s), asylum seeker(s), immigrant(s), and migrant(s) – henceforth referred to as RASIM – in the socio-political context of the contemporary British media makes this chapter particularly relevant. The term RASIM was introduced by Baker (2007) and later used in various corpus-based studies (Gabrielatos & Baker 2008; Baker et al. 2008; Taylor 2014; Islentyeva 2018). A great number of studies have been conducted on media discourse in general and on the ways in which RASIM are represented, particularly in the British media (van Dijk 1991; Baker & McEnery 2005; Baker 2006; Hart 2010; Musolff 2015). As one of the least powerful groups in society, RASIM do not often have the opportunity to construct their own identities in the media. In most cases, their identities are constructed by more powerful groups who possess direct access to the media, such as editors and journalists, as well as politicians and other public figures who are cited in the press (van Dijk 1991; Baker 2006: 74). Teun van Dijk (1996: 91–94) asserts that minority groups such as migrants and refugees are frequently discussed in the media, but that these groups have little control over their own representations in socio-political discourse. Fowler (1991: 94) goes as far as to argue that “groups” such as young married women, immigrants, and teachers are imaginary; in a sense, they are socially constructed concepts, “almost as fictitious as trolls at bridges and princesses in towers”. Media discourse has tremendous power in terms of its capacity to construct the identities of these groups. Fowler (1991: 94) states that language not only provides names for different categories, but also defines the relationships that exist between those categories. 
Vocabulary is taxonomically organised; the sense relations between words classify experience appropriate to the ideology of a community of the discourse. For example, vocabulary classifies immigrants as a special and deviant group, just by providing a word for them; there is, however, no distinct term that is commonly used to describe the opposite phenomenon – that of being a “normal” citizen. Fowler also stresses that it is not only lexis that contributes to the reproduction of discrimination in discourse, but that syntactic forms, whole text structures, and
metaphorical patterns are also central to this process. All of these elements will be thoroughly analysed in the following chapters of this book. This chapter focuses on the key – although voiceless – social actors within migration discourse. By employing a range of corpus linguistic methods, we will analyse the linguistic differences and similarities in the representation of RASIM by left- and right-wing broadsheets and tabloids in 2013–2014, on the threshold of the 2015 general election and the 2016 EU membership referendum. The Pre-Referendum Corpus (Islentyeva & Stefanowitsch 2019) is a specialised monogeneric corpus containing 500 articles from five British newspapers that have a nationwide circulation. For further information on the newspaper corpora employed in this book, refer to Chapter 1 (Section 1.3) and Chapter 2 (Section 2.3). While all four RASIM terms are semantically related, there are subtle differences in their meanings, as well as in the discursive representation of these groups in different types of British newspapers. In order to trace these differences (and similarities) in the representation of RASIM, a multi-level linguistic analysis was applied involving several steps and both qualitative and quantitative data analysis. Section 3.2 outlines the quantitative differences between the uses of RASIM in different types of newspapers and examines to what extent these differences are statistically significant. Sections 3.3 and 3.5 focus on the semantic differences between words that tend to be treated as near synonyms: migrant and immigrant; refugee and asylum seeker. To this end, distinctive collexeme analysis (Gries & Stefanowitsch 2004) is applied to the left- and right-hand collocates of RASIM terms in order to identify which collocates exhibit a strong preference for migrant as opposed to immigrant, or for refugee as opposed to asylum seeker, thus defining the semantics of these words. 
In a further step, a broader contextual analysis of newspaper articles from the Pre-Referendum Corpus identifies the salient features of RASIM terms, the contextual domains in which these words occur, and the corresponding discourses that are typically constructed around them. The result of the semantic analysis is presented as two dictionary entries in Section 3.3. Next, Section 3.4 explores the ideological differences that inform how EU migrants are represented by the different newspapers that make up the Pre-Referendum Corpus (2013–2014). My overall approach consists of two methodological strands. One involves a corpus-based analysis that uses the computer software cqp@fu (Flach 2015) to investigate recurrent patterns and identify salient collocates and typical textual domains. The other represents a more traditional critical analysis of selected newspaper passages, which allows identified patterns to be traced in a broader context.
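The logic of distinctive collexeme analysis mentioned above can be sketched as follows. This is a simplified illustration, not the procedure implemented in cqp@fu: it assumes a one-tailed Fisher exact test on a 2×2 table and reports association strength as the negative base-10 logarithm of the p-value. The function names and all counts are hypothetical and do not come from the Pre-Referendum Corpus.

```python
from math import comb, log10

def fisher_one_tailed(a, b, c, d):
    """P(X >= a) under the hypergeometric distribution for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)

def distinctive(with_a, with_b, total_a, total_b):
    """Return the preferred word ("A" or "B") and -log10(p) for one collocate.

    with_a / with_b: how often the collocate occurs with word A / word B.
    total_a / total_b: total occurrences of word A / word B.
    """
    expected_a = (with_a + with_b) * total_a / (total_a + total_b)
    if with_a >= expected_a:
        p = fisher_one_tailed(with_a, with_b, total_a - with_a, total_b - with_b)
        return "A", -log10(p)
    p = fisher_one_tailed(with_b, with_a, total_b - with_b, total_a - with_a)
    return "B", -log10(p)

# Hypothetical counts: word A = migrant, word B = immigrant.
collocates = {"EU": (60, 5), "illegal": (10, 40)}
for word, (with_a, with_b) in collocates.items():
    preferred, strength = distinctive(with_a, with_b, total_a=1000, total_b=600)
    print(word, preferred, round(strength, 2))
```

Under these invented counts, a collocate is assigned to whichever word it occurs with more often than expected by chance, and values above roughly 1.3 correspond to p < 0.05.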
3.2 Quantitative analysis: frequency analysis and quantification techniques
This chapter focuses on the media’s representation of the key social actors within immigration discourse. In order to investigate how the identities of individuals coming to the UK from abroad are constructed in the British press, the first step of the analysis involved using the computer software cqp@fu (Flach 2015) to search the Pre-Referendum Corpus for all occurrences of the following lemmas:¹ REFUGEE, ASYLUM SEEKER, IMMIGRANT, and MIGRANT. Table 3.1 provides the raw frequencies of the target words, as well as the number of tokens in each subcorpus of the Pre-Referendum Corpus, while Table 3.2 provides the normalised frequencies of these words per 10,000 tokens. As Table 3.1 illustrates, out of all of the RASIM terms in all of the subcorpora, the word migrant(s) occurs the most frequently, with 1,214 occurrences; immigrant(s) is the second-most frequent (621 occurrences); asylum(-)seeker(s) is the least frequent (only 69 occurrences); and refugee(s) is also infrequent in the Pre-Referendum Corpus (164 occurrences). Let us now take a closer look at the target lemmas and their distribution in the different subcorpora of the Pre-Referendum Corpus. When we look at the normalised frequencies in Table 3.2, we notice that migrant(s) (45.22 occurrences per 10,000 tokens) and immigrant(s) (28.72 occurrences per 10,000 tokens) are used most frequently in the right-wing tabloid The Sun. Additionally, a chi-square test of independence shows that there is a significant difference between the right-wing tabloids The Sun and The Mail and the left-wing tabloid The Mirror. The use of the word migrant(s) is significantly more frequent in the right-wing press (χ² = 14.47, df = 1, p