Variational Text Linguistics: Revisiting Register in English 9783110443554, 9783110443103


English Pages 347 [348] Year 2016


Table of contents :
Acknowledgements
Table of contents
Introduction: Current trends in register research
Section I: Specialised registers
Towards a user-based taxonomy of web registers
The interrelationship of register and genre in medical discourse
Aviation English: Two distinct specialised registers?
‘Now niggas talk a lotta Bad Boy shit’: The register hip-hop from a corpus-linguistic perspective
The register of English crossword puzzles: Studies in intertextuality
Section II: Cross-register comparison
Punctuation as an indication of register: Comics and academic texts
Linking up register and cognitive perspectives: Parenthetical constructions in academic prose and experimentalist poetry
Cohesive devices across registers and varieties: The role of medium in English
Section III: Regional, contrastive and diachronic register variation
Metaphors in New English academic writing
The influence of register on noun phrase complexity in varieties of English
Real-time online text commentaries: A cross-cultural perspective
Word order is in order here: A diachronic register analysis of syntactic markedness in English
Index

Christoph Schubert and Christina Sanchez-Stockhammer (Eds.) Variational Text Linguistics



Topics in English Linguistics

Editors: Elizabeth Closs Traugott and Bernd Kortmann

Volume 90



Variational Text Linguistics: Revisiting Register in English
Edited by Christoph Schubert and Christina Sanchez-Stockhammer



ISBN 978-3-11-044310-3
e-ISBN (PDF) 978-3-11-044355-4
e-ISBN (EPUB) 978-3-11-043533-7
ISSN 1434-3452

Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2016 Walter de Gruyter GmbH, Berlin/Boston
Cover image: Brian Stablyk/Photographer’s Choice RF/Getty Images
Typesetting: fidus Publikations-Service GmbH, Nördlingen
Printing and binding: CPI books GmbH, Leck
♾ Printed on acid-free paper
Printed in Germany
www.degruyter.com



Acknowledgements

The foundations for this edited collection of articles were laid at the international conference Register revisited: New perspectives on functional text variety in English, which took place at the University of Vechta, Germany, from June 27 to 29, 2013. The aim of the present volume is to conserve the research papers and many inspiring discussions which were stimulated then and to make them available to a larger audience. It was only possible to achieve this aim thanks to the help of many people joining us in the effort.

First and foremost, we would like to thank all contributors for their continued cooperation in this project. Furthermore, we are very grateful to the external peer reviewers who contributed their expertise to the selection and improvement of the contributions. These are (in alphabetical order): Federica Barbieri (Swansea, Wales), Eniko Csomay (San Diego, USA), Jürgen Esser (Bonn, Germany), Maria Freddi (Pavia, Italy), Christer Geisler (Uppsala, Sweden), Bethany Gray (Ames, Iowa, USA), Joachim Grzega (Eichstätt, Germany), Thomas Kohnen (Cologne, Germany), Rocío Montoro (Granada, Spain), Neal Norrick (Saarbrücken, Germany), Caroline Tagg (Birmingham, UK), Sanna-Kaisa Tanskanen (Helsinki, Finland) and Marija Zlatnar Moe (Ljubljana, Slovenia).

We are very happy that this volume appears in the series Topics in English Linguistics (TiEL) and would like to thank the series editors Elizabeth Traugott and Bernd Kortmann as well as Wolfgang Konwitschny, Julie Miess and Birgit Sievert at de Gruyter Mouton for their invaluable support in the preparation of this book. Needless to say, we are to blame for any remaining inadequacies.

Going back to the roots of this project, we would like to express our gratitude to the German Research Foundation/Deutsche Forschungsgemeinschaft (DFG) for the generous funding of the conference as well as to the Kommission für Forschung und Nachwuchsförderung der Universität Vechta, the Universitätsgesellschaft Vechta (UGV), the Volksbank Vechta and the city of Vechta for their financial support and hospitality, which contributed immensely to the memorably pleasant atmosphere of the event.

Christoph Schubert and Christina Sanchez-Stockhammer
April 2016

Table of contents

Acknowledgements

Christoph Schubert
Introduction: Current trends in register research

Section I: Specialised registers

Douglas Biber and Jesse Egbert
Towards a user-based taxonomy of web registers

Heidrun Dorgeloh
The interrelationship of register and genre in medical discourse

Markus Bieswanger
Aviation English: Two distinct specialised registers?

Rolf Kreyer
‘Now niggas talk a lotta Bad Boy shit’: The register hip-hop from a corpus-linguistic perspective

Teresa Pham
The register of English crossword puzzles: Studies in intertextuality

Section II: Cross-register comparison

Christina Sanchez-Stockhammer
Punctuation as an indication of register: Comics and academic texts

Martina Lampert
Linking up register and cognitive perspectives: Parenthetical constructions in academic prose and experimentalist poetry

Stella Neumann and Jennifer Fest
Cohesive devices across registers and varieties: The role of medium in English

Section III: Regional, contrastive and diachronic register variation

Barbara Güldenring
Metaphors in New English academic writing

Steffen Schaub
The influence of register on noun phrase complexity in varieties of English

Valentin Werner
Real-time online text commentaries: A cross-cultural perspective

Javier Pérez-Guerra
Word order is in order here: A diachronic register analysis of syntactic markedness in English

Index

Christoph Schubert

Introduction: Current trends in register research

1 Research interest and goals of the volume

The discipline of text linguistics is firmly established as “any work in language science devoted to the text as the primary object of inquiry” (de Beaugrande and Dressler 1981: 14). Although there is a variety of theories and approaches in text linguistics, common research issues are the definition of “text” in old and new media, the formal and functional connections between sentences, typological classifications of texts and processes in the production and comprehension of texts (cf. Esser 2009: 20–21 and Schubert 2012: 29). As the new discipline of “variational pragmatics”, which investigates contextual language use across regional varieties of English, has been established in recent years (cf. Schneider and Barron 2008), the present volume aims to foster and further develop the discipline of “variational text linguistics”. Since this new field of research covers both functional and regional types of textual variation, it intends to provide novel insights into the multi-faceted concept of “register”. Along the lines of Biber and Conrad’s monograph Register, Genre, and Style (2009: 6), we regard “register analysis” as a perspective on text variety which investigates context-dependent communicative functions of characteristic lexico-grammatical features in discourse. Thus, quantitative results based on adequate corpora are here combined with qualitative assessment. We approach the subject of “register” from a wide perspective, incorporating stylistics, variational linguistics and discourse analysis, so that convergences and synergistic effects between disciplines become obvious.
In recent years, other volumes dedicated to textual variety have placed emphasis on different research foci, which may be illustrated by three examples: the essay collection by Dorgeloh and Wanner (2010) is interested in textual variety in English exclusively from the perspective of syntactic parameters and it investigates genre rather than register. In the volume by Andersen and Bech (2013), genre variation is only one parameter next to diachronic variation in time and geographical variation in space. Moreover, the three types of variation are largely discussed separately, and the editors’ main interest lies in corpus development

Christoph Schubert, University of Vechta

and analysis. The book by Szmrecsanyi and Wälchli (2014) does not only discuss register and dialectology but also includes language typology and therefore comprises articles on a number of languages such as Dutch or members of the Slavic family. Yet, they also formulate the central diagnosis that “[e]ven though dialectologists, register analysts, typologists, and quantitative linguists all deal with linguistic variation, there is astonishingly little interaction across these fields” (Wälchli and Szmrecsanyi 2014: 1). In general, register analysis offers a constantly widening range of research opportunities because of the ever-increasing possibilities of communication, mainly triggered by the advent of modern communication technologies. As the main body of linguistic research has concentrated on well-established and frequent registers such as newspaper writing or face-to-face conversations, many descriptive and theoretical issues have not yet been sufficiently investigated. Accordingly, the report on major register studies in Biber and Conrad (cf. 2009: 271–295) reveals that research on specialized registers has had a clear preference for academic and newspaper texts. In particular, the language of popular genres such as pop music, comics or puzzles has hardly been investigated so far, and there are several forms of electronic communication, such as online text commentaries, which need to be described more closely. Hence, by giving room to the description of registers which have not received an appropriate amount of attention so far, we intend to point out emerging trends as well as new directions for future research. By means of cross-cultural comparisons of registers, the volume aims to build bridges to neighbouring disciplines such as cultural studies, especially with regard to intercultural communication. 
By pointing out the ubiquitous nature of register, we also intend to show that adequate register choice is not a marginal phenomenon but a fundamental prerequisite for successful communication in specific social situations.

2 Definitions of “register”

As far as the semantic origin of the term “register” is concerned, the linguistic use of the term represents a metaphorical borrowing from the domain of music, in particular organ playing (cf. Renkema 2004: 146), where it refers to a “sliding device controlling a set of organ-pipes which share a tonal quality” or “the compass of a voice or musical instrument; a particular range of this compass” (Trumble and Stevenson 2002: 2514), so that it is common to speak of “the upper/middle/lower register” (Summers et al. 2005: 1380) of a specific instrument. Hence, in this analogy, “[l]anguage is seen to be regulated in the same way as the
musical tuning of an organ” (cf. Dittmar 2010: 223), and competent speakers of a language have the ability to fine-tune their linguistic choices according to their intended contextual functions. As regards the semantic extension of the term register, it is worthwhile to consider different subdisciplines of linguistics in more detail (cf. Gut and Schubert 2012: 4–6). Thus, it is striking that sociolinguistic approaches usually employ a narrow definition of the term, reducing it to the language of occupations, such as “the register of law”, “the register of medicine” and the like. Since the topic of discourse is the central determining factor in this type of approach, it is mainly the vocabulary that is responsible for the constitution of a register. The following two quotations taken from standard introductions to sociolinguistics aptly demonstrate this narrow notion of “register”.

Linguistic varieties that are linked […] to particular occupations or topics can be termed registers. […] Registers are usually characterized entirely, or almost so, by vocabulary differences. (Trudgill 2000: 81)

Register is another complicating factor in any study of language varieties. Registers are sets of language items associated with discrete occupational or social groups. Surgeons, airline pilots, bank managers, sales clerks, jazz fans, and pimps employ different registers. (Wardhaugh 2002: 51)

It is obvious that subject matters connected to certain types of activity are responsible for the linguistic choices made by discourse participants in this type of approach to “register”. Although the second quotation includes the term “social groups”, this is conceptualized in a narrow way, excluding the language of social classes in the sense of working- or middle-class sociolects. In contrast to this narrow notion of “register”, a wide definition of the term is employed by the tradition of Systemic Functional Linguistics (SFL), as can be seen in the next two definitions taken from a classic introduction to cohesion and a recent study on register variation.

The linguistic features which are typically associated with a configuration of situational features – with particular values of the field, mode and tenor – constitute a register. (Halliday and Hasan 1976: 22, emphasis original)

Just as situations tend to recur and thus form types, registers represent recurring ways of using language in a given situation. […] Registers can thus be described as sub-systems of the language system or, when viewed from below, as types of instantiated texts reflecting a similar situation. (Neumann 2013: 16)

As is the case in the influential monograph by Halliday (1978), “registers” are here seen as functional varieties, corresponding to use in specific contexts, while
“dialects” are defined as varieties based on the respective user, who has a certain social or regional background that surfaces in linguistic behaviour. The fact that registers can be rightfully viewed as “sub-systems” of a given language underlines their formative and constitutive character in a language. As for the three situational features determining register choices, “field” refers to the subject matter under discussion, “tenor” pertains to the relationship between the participants in a given context and “mode” characterizes the medium of transmission (cf. also Bex 1996: 94–110 and Matthiessen 1993: 236–238). This wide notion of “register” is also adopted by the currently prevailing approach of Multidimensional Analysis (MDA) à la Douglas Biber (e.g. Biber 1988, 1995, 2006, 2007; Gray 2013: 363–366), which relies on corpus-derived co-occurrences of lexico-grammatical features that serve equivalent functions in discourse. Despite the enhanced methodology, the definition is relatively similar, since a register is regarded as “a variety associated with a particular situation of use (including particular communicative purposes)” (Biber and Conrad 2009: 6). By increasing the degree of specificity, it is possible to distinguish between “sub-registers” (Biber and Gray 2013), so that, for instance, academic writing can be subdivided into sub-registers such as social science, multi-disciplinary science and humanities. In text linguistics, the terminological differentiation between “register” and “genre” has always been a notorious issue. One possible solution to the problem is offered by Dorgeloh and Wanner (2010: 10), who suggest three main differences, although the distinction between the concepts is still seen as scalar and gradient. First, while register implies linguistic features dependent on situational contexts, genres are regarded as types of “social action” (Dorgeloh and Wanner 2010: 10) used to perform interindividual tasks. 
Second, register is predominantly geared towards the function of linguistic features, whereas genres rely to a large degree on “patterned practice” (Dorgeloh and Wanner 2010: 10), involving characteristic textual structures. Third, register operates at a high level of generality, while genre has a more specific and concrete character, such as “on-line medical advice” or a “corporate blog” (Giltrow 2010: 47). In fact, this more specific definition offers a niche for the term “genre” in linguistics, since recently, research on “genre” has been superseded by linguistic interest in “register” (cf. Giltrow 2010: 31). Literary criticism, by contrast, clearly maintains a preference for the concept of “genre”.

An alternative approach to terminological differentiation is provided by Biber and Conrad (2009: 15–23), who regard the three terms “register”, “genre” and “style” as “different perspectives on text varieties” (2009: 15–16). The perspective of register pertains to all kinds of “frequent and pervasive” lexico-grammatical items that fulfil specific communicative functions in “a sample of text excerpts”, so that it can be applied to all sorts of discourse. As opposed to that, “[i]n the genre perspective, the focus is on the linguistic characteristics that are used to structure complete texts” (Biber and Conrad 2009: 16). Thus, genres rely on rather specific expressions that occur “in a particular place in the text” (2009: 16) and thus add up to a distinct rhetorical organization, which can be found in texts with a fixed structure, such as formal letters. Finally, “style” is very similar to “register” but depends on linguistic features that are “not directly functional” and “are preferred because they are aesthetically valued” (Biber and Conrad 2009: 16). That is to say that it is possible to determine the style of specific authors or periods of literary history, because these linguistic items do not correspond to particular contexts of situation but serve the poetic function of language. In conclusion, in an extension of the “music” metaphor previously mentioned in the definition of “register”, “genre” equals the specific musical piece chosen by the church organist, while “style” is the organ-player’s individual interpretation and performance of the composition.1

3 Recent developments in register research

While some twenty years ago in the volume Register Analysis Robert-Alain de Beaugrande still diagnosed that “[t]hroughout much of linguistic theory and method, the concept of ‘register’ has led a rather shadowy existence” (1993: 7), research in the field has considerably gained momentum ever since. As regards recent developments in register research in English, five main strands may be distinguished.

First, there are numerous studies on diachronic register variation, which cover various periods of English and usually focus on specific aspects. For instance, Alonso-Almeida (2008) discusses the Middle English medical charm with reference to register, genre and text type variables, whereas Warner (2005) investigates the variable use of do-support in different registers of Early Modern English. Moving on to Modern English, Biber and Finegan (2001) discuss variation in written and spoken registers from the 17th to the 20th centuries. Various 19th-century registers are covered by Geisler (2002) as well as by Egbert (2012). More generally, Davies (2009) examines word frequency in registers from a diachronic perspective, whereas Crespo Garcia (2004) and Taavitsainen (2001) employ a narrower focus, concentrating on the history of the scientific register.

1 The editors sincerely thank Jan Renkema for this metaphorical insight.

Along similar lines, Biber and Gray (2013) investigate diachronic change in news reportage and academic research writing during the twentieth century. Second, there is a considerable body of research on register variation in specialized domains. The dimensions under discussion include parameters such as medium, public and private spheres as well as the discourse of certain fields of knowledge. Research on academic English is most frequent, as shown by Csomay’s (2002) analysis of lectures and Biber’s (2006) comprehensive multidimensional study of spoken and written register variation in university discourse. Fryer (2013) investigates medical research articles with regard to evaluation practices, while Schutz (2013) discusses the use of verbs in registers pertaining to business, linguistics, and medical research. Gotti (2012) argues that academic English is by no means uniform but varies according to a number of criteria, such as disciplinary conventions, expertise in the respective field, and linguistic competence of the author. A particular focus on interdisciplinary discourses is found in Teich (2009), whereas further recent studies on academic English and scientific texts respectively have been published by Bartsch (2009) and Teich (2010). In Quinto-Pozos and Mehta’s (2010) study of American Sign Language, it becomes clear that different registers are present not only in verbal but also in nonverbal communication. Concerning the parameter of medium, earlier studies on spoken and written registers have been complemented by research on computer-mediated communication (Biber 2007). As the research survey in Biber and Conrad (cf. 2009: 271–295) underlines, interest in electronic discourse has significantly increased over the last ten to fifteen years. 
Further studies on specialized domains comprise register shifting in US public discourse (Cole 2012), the creation of humour through incongruity in register (Venour, Ritchie and Mellish 2011), the register of news reporting in its social context (Lukin 2010), Business English (Cortés de los Ríos 2010), the evaluative language of corporate social reporting (Fuoli 2013), legal language (Battarbee 2010) and the language of linguistics (Freddi 2005). There is also some research on the use of registers in literary texts, as exemplified by Pollner’s (2005) analysis of language variation in Irvine Welsh’s novel Trainspotting.

Third, a quickly developing trend brings together register research with sociolinguistic investigations of regional variation, usually concentrating on international varieties of English, or “World Englishes”, used as a second language (ESL). Xiao (2009) provides a discussion of general issues of the study of World Englishes from the perspective of multidimensional analysis. The recent volume by Szmrecsanyi and Wälchli (2014) contains a number of papers which combine quantitative techniques in register analysis, dialectology, and language typology. For instance, the contribution by Diwersy, Evert and Neumann (2014) shows how a corpus-driven multivariate approach can be used for the study of both register and regional variation. Hilbert and Krug (2012) present a study on the use of progressives in spoken conversations and written press language in Maltese English, as compared to British and American English. As far as Asian varieties are concerned, there is research on registers in Singapore English (Bao and Hong 2006) and on Indian English registers (Balasubramanian 2009a), complemented by a special focus on adverbials (Balasubramanian 2009b). Regarding Africa, there is multidimensional research on various registers in East African English, pointing out, among other aspects, the presence of a greater degree of formality and an increased involvement of the addressee (Van Rooy et al. 2010). Other papers analyse expository writing in Cameroon English (Nkemleke 2006) and academic texts by African American college students (Syrquin 2006). Neumann (2012) chooses a more comprehensive approach, comparing a number of registers in the Englishes spoken in New Zealand, Hong Kong, India, Jamaica, Singapore and Canada. The ultimate goal of most of these studies is to give a complete and comprehensive account of geographical varieties by describing their internally diversified registers, thus taking sociolinguistics to the next level. Along these lines, Balasubramanian (2009a: 19) argues that “[t]o provide a thorough linguistic description of a variety […], it is important to study registers of that variety – i.e. to study the variation within the dialect” and that “[s]uch study of register was missing in the earlier methodologies of dialectology”. As has been pointed out in research on postcolonial Englishes, it is common for these new Englishes to develop use-related varieties in addition to user-related ones, which corresponds to the stage of “differentiation” in the evolutionary development of postcolonial varieties (cf. Schneider 2007: 52–55).
Hence, the study of registers aptly complements sociolinguistic approaches, so that this liaison will undoubtedly prove highly fruitful in future research on linguistic variety. Fourth, contrastive register analysis investigates register variation across two or more languages and is often linked to questions of translation studies. For instance, Teich (2003) compares textual variety in English and German and thereby significantly extends the scope of Contrastive Linguistics, which used to focus mainly on relatively isolated phonological and morphosyntactic features. Neumann (2013) likewise contrasts English and German registers by including both cross-linguistic variation and variational differences between original and translated texts. One central result is that related registers in the two languages show different register features with regard to the chosen subdimensions, so that individual register studies for both languages are necessary. More specifically, the monograph by Barron (2012) compares public information messages in Irish English and German, while register shifts in translations from English into Slovene are investigated by Zlatnar Moe (2010). Focusing on the digital medium, Hardy (2012) contrasts electronic discourse in Filipino and American English.

Fifth, from an applied linguistic perspective there are numerous publications on register and language teaching. While Painter (2001) writes on general issues of teaching genre and register and Reppen (2001) compares spoken and written registers of school-aged students and adults, many articles – quite unsurprisingly – deal with the teaching of academic English. For instance, Halliday’s Systemic-Functional Linguistics is used for the analysis of student report writing by Gardner (2012), and Gilquin (2008) as well as Moore (2006) investigate Learner Academic Writing. On the basis of similar research interests, Han (2010) discusses the teaching of English for Specific Purposes (ESP) from the perspective of register theory. Another language-pedagogical topic is addressed by Volden (2009), who concentrates on registers used by autistic children. Rühlemann (2008) examines the teaching of the informal conversational register, which is frequently neglected in EFL research. With the exception of language pedagogical approaches, all of the trends mentioned are taken up by the papers in the present volume.

4 A model for register analysis

All of the contributions in this volume refer to the theoretical model of the influential textbook by Biber and Conrad (2009). The central statement underlying register analysis in this textbook names the following crucial parameters: “[t]he description of a register covers three major components: the situational context, the linguistic features, and the functional relationships between the first two components” (Biber and Conrad 2009: 6). By establishing meaningful relations between these aspects, any given register can be described on the basis of a qualitative and quantitative investigation. As far as the situational context is concerned, Biber and Conrad expand the three parameters proposed by Halliday (1978) by establishing the following seven characteristics (2009: 40–47):

(1) Participants: the addressor(s) as the producer(s) of texts can be defined according to number, situation in society (individual or institutional) and personal parameters (age, gender, education etc.). Addressees as the recipients of texts may also be classified according to number and the question whether they can be personally identified or not. In addition, there may be onlookers, who do not directly contribute to the verbal exchange but whose physical presence may nevertheless influence the linguistic choices made by the interlocutors.
(2) Relations among participants: it is crucial to analyse whether the communication is immediately interactive, which social roles are played by the participants in terms of power, whether they have a personal relationship, and to what degree the interactants share relevant background knowledge.
(3) Channel: the communication can be conducted in the written or spoken mode, and a particular medium may be utilized, such as telephone, radio, television or the internet.
(4) Production circumstances: while spoken communication commonly takes place in real time, written or electronic discourse may be carefully planned and additionally revised.
(5) Setting: in spoken interaction, the participants often share time and place, which is usually not the case in written texts. Moreover, communication can take place in a private or public setting or at a specific location such as a church. In temporal terms, linguistic conventions change through the decades and centuries.
(6) Communicative purposes: while general discourse intentions include description, persuasion or narration, they may be complemented by specific textual functions referring to particular states of affairs, such as scientific findings or political spin. What is more, the text may be presented as fictitious or factual, and addressors often use linguistic items expressing their personal stance.
(7) Topic: the theme of any kind of communication can be classified at a very general level as belonging to a certain field of discourse, such as science or business, while such domains obviously offer manifold possibilities of topical subdivisions.

Those seven situational characteristics can be related to fifteen linguistic categories that may be worthwhile investigating in a register analysis (cf. Biber and Conrad 2009: 78–82): vocabulary features (e.g. technical terms), content word classes, function word classes, derived words, verb features (e.g. tense and aspect), pronoun features, reduced forms and dispreferred structures (e.g. contractions or ellipsis), prepositional phrases, coordination, main clause types, noun phrases, adverbials, complement clauses, word order choices (e.g. raising or extraposition) and special features of conversation (e.g. backchannels, pauses and repetitions).
Any of these features may then function as either “register features” or “register markers”, which are distinguished in the following way (Biber and Conrad 2009: 53–54): register features are both pervasive and frequent, as they occur in all parts of a sample text belonging to a given register and appear more often in a selected register than in others. In contrast, register markers are unique to a particular register, as they do not occur in any other register, such as technical expressions in specific types of sport broadcasts. In order to make a comparison of registers possible, it is necessary to introduce a limited set of dimensions along which various registers show different frequencies of the respective linguistic features. For instance, dimensions used for the study of spoken and written university registers may be “oral versus literate discourse” or “procedural versus content-focused discourse” (Biber and Conrad 2009: 226–230). This approach, accordingly entitled “multidimensional (MD) analysis”, heavily relies on corpus-derived quantitative data. With the help of factor analysis, co-occurring clusters of linguistic features in target registers


can be retrieved. Eventually, it is possible to identify register-specific dimension scores, by means of which the registers can be compared. This approach also underlies the register distinction present in the seminal Longman Grammar of Spoken and Written English (Biber, Johansson, Leech, Conrad and Finegan 1999) as well as in the monograph University Language (Biber 2006), and it is the foundation of numerous studies on registers in recent years. For instance, Biber (2012) challenges the common practice of reference grammars which fail to take into account register distinctions and treat grammatical structures as general features of English at large. Biber’s impact can be measured by the fact that his method of multidimensional analysis has become more and more widespread (e.g. Egbert 2012; Geisler 2002; Reppen 2001; van Rooy et al. 2010; Xiao 2009). This trend is further corroborated by a recent edited volume which is dedicated explicitly to Biber’s MDA and contains articles on regional and register variation in both English and Romance languages (Sardinha and Pinto 2014).
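The last step of this procedure, deriving register-specific dimension scores, can be illustrated with a minimal sketch. It assumes that a factor analysis has already assigned features to the positive or negative pole of one dimension; the feature names, pole assignments and frequencies below are invented for illustration and are not taken from any of the studies cited. Each feature frequency is standardised across all texts, a text's dimension score is the sum of its standardised values for positive-pole features minus those for negative-pole features, and a register's score is the mean over its texts.

```python
from statistics import mean, pstdev

# Hypothetical normalised frequencies (per 1,000 words); invented for illustration.
texts = [
    {"register": "conversation", "contractions": 45.0, "first_person_pronouns": 60.0, "nouns": 150.0},
    {"register": "conversation", "contractions": 40.0, "first_person_pronouns": 55.0, "nouns": 160.0},
    {"register": "academic prose", "contractions": 1.0, "first_person_pronouns": 5.0, "nouns": 300.0},
    {"register": "academic prose", "contractions": 2.0, "first_person_pronouns": 8.0, "nouns": 290.0},
]

# Assumed outcome of a prior factor analysis for a single dimension.
POSITIVE = ["contractions", "first_person_pronouns"]
NEGATIVE = ["nouns"]


def z_scores(values):
    """Standardise a list of values (population standard deviation)."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd if sd else 0.0 for v in values]


def dimension_scores(texts, positive, negative):
    """Return one dimension score per text, in the order of `texts`."""
    standardised = {f: z_scores([t[f] for t in texts]) for f in positive + negative}
    return [
        sum(standardised[f][i] for f in positive)
        - sum(standardised[f][i] for f in negative)
        for i in range(len(texts))
    ]


def register_means(texts, scores):
    """Average the text-level scores within each register."""
    by_register = {}
    for text, score in zip(texts, scores):
        by_register.setdefault(text["register"], []).append(score)
    return {register: mean(values) for register, values in by_register.items()}


scores = dimension_scores(texts, POSITIVE, NEGATIVE)
means = register_means(texts, scores)
```

With these invented figures, the conversation texts receive a positive mean score and the academic texts a negative one, which is the kind of contrast along a dimension that allows registers to be compared.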

5 An outline of the volume

This volume is subdivided into three thematic parts, each introduced by general remarks on the respective section topic and by a summary of the individual articles: the first part, specialised registers, is dedicated to the description of individual registers, namely web registers (Biber and Egbert), medical texts (Dorgeloh), Aviation English (Bieswanger), hip-hop (Kreyer) and crossword puzzles (Pham). The second part, cross-register comparison, builds upon that basis by providing register-transcending studies which compare individual registers. More specifically, it contrasts comics and academic texts (Sanchez-Stockhammer), academic prose and minimalist poetry (Lampert) as well as academic writing, administrative writing, timed exams, conversations and broadcast discussions (Neumann and Fest). The third part, regional, contrastive and diachronic register variation, widens the perspective by investigating register variation along international, contrastive-linguistic and historical dimensions. It is dedicated to metaphors in the New Englishes of India, Hong Kong and Singapore (Güldenring) as well as noun phrase structure in Indian English, Jamaican English, Hong Kong English and Canadian English (Schaub). Online text commentaries are analysed contrastively in British and German sports reports (Werner). The diachronic perspective is considered in the discussion of developments of word order from Middle English to Late Modern English (Pérez-Guerra). The paper by Neumann and Fest functions as an apt link between Sections II and III, since it combines cross-register comparisons with regional variation. Although the various contributions to the volume take different research perspectives, all deal with frequent and recurrent linguistic features throughout texts supporting specific superordinate functions. Taken together, the papers cover theoretical considerations, case studies and reflections on presently employed methods, suggesting approaches and topics for future research on variational text linguistics in English.

Bibliography

Alonso-Almeida, Francisco. 2008. The Middle English medical charm: Register, genre and text type variables. Neuphilologische Mitteilungen 109(1). 9–38.
Andersen, Gisle & Kristin Bech (eds.). 2013. English corpus linguistics: Variation in time, space and genre. Amsterdam: Rodopi.
Balasubramanian, Chandrika. 2009a. Register variation in Indian English. Amsterdam: Benjamins.
Balasubramanian, Chandrika. 2009b. Circumstance adverbials in registers of Indian English. World Englishes 28(4). 485–508.
Bao, Zhiming & Huaqing Hong. 2006. Diglossia and register variation in Singapore English. World Englishes 25(1). 105–114.
Barron, Anne. 2012. Public information messages: A contrastive genre analysis of state-citizen communication. Amsterdam: Benjamins.
Bartsch, Sabine. 2009. Corpus studies of register variation: An exploration of academic registers. Anglistik: International Journal of English Studies 20(1). 105–124.
Battarbee, Keith. 2010. Shifts in the language of the law: Reading the registers of official-language statutes. Text & Talk 30(6). 637–655.
Bex, Tony. 1996. Variety in written English: Texts in society – societies in text. London: Routledge.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge UP.
Biber, Douglas. 1995. Dimensions of register variation: A cross-linguistic comparison. Cambridge: Cambridge UP.
Biber, Douglas & Edward Finegan. 2001. Diachronic relations among speech-based and written registers in English. In Susan Conrad & Douglas Biber (eds.). Variation in English: Multi-dimensional studies, 66–83. Harlow: Pearson Education.
Biber, Douglas. 2006. University language: A corpus-based study of spoken and written registers. Amsterdam: Benjamins.
Biber, Douglas. 2007. Towards a taxonomy of web registers and text types: A multidimensional analysis. In Marianne Hundt, Nadja Nesselhauf & Carolin Biewer (eds.). Corpus linguistics and the Web, 109–131. Amsterdam: Rodopi.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge UP.
Biber, Douglas. 2012. Register as a predictor of linguistic variation. Corpus linguistics and linguistic theory 8(1). 9–37.
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999. Longman grammar of spoken and written English. London: Longman.
Biber, Douglas & Bethany Gray. 2013. Being specific about historical change: The influence of sub-register. Journal of English Linguistics 41(2). 104–134.


Cole, Debbie. 2012. Uptake (un)limited: The mediatization of register shifting in US public discourse. Language in Society 41(4). 449–470.
Cortés de los Ríos, Ma Enriqueta. 2010. A combined genre-register approach in texts of business English. LSP Journal 1(1). 13–28.
Crespo García, Begoña. 2004. The scientific register in the history of English: A corpus-based study. Studia Neophilologica 76(2). 125–139.
Csomay, Eniko. 2002. Variation in academic lectures: Interactivity and level of instruction. In Randi Reppen, Susan M. Fitzmaurice & Douglas Biber (eds.). Using corpora to explore linguistic variation, 203–224. Amsterdam: Benjamins.
Davies, Mark. 2009. Word frequency in context: Alternative architectures for examining related words, register variation and historical change. In Dawn Archer (ed.). What’s in a word-list? Investigating word frequency and keyword extraction, 53–68. Surrey: Ashgate.
De Beaugrande, Robert-Alain. 1993. ‘Register’ in discourse studies: A concept in search of a theory. In Mohsen Ghadessy (ed.). Register analysis: Theory and practice, 7–25. London: Pinter Publishers.
De Beaugrande, Robert-Alain & Wolfgang Ulrich Dressler. 1981. Introduction to text linguistics. London: Longman.
Dittmar, Norbert. 2010. Register. In Mirjam Fried, Jan-Ola Östman & Jef Verschueren (eds.). Variation and change: Pragmatic perspectives, 221–233. Amsterdam: Benjamins.
Diwersy, Sascha, Stefan Evert & Stella Neumann. 2014. A weakly supervised multivariate approach to the study of language variation. In Benedikt Szmrecsanyi & Bernhard Wälchli (eds.). Aggregating dialectology, typology, and register analysis: Linguistic variation in text and speech, 174–204. Berlin: de Gruyter.
Dorgeloh, Heidrun & Anja Wanner. 2010. Introduction. In Heidrun Dorgeloh & Anja Wanner (eds.). Syntactic variation and genre, 1–26. Berlin: De Gruyter Mouton.
Egbert, Jesse. 2012. Style in nineteenth century fiction: A multi-dimensional analysis. Scientific Study of Literature 2(2). 167–198.
Esser, Jürgen. 2009. Introduction to English text-linguistics. Frankfurt/Main: Peter Lang.
Freddi, Maria. 2005. From corpus to register: The construction of evaluation and argumentation in linguistics textbooks. In Elena Tognini-Bonelli & Gabriella Del Lungo Camiciotti (eds.). Strategies in academic discourse, 133–151. Amsterdam: Benjamins.
Fryer, Daniel Lees. 2013. Exploring the dialogism of academic discourse: Heteroglossic engagement in medical research articles. In Gisle Andersen & Kristin Bech (eds.). English corpus linguistics: Variation in time, space and genre, 183–207. Amsterdam: Rodopi.
Fuoli, Matteo. 2013. Texturing a responsible corporate identity: A comparative analysis of appraisal in BP’s and IKEA’s 2009 corporate social reports. In Gisle Andersen & Kristin Bech (eds.). English corpus linguistics: Variation in time, space and genre, 209–235. Amsterdam: Rodopi.
Gardner, Sheena. 2012. Genres and registers of student report writing: An SFL perspective on texts and practices. Journal of English for Academic Purposes 11(1). 52–63.
Geisler, Christer. 2002. Investigating register variation in nineteenth-century English: A multi-dimensional comparison. In Randi Reppen, Susan M. Fitzmaurice & Douglas Biber (eds.). Using corpora to explore linguistic variation, 249–271. Amsterdam: Benjamins.
Gilquin, Gaëtanelle. 2008. Too chatty: Learner academic writing and register variation. English Text Construction 1(1). 41–61.




Giltrow, Janet. 2010. Genre as difference: The sociality of linguistic variation. In Heidrun Dorgeloh & Anja Wanner (eds.). Syntactic variation and genre, 29–51. Berlin: De Gruyter Mouton.
Gotti, Maurizio. 2012. Variation in academic texts. In Maurizio Gotti (ed.). Academic identity traits: A corpus-based investigation, 23–42. Bern: Peter Lang.
Gray, Bethany. 2013. Interview with Douglas Biber. Journal of English Linguistics 41(4). 359–379.
Gut, Ulrike & Christoph Schubert. 2012. Approaches to language variation: Introduction. In Monika Fludernik & Benjamin Kohlmann (eds.). Anglistentag 2011 Freiburg: Proceedings, 3–9. Trier: WVT.
Halliday, Michael A. K. 1978. Language as social semiotic: The social interpretation of language and meaning. London: Arnold.
Halliday, Michael A. K. & Ruqaiya Hasan. 1976. Cohesion in English. London: Longman.
Han, Huabing. 2010. On the methodology employed in ESP teaching under register theory. The 1st Asian ESP conference. [Special edition]. Asian ESP Journal, 158–163.
Hardy, Jack A. 2012. Filipino and American online communication and linguistic variation. World Englishes 31(2). 143–161.
Hilbert, Michaela & Manfred Krug. 2012. Progressives in Maltese English: A comparison with spoken and written text types of British and American English. In Marianne Hundt & Ulrike Gut (eds.). Mapping unity and diversity world-wide, 103–136. Amsterdam: John Benjamins.
Lukin, Annabelle. 2010. ‘News’ and ‘register’: A preliminary investigation. In Ahmar Mahboob & Naomi K. Knight (eds.). Appliable linguistics, 92–113. London: Continuum.
Matthiessen, Christian M. I. M. 1993. Register in the round: Diversity in a unified theory of register analysis. In Mohsen Ghadessy (ed.). Register analysis: Theory and practice, 221–292. London: Pinter Publishers.
Moore, Nick. 2006. Advanced language for intermediate learners: Corpus and register analysis for curriculum specification in English for academic purposes. In Heidi Byrnes (ed.). Advanced language learning: The contribution of Halliday and Vygotsky, 246–264. London: Continuum.
Neumann, Stella. 2012. Applying register analysis to varieties of English. In Monika Fludernik & Benjamin Kohlmann (eds.). Anglistentag 2011 Freiburg: Proceedings, 75–94. Trier: WVT.
Neumann, Stella. 2013. Contrastive register variation: A quantitative approach to the comparison of English and German. Berlin: Mouton de Gruyter.
Nkemleke, Daniel A. 2006. Some characteristics of expository writing in Cameroon English. English World-Wide 27(1). 25–44.
Painter, Clare. 2001. Understanding genre and register: Implications for language teaching. In Anne Burns & Caroline Coffin (eds.). Analysing English in a global context, 167–180. London: Routledge.
Pollner, Clausdirk. 2005. English 0 – and drugs galore: Varieties and registers in Irvine Welsh’s Trainspotting. In Gisela Hermann-Brennecke & Wolf Kindermann (eds.). Anglo-american awareness: Arpeggios in aesthetics, 193–202. Münster: LIT.
Quinto-Pozos, David & Sarika Mehta. 2010. Register variation in mimetic gestural complements to signed language. Journal of Pragmatics 42(3). 557–584.
Renkema, Jan. 2004. Introduction to discourse studies. Amsterdam: John Benjamins.


Reppen, Randi. 2001. Register variation in student and adult speech and writing. In Susan Conrad & Douglas Biber (eds.). Variation in English: Multi-dimensional studies, 187–199. London: Longman.
Rühlemann, Christoph. 2008. A register approach to teaching conversation: Farewell to Standard English? Applied Linguistics 29(4). 672–693.
Sardinha, Tony Berber & Marcia Veirano Pinto (eds.). 2014. Multi-dimensional analysis, 25 years on: A tribute to Douglas Biber. Amsterdam: John Benjamins.
Schneider, Edgar W. 2007. Postcolonial English: Varieties around the world. Cambridge: Cambridge UP.
Schneider, Klaus P. & Anne Barron (eds.). 2008. Variational pragmatics: A focus on regional varieties in pluricentric languages. Amsterdam/Philadelphia: Benjamins.
Schubert, Christoph. 2012. Englische Textlinguistik: Eine Einführung. 2nd edn. Berlin: Erich Schmidt.
Schutz, Natassia. 2013. How specific is English for academic purposes? A look at verbs in business, linguistics and medical research articles. In Gisle Andersen & Kristin Bech (eds.). English corpus linguistics: Variation in time, space and genre, 237–257. Amsterdam: Rodopi.
Summers, Della et al. (eds.). 2005. Longman dictionary of contemporary English. Harlow: Pearson Education Limited.
Syrquin, Anna F. 2006. Registers in the academic writing of African American college students. Written Communication 23(1). 63–90.
Szmrecsanyi, Benedikt & Bernhard Wälchli (eds.). 2014. Aggregating dialectology, typology, and register analysis: Linguistic variation in text and speech. Berlin: de Gruyter.
Taavitsainen, Irma. 2001. Language history and the scientific register. In Hans-Jürgen Diller & Manfred Görlach (eds.). Towards a history of English as a history of genres, 185–202. Heidelberg: Winter.
Teich, Elke. 2003. Cross-linguistic variation in system and text. Berlin: Mouton de Gruyter.
Teich, Elke. 2009. Scientific registers in contact: An exploration of the lexico-grammatical properties of interdisciplinary discourses. International Journal of Corpus Linguistics 14(4). 524–548.
Teich, Elke. 2010. Exploring a corpus of scientific texts using data mining. In Stefan Th. Gries, Stefanie Wulff & Mark Davies (eds.). Corpus-linguistic applications: Current studies, new directions, 233–247. Amsterdam: Rodopi.
Trudgill, Peter. 2000. Sociolinguistics: An introduction to language and society. 4th edn. London: Penguin.
Trumble, William R. & Angus Stevenson (eds.). 2002. Shorter Oxford English dictionary on historical principles. 2 vols. Oxford: Oxford UP.
Van Rooy, Bertus, Lize Terblanche, Christoph Haase & Joseph Schmied. 2010. Register differentiation in East African English: A multidimensional study. English World-Wide 31(3). 311–349.
Venour, Chris, Graeme Ritchie & Chris Mellish. 2011. Dimensions of incongruity in register humour. In Marta Dynel (ed.). The pragmatics of humour across discourse domains, 125–144. Amsterdam: Benjamins.
Volden, Joanne. 2009. Bossy and nice requests: Varying language register in speakers with autism spectrum disorder (ASD). Journal of Communication Disorders 42(1). 58–73.
Wälchli, Bernhard & Benedikt Szmrecsanyi. 2014. Introduction: The text-feature-aggregation pipeline in variation studies. In Benedikt Szmrecsanyi & Bernhard Wälchli (eds.). Aggregating dialectology, typology, and register analysis: Linguistic variation in text and speech, 1–25. Berlin: de Gruyter.
Wardhaugh, Ronald. 2002. An introduction to sociolinguistics. 4th edn. Oxford: Blackwell.
Warner, Anthony. 2005. Why DO dove: Evidence for register variation in Early Modern English. Language Variation and Change 17(3). 257–280.
Xiao, Richard. 2009. Multidimensional analysis and the study of World Englishes. World Englishes 28(4). 421–450.
Zlatnar Moe, Marija. 2010. Register shifts in translations of popular fiction from English into Slovene. In Daniel Gile, Gyde Hansen & Nike K. Pokorn (eds.). Why translation studies matters, 125–136. Amsterdam: Benjamins.

Section I: Specialised registers

The volume opens with five contributions discussing the lexico-grammatical features of previously underdescribed registers, which are situated on different levels in the hierarchy of specificity: web registers, medical discourse, Aviation English, hip-hop and crossword puzzles. The first two registers comprise heterogeneous sub-registers, as, for instance, a distinction is made among the web registers between interviews, discussion forums, encyclopedia articles, advertisements and recipes, while Aviation English is a twofold construct and hip-hop and crossword puzzles constitute relatively uniform categories. All studies can be situated within the analytical register framework described in Biber and Conrad (2009) and examine to what extent their object of inquiry can be considered a register or where the boundaries between more general categories and sub-registers may be drawn. In addition, Dorgeloh’s contribution extends the model by including the genre perspective in the analyses.

The first paper in the volume, Douglas Biber and Jesse Egbert’s study “Towards a user-based taxonomy of web registers”, stands out from the other papers’ corpus-based approaches by its use of a bottom-up design in which internet users were asked to identify basic situational characteristics of web documents. These characteristics were then used to construct a hierarchical decision tree, which permitted the successful categorisation of most internet texts by the same type of informants in the next step. Among the most important results of this study are the finding that some sub-registers might be easier to identify than their superordinate category and the observation that a relatively large proportion of registers on the internet can be considered hybrid with regard to their communicative purposes.

Hybridity of either form, discourse function or both is also observed by Heidrun Dorgeloh in her study “The interrelationship of register and genre in medical discourse”, which finds hybridity in the three medical registers under consideration: illness blogs, medical case reports and medical case presentations. She argues that the correlations between form and function in medical discourse are less linked to the communicative situation than to the type of activity and concludes that the notion of genre should be conferred primacy over that of (sub-)registers.

Markus Bieswanger, by contrast, applies a classical Biberian register analysis to the field of air traffic communication in his paper “Aviation English: Two distinct specialised registers?”. While the term Aviation English is generally used to designate both the standardised phraseology promoted by the International Civil


Aviation Organization and the plain English used in exceptional situations where communicative needs transcend the routine repertoire, Bieswanger’s analysis of authentic air traffic communication material manages to demonstrate that these are actually two distinct registers and not just one register with two sub-registers. While Dorgeloh’s and Bieswanger’s material-based approaches place a particular focus on the qualitative analysis of their data in order to explore the boundaries of their particular register(s), the remaining two studies represent quantitative corpus-based studies of specialised corpora. Rolf Kreyer’s contribution, “‘Now niggas talk a lotta Bad Boy shit’: The register hip-hop from a corpus-linguistic perspective”, targets a question similar to Bieswanger’s, namely whether hip-hop lyrics should be considered a sub-register of pop song lyrics. Based on a corpus of lyrics from the top albums in the US album charts in 2003 and 2011, Kreyer contrasts a hip-hop sub-corpus with lyrics by rappers and hip-hoppers to the lyrics from the remaining albums. His analyses yield differences regarding the semantically annotated content and some nonstandard spellings but particularly regarding the absence of the copula. Kreyer therefore concludes that the language used in hip-hop can be considered a register in its own right. The section closes with Teresa Pham’s corpus analysis entitled “The register of English crossword puzzles: Studies in intertextuality”, in which she reaches the conclusion that cryptic and non-cryptic puzzles constitute sub-registers of the general register of crossword puzzles. The differences with regard to the use of intertextuality between the two types of crossword puzzle suggest the addition of intertextuality to the list of linguistic features that can be used to distinguish registers from each other in the Biberian framework.

Douglas Biber and Jesse Egbert

Towards a user-based taxonomy of web registers

Abstract: There is a well-established need for a comprehensive taxonomy of English web registers grounded in the actual experiences of end-users. In this paper, we introduce a new grant-funded initiative aimed at filling this gap. We first describe the methods used to develop a hierarchical web register framework and introduce our bottom-up, user-based method of web register classification. Using a hierarchical decision tree, a large sample of webpage URLs (N = 1,000) was classified into register and sub-register categories by four raters each. The results indicate that the approach can be effectively used to identify the register category for most internet texts, although the results also show that many texts belong to ‘hybrid’ registers. The primary goals of the paper are to present the overall distribution of internet texts across general registers, sub-registers and ‘hybrid’ registers, and to discuss some of the key characteristics of the major register categories. We conclude with a discussion of challenges and future directions for web register research.

1 Introduction

There is a mind-boggling amount of information available on the World Wide Web. For example, Fletcher (2012: 1) estimates that Google indexes about 40 billion webpages. Although not its intended purpose, the WWW also provides a tremendous resource for linguists, who can use the web as a corpus to investigate linguistic patterns of use. This approach has become so prevalent that the acronym WAC (Web-as-Corpus) has now become commonplace among researchers who explore ways to mine the WWW for linguistic analysis. One of the major challenges for WAC research is that a typical web search usually provides us with no information about the kinds of texts investigated. For example, Fletcher notes that a linguistic search of the Web-as-Corpus will tell us nothing about:

Douglas Biber, Northern Arizona University
Jesse Egbert, Brigham Young University


For whom and what purpose is the text intended? What […] target audience does it represent? Was it written carefully or carelessly by a native speaker, or is it an unreliable translation by man or machine? Is the document authoritative – accurate in content and representative in linguistic form? (2012: 1341)

Similar problems were noted a decade earlier by Kilgarriff and Grefenstette (2003) in their introduction to a special issue of Computational Linguistics on WAC. Thus, they write:

“Text type” is an area in which our understanding is, as yet, very limited. Although further work is required irrespective of the Web, the use of the Web forces the issue. Where researchers use established corpora, such as Brown, the BNC, or the Penn Treebank, researchers and readers are willing to accept the corpus name as a label for the type of text occurring in it without asking critical questions. Once we move to the Web as a source of data, and our corpora have names like “April03-sample77,” the issue of how the text type(s) can be characterized demands attention. (2003: 343)

These concerns are shared widely among WAC researchers, and as a result, there has been a surge of interest over the last several years in Automatic Genre Identification (AGI): computational methods using a wide range of descriptors to automatically classify web texts into genre classes. The typical methodology used in an AGI study is to manually identify the genre (or register) of selected internet texts and to then test the extent to which computer programs can automatically place those texts into the same categories. However, although some studies have achieved high accuracy rates (e.g., Lindemann and Littig 2010; Santini 2010), serious questions have been raised about the validity of those results. First, some scholars raise doubts about the representativeness of the web corpora analysed in previous AGI studies: researchers often disregard the question of whether the sample used in an AGI study represents the full population of internet texts (see discussion in Santini and Sharoff 2009). There have also been questions raised about the actual genre/register categories that we are trying to predict. Most studies have followed the same general procedure: they first begin with a list of possible genre categories; then internet texts are manually classified into those categories by an ‘expert’; and then computational methods are used to determine whether those genre categories can be automatically predicted. This approach is based on two assumptions: 1) that researchers have identified the ‘correct’ set of possible genre/register categories found on the web, based on a priori intuitive consideration of internet texts; and 2) that a single expert user is able to ‘correctly’ identify the genre/register category of individual internet texts. Unfortunately, neither assumption seems to be warranted. The few cases where inter-rater reliability is reported have shown




that it tends to be quite low, even for linguists. This is especially true for corpora composed of randomly extracted web texts (see discussion in Sharoff, Wu, and Markert 2010). Given the problems that ‘experts’ have identifying web genre categories, it is not surprising that non-expert web users also vary in their understanding of genre labels (see Crowston, Kwaśnik, and Rubleske 2010) and that reliability among lay users is often unacceptably low (Rosso and Haas 2010). More importantly, though, it is not clear that the genre categories being predicted in AGI studies are actually valid. This problem has been recognised and discussed in previous research; thus, for example, Rehm et al. (2008: 352) note:

One of the most important problems concerns the elusiveness of the concept of genre. The consequence is that, in practical terms, genre researchers usually have different ideas of what a genre is, how genres should be defined and identified and, therefore, they use different genre labels in their approaches.

A few years ago, there was considerable effort to agree on a standard set of register/genre categories for AGI research, as part of a wiki-based collaboration among Web-as-Corpus experts (http://www.webgenrewiki.org/). That collaborative effort resulted in a list of 78 register/genre distinctions, but the initiative appears to have faded out in the last few years, with little consensus regarding the relative status of those categories. As a result, there is still no generally agreed-on set of register/genre categories used in current AGI research. (In the remainder of this paper, we use the term ‘register’ rather than ‘genre’ to refer to situationally-based textual distinctions, following the research tradition developed in Biber 1995, Biber et al. 1999, Biber and Conrad 2009, etc.).

In the present study, we tackle this problem with a completely different approach: instead of relying on expert coders, we recruit typical end-users of the web for our register analyses, assessing the degree of agreement among those users. Most importantly, we do not force users to choose directly from a pre-defined set of register categories. Rather, we ask users to identify basic situational characteristics of each web document, coded in a hierarchical manner (see below). Those situational characteristics lead to general register categories, which in turn allow users to select a specific sub-register category. By working through a hierarchical decision tree, users are able to identify the register category of most internet texts with a high degree of reliability.

In Section 2 below, we briefly document the methodological procedures used for this project. (Readers are referred to Egbert and Biber 2013 for more detailed discussions.) In Section 3, we introduce the register framework used for our study. In Section 4, then, we describe the overall prevalence of different types of registers on the web and briefly describe and illustrate some of the major web registers identified in the study. Section 5 discusses a more specialised type of register identified by users in this study: ‘hybrid registers’. Finally, in the conclusion we outline our on-going research to extend this methodological approach to a large representative corpus of web documents.

2 Methods

2.1 Corpus for analysis

The corpus used for our study was extracted from the Corpus of Global Web-based English (GloWbE), constructed by Mark Davies (see http://corpus2.byu.edu/glowbe/). The entire corpus contains ca. 1.9 billion words and 1.8 million web pages, collected by using the results of Google searches of highly frequent English 3-grams (e.g., is not the, and from the). The use of n-grams as search engine seeds is an approach that has been used in the past by many WAC scholars (see, e.g., Baroni and Bernardini 2004; Baroni et al. 2009; Sharoff 2005, 2006). Our decision to use 3-grams (rather than 2-grams or 4-grams) was based largely on empirical evidence from the Longman Grammar of Spoken and Written English (Biber et al. 1999). 2-grams are generally collocations that are semantically-based and likely to result in topic-driven Google search results. 4-grams, on the other hand, are much less frequent than 3-grams and were thus not likely to offer us a broad enough sample of n-grams to choose from.

To create the actual corpus, the web pages identified through these random searches were downloaded using HTTrack (http://www.httrack.com). Our ultimate goal in this project is to carry out linguistic analyses of internet texts from the range of web registers. To prepare the corpus for such analyses, non-textual material was removed from all web pages (HTML scrubbing and boilerplate removal) using JusText (http://code.google.com/p/justext). Finally, for the present pilot study, we randomly extracted 1,000 web pages from the larger corpus (with URLs from the US, UK, CA, AU, NZ). Roughly 7 % of the web pages in this initial sample were dropped from the register analysis: 33 of the 1,000 web sites in the corpus were no longer available at the time of coding and an additional 36 web pages consisted mostly of photos or graphics. Consequently, the results reported below are based on a corpus of 931 web pages.
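The idea of using highly frequent word 3-grams as search seeds can be sketched as follows. This is a minimal illustration rather than the procedure actually used to build GloWbE: the tokeniser, the function name `top_ngrams` and the sample text are assumptions introduced here.

```python
import re
from collections import Counter


def top_ngrams(text, n=3, k=5):
    """Return the k most frequent word n-grams in text as (ngram, count) pairs."""
    # Simplistic tokeniser (an assumption): lower-cased runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Zip n staggered copies of the token list to obtain consecutive n-grams.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams).most_common(k)


# Invented sample text; a real seed list would be drawn from a large reference corpus.
sample = (
    "This is not the end. It is not the beginning of the end, "
    "and from the start it is not the plan."
)
seeds = top_ngrams(sample, n=3, k=3)
```

In this toy sample the most frequent 3-gram is the function-word sequence "is not the". Sequences of this kind carry little topical content, which is the property that makes them attractive as topic-neutral search seeds.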




2.2 Overview of procedures

The study described here is part of a larger project, designed to identify the registers found on the web, document the extent to which each of those registers is actually used and ultimately undertake comprehensive linguistic analyses of those register categories as the basis for automatic register and genre identification. The first step required to reach these goals was to establish a set of register distinctions that end-users actually recognise and can reliably identify. This step turned out to be highly challenging, requiring several rounds of pilot testing with end-users. In the process, we reconsidered our basic approach, developing a decision tree of situational characteristics rather than asking users to directly identify the register category of a given internet text. We discuss these register distinctions, and the development of a web classification tool, in Section 3 below.

Once we had developed this tool, and verified that end-users were able to reliably identify the register distinctions built into the tool, we moved on to the larger pilot study to explore the types and distributions of registers found on the web. We recruited 85 raters (typical end-users of the web) to analyse the 1,000 web pages in our pilot corpus. Raters were recruited through Mechanical Turk, an Amazon-based online crowd-sourcing utility that connects Requesters (people who need small tasks completed by human raters) with Workers (people who are willing to complete those small tasks for money). Each web page was coded by four independent raters, so we were able to analyse the reliability of the coding. We determined that four was the optimal number of raters as a result of several rounds of pilot research. The choice to use 1,000 URLs was based mostly on practicality and the money available to us. While there was consensus on the coding of the majority of pages, this approach also allowed us to identify the existence of ‘hybrid registers’ (see Section 5 below). Finally, we compiled distributional results from the coding, providing the basis for our preliminary description of register variation on the web (Sections 4–5).

3 Register categories distinguished in the study

Before undertaking empirical investigation of the registers found on the web, we needed to decide on a set of register categories to be used for the coding. For this purpose, we began with the 78 register/genre categories identified through the wiki-based collaboration of Web-as-Corpus experts (http://www.webgenrewiki.org/; see also the discussion in Rehm et al. 2008). We catalogued the underlying situational characteristics of those 78 categories (e.g., mode, interactivity, communicative purpose; see Biber and Conrad 2009, Chapter 2), and based on that analysis, we developed a framework with the eight general registers shown in Table 1.

Table 1: General web register categories distinguished in the study

A. Internet texts that originated in the spoken mode (e.g., transcripts of speeches or interviews)
B. Internet texts that originated in the written mode
   1. Interactive written internet texts
   2. Non-interactive written internet texts
      2.a. Narratives
      2.b. Informational descriptions or explanations
      2.c. Overt opinions
      2.d. Information presented with the intent to persuade
      2.e. How-to procedures or instructions
      2.f. Lyrical discourse

In our early pilot studies, we asked non-expert users of the internet to categorise web pages by directly identifying the register category of each page. However, this approach proved problematic, in some cases achieving agreement rates below 50 %. As a result, we developed a more bottom-up approach involving a decision tree with basic situational characteristics. At the top level, we asked users to make a 2-way decision about the mode of production:

1. Internet texts that originated in the spoken mode (e.g., transcripts of speeches or interviews)
2. Internet texts that originated in the written mode

Then, for the written texts, we asked users to distinguish between interactive discussions (e.g., discussion forums) and non-interactive internet texts. Even this simple distinction is often not clear-cut on the web, because authored web documents are often followed by reader comments. We thus made it clear to coders that ‘written interactive discussions’ are distinct from written documents followed by reader comments, and that coders would be able to note the existence of reader comments for non-interactive texts later in the process. While we do not currently have plans to classify documents with reader comments differently than those without comments, coding for their presence makes this a possibility for future analyses.

For the first two general categories above (spoken and interactive written), we immediately asked coders to identify a specific sub-register (see Table 2 below). In both cases, users could select ‘other’ if the page did not fit clearly into one of the existing categories.




For the third general category – written non-interactive internet texts – we asked users to distinguish among general registers based on communicative purpose:

– to narrate or report on EVENTS [past, present, or future]
– to describe or explain INFORMATION
– to express OPINION
– to describe or explain FACTS WITH INTENT TO PERSUADE
– to explain HOW-TO or INSTRUCTIONS
– to express oneself through LYRICS

Then, once a user had selected one of those general categories (2.a.–2.f. in the list above), we asked them to identify the specific sub-register. The full list of general register and specific sub-register distinctions in our framework is given in Table 2 below.

Table 2: Web registers and sub-registers distinguished in the study

1. Internet texts that originated in the SPOKEN mode
   – interview
   – formal speech
   – transcript of video/audio recording
   – TV/movie script
   – other (spoken)

2. INTERACTIVE internet texts that originated in the WRITTEN mode
   – question/answer forum
   – discussion forum
   – reader/viewer responses
   – other (discussion)

3.–8. Non-interactive internet texts that originated in the written mode

3. NARRATIVES or reports of events [past, present or future]
   – news report/blog
   – sports report/blog
   – personal/diary blog
   – historical article
   – short story
   – novel
   – biographical story/history
   – magazine article
   – memoir
   – obituary
   – travel blog
   – other (narrative)

4. INFORMATIONAL DESCRIPTION or EXPLANATION
   – description (place, product, organisation, program, job, etc.)
   – description of a person (including celebrity profiles)
   – frequently asked questions (FAQ) about information
   – encyclopedia article
   – abstract
   – research article
   – course materials
   – informational blog
   – legal terms and conditions
   – technical report
   – other (informational)

5. express OPINION
   – opinion blog
   – review (product, service, movie, etc.)
   – advice
   – religious blog/sermon
   – advertisement
   – self-help
   – letter to the editor
   – other (opinion)

6. describe or explain FACTS WITH INTENT TO PERSUADE
   – description with intention to sell
   – editorial
   – persuasive article or essay
   – other (informational persuasion)

7. explain HOW-TO or INSTRUCTIONS
   – instructions
   – frequently asked questions (FAQ) about how to do something
   – how-to
   – technical support
   – recipe
   – other (instructions)

8. express oneself through LYRICS
   – poem
   – prayer
   – song lyrics
   – other (lyrical)
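The top levels of the decision tree described in this section can be sketched as a small function. This is only an illustration of the logic (mode first, then interactivity, then communicative purpose); the function name, argument names and purpose keys are hypothetical and do not reproduce the actual coding tool.

```python
# Hypothetical short keys for the six communicative purposes listed above.
PURPOSE_TO_REGISTER = {
    "events": "Narrative",
    "information": "Informational Description/Explanation",
    "opinion": "Opinion",
    "persuade": "Informational Persuasion",
    "how-to": "How-to/Instructional",
    "lyrics": "Lyrical",
}


def general_register(mode: str, interactive: bool = False, purpose: str = "") -> str:
    """Map a rater's answers to one of the eight general registers."""
    # Step 1: mode of production.
    if mode == "spoken":
        return "Spoken"
    if mode != "written":
        raise ValueError(f"unknown mode: {mode!r}")
    # Step 2: interactivity (only asked for written texts).
    if interactive:
        return "Interactive Discussion"
    # Step 3: communicative purpose (only asked for non-interactive texts).
    if purpose not in PURPOSE_TO_REGISTER:
        raise ValueError(f"unknown purpose: {purpose!r}")
    return PURPOSE_TO_REGISTER[purpose]


print(general_register("written", interactive=False, purpose="opinion"))
```

In this sketch a rater never sees the purpose question for spoken or interactive texts, mirroring the order of decisions described above; selection of the specific sub-register would follow as a further step.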




4 Distribution of registers on the web

Applying the register classification scheme outlined in the last section, we asked 85 raters to code the register characteristics of 1,000 web pages, with each text being coded by four different raters. As noted above, ca. 7 % of the web pages in our initial sample were dropped from the register analysis (pages that were no longer available or consisted mostly of photos or graphics). Thus, the results reported below are based on a corpus of 931 web pages.

As Table 3 shows, at least three raters were able to agree on the general register category for 62.7 % of the web pages in our corpus. All four raters agreed on the classification of ca. 34 % of the texts, while three of the four raters agreed on the classification of an additional ca. 29 % of the texts. For 11 % of the texts, raters showed a 2-2 split in their classifications. It turned out, though, that many of the specific classifications in these splits occurred repeatedly in the corpus. As a result, we explored the possibility that these common 2-2 splits represent ‘hybrid registers’ on the web. We return to that possibility in Section 5 below.

Table 3: Agreement results for the general register classification of 931 web pages

       4 agree   3 agree   2-2 split   2-1-1 split   No agreement   Total
#      315       269       104         173           70             931
%      33.8 %    28.9 %    11.1 %      18.6 %        7.6 %          100 %
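The five agreement patterns distinguished in this section follow mechanically from the four codes assigned to a page. A minimal sketch, assuming exactly four codes per page (the function name `agreement_pattern` is hypothetical):

```python
from collections import Counter

# Sorted vote counts for four raters, mapped to the pattern labels used here.
PATTERNS = {
    (4,): "4 agree",
    (3, 1): "3 agree",
    (2, 2): "2-2 split",
    (2, 1, 1): "2-1-1 split",
    (1, 1, 1, 1): "No agreement",
}


def agreement_pattern(codes: list[str]) -> str:
    """Classify the agreement among exactly four register codes."""
    if len(codes) != 4:
        raise ValueError("expected exactly four codes per web page")
    # Count how often each category was chosen, largest group first.
    counts = tuple(sorted(Counter(codes).values(), reverse=True))
    return PATTERNS[counts]


print(agreement_pattern(["Narrative", "Narrative", "Opinion", "Opinion"]))
```

Because four votes can only be partitioned in these five ways, the lookup table is exhaustive for four raters.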

Table 4 shows that the levels of agreement were somewhat lower for the coding of specific sub-register categories: raters were able to agree on the sub-register for ca. 43 % of the web pages (with 3 or all 4 raters in agreement), while an additional ca. 8 % of these pages were coded with a 2-2 split.

Table 4: Agreement results for the specific sub-register classification of 931 web pages

       4 agree   3 agree   2-2 split   2-1-1 split   No agreement   Total
#      171       231       73          90            366            931
%      18.3 %    24.8 %    7.8 %       9.8 %         39.3 %         100 %


Taken together, the distributional results from the pilot study show that non-expert web users can, to a large extent, reliably classify web pages into general register categories, and that there is substantial agreement even for specific sub-register categories. The data obtained from this coding process allow us to begin to explore the content of the web, asking what registers are especially prevalent and which ones are relatively rare. Thus, Table 5 shows the breakdown of general register categories (presented in order of frequency) for all 931 texts in our corpus (see Table 3 above). Table 6 shows the breakdown of specific sub-registers within each of these general register categories.

Table 5: Frequency information for general register categories

General Register                          #      %
Narrative                                 177    19.0
Informational Description/Explanation     140    15.0
Opinion                                   121    13.0
Interactive Discussion                    79     8.5
How-to/Instructional                      27     2.9
Lyrical                                   19     2.0
Informational Persuasion                  15     1.6
Spoken                                    6      0.6
Hybrid (see Section 5)                    277    29.7
No agreement                              70     7.5
Total                                     931    100

Table 6: Frequency information for sub-register categories

Register                                  #      %

Narrative                                 177
  News report/blog                        99     55.9
  Sports report/blog                      19     10.7
  Personal/diary blog                     7      4.0
  Historical article                      4      2.3
  Short story                             3      1.7
  Novel                                   2      1.1
  Biographical story/history              1      0.6
  Joke                                    0      0
  Magazine article                        0      0
  Memoir                                  0      0
  Obituary                                0      0
  Other factual narrative                 0      0
  Other fictional narrative               0      0
  Other personal narrative                0      0
  Travel blog                             0      0
  No agreement on sub-register            42     23.7

Informational Description/Explanation     140
  Description of a thing                  34     24.3
  Description of a person                 9      6.4
  Research article                        7      5.0
  Abstract                                5      3.6
  Legal terms and conditions              4      2.9
  FAQ about information                   2      1.4
  Encyclopedia article                    2      1.4
  Informational blog                      2      1.4
  Course materials                        1      0.7
  Technical report                        1      0.7
  No agreement on sub-register            73     52.1

Opinion                                   121
  Opinion blog                            57     47.1
  Review                                  23     19.0
  Advice                                  9      7.4
  Religious blog/sermon                   5      4.1
  Self-help                               1      0.8
  Advertisement                           0      0
  Letter to the editor                    0      0
  No agreement on sub-register            26     21.5

Interactive Discussion                    79
  Question/answer forum                   46     58.2
  Other forum                             7      8.9
  Other discussion                        1      1.3
  Reader/viewer responses                 0      0
  No agreement on sub-register            25     31.6

How-to/Instructional                      27
  How-to                                  13     48.1
  Technical support                       2      7.4
  Recipe                                  1      3.7
  Instructions                            0      0
  FAQ                                     0      0
  No agreement on sub-register            11     40.7

Lyrical                                   19
  Song lyrics                             17     89.5
  Other                                   1      5.2
  Poem                                    0      0
  Prayer                                  0      0
  No agreement on sub-register            1      5.2

Informational Persuasion                  15
  Description with intent to sell         8      53.3
  Persuasive article or essay             2      13.3
  Editorial                               0      0
  Other                                   0      0
  No agreement on sub-register            5      33.3

Spoken                                    6
  Interview                               5      83.3
  Transcript of video/audio               1      16.7
  TV/movie script                         0      0
  No agreement on sub-register            0      0

Based on the data in our pilot corpus, the most common general internet register is Narrative (19 % of the texts in our corpus; see Table 5). Table 6 shows that ca. 67 % of the texts in this general register were classified as either News reports/blogs or Sports reports/blogs. Many of these texts are examples of registers found in print media that have simply been transferred to the web. At first we planned to distinguish news/sports blogs, which have their origin on the web, from news/sports reports that have their origin in print media. In practice, though, it proved nearly impossible to determine whether a news/sports report was originally published in a print newspaper or whether it had been written specifically for a web blog. As a result, we treat these reports and blogs as a single category (although it was generally easy for raters to distinguish between news reports/blogs and sports reports/blogs, based on the topic of the text).

The second most frequent general register is Informational Description/Explanation (15 % of the texts in our corpus; see Table 5). However, as Table 6 shows, raters often failed to agree on the specific sub-register for this general category (52 % of the total texts). In future research, we plan to investigate the possibility of hybrid registers at the sub-register level to better understand the nature of these texts.




Opinion web pages were nearly as common as description pages (see Table 5). Nearly half of these were classified as Opinion blogs (47 %), while another 19 % were classified as Reviews. In general, there was much higher agreement about these sub-register categories of Opinion than there was for the general category of Informational Description/Explanation.

The Interactive Discussion general register was also used relatively frequently, and the majority of these texts were classified as Question/Answer forums. Similar to blogs, these are specialised web registers not found in print media.

The other four general register categories – Lyrical, How-to/Instructional, Informational Persuasion and Spoken – occurred much less frequently than the major categories of Narrative, Informational Description/Explanation, Opinion and Interactive Discussion. However, it is clear that these registers each comprise one or two important sub-register categories. For example, the specific sub-registers of song lyrics and spoken interviews were especially prevalent.

While some of these general registers and sub-registers are very similar to traditional print registers (e.g., News reports, Sports reports, Reviews, Research articles, Song lyrics), many of them are unique to the domain of the internet. For example, the sub-registers of Personal/diary blogs and Opinion blogs, as well as the general register of Interactive Discussion, are distinctive to the internet. Furthermore, some of the web registers that appear to be traditional are actually quite different from their printed, non-internet counterparts. This is due to several factors, including the relative ease of ‘publishing’ on the internet and decreased attention to pre-planning and editing common in many internet registers. In future research, we plan to explore these innovative registers in considerably more detail (see Section 6 below).

5 Hybrid registers

At the beginning of Section 4, we noted that many web pages were coded with a 2-2 split. For example, two raters might have coded a given page as a ‘narrative’, while two other raters classified the same page as an ‘informational description/explanation’. One interpretation of these splits is that they simply show a lack of agreement among raters, reflecting a lack of reliability in the register framework. However, the actual distribution of these pairings suggests a different interpretation.

In theory, there are 28 different 2-2 categories that could be formed by combining the 8 general register categories in our framework. So, for example, there are 7 different 2-2 categories that could have been formed by combining ‘narrative’ with one of the other categories (narrative-spoken, narrative-interactive discussion, narrative-informational description, narrative-opinion, narrative-information presented with the intent to persuade, narrative-how-to, narrative-lyrical). Similarly, there are 21 other pairings of general registers that are theoretically possible. Given this fact, it is surprising that only four combinations of general registers commonly occurred in 2-2 splits (see Table 7): Narrative+Informational Description, Narrative+Opinion, Informational Description+Opinion and Informational Persuasion+Opinion. Other combinations occur in 2-1-1 splits (see Table 8).

This restricted set of commonly occurring register combinations suggests an alternative explanation for the lack of agreement among raters: rather than reflecting a problem with the coding rubric, these common 2-2 combinations (and 2-1-1 combinations) can be interpreted as evidence that these texts belong to ‘hybrid’ registers – registers that combine the communicative purposes and other situational characteristics of two or more general registers. Evidence for this interpretation comes from the fact that these combinations were identified by coders much more often than others. In particular, the frequent hybrid combinations are restricted to four general register categories: Narrative, Informational Description/Explanation, Opinion and Informational Persuasion. These four general register categories are distinguished primarily by their communicative purposes. For example, Table 7 shows that Narrative+Informational Description occurred 43 times, accounting for ca. 41 % of all 2-2 splits. Table 8 shows that Narrative+Description+Other also accounts for ca. 49 % of 2-1-1 splits (84 of the 173 pages, combining the 56 Narrative+Description+Opinion and 28 Narrative+Description+Informational Persuasion cases), further supporting the existence of a hybrid register that combines these purposes.
Table 7: General register 2+2 hybrid combinations

Hybrid Combination (2+2)                                   Count
Narrative + Informational Description/Explanation          43
Narrative + Opinion                                        27
Informational Description/Explanation + Opinion            17
Informational Persuasion + Opinion                         11
Informational Description + Informational Persuasion       6
Informational Description + How-to/Instructional           4
Interactive Discussion + Opinion                           4
Informational Description + Interactive Discussion         3
How-to/Instructional + Opinion                             3
TOTAL                                                      118




Table 8: General register 2+1+1 hybrid combinations

Hybrid Combination (2+1+1)                                 Count
Narrative + Description + Opinion                          56
Description + Informational Persuasion + Opinion           40
Narrative + Description + Informational Persuasion         28
Informational Persuasion + Narrative + Opinion             24
Description + How-to/Instructional + Opinion               15
Other combinations                                         10
TOTAL                                                      173
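The number of theoretically possible 2-2 pairings discussed in this section can be checked by enumerating all unordered pairs of the eight general registers. The sketch below is only a verification aid; the list name `GENERAL_REGISTERS` is an assumption of this example.

```python
from itertools import combinations

# The eight general registers of the framework (labels as used in Table 5).
GENERAL_REGISTERS = [
    "Spoken",
    "Interactive Discussion",
    "Narrative",
    "Informational Description/Explanation",
    "Opinion",
    "Informational Persuasion",
    "How-to/Instructional",
    "Lyrical",
]

# Every unordered pair of distinct registers is one possible 2-2 split.
pairs = list(combinations(GENERAL_REGISTERS, 2))
with_narrative = [p for p in pairs if "Narrative" in p]

print(len(pairs))                        # all possible pairings
print(len(with_narrative))               # pairings involving Narrative
print(len(pairs) - len(with_narrative))  # remaining pairings
```

The three counts correspond to the 28 possible pairings, the 7 involving Narrative and the 21 others mentioned in the text.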

Text Sample 1 illustrates a web page from the Daily Mail with combined Narrative+Informational Description communicative purposes. Two raters coded the sub-register of this text as a news report/blog and two other raters coded it as a description of people. This text occurs online as a single web page (which is still available on the web, despite its dated content). However, the text comprises a series of topics, demarcated only by the use of ALL-CAPS. (The formatting of the 8th paragraph is corrupted in the original version of the page online, since THURSDAY nights and THE fashionable residents seem to begin new topics.) The title of the page (It’s King Tony to see you, ma’am) seemingly relates only to the first of these embedded topics. Such pages are common on the web (and perhaps becoming more common in print media). They have no single topic or communicative purpose, except maybe to present a bunch of information that the author happens to find interesting or amusing. The information in the page is sometimes descriptive and sometimes narrative, resulting in the hybrid nature of such texts. Text Sample 1:

It’s King Tony to see you, ma’am

Tony and Cherie Blair arrived at Balmoral last night for their annual get-together with the Queen and the Duke of Edinburgh.

The Blairs have spent the summer touring the West Indies, Italy and Greece, hobnobbing with celebrities and world leaders, barely spending a penny of their own money. A Royal tour in all but name.

The Windsors spent most of the summer pottering unnoticed around Britain.

One can’t help wondering why Her Majesty doesn’t just hand over the key to the castle.

BORIS JOHNSON is in big trouble with Commons speaker and former sheet-metal worker Michael ‘Gorbals Mick’ Martin. The Tory MP’s new novel features a Commons Speaker who is a “buttockclenching, fat, tactless, Left-wing Scot who eats the traditional sheet-metal worker’s breakfast of black pudding”. Order! Order!


DON’T be taken in by claims that Tory chairman Liam Fox patched up the row over the warning by Karl Rove --George Bush’s aide – that Michael Howard will never be allowed to meet the President. Rove was “too busy” even to speak to Fox at the Republican convention, let alone sit next to him during Bush’s speech, as was claimed.

CHERIE BLAIR’S new job as ambassador for Britain’s 2012 Olympic bid has surprised friends who cannot recall her interest in sport. She is being ‘coached’ by her new spin doctor Jo Gibbons, a former Football Association aide.

Gibbons is best friends with Jo Moore, the Labour aide who “coached” the former Transport THURSDAY nights at London disco, Base 1, situated in a basement beneath the Tory Party’s new HQ in Victoria Street, Westminster, are booming. The club has been “adopted” by smart preppy males who work for the Conservatives and pop downstairs for a sweaty session of high-energy dancing once a week. THE fashionable residents of Suffolk resort Walberswick – including film-maker Richard Curtis and his partner Emma Freud, daughter of ex-MP Clement – may be alarmed to learn the least fashionable member of the Cabinet has moved in. Defence Secretary Geoff Hoon, the kind of man who wears knee-length socks with open-toed sandals on his hols, is a new neighbour. Somehow he mingled with them unnoticed at last week’s summer fete.

THE death of spin has been greatly exaggerated. Labour HQ has sent out invitations to MPs summoning them to a series of three all-day training sessions on how to ‘spin’ stories to the media.

It is perhaps not surprising that such texts also often include opinionated purposes. (Even Text Sample 1 could be interpreted in that way, although there are few overt lexico-grammatical expressions of stance.) In particular, personal blogs commonly combine narrative and opinionated purposes. For example, Text Sample 2 was coded by two raters as a narrative-personal blog, and by two raters as an opinion blog. A quick read through this text shows both purposes: it begins with a narrative, but it also includes considerable discussion that could be regarded as overt opinion (e.g., my gut is; Here’s one good reason to do that; But I’m already on-side with that argument. It’s time to convince people…; ‘Making the internet happen’ shouldn’t be magic). Text Sample 2:

Time to get out more

So, I’ve been thinking about something else that Laptops and Looms threw up for me.

At one point someone -- I think it was Alice Taylor -- remarked that we’re really good at talking about post-digital stuff to one another, but that it’s time to talk to other people. And while many people at the event seemed to think about that in the context of reaching out to manufacturers and discussing new ways of grokking production, my gut is that we should talk more to people totally uninvolved with the whole thing.

Here’s one good reason to do that. It was fascinating, hearing what a bunch of people might do if given the opportunity to turn old mills and factories built a hundred and fifty years ago into things that operate in the space between digital interfaces and traditional manufacture. But I’m already on-side with that argument. It’s time to convince people who’ll have to live with those products and live alongside the places that produce them.

Here’s another. Russell jokingly mentioned the ‘Google apprenticeship’ as a means of answering some of the questions floating around the room to do with aspiration, but my gut feeling is that you get people engaged with working in companies like Google when you demystify the whole process. ‘Making the internet happen’ shouldn’t be magic that someone else does anymore, it should be something we show off. Find me at Email me

Finally, informational/descriptive texts often incorporate evaluative language, but they are not uniformly regarded as ‘opinionated’. Text Sample 3 presents an extreme case: a business report on a corporation that begins with an explicit disclaimer that the blog represents ‘personal opinions’. However, this text is mostly presented as a simple report of information. It overtly identifies ‘strengths’ and ‘weaknesses’, but the information provided appears to be mostly factual description. Reflecting these combined purposes, two raters coded this text as an opinion blog, one rater coded it as descriptive information and one coded it as a news report/blog. Text Sample 3:

Get a LEG Up on the Market

AnnaLisa is a member of The Motley Fool Blog Network -- entries represent the personal opinions of our bloggers and are not formally edited.

Leggett & Platt (NYSE: LEG ) , the diversified bedspring, automotive, and industrial manufacturer, just announced it would pay its dividend early so that shareholders wouldn’t see a big tax on the dividend usually paid out in January. The early Christmas present goes ex-dividend on Dec. 10, with the dividend to be paid out on Dec. 27. Leggett & Platt seems to be one of the first companies to react to an anticipated tax increase on dividends come 2013. This Standard & Poor’s dividend aristocrat is certainly shareholder attentive, but let’s drill down on this company’s strengths, weaknesses, opportunities, and threats.

STRENGTHS

The company is extremely shareholder friendly, with dividends paid since 1987, and has more than 25 consecutive years of increasing the dividend.

An EPS growth rate of 15 %, and a P/E that currently stands at 21.59.

The company is diversified across many industries besides their original status as a bedspring company. It also manufactures retail store fixtures and display units, industrial parts (especially for automotive and aviation), and parts for office and residential furniture.

Their latest 10-K states the company plans to maintain a 4-5 % growth rate.

The company repurchased 10 million shares in 2011.

Their latest Q3 earnings release on Oct. 29 beat with EPS rising 45 % over the same quarter a year ago and reflected strong volume and expanding margins.

The yield now stands at 4.20 %


WEAKNESSES

The payout ratio on the yield is 90 % , very high for a company that is not a REIT or a master limited partnership.

Their P/E is higher than the industry average and higher than the 15.63 P/E of competitor Genuine Parts Company (NYSE: GPC )

While they manufacture most of their steel wire in house, steel is their number one raw material and fluctuations in steel prices are a continuing concern, according to their 10-K.

Revenue from international operations dropped due to currency fluctuations. […]

Three-way splits, summarised in Table 8 above, suggest that there might be hybrid registers that combine multiple communicative purposes. The most frequent 3-way hybrid is Narrative+Opinion+Description. Text Sample 3 above gives one example of this type. Another example of a 3-way hybrid was coded as a News report/blog (2 raters), a Description of a person (1 rater) and an Opinion blog (1 rater). The title of this text is enough by itself to demonstrate the triad of characteristics recognised by raters: ‘On the road: Bradley Wiggins and Team Sky have made Tour de France history – it’s been emotional’. This text is a blog post that recounts a recent news story (Narrative), describes a team of athletes (Description), and recounts the emotions and attitudes of the author (Opinion).

A different kind of hybrid register is extremely common on the web: pages that present a text followed by reader comments. Table 9 shows that this type of hybrid can occur with any of the non-interactive written registers.1 However, it is interesting to note that reader comments are much more likely with some registers than others. In particular, pages expressing opinions or persuasion are especially likely to include reader comments: ca. 60 % of opinion pages and 80 % of informational persuasion pages are followed by reader comments.

1 This option is not applicable to written interactive discussions, which incorporate reader comments by definition. We are not sure why transcribed texts of spoken events are not followed by reader comments in our sub-corpus.




Table 9: Frequency information for texts containing reader comments

Register                      Count    % of register with comments
Narrative                     87       49.1 %
Opinion                       86       61.4 %
Description                   37       30.6 %
Informational Persuasion      12       80.0 %
How-to/Instructional          8        29.6 %
Lyrical                       4        21.1 %
Spoken                        0        0
Discussion                    0        0
Total                         234      --

6 Summary and future directions

The approach for register classification adopted here – a bottom-up hierarchical framework based on underlying situational characteristics – allows us to describe the register characteristics of most web pages. Raters agree on the general register category of ca. 63 % of the web pages included in our corpus (see Table 3 above). Approximately another 25 % of these texts were coded as ‘hybrid’ registers belonging to a few combinations that occur commonly on the web (e.g., Narrative + Informational Description; Narrative + Opinion; see Tables 7 and 8). Taken together, these results indicate that approximately 88 % of web pages can be reliably described for their singular or hybrid register characteristics.

An alternative perspective is to consider the register categories themselves, regarding the extent to which general registers occur in their ‘simple’ state, rather than as hybrids in combination with some other register category. At one extreme, Table 10 shows that interactive discussions (e.g., question-answer forums) and lyrical texts (e.g., songs or poems) usually occur as ‘simple’ registers, with only ca. 30 % of those texts being coded as hybrids in combination with some other register category, a relatively small proportion in comparison with several of the other register categories. At the opposite extreme, Informational Persuasion was almost never identified as the simple register of a web text. However, it was commonly selected by at least one of the raters, suggesting that this communicative priority frequently occurs in hybrid combinations with other general register categories.


Table 10: Extent to which each register category was identified as a simple register (3 or 4 raters in agreement), as a hybrid category (2-2 or 2-1-1 splits), or by only 1 rater

General Register                          3-4 raters     2 raters       1 rater        Total (100 %)
Narrative                                 177 (47 %)     109 (29 %)     91 (24 %)      377
Informational Description/Explanation     140 (30 %)     97 (21 %)      231 (49 %)     468
Opinion                                   121 (50 %)     114 (47 %)     8 (3 %)        243
Interactive Discussion                    79 (69 %)      14 (12 %)      22 (19 %)      115
How-to/Instructional                      27 (33 %)      23 (28 %)      33 (40 %)      83
Lyrical                                   19 (68 %)      3 (11 %)       6 (21 %)       28
Informational Persuasion                  15 (8 %)       38 (21 %)      125 (70 %)     178
Spoken                                    6 (43 %)       8 (57 %)       0 (0 %)        14
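The row percentages in Table 10 express each count as a share of all texts for which the register was selected by at least one rater. A minimal sketch of that calculation, using the counts from the table (the names `TABLE_10` and `row_shares` are hypothetical):

```python
# Counts per general register: (3-4 raters, 2 raters, 1 rater), from Table 10.
TABLE_10 = {
    "Narrative": (177, 109, 91),
    "Informational Description/Explanation": (140, 97, 231),
    "Opinion": (121, 114, 8),
    "Interactive Discussion": (79, 14, 22),
    "How-to/Instructional": (27, 23, 33),
    "Lyrical": (19, 3, 6),
    "Informational Persuasion": (15, 38, 125),
    "Spoken": (6, 8, 0),
}


def row_shares(counts: tuple[int, int, int]) -> tuple[int, ...]:
    """Return each count as a whole-number percentage of the row total."""
    total = sum(counts)
    return tuple(round(100 * c / total) for c in counts)


for register, counts in TABLE_10.items():
    print(register, sum(counts), row_shares(counts))
```

Because each share is rounded independently, a row may sum to 99 or 101 rather than exactly 100, as happens for How-to/Instructional.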

Narration, description, exposition and argumentation have long been regarded as core textual distinctions distinguished by their communicative purposes (corresponding to the rhetorical ‘modes’ of discourse; see Connors 1981). In the register framework developed here, we divided these distinctions up in a somewhat different way, based on our survey of the kinds of texts found on the web and our early pilot studies to investigate the distinctions that end-users could reliably make (see Sections 2 and 3 above). Thus, we ended up combining ‘exposition’ and ‘description’ into our category of Informational Description/Explanation, while we split ‘persuasion’ into two categories: Opinion (expressing attitudes with little supporting evidence) and Informational Persuasion (a type of exposition with a clear intent to sell or persuade).

However, our preliminary results, summarised in Table 10, indicate that these general register categories are not equally well-defined for end-users. For example, about half of the texts in our corpus (468 of the 931 texts) were coded as Informational Description/Explanation by at least one rater, suggesting that most texts can be regarded as presenting some kind of description/explanation of information. Texts were also commonly coded as having narrative purposes (377 texts), often in hybrid combinations with other registers.

The results for opinionated/persuasive texts are especially interesting here. On the one hand, the category of simple opinion seems to be relatively well defined: half of the texts classified as such in some way were categorised as simple opinion by 3 or 4 raters. In most other cases, if a text was coded as opinion by two raters, it was coded as narration or description by the other raters. By contrast, the category of Informational Persuasion seems especially problematic: it was almost never identified as the simple register of a text, but there were many instances where one rater noted this communicative priority. Over half of those texts were coded as simple opinion by other raters, suggesting that these two general registers are especially difficult to distinguish. Results like this point to the need for more detailed future research focused on these categories.

In our on-going research, we are applying the framework and analytical approach outlined here to a much larger corpus, with over 50,000 texts randomly sampled from the web. That research effort will allow us to investigate the extent to which the patterns described in Sections 4 and 5 above are typical of the web more generally and to undertake more detailed analysis of specific patterns (especially regarding sub-registers and sub-register hybrids). Beyond that, we plan to analyse the lexico-grammatical characteristics of those texts and eventually undertake predictive research for the purposes of automatic register (genre) identification.

One of the major limitations of the hierarchical approach used for these analyses is that specific sub-registers are restricted to a single general register category on an a priori basis. For example, sports blogs are listed only as a sub-register of Narrative; reviews are listed only as a specific sub-register of Opinion; editorials are listed only as a specific sub-register of Informational Persuasion. This approach was motivated by two considerations: 1) previous research had indicated that end-users become overwhelmed when they are required to directly choose from a massive list of specific sub-registers and 2) we therefore believed that general register categories – isolating specific situational characteristics – would be easier to identify than specific sub-registers.
However, review of our findings here suggests the need to further explore these decisions. As a result, we also plan to explore the possibility that some sub-register distinctions might be easier to directly identify than general register distinctions. For example, a particular text might be a clear instance of a sports blog. However, given the design of our coding framework at present, an end-user might never be given the chance to make that simple classification. For example, if a user decided that a text was primarily opinionated rather than narrative, there would be no possibility of subsequently identifying the text as a ‘sports blog’ (see Table 2 above).

To explore this possibility, we plan to recode a set of web pages from our corpus, asking users to directly choose a specific sub-register category. Then, the results of the hierarchical coding will be compared to the results of the direct sub-register coding for those texts. Our expectation is that the two approaches will uncover complementary patterns. For example, we expect to find some texts that clearly belong to a single specific sub-register but combine multiple general registers (e.g., a sports blog with both narrative and opinionated purposes). We also expect to find some common hybrid sub-register categories that bridge general registers (e.g., a personal blog + opinion blog hybrid; or an editorial + review hybrid). We would not argue that one or the other of these approaches is correct, but taken together, our hope is that we will be able to offer a more comprehensive description of the incredible range of register variation found on the web.

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. 1147581. We also thank Anna Gates and Rahel Oppliger for their help with the pilot testing of register classification schemes.

References

Baroni, Marco & Silvia Bernardini. 2004. BootCaT: Bootstrapping corpora and terms from the web. Proceedings of LREC 2004, 1313–1316. Lisbon: ELDA.
Baroni, Marco, Silvia Bernardini, Adriano Ferraresi & Eros Zanchetta. 2009. The WaCky wide web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation 43(3). 209–226.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University Press.
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999. Longman grammar of spoken and written English. London: Longman.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge University Press.
Connors, Robert J. 1981. The rise and fall of the modes of discourse. College Composition and Communication 32(4). 444–455.
Crowston, Kevin, Barbara Kwaśnik & Joseph Rubleske. 2010. Problems in the use-centered development of a taxonomy of web genres. In Alexander Mehler, Serge Sharoff & Marina Santini (eds.), Genres on the Web: Computational models and empirical studies, 69–84. New York: Springer.
Egbert, Jesse & Douglas Biber. 2013. Developing a user-based method of web register classification. In Stefan Evert, Egon Stemle & Paul Rayson (eds.), Proceedings of the 8th Web as Corpus Workshop (WAC-8) @Corpus Linguistics 2013, 16–23.
Fletcher, William H. 2012. Corpus analysis of the World Wide Web. In Carol A. Chapelle (ed.), Encyclopedia of applied linguistics, 1339–1347. Hoboken, NJ: Wiley-Blackwell.
Kilgarriff, Adam & Gregory Grefenstette. 2003. Introduction to the special issue on the Web as Corpus. Computational Linguistics 29. 333–347.
Lindemann, Christoph & Lars Littig. 2010. Classification of Web sites at super-genre level. In Alexander Mehler, Serge Sharoff & Marina Santini (eds.), Genres on the Web: Computational models and empirical studies, 211–235. New York: Springer.
Rehm, Georg, Marina Santini, Alexander Mehler, Pavel Braslavski, Rüdiger Gleim, Andrea Stubbe, Svetlana Symonenko, Mirko Tavosanis & Vedrana Vidulin. 2008. Towards a reference corpus of Web genres for the evaluation of genre identification systems. In Proceedings of the 6th Language Resources and Evaluation Conference, 351–358. Marrakech, Morocco.
Rosso, Mark A. & Stephanie W. Haas. 2010. Identification of Web genres by user warrant. In Alexander Mehler, Serge Sharoff & Marina Santini (eds.), Genres on the Web: Computational models and empirical studies, 47–68. New York: Springer.
Santini, Marina. 2007. Characterizing genres of Web pages: Genre hybridism and individualization. In Proceedings of the 40th Hawaii International Conference on System Sciences (HICSS-40). Hawaii.
Santini, Marina. 2008. Zero, single, or multi? Genre of Web pages through the users’ perspective. Information Processing and Management 44(2). 702–737.
Santini, Marina & Serge Sharoff. 2009. Web genre benchmark under construction. Journal for Language Technology and Computational Linguistics 25(1). 125–141.
Santini, Marina. 2010. Cross-testing a genre classification model for the Web. In Alexander Mehler, Serge Sharoff & Marina Santini (eds.), Genres on the Web: Computational models and empirical studies, 87–127. New York: Springer.
Sharoff, Serge. 2005. Creating general-purpose corpora using automated search engine queries. In Marco Baroni & Silvia Bernardini (eds.), WaCky! Working papers on the Web as Corpus, 63–98. Bologna: Gedit.
Sharoff, Serge. 2006. Open-source corpora: Using the net to fish for linguistic data. International Journal of Corpus Linguistics 11(4). 435–462.
Sharoff, Serge, Zhili Wu & Katja Markert. 2010. The Web library of Babel: Evaluating genre collections. In Proceedings of the Seventh Language Resources and Evaluation Conference, LREC 2010. Malta.
Vidulin, Vedrana, Mitja Luštrek & Matjaž Gams. 2009. Multi-label approaches to Web genre identification. Journal for Language Technology and Computational Linguistics 24(1). 97–114.

Heidrun Dorgeloh

The interrelationship of register and genre in medical discourse

Abstract: This chapter is concerned with medical discourse which is produced beyond the established roles of doctors and patients. The text varieties investigated are all somewhat hybrid, either in form, discourse function, or both. A study based on a small corpus of these texts investigates the presence of features from a narrative discourse mode and finds variable relationships of textual form and textual function, which are then discussed from a genre as well as from a register perspective. While it turns out that the presence of a narrative register cuts across specific discourse activities, the genre perspective can explain the nature of this textual variation. It accounts for the pervasiveness of linguistic features but, more importantly, for the variant discourse functions which apply to the verbalisation of medical experience. In such cases, it is argued, a genre analysis logically subsumes and pre-determines a register analysis.

1 Introduction

Medicine uses a variety of texts since it is both an “area of knowledge […] and the applied practice of that knowledge to medical praxis” (Gotti and Salager-Meyer 2006: 9). Accordingly, most linguistic research on medical discourse focuses either on written genres of the medical profession, such as case reports or medical research articles, or on the speech of medical practitioners and their patients, i.e. on medical encounters or interviews. By contrast, the present study is concerned with text varieties in medicine which are produced beyond the established roles of both speaker groups. It deals with illness blogs, on the one hand, and medical case presentations, including some innovative forms, on the other. These constitute, in line with the purpose of the present volume (cf. Schubert, this volume), less established and more hybrid forms of medical case writing and thus provide good cases in point for illustrating new directions in register research. In particular, I will argue for a close interrelationship between register and genre as well as for a primacy of the notion of genre, rather than (sub-)register.

Heidrun Dorgeloh, Düsseldorf University


As laid down in the introduction to this volume, register and genre are different perspectives for analysing text variety: the register perspective considers functional correlations of linguistic co-occurrence patterns with variables from the situation of use while the genre perspective refers to properties of entire texts and has a conventional basis (Biber and Conrad 2009: 15; also Schubert, this volume). It results from this distinction that a register analysis rests upon quantitative co-occurrence patterns in a given situation whereas genre characteristics can actually be quite rare. They contribute to the rhetorical organisation of a text, often occurring only once or in a particular position (Biber and Conrad 2009: 16). Since textual variation can in principle refer to any level of text classification (Biber 2006: 12), other approaches to register and genre point out that the concepts also differ in the level of generality at which they determine situational varieties (Giltrow 2010; Dorgeloh and Wanner 2010). The concept of a genre focuses primarily on the discourse goals and purposes (e.g. Martin and Rose 2003; Swales 2004), on the kind of “social action” (Miller 1984); therefore the classification is typically more specific for genres than for registers (Giltrow 2010: 30). More specialised text varieties are also referred to as “sub-registers” (cf. Biber and Gray 2013), but genre studies have emphasised that the textual or social event is an important basis for text classification, thus subsuming in one category a co-patterning of setting, structure, and function (Richards and Schmidt 2002: 224). I will argue here that for text varieties of medical discourse, which are often marked by “discourse hybridity” (Sarangi and Roberts 1999; Sarangi 2001; also cf. Biber and Egbert, this volume), a genre perspective in line with these approaches covers the relevant linguistic patterns at a sufficient level of specificity. In particular, I will show that the form-function correlations that one finds have more to do with activity types, such as covered by the concept of genre, than with general situational parameters.

The case studies presented below contrast with more recently developing medical genres. The aim of the analysis is to show that, on the one hand, there are general discourse goals and purposes within medical discourse, notably narration, which cut across all the texts investigated. The resulting language variation is covered by the register perspective, since it defines a rather general, presumably universal, register pattern (Biber and Conrad 2009: 259). On the other hand, this pattern serves in a given genre more specific discourse goals, which are expressed by features which need be neither frequent nor pervasive. For example, the interactional hybridity of a medical encounter includes a narrative discourse type, but this type is embedded within a more complex social event, in which a doctor fulfils several tasks such as data gathering, relationship building, and educating the patient about diagnoses and treatment (Frankel 2000: 85; also Maseide 2003). This variation within one activity produces more hybrid registers. In such cases, the genre perspective has clear advantages over the register perspective, since it focuses on the social activities going on and hence provides text classification at a rather low level of generality. However, this means that the concept of genre must be taken beyond the limits of rhetorical conventions.

The chapter is structured as follows: in Section 2, I offer a more detailed consideration of the concepts of register and genre as categories for text classification from a theoretical point of view. Section 3 introduces three varieties of medical discourse: on the one hand, it describes how they are situated with regard to a general narrative dimension of textual variation (level of form); on the other, the texts are discussed as instantiating different genres (level of discourse function and social activity). The resulting profiles of the three functional varieties show that the sample texts investigated are all hybrid in either form, function or both. This complex picture is typical for the domain of medicine, and it can be best understood from the genre perspective. Based on these profiles, an analysis of characteristic form-function-relations within the medical register, in particular with regard to narrative features, is provided in Section 4, followed by a concluding discussion in Section 5.

2 Some theoretical issues on register and genre

This section will discuss the concepts and positions relevant for the analysis of the medical text varieties in Sections 3 and 4.

2.1 Register and genre in the context of the study of language variation1

Language variation is conditioned by a variety of social and pragmatic factors. When studied by way of quantitative, corpus-based methodology, there are in principle two research goals that can be pursued: the first is “to describe the variants and use of a word or linguistic structure” and the second “to describe differences among texts and text varieties, such as registers […]” (Biber 2012: 12). While the former approach is variationist in nature, i.e. it presupposes the existence of “formal alternatives which can be considered optional variants, in the sense that they are nearly equivalent in meaning” (Biber et al. 1999: 14), register variation in

1 Cf. also the introduction to Dorgeloh and Wanner (2010).


principle also involves “different ways of saying different things” (Halliday 1978: 35; emphasis added). As a result, the study of textual variation deals with “variation in verbalization [which] is not occasional [… but] UBIQUITOUS” (Croft 2010: 10; emphasis in the original). This difference allows for some insights regarding the nature of both registers and genres. Rosenbach (2002: 77) proposes the attribute “choice-based” for this type of linguistic variation, in contrast to the “variation-based” perspective, which concentrates on sets of formal variants. The study presented here, and in fact the entire volume, belongs to the choice-based, “text-linguistic” tradition (Biber 2012: 12), which means that the texts themselves are the target of the description and not a predictor for the occurrence of formal variants.2 It results from this approach that register and genre differences are typically “not categorical (such that one variety has a certain grammatical element or syntactic construction which another has not)” (Kortmann 2006: 603); instead, the choices motivated and reflected by them are “meaningful choices”, in the sense of serving “the […] needs of the language user” (Schulze 1998: 7). As shown below, this applies not only to the occurrence of individual linguistic features, but also to entire patterns of textual form, which can be shared by what are nonetheless distinct text varieties. Another consequence of the “polyvalent” nature of “grammatical structure in discourse” (Sankoff 1988: 141, emphasis in the original) is that genres, but not registers, are in principle formally “underdetermined” (Giltrow and Stein 2009: 3). Only by virtue of their being “typified responses to situations” (Salmon 2010: 219) do users of a genre generally know what to expect and infer “both the stable and variable aspects of form” (Salmon 2010: 223). 
For the linguistic variation taking place within them this means that the genre perspective includes both frequently occurring features as well as patterns that occur less pervasively; i.e. the genre perspective logically subsumes, rather than opposes, the register perspective.

2 A detailed account of the distinction can be found in Biber (2012).

2.2 Genre in relation to register and discourse type

Textual variation is “normal in individuals’ linguistic performance” (Honeybone 2011: 167): speakers show “shifts in usage levels” for features associated with the situation, i.e. they switch into specific registers, but they also switch “into and out of genres” (Schilling-Estes 2002: 375). While a register is “associated with a particular situation of use” (Biber and Conrad 2009: 6), the concept of genre focuses primarily on the discourse goals and purposes, including “culturally recognized” patterns (Coupland 2007: 15) for realising them. As a result, the level of genre classification tends to be lower, i.e. more specific, suggesting that genres can, and typically do, contrast in registers, for example when requiring a certain level of formality or technicality. Use of a certain register is therefore a function of, but not a sufficient condition for, a genre, i.e. the genre perspective is the more encompassing one.

In the text-linguistic tradition, discourse goals and purposes have also led to the establishment of text typologies, which often integrate basic rhetorical types (e.g. Kinneavy 1971; Werlich 1976). The text or discourse type here refers to entire texts; but this tradition is still rather separate from genre analysis, if only due to the fact that they “feature in different studies” (Virtanen 2010: 55). By contrast, corpus-linguistic work (e.g. Biber 1988, 1989) understands text types as “co-occurrence variables” (Eckert and Rickford 2001: 5), i.e. these text types are, much like registers, the outcome of a classification based on linguistic form (Biber 1988: 170). It is a central insight from this corpus-based tradition that genre distinctions do not “adequately represent the underlying text types” (Biber 1989: 6). This finding is further support for the position that genres are to a certain extent underdetermined by, and hence independent of, their form.

The category of discourse type, in contrast to text type, refers more directly to the function of a discourse (Virtanen 2010: 57), but, in contrast to the discourse goal pertaining to a genre, this has traditionally meant a discourse classification based on a limited set of functions; for instance, on a classification of illocutions (e.g. Brinker 2005).
It is an important insight from this kind of work that the functional discourse types are related in different ways to their linguistic form, since a discourse type can express its function more or less directly (Virtanen 1992a, 2010). Narrative structures, in particular, have been noted to have primary or secondary uses, i.e. they are a textual pattern that “can be put to use in very different genres” (Virtanen 2010: 76).3

The analysis of medical texts presented here rests upon such a principled separation of linguistic form, i.e. register features and text structure, and discourse function. A classification by discourse function leads, at a more general level, to the identification of the discourse type; at a more specific level, it results in genres. The analysis is also based on the assumption that the category of “narrative” refers both to a very basic and presumably universal register and text type (Virtanen 1992a; Biber and Conrad 2009) as well as to a widely used discourse type or meta-genre (Fludernik 1996; Smith 2003). In the domain of medicine, both narrative form and function play a prominent role, since knowledge in this discipline is not just expertise, i.e. “relevant biological and pathological information”, but is primarily evidence based on human experience (Hunter 1991: 8). It is interesting to note in this context that recent discussions on medical discourse have argued quite explicitly in favour of a more “narrative” kind of medicine (e.g. Charon 2006), emphasising the importance of the individual patient and his or her experience. As a result, there are now genres within the medical register which are innovative particularly with respect to the role of narration. While proper storytelling is absent in professional medical reporting, there are now other types of medical discourse which are more open to narration. This difference, however, does not primarily manifest itself in a more or less extensive use of narrative features.

Looking at three different genres from the medical register in this study, I therefore hypothesise here that 1) a narrative discourse function correlates only insufficiently with a narrative form, and that 2) a discourse purpose other than narration does not necessarily result from the absence of narrative form. This in turn suggests that the function or goal of a discourse is not primarily something to be observed in the form of frequencies of occurrence. On a more theoretical level, these findings will lead me to the claim that, with respect to the specific discourse goals and purposes typical of the context of medicine, the target of the description should be the genre, rather than the register.

3 Werner (this volume), for example, notes the narrative properties of online text commentaries.

3 Types of medical discourse

3.1 Sources and voices in medicine

The instances of medical discourse which I will cover in my analysis come from three different sources: illness blogs written by patients, case reports written by doctors, and texts from a special section termed “Clinical Crossroads” of The Journal of the American Medical Association (JAMA). Each of these text varieties is characterised more closely in Sections 3.2 to 3.4. Before discussing these genre profiles, I will first comment on the general nature of the relation between their situational characteristics, in particular the discourse function, and their linguistic form.

The three text varieties represent discourse with different perspectives on the topic of disease or illness; i.e. the medical topic is the only situational variable which they share. The texts differ, not only in the different speaker roles of doctor and patient, but, more specifically, in that these groups of authors assume, by different ways of speaking throughout their own discourse, different “voices” (Mishler 1984: 103). In the professional medical discourse “of disease” (Fleischman 2001: 475), such as in case reports, doctors primarily use the voice of medicine; however, they also have a doctor’s voice when they occur in the discourse as a participant, for example, when concerned with “information about the patient’s current health condition, […] patient compliance, and […] test results” (Murawska 2012: 71). Patients, by contrast, have primarily a voice of health-related storytelling, but over time they also develop a medical competence of their own (Cordella 2004: 119). At some point, diagnosis and further treatment become a collaborative effort, which is when patients also use elements of a voice of medicine. The interactional hybridity of medical discourse referred to above is thus primarily a hybridity of voices and it is one of the central variables that guide linguistic variation across all medical text varieties.

By contrast, illness blogs, professional case reports and the discourse jointly produced by doctors and patients for “Clinical Crossroads” (for details, cf. Section 3.4) differ in a variety of other situational variables, especially those pertaining to production circumstances and setting (cf. Biber and Conrad 2009: 40). The text varieties under investigation are therefore not easily subsumed as one single register. However, instead of taking up a principled position about where a register ends, and a new (sub-)register starts, the analysis below rests upon two observations. On the one hand, the verbalisation of a disease or illness leads to a concern with medical case histories, which cuts across general communicative purposes, such as to narrate or to report (cf. Biber and Conrad 2009: 40). Linguistically, this is marked by a pervasive presence of linguistic features such as “past tense, communication verbs, third person pronouns, and time adverbials”, i.e.
the characteristic features of a narrative dimension of linguistic variation (Biber and Conrad 2009: 259). It is with regard to these features, which arise out of the topic of illness, that the texts share the same register. On the other hand, although there are recognisably different discourse goals involved in the verbalisation of a case history, the difference between “private” and “public” medicine has always been gradual, as the evolution of medical research writing has also shown (Atkinson 1992: 361–363). While professional medicine has long drifted away from the “rhetoric of immediate experience” (Atkinson 1992: 359), and while published case reports are professional and public, only illness blogs constitute real narratives of personal experience. However, nowadays, with the movement towards a narrative medicine, there are also professional texts which aim at being more “patient-focused” again (Winker 2006: 2888).

Genre categories grasp this mixing of purposes and voices present in such developments, not only due to the level of specificity they refer to, but also because genres are often formally underdetermined and may therefore be composed of hybrid form. This is illustrated in Figure 1, which shows the three text varieties as three different genres, with distinctly different discourse goals and purposes, as the discussion has just shown. On the level of the general communicative purpose, i.e. at a high level of generality, these discourse functions can be described as being narrative, non-narrative, or hybrid. This categorisation links up the genre classification to register variation, because the narrative as discourse mode (Georgakopoulou and Goutsos 2004: 43–47) is an important aspect of the register in all three cases. As the analysis below will illustrate in detail, the narrativisation of the events (Georgakopoulou and Goutsos 2004: 43) which have to do with the course of an illness is a major source of hybrid form across the three text varieties and therefore explains some pervasive register features. Before turning to the linguistic features and their interrelationship with the genre category in Section 4, the next three subsections will introduce each text variety and the sample texts used in more detail.

Figure 1: Narrative form and function in medical text varieties
patients’ tale: hybrid form and narrative function
medical case report: hybrid form and non-narrative function
Clinical Crossroads in JAMA: hybrid form and hybrid function
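The narrative register features named in Section 3.1 (past tense, communication verbs, third person pronouns and time adverbials; Biber and Conrad 2009: 259) lend themselves to simple frequency counts. The following is a minimal Python sketch of such a count and is not the procedure used in this study: the word lists, the crude "-ed" heuristic for past tense and the function names `narrative_counts` and `per_thousand` are all hypothetical simplifications, and a real analysis would rely on a part-of-speech tagger.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
IRREGULAR_PAST = {"was", "were", "had", "went", "took", "got", "did",
                  "felt", "came", "gave", "said", "told"}
COMMUNICATION_VERBS = {"say", "says", "said", "tell", "tells", "told",
                       "ask", "asks", "asked"}
THIRD_PERSON = {"he", "she", "they", "him", "her", "them",
                "his", "hers", "their", "theirs"}
TIME_ADVERBIALS = {"then", "later", "afterwards", "finally",
                   "eventually", "soon"}


def narrative_counts(text):
    """Count crude surface indicators of narrative features in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {"tokens": len(tokens), "past_tense": 0,
              "communication_verbs": 0, "third_person_pronouns": 0,
              "time_adverbials": 0}
    for tok in tokens:
        # ASSUMPTION: a listed irregular form or an "-ed" ending signals past
        # tense; this over-counts words such as "need".
        if tok in IRREGULAR_PAST or (len(tok) > 3 and tok.endswith("ed")):
            counts["past_tense"] += 1
        if tok in COMMUNICATION_VERBS:
            counts["communication_verbs"] += 1
        if tok in THIRD_PERSON:
            counts["third_person_pronouns"] += 1
        if tok in TIME_ADVERBIALS:
            counts["time_adverbials"] += 1
    return counts


def per_thousand(counts):
    """Normalise feature counts to a rate per 1,000 tokens."""
    n = counts["tokens"]
    return {k: (1000 * v / n if n else 0.0)
            for k, v in counts.items() if k != "tokens"}


sentence = ("I went to the doctor and he just told me to take calcium "
            "and magnesium and drink more water.")
print(per_thousand(narrative_counts(sentence)))
```

Normalising to a rate per 1,000 tokens makes texts of different lengths comparable. As the chapter argues, such frequencies describe form only and do not by themselves establish a narrative discourse function.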

3.2 Illness blogs: The patients’ tale

Medical topics are among the ubiquitous contents on the internet (Döring 2003: 19). When patients tell their stories on the web, i.e. when they produce narratives of illness (cf. McCullough 1989: 124), this constitutes, not “a solitary occupation”, but one which is shaped by the context of “the community of web users” (Page 2012: 45). Patients’ tales in illness blogs are thus more interactive than when elicited in medical interviews, and they establish a particularly strong relation to the audience: “the primary function of the comments on the […] blogs is to provide or seek support in the form of shared experience, advice, and encouragement” (Page 2012: 45).

From the point of view of this interactive function, illness blogs qualify as patients’ tales, i.e. proper stories, but not in the first place from a structural point of view. Narrative discourse, in essence, “attempts to sweep narrator and audience into a community of rapport”, i.e. the aim is to move, rather than to inform (Georgakopoulou and Goutsos 2004: 53; also Tannen 1989). This means that, although patients’ tales typically employ a “narrative syntax” (Labov 1997: 3), they show the narrative mode primarily due to the “function of personal interest” (Labov and Waletzky 1967: 13; emphasis added). This function rests upon the sharing of the individual experience of illness (Dorgeloh 2012: 263) and distinguishes a patient’s tale, as any other kind of story, from a report, which “is most typically elicited by the recipient […] or in response to circumstances which require an accounting of what went on” (Polanyi 1985: 10–11).

The examples of the variety of illness blogs come from a website where patients share their stories about a rare neurological disease [SPS: The Real Stories4]. Note that, as its title suggests, the website focuses primarily on the publication of the stories, and not, like other types of illness blogs, on the discussion of and comments on postings on illness (cf. Page 2012). As sample (1) illustrates, the typical structure is that the patients introduce themselves and then turn to the chronology of the events:

(1) Hi my name is Ann. I was officially diagnosed in Sept of last year.
I have had symptoms for the past several years that got worse as the years went on. I was exercising and swimming three times a week and then I started getting more muscle cramps. I went to the doctor and he just told me to take calcium and magnesium and drink more water. It took him a long time to understand that the muscle cramp were extremely painful happening several time a day. I would have abdominal muscle cramps that felt like i was in full-blown labor. They would come on suddenly when I was startled or when I coughed. They would ease up for a few seconds and then just get worse again. Several times my feet and hands would cramp up until they were fully distorted. I did go to a neurologist who seemed to have an idea of what I had but made no effort to diagnosis what I had. He told me that it would not do any good to try to diagnosis my disease and instead gave me all kinds of different pills and most of them did not work well and also caused several side effects. Often when I went to see him I did not feel like he even

4 http://www.stiffpersonsyndrome.net; last accessed on March 30, 2015.


remembered me. I did finally request a new doctor, which has been a Godsend to me and now is treating me with IVIG, which is working well. My symptoms still get worse at times but they are manageable. I am eager to talk to people that have the same syndrome. Most people do not understand the pain and all the other symptoms. I found your web site today and am eager to learn more. (http://www.stiffpersonsyndrome.net, accessed March 17, 2011)

The proper narrative contained in (1) ends when the course of the events reaches its most recent state. This description of the current situation (My symptoms still get worse at times but they are manageable) serves as a coda and is followed by an explicit mention of the story point. This point relates to the ill person him- or herself, as in (1), or it centres on the social function of the blog by addressing the readers’ interests, as in (2) and (3): (2) If in any way I can contribute to bringing awareness to this insidious disease I throw in my hat. (Wendy’s story; http://www.stiffpersonsyndrome.net) (3) I must tell you that neither my wife nor myself ever gave up hope, In fact just the opposite. We were very pro active in the treatment of our diseases. […] My prayer is for all of you to see your journey through SMS with the knowledge that there is hope for all. Stay the course, keep the faith, and fight on. (John’s story; http://www.stiffpersonsyndrome.net)

The story point expressed in (3) shows that the verbalisation of the experience of illness has a strong component of self-reflection and evaluation. Many illness blogs have such properties of “reflective anecdotes” (Page 2012: 58–59) and in that tend towards less purely narrative text forms. It is highly typical that, instead of the completeness of the recount and the degree of detail which one can expect of more trivial narration (Georgakopoulou and Goutsos 2000: 125), patients’ tales often limit themselves to “remarkable event[s], characterized by an evaluative punch line” (Page 2012: 59). As was illustrated in Figure 1, a patient’s tale therefore possesses hybridity in its narrative form, since it limits the experience which is shared to the main points of interest.

3.3 The medical case report

Case presentations in the form of published case reports are used by medical professionals “to communicate the salient details of patient cases to one another” (Schryer et al. 2003: 63; also Hurwitz 2006: 217), which means that the texts pursue a predominantly professional discourse goal. On a more general level, the discourse function is thus to inform, i.e. state “verifiable events”, rather than to




move. This function contrasts with the point of personal interest which applies to proper storytelling, which is why the discourse mode in case reports is essentially non-narrative (cf. Georgakopoulou and Goutsos 2004: 53). The central component of a case report is the case presentation itself. It begins “ritualistically with a brief account of a patient’s complaint as translated by the doctor” (Hurwitz 2006: 234; emphasis added), followed by an account of the examinations, findings, diagnosis and suggestions for treatment. Text (4) exemplifies such an initial case presentation, referring to the same disease as text (1): (4) A 27-year-old Hispanic woman presented to the University Medical Center Emergency Department in Las Vegas, Nevada with a sudden onset of shortness of breath and increased difficulty in moving her right arm. She reported that during the evening prior to her presentation, she was lying down when she began to experience shortness of breath with worsening right-arm weakness. She also reported that for the past two months her arm weakness was characterized as having limited strength and range of motion. She also complained of chest pains that were localized behind her sternum. The pain was characterized as a pressure sensation that was non-radiating. She did not have any aggravating or relieving factors. Pertinent positive findings included nausea, palpitations and lightheadedness. Pertinent negative symptoms included no loss of consciousness, headache, vomiting, diarrhea, or vertigo. (Journal of Medical Case Reports 4, 2010)

It has been noted that case reports published in journals “reorganize clinical data using a variety of narrativising techniques” (Hurwitz 2006: 217; also Hunter 1990). However, as one can see in (4), from a narratological viewpoint this is only a “degree-zero” narrativity (Fludernik 1996: 358); i.e. although a sequence of events is verbalised, it is “translated” by a medical professional. The result is a discourse which deals with a disease, i.e. which foregrounds the medical facts and assigns “the sufferer […] the experiencer role” (Fleischman 2001: 476). In such a text, the chronology lacks “experientiality” as the central component of narrativity (Fludernik 1996) and is therefore only a hybrid narrative form.

3.4 ‘Clinical Crossroads’ in JAMA

In 1995, JAMA launched the publication of various types of medical discourse within a section titled “Clinical Crossroads”. The contributions in this section follow the organisation of a “Grand Round” in clinical departments, where case presentations are given from various perspectives. These case presentations are later edited and published in the journal. The full process is described as follows (cf. also Dorgeloh 2014):


The Grand Round begins with the case history of a patient and that patient’s firsthand account of the medical decision he or she faced, occasionally along with the patient’s primary care physician’s perspective. These accounts are followed by questions for the Grand Rounds discussant, which the discussant, usually a well-recognized authority on the clinical topic, addresses based on available evidence in the literature, and, where no evidence exists, clinical experience. Following the presentation, the discussant drafts the manuscript for submission to JAMA, including the case description, the patient’s perspective, the discussion (including references and pertinent tables and figures), and the question-and-answer session that occurred at the end of the Grand Rounds. The manuscript then undergoes editorial evaluation, external peer review, and revision. If the manuscript is revised satisfactorily and determined to have a level of quality appropriate for JAMA, the manuscript is accepted and published in JAMA and usually is featured in Clinician’s Corner. (Winker 2006: 2888)

The idea behind this more innovative medical text variety is to approach a case from various perspectives, including that of the patient. The purpose is not only to offer and exchange information, but to improve medical decisions, which is to be achieved by “aligning the goal of the patient and physician” (Winker 2006: 2888). Since its foundation, the section has been re-structured several times, but the core idea, a joint context for doctors and patients, who contribute different perspectives, has essentially remained unchanged. (5) and (6) are text samples of a patient’s and a doctor’s presentation of the same case:

(5) After I had bladder surgery […], my doctor told me, “I have good news and bad news and good news; it’s not bladder cancer, but the bad news is that it’s something else.” I accepted the complete hysterectomy, which at my age was not disturbing news. But in terms of the treatment and how it was going to affect me, the thing that worried me most was that I kept hearing about nausea, exhaustion, and that I wouldn’t be able to do things. As a result of that, I canceled my teaching for that fall. I remembered being very anxious the first day of chemotherapy because I just didn’t know what to expect. I decided to do the intraperitoneal chemotherapy because it made spatial logic to me. If you are aiming a treatment at the area of the cancer, it was going to get there more rapidly. I probably had some benefit from having had this mode of treatment before I went back to complete the treatment with the IV. Now, I have CAT [computed tomography] scans every 3 to 4 months. I don’t like to go to doctors, my mother never went until she was 80, but I go now because I’ve learned to trust the process, so I keep my appointments. The last time I chatted with the oncologist, I asked him if we could talk about the kinds of symptoms I should look for going forward. What should I expect for myself?
(Journal of the American Medical Association, 4 April 2010; Ms W) (6) Ms W is a 75-year-old woman with epithelial ovarian cancer. She first developed lower abdominal pain in 2008. After workup for a genitourinary origin of the pain, she was found to have a 13.5 × 11 × 15.5–cm complex right adnexal mass. She had an optimal surgery cytoreductive, with less than 1 cm of peritoneal disease remaining at the end of the procedure. The pathologic findings were consistent with epithelial ovarian cancer




of mixed endometrioid/clear-cell histology. Her uterus, fallopian tubes, and omentum were free of disease. Metastatic adenocarcinoma was noted in the left paracolic gutter and she was diagnosed as having stage IIIC disease. She then started intraperitoneal and intravenous (IV) cisplatin/paclitaxel chemotherapy, which was switched to IV carboplatin/paclitaxel because of an infection of the intraperitoneal catheter. She was in complete clinical remission after 6 cycles of platinum-based chemotherapy and was then registered in a clinical trial of maintenance abagovomab vs placebo. She is currently not receiving any treatment and is questioning her prognosis and how she should be followed up in the long term. (Journal of the American Medical Association, 4 April 2010; Dr Tess)

The texts in (5) and (6), although from a highly professional medical journal, illustrate that the discourse is intended for the narrative kind of medicine described in Section 2.2. This situation makes for text varieties that show a more mixed character than illness narratives, as exemplified by (1), as well as case reports, such as (4). On the one hand, both (5) and (6) have a chronological structure, i.e. “degree-zero” narrativity; on the other hand, the patient in (5) shows a degree of expertise and professional competence, a voice of medicine (cf. Section 3.1), which makes the register in the text more similar to professional medical discourse, like (4) and (6). As a result, (5) and (6) possess hybridity in form, i.e. they combine narrative and non-narrative register features. By contrast, considering the discourse function, the doctor’s motivation in this context is not limited to presenting a case to colleagues. Instead, there is a more personal, though third-party, point in telling the patient’s story, as expressed by She is […] questioning her prognosis and how she should be followed up in the long term. Although throughout the main body of the presentation the doctor uses the voice of medicine, the main purpose is collaboration and a joint effort; the presentation thus comes from the doctor’s voice and carries an indirect and ultimately more hybrid function for a narrative. As the analysis of linguistic features in the corpus study will show, this complex relationship of form and function is reflected by the genre perspective.

4 Register and genre profile of the three types of medical discourse

4.1 Data and research aims

The analysis which follows is based on a small corpus of texts, covering in roughly equal shares the three genres under investigation and amounting to 3,777 words


total. The exact proportions are included in Table 1. The analysis is intended as a pilot study and rests upon a limited database, but it will demonstrate how the interpretation of findings on register features benefits from a genre perspective. Numerous studies already document the co-occurrence of features from a narrative dimension of variation on a quantitative basis (starting with Biber 1988, 1989), among which, most notably, the presence of past tense forms, pronominal reference, and time adverbials. The claim here is that these features, which are pervasive to varying extents in the texts investigated, on the one hand testify to the formal hybridity of the genres as illustrated in Figure 1 but, on the other, do not determine the text variety at a sufficient level of specification.

The more integrated genre analysis will be presented in two steps. In Section 4.2, the register features indicative of a chronology, i.e. past tense narration and time adverbials, are functionally re-interpreted from the point of view of the genre in which they occur. This part of the analysis illustrates that in medical discourse high frequencies of narrative features may in fact correlate with a non-narrative discourse mode. It is argued, in particular, that the dominance of such a narrative text form goes beyond the presence of narrative episodes, which is something that applies to many kinds of discourse (e.g. Csomay 2006, 2007; also Werner, this volume), but is specifically motivated by the “object-oriented” discourse goal of the genres investigated here. The subsequent section then looks at features reflecting the expression of human experience: pronoun usage and choice of subjects. The aim of that section is to show that genres with a narrative as opposed to a non-narrative purpose differ characteristically not so much in a grammatical form such as pronoun usage as in their use of semantic categories.

The more general claim behind both analyses is that genre categories, in the sense of referring to discourse at a relatively low level of generality, are effective beyond both register features and textual conventions and lead to patterns at several levels of analysis. Complex discourse goals, such as the verbalisation of medical experience, are therefore better accounted for from a genre, rather than from a register perspective.

4.2 Degree-zero narrativity in different medical genres

A narrative discourse mode is primarily associated with events that happened in the past and with their temporal sequencing (Georgakopoulou and Goutsos 2000: 125, 2004: 43). For this reason, the primary narrative register features indicative of this degree-zero narrativity (cf. Section 3.4) are the use of past tense narration and of time adverbials (cf. Biber and Conrad 2009: 119). In Table 1, the proportion of overall text in the narrative, past tense mode is shown as a word count, compared to the amount of text passages containing




other tenses.5 The second feature is the use of time adverbials, which situate the events in their temporal sequence. For example, text (1), shown here as (7), has non-narrative passages (printed in italics) in the beginning and in the closing evaluative comment, serving as a coda, while the main body of the narration is structured in episodes marked by explicit temporal reference (in bold print). (7) Hi my name is Ann. I was officially diagnosed in Sept of last year. I have had symptoms for the past several years that got worse as the years went on. I was exercising and swimming three times a week and then I started getting more muscle cramps. I went to the doctor and he just told me to take calcium and magnesium and drink more water. It took him a long time to understand that the muscle cramp were extremely painful happening several time a day. I would have abdominal muscle cramps that felt like i was in full-blown labor. They would come on suddenly when I was startled or when I coughed. They would ease up for a few seconds and then just get worse again. Several times my feet and hands would cramp up until they were fully distorted. I did go to a neurologist who seemed to have an idea of what I had but made no effort to diagnosis what I had. He told me that it would not do any good to try to diagnosis my disease and instead gave me all kinds of different pills and most of them did not work well and also caused several side effects. Often when I went to see him I did not feel like he even remembered me. I did finally request a new doctor, which has been a Godsend to me and now is treating me with IVIG, which is working well. My symptoms still get worse at times but they are manageable. I am eager to talk to people that have the same syndrome. Most people do not understand the pain and all the other symptoms. I found your web site today and am eager to learn more. (http://www.stiffpersonsyndrome.net , accessed March 17, 2011)

Table 1 shows the proportion of text passages in the narrative mode across the three genres. While case presentations from the medical case report contain only past tense passages, patients’ tales from the blog, i.e. from a medium that encourages reflection and relation-building (cf. Section 3.2), have a lower proportion of the narrative mode. The texts from “Clinical Crossroads” contain the lowest proportion of proper narration, which is in line with a discourse goal consisting not only in the sharing of information, but also in preparing an adequate decision.

5 Instead of counting verb forms, the proportion of narrative as opposed to non-narrative mode is measured in the relative length of the text passages in which past as opposed to non-past tenses are used.
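
The measure described in this footnote can be sketched in a few lines. The example assumes that a text has already been segmented by hand into passages tagged as past or non-past, and that words are counted by splitting on whitespace. The three sentences are taken from text (1); the tag names, the tagging of these sentences and the function `past_tense_share` are assumptions of this sketch, not the annotation scheme actually used in the study.

```python
from typing import List, Tuple

# Each passage is a (tense_tag, text) pair; the tags are assigned manually.
Passage = Tuple[str, str]


def past_tense_share(passages: List[Passage]) -> float:
    """Return the share of words (in %) that fall in passages tagged 'past'."""
    def n_words(text: str) -> int:
        return len(text.split())  # crude whitespace tokenisation (assumption)

    total = sum(n_words(text) for _, text in passages)
    past = sum(n_words(text) for tag, text in passages if tag == "past")
    return 100 * past / total if total else 0.0


# Three sentences from text (1); the tense tags are this sketch's own labels.
sample = [
    ("non-past", "Hi my name is Ann."),
    ("past", "I went to the doctor and he just told me to take calcium "
             "and magnesium and drink more water."),
    ("non-past", "My symptoms still get worse at times but they are manageable."),
]

print(f"{past_tense_share(sample):.0f} % of the words are in past tense passages")
```

Measuring by passage length rather than by counting verb forms means that a long narrative episode weighs more than a short evaluative comment, whatever the number of finite verbs each contains.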


Table 1: Narrative features in three medical genres6

                                                           Illness blog   Case report   Clinical Crossroads
total no. of words                                         1,126          1,391         1,260
proportion of past tense text passages (by no. of words7)  80 %           100 %         59 %
time adverbials per 100 words (absolute frequency)         3.11 (35)      1.44 (20)     2.78 (35)
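
The normalised rates in Table 1 follow from dividing each absolute frequency by the word total of the respective sub-corpus and scaling the result to 100 words. As a minimal sketch (the function name `per_100_words` and the dictionary layout are illustrative assumptions, not part of the original study), the time-adverbial row can be recomputed from the reported counts as follows:

```python
# Word totals and absolute time-adverbial counts as reported in Table 1.
WORD_TOTALS = {"Illness blog": 1126, "Case report": 1391, "Clinical Crossroads": 1260}
TIME_ADVERBIALS = {"Illness blog": 35, "Case report": 20, "Clinical Crossroads": 35}


def per_100_words(count: int, total_words: int) -> float:
    """Normalise an absolute frequency to a rate per 100 words (hypothetical helper)."""
    return round(100 * count / total_words, 2)


rates = {
    genre: per_100_words(TIME_ADVERBIALS[genre], WORD_TOTALS[genre])
    for genre in WORD_TOTALS
}

for genre, rate in rates.items():
    print(f"{genre}: {rate:.2f} time adverbials per 100 words")
```

Normalisation of this kind is what makes the three sub-corpora comparable despite their slightly different sizes.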

The results are almost opposite for the occurrence of time adverbials: their frequency is high in the patients’ tales, including the case presentations in “Clinical Crossroads”, and much lower in case reports. Explicit temporal reference thus seems to be directly related to more personal accounts, i.e. to a narrative, or at least to a partly narrative (hybrid) function (cf. Figure 1). The finding is in line with research which has shown that in proper stories time adverbials do not only carry temporal meaning, but are also text-strategic devices (cf. Virtanen 1992b). Note, however, that this applies, in particular, to time adverbials in sentence-initial position, where they mark temporal shifts in the progression of a narrative strategy (Virtanen 2004). For example, in (7) then I started getting more muscle cramps marks the beginning of a new episode, whereas uses of the same temporal adverb in (6) (She then started […] chemotherapy, […]; She […] was then registered in a clinical trial), do not mark a text structure based on temporal sequence and are thus placed sentence-medially. This means that the point of departure is the patient as medical case, and not as a character. The lower amount of time adverbials in case reports thus reflects their “topic-oriented strategy” focussing on the medical case, turning them into an expository, rather than a narrative, text (Virtanen 2010: 66–67). These two findings together suggest that a differentiated look is necessary when interpreting quantitative results about pervasive linguistic features in their discourse context. In particular, a narrative form and a narrative text function need to be distinguished, as the outline in Sections 3.2 to 3.4 and the illustration in Figure 1 have shown. In the texts investigated, the non-narrative function of the case reports, in the sense of a lack of personal story-point, goes together with an

6 Besides the individual sample texts discussed for illustration in Section 3, the corpus consists of other texts from the same genre, amounting to the word totals indicated.

7 As reflected by the use of past tense verbs. As in texts (1) and (5), this also includes the use of the so-called “habitual conditional” (cf. Haiman and Kuteva 2002: 120).




exclusive use of past tense forms, showing that the verbalisation of a chronology of events has a variety of uses (cf. Section 2.2). A narrative function, by contrast, also involves passages in which the narrative mode is absent, since it is evaluative comments, particularly the coda, which verbalise the point of a proper story. In this way, although dominated less by past tense narration, illness blogs as well as case presentations from “Clinical Crossroads” gain their narrative or hybrid function from passages in the non-narrative mode – a form-function complexity which a genre perspective helps to explain.

4.3 From register feature to genre feature: Exploring reference in medical discourse

While past tense forms and time adverbials have to do with the past temporality of the events reported, the pervasiveness of pronominal reference, as opposed to more explicit forms of expression, arises from the fact that a narrative verbalises human experience (Biber and Conrad 2009: 259, also cf. Neumann and Fest, this volume). The presence of a narrator allows “readers to immerse themselves in a different world and in the life of the protagonists” (Fludernik 2009: 6). The main protagonists in a medical process are the doctor and the patient, reference to them being made, in particular, when the doctor’s voice and the patient’s storytelling are used (cf. Section 3.1). By contrast, the voice of medicine tends to de-focus human experience, turning the language of medical discourse into a more scientific register, which is “object”- rather than “agent”-oriented (Atkinson 1999). It is therefore expected that reference to these different components of an illness correlates in significant ways with the genre of medical discourse.

As Table 2 shows, the frequency of pronouns8 is higher in the text samples with a narrative or hybrid function, i.e. in the patients’ tales and in “Clinical Crossroads”. It is lower, though not very low, in the case reports. This reflects the non-narrative, object-oriented discourse goal of professional medical discourse, although the main object of investigation is nonetheless a human agent. The hybrid narrative form of medical case reports is thus also confirmed by the use of pronominal reference. Since the use of pronouns as a register feature distinguishes the three genres only insufficiently, Table 2 also presents results of an alternative analysis of the referential patterns one finds in the texts. Looking at the subjects of all (finite and

8 This feature includes personal and possessive (including reflexive) pronouns as well as relative pronouns referring to a noun phrase (and not to a clause).


non-finite) clauses in the corpus, the instances of the (explicit or implicit) subjects were categorised as referring to the patient, the doctor, or to the domain of medicine.9 Subjects being the unmarked point of departure of the English clause and therefore more often than not the topic (e.g. Börjars and Burridge 2010: 226), it was assumed that their reference is likely to indicate which voice is talking (cf. Section 3.1) and to what extent the discourse truly focuses on human experience.

Table 2: Pronominal reference and reference of topics in the three genres

                                                        Illness blog   Case report   Clinical Crossroads
personal pronouns per 100 words (absolute frequency)    10.48 (118)    7.55 (111)    10.64 (134)
clausal topics per 100 words (absolute frequency)       12.79 (144)    8.63 (120)    13.02 (164)
patient as topic                                        5.60 (63)      3.45 (48)     8.50 (107)
doctor as topic                                         1.51 (17)      0.86 (12)     0.56 (7)
topic from the domain of medicine                       5.16 (58)      4.10 (57)     2.70 (34)
other topics                                            0.98 (11)      0.22 (3)      1.27 (16)

While the overall frequencies of clausal topics per text category differ mainly for reasons of sentence length, the semantic sub-categorisation contained in Table 2 yields some notable similarities and differences. In particular, illness blogs and case reports are quite similar with respect to their reference to the domain of medicine, and both do not reach the extent of reference made to the patient in “Clinical Crossroads”. Although they pursue opposite, i.e. narrative as opposed to non-narrative, discourse goals and are produced by opposite speaker roles, illness blogs and case reports, which otherwise differ in their use of narrative register features, reveal a striking similarity in this respect.

9 Assuming that every lexical verb gives rise to a clause, each explicit or implicit subject belonging to a lexical verb was categorised semantically. The category “patient” includes reference to the person as well as to body parts. The category “medicine” covers symptoms (weakness, pain), reference to the disease, as well as to elements from the diagnosis (tests, findings) or therapy (e.g. medication or treatment). In the majority of cases, these categories were distinct; there were only two instances of a subject referring to both patient and doctor, as in: The last time I chatted with the oncologist, I asked him if we could talk. Subjects like these were counted towards both categories.




By contrast, the texts from “Clinical Crossroads” show a pattern of reference to topics which reflects the discourse goal of aligning the perspectives of the patient and the doctor (cf. Section 3.4). The focus of this discourse is not so much on the domain of medicine, nor on the role of the doctor, but in line with the objective of a “narrative medicine” it represents a true expression of patient-centred medical care (Gerteis et al. 1993). That such a discourse context provides in fact for a new genre becomes particularly evident if one looks at the proportions of topics as used by the patients, as opposed to ones used by the doctors, in Figure 2.

[Figure 2: chart with a percentage axis from 0% to 100%; categories shown: other, patient, doctor, medicine; speaker groups: patient in blog, doctor in case report, patient in CC, doctor in CC]

Figure 2: Use of medical topics10 by both speaker groups
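
The proportions plotted in Figure 2 are shares of the four semantic categories in the total number of topics. Figure 2 splits the counts by speaker group, which Table 2 does not do, so the following sketch only recomputes genre-level shares from the absolute counts in Table 2. It divides each count by the sum of the four category counts, which is an assumption of this sketch: for the illness blog that sum (149) is slightly larger than the 144 clausal topics reported, possibly because some subjects were counted towards two categories (cf. footnote 9). The helper `category_shares` is hypothetical.

```python
# Absolute topic counts per semantic category, as reported in Table 2.
TOPIC_COUNTS = {
    "Illness blog": {"patient": 63, "doctor": 17, "medicine": 58, "other": 11},
    "Case report": {"patient": 48, "doctor": 12, "medicine": 57, "other": 3},
    "Clinical Crossroads": {"patient": 107, "doctor": 7, "medicine": 34, "other": 16},
}


def category_shares(counts: dict) -> dict:
    """Convert absolute counts into percentage shares of their own sum."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


shares = {genre: category_shares(counts) for genre, counts in TOPIC_COUNTS.items()}

for genre, result in shares.items():
    print(genre, result)
```

Expressing the categories as shares rather than as rates per 100 words removes the effect of differing clause density and leaves only the distribution of topics, which is the quantity the comparison of voices relies on.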

The results from Table 1 and Figure 2 make obvious that genres with opposite functions, i.e. illness reports and medical case reports, can in fact be more similar than the ones with a related function, such as patients communicating their illness in different situations. The reason is that different voices are used for communicating illness (cf. Section 3.1), which highlight different aspects of the course of the events. While due to general situational parameters, such as speaker or discourse function, illness blogs and “Clinical Crossroads” are similar in their register usage, they nonetheless differ in their choice of topics. It is this

10 Percentages show the proportion of the four semantic categories in relation to the total of ­topics as given in Table 2.


interrelationship of form (register), function, and social context, which for the analysis of medical discourse suggests a primacy of the notion of genre.

5 Conclusion

My analysis of text varieties from medical discourse has intended to show that investigating linguistic variation with a view to genre adds an important perspective to the understanding of form-function relationships in text-linguistic studies. While these commonly rest upon the assumption that “linguistic co-occurrence reflects shared function” (Biber 1989: 5) and present corpus-linguistic evidence for this, the interrelationship of register and genre can only be made explicit by combining the perspectives. Since a genre classifies discourse at a rather low level of generality, especially with regard to the purpose and goal of a discourse, it determines both pervasive linguistic features and the choice of discourse topics and semantic categories. Hence, I have argued here that a genre analysis logically subsumes and pre-determines a register analysis. Genres, especially in the domain of medicine, make regular use of the narrative discourse type with its attested register features. This is not surprising, given the acknowledged role of the narrative as a basic text type or meta-genre (cf. Section 2.2). A similar interrelationship underlies the observation that the dividing line between lay and professional communication is also one between narrative and non-narrative discourse (Georgakopoulou and Goutsos 2000). The discussion here has added to this view that one needs to distinguish between narrative form and narrative discourse function, and that more professional social and cognitive activities typically go together with more complex (in the sense of more indirect) uses of narrative register variation. Text varieties of this kind are best understood from a genre perspective, which can account for their mixed purposes and voices and, thus, their hybridity in register.

References

Atkinson, Dwight. 1992. The evolution of medical research writing from 1735 to 1985. Applied Linguistics 13. 337–374. Atkinson, Dwight. 1999. Scientific discourse in sociohistorical context: The Philosophical Transactions of the Royal Society of London 1675–1975. Mahwah, NJ: Lawrence Erlbaum. Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University Press. Biber, Douglas. 1989. A typology of English texts. Linguistics 27. 3–43.




Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999. Longman grammar of spoken and written English. Harlow: Longman. Biber, Douglas. 2006. University language: A corpus-based study of spoken and written registers. Amsterdam & Philadelphia: John Benjamins. Biber, Douglas. 2012. Register as a predictor of linguistic variation. Corpus Linguistics and Linguistic Theory 8(1). 9–37. Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge University Press. Biber, Douglas & Bethany Gray. 2013. Being specific about historical change: The influence of sub-register. Journal of English Linguistics 41(2). 104–134. Börjars, Kersti & Kate Burridge. 2010. Introducing English grammar. London: Arnold. Brinker, Klaus. 2005. Linguistische Textanalyse: Eine Einführung in Grundbegriffe und Methoden. Berlin: Schmidt. Charon, Rita. 2006. Narrative medicine: Honoring the stories of illness. Oxford & New York: Oxford University Press. Cordella, Marisa. 2004. The dynamic consultation: A discourse analytical study of doctor-patient communication. Amsterdam & Philadelphia: John Benjamins. Coupland, Nikolas. 2007. Style: Language variation and identity. Cambridge: Cambridge University Press. Croft, William. 2010. The origins of grammaticalization in the verbalization of experience. Linguistics 48. 1–48. Csomay, Eniko. 2006. Academic talk in American university classrooms: Crossing the boundaries of oral‐literate discourse? Journal of English for Academic Purposes 5(2). 117–135. Csomay, Eniko. 2007. A corpus-based look at linguistic variation in classroom interaction: Teacher talk versus student talk in American University classes. Journal of English for Academic Purposes 6(4). 336–355. Dorgeloh, Heidrun. 2012. Arztbericht vs. Patientengeschichte: Story point als Genremerkmal im medizinischen Internetdiskurs.
In Ansgar Nünning, Jan Rupp, Rebecca Hagelmoser & Jonas Ivo Meyer (eds.), Narrative Genres im Internet: Theoretische Bezugsrahmen, Mediengattungstypologie und Funktionen (WVT-Handbücher zum literaturwissenschaftlichen Studium), 261–276. Trier: WVT. Dorgeloh, Heidrun. 2014. ‘If it didn’t work the first time, we can try it again’: Conditionals as a grounding device in a genre of illness discourse. Communication & Medicine 11(1). 55–67. Dorgeloh, Heidrun & Anja Wanner. 2010. Syntactic variation and genre. Berlin & New York: de Gruyter Mouton. Döring, Nicola. 2003. Sozialpsychologie des Internet. Göttingen: Hogrefe. Eckert, Penelope & John R. Rickford (eds.). 2001. Style and sociolinguistic variation. Cambridge: Cambridge University Press. Fleischman, Suzanne. 2001. Language and medicine. In Deborah Schiffrin, Deborah Tannen & Heidi E. Hamilton (eds.), The handbook of discourse analysis, 470–502. Malden, Mass.: Blackwell. Fludernik, Monika. 1996. Towards a ‘natural’ narratology. London: Routledge. Frankel, Richard M. 2000. The (socio)linguistic turn in physician-patient communication research. In James E. Alatis, Heidi E. Hamilton & Ai-Hui Tan (eds.), Linguistics, language, and the professions, 81–103. Georgetown: Georgetown University Press.


Georgakopoulou, Alexandra & Dionysis Goutsos. 2000. Mapping the world of discourse: The narrative vs. non-narrative distinction. Semiotica 131(1–2). 112–141.
Georgakopoulou, Alexandra & Dionysis Goutsos. 2004. Discourse analysis: An introduction. Edinburgh: Edinburgh University Press.
Gerteis, Margaret, Susan Edgman-Levitan, Jennifer Daley & Thomas L. Delbanco (eds.). 1993. Through the patient’s eyes: Understanding and promoting patient-centered care. San Francisco: Jossey-Bass.
Giltrow, Janet. 2010. Genre as difference: The sociality of linguistic variation. In Heidrun Dorgeloh & Anja Wanner (eds.), Syntactic variation and genre, 29–52. Berlin & New York: de Gruyter Mouton.
Giltrow, Janet & Dieter Stein. 2009. Genres in the internet. Amsterdam & Philadelphia: John Benjamins.
Gotti, Maurizio & Françoise Salager-Meyer. 2006. Introduction. In Maurizio Gotti & Françoise Salager-Meyer (eds.), Advances in medical discourse analysis: Oral and written contexts, 9–16. Bern: Peter Lang.
Haiman, John & Tania Kuteva. 2002. The symmetry of counterfactuals. In Joan Bybee & Michael Noonan (eds.), Complex sentences in grammar and discourse: Essays in honor of Sandra A. Thompson, 101–124. Amsterdam & Philadelphia: John Benjamins.
Halliday, Michael A. K. 1978. Language as social semiotic: The social interpretation of language and meaning. London: Edward Arnold.
Honeybone, Patrick. 2011. Variation and linguistic theory. In Warren Maguire & April McMahon (eds.), Analysing variation in English, 151–177. Cambridge: Cambridge University Press.
Hunter, Kathryn M. 1991. Doctors’ stories: The narrative structure of medical knowledge. Princeton, NJ: Princeton University Press.
Hurwitz, Brian. 2006. Form and representation in clinical case reports. Literature and Medicine 25(2). 216–240.
Kinneavy, James Louis. 1971. A theory of discourse: The aims of discourse. Englewood Cliffs, NJ: Prentice-Hall.
Kortmann, Bernd. 2006. Syntactic variation in English: A global perspective. In Bas Aarts & April McMahon (eds.), Handbook of English linguistics, 603–624. Oxford: Blackwell.
Labov, William. 1997. Some further steps in narrative analysis. The Journal of Narrative and Life History 7. 395–415.
Labov, William & Joshua Waletzky. 1967. Narrative analysis: Oral versions of personal experience. In June Helm (ed.), Essays on verbal and visual arts, 12–44. Seattle: University of Washington Press.
Martin, James Robert & David Rose. 2003. Working with discourse: Meaning beyond the clause. London: Continuum.
Maseide, Per. 2003. Medical talk and moral order: Social interaction and collaborative clinical work. Text 23(3). 369–403.
McCullough, Laurence B. 1989. The abstract character and transforming power of medical language. Soundings 72(1). 111–125.
Mishler, Elliot G. 1984. The discourse of medicine: Dialectics of medical interviews. Norwood, NJ: Ablex.
Miller, Carolyn R. 1984. Genre as social action. Quarterly Journal of Speech 70. 151–167.
Murawska, Magdalena. 2012. The many narrative faces of medical case reports. Poznan Studies in Contemporary Linguistics 48(1). 55–75.
Page, Ruth. 2012. Stories and social media: Identities and interaction. New York: Routledge.




Polanyi, Livia. 1985. Telling the American story: A structural and cultural analysis of conversational storytelling. Norwood: Ablex.
Richards, Jack C. & Richard W. Schmidt. 2002. Longman dictionary of language teaching and applied linguistics. Harlow, UK: Longman.
Rosenbach, Annette. 2002. Genitive variation in English: Conceptual factors in synchronic and diachronic studies (Topics in English linguistics 42). Berlin & New York: Mouton de Gruyter.
Salmon, William N. 2010. Formal idioms and action: Toward a grammar of genres. Language & Communication 30(4). 211–224.
Sankoff, David. 1988. Sociolinguistics and syntactic variation. In Frederick J. Newmeyer (ed.), Linguistics: The Cambridge survey, 140–161. Oxford: Blackwell.
Sarangi, Srikant & Celia Roberts. 1999. Introduction: Discourse hybridity in medical work. In Srikant Sarangi & Celia Roberts (eds.), Talk, work, and institutional order: Discourse in medical, mediation, and management settings, 61–74. Berlin: Mouton de Gruyter.
Sarangi, Srikant. 2001. Activity types, discourse types and interactional hybridity: The case of genetic counseling. In Srikant Sarangi & Malcolm Coulthard (eds.), Discourse and social life, 1–27. Harlow: Longman.
Schilling-Estes, Natalie. 2002. Investigating stylistic variation. In Jack K. Chambers, Peter Trudgill & Natalie Schilling-Estes (eds.), The handbook of variation and change, 374–401. Oxford: Blackwell.
Schmid, Hans-Jörg. 2013. Is usage more than usage after all? The case of English not that. Linguistics 51(1). 75–116.
Schryer, Catherine, Lorelei Lingard, Marlee Spafford & Kim Garwood. 2003. Structure and agency in medical case presentations. In Charles Bazerman & David R. Russell (eds.), Writing selves/writing societies, 92–96. Fort Collins: WAC.
Schulze, Rainer (ed.). 1998. Making meaningful choices in English: On dimensions, perspectives, methodology, and evidence. Tübingen: Gunter Narr.
Smith, Carlota S. 2003. Modes of discourse: The local structure of texts (Cambridge Studies in Linguistics 103). Cambridge: Cambridge University Press.
Swales, John M. 2004. Research genres: Explorations and applications. Cambridge: Cambridge University Press.
Tannen, Deborah. 1989. Talking voices: Repetition, dialogue and imagery in conversational discourse. Cambridge: Cambridge University Press.
Virtanen, Tuija. 1992a. Issues of text typology: Narrative – a ‘basic’ type of text? Text 12(2). 293–310.
Virtanen, Tuija. 1992b. Given and new information in adverbials: Clause-initial adverbials of time and place. Journal of Pragmatics 17(2). 99–115.
Virtanen, Tuija. 2010. Variation across texts and discourses: Theoretical and methodological perspectives on text type and genre. In Heidrun Dorgeloh & Anja Wanner (eds.), Syntactic variation and genre, 53–84. Berlin & New York: de Gruyter Mouton.
Werlich, Egon. 1976. A text grammar of English. Heidelberg: Quelle & Meyer.
Winker, Margaret A. 2006. Clinical crossroads: Expanding the horizons. The Journal of the American Medical Association 295(24). 2888–2889.

Markus Bieswanger

Aviation English: Two distinct specialised registers?

Abstract: The communication between air traffic controllers and pilots via voice radio is regularly referred to as Aviation English in the literature. Responding to growing international air travel after the Second World War and in reaction to several accidents and incidents at least partly caused by controller-pilot miscommunication, the International Civil Aviation Organization (ICAO) developed a set of standards and recommended practices concerning language use in air traffic control communication. These ICAO guidelines permit the use of two different and precisely defined varieties of Aviation English: standardised phraseology in most routine situations and plain Aviation English when standardised phraseology is insufficient to serve an intended transmission. Based on the official ICAO recommendations and the analysis of text excerpts from authentic air traffic control communication, this paper addresses the question whether the two varieties currently referred to as Aviation English are distinct registers in the sense of Biber and Conrad (2009). The relationship between the two different interpretations of Aviation English in actual controller-pilot communication and the linguistic characteristics of these varieties are investigated and compared. The analysis shows that the two varieties in question are indeed distinct specialised registers and supports the main objective of the volume by demonstrating that adequate register choice is a prerequisite for successful communication, in this case in aviation contexts.

Markus Bieswanger, University of Bayreuth

1 Introduction

For several decades, aviate – navigate – communicate has been widely known as the axiomatic set of any pilot’s duties, particularly during non-routine and emergency situations, but also in everyday routine flying. From the point of view of prioritisation of tasks in high workload situations, the order implies that the primary concern of any flight crew must be to maintain control over their aircraft, the second most important duty is to make sure that the aircraft moves in the direction it is supposed to fly and the third priority is to communicate the intentions of the flight crew to air traffic control and to receive instructions from it. However, this order does not mean that communication plays an unimportant role in aviation. Despite the highly plausible prioritisation of tasks, it should also be noted that communication is included in the set of the three most important duties of pilots (cf. Kostecka 2007: 13). As a result of a number of incidents and accidents associated with communication problems as well as several decades of continuous growth of air traffic around the globe, communication issues in air traffic control contexts are currently taken very seriously by the aviation authorities and play a heightened role in pilot and air traffic controller training. The International Civil Aviation Organization explains this as follows:

With mechanical failures featuring less prominently in aircraft accidents, more attention has been focused in recent years on human factors that contribute to accidents. Communication is one human element that is receiving renewed attention. (ICAO 2010: vii)

The renewed interest in air traffic control communication also shows in the desire for an exchange of ideas and expertise between aviation professionals and linguists, as illustrated by the recent volume entitled Aviation Communication: Between Theory and Practice (Hansen-Schirra and Maksymski 2013). Voice-based communication between pilots and air traffic controllers, so-called radiotelephony, is regularly referred to as Aviation English or at least constitutes a central part of even the broadest definitions of Aviation English. Moder (2013: 227) provides such a broad definition:

Aviation English describes the English used by pilots, air traffic controllers and other personnel associated with the aviation industry. Although the term may encompass a wide variety of language use situations, including the language of airline mechanics, flight attendants, or ground service personnel, most research and teaching focus on the more specialized communication between pilots and air traffic controllers, often called radiotelephony.

Linguistic publications indeed often adopt a more focused definition of Aviation English as “the language used by pilots and air traffic controllers” (Intemann 2008: 21). The present article follows this definition of Aviation English as the English used in voice-based air traffic control communication, but differs from most previous work in that it does not aim to analyze Aviation English primarily to investigate the reasons for miscommunication in air traffic control and the contribution of communication problems to incidents and accidents (cf., e.g., Bieswanger 2013), but to assess the status of Aviation English from the perspective of register research. In the following, this article will give a short overview of the history of English in air traffic control contexts and then go on to answer the question whether the two varieties currently referred to by the term Aviation English are distinct registers which can be categorised as specialised registers in the sense of Biber and Conrad (2009).

2 English in Air Traffic Control

In 1944, 52 states signed the Chicago Convention, i.e. the first international convention on civil aviation. The convention resulted in the foundation of the International Civil Aviation Organization (ICAO), which became a United Nations Agency in 1947. One of the purposes of the ICAO is to provide international standards for air traffic control and safe flight operations, which includes recommendations on language use in pilot-controller communication. These provisions concerning language use and language requirements are primarily defined in Volume II of Annex 10 to the Convention on International Civil Aviation on Aeronautical Communications (ICAO 2001); additional language recommendations are defined in Annexes 1, 6 and 11. The requirements are further specified in the Manual of Radiotelephony (ICAO 2007a) and the Procedures for Air Navigation Services: Air Traffic Management (ICAO 2007b).

It is mainly as a result of World War II that English was chosen as the basis of the world-wide aviation communication language. It has to be noted that the ICAO recognises national languages and does not forbid the use of languages other than English for local air navigation purposes, provided that all persons involved share that other language. In international aviation, by contrast, the use of English is the rule. Crystal (2003: 108) sums up the reasons for this choice as follows: “[…] they agreed that English should be the international language of aviation when pilots and controllers speak different languages. This would have been the obvious choice for a lingua franca. The leaders of the Allies were English-speaking; the major aircraft-manufacturers were English-speaking; and most of the post-war pilots in the West (largely ex-military personnel) were English-speaking.” Given the economic, technological and military dominance of Great Britain and especially the USA at that time, other languages were not a realistic option.
The Chicago Convention granted “complete and exclusive sovereignty over the airspace above its territory” (Convention on International Civil Aviation 1944) to each of the contracting states, but also demanded that all contracting states provide adequate regulations for the safety of aviation. The original language of the document is English, but it was translated into French and Spanish as the two other languages “of equal authenticity” (cf. Convention on International Civil Aviation 1944). Today, there are also translations of the document into Russian, Chinese and Arabic, since these are official languages of the United Nations. Currently the ICAO has 190 member states. The first version of the Convention on International Civil Aviation (1944) does not include any statements on the question of an international air traffic communication language, but it promises further regulations. Today’s air traffic management procedures are the result of an ongoing evaluation and revision of the first document provided in 1946 by the Air Traffic Control Committee of the International Conference on North Atlantic Route Service Organisation (cf. ICAO 2007b: vii). This bias towards North Atlantic air traffic has definitely also contributed to the choice of English.

Responding to the constantly growing international air travel after the Second World War and in reaction to several accidents and incidents at least partly caused by controller-pilot miscommunication (cf., e.g., Cushing 1994; Jones 2003), the ICAO developed a set of standards and recommended practices (SARPs) concerning language use in general and the use of English in particular in air traffic control communication (cf. ICAO 2001; ICAO 2007a; ICAO 2007b), which has been adopted by most countries world-wide. For several decades, until about the turn of the century, these SARPs were almost exclusively devoted to the definition of the so-called “ICAO standardized phraseology” (ICAO 2001: 5-1; for a detailed description cf. Section 3.2 below), which is supposed to “provide the tools for communication in most of the situations encountered in the daily practice of ATC [= air traffic control] and flight” (ICAO 2010: 3-5). More recently, the ICAO has added SARPs concerning the proficiency in plain Aviation English of all pilots and air traffic controllers involved in international aviation (cf.
Mathews 2004; Mitsutomi and O’Brian 2004; ICAO 2010). Experience with standardised phraseology had shown that in unusual and unexpected “cases, where phraseology provides no ready made form of communication, pilots and controllers must resort to plain language” (ICAO 2010: 3-5). The motivation for the demand of a certain level of proficiency in plain Aviation English by all stakeholders in air traffic control communication was similar to the reasons that had earlier led to the development of the standardised phraseology:

Over 800 people lost their lives in three major accidents […]. In each of these seemingly different types of accidents, accident investigators found a common contributing element: insufficient English language proficiency on the part of the flight crew or a controller had played a contributing role in the chain of events leading to the accident. In addition to these high-profile accidents, multiple incidents and near misses are reported annually as a result of language problems, instigating a review of communication procedures and standards worldwide. (ICAO 2010: 1-1)




As a result of accidents and incidents more or less intimately connected to communication problems, all pilots and air traffic controllers involved in international aviation currently have to demonstrate proficiency in plain aviation-related English or plain Aviation English; the required level of proficiency is at least level 4 “operational” on a scale from level 1 “pre-elementary” to level 6 “expert” (ICAO 2010: A-7 and A-8). To sum up, two varieties of English used for communication between pilots and air traffic controllers are presently referred to by the term Aviation English, namely standardised phraseology, on the one hand, and plain Aviation English, on the other. In this paper, Aviation English will be used as the umbrella term, while standardised phraseology and plain Aviation English will be used to refer to the two varieties of Aviation English respectively. The following section will apply the classification of Biber and Conrad (2009) to these varieties and investigate whether we are concerned with two distinct specialised registers referred to by the same designation.

3 Registers of Aviation English

According to Biber and Conrad (2009: 6), “a register is a variety associated with a particular situation of use (including particular communicative purposes).” Biber and Conrad (2009: 6) identify three components of a register analysis: first, the situational context of use, i.e. the unique situational characteristics of a certain variety of language use; second, the linguistic analysis, i.e. the description of “typical lexical and grammatical features” (Biber and Conrad 2009: 6) that are pervasive in a variety; and third, the interpretation of the functions of these pervasive linguistic features in the situational context specified earlier. Section 3.1 will be devoted to a situational analysis of the two registers in question, while Sections 3.2 and 3.3 will describe their linguistic characteristics and their specific functions.

3.1 Situational analysis

As already mentioned, Aviation English consists of standardised phraseology and the use of plain English in aeronautical radiotelephony communication. When applying Biber and Conrad’s (2009: 39) “framework for analyzing situational characteristics,” many similarities and some crucial differences concerning the situational context of these two varieties of Aviation English can be identified.


According to Biber and Conrad (2009: 40), the major situational characteristics of registers are: participants, relations among participants, channel, production circumstances, setting, communicative purposes and topic (cf. also Schubert, this volume).

Participants

The participants in both varieties of Aviation English are identical. The stakeholders in aeronautical radiotelephony communication, i.e. pilots and controllers engaging in air traffic control communication, are both addressors producing text as well as intended listeners referred to as addressees (cf. Biber and Conrad 2009: 41). Depending on national regulations, it may or may not be legal for outsiders to listen to air traffic control communication, but there is no difference between the two varieties concerning what Biber and Conrad (2009: 42) call “on-lookers”. Since all parameters concerning participants and participation are identical, differences between the use of standardised phraseology and plain Aviation English cannot be attributed to this situational characteristic.

Relations among participants

There are no differences between the two varieties of Aviation English in the relations among participants either. The participants in air traffic control communication directly interact with each other. Usually, one member of the flight crew interacts with one air traffic controller in a dialogue at any given point in time. In both varieties, the social roles of the interlocutors are identical, there are usually no personal relationships between them and all participants share considerable background knowledge about aviation.

Channel

By channel, Biber and Conrad (2009: 43) mean the binary distinction into the physical modes of speech and writing and what they call the “specific mediums of communication.” Both types of Aviation English are voice-based and thus clearly spoken registers. Written air traffic control communication with the help of a so-called controller-pilot data link is still in its infancy and has a number of disadvantages that seem to inhibit its more widespread use, such as the ensuing lack of situational awareness of all pilots of surrounding aircraft when messages are exchanged bilaterally between one pilot and one air traffic controller. The specific medium of communication for transmitting speech in air traffic control communication is voice radio. Unlike face-to-face communication, Aviation English thus generally belongs to the types of mediated spoken communication (cf. also setting below).




Production circumstances

As both kinds of Aviation English are spoken registers, there is typically not much time for speakers to plan what to say next and no possibility to “edit or erase language once it is spoken” (Biber and Conrad 2009: 43). As in all spoken conversations, there are certain expectations as to when a speaker has to say something as well as limitations with respect to the length of pauses. Since all pilots a particular air traffic controller is responsible for are tuned to the same frequency and since aviation radio technology does not allow more than one pilot to address the controller at the same time, efficient communication is one of the main concerns in air traffic control communication.

Setting

According to Biber and Conrad (2009: 44), “the setting refers to the physical context of the communication – the time and place” (original emphasis). As with most spoken communication, the time is shared by the interlocutors in air traffic control communication, as the messages are transmitted instantaneously. Aviation English, however, is generally mediated communication and thus the situation is special with respect to place. The participants have a certain knowledge about the place of production of their interlocutor’s speech but do not share the place of production as in face-to-face communication. The quality of transmission in air traffic control communication is one of the reasons for the implementation of SARPs, as it can be adversely affected by weather, distance and other circumstances.

Communicative purposes

The two varieties of Aviation English show their biggest differences in relation to the communicative purposes. It could be argued that both share what Biber and Conrad (2009: 45) call the “general purpose”, i.e. the aim to ensure efficient and effective communication between pilots and controllers, and differ only in the specific purpose. If register status were decided by the general purpose alone, the two varieties of Aviation English could be termed specific “subregister[s]” (Biber and Conrad 2009: 45) of one register. However, according to the ICAO (2001: 5-1), there should be no overlap between these two varieties: “ICAO standardized phraseology shall be used in all situations for which it has been specified. Only when standardized phraseology cannot serve an intended transmission, plain language shall be used.” Considering the fundamentally different and complementary situations of use – routine versus non-routine air traffic control communication (cf. ICAO 2010: 3-4, 3-5) – and the considerable linguistic differences between the two varieties, as shown below, it can be argued that we are concerned with two distinct, albeit related, registers.


Topic

The situation concerning the factor topic resembles the differentiation of communicative purposes: the shared general topic of both varieties is aviation, but the specific topics covered are different. While standardised phraseology is concerned with the fairly restricted aspects of routine air traffic control issues, plain Aviation English covers a broader range of topics in non-routine situations, such as emergencies as well as other unusual or unexpected contexts. “Topic is the most important situational factor influencing vocabulary choice” (Biber and Conrad 2009: 46) and so it is not surprising that standardised phraseology and plain Aviation English should differ to a large extent at the lexical level (cf. also Sections 3.2 and 3.3).

Summary

With respect to the situational characteristics of the two varieties of Aviation English, many of Biber and Conrad’s (2009: 40) parameters such as participants, relations among participants, channel, production circumstances and setting are shared by both registers. However, there are clear differences in the communicative purposes and the range of topics covered by standardised phraseology and plain Aviation English respectively, which leads to the conclusion that we are not concerned with sub-registers of a single register. From the perspective of situational characteristics, which “can be definitely specified” (Biber and Conrad 2009: 33) for both registers, standardised phraseology and plain Aviation English can be categorised as two distinct specialised registers.
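The comparison summarised above can also be written down as a small data structure. The following Python sketch is only an illustration under stated assumptions: the class name `SituationalProfile`, the function `differing_parameters` and the short value strings are hypothetical labels that condense the descriptions in Section 3.1; they are not part of Biber and Conrad's (2009) framework or of any ICAO document.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class SituationalProfile:
    """One value per situational characteristic listed in Section 3.1.

    The value strings are condensed paraphrases (an assumption of this
    sketch), not an official coding scheme.
    """
    participants: str
    relations_among_participants: str
    channel: str
    production_circumstances: str
    setting: str
    communicative_purposes: str
    topic: str


STANDARDISED_PHRASEOLOGY = SituationalProfile(
    participants="pilots and air traffic controllers",
    relations_among_participants="direct interaction, identical social roles",
    channel="speech via voice radio",
    production_circumstances="real time, no editing, shared frequency",
    setting="shared time, separate places (mediated)",
    communicative_purposes="routine air traffic control communication",
    topic="restricted range of routine air traffic control issues",
)

# Plain Aviation English shares every characteristic except the last two.
PLAIN_AVIATION_ENGLISH = replace(
    STANDARDISED_PHRASEOLOGY,
    communicative_purposes="non-routine air traffic control communication",
    topic="broader range: emergencies, unusual or unexpected contexts",
)


def differing_parameters(a: SituationalProfile, b: SituationalProfile) -> list:
    """Return the names of the characteristics on which two profiles differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


print(differing_parameters(STANDARDISED_PHRASEOLOGY, PLAIN_AVIATION_ENGLISH))
```

Under these assumptions the only differing characteristics are the communicative purposes and the topic, which mirrors the argument of the summary; the sketch adds no evidence of its own.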

3.2 Standardised phraseology

In this section, the linguistic features of standardised phraseology and their functions will be presented and discussed. In contrast to many other registers, the functions of the linguistic features of this variety are clearly and explicitly defined. The register that is officially referred to as “ICAO standardized phraseology” (ICAO 2001: 5-1) is a variety of English that is used in a precisely defined situational context and characterised by prescribed and pervasive linguistic features used for a specific function, mainly “for the purpose of ensuring uniformity in RTF [= radiotelephony] communications” (ICAO 2007a: 3-1) and “to provide maximum clarity, brevity and unambiguity” (ICAO 2007a: 3-2). This variety thus fulfils all the criteria of a “specialized register” in the sense of Biber and Conrad (2009). The ICAO standardised phraseology is precisely defined in several official documents published by the ICAO. The second volume of Annex 10 to the Convention on International Civil Aviation (ICAO 2001) describes “Aeronautical Communications”, chapter 12 of ICAO Document 4444 on Air Traffic Management (ICAO 2007b) is devoted entirely to “Phraseologies” and ICAO Document 9432, the Manual of Radiotelephony (ICAO 2007a), provides a collection of illustrations of the recommendations given in the other two documents. Recommendations exist for all levels of language, including lexicon, grammar and pronunciation. According to Biber and Conrad (2009: 6), “[r]egisters are described for their typical lexical and grammatical characteristics” and they state that their “linguistic features are always functional”. Pronunciation features are not included in the list of linguistic features of registers by Biber and Conrad (2009: 6), but since the pronunciation features of the ICAO phraseology are strictly functional and since Biber mentioned phonological features as register features in an earlier study (cf. Biber 1995: 29), they will also be considered linguistic features of this register and thus be presented in this section.

Lexical characteristics

Standardised phraseology is probably best known for its characteristics at the lexical level. At the heart of this register is a reduced vocabulary consisting of a limited number of words and fixed phrases, each with a single precise meaning in the situational context of routine air traffic control communication. Section 5.2.1.5.8 of Annex 10 to the Convention on International Civil Aviation (ICAO 2001) contains a brief list of words and phrases that “shall be used in radiotelephony communications as appropriate and shall have the meaning ascribed hereunder.” The list contains key terms of radiotelephony communication, such as affirm for ‘yes’, cleared (cf. Transcript 1) for ‘authorised to proceed [with the aircraft] under the conditions specified’, go ahead (cf.
Transcript 4) meaning ‘proceed with your message’ but not ‘proceed with your aircraft’, monitor (Transcript 3) for ‘listen out on (frequency)’ and maintain (cf. Transcript 2) for ‘continue in accordance with the condition(s) specified’. Section 12.3 of ICAO Document 4444 on Air Traffic Management (ICAO 2007b) provides a more comprehensive collection of words and phrases to be used in specific circumstances. For example, climb (cf. Transcript 2) is prescribed as the phonetically dissimilar opposite of descend in standardised phraseology, ruling out the use of ascend, which is regularly listed as an antonym of descend in dictionaries of plain English (cf. OALDO 2014). The recommendations even explicitly include words and phrases that should not be used at all. For example, Section 3.1.4 of the Manual of Radiotelephony (ICAO 2007a: 3-1) suggests that “the use of courtesies should be avoided” altogether; however, courtesies such as greeting and parting expressions are often used and tolerated in non-urgent contexts (cf. Transcript 3). Standardised phraseology is thus not among the many text varieties native speakers of a language acquire “without explicitly studying them” (cf. Biber and Conrad 2009: 2) but has to be learned by native as well as non-native speakers of English through explicit instruction.

From the lexical perspective, two main characteristics of the special register referred to as standardised phraseology can be identified. First, in contrast to most other varieties of English – where it is the rule rather than the exception for words to have multiple meanings – each word and phrase has just one specific and precisely defined meaning in aviation phraseology. Other meanings of words which are polysemous in plain English are thus explicitly excluded from this register and some of the defined meanings of words and phrases in aviation phraseology do not occur outside of this specialised register. Meanings of words and phrases that do not occur in other registers are called “register markers” (Biber and Conrad 2009: 53). Unlike register markers in many other registers, however, these unique characteristics are strictly functional in standardised phraseology (cf. Biber and Conrad 2009: 55). The second main lexical characteristic of this register is the fact that words and phrases are carefully selected to avoid confusion and misunderstandings due to phonetically similar expressions, since “maximum clarity, brevity and unambiguity” (ICAO 2007a: 3-2) are considered the most important aims of the prescription of aviation phraseology.

Grammatical characteristics

At the grammatical level, standardised phraseology is also characterised by a number of pervasive and frequent “register features” (Biber and Conrad 2009: 53). With respect to the use of verbs in aviation phraseology, the prescription to use most verbs in the list of essential “words and phrases” in the imperative only is certainly striking (cf. ICAO 2001: 5-6 and 5-7). According to the definitions in this list, verbs such as cancel ‘annul the previously transmitted clearance’, check ‘examine a system or procedure’, contact (cf.
Transcript 2) ‘establish communications with …’, disregard ‘ignore’, monitor (Transcript 3) ‘listen out on frequency’, maintain (cf. Transcript 2) ‘continue in accordance with the condition(s) specified’, report ‘pass me the following information …’, and many more can only be used in imperatives, which is certainly a register feature of this variety. Aviation phraseology even prescribes the use of certain words as verbs in the imperative which are not commonly used as verbs and thus not listed in this part of speech in general-use dictionaries, e.g. the verbal use of standby (cf. Transcript 4) meaning ‘wait and I will call you’ (ICAO 2001: 5-7). Another grammatical feature characteristic of aviation phraseology is the specific prescribed order of elements in an utterance and the high frequency of ellipses, as illustrated by the following authentic example:




Transcript 1: Aerogal seven hundred heavy Kennedy Tower (.) winds calm (.) runway three one left (.) cleared to land (JFK Tower, own transcript, 2010)

In line with the recommendation in Section 5.2.1.6 “Composition of messages” of Annex 10 to the Convention on International Civil Aviation (ICAO 2001), the message uttered by the air traffic controller at JFK International Airport consists of two main parts, the “call” made up of the call sign of the addressee Aerogal seven hundred heavy and the call sign of the originator Kennedy Tower, and the “text” winds calm (.) runway three one left (.) cleared to land, which provides information concerning the weather and contains the instruction that the plane is cleared to land on runway three one left. The fixed structure permits elliptical constructions and the reduction of function words “to a small number of prepositions” (Moder 2013: 229; cf. also ICAO 2010: 3-4), as illustrated by the above example.

Overall, the grammatical characteristics of standardised phraseology reflect the dominant functions of pilot-controller communication identified by Mell (2004: 13), which are sharing of information (cf. information on the wind conditions in Transcript 1 above), triggering actions, management of the pilot-controller relationship and managing the dialogue. For example, the frequent use of imperatives is directly linked to the category “triggering actions” as “the core function of pilot-controller communications” (Mell 2004: 13) and the prescribed structure reduces the number of words needed for managing the dialogue between pilot and air traffic controller. Transcript 2 illustrates the importance of imperatives for triggering immediate actions (cf. also the imperatives continue, follow and monitor in Transcript 3), in this case right after the decision of the pilots to abort the landing and initiate a go-around:

Transcript 2: Lufthansa four two four heavy climb [to and] maintain 3000 [feet] (.) fly runway heading […] contact Boston Departure […] (Boston Tower, own transcript, 2015; imperatives in bold)
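The two-part message structure described above (a “call” consisting of addressee and originator call signs, followed by the “text”) can be pictured as a small data structure. The following Python sketch is only an illustration: the class name `Message`, its field names and the use of “(.)” as a separator are assumptions made here, borrowed from the transcription convention of the examples, and are not part of any ICAO specification.

```python
from dataclasses import dataclass
from typing import List

# "(.)" marks a short pause in the transcripts; using it as a joiner
# between message elements is an assumption for display purposes only.
PAUSE = " (.) "


@dataclass
class Message:
    """Hypothetical model of a radiotelephony message: call + text."""
    addressee: str    # call sign of the station being addressed
    originator: str   # call sign of the station transmitting
    text: List[str]   # elements of the "text" in their prescribed order

    def call(self) -> str:
        # The "call" names the addressee first, then the originator.
        return f"{self.addressee} {self.originator}"

    def render(self) -> str:
        # Join the call and every text element with the pause marker.
        return PAUSE.join([self.call(), *self.text])


landing_clearance = Message(
    addressee="Aerogal seven hundred heavy",
    originator="Kennedy Tower",
    text=["winds calm", "runway three one left", "cleared to land"],
)
print(landing_clearance.render())
```

The fixed slot order is what allows the elliptical style: because each position has a known role, no function words are needed to signal who is speaking to whom.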

Pronunciation characteristics

The ICAO publications on standardised phraseology make specific pronunciation recommendations, which lead to additional linguistic features of this register. For example, there are recommendations concerning the pronunciation of numbers and letters. The “Radiotelephony Spelling Alphabet” defines the “desired pronunciation” (ICAO 2001: 5-4) of the words representing letters when spelling out “names, service abbreviations and words of which the spelling is doubtful” (ICAO 2001: 5-3). According to the ICAO (2001: 5-4), for example, the letter Z has to be pronounced as zulu /'zu:lu:/ and K has to be realised as kilo /'ki:lo/ (cf. Transcript 3).

Transcript 3: Delta four twenty-seven (.) good day (.) continue down to kilo kilo [= taxiway KK] (.) follow company [= another Delta jet] seven three seven (.) monitor tower one two three point niner (JFK Ground, own transcript, 2008, my emphasis)

The pronunciation of numbers, which under most circumstances have to be pronounced as single digits, is also specified in the recommendations for standardised phraseology. Section 5.2.1.4.3 “Pronunciation of Numbers” of Annex 10 to the Convention on International Civil Aviation (ICAO 2001) provides a description of the desired pronunciation of numbers including recommended stress placements:

(ICAO 2001: 5-5)
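The principle of reading numbers digit by digit and spelling letters with code words can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the digit words are given in ordinary spelling, with only niner taken from Transcript 3; the respelled forms and stress marks of the ICAO table are not reproduced; “point” follows the controller's usage in Transcript 3; and the function names `speak_number` and `spell` are invented for this example.

```python
import string

# Ordinary orthographic digit words (an assumption); "niner" and "point"
# follow the controller's wording in Transcript 3.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "niner",
    ".": "point",
}

# Code words of the radiotelephony spelling alphabet, in letter order A-Z.
_CODE_WORDS = (
    "alfa bravo charlie delta echo foxtrot golf hotel india juliett kilo "
    "lima mike november oscar papa quebec romeo sierra tango uniform "
    "victor whiskey x-ray yankee zulu"
).split()
SPELLING_ALPHABET = dict(zip(string.ascii_uppercase, _CODE_WORDS))


def speak_number(number: str) -> str:
    """Read a number one character at a time, e.g. a frequency."""
    return " ".join(DIGIT_WORDS[ch] for ch in number)


def spell(letters: str) -> str:
    """Spell out letters with their code words, e.g. a taxiway name."""
    return " ".join(SPELLING_ALPHABET[ch] for ch in letters.upper())


print(speak_number("123.9"))
print(spell("KK"))
```

Applied to the tower frequency and the taxiway in Transcript 3, the two functions reproduce the controller's wording for both items.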

To avoid misunderstandings in radiotelephony communication, some of the recommended pronunciations of numbers are deliberately different from the common pronunciation of these numbers in many varieties of English spoken by native speakers. The prescribed pronunciation features thus have to be learned by native and non-native speakers of English alike. For example, dental fricatives are regularly replaced by alveolar stops  – a recommendation in line with Jenkins’ (2008: 146) recommendations for the so-called “Lingua Franca Core” of English  – and so the initial sounds in thousand and three are supposed to be realised as /t/. Unfortunately, these recommendations are “often not adopted by




native speakers of English, who typically pronounce ‘3’ and ‘5’ in the usual plain English way” (Moder 2013: 229–230). This is illustrated by Transcript 3, in which the air traffic controller at JFK International Airport in New York City, most likely a native speaker of English, pronounces ‘3’ “in the usual plain English way” (Moder 2013: 230) but realises ‘9’ as niner.

Unlike for most other registers, there are even provisions concerning the speed of delivery of utterances in Aviation English. The ICAO recommends “an even rate of speech not exceeding 100 words per minute” (ICAO 2001: 5-5) and an even slower rate “[w]hen it is known that elements of the message will be written down by the recipient” (ICAO 2007a: 2-1). Studies, however, have shown that particularly native speakers tend to use a much higher speech rate, often over 200 words per minute, which can lead to misunderstandings and the need for time-consuming clarifications (cf. Bieswanger 2013: 19–20). Silberstein and Dietrich (2003: 9) quote an air traffic controller who admits: “I talk faster, a lot faster – I talk so fast that they have to slow me down because they don’t understand me anymore.” Since the speech rate is obviously crucial in Aviation English, all pilots and air traffic controllers have to be trained to develop an awareness of the importance of their speed of delivery.

Ever since its introduction after the Chicago Convention more than half a century ago, the ICAO standardised phraseology has been refined and expanded. The continuous development of standardised phraseology has been based on pilots’ and controllers’ experiences and the analysis of language-related accidents, in order to cover more areas of language use in aviation, to adopt new procedures and technologies, and to deal with previously unknown or rare situations.
For example, in reaction to recent events, the 15th edition of the ICAO Procedures for Air Navigation Services: Air Traffic Management (ICAO 2007b: xv) adds, among other regulations, new “pilot procedures in the event of unlawful interference” and “procedures related to volcanic ash”. Pilots and air traffic controllers are constantly urged to use standardised phraseology and to avoid non-standard communication whenever possible (cf., e.g., ICAO 2001: 5-1; ICAO 2007a: 3-2; ICAO 2010: 2-3; Prinzo et al. 2010: 15). Despite all efforts to regularly update the standardised phraseology, the ICAO also acknowledges that “[i]t is not possible, however, to develop phraseologies to cover every conceivable situation” (ICAO 2010: 4-2) and that “plain language shall be used” (ICAO 2001: 5-1) when standardised phraseology is not available to cover the communicative needs of the stakeholders in air traffic control communication. The following section will describe the use of plain language in such situations and show that plain Aviation English can also be considered a specialised register.
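The recommended speech rate of 100 words per minute discussed above lends itself to a simple check on timed transcripts. The sketch below assumes that the duration of a transmission in seconds is known from the recording; the durations used in the example are invented for illustration, and treating pause markers and bracketed annotations as non-words is likewise an assumption of this sketch.

```python
import re

MAX_WPM = 100  # upper limit recommended in ICAO (2001: 5-5), as cited above


def count_words(transcript: str) -> int:
    """Count spoken words, ignoring pause markers and bracketed notes."""
    cleaned = re.sub(r"\(\.\)|\[[^\]]*\]", " ", transcript)
    return len(cleaned.split())


def words_per_minute(transcript: str, seconds: float) -> float:
    """Speech rate of one transmission in words per minute."""
    return count_words(transcript) * 60 / seconds


def exceeds_recommended_rate(transcript: str, seconds: float) -> bool:
    return words_per_minute(transcript, seconds) > MAX_WPM


transmission = (
    "Aerogal seven hundred heavy Kennedy Tower (.) winds calm (.) "
    "runway three one left (.) cleared to land"
)
# Hypothetical durations: the same words delivered slowly and quickly.
for seconds in (9.0, 4.5):
    rate = words_per_minute(transmission, seconds)
    print(seconds, rate, exceeds_recommended_rate(transmission, seconds))
```

Halving the delivery time doubles the rate, which shows how quickly a routine transmission moves from the recommended limit to the rates reported for native speakers.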


3.3 Plain Aviation English

Plain language has never been excluded from pilot-controller communication; on the contrary, it has always been permitted and used in clearly defined situations in which “standardized phraseology cannot serve an intended transmission” (ICAO 2001: 5-1). As a result of this precise situational context, however, plain Aviation English is fundamentally different from everyday conversations in several respects:

Plain language in aeronautical radiotelephony communications means the spontaneous, creative and noncoded use of a given natural language, although constrained by the functions and topics (aviation and non-aviation) that are required by aeronautical radiotelephony communications, as well as by specific safety-critical requirements for intelligibility, directness, appropriacy, non-ambiguity and concision. (ICAO 2010: 3-5)

Plain Aviation English is thus characterised by features that result from the function it has to fulfil with respect to safety and the topics covered in air traffic control communication. These constraints are the reason for distinctive register features at all linguistic levels, described and illustrated in the following subsections.

Lexical characteristics

The lexicon of plain Aviation English is less precisely defined than the words and phrases used in standardised phraseology, but at the same time more restricted than, for example, the lexicon of everyday conversation in what could be called plain English. The ICAO recommendations make it very clear that the obvious need for plain language in non-routine situations “should in no way be interpreted as permission to chat” (ICAO 2010: 4-3). At the lexical level, plain Aviation English is thus characterised by words and phrases corresponding to topics related to pilot-controller communication. These topics, which are also addressed in textbooks and courses on plain Aviation English (cf., e.g., Emery and Roberts 2008), include, among others, fields such as technology, health, animals, fire and weather (for a detailed list of domains, cf. ICAO 2010: B5-B8). For example, in-flight medical emergencies often make the use of plain Aviation English necessary (cf. Transcript 4). In Transcript 4, standardised phraseology is used in the first two transmissions to establish contact but then turns out to be insufficient to serve all of the communicative needs of the pilots. Hence a code-switch takes place and the further three transmissions are carried out in plain Aviation English. The vocabulary in these transmissions, however, is different from plain everyday English in that it is characterised by aviation-related terms such as diversion, declaring emergency and met report.




Transcript 4:
American 182: Tokyo Control American one eight two
Tokyo Control: American one eight two (.) go ahead
American 182: Yes sir (.) we are (.) have a possible diversion to Narita [=Tokyo Narita International Airport] (.) we are not declaring emergency yet but would like Narita weather
[…] Narita airport is closed, Tokyo Haneda is suggested for a possible diversion
Tokyo Control: American one eight two (.) do you need met report [=weather report] of Haneda?
American 182: Yes sir (.) request met report for Haneda
Tokyo Control: Okay, standby
(Tokyo Control, own transcript, 2014)

Grammatical characteristics

The grammatical structure of plain Aviation English is similar to plain English and only characterised by some tendencies which constitute functionally oriented register features. Of the factors mentioned in the quotation above, “concision” (ICAO 2010: 3-5) is certainly one of the main driving forces responsible for these characteristics. Concision is defined as ‘giving only the information that is necessary, using few words’ in the OALDO (2014). In the context of plain Aviation English, this means that the utterances produced by pilots and air traffic controllers have to be as brief as possible and simply structured. According to Prinzo et al. (2010: 15), the rate of readback errors is affected by “both message length and complexity” and they claim that “controllers should transmit less information more often.” With reference to concision, it has also been reported that the desire for brevity leads to an influence of standardised phraseology on plain Aviation English, showing in the deletion of function words such as determiners even when not using phraseology (ICAO 2010: 3-6). The last two transmissions in Transcript 4 illustrate this claim, as the determiner the is omitted in both transmissions before met report.

Pronunciation characteristics

At the level of pronunciation, plain Aviation English is less restricted than standardised phraseology, as there are no specific recommendations concerning the realisation of individual words and phrases. Other ICAO recommendations concerning pronunciation, however, also apply to the use of plain language and make plain Aviation English more restricted than plain English in many other situations. For example, the recommended speech rate of 100 words or less per minute (ICAO 2001: 5-5; cf. above) is also valid for plain Aviation English, which aims for maximum “intelligibility” (cf. ICAO 2010: 3-5), just like standardised phraseology.


This necessity for maximum mutual intelligibility in pilot-controller communication is also the reason for another requirement concerning the pronunciation of plain Aviation English, namely the demand that all pilots and air traffic controllers “must take care to acquire an internationally understood accent or dialect” (ICAO 2010: 5-6). The ICAO does not specify more precisely what is meant by “internationally understood accent” and does not name any recommended accents in particular, but this fairly vaguely defined rule applies to both native speakers and non-native speakers of English. From a functional perspective, such an accent or dialect is a register feature of plain Aviation English and necessary for efficient and effective communication in air traffic control contexts.

4 Conclusion

The above sections have shown that Aviation English is not monolithic and that there are not one but two varieties referred to as Aviation English, namely standardised phraseology and plain Aviation English. Both varieties occur in precisely defined and complementary situations in pilot-controller communication: standardised phraseology covers most routine situations, whereas plain Aviation English is only permitted in non-routine situations. Both varieties share many of the situational characteristics Biber and Conrad (2009: 39) consider “relevant for describing and comparing registers”. They are employed by the same participants, i.e. pilots and air traffic controllers, with identical relations between the participants, use the same channel, face the same production circumstances and share the same setting. The main differences with regard to the situational characteristics can be found in the communicative purposes and the topics covered. While both varieties share their general purpose, namely to facilitate efficient and effective air traffic control communication, standardised phraseology is restricted to a limited set of frequently used communicative purposes in routine situations, whereas plain Aviation English covers a whole range of less frequently used and non-routine communicative purposes such as emergencies. A similar pattern can be identified concerning the topics covered by these two varieties: while standardised phraseology covers a restricted but very frequently used set of topics in routine air traffic control communication, plain Aviation English covers a much broader range of air traffic related topics in non-routine situations. Resulting from the partially different situational contexts, both varieties of Aviation English are characterised by pervasive linguistic features that fulfil specific functions in each of the situations.
Standardised phraseology is characterised by a very precisely defined reduced set of words and phrases, each with a




single prescribed meaning, a grammar marked by ellipsis, short utterances and a frequent use of imperatives, and a prescribed pronunciation of numbers and letters as well as recommendations concerning the speech rate. Reflecting the wider range of communicative purposes and topics covered by plain Aviation English, the lexical, grammatical and pronunciation characteristics are less precisely specified than for standardised phraseology. There are, however, characteristics at all linguistic levels that distinguish plain Aviation English from conversations in plain English, such as a reduced lexicon resulting from the restriction of plain Aviation English to the topics related to aeronautical radiotelephony, a grammar determined by the fundamental need for concision and non-ambiguity, and ICAO recommendations concerning the speech rate and the intelligibility of accents and dialects. In conclusion, considering situational, linguistic and functional characteristics, the analysis presented in this paper shows that both varieties of Aviation English used in pilot-controller communication can be categorised as specialised registers in the sense of Biber and Conrad (2009: 10; 32–33). They are both fundamentally different from the very general register of conversation, and they are distinct because they differ in their degree of specificity. Compared to plain Aviation English, the situational, linguistic and functional characteristics of standardised phraseology can be much more precisely specified. Standardised phraseology thus represents one extreme of a continuum of specificity of registers, while conversations would be at the other end. Plain Aviation English could be placed somewhere in between, although certainly in the range of specialised registers and closer to aviation phraseology than to everyday conversations. 
In air traffic control communication, routine and non-routine situations alternate constantly, meaning that changes in communicative purpose and the switching between the two specialised registers described in this article are the rule rather than the exception in the work-life of pilots and air traffic controllers (cf. Biber and Conrad 2009: 45). The two specialised registers, standardised phraseology and plain Aviation English, however, are the only choices permitted in English-language air traffic control situations; plain English – as used in everyday face-to-face or mediated conversations – is not an option and explicitly discouraged by the ICAO. Both native speakers and non-native speakers of English have to learn these two specialised registers with explicit instruction, as neither of these specialised registers is among the many registers native speakers acquire “automatically” without any extra effort. The need for situation-specific register selection in air traffic control communication provides yet another example of the fact that the use of the appropriate register in a given situation is the prerequisite for successful communication.


References

Biber, Douglas. 1995. Dimensions of register variation: A cross-linguistic comparison. Cambridge: Cambridge University Press.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge University Press.
Bieswanger, Markus. 2013. Applied linguistics and air traffic control: Focus on language awareness and intercultural communication. In Silvia Hansen-Schirra & Karin Maksymski (eds.), Aviation communication: Between theory and practice, 15–30. Frankfurt am Main: Peter Lang.
Convention on International Civil Aviation. 1944. Convention on international civil aviation done at the 7th day of December 1944. Original version available at http://www.icao.int/publications/Documents/7300_orig.pdf (accessed 31 January 2014).
Crystal, David. 2003. English as a global language. 2nd edn. Cambridge: Cambridge University Press.
Cushing, Steven. 1994. Fatal words: Communication clashes and aircraft crashes. Chicago: The University of Chicago Press.
Emery, Henry & Andy Roberts. 2008. Aviation English: For ICAO compliance. Oxford: Macmillan.
Hansen-Schirra, Silvia & Karin Maksymski (eds.). 2013. Aviation communication: Between theory and practice. Frankfurt am Main: Peter Lang.
ICAO (International Civil Aviation Organisation). 2001. Annex 10: Aeronautical telecommunications. Volume II. 6th edn.
ICAO (International Civil Aviation Organisation). 2007a. Manual of radiotelephony. 4th edn. ICAO Document 9432-AN/925.
ICAO (International Civil Aviation Organisation). 2007b. Procedures for air navigation services: Air traffic management. 15th edn. ICAO Document 4444-ATM/501.
ICAO (International Civil Aviation Organisation). 2010. Manual on the implementation of ICAO language proficiency requirements. 2nd edn. ICAO Document 9835-AN/453.
Intemann, Frauke. 2008. ‘Taipei ground, confirm your last transmission was in English … ?’ – An analysis of Aviation English as a world language. In Claus Gnutzmann & Frauke Intemann (eds.), The globalisation of English and the English language classroom, 76–93. 2nd edn. Tübingen: Narr.
Jenkins, Jennifer. 2008. Teaching pronunciation for English as a Lingua Franca: A sociopolitical perspective. In Claus Gnutzmann & Frauke Intemann (eds.), The globalisation of English and the English language classroom, 145–158. 2nd edn. Tübingen: Narr.
Jones, R. Kent. 2003. Miscommunication between pilots and air traffic control. Language Problems and Language Planning 27(3). 233–248.
Kostecka, Robert. 2007. Aviate—Navigate—Communicate. Transport Canada: Aviation safety letter 2/2007, 12–14.
Live-atc.net. www.live-atc.net (accessed 19 February 2015).
Mathews, Elizabeth. 2004. New provisions for English language proficiency are expected to improve aviation safety. ICAO Journal 59(1). 4–6, 27.
Mell, Jeremy. 2004. Language training and testing in aviation need to focus on job-specific competencies. ICAO Journal 59(1). 12–14, 27.
Mitsutomi, Marjo & Kathleen O’Brien. 2004. Fundamental aviation language issues addressed by new proficiency requirements. ICAO Journal 59(1). 7–9, 26–27.




Moder, Carol Lynn. 2013. Aviation English. In Brian Paltridge & Sue Starfield (eds.), The handbook of English for specific purposes, 227–242. Malden: John Wiley & Sons.
OALDO (Oxford advanced learner’s dictionary online). 2014. http://oald8.oxfordlearnersdictionaries.com/ (accessed 31 January 2014).
Prinzo, Veronika O., Alan Campbell, Alfred M. Hendrix & Ruby Hendrix. 2010. U.S. airline transport pilot international flight language experiences. Report 5: Language experiences in native English-speaking airspace/airports. Technical report DOT/FAA/AM-10/18. Washington, DC: Federal Aviation Administration, Office of Aerospace Medicine.
Silberstein, Dagmar & Rainer Dietrich. 2003. Cockpit communication under high cognitive workload. In Rainer Dietrich (ed.), Communication in high risk environments (Special issue 12 of Linguistische Berichte), 9–56. Hamburg: Buske.

Rolf Kreyer

‘Now niggas talk a lotta Bad Boy shit’: The register hip-hop from a corpus-linguistic perspective

Abstract: The present paper aims to provide a first corpus-based analysis of one of the most successful kinds of popular music, namely hip-hop. In particular, the paper explores to what extent hip-hop can be regarded as a register in its own right, analysing data drawn from a 200,000-word corpus of the most successful hip-hop albums in 2003 and 2011. Taking Biber and Conrad’s (2009) register-defining triad of situation of use, linguistic features, and associated functions as a descriptive framework, it is argued that hip-hop can indeed be granted the status of a register in its own right.

1 Introduction

In Western societies, pop songs are an integral part of everyday life: we are surrounded by pop songs in the supermarket, in the elevator or when driving a car. Moreover, listening to pop songs is one of the (if not the) most popular pastimes among adolescents in America or Western Europe (cf., for instance, Schwartz and Fouts 2003). Given the pervasiveness of pop songs, it is surprising that the scientific study of this register does not figure very prominently in linguistics, although pop songs have been given a considerable amount of attention in fields like cultural studies. In this respect, it is telling that none of the major corpora of the English language provide any lyrics of pop songs. The linguistic analysis of this register is still in its infancy and corpus-linguistic studies are few and far between. An early corpus-based analysis of pop songs is Murphey (1989; cf. also 1990 and 1992). He provides both quantitative as well as qualitative data from a 13,000-word corpus of pop-song lyrics. His main focus, however, does not lie in the description of a register but in the exploitation of pop songs for the learning and teaching of English as a foreign language. A much more ambitious project is the BLUR (Blues Lyrics collected at the University of Regensburg) corpus, which contains 7,341 song

Rolf Kreyer, University of Marburg


texts comprising roughly 1.5 million words (Miethaner 2001, 2005; Schneider and Miethaner 2006). However, this corpus consisting of recordings from the 1920s to the 1940s was compiled as evidence for earlier African American Vernacular English and, accordingly, is only of limited value for the study of pop songs as an important present-day register. More detailed analyses of modern pop songs can be found in Kreyer and Mukherjee (2007) and Kreyer (2012). The former provide a first attempt at describing the major linguistic properties of the register at issue, such as deviant spellings (also cf. Mukherjee 2000) and lexical/lexico-grammatical aspects. One focus of their research is on the degree to which pop songs can be considered a written or spoken register. The data show that the register is more spoken-like in general, as is shown in similarities in average word length or the high frequency of the personal pronouns you and I. Interestingly, other features that are typical of spoken language, such as the frequent use of you know as a discourse marker, were shown not to be that important in pop songs. Kreyer (2012) explores the use of love-related metaphors in pop songs within the framework of conceptual metaphor theory (e.g. Lakoff and Johnson 1980; Kövecses 2002). He finds that, despite the (perhaps) popular assumption that pop songs are clichéd, metaphors in pop songs are quite varied and creative. The most recent register-related study of pop songs is Werner (2012). Since he is interested in small-scale diachronic as well as varietal aspects of pop songs, his corpus consists of two subcorpora, one with British lyrics and the other with American lyrics. The 1,128 songs included in the corpus span the years 1952–2008 and 1946–2005, totalling 171,968 and 170,234 words, respectively (Werner 2012: 23). Werner’s findings also confirm earlier claims about the informal and conversational nature of pop-song lyrics.
However, he argues convincingly that subsuming pop song lyrics under the conversational register would go too far. Rather, the low frequencies of typical spoken features such as interjections or non-standard morphosyntactic elements call for a more careful analysis: “the picture of pop-song lyrics as exemplars of spoken/informal register […] had to be […] altered to be thought of as a ‘special’ register” (Werner 2012: 43). The present paper wants to further contribute to our understanding of pop song lyrics from a register perspective by exploring hip-hop as a potential sub-register. A question that comes to mind is whether pop songs can be regarded as one single monolithic register or whether it makes sense to assume more specific registers covered by the umbrella term ‘pop songs’. Biber and Conrad (2009: 10) claim that “[t]here is no one correct level on which to identify a register” and “that registers can be studied on many different levels of specificity”. The present paper aims at providing a first corpus-based analysis of one of the most successful (musical) genres among pop songs, namely hip-hop. The label ‘genre’ is also to be understood in its linguistic sense at this point, since




we cannot yet be sure that hip-hop constitutes a register. Based on data from an updated pilot version of the Giessen-Bonn corpus of Popular music  – GBoP (cf. Kreyer and Mukherjee 2007), the paper explores Biber and Conrad’s (2009: 50) three criteria for register analysis (situational characteristics, linguistic characteristics and function; cf. Schubert, this volume) and shows that with regard to all of these, hip-hop must be regarded as a register in its own right.

2 The data

The data for the present study is taken from an extended pilot version of GBoP. It contains lyrics from the top albums from the US album charts of the years 2003 and 2011¹. More specifically, for 2003, 48 of the top 52 albums were included. Four albums had to be ignored because they either did not contain any lyrics at all or only contained non-English lyrics. The 2003 lyrics were taken from internet lyric archives or from CD booklets (cf. Kreyer and Mukherjee 2007 for details). The 2003 material has been supplemented by the (English) lyrics of the top 50 albums from 2011. These lyrics were primarily taken from A-Z lyrics (www.azlyrics.com). This site is particularly suitable, since the lyrics it provides are usually reviewed by a number of different users, resulting in a fairly ‘reliable’ version of the texts. In some cases, other archives like metrolyrics (www.metrolyrics.com) or lyricsfreak (www.lyricsfreak.com) had to be consulted. From this compilation of albums, a subcorpus was compiled of albums that would usually be considered representative of hip-hop. Of course, the decision whether to include an album or not is not an easy one. The criterion applied was whether the featured artist was primarily considered a rapper/hip-hopper (information taken from www.discogs.com). Nelly, for instance, is primarily regarded as a rapper, which is why his album Nellyville was included in the corpus, even though it contains tracks that might rather be considered R&B. Stripped by Christina Aguilera, by contrast, was not included, since the performer is not primarily regarded as a rapper or hip-hopper, although some of the songs in her album would fall under that category. Compilation albums were excluded if they featured more than one artist. All in all, the hip-hop corpus contains the lyrics from 18 albums; 9 from 2003 and 9 from 2011. Table 1 shows the composition of the corpus.

1 My first explorations of the development of pop music registers started in 2012 when the data from 2011 was the most recent data available.


Table 1: The corpus analysed in the present study.

Album                                          # words
2Pac – Better Dayz                              20,349
50 Cent – Get Rich or Die Tryin’                13,711
Chingy – Jackpot                                10,475
Eminem – The Eminem Show                        13,049
Ja Rule – The Last Temptation                    8,425
Missy Elliot – Under Construction                7,360
Nelly – Nellyville                              13,424
Outkast – Speakerboxx/The Love Below            15,043
Sean Paul – Dutty Rock                          10,163
Total 2003                                     111,999

Bad Meets Evil – Hell_The Sequel                 9,246
Eminem – Recovery                               15,694
Jay Z & Kanye West – Watch the Throne            7,529
Kanye West – My Beautiful Dark …                 8,407
Lil’ Wayne – I am not a Human Being              7,218
Lil’ Wayne – Tha Carter IV                      11,520
Nicki Minaj – Pink Friday                        9,492
The Black Eyed Peas – The Beginning              7,750
Wiz Khalifa – Rolling Papers                     7,564
Total 2011                                      84,420

Total 2003 + 2011                              196,419
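The composition of the hip-hop corpus can be checked by summing the per-album word counts. The short Python sketch below does this with the figures as listed in Table 1; the variable names are chosen for this illustration only.

```python
# Per-album word counts as listed in Table 1 (album labels as in the table).
WORDS_2003 = {
    "2Pac – Better Dayz": 20349,
    "50 Cent – Get Rich or Die Tryin’": 13711,
    "Chingy – Jackpot": 10475,
    "Eminem – The Eminem Show": 13049,
    "Ja Rule – The Last Temptation": 8425,
    "Missy Elliot – Under Construction": 7360,
    "Nelly – Nellyville": 13424,
    "Outkast – Speakerboxx/The Love Below": 15043,
    "Sean Paul – Dutty Rock": 10163,
}
WORDS_2011 = {
    "Bad Meets Evil – Hell_The Sequel": 9246,
    "Eminem – Recovery": 15694,
    "Jay Z & Kanye West – Watch the Throne": 7529,
    "Kanye West – My Beautiful Dark …": 8407,
    "Lil’ Wayne – I am not a Human Being": 7218,
    "Lil’ Wayne – Tha Carter IV": 11520,
    "Nicki Minaj – Pink Friday": 9492,
    "The Black Eyed Peas – The Beginning": 7750,
    "Wiz Khalifa – Rolling Papers": 7564,
}

total_2003 = sum(WORDS_2003.values())
total_2011 = sum(WORDS_2011.values())
grand_total = total_2003 + total_2011

print(f"2003: {len(WORDS_2003)} albums, {total_2003:,} words")
print(f"2011: {len(WORDS_2011)} albums, {total_2011:,} words")
print(f"Both years: {grand_total:,} words")
```

The two subcorpora contain the same number of albums but differ in size, so any comparison between the years should rely on relative rather than raw frequencies.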

Since “[t]he analysis of register characteristics […] will generally focus on the comparison of two or more registers” (Biber and Conrad 2009: 36), the hip-hop data will be contrasted with the data from the remaining albums, in the following referred to as ‘non-hip-hop corpus’ or ‘control corpus’ (cf. Appendix 1 for its composition). Although the number of albums in this control corpus is almost four times as large, the number of words is comparatively small, namely slightly below 350,000. In all the texts, the original punctuation and spelling deviations were retained. This is particularly important for hip-hop, as spelling conventions are an important means of creating identity (cf. Morgan 2001, 2002 and Olivio 2001). Metatextual comments like verse, chorus or bridge or the identity of the singer in duets, for example, were removed from the text. Choruses were spelt out any time they appeared in the text, i.e. a comment like Chorus [2x] was replaced by a repetition of the lines of the chorus. In those cases where it was not clear from the text layout which words are still part of the chorus and which are part of the verse, an




audio version of the song was consulted. Other kinds of repetition were spelt out if they contained words, e.g. a line like She (When she loves) [3x] was represented three times in the corpus (without the [3x], of course). However, if repetitions consisted of non-lexical material only, they were not made explicit, e.g. Oooooh oooh ooohohhh [x2]. All texts were stored in .txt format. An example of a text is given in (1) below (note that the tag derived from German Zeilenumbruch stands for a line break).

(1) G-Unit (What) We in here (What) We can get the drama popping We don’t care (What, what, what) It’s going down (What) ’Cause I’m around (What) 50 Cent, you know how I gets down (Down) What up, Blood? (What) What up, Cuz? (What) What up, Blood? (What) What up, Gangstaaa? What up, Blood? (What) What up, Cuz? (What) What up, Blood? (What) What up, Gangstaaa? (50 Cent – What Up Gangsta?)
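The treatment of repetition markers described above can be illustrated with a short Python sketch. It is not the procedure actually used for the corpus, which is not documented in that detail: the function names, the marker formats it recognises ([3x] and [x2]) and the letter-based test for non-lexical material are all assumptions made for illustration, as is the choice to leave non-lexical lines untouched, marker included.

```python
import re

# Repetition markers such as "[3x]" or "[x2]" at the end of a lyric line.
REPEAT_MARKER = re.compile(r"\s*\[(?:(\d+)x|x(\d+))\]\s*$", re.IGNORECASE)

# Hypothetical heuristic: tokens built only from these letters ("oooh", "ah",
# "mmm") count as non-lexical vocalisations. This is an assumption.
NON_LEXICAL = re.compile(r"^[oahum]+$", re.IGNORECASE)


def has_lexical_material(text: str) -> bool:
    """Return True if at least one token looks like an actual word."""
    tokens = re.findall(r"[A-Za-z']+", text)
    return any(not NON_LEXICAL.match(token) for token in tokens)


def expand_repetition(line: str) -> list[str]:
    """Spell out a marked repetition if the line contains words."""
    match = REPEAT_MARKER.search(line)
    if match is None:
        return [line]                      # no marker: keep the line as it is
    body = line[: match.start()]
    if not has_lexical_material(body):
        return [line]                      # non-lexical only: not made explicit
    count = int(match.group(1) or match.group(2))
    return [body] * count                  # repeat the line, marker removed


if __name__ == "__main__":
    for lyric in ("She (When she loves) [3x]", "Oooooh oooh ooohohhh [x2]"):
        print(expand_repetition(lyric))
```

A heuristic of this kind will misjudge short words such as a or oh, so its output would still need to be checked by hand.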

All analyses of the corpus material were conducted by using AntConc 3.2.4 (Anthony 2011) and Wmatrix (Rayson 2003, 2009).

3 Hip-hop – a register in its own right?

Following the definition of ‘register’ provided in Biber and Conrad (2009; cf. Schubert, this volume), hip-hop can be regarded as a register in its own right if we can specify a particular situation of use, a particular set of linguistic features and a particular function of these features vis-à-vis the situation of use. This section will discuss the first two of these three aspects. By way of conclusion, possible functions will be explored.

3.1 Situation of use

In many respects, hip-hop and pop songs in general share situational features. For instance, in both cases the channel is identical: the primary mode is (sung) speech and the speech event is captured on a permanent medium (apart from a live concert, of course). Similarly, the settings are identical, e.g. different times and places of communication for the participants. Features of addresser and addressee can be regarded as similar as well, at least on a general level. Production circumstances might be described as ‘revised and edited’ in both cases, although spontaneous rapping plays an extremely important role in hip-hop culture (e.g. during battlin’ or cypha, i.e. rap competitions).


Alongside these similarities, two aspects are worth considering by which hip-hop and other pop songs diverge, namely topic and relations among participants. To explore topic-related differences, the corpus-analysis tool Wmatrix (Rayson 2003, 2009) was used. Wmatrix provides web access to the UCREL Semantic Analysis System (USAS), which automatically assigns semantic categories to all of the lexical items in a given corpus. On the whole, the semantic tagger employs 21 broad semantic categories, which are shown in Figure 1.

A  general and abstract terms
B  the body and the individual
C  arts and crafts
E  emotion
F  food and farming
G  government and public
H  architecture, housing and the home
I  money and commerce in industry
K  entertainment, sports and games
L  life and living things
M  movement, location, travel and transport
N  numbers and measurement
O  substances, materials, objects and equipment
P  education
Q  language and communication
S  social actions, states and processes
T  Time
W  world and environment
X  psychological actions, states and processes
Y  science and technology
Z  names and grammar

Figure 1: The semantic categories of USAS (Archer et al. 2002: 2).

On the highest level of specificity a total of 232 category labels is provided. The category E ‘Emotion’, for instance, contains six subcategories, one of these being subdivided into two further sub-classes. Figure 2 shows the structure of the category ‘Emotion’:



Category     Subcategory I                   Subcategory II          Example
E: Emotion   E1: General                                             emotion, hysterical
             E2: Liking                                              adore, beloved
             E3: Calm/Violent/Angry                                  gentle, infuriated
             E4: Happy/Sad                   E4.1: …: Happy          amused, cheerful
                                             E4.2: …: Contentment    dismay, humour
             E5: Fear/Bravery/Shock                                  amazed, dread
             E6: Worry/Concern/Confident                             anxious, edgy

Figure 2: The semantic category ‘Emotion’ in USAS (Archer et al. 2002: 10–11).

An example of the semantic tagging can be seen in (2), which shows a few words from Tupac Shakur’s Still Ballin’.

(2) 0000002   510   VV0     Blame   Q2.2/G2.2- G2.1
    0000002   520   PPH1    it      Z8
    0000002   530   II      on      Z5
    0000002   540   APPGE   my      Z8
    0000002   550   NN1     mama    S4f

The verb blame is tagged as a ‘speech act term’ (Q2.2) and, alternatively, as either ‘general ethics’ (G2.2) or ‘Crime, law and order: Law & order’ (G2.1). The minus sign following G2.2 indicates the lack of ethics. Note that the tags are not given in alphanumerical order; their sequence depends on the likelihood that USAS assigns to each tag. The following three words, it, on, and my are either tagged as ‘pronoun’ (Z8) or ‘grammatical bin’ (Z5). The tag ‘S4f’ for mama tells us that we are dealing with a kinship term, more specifically, female kin. Like all automatic annotation, semantic annotation is not fully accurate. In particular, hip-hop, with its idiosyncratic spelling and use of words, can lead to problems. For instance, the frequencies of individual semantic categories showed ‘Food and Farming’ (category F) to be a topic of particular relevance for rappers and hip-hoppers – a somewhat counter-intuitive finding. A closer look at the data quickly revealed that this was due to the ambiguity of the string hoe, namely as a farming tool and in the slang use of the term in the sense of ‘promiscuous woman’. Another problem became apparent with the tag G1.2, ‘Politics’: the Patois personal pronoun form dem, which is highly frequent in the lyrics by Sean Paul, was obviously understood as an abbreviation for democrat or related words. Similarly, the form dat (that), presumably misinterpreted as the acronym for digital audio


tape, led to a very high frequency of the semantic category K3, ‘Recorded Sound’, which as a consequence has also been ignored. Such problematic cases aside, semantic annotation can give us an idea about topics that are comparatively frequent or rare in hip-hop as opposed to other pop songs. To this end, all semantic categories that showed relative frequencies higher than 0.02 % in the hip-hop corpus were checked against the respective categories in the control corpus, i.e. the non-hip-hop corpus. Table 2 provides an overview of some semantic categories that seem especially suited to paint a particular picture of the artists.

Table 2: A sample of semantic categories that are particularly frequent in the hip-hop corpus.

Semantic category                          Rel. freq. in     Rel. freq. in         r1/r2
                                           hip-hop (r1)      non-hip-hop (r2)
F3, ‘Cigarettes and Drugs’                 0.1 %             0.02 %                5
G2.1, ‘Crime, Law and Order’               0.16 %            0.05 %                3.2
G3, ‘Warfare, Defence, Weapons, Army’      0.34 %            0.12 %                2.83
I1, ‘Money: Generally’                     0.27 %            0.07 %                3.86
I1.1+, ‘Money: Affluence’                  0.03 %            0.01 %                3
I2.1, ‘Business: Generally’                0.04 %            0.01 %                4
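The r1/r2 column of Table 2 is a ratio of two relative frequencies. The following Python sketch shows the arithmetic for category F3, using the token counts reported for that category (193 in hip-hop, 50 in the 293,410 words of the control corpus). The function names are hypothetical, and taking the Table 1 total of 198,387 words as the size of the hip-hop corpus is an assumption; the sketch also assumes that the ratio is formed from the percentages after rounding to two decimals, which is what the printed values suggest.

```python
def relative_frequency(count: int, corpus_size: int) -> float:
    """Relative frequency in per cent, rounded to two decimals."""
    return round(100 * count / corpus_size, 2)


def frequency_ratio(r1: float, r2: float) -> float:
    """Ratio of two (already rounded) relative frequencies."""
    return round(r1 / r2, 2)


HIP_HOP_SIZE = 198_387   # assumption: total of Table 1
CONTROL_SIZE = 293_410   # word count given for the control corpus

r1 = relative_frequency(193, HIP_HOP_SIZE)   # F3 tokens in hip-hop
r2 = relative_frequency(50, CONTROL_SIZE)    # F3 tokens in other pop songs
print(r1, r2, frequency_ratio(r1, r2))
```

Because the percentages are small, rounding before division has a noticeable effect on the ratio, so the figures are best read as orders of magnitude rather than precise multipliers.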

An example of a semantic category that is overrepresented in hip-hop is F3, ‘Cigarettes and Drugs’. While the hip-hop corpus contains 193 (0.1 %) tokens that are assigned to that category, other pop songs only show 50 cases in 293,410 words (0.02 %); i.e. in hip-hop there are five times as many words relating to cigarettes and drugs as in other pop songs. An arguably related category is G2.1, ‘Crime, Law and Order’, whose relative frequency in the hip-hop corpus is 3.2 times that of the control corpus, namely 0.16 % as opposed to 0.05 %. Another comparatively frequent hip-hop category is G3, ‘Warfare, Defence, Weapons, Army’, which is over 2.8 times more frequent in hip-hop than in other pop songs, namely 0.34 % as opposed to 0.12 %. In addition to topics related to crime, drugs and weapons, questions of wealth and money seem to play an important role in hip-hop: the categories ‘Money: Generally’ (I1), ‘Money: Affluence’ (I1.1+) and ‘Business: Generally’ (I2.1) are all attested at least three times more frequently here than in the control corpus. It can be argued that the overrepresentation of the above categories serves to paint a particular picture of the hip-hop artist as an independent, successful and rich person who is involved in (gun) fights and crime. This image that emerges




from the semantic categories is in line with analyses of rap and hip-hop videos. Jones (1997: 353), for instance, claims that rap music shows a high amount of “socially questionable behaviors [… like] guntalk, drugtalk, the presence of alcohol, bleeping of profanity, and gambling” (cf. also DuRant et al. 1997; Smith and Boysen 2002; Kreyer 2015). On the whole, it could be argued that the topics explored in hip-hop promote a ‘bad boy’ image of the artist. In addition to topic-related contrasts between pop songs and hip-hop, another major difference seems to lie in the relations among the participants, which, in turn, has a bearing on the communicative purpose of hip-hop as opposed to other pop songs. In Biber and Conrad’s (2009) approach, relations among participants are described along four dimensions, namely interactiveness, social roles, personal relationship, and shared knowledge. With regard to this variable, hip-hop seems to have a special status. Spady et al. (1999: 67) provide the following quote from the rapper Method Man: “The streets is where you get you stripes at”. This hints at the important role of street credibility, i.e. a hip-hopper’s being close to his or her cultural backgrounds in ‘the streets’. Alim (2006: 113) writes: “Hip-hop Culture not only began in the streets of Black America, but the streets continue to be a driving force in contemporary Hip-hop Culture.” Although successful hip-hop artists, like any other kind of successful pop singer, mostly interact with a displaced audience, “[t]he members of the Black American Street Culture, to whom the artists are directing their lyrics, are not physically present, yet they are in conversation” (Alim 2006: 123). This hints at a relatively high level of (maybe abstract) interactiveness that might not be typical of other pop songs.
Similarly, the artists’ focus on street identity and group solidarity seems to have important consequences for the other three dimensions of participant relations: artists assume a relation with the members of their audience that can be characterised by relative similarity of status, a huge amount of shared knowledge (which has been gained on the streets) and a personal relationship that would be described as that of friends or brothas and sistas, rather than that of star and fan as in many other pop music genres. This special relation of artist and audience leads to an additional communicative purpose, namely that of “staying street”, i.e. of staying connected to the streets and to their cultural background. Hip-hoppers use their art to “represent ‘the streets’” but at the same time “to connect with the streets as a space of culture, creativity, cognition, and consciousness” (Alim 2006: 124). A particularly impressive example of this is provided by Ja Rule’s Connected from the album The Last Temptation.

(3) We world wide connected, and ya’ll don’t want to fuck with us
    In the streets we respected, so ya’ll don’t want to fuck wit us
    World wide connected nigga, ya’ll don’t want to fuck wit us
    We gangster ass niggas and we hard to hit
    Murder Inc in the role who could fuck wit this

On the whole, then, the situational characteristics of hip-hop and other pop songs warrant the status of hip-hop as a register in its own right.

3.2 Linguistic features

This section discusses orthographical, lexical and grammatical phenomena as possible register features/markers.

3.2.1 Orthographic features – -er/-a and -s/-z

Non-standard spelling is a common feature in written hip-hop culture, which according to Beers Fägersten (2008: 227) “permeate[s] nearly all word types”. Of the 10 most frequent words in her corpus, all of them grammatical, of course, seven have non-standard alternatives, including the pairs the/da, you/u and that/dat. In addition, she finds final orthographic -a as a substitution for both morphemic and non-morphemic -er, as in rappa, younga and holla, neva, respectively. Whereas this example of idiosyncratic spelling represents non-standard phonology, the frequently occurring word-final -z is usually used as a spelling variant that represents standard phonology more precisely than the standard spelling -s. In his study on spelling conventions in rap music, Olivio (2001: 73) distinguishes between two types of non-standard orthography, namely spelling variants that represent “distinctive features of AAVE [African American Vernacular English] phonology and syntax” and those that do not. He argues “that the meaning of the non-standard orthographic choices depends on its contrast with standard forms” (Olivio 2001: 73). This hints at a conscious decision on the part of the writer to use non-standard orthography. After all, writers seem to be aware of their deviation from the standard, as Olivio argues convincingly. In his corpus, an instance like fo’ shows the awareness of the final consonant that we find in the standard variant for. Similarly, the fact that bombers occurs as bombas in his data shows an awareness of the standard silent b in the middle of this word. In line with what was discussed above, Olivio (2001: 72) interprets these choices as:




another way of addressing the particular audience […]. In other words, rap artists construct themselves as ‘authentic’ through the use of language […,] through the use of locally significant images, sounds, and written texts.

He, too, reports on the ‘r-lessness’ of AAVE, as in the two examples above or in cases like gangsta, rida, murda etc. In some cases, stressing the AAVE-pronunciation leads to a decisive shift in meaning, as the late Tupac Shakur points out regarding nigga: “Niggers was the ones on the rope, hanging off the thing; Niggas is the ones with gold ropes, hanging out at clubs” (Lazin 2003). In the following we will take a look at two idiosyncratic spelling features, namely orthographic -a instead of -er and word-final -z as a plural marker. Table 3 shows the frequency of these two non-standard spelling variants in the hip-hop corpus and the non-hip-hop control corpus.2

Table 3: ‘r-less’ forms in the hip-hop corpus and the non-hip-hop control corpus.

Token          Hip-hop -a   Hip-hop -er   Non-hip-hop -a
anotha             11           83             0
balla               0           10             1
betta              12           43             0
bigga               1           16             0
brotha              1           16             3
Crossova            1            0             0
deala               1            6             0
docka               1            0             0
Exploda             1            0             0
figga               1           19             0
fucka               1            2             0
gangsta            41            6             4
Gangstaa            3            0             0
Harda               1           12             0
hotta               1           13             0
killa               2           15             0
lova                0            3            12
mobsta              1            1             0
motha               1           33            16
mothafucka          7           51             0
Motherfucka         1            7             0
muhfucka            1            0             0
Murda               4          112             0
mutha               1            0             0
Muthafucka          1            0             0
muthafucka          7            0             0
muthafuka           1            0             0
neitha              1            7             0
neva               13          463             0
nigga             613            0            12
Numba               1           49             0
otha                2          101             0
Ova                 5          188             0
playa              18           26             3
Rida                3            4             0
rocka               2            2             0
stoppa              1            0             0
stunna              1            1             0
sucka               1            6             0
Sucka               2            0             0
supa                1           26             0
swagga              2            6             0
trigga              3            6             0
wanksta             8            0             0
whateva             4           37             0

2 The frequencies shown here are not entirely unproblematic because the texts were primarily taken from lyrics archives (i.e. are most likely transcribed by fans) and not from official booklets. To some extent, then, the numbers represent the audience rather than the artists themselves. However, they still provide us with an idea of the use of non-standard spelling within the hiphop community, of which the artists want and claim to be a part.

Table 3 provides the frequencies of ‘r-less’ forms in hip-hop and non-hip-hop songs (columns ‘Hip-hop –a’ and ‘Non-hip-hop –a’, respectively). In addition, it gives the frequencies with which regularly spelt forms occur in the hip-hop texts (‘Hip-hop –er’). In the corpus we find a total of 45 ‘r-less’ types. 43 of these are attested in the hip-hop corpus. The control corpus, by contrast, only shows seven types of this particular kind of idiosyncratic spelling. With regard to type frequency, we see clearly that this spelling phenomenon is a feature highly typical of hip-hop. It is not surprising that this huge difference in type frequency results in a huge difference in token frequency, namely 785 in hip-hop texts as opposed to 51 in non-hip-hop texts. It is interesting to note, though, that the number of regularly spelt forms is usually higher than that of non-standard forms, even in hip-hop (see below for an explanation), notable exceptions being nigga, gangsta, muthafucka and wanksta, whose spelling is predominantly non-standard. Still, ‘r-less’ forms are a pervasive feature in hip-hop, much more so than in other pop songs: although the number of 51 tokens is fairly substantial, 28 of these occur in merely two pop songs, namely the 12 instances of lova and the 16 instances of motha. The former are all found in the song Eenie Meenie by Sean Kingston featuring Justin Bieber and all instances of motha occur in Girls by Beyoncé. Interestingly, in both cases these unconventional forms are part of the chorus in otherwise rather conventionally spelt songs.



(4) Shawty is a eenie meenie miney mo lova (Eenie Meenie)
(5) Who run this motha? Girls! (Girls!)

Table 4: Word-final orthographic -z as plural marker in hip-hop and non-hip-hop pop songs.

Token        Hip-hop -z   Hip-hop -s   Non-hip-hop -z
Boyz             21           41             0
Dredz             1            0             0
gangstaz          0            8             1
Gunnerz           1            0             0
Gunz              2            0             0
Hoez              1          148             0
Killaz            1            6             0
Niggaz          178          380             0
Outlawz           6            0             0
Ridaz             8            5             0

Word-final orthographic -z is considerably less frequent both as far as types and tokens are concerned. In the data we find ten different types all in all (‘hypercorrect’ tokens like beatz or nutz, in which the voiced sibilant is not the correct plural allophone, were excluded), nine of which are only attested in the hip-hop corpus, totalling 219 tokens. The single type that occurs in the control corpus is gangstaz with the frequency of 1. Interestingly, this one occurrence appears in the song That’s how you like it by Beyoncé, featuring the rapper Jay-Z, who uses this form in the line shown below: (6) I know you’ve heard I’m a gangsta They say “Stay away from them gangstaz” They never change up, or pull they pants up (Beyoncé: That’s how you like it)

A comparison of the use of non-standard and standard variants (both in the case of ‘r-less’ forms and orthographic -z) quickly reveals that in most cases the standard still is the preferred version of spelling even in the hip-hop corpus. This finding hints at a twofold function of spelling in hip-hop lyrics, as Olivio (2001: 72) points out: […] the use of non-standard orthographic choices may be another way of addressing the particular audience, while these forms appear alongside standard orthographic forms


which are available to be consumed by a more general audience. In other words, rap artists construct themselves as ‘authentic’ through the use of language and accounts of the social and economic realities in late-capitalist society, and the effects of this reality on the lives of rap artists and their communities; but they also construct an ‘authentic’ audience through the use of locally significant images, sounds, and written texts.3

The only consistent use of non-standard spelling in the present corpus is shown in the texts by the Jamaican rapper Sean Paul. His texts seem to be primarily addressed at a specific audience consisting of speakers of Patois. Consider the example below: (7) So how can they waan big up dem chest But they dun know Dutty Cup we deh ya rated as di best A wouldn’t they love diss this is Sean-A-Paul this We nuh cater fi nuh guy and only girls we a request (Sean Paul: Like Glue)

The generally mixed occurrence of standard and non-standard spelling in hip-hop notwithstanding, the data presented above show that word-final -a instead of -er as well as -z as a plural marker can be regarded as register features of written hip-hop lyrics (partly influenced, of course, by attempts to mirror pronunciation while recording or in the actual performance).

3 Of course, orthographic choices play a comparatively minor role since the main way of addressing the audience is through the auditory channel.

3.2.2 Lexical aspects

Other possible register features or even register markers can, of course, be found in the lexis of hip-hop, particularly taboo expressions. Beers Fägersten (2008) reports on the frequency of taboo terms as a feature of hip-hop. In her analysis of a 100,000-word corpus of postings on a hip-hop-message board she found that the frequency of “swear words, profanity or taboo terms” such as shit, fuck, ass, nigga and bitch “suggests that such linguistic behaviour is in fact characteristic of the hip-hop community” (Beers Fägersten 2008: 223–224). These taboo words “serve to discursively represent the hip-hop individual, and subsequently the community as well, by virtue of their recognisability as taboo words” (Beers Fägersten 2006: 29). With some uses of these taboo words we see what Morgan (2002: 121) refers to as inversion, where “an AAE [African American English] word means the opposite of at least one definition of the word in dominant culture”. The word shit, for instance, “can refer to almost anything – positions, events, etc.” (Smitherman 2000: 257). The shit is “a person who is the ultimate; most powerful; above all others; top dog” (Smitherman 2000: 257). Another example is the form nigga, where the idiosyncratic spelling signals a decisive shift in meaning, as discussed above. Table 5 shows the 30 words that are most key (according to AntConc) in the hip-hop corpus when compared to the non-hip-hop control corpus.

Table 5: The top 30 key word forms in hip-hop when compared to the non-hip-hop corpus.

Rank  Token     Freq.     Rel. freq.  Freq.          Rel. freq.     Keyness of token
                hip-hop   hip-hop     non-hip-hop    non-hip-hop    in hip-hop
 1    nigga       606       0.31          12            0.00          1006.88
 2    shit        626       0.32          43            0.01           874.36
 3    fuck        504       0.26          45            0.02           660.20
 4    niggas      380       0.19          10            0.00           615.11
 5    bitch       432       0.22          35            0.01           580.43
 6    dem         232       0.12           7            0.00           370.02
 7    ass         274       0.14          27            0.01           349.05
 8    wit         207       0.11           4            0.00           344.62
 9    ya          618       0.32         260            0.09           333.13
10    niggaz      177       0.09           0            0.00           325.09
11    Zoop        142       0.07           0            0.00           260.81
12    yo          253       0.13          52            0.02           239.10
13    hoes        148       0.08           5            0.00           232.88
14    gon’        163       0.08          13            0.00           219.86
15    bitches     150       0.08           9            0.00           215.50
16    fucking     161       0.08          14            0.00           212.40
17    gettin’     167       0.09          21            0.01           196.50
18    em          170       0.09          25            0.01           188.35
19    murder      112       0.06           2            0.00           187.61
20    they        925       0.47         705            0.24           187.39
21    get        1036       0.53         834            0.28           182.07
22    ai          867       0.44         655            0.22           179.48
23    di          124       0.06           8            0.00           175.54
24    y’all       183       0.09          39            0.01           169.49
25    yuh          90       0.05           0            0.00           165.30
26    u           145       0.07          20            0.01           164.82
27    pussy       109       0.06           5            0.00           164.25
28    them        431       0.22         238            0.08           163.16
29    money       287       0.15         115            0.04           163.03
30    up          643       0.33         454            0.15           155.53

The frequent use of taboo words and profanity that is reported in Beers Fägersten (2006 and 2008) can also be observed in the present corpus, the top five keywords being nigga, shit, fuck, niggas and bitch. Inflectionally related forms occur at rank 10 (niggaz), at rank 15 (bitches) and rank 16 (fucking). In addition, we see a strong preference for terms with strong sexual connotations, such as ass, hoes and pussy. Some of the above list might even be considered register markers. The forms niggaz, Zoop, and yuh do not occur at all in the control corpus. The form Zoop, however, cannot be regarded as indicative of the register, as it lacks the pervasiveness necessary for register features/markers: it only occurs in one song, CG by Nelly.
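The keyness scores in Table 5 were produced by AntConc; which of its keyness statistics was selected is not stated. As a minimal sketch of how such a score can be obtained, the following Python function computes the log-likelihood statistic that is widely used for keyword analysis. The choice of this statistic, the function name and the corpus sizes in the usage example (198,387 and 293,410 words) are assumptions, so the output is not expected to reproduce the values in Table 5 exactly.

```python
import math


def log_likelihood(freq_target: int, size_target: int,
                   freq_ref: int, size_ref: int) -> float:
    """Log-likelihood keyness of a word in a target vs. a reference corpus."""
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected frequencies if the word were spread evenly over both corpora.
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    score = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:               # a zero count contributes nothing
            score += observed * math.log(observed / expected)
    return 2 * score


if __name__ == "__main__":
    # Frequencies from Table 5; corpus sizes are assumptions (see lead-in).
    for token, hip_hop, control in (("nigga", 606, 12), ("money", 287, 115)):
        print(token, round(log_likelihood(hip_hop, 198_387, control, 293_410), 2))
```

A word with the same relative frequency in both corpora receives a score of zero, and the score grows as the two relative frequencies diverge.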

3.2.3 Grammatical features – copula absence

Anyone who has ever listened to hip-hop and has seen hip-hop videos is well aware of the fact that it is an art form which is dominated by African Americans, at least in the US. Are, then, the linguistic features of hip-hop merely a consequence of the AAVE dialect? If that was the case, one would be hard put to argue that these linguistic features fulfil a particular function in a particular situation. An answer to that question is provided by Alim (2009: 117–123) in an analysis of the absence of the present tense copular forms is and are. He compares the




frequencies of absence from the language of two hip-hoppers, Juvenile and Eve, in two kinds of texts: an interview and their lyrics. For both artists, Alim (2009: 121–122) finds

an increase in the frequency of absence […] when moving from the interview data to the lyrical data. […] it is clear that both of these artists display the absent form more frequently in their lyrical data than in their interview speech data. […] the data suggest that the more attention the artists pay to their speech (comparing interviews to lyrics) the more ‘nonstandard’ their speech becomes […].

His claim “that Hip-hop artists are indeed in conscious control of their copula variability” (Alim 2009: 123) suggests that hip-hoppers deliberately make use of AAVE features to achieve a particular (yet to be identified) effect. It makes sense, therefore, to regard idiosyncratic linguistic features as exponents of register. We will now look at patterns where a personal pronoun is either followed or not followed by a present tense form of BE (in the past the copula is not absent; cf. Alim 2006: 117) followed by either a NP (with definite or indefinite article) or an ing-form of a verb, as in the examples below.

(8) PersProN + BEpres/ø + a/an
    I am a pitbull off his leash
    a nigga that think he a cracker
(9) PersProN + BEpres/ø + the
    I am the baddest bitch in the petstore
    I the designated driver Chuck never the rider
(10) PersProN + BEpres/ø + …ing/…in’
    the world is falling and I am rising
    Nigga you fucking with a changed man

Originally, it was planned to conduct an automatic search for the above patterns. Since Wmatrix provides us with the means to tag corpora, a query for strings of parts of speech seemed to be the method of choice. However, it was soon found that the accuracy of the CLAWS tagger suffered from idiosyncratic syntax and from idiosyncratic spelling conventions, particularly in the hip-hop corpus. As a consequence, the patterns above were identified on the basis of lexical queries, for instance ‘I a/an’, ‘I’m a/an’ or ‘I am a/an’ as the possible instantiations of pattern (8) with the first person singular personal pronoun. The resulting concordances were post-edited to weed out non-target hits, such as those shown below. As can be seen in example (11), a query that is only based on lexical information will also find tokens that end in -ing although they are not progressive forms. Example (12) shows a written representation of an extremely reduced variant of I am going to. The example under (13) shows how problems can arise


because of Patois transcription and grammar: a is not the indefinite article in this case. Rather, it seems to be an equivalent to an emphatic do in British English.4 Example (14) is particularly challenging, since the text alone would allow two readings, namely as an instance of the pattern we are interested in or as an appositive construction. The only way to resolve the ambiguity was to listen to the track, which showed that the second reading is the more plausible one.

(11) I’m everything you love (Kid Rock: I’m Wrong But You Ain’t Right)
(12) I’m a call you as soon as I land (Wiz Khalifa: Top Floor)
(13) We nuh cater fi nuh guy and only girls we a request (Sean Paul: Like Glue)
(14) We the people / Are we the people? (Metallica: Some Kind of Monster)
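The lexical queries described above can be approximated with a regular expression. The following Python sketch is only an illustration under stated assumptions: the pattern, the function name and the three labels are hypothetical, the original queries were run in a concordancer, and forms ending in plain -in are left out here to limit false hits. Like any purely lexical query, the pattern also returns non-target hits such as (11), which is why manual post-editing remains necessary.

```python
import re

# Pronoun + (contracted | full | no) present-tense BE + article or -ing/-in' form.
COPULA_PATTERN = re.compile(
    r"\b(I|you|ya|he|she|it|we|they)"          # personal pronoun
    r"(?:(['’](?:m|re|s))|\s+(am|are|is))?"    # contracted or full copula, if any
    r"\s+(a|an|the|\w+(?:ing|in['’]))"         # a/an, the, or an -ing/-in' form
    r"(?!\w)",                                 # the complement must end here
    re.IGNORECASE,
)


def classify_copula(line: str) -> list[tuple[str, str, str]]:
    """Return (pronoun, copula form, following word) for every hit in a line."""
    hits = []
    for match in COPULA_PATTERN.finditer(line):
        pronoun, contracted, full, following = match.groups()
        if contracted:
            form = "contracted"
        elif full:
            form = "full"
        else:
            form = "absent"
        hits.append((pronoun.lower(), form, following.lower()))
    return hits


if __name__ == "__main__":
    for lyric in ("I am a pitbull off his leash",
                  "I the designated driver",
                  "I’m everything you love"):   # non-target hit, cf. (11)
        print(lyric, "->", classify_copula(lyric))
```

The third line is reported as a contracted copula followed by an -ing form although everything is not a progressive, which mirrors the problem illustrated in (11).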

The results of our analysis concerning the absence or presence of copula in present tense BE are shown in Tables 6 and 7, which provide a detailed account of the distribution of the individual variants in hip-hop and non-hip-hop, respectively. More specifically, for each personal pronoun the tables provide the frequency of absent, contracted or full form of copula BE either in front of the indefinite article, the definite article or the progressive form (in various realisations) of a verb. Note that for Table 6 an additional row was inserted to include the idiosyncratic written form ya for you. This row was not needed for Table 7, since the form ya could not be found in those songs that were not hip-hop.

Table 6: Copula be and copula absence in the hip-hop corpus (‘abs.’, ‘contr.’ and ‘full’ refer to absent, contracted and full form of the copula, respectively).

Pattern                   Indef. article        Def. article          …ing/…in’/…in         Total
                          abs.  contr.  full    abs.  contr.  full    abs.  contr.  full    abs.  contr.  full
I am/ø a/the/…ing           0    233      1       1     88     11       0    715     12       1   1036     24
You are/ø a/the/…ing       43     23      1      15     22      3     155     79      1     213    124      5
ya are/ø a/the/…ing         0      0      0       0      0      0      15      0      0       6      0      0
He is/ø a/the/…ing          2      5      0       1      7      0       6     14      0       9     26      0
She is/ø a/the/…ing         4      3      0       4      3      0      20     10      0      28     16      0
It is/ø a/the/…ing          0     76      1       0     29      1       3     28      4       3    133      6
We are/ø a/the/…ing         0      0      0       8      1      0     136     15      2     144     16      2
They are/ø a/the/…ing       0      0      0       0      0      0      61      4      0      61      4      0
Total                                                                                       465   1355     37

4 I am grateful to André Sherriah for his information on Patois.

Table 7: Copula be and copula absence in the non-hip-hop control corpus (‘abs.’, ‘contr.’ and ‘full’ refer to absent, contracted and full form of the copula, respectively).

Pattern                   Indef. article        Def. article          …ing/…in’/…in         Total
                          abs.  contr.  full    abs.  contr.  full    abs.  contr.  full    abs.  contr.  full
I am/ø a/the/…ing           0    152     14       0     73     20       2    895     16       2   1120     50
You are/ø a/the/…ing        9     60      3       3     98     14      47    290      7      59    448     23
He is/ø a/the/…ing          0     14      9       0      7      0       3     19      1       3     40     10
She is/ø a/the/…ing         1     34      7       0      6      3      18     50      2      19     90     12
It is/ø a/the/…ing          0     83      0       0     50      2       0    118      0       0    251      2
We are/ø a/the/…ing         0      0      0       4     11      1      48     67      2      52     78      3
They are/ø a/the/…ing       0      0      0       0      1      0       4     22      0       4     23      0
Total                                                                                       139   2050    100

A summary of the results shown in the two tables is provided in Figure 3, which compares the relative frequency of absent copula BE in hip-hop as opposed to non-hip-hop lyrics. As can be seen, the data show a very pronounced preference for copula absence in the hip-hop corpus compared to the non-hip-hop corpus. The largest proportion of copula absence in non-hip-hop songs is found with the personal pronoun we. A closer look at the data shows that, to a large extent, this exception can be explained by the African-American R&B artist R. Kelly. In particular, we find that a total of 17 tokens are found in one song only, namely Ignition. If we ignore this particular song, the relative frequency of copula absence in non-hip-hop drops to 30 %. All in all, these results suggest that copula absence is indicative of hip-hop. Future research will have to show to what extent this particular feature is also pervasive in other possible sub-registers of pop songs, such as R&B.
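The relative frequencies behind this comparison follow directly from the Total columns of Tables 6 and 7. As a worked example, the short Python sketch below computes the share of absent copula for we in the control corpus (52 absent, 78 contracted, 3 full), with and without the 17 tokens from Ignition. The function name is hypothetical, and treating all 17 of those tokens as absent-copula forms is an assumption based on the wording above.

```python
def absence_rate(absent: int, contracted: int, full: int) -> float:
    """Share of absent copula among all realisations, in per cent."""
    total = absent + contracted + full
    return 100 * absent / total if total else 0.0


# 'We' in the non-hip-hop control corpus (Total columns of Table 7).
rate_all = absence_rate(52, 78, 3)

# Same figures without the 17 tokens from one song (assumed to be absent forms).
rate_without_song = absence_rate(52 - 17, 78, 3)

print(round(rate_all, 1), round(rate_without_song, 1))
```

The second value rounds to the 30 % mentioned above, and the same function can be applied to any other row of the two tables.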

106 

 Rolf Kreyer

[Bar chart: relative frequency of copula absence (0–100 %) for the pronouns I, You, Ya, He, She, It, We and They; series: absent_hip-hop, absent_other]

Figure 3: Copula absence in the hip-hop and the non-hip-hop control corpus.

4 Conclusion: The functional dimension

The concept ‘register’ rests on the assumption that a particular group of texts exhibits a set of features that are frequent and pervasive within this group, while at the same time being more or less rare in other groups of texts. In addition, these features are supposed to fulfil a function vis-à-vis the situation in which the texts at issue are used. Having explored the linguistic features above, this section concludes the paper by providing some remarks on the functional dimension of hip-hop lyrics. In one (maybe two) word(s), the function of hip-hop lyrics may best be described by the term street credibility. Already in our discussion of the situation of use it has become clear that hip-hop artists and their audience partake in a very special kind of relationship. This can be characterised by a high degree of (displaced) interactiveness, not between a star and a fan but between brothaz and sistaz of the same street culture from which hip-hop evolved. One major function of hip-hop lyrics is to demonstrate the artists’ authenticity and to show that they are ‘staying street’. All of the features discussed in the preceding sections can be interpreted along these lines: the major topics as evidenced in the comparatively high frequency of some semantic domains (‘Cigarettes and Drugs’, ‘Warfare …’, ‘Crime, Law and Order’ and money- or business-related concepts) mirror aspects of street life in African American neighbourhoods in the US, where hip-hop evolved. At the same time, idiosyncratic spelling (word-final -a and plural marker -z), lexical features (the frequent use of taboo expressions and profanity often with a significant change of meaning) and grammatical characteristics (copula absence) focus on the common language background of the artist and his or her audience. So, when “niggas talk a lotta Bad Boy shit”, as the late Tupac Shakur raps, they portray themselves as representatives of ‘the streets’, while at the same time connecting back to the streets and the people living there.

References

Anthony, Laurence. 2011. AntConc (Version 3.2.4) [Computer Software]. Tokyo, Japan: Waseda University. http://www.antlab.sci.waseda.ac.jp (accessed May 2014).
Archer, Dawn, Andrew Wilson & Paul Rayson. 2002. Introduction to the USAS category system. University of Lancaster. http://ucrel.lancs.ac.uk/usas/usasguide.pdf (accessed May 2014).
Beers Fägersten, Kristy. 2006. The discursive construction of identity in an internet hip-hop community. Revista Alicantina de Estudios Ingleses 19. 23–44.
Beers Fägersten, Kristy. 2008. A corpus approach to discursive construction of a hip-hop identity. In Annelie Ädel & Randi Reppen (eds.), Corpora and discourse: The challenges of different settings, 211–240. Amsterdam: John Benjamins.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge University Press.
DuRant, Robert H., Michael Rich, S. Jean Emans, Ellen S. Rome, Elizabeth Allred & Elizabeth R. Woods. 1997. Violence and weapon carrying in music videos: A content analysis. Archives of Pediatrics and Adolescent Medicine 151(5). 443–448.
Forman, Murray & Mark Anthony Neal (eds.). 2004. That’s the joint! The hip-hop studies reader. New York: Routledge.
Jones, Kenneth. 1997. Are rap videos more violent? Style differences and the prevalence of sex and violence in the age of MTV. Howard Journal of Communication 8(4). 343–356.
Kövecses, Zoltan. 2002. Metaphor: A practical introduction. Oxford: Oxford University Press.
Kreyer, Rolf. 2012. ‘Love is like a stove – it burns you when it’s hot’: A corpus-linguistic view on the (non-) creative use of love-related metaphors in pop songs. In Sebastian Hoffmann, Paul Rayson & Geoffrey Leech (eds.), English corpus linguistics: Looking back, moving forward, 103–115. Amsterdam: Rodopi.
Kreyer, Rolf. 2015. ‘Funky fresh dressed to impress’: A corpus-linguistic view on gender roles in pop songs. International Journal of Corpus Linguistics 20(2). 174–204.
Kreyer, Rolf & Joybrato Mukherjee. 2007. The style of pop song lyrics: A corpus-linguistic pilot study. Anglia 125. 31–58.
Lakoff, George & Mark Johnson. 1980. Metaphors we live by. Chicago: Chicago University Press.
Lazin, Lauren. 2003. Tupac: Resurrection. Paramount.
Miethaner, Ulrich. 2001. The BLUR (Blues Lyrics Collected at the University of Regensburg) corpus: Blues lyricism and the African American literary tradition. Current Objectives of Postgraduate Studies 2. http://copas.uni-regensburg.de/article/view/64/78 (accessed 3 January 2015).
Miethaner, Ulrich. 2005. I can look through Muddy: Analyzing earlier African American English in blues lyrics (BLUR). Frankfurt am Main: Peter Lang.
Morgan, Marcyliena. 2001. ‘Nuthin’ but a G thang’: Grammar and language ideology in hip-hop identity. In Sonja L. Lanehart (ed.), Sociocultural and historical contexts of African American Vernacular English, 187–210. Athens: University of Georgia Press.
Morgan, Marcyliena. 2002. Language, discourse and power in African American culture. Cambridge: Cambridge University Press.
Mukherjee, Joybrato. 2000. ‘Krisis at Kamp Krusty’: Deviant spellings in popular culture as examples of medium-dependent graphic presentation structures. Arbeiten aus Anglistik und Amerikanistik 25. 161–172.
Murphey, Tim. 1989. The where, when and who of pop song lyrics: The listener’s prerogative. Popular Music 8. 58–70.
Murphey, Tim. 1990. Music and song in language learning: An analysis of pop song lyrics and the use of music and song in teaching English to speakers of other languages. Bern: Lang.
Murphey, Tim. 1992. The discourse of pop songs. TESOL Quarterly 26. 770–774.
Olivio, Warren. 2001. Phat lines: Spelling conventions in rap music. Written Language and Literacy 4. 67–85.
Rayson, Paul. 2003. Matrix: A statistical method and software tool for linguistic analysis through corpus comparison. Lancaster University: Ph.D. thesis.
Rayson, Paul. 2009. Wmatrix: A web-based corpus processing environment. Computing Department, Lancaster University. http://ucrel.lancs.ac.uk/wmatrix/ (accessed May 2014).
Schneider, Edgar W. & Ulrich Miethaner. 2006. When I started to using BLUR. Accounting for unusual verb phrase patterns in an electronic corpus of Earlier African American English. Journal of English Linguistics 34. 233–256.
Schwartz, Kelly D. & Gregory T. Fouts. 2003. Music preferences, personality style, and developmental issues of adolescents. Journal of Youth and Adolescence 32. 205–213.
Seidman, Steven A. 1992. An investigation of sex-role stereotyping in music videos. Journal of Broadcasting and Electronic Media 36(2). 209–216.
Smith, Stacy L. & Aaron R. Boyson. 2002. Violence in music videos: Examining the prevalence and context of physical aggression. Journal of Communication 52(1). 61–83.
Smitherman, Geneva. 2000. Black talk: Words and phrases from the hood to the Amen corner. Boston: Houghton Mifflin Company.
Spady, James G., Charles G. Lee & H. Samy Alim. 1999. Street conscious rap. Philadelphia: Unum Loh Publishers.
Werner, Valentin. 2012. Love is all around: A corpus-based study of pop lyrics. Corpora 7(1). 19–50.




Appendix 1: The non-hip-hop corpus

Top 50 Albums – Non-hip-hop 2003

3 Doors Down – Away From The Sun
Aaliyah – I Care 4 U
Alan Jackson – Greatest Hits II …
Audioslave – Audioslave
Avril Lavigne – Let Go
Beyoncé – Dangerously In Love
Celine Dion – One Heart
Cher – The Very Best Of Cher
Christina Aguilera – Stripped
Coldplay – A Rush Of Blood To The Head
Dixie Chicks – Home
Elvis Presley – 30 #1 Hits
Evanescence – Fallen
Faith Hill – Cry
Good Charlotte – The Young And …
Hilary Duff – Metamorphosis
Jennifer Lopez – This Is Me … Then
John Mayer – Room For Squares
Justin Timberlake – Justified
Kelly Clarkson – Thankful
Kenny Chesney – No Shoes, …
Kid Rock – Cocky
Linkin Park – Meteora
Luther Vandross – Dance With My Father
Matchbox Twenty – More Than You …
Metallica – St. Anger
R. Kelly – Chocolate Factory
Rascal Flatts – Melt
Rod Stewart – It Had To Be You …
Santana – Shaman
Shania Twain – Up!
Tim McGraw – Tim McGraw And …
Toby Keith – Unleashed

Top 50 Albums – Non-hip-hop 2011

Brad Paisley – This Is Country Music
Adele – 19
Adele – 21
Beyoncé – 4
Bon Jovi – Greatest Hits
Britney Spears – Femme Fatale
Bruno Mars – Doo-Wops And Hooligans
Chris Brown – F.A.M.E.
Coldplay – Mylo Xyloto
Florence and the Machine – Lungs
Foo Fighters – Wasting Light
Glee – The Music; Season 2
Glee – The Music, The Christmas …
Jackie Evancho – Dream With Me
Jackie Evancho – O Holy Night
Jason Aldean – My Kinda Party
Josh Groban – Illuminations
Justin Bieber – My World 2.0
Justin Bieber – My World’s Acoustic
Justin Bieber – Never Say Never …
Katy Perry – Teenage Dream
Keith Urban – Get Closer
Kenny Chesney – Hemingway’s Whiskey
Kid Rock – Born Free
Lady Antebellum – Need You Now
Lady Antebellum – Own the Night
Lady Gaga – Born This Way
Mumford and Sons – Speak Now
P!nk – Greatest Hits … So Far!!!
R. Kelly – Loveletter
Rascal Flatts – Nothing Like This
Rihanna – Loud
Sugarland – The Incredible Machine
Susan Boyle – The Gift
Taylor Swift – Speak Now
The Band Perry – The Band Perry
The Black Keys – Brothers
Tony Bennett – Duets 2
Zac Brown Band – You Get What You Give

Teresa Pham

The register of English crossword puzzles: Studies in intertextuality

Abstract: Despite their popularity, crossword puzzles have so far been neglected in text-linguistic publications. Therefore, this paper provides a detailed analysis of crosswords. As a textual variety related to a specific situation, fulfilling specific functions and displaying pervasive, frequent linguistic and formal features, this type of linguistic riddle must be regarded as an independent register according to the framework by Biber and Conrad (2009). Moreover, a detailed linguistic analysis establishes non-cryptic and cryptic crosswords as two distinct sub-registers. For the purpose of exploring the role of intertextuality in those two sub-registers, a corpus of 270 intertextual non-cryptic and cryptic clue-answer pairs from The Sun (N.N. 2009) and The Times (Browne 2009) was compiled. A quantitative analysis of this corpus reveals that intertextual references in cryptic puzzles primarily target classical mythology, Shakespeare and the Bible, whereas non-cryptic puzzles additionally require knowledge of Anglo-American popular culture. The qualitative analysis of the corpus discusses the particular forms and functions of intertextuality in non-cryptic and cryptic puzzles (Stocker 1998), providing also an explanation for their use from a cognitive linguistic perspective (Geeraerts & Cuyckens 2007) as well as a comparison with intertextuality in other registers. The paper shows that intertextual references and their particular forms and functions may be distinctive features of certain registers. Intertextuality is context-dependent and used with a particular communicative function and should thus be incorporated as one possible feature into the linguistic analysis of registers according to the framework by Biber and Conrad (2009).

Teresa Pham, University of Vechta

1 Introduction

Crossword puzzles (or simply crosswords) are the most popular type of linguistic puzzle today (cf. Augarde 2003: 57) and hold a permanent place in most British and American newspapers. Given this prominence in regular, if not everyday language use, their marginalisation as a register in text linguistic analysis and the resulting scarcity of relevant linguistic publications are surprising. Most publications on crosswords belong to the discipline of psychology (e.g. Hambrick, Salthouse, and Meinz 1999; Nickerson 2011; Underwood, Deihim, and Batt 1994; Witte and Freund 1995) or examine crosswords from the perspective of didactics (e.g. Mollica 2007; Weisskirch 2006) or cultural studies (e.g. Cornell and Cornell 1980; Stratmann 1995). Other types of word games have been studied in detail (e.g. Dienhart 1998; Fix 2011; Pepicello 1980) and crosswords have even been analysed from a general linguistic (though not specifically register-based) perspective (e.g. Coffey 1998; Mok 1987). Furthermore, some text linguistic publications explicitly refer to puzzles or even crosswords as a register (e.g. Heinemann 2000: 610–611; Furthmann 2006: 133; Rolf 1993: 258). Hence, while their status as a distinct register is largely uncontested, the field of register studies still lacks specific analyses of crosswords. Therefore, this paper first provides a register analysis of crosswords following the framework by Biber and Conrad (2009; see also Schubert’s introduction to this volume). It then reports the results of a corpus study on English-language crosswords, focusing on the role of intertextuality in the constitution of this register.

2 Crossword puzzles as a register

The OED (Simpson and Weiner 2015) defines crosswords as “puzzle[s] in which a pattern of chequered squares has to be filled in from numbered clues”. Accordingly, crosswords are a type of word game in which answers to clues have to be inserted into a grid of boxes.

2.1 Situational analysis

I. Participants: The clues of crosswords are provided by a setter or compiler, who usually remains anonymous or works under a pseudonym. Puzzles are addressed to a plural, yet un-enumerated set of solvers, who, in most cases, work individually and neither interact with setters nor are in direct, personal contact with them. Furthermore, there is some disagreement on the social status of setters and solvers. Since solving crosswords requires thorough general and sometimes even expert or ‘esoteric’, i.e. uncommon or specialist knowledge, Partridge (1992: 504) draws the following sociolinguistic profile of typical setters and solvers: “humanistically educated speakers of Standard English, with a reasonably deep basis of Western culture, a general knowledge of literature, history, geography and current affairs, familiar with and perhaps active in what have been classed as middle-class sports”. However, since certain strategies of codification, chunks of knowledge and even clues are recurrent, crossword experience is also a major predictor of crossword proficiency (cf. Hambrick, Salthouse, and Meinz 1999: 140). From the cognitive linguistic perspective, this correlation, like the phenomenon of agenda-setting (cf. Scheufele and Tewksbury 2007), is due to the fact that frequent activation makes cognitive representations more easily retrievable. Therefore, others (e.g. Scott and O’Donnell 1998: 237) claim that the knowledge and skills necessary for crosswords can be acquired by everyone and consequently regard crosswords as democratic.

II. Production circumstances and channel: With their close interdependence between clues and answers, crosswords result from a careful and time-consuming process of planning and editing. The reception process may be equally time-consuming and non-linear. Therefore, the written mode is one essential characteristic of crosswords – even in the digital age, where puzzles can be downloaded from websites or generated by computer programmes or applications on mobile devices. Furthermore, what can be considered as a marker of crosswords and what is equally dependent on their appearance in writing is their physical layout on the page. Answers must be inserted, letter by letter, into a grid available either on paper or digitally and consisting of white (generally lights; cf. Scott and O’Donnell 1998: 219) and black squares (blocks; cf. Moorey 2008: 5). The corresponding numbering of clues and squares indicates into which light the first grapheme of the respective answer has to be inserted. Subsequent letters of the answer are inserted either horizontally or vertically into the grid, depending on whether the clue was labelled Across or Down. Answers are interdependent by their intersecting in so-called crosslights or checked letters (cf. Biddlecombe 2009). Consequently, each correct answer will simplify the search for subsequent intersecting answers to a greater or lesser extent (cf. Nickerson 2011; Goldblum and Frost 1987). The number of letters which are part of only one answer (unchecked letters or unches) is an indicator of the difficulty of a crossword (cf. Augarde 2003: 63; Scott and O’Donnell 1998: 219).

III. Setting: Setters and solvers do not share the physical context of communication. As already mentioned, crosswords (as well as their solutions) are usually originally printed in newspapers, i.e. in a public space, but are typically solved in private. Heinemann (2000: 610–611) therefore assigns them to the (semi-)official, public domain.

IV. Purposes: Crosswords are devoid of the usual purpose of language use, which is communication (cf. Schlepper 1981: 63). Instead, the primary purpose of crosswords is to entertain and delight the addressee: they allow setters and solvers alike to manipulate language irrespective of established rules and conventions and thus “provide an opportunity of handling at one’s whim a medium which in other situations very much has a will of its own” (Schlepper 1981: 78; cf. Augarde 2003: vii). However, crosswords may also provide social pleasure when they are solved cooperatively or competitively. Finally, crosswords may be completed to test or consolidate one’s knowledge, to maintain or to boost one’s cognitive capacities (e.g. one’s memory capacity or mental flexibility). Medical research even suggests that such mental exercise reduces the risk of certain diseases like dementia (cf. Moorey 2008: 3). In order for crosswords to fulfil these functions, it is essential that they, despite answers being encoded, are devised to be solvable by the target ‘solvership’. While an unsolvable puzzle causes frustration, the ability to solve a puzzle is experienced as a success and provides the pleasurable feeling of being part of the intellectual elite.
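The distinction between checked letters (crosslights) and unchecked letters (unches) drawn under II above lends itself to a short illustration. The following sketch counts both kinds of light in a small grid. The 3x3 grid is invented for this purpose and is not taken from any of the puzzles analysed here, and the function name answer_cells is likewise an assumption.

```python
# Hypothetical 3x3 grid, invented for illustration; '#' marks a block.
GRID = [
    "CAT",
    "A#O",
    "RUN",
]


def answer_cells(grid, across):
    """Return the cells (row, col) covered by across answers (or down answers)."""
    height, width = len(grid), len(grid[0])
    outer, inner = (height, width) if across else (width, height)
    cells = set()
    for i in range(outer):
        run = []
        for j in range(inner + 1):  # one extra step flushes the last run
            inside = j < inner
            r, c = (i, j) if across else (j, i)
            if inside and grid[r][c] != "#":
                run.append((r, c))
            else:
                if len(run) >= 2:  # a single light does not form an answer
                    cells.update(run)
                run = []
    return cells


across_cells = answer_cells(GRID, across=True)
down_cells = answer_cells(GRID, across=False)

checked = across_cells & down_cells    # crosslights: part of two answers
unchecked = across_cells ^ down_cells  # unches: part of exactly one answer

print(len(checked), len(unchecked))
print(len(unchecked) / len(across_cells | down_cells))
```

In this toy grid, the four corner lights are checked and the four remaining lights are unches. On the view reported above, a higher share of unches would point to a more difficult puzzle, since fewer letters are confirmed by an intersecting answer.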

2.2 Analysis of linguistic features

2.2.1 General features of crossword puzzles

From a discourse analytic perspective, the basic building blocks of crosswords are adjacency pairs, each consisting of a clue and an answer. Each clue encodes its respective answer more or less strongly. A figure in brackets at the end of the clue usually indicates the number of letters of the answer. For answers to intersect in the grid, crosswords require a plural, yet variable number of such clue-answer pairs. The first turn is provided by the setter, whereas the second turn is provided by the solver. Since crosswords are intended to be solvable, only one answer is indubitably correct (sometimes also taking into consideration the number of lights or using the crosslights already filled in the grid). However, since clue-answer pairs function independently, there are usually no cohesive ties between them. On the contrary, linguistic means which are usually cohesive (e.g. articles, personal or demonstrative pronouns) may be employed to encode answers according to certain conventions. In some crosswords, the personal pronouns he or she, for example, may not function as anaphoric or cataphoric references to preceding or subsequent noun phrases, but may point to the fact that the words man or girl (or their letters) are part of the answer (cf. Skinner 2008: 25). In rare cases, the adjacency pairs of a puzzle are linked by a common topic, which may be indicated (more or less directly) by its title. Cohesion between adjacency pairs may also be established by an explicit “Cross-reference” (Partridge 1992: 501). Thus, in example (1) the clue requires prior identification of the answer to clue number 11:

(1) Line also transported 11 to shore (9) – LANDWARDS (Browne 2009: 124)

Apart from that, the only links between clues usually are their appearing together with one uniform layout and the combinatory interdependence of the respective answers in the grid. If, following Halliday and Hasan (1976: 1), a text is defined as “a unit of language in use” whose texture arises from inter-sentential cohesive ties on the surface, crosswords do not normally constitute texts. Besides cohesion, further standards of textuality according to de Beaugrande and Dressler (1981) are not or only partially met: clues are thematically independent and there is no continuity of or even connection between underlying concepts (coherence). Furthermore, even if clues need to be new and creative to be intellectually challenging for solvers, crosswords do not have the function of transmitting information (informativity). However, the setter’s primary intention of entertaining solvers is evident (intentionality) and, although most clues would be unacceptable and irrelevant in usual communicative situations, crossword initiates accept these linguistic inconsistencies as being part of this type of puzzle (acceptability, situationality). Thus, if we define a text as a passage of language which “functions as a unity with respect to its environment” (Halliday and Hasan 1976: 1) and consider cohesion, informativity (cf. Schubert 2012: 23) and also coherence as frequent, but non-obligatory features of texts, then crosswords must certainly be regarded as texts.

2.2.2 Features of non-cryptic and cryptic crossword puzzles

There are two basic types of English-language crosswords, generally called non-cryptic or primitive and cryptic puzzles (cf. Schlepper 1981: 61). In the latter type, clues are more obscure than in the former and encode the answers more strongly according to certain conventions (see below). Non-cryptic puzzles, which have been published since 1913 (cf. Stephenson 2007: 7), are common in most European and non-European countries. Cryptic crosswords emerged in England towards the end of the 1930s (cf. Scott and O’Donnell 1998: 211). Today they are an integral part of British culture and are regularly published (often alongside non-cryptic puzzles) in most British magazines and newspapers (quality as well as popular, national as well as regional and local). Cryptic crosswords have even influenced puzzles outside Great Britain: cryptic clues occur in some American dailies such as The New York Times (Variety puzzle) and some French newspapers (e.g. Le Figaro, Le Nouvel Observateur; cf. Mok 1987: 98). Since the 1970s, Die Zeit, a weekly national German quality paper, has been publishing a type of crossword puzzle which combines cryptic and straightforward clues (Um die Ecke gedacht, literally ‘thought around the corner’, i.e. ‘thought outside the box’). However, the British cryptic crossword remains unique: “Although traces of the cryptic crossword can be found in some European countries, it is nowhere developed to anything like the extent it has now reached in the UK […]. German-language puzzles are those which come closest to the British model […]. By and large, however, these are all relatively modest by British standards” (Scott and O’Donnell 1998: 211–213). A quantitative analysis performed on 20 puzzles (523 clue-answer pairs) from The Times (cryptic puzzles; Browne 2009), The Guardian (non-cryptic puzzles; Rusbridger 11.–16.05.2013) and The Sun (two-speed crosswords giving a non-cryptic and a cryptic clue for each answer; N.N. 2009) confirms the basic distinction between the two types of puzzle:

Table 1: Quantitative analysis of non-cryptic and cryptic puzzles

                                       Non-cryptic puzzles    Cryptic puzzles
Length of clues (orthographic units    The Sun: 2.1           The Sun: 6.1
delimited by blanks)                   The Guardian: 3.3      The Times: 6.8
                                       Average: 2.7           Average: 6.5
Length of answers (letters)            The Sun: 6.2           The Sun: 6.2
                                       The Guardian: 6.7      The Times: 7.5
                                       Average: 6.4           Average: 6.9

Despite variability within each type, clues are considerably shorter, and answers slightly shorter, in non-cryptic than in cryptic puzzles. Furthermore, both turns are morpho-syntactically more complex in the latter type. Non-cryptic clues are usually very simple phrases, often consisting of a head only as in (2), sometimes in combination with a simple pre- or postmodifier (3), whereas the corresponding answers are mostly single content words or proper names:

(2) Flowery (6) – FLORAL
(3) Mediterranean volcano (4) – ETNA (N.N. 2009: 105, 103)

Cryptic clues, by contrast, resemble block language headlines. When they are constituted by phrases, these are typically more complex, containing for example longer prepositional phrases or (finite or nonfinite) clauses as postmodifiers (4). Cryptic clues may also have an often elliptical clause structure, taking the form of simple or complex, mainly declarative sentences (cf. Quirk et al. 1985: 40, 803) as in (5). In addition to single content words and proper names, the answers to cryptic clues often comprise morphologically complex lexemes (e.g. idioms as in (5), compound nouns or multi-word verbs) as well as function words (6) or phrases (4).

(4) Bloomer made by top performer in nativity scene? (4,2,9) – STAR OF BETHLEHEM
(5) Find a lovely partner to share a seasonal moment (4,1,7) – PULL A CRACKER
(6) Jarring we hear’s in contrast (7) – WHEREAS (Browne 2009: 122, 110, 124)
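The two measures used in Table 1 can be made concrete with examples (2) to (6). The following sketch assumes that clue length is counted as the number of orthographic units delimited by blanks and that the bracketed figures give the number of letters per word of the answer. The function names are hypothetical, and five examples are of course far too few to reproduce the averages of Table 1.

```python
# Clue-answer pairs (2)-(6) as quoted above, labelled by puzzle type.
PAIRS = [
    ("non-cryptic", "Flowery", "FLORAL"),
    ("non-cryptic", "Mediterranean volcano", "ETNA"),
    ("cryptic", "Bloomer made by top performer in nativity scene?", "STAR OF BETHLEHEM"),
    ("cryptic", "Find a lovely partner to share a seasonal moment", "PULL A CRACKER"),
    ("cryptic", "Jarring we hear's in contrast", "WHEREAS"),
]


def clue_length(clue):
    """Number of orthographic units delimited by blanks."""
    return len(clue.split())


def enumeration(answer):
    """Letters per word of the answer, as given in brackets after a clue."""
    return tuple(len(word) for word in answer.split())


def mean(values):
    values = list(values)
    return sum(values) / len(values)


for kind in ("non-cryptic", "cryptic"):
    clue_lengths = [clue_length(clue) for k, clue, _ in PAIRS if k == kind]
    answer_lengths = [sum(enumeration(answer)) for k, _, answer in PAIRS if k == kind]
    print(kind, round(mean(clue_lengths), 1), round(mean(answer_lengths), 1))
```

Even in this tiny sample, the cryptic clues are several times longer than the non-cryptic ones, and enumeration("STAR OF BETHLEHEM") yields (4, 2, 9), the figure given in brackets in (4).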

Furthermore, the relationship between the turns of the same non-cryptic adjacency pair is overtly governed by the “Rule of Inflection” and the “Rule of Identity” (Schlepper 1981: 67). The former prescribes that clue and answer must “be able to fulfil the same syntactic function” (Schlepper 1981: 67). Therefore, they usually have the same inflection (7) and/or belong to the same formal syntactic category. However, a prepositional phrase may also point to an adverb or a nonfinite clause to an adjective (8).

(7) Least cooked (6) – RAREST (N.N. 2009: 111)
(8) Lacking injury (6) – UNHURT (Rusbridger 11.–16.05.2013)

The latter rule dictates that clue and answer have to be semantically equivalent, allowing (absolute or near) synonymy (9), negated antonymy (10), hyponymy (11) as well as paraphrases and definitions of variable precision (12).

(9) Applaud (5) – CHEER
(10) Not dead (5) – ALIVE
(11) Hairdo (4) – PERM
(12) Short-tempered person (7) – HOTHEAD (N.N. 2009: 9, 25, 69, 27)

Therefore, according to Greimas (1970: 287), crosswords work like a reverse dictionary, where only the definitions are given and the appropriate lemmata have to be provided by the solver. Yet to complicate matters, solving a non-cryptic clue may require considering polysemy, homonymy and proper names. In addition, the relationship between clues and answers may also be syntagmatic, being based on phraseological units such as idioms or collocations.

The aforementioned rules apply less overtly to cryptic crosswords. The reason for this opacity is that cryptic clues have a binary structure. It is only the definition (underlined in the following examples of cryptic clues) that is syntactically and semantically equivalent to the answer. The subsidiary indication, however, encodes the same answer a second time semantically, phonologically or orthographically. Thus, in example (13) the definition huge is a synonym of the answer, whereas the remaining subsidiary indication encodes the answer again, orthographically.

(13) Huge mines exploded around me (7) – IMMENSE (Browne 2009: 28)

Only two clue types deviate from this basic structure: in so-called all-in-one or & lit clues (‘and literally true clues’; cf. Moorey 2008: 22), which are sometimes marked by exclamation marks, the definition and the subsidiary indication are merged (14). Cryptic definition clues (cf. Moorey 2008: 27), by contrast, consist of a misleading definition or paraphrase of the answer (15). They frequently rely on homonymy or a morphological reinterpretation of lexemes or idiomatic expressions and may be marked by question marks. Non-cryptic clues were banned when the rules for cryptic puzzles were reformulated by setters in the 1930s and 1940s (cf. Scott and O’Donnell 1998: 236).

(14) Hood’s resort few disturbed (8,6) – SHERWOOD FOREST
(15) One may move on to another American story (9) – ESCALATOR (Moorey 2008: 148, 106)

The different types of clue-answer relationship typical of non-cryptic and cryptic puzzles are illustrated schematically in Figure 1.

Figure 1: Clue-answer relationship in non-cryptic and cryptic crosswords (CWPs)

A cryptic clue thus offers two approaches to the answer and points to it unambiguously, if interpreted correctly. Some crossword initiates therefore insist that cryptic crosswords are easier to solve than non-cryptic ones (cf. Skinner 2008: 7; Schlepper 1981: 75). However, a solver may encounter several difficulties in interpreting cryptic clues. First, the definition and the subsidiary indication are integrated into a stretch of language which seemingly permits literal interpretation. Yet the sole purpose of the surface structure of the clue is to mislead the solver. Its meaning, however, is exhausted once the clue has been solved. Therefore, clues have to be regarded as a succession of fragments which correspond to neither morpho-syntactic nor orthographic units, since word boundaries may be shifted and punctuation marks overruled: “A cryptic clue is a sentence or phrase, appearing to make some kind of sense and putting ideas into the solver’s head. These often have little or nothing to do with the answer, which can be derived by interpreting all or part of the clue in ways which are less obvious” (Biddlecombe 2009). Second, the definition and the subsidiary indication are unmarked, may occur in variable order and may even overlap. There may also be words or phrases which are superfluous for solving the clue (cf. Schlepper 1981: 66), added solely for enhancing the coherence of the surface structure. Third, even when the definition has been identified, it may be a zero-derivation, polyseme or homonym and thus, due to the absence of any context, syntactically and/or semantically ambiguous. Fourth, the subsidiary indication may contain several operations of codification not necessarily indicated by signal words (for lists of such indicators cf. Stephenson 2007: 35–63; indicators will be underlined by a broken line in the following examples of cryptic clues).

Cryptic clues whose subsidiary indication encodes the answer semantically, so-called double or multiple definition clues (for the names of clue types used here cf. Moorey 2008: 13–31; Biddlecombe 2009), contain a second definition. They are usually based on polysemy, homonymy, homography or the metaphorical or literal reinterpretation of one or several lexemes in the clue and/or the answer (16).
(16) Poorly educated and characterless? (10) – UNLETTERED (Moorey 2008: 154)

By contrast, homophone clues encode the answer phonologically and are based on the phonological similarity (homeophony) or identity (homophony) of lexemes such as whale and wail in (17).

(17) Marine beast’s audible cry (4) – WAIL (Stephenson 2007: 55)

Most frequently, however, a solver has to recompose the answer orthographically. The easiest case of an orthographic codification is a hidden clue, explicitly containing the graphemes of the answer. In the surface structure of the subsidiary indication, these graphemes are either dispersed or contained consecutively, often across word boundaries. Furthermore, it may be necessary to reverse the order of the graphemes contained in or encoded by the subsidiary indication (anadrome or reversal clues) or to rearrange them (anagram clues). Thus, in (18) the graphemes of live, a synonym of quick, have to appear in inverted order to form a synonym of sin, while in (19) the answer is an anagram of remote:

(18) Quick to return to sin (4) – EVIL (Stephenson 2007: 48)
(19) Unusually remote shooting star (6) – METEOR (Skinner 2008: 18)

In addition, graphemes may also be substituted (substitution clues) or deleted (take away, apocopative or deletion clues). This is illustrated in (20), where the first letter of gown, a synonym of dress, must be deleted.
(20) Possess a topless dress (3) – OWN (Moorey 2008: 20)

In crosswords of a certain complexity, however, answers may be cut into several chunks, which, theoretically, may consist of single letters. These orthographic chunks are then encoded separately, linearly in charade or additive clues and non-linearly in content or container clues. In (21), the graphemes of a rat have to be inserted into a synonym of cat, namely lion. Dec, the abbreviation for December, the last month of the year, is added by a charade operation.
(21) Statement: Last month, a cat swallowed a rat (11) – DECLARATION (Biddlecombe 2009)

For these operations of codification, all kinds of abbreviations or acronyms may be used, such as those for military ranks (e.g. Lt for lieutenant), chemical elements (e.g. Ag for silver) or terms from chess, music or cricket (e.g. W for wicket). Other letter sequences constitute foreign-language articles (e.g. le/la for the, un for one), pronouns (e.g. she for girl) or Roman numerals (e.g. I for one). Therefore, despite the fact that crosswords do not show grammatical cohesion, they may still contain lexemes which otherwise have a cohesive function. Finally, to further complicate the solving of clues, the aforementioned operations of codification can also be combined (complex clues). Thus, three operations are included in (22): heartless indicates the deletion of the central grapheme of the. By a charade operation (see the explanation of charade clues above), R (for Latin rex ‘king’) is added to the remaining te. This letter sequence is then inserted into inn, a synonym of public house.
(22) Confine the heartless king in a public house (6) – INTERN (Gilbert 2001: 64)
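The deletion, container and charade operations in (20) to (22) can be traced step by step. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, and the insertion positions are chosen by hand to reproduce the published answers, since the clues themselves do not specify them.

```python
def delete_first(word: str) -> str:
    """Deletion clue: drop the first grapheme ('topless')."""
    return word[1:]


def delete_centre(word: str) -> str:
    """Deletion clue: drop the central grapheme of an odd-length word ('heartless')."""
    mid = len(word) // 2
    return word[:mid] + word[mid + 1:]


def insert(inner: str, outer: str, position: int) -> str:
    """Container clue: place `inner` inside `outer` at the given position."""
    return outer[:position] + inner + outer[position:]


def charade(*chunks: str) -> str:
    """Charade clue: join the chunks linearly."""
    return "".join(chunks)


# (20) gown without its first letter
own = delete_first("gown")
# (21) 'a rat' inside 'lion', preceded by 'dec' (position 1 is an assumption)
declaration = charade("dec", insert("arat", "lion", 1))
# (22) 'the' without its centre, plus 'r', inside 'inn' (position 2 is an assumption)
intern = insert(charade(delete_centre("the"), "r"), "inn", 2)

print(own.upper(), declaration.upper(), intern.upper())
```

Read this way, a complex clue such as (22) is simply a composition of the elementary operations, which is why its solution can be verified mechanically once the operations have been identified.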



The register of English crossword puzzles: Studies in intertextuality 


2.3 Functional analysis

In view of the purposes of crosswords, their language is shaped by two diametric requirements: it must, on the one hand, encrypt the answers, yet, on the other hand, point to them unambiguously. In non-cryptic puzzles, in which the syntactic and semantic relationship between clues and answers is straightforward, a solver’s proficiency depends mainly on his or her factual declarative, encyclopaedic as well as metalinguistic knowledge. Only when clues can activate chunks of knowledge which are stored as cognitive representations in the solver’s memory or when appropriate cognitive representations can be constructed in the process of solving the puzzle (e.g. by consulting an encyclopaedia) can those clues be solved. The language of non-cryptic puzzles mirrors this. Most non-cryptic clues permit a literal, syntactically and semantically unambiguous interpretation of the surface structure. Furthermore, they are characterised by structural simplicity and shortness. What primarily accounts for the difficulty of primitive puzzles are, consequently, the currency of the lexemes functioning as answers among the target ‘solvership’ and the extent to which esoteric knowledge is targeted. In addition, non-cryptic clues may constitute semantically unspecific paraphrases, pointing to several answers such as in (23). Such ambiguity can only be resolved by intersecting answers and thus imposes a specific approach to solving the respective puzzle.
(23) Atlantic county of Eire (5) – SLIGO (N.N. 2009: 11)

To procure even greater entertainment, cryptic puzzles, by contrast, take playing with words, testing mental flexibility and encoding answers to extremes. Their solution requires not only general knowledge but also expert knowledge, abilities or solution strategies. These may concern the specific conventions of codification, the frequency of certain letters, the completion of incomplete lexemes or the solution of anagrams. Cryptic crosswords thus often rely on the various syntagmatic and paradigmatic as well as coincidental formal relationships within the English language, which are largely irrelevant for everyday language use. Besides knowledge, they consequently depend on “fluid cognition” (Hambrick, Salthouse, and Meinz 1999: 131) or “lateral thinking” (Schlepper 1981: 79), i.e. creativity, mental flexibility and logical, abstract reasoning. This focus on a more complex codification of the answers and a more complex reasoning process in cryptic puzzles is mirrored in their language. The structurally more complex and longer surface structure of cryptic clues only seemingly permits literal interpretation but deliberately aims at misleading the solver. Since operations of codification are not necessarily indicated and since the definition, the subsidiary indication and possible indicators are not marked, the surface structure of cryptic clues permits multiple interpretations, semantically as well as morpho-syntactically. As with non-cryptic puzzles, the difficulty of cryptic puzzles increases when rare lexemes or specialised or esoteric knowledge are targeted. As against non-cryptic clues, however, once the structure underlying the clue has been recognised and the operations of codification have been identified, well-constructed cryptic clues can be answered unambiguously, even without resorting to crosslights in the grid.

The previous analysis showed that crosswords are associated with a particular situation and particular purposes, which are reflected in pervasive formal as well as linguistic features. Consequently, crosswords must clearly be regarded as an independent register according to Biber and Conrad’s definition (2009: 31; see also Schubert’s introduction to this volume). Furthermore, the detailed semantic and morpho-syntactic analysis of crosswords revealed that non-cryptic and cryptic puzzles, despite their being based on the same linguistic building blocks, have developed different strategies for fulfilling their primary purpose as entertainment. They codify answers to a different extent and therefore require different skills on the part of the solver. Since, due to this, non-cryptic and cryptic puzzles differ linguistically, those two types of crosswords must be regarded as distinct sub-registers of the register of crosswords.

3 Intertextuality in crossword puzzles: A corpus study

Intertextuality, the seventh standard of textuality according to de Beaugrande and Dressler (1981), implies that knowledge of one or several individual texts or groups of texts (pre-texts) may influence the production and/or reception of another text (the post-text). In registers like newspaper articles or advertisements, intertextuality most frequently takes the form of (unmodified or modified) quotations. Numerous studies have shown that these may have for example the representational function of introducing additional components of meaning into a post-text, the expressive function of supporting the author’s argumentation and/or the conative function of guiding the reader’s reception (cf. Bühler [1934] 1982: 24–33). For an intertextual reference to fulfil (most of) its functions, (more or less extensive) knowledge of the pre-text is required (cf. Schulte-Middelich 1985; Stocker 1998: 73–92). However, since intertextual references are normally doubly referential, pointing to pre-texts as well as to the extra-linguistic world (cf. Pham 2014: 472), most post-texts equally permit a literal, non-intertextual interpretation. So far, however, it has never been studied how intertextuality contributes to the characteristics and purposes of crosswords and to what extent the analysis of intertextual references can contribute to establishing crosswords as a register or non-cryptic and cryptic puzzles as distinct sub-registers.

3.1 Working definitions

The term intertextuality was coined in the late 1960s by the Bulgarian linguist and literary critic Julia Kristeva (1968). Yet, although intertextual references occur particularly frequently in texts from the 20th and 21st centuries, intertextuality is by no means an exclusively modern or postmodern phenomenon. On the contrary, references to previous texts or utterances may be regarded as an intrinsic property of human language. Consequently, the study of intertextual references, especially in the fields of rhetoric and literary theory, can be traced back to classical antiquity, albeit under different labels such as parody, quotation or imitation.

Today, there are two principal tendencies in research on intertextuality. The theory of intertextuality is historically rooted in post-structuralist literary criticism, which deconstructs the traditional concept of text. Post-structuralists like Kristeva, Barthes and Derrida furthermore regard intertextuality as a characteristic of all texts and consequently contest the autonomy of any text. Thus, intertextuality does not refer back to individual, identifiable pre-texts, but to a “texte infini [infinite text]” (Barthes 1973: 59) or a “texte général [general text]” (Derrida 1972: 125), which is extended to comprise even the ‘social’, ‘cultural’ or ‘historical text’ (cf. Barthes [1968] 1977: 146). However, this ontological conception of intertextuality has never developed a feasible method for textual analysis. Consequently, for actual textual analysis as in the present paper, scholars revert to the second, narrower conception of intertextuality. It regards intertextual references as a gradable feature of some, yet not all texts, examines the forms and functions of such references and, being related to structuralism, approves of the traditional concept of text.
For structuralists like Genette (1982) or Riffaterre (1981) intertextuality theoretically refers back to isolated, identifiable pretexts (or groups of pre-texts). It is this narrow conception of intertextuality that was adopted by linguistics in the 1980s. Linguists usually distinguish between typological intertextuality, i.e. the relationships between post-texts and groups of texts (registers, genres, styles or textual patterns), and referential intertextuality, i.e. the relationships between post-texts and individual, identifiable pre-texts. The previous section showed that crosswords should be regarded as an independent register comprising two sub-registers. Typical examples of crossword puzzles thus follow certain conventions and are necessarily characterised by


typological intertextuality. Consequently, for the present study, analyses were limited to referential intertextuality. The term intertextuality was thus understood to comprise only the relationships between a post-text and one or more individual and identifiable pre-texts. The intertextual subcategory of interfigurality (cf. Müller 1991) includes the mention or appearance of figures and authors of pre-texts in a post-text (“re-used figures [and] authors”, Helbig 1996: 115). Therefore, references to pre-textual figures and authors were equally considered in the present study. Moreover, a text was defined broadly as a formally delimited communicative act which usually exists in written or spoken form but may also consist of other visual or acoustic signs.

3.2 Methodology

A corpus study on intertextuality in crossword puzzles was conducted for this paper. Its primary aim was to investigate the particular forms and functions of intertextual references in this type of word game in order to evaluate their importance for crosswords as a register as well as for non-cryptic and cryptic puzzles as sub-registers. In the first half of the 20th century, so-called quotation clues were still used in cryptic puzzles. A citation listed in the Oxford Dictionary of Quotations (Partington 1992) was reproduced literally, explicitly marked by quotation marks, italics, the name of the pre-text and/or the name of the author. One part of the original wording was elided and had to be recovered by the solver as in (24), where the quotation is accompanied by a definition:
(24) Consumed. “But answer came there none And this was scarcely odd because They’d ____ every one” (Carroll’s Through the Looking-Glass) (5) – EATEN (Gilbert 2001: 12)

Thus, for devising such clues, the setters relied on their knowledge of those pre-texts. In order to identify the answers, solvers had to be able to access similar knowledge of the pre-texts by activating (or constructing) appropriate cognitive representations (cf. Geeraerts and Cuyckens 2007: 170–187). In 1995, however, quotation clues like (24) were forbidden because they were not strictly cryptic and because some puzzles had devoted too much attention to literary background knowledge (cf. Biddlecombe 2009). By contrast, quotation clues like (25) are still to be found in non-cryptic puzzles.
(25) “A Nightmare on ____ Street” – ELM (Parker 15.04.2013)




This suggests that today references to works of literature or popular culture are considerably more frequent in non-cryptic than in cryptic puzzles and that less knowledge of existing texts is required to solve the latter. Hence, one further aim of the empirical study was to investigate this assumption comparatively by examining intertextual references in the two sub-registers of crosswords as to their frequency, pre-texts, forms and functions. For the corpus, two collections of crosswords were analysed, both published in 2009, i.e. well after the abolition of quotation clues in cryptic puzzles. In total, 80 non-cryptic puzzles (2080 clue-answer pairs) from The Sun (N.N. 2009) and 80 cryptic puzzles (2372 clue-answer pairs) from The Times (Browne 2009) were scrutinised for intertextual references according to the above definitions. When several references occurred in one clue-answer pair or when references pointed to several pre-texts, those were counted separately. This yielded a corpus of 270 intertextual clue-answer pairs (The Sun: 112; The Times: 158) and 295 intertextual references (The Sun: 112; The Times: 183; 38.0 % vs. 62.0 %), which were manually classified into five categories according to their respective pre-text(s). Category (1) comprises references to folkloristic and mythological texts, originally transmitted orally. Clue-answer pairs requiring knowledge of literary texts produced by individual authors according to aesthetic standards are summarised in category (2). References to the visual arts are subsumed under category (3) and subdivided into (a) painting/drawing/sculpture and (b) broadcasting/TV series/ film. For references to music, category (4) was created with the subcategories (a) classical music (both orchestral and vocal) and (b) popular music. Remaining references to religious, philosophical or other theoretical texts constitute category (5). In some cases, the distinction between these (sub-)categories is not clearcut. 
Thus, further criteria were introduced. For example, popular music, in contrast to classical music, was regarded as being typically commercially oriented, addressed to large audiences and distributed by the music industry. In addition, each group was analysed according to the provenances of the pre-texts or their authors. Thus, texts from Greek and Roman antiquity are classified as Classical, British is the label for pre-texts from the UK and the Republic of Ireland, American denotes pre-texts from the USA, etc. Provenances relevant for fewer than four intertextual references per category were subsumed under Other. Due to their importance for intertextuality, Shakespeare and the Bible are listed separately (cf. Table 2).
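The corpus totals, the shares of the two newspapers and the per-puzzle averages used in the quantitative analysis follow from the raw counts given above by simple arithmetic. The following sketch merely recomputes them; the variable names are hypothetical, and rounding to one decimal place is assumed because that is the precision reported in this chapter.

```python
# Raw counts as reported in the methodology section.
puzzles = {"The Sun": 80, "The Times": 80}
pairs = {"The Sun": 112, "The Times": 158}        # intertextual clue-answer pairs
references = {"The Sun": 112, "The Times": 183}   # intertextual references

total_pairs = sum(pairs.values())
total_references = sum(references.values())

# Share of each newspaper in all intertextual references (in per cent).
shares = {paper: round(100 * n / total_references, 1) for paper, n in references.items()}

# Average number of references and of intertextual pairs per puzzle.
refs_per_puzzle = {paper: round(references[paper] / puzzles[paper], 1) for paper in puzzles}
pairs_per_puzzle = {paper: round(pairs[paper] / puzzles[paper], 1) for paper in puzzles}
overall_pairs_per_puzzle = round(total_pairs / sum(puzzles.values()), 1)

print(total_pairs, total_references)
print(shares)
print(refs_per_puzzle, pairs_per_puzzle, overall_pairs_per_puzzle)
```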


3.3 Quantitative analysis of the corpus

The first conclusion we can draw from the quantitative analysis of the corpus is that, on the whole, and contrary to the previous assumption, intertextual references are relatively more frequent in the cryptic puzzles published in The Times than in the non-cryptic ones from The Sun. While crosswords in The Sun contain 1.4 intertextual references on average, puzzles in The Times contain 2.3 intertextual references. Even if references to different pre-texts occurring in the same clue-answer pair as in example (32) are not counted separately, this distributional difference remains obvious (1.4 vs. 2.0 intertextual clue-answer pairs/puzzle). A comparison with the frequency of intertextual references in non-cryptic puzzles from another quality paper, The Guardian (Rusbridger 11.–16.05.2013; 0.8 references or intertextual clue-answer pairs/puzzle), shows that this difference actually depends on the type of crossword and not on the journalistic standards or the addressed readership of the respective newspapers. Consequently, despite there being considerable variability in the frequency of intertextuality within the same sub-register, cryptic puzzles generally require more knowledge of other texts than non-cryptic puzzles. The qualitative analysis of the corpus will shed light on how intertextual references are incorporated into cryptic puzzles, despite quotation clues having been banned.

Table 2: Composition of the corpus of intertextual clue-answer pairs

Categories and provenances of pre-texts                     THE SUN (%)   THE TIMES (%)   AVERAGE (%)
(1) Folkloristic and mythological texts                     14.3           9.3            11.2
    Classical                                                8.0           4.9             6.1
    British                                                  4.5           2.7             3.4
    Other                                                    1.8           1.6             1.7
(2) Literature                                              28.6          51.9            43.0
    Classical                                                1.8           2.7             2.4
    British (excluding Shakespeare)                         16.1          31.1            25.4
    Shakespeare                                              2.7           8.2             6.1
    American                                                 3.6           4.9             4.4
    French                                                   2.7           2.7             2.7
    Other                                                    1.8           2.2             2.0
(3) Visual arts:                                            19.6           6.6            11.5
  (a) Painting/drawing/sculpture                             3.6           3.8             3.7
    Italian                                                  1.8           1.1             1.4
    Other                                                    1.8           2.7             2.4
  (b) Video/broadcasting/TV series/films                    16.1           2.7             7.8
    British                                                  8.9           0.5             3.7
    American                                                 7.1           2.2             4.1
(4) Music:                                                  22.3          13.1            16.6
  (a) Classical music                                        8.0          10.9             9.8
    British                                                  1.8           3.3             2.7
    Italian                                                  3.6           2.7             3.1
    Other                                                    2.7           4.9             4.1
  (b) Popular music                                         14.3           2.2             6.8
    British                                                  8.0           1.1             3.7
    American                                                 5.4           1.1             2.7
    Other                                                    0.9           0.0             0.3
(5) Religious, philosophical and other theoretical texts    15.2          19.1            17.6
    Classical                                                0.0           3.3             2.0
    British                                                  0.9           5.5             3.7
    The Bible                                               14.3           7.1             9.8
    Other                                                    0.0           3.3             2.0

Note: All values are percentages and are calculated based on the number of intertextual references in the crosswords from The Sun (112), The Times (183) or both newspapers (295; labelled Average). Differences for example between percentage sums (shaded cells) corresponding to (sub-)categories and respective individual percentage values (white cells) corresponding to provenances result from rounding to one decimal place.
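The rounding effect mentioned in the note can be made concrete for category (1) in The Times. In the sketch below, the absolute counts 9, 5 and 3 are not reported in the chapter; they are an assumption, back-calculated from the published percentages and the total of 183 references.

```python
times_total = 183  # intertextual references in The Times, as reported

# Assumed absolute counts for category (1), inferred from 4.9 %, 2.7 % and 1.6 %.
counts = {"Classical": 9, "British": 5, "Other": 3}

# Each provenance rounded separately, as in the individual rows of the table.
rounded_parts = {name: round(100 * n / times_total, 1) for name, n in counts.items()}

# Adding the rounded values versus rounding the category total once.
sum_of_rounded = round(sum(rounded_parts.values()), 1)
rounded_sum = round(100 * sum(counts.values()) / times_total, 1)

print(rounded_parts)
print(sum_of_rounded, rounded_sum)
```

Under these assumed counts, the individually rounded rows add up to 9.2, whereas the category total rounds to 9.3, which matches the discrepancy visible in the table.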

Table 2 discloses the most popular pre-textual categories in crosswords in general. Works of literature are by far the most important ones (43.0 %), followed by religious, philosophical or other theoretical texts (17.6 %) and folkloristic and mythological texts (11.2 %). If provenance is considered as well, British literature (including Shakespeare; 31.5 %), the Bible (9.8 %) and myths of classical antiquity (6.1 %) are the most important pre-texts. In addition, Shakespeare is the individual author who is by far most often referred to (6.1 %). This result might be surprising, since it has often been claimed that, at least since the mid-20th century, the traditional pre-texts of the Victorian Age have declined in importance in Anglo-American culture: “until recently Classical mythology, the works of Shakespeare and the Bible were regular sources for compilers” (Scott and O’Donnell 1998: 207; cf. also Hebel 1991: 149). Consequently, the predominance of these pre-texts in crosswords may have been even clearer in the first half of the 20th century. This result supports Partridge’s assumption that typical solvers are thoroughly and “humanistically educated” (Partridge 1992: 504).

Furthermore, it is equally revealing to compare the favourite pre-texts of the two sub-registers of crosswords. Thus, clues in non-cryptic crosswords from The Sun require knowledge of literary works in general (28.6 %), British literature (excluding Shakespeare; 16.1 %) and Shakespeare (2.7 %) less frequently than clues in cryptic crosswords from The Times (51.9 %, 31.1 % and 8.2 %). By contrast, puzzles from The Sun refer to the Bible (14.3 %) and to the oral tradition (14.3 %), especially to classical mythology (8.0 %), relatively more frequently than puzzles from The Times (7.1 %, 9.3 % and 4.9 %). The reason for these different preferences especially with regard to the traditional pre-texts of the Victorian Age might be that a British solver with an average education can be expected to possess more extensive general knowledge of the Bible and all texts of classical mythology than of the 38 plays and 154 sonnets commonly attributed to Shakespeare (cf. Greenblatt 1997: 65–66, 1923–1976). The most striking distributional differences between the two sub-registers can, however, be found in categories (3b) and (4b).
Knowledge of (especially Anglo-American) video, broadcasting, TV series, films and popular music is necessary for the solution of nearly one third of all intertextual non-cryptic clues (30.4 %) but is hardly relevant for cryptic puzzles at all (4.9 %). Cryptic crosswords of the corpus thus primarily target traditional pre-texts like classical mythology, Shakespeare and the Bible, whereas non-cryptic puzzles focus on Shakespeare to a smaller, yet on classical mythology and the Bible to a greater extent and additionally require knowledge of texts of the popular, especially Anglo-American culture. However, only a corpus including non-cryptic and cryptic clue-answer pairs from further (popular and quality) newspapers could reveal whether these preferences for certain pre-textual categories are correlated to the respective sub-register of crosswords or to the expected knowledge of the target solvership (or to both).

3.4 Qualitative analysis of the corpus

In both sub-registers of crosswords, most intertextual references (95.2 %) involve proper nouns (including titles). Due to their fixed extension but particularly complex intension as well as their high selectivity and explicit markedness (cf. Pfister 1985: 28; Karrer 1985: 106–108), proper nouns contribute to the codification of answers as well as the unequivocal solution of clues. Hence, they are well-suited for intertextual references in crosswords. In more than two thirds of all intertextual non-cryptic adjacency pairs of the corpus (67.9 %), proper nouns referring to the same pre-text occur in both the clue and the answer, usually in combination with common nouns providing further information on the referent (26). Thus, although these references are unmarked, proper nouns can usually activate the necessary cognitive representations unequivocally even without the grid.
(26) Writer of 1984 (6) – ORWELL (N.N. 2009: 77)

In about one third of the non-cryptic clue-answer pairs of the corpus, proper nouns occur either in the answer as in (27) (23.2 %) or, more rarely, in the clue as in (28) (6.3 %), whereas the other component of the pair gives a semantically equivalent common noun or noun phrase. Only one selective proper noun being involved, more pre-textual knowledge is required for correctly associating clue and answer. Furthermore, the solver may encounter a certain ambiguity, which is resolved only when the number of letters of the answer is considered or crosslights are already given in the grid:
(27) Opera composer (7) – PUCCINI
(28) Puccini work (5) – OPERA (N.N. 2009: 45, 53)

A comparison with non-cryptic puzzles from The Guardian (Rusbridger 11.–16.05.2013) shows that these two types are particularly typical of this sub-register. In the corpus, only three non-cryptic clues (2.7 %) require exact knowledge of the wording of a pre-text and may thus, despite their featuring no explicit markers, be classified as quotation clues. Interestingly, all three refer to texts of popular culture: the catchphrase of a British comedian and the beginnings of two nursery rhymes. These pre-texts can be expected to be common knowledge among British solvers.
(29) Tommy Cooper’s catchphrase (4,4,4) – JUST LIKE THAT
(30) Ride a cock horse to here (7,5) – BANBURY CROSS
(31) Silver-buckled sailor (5,7) – BOBBY SHAFTOE (N.N. 2009: 71, 147, 155)

In cryptic puzzles, by contrast, proper nouns are used with greater variation as intertextual references. One major difference between the two sub-registers in the corpus is that intertextual proper nouns may occur in the subsidiary indication


of cryptic clues, i.e. as an intermediate step in the solution of the clue (32.9 %). From a cognitive linguistic point of view, especially well-known proper nouns automatically activate easily accessible pre-textual frames. Whereas the frames activated by intertextual references in non-cryptic puzzles are directly relevant for the answers, this is not always the case in cryptic puzzles. Only lexemes in the definition need to be interpreted literally. Intertextual references in the subsidiary indication, however, usually require no pre-textual knowledge at all. They activate frames which mislead the solver and inhibit finding the answer, especially when knowledge of a completely different pre-text is required. Thus, in (32) no knowledge of Lewis or the Lake poets is necessary because the answer, the name of a different poet, is an anagram of the letters given in the subsidiary indication.
(32) TV broadcast with C S Lewis and Lake poet (9-4) – SACKVILLE-WEST (Browne 2009: 52)

Cryptic clues whose definitions and answers contain intertextual proper nouns (usually referring to the same pre-text; 15.9 %) resemble the first type of non-cryptic clue discussed before: an intertextual name in the definition is often sufficient for an unequivocal solution and only basic pre-textual knowledge is required. Whereas the additional subsidiary indication first complicates the activation of the necessary cognitive representations, once identified, it indicates the correctness or falsehood of the supposed answer. In (33) the name of a Shakespearean spirit also results from the insertion of the Roman numeral for one into an anagram of Lear. Equally, the answer in (34) is not only indicated by the definition but is also confirmed by the subsidiary indication: for the mythological place name the graphemes of no and lava, paraphrased by sign of volcanic activity, are reversed.
(33) Shakespearean spirit – one into Lear possibility (5) – ARIEL
(34) No sign of volcanic activity about Arthur’s Seat (6) – AVALON (Browne 2009: 132, 56)
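That the subsidiary indication confirms a supposed answer independently of any pre-textual knowledge can be illustrated for (32) to (34) with purely orthographic checks. This is a minimal sketch with hypothetical helper names; it tests candidate answers and does not model how a solver arrives at them.

```python
def letters(text: str) -> list:
    """Sorted alphabetic characters of a string, ignoring case, spaces and hyphens."""
    return sorted(ch for ch in text.lower() if ch.isalpha())


def is_anagram(source: str, answer: str) -> bool:
    return letters(source) == letters(answer)


def is_anagram_with_insertion(base: str, inserted: str, answer: str) -> bool:
    """True if removing one occurrence of `inserted` from the answer leaves an anagram of `base`."""
    answer, inserted = answer.lower(), inserted.lower()
    for i in range(len(answer) - len(inserted) + 1):
        if answer[i:i + len(inserted)] == inserted:
            rest = answer[:i] + answer[i + len(inserted):]
            if is_anagram(base, rest):
                return True
    return False


def reversed_chunks(*chunks: str) -> str:
    """Join the chunks and read the result backwards."""
    return "".join(chunks)[::-1]


print(is_anagram("TV C S Lewis Lake", "SACKVILLE-WEST"))   # (32)
print(is_anagram_with_insertion("Lear", "I", "ARIEL"))      # (33)
print(reversed_chunks("no", "lava").upper())                # (34)
```

None of these checks refers to Lewis, the Lake poets, King Lear or Arthurian legend, which is consistent with the observation that intertextual names in the subsidiary indication are processed as mere letter sequences.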

When intertextual proper nouns occur in the answer (41.1 %) or, more rarely, in the definition only (5.1 %) and the corresponding counterpart is constituted by a semantically equivalent common noun or noun phrase, as with the second type of non-cryptic clue discussed above, the answer can usually not be inferred unambiguously from the definition alone. However, in these cryptic clues, the subsidiary indication may resolve the ambiguity. Furthermore, such clues require more detailed knowledge of pre-texts than the previous categories. While the definition in (35) does not unambiguously identify the intertextual answer, the subsidiary indication requires the formation of an anagram of relies on. By contrast, splitting a couple, i.e. a lady and a man, by S (from succeeded) results in a synonym of the intertextual eponym Casanova in (36).
(35) Relies on horribly haunted castle? (8) – ELSINORE
(36) Casanova succeeded splitting couple? (5,3) – LADY’S MAN (Browne 2009: 40, 72)

Moreover, seven cryptic clues (4.4 %) require knowledge of the exact wording of pre-textual passages. Thus, although they do not follow the traditional pattern of quotation clues (featuring e.g. quotation marks and a gap which has to be recovered), they must be classified as quotation clues. Not only is their share larger than in non-cryptic puzzles, but they also refer to a different category of pre-texts. While only two clues, (37) and (38), refer to popular culture (an English nursery rhyme and a musical based on poems by Eliot), the others require knowledge of works of well-known British and international authors: Shakespeare (39), but only seemingly (40) and (41), Shelley (42), Carroll (43), Gray (41) and Plutarch (40).
(37) When Grundy was christened, 48 hours before Chesterton’s man (7) – TUESDAY
(38) Reason for Macavity’s lack of presence (5) – ALIBI
(39) Underworld scam over shelter – it blighted Gloucester’s winter (10) – DISCONTENT
(40) Composer includes girl in second act of Julius Caesar (7) – VIVALDI
(41) Hamlet’s rude ancestor heard warning priest (10) – FOREFATHER
(42) Lovely old piece describing Shelley’s traveller’s land (7) – ANTIQUE
(43) Giving nasty looks? Alice never heard of such a thing! (12) – UGLIFICATION (Browne 2009: 156, 104, 92, 58, 90, 44, 42)

Finally, three cryptic clues (1.9 %) are based on idioms derived from individual pre-texts. For these, the activation of pre-textual frames may be helpful, yet is by no means essential. The idiomatic collocation representing the answer in (44) is derived from Shakespeare’s Antony and Cleopatra (1.5.72). The subsidiary indication instructs the solver to insert lad (‘boy’) into sad (‘blue’) and to add ays (‘votes’).
(44) Boy in blue votes for Green term (5,4) – SALAD DAYS (Browne 2009: 58)
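The wordplay of (44) can be checked in the same mechanical way. The sketch below uses a hypothetical helper `insert` and tries every insertion position; it suggests that only the nesting signalled by the surface wording "boy in blue", i.e. lad placed inside sad, yields the answer.

```python
def insert(inner: str, outer: str, position: int) -> str:
    """Place `inner` inside `outer` at the given position."""
    return outer[:position] + inner + outer[position:]


answer = "SALAD DAYS".replace(" ", "").lower()

# 'lad' inside 'sad', followed by 'ays'.
lad_in_sad = [insert("lad", "sad", p) + "ays" for p in range(len("sad") + 1)]
# The opposite nesting, 'sad' inside 'lad', for comparison.
sad_in_lad = [insert("sad", "lad", p) + "ays" for p in range(len("lad") + 1)]

print(answer in lad_in_sad)
print(answer in sad_in_lad)
```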

The qualitative analysis of the corpus revealed that intertextual references in crosswords differ drastically from those in other registers, formally as well as functionally. Whereas intertextuality e.g. in newspapers or advertisements most frequently takes the form of quotations (cf. Pham 2014), interfigural relationships are the predominant formal category in the present corpus. Furthermore, intertextual references in other registers are usually doubly referential (cf. Pham 2014: 472), referring to both the extralinguistic world and the respective pre-texts.


Thus, an advertising slogan like “To smoke or not to smoke” for cigarettes (Mieder 1985: 126) can be interpreted as a statement about the world, expressing that the consumer has to decide between two alternative actions, or as an intertextual reference to Shakespeare’s Hamlet, additionally suggesting that the decision is essential to the consumer. By contrast, a literal, non-intertextual interpretation of references in non-cryptic clues as well as in the definition of cryptic puzzles does not lead to the answer, whereas intertextual references in the subsidiary indication of cryptic clues must be interpreted literally only. In both cases, the clues’ meaning is exhausted as soon as the answer has been identified. Intertextual references in puzzles can thus not be regarded as doubly referential.

The analysis of the corpus and the comparison with non-cryptic intertextual clues from a quality newspaper further identified various types of intertextual clue-answer pairs in non-cryptic and cryptic puzzles. These types typically establish intertextual relationships of different intensity and occur more frequently or even exclusively in one or the other sub-register of crosswords. Cryptic puzzles not only use intertextuality more often to encode the answer. Intertextual clue-answer pairs in cryptic puzzles also tend to require the activation of more comprehensive pre-textual knowledge than in non-cryptic puzzles. Furthermore, cryptic puzzles require knowledge of a greater variety of pre-texts and also of pre-texts which cannot be regarded as part of popular culture. Finally, well-known pre-texts like Shakespeare’s Hamlet are referred to for misleading the solver by activating easily accessible frames of knowledge.

4 Conclusion

While crosswords had never been studied in detail from a text-linguistic perspective, the present paper established and analysed crossword puzzles as an independent register with non-cryptic and cryptic puzzles as distinct sub-registers. In addition, referential intertextuality had neither been investigated as a characteristic of crosswords nor been considered as a linguistic feature relevant for register analysis. Thus, Biber and Conrad only mention references to previous scientific publications or postings in chatgroups (2009: 68, 289), but no other types of intertextuality. However, since intertextual clue-answer pairs occur on average more than once in every crossword in the present corpus (1.7 intertextual clue-answer pairs/puzzle), this paper showed intertextuality to be one important strategy of codification in this type of word game. Furthermore, intertextuality is used in a manner differing radically from other texts, formally as well as functionally. As a pervasive, frequent and distinctive linguistic feature of crosswords




which is related to the purposes and the communicative situation characteristic of this register, intertextuality must be included in a register analysis of this type of puzzle according to the framework by Biber and Conrad (2009). It might also turn out to be relevant for the analysis of other registers.

Moreover, the present corpus study revealed considerable differences in the way non-cryptic and cryptic puzzles employ intertextual references. It thus confirmed the distinction between two sub-registers of crosswords. In non-cryptic clues, intertextuality typically supports the unambiguous solution of the clue and demands only superficial pre-textual knowledge. In cryptic crosswords, by contrast, intertextual references and even quotation clues are more frequent, despite the latter having been officially banned in 1995. A total of 23 cryptic clue-answer pairs of the corpus such as (40) or (32) even contain references to two or three pre-texts. Thus, cryptic puzzles more frequently require the activation of pre-textual frames than non-cryptic puzzles and these frames need to be more detailed. Cryptic crosswords also feature references which are formally more variable and, at least initially, lead to ambiguities which account for part of the cryptic character of this sub-register. What is specific to cryptic puzzles is the reference to well-known pre-texts in the subsidiary indication in order to mislead the solver.

However, the present corpus permits no conclusion as to whether the pre-textual categories targeted by crosswords are dependent on the type of sub-register or the expected knowledge of the target readership of the newspapers in which these puzzles are published (or both). Thus, further corpus studies should be undertaken to specifically examine this correlation.

Bibliography

Augarde, Tony. 2003. The Oxford guide to word games. Oxford: Oxford University Press.
Barthes, Roland. [1968] 1977. The death of the author. In Roland Barthes, Image music text, 142–148. London: Fontana Press.
Barthes, Roland. 1973. Le plaisir du texte. Paris: Editions du Seuil.
Beaugrande, Robert-Alain de & Wolfgang Ulrich Dressler. 1981. Introduction to text linguistics. London & New York: Longman.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge University Press.
Biddlecombe, Peter. 2009. Yet another guide to cryptic crosswords. http://www.biddlecombe.demon.co.uk/yagcc/ (accessed 27 January 2015).
Browne, Richard. 2009. The Times crossword book 13. London: Times Books.
Bühler, Karl. [1934] 1982. Sprachtheorie: Die Darstellungsfunktion der Sprache. Stuttgart & New York: Gustav Fischer Verlag.
Coffey, Steve. 1998. Linguistic aspects of the cryptic crossword. English Today 14(1). 14–18.


Cornell, Alan & Marion Cornell. 1980. Fragen und Antworten im englischen Kreuzworträtsel. In Ernst Burgschmidt (ed.), Beiträge zu einer Linguistischen Landeskunde und Sprachpraxis, 44–63. Braunschweig: Verlag E. Burgschmidt.
Derrida, Jacques. 1972. Positions: Entretiens avec Henri Ronse, Julia Kristeva, Jean-Louis Houdebine, Guy Scarpetta. Paris: Les Editions de Minuit.
Dienhart, John M. 1998. A linguistic look at riddles. Journal of Pragmatics 31. 95–125.
Fix, Ulla. 2011. Das Rätsel: Bestand und Wandel einer Textsorte. Oder: Warum sich die Textlinguistik als Querschnittsdisziplin verstehen kann. In Ulla Fix (ed.), Texte und Textsorten – sprachliche, kommunikative und kulturelle Phänomene, 185–214. 2nd edn. Berlin: Frank & Timme.
Furthmann, Katja. 2006. Die Sterne lügen nicht: Eine linguistische Analyse der Textsorte Pressehoroskop. Göttingen: V&R unipress.
Geeraerts, Dirk & Hubert Cuyckens (eds.). 2007. The Oxford handbook of cognitive linguistics. Oxford: Oxford University Press.
Genette, Gérard. 1982. Palimpsestes: La littérature au second degré. Paris: Éditions du Seuil.
Gilbert, Val. 2001. The Daily Telegraph: How to crack the cryptic crossword. London: Pan Books.
Goldblum, Naomi & Ram Frost. 1987. The crossword puzzle paradigm: The effectiveness of different word fragments as cues for the retrieval of words. Haskins laboratories status report on speech research SR-89/90. 133–146.
Greenblatt, Stephen (ed.). 1997. The Norton Shakespeare. Based on the Oxford Edition. London: W. W. Norton & Company.
Greimas, Algirdas Julien. 1970. L’écriture cruciverbiste. In Algirdas Julien Greimas (ed.), Du sens: Essais sémiotiques, 285–307. Paris: Éditions du Seuil.
Halliday, Michael A. K. & Ruqaiya Hasan. 1976. Cohesion in English. London: Longman.
Hambrick, David Z., Timothy A. Salthouse & Elizabeth J. Meinz. 1999. Predictors of crossword puzzle proficiency and moderators of age–cognition relations. Journal of Experimental Psychology: General 128(2). 131–164.
Hebel, Udo J. 1991. Towards a descriptive poetics of allusion. In Heinrich F. Plett (ed.), Intertextuality, 135–164. Berlin & New York: Walter de Gruyter.
Heinemann, Margot. 2000. Textsorten des Alltags. In Klaus Brinker, Gerd Antos, Wolfgang Heinemann & Sven F. Sager (eds.), Text- und Gesprächslinguistik. Ein internationales Handbuch zeitgenössischer Forschung, 604–614. Berlin & New York: Walter de Gruyter.
Helbig, Jörg. 1996. Intertextualität und Markierung: Untersuchungen zur Systematik und Funktion der Signalisierung von Intertextualität. Heidelberg: Universitätsverlag C. Winter.
Karrer, Wolfgang. 1985. Intertextualität als Elementen- und Struktur-Reproduktion. In Ulrich Broich & Manfred Pfister (eds.), Intertextualität: Formen, Funktionen, anglistische Fallstudien, 98–116. Tübingen: Niemeyer.
Kristeva, Julia. 1968. Le texte clos. Langages 12. 103–125.
Mieder, Wolfgang. 1985. Sprichwort, Redensart, Zitat: Tradierte Formelsprache in der Moderne. Bern, Frankfurt am Main & New York: Peter Lang.
Mok, Quirinus Ignatius Maria. 1987. Mots croisés et ambiguïté. In Brigitte Kampers-Manhe & Co Vet (eds.), Études de linguistique Française offertes à Robert de Dardel par ses amis et collègues, 97–108. Amsterdam: Éditions Rodopi B. V.
Mollica, Anthony. 2007. Crossword puzzles and second-language teaching. Italica 84(1). 59–78.
Moorey, Tim. 2008. How to master the Times crossword: The Times cryptic crossword demystified. London: Harper Collins Publishers.




Müller, Wolfgang G. 1991. Interfigurality: A study on the interdependence of literary figures. In Heinrich F. Plett (ed.), Intertextuality, 101–121. Berlin & New York: Walter de Gruyter.
Nickerson, Raymond S. 2011. Five down, absquatulated: Crossword puzzle clues to how the mind works. Psychonomic Bulletin & Review 18. 217–241.
N.N. 2009. The Sun two-speed crossword book 10. London: Harper Collins.
Parker, Timothy. 15.04.2013. Universal crossword. New York Post. New York: News Corporation.
Partington, Angela (ed.). 1992. The Oxford dictionary of quotations. 4th edn. Oxford & New York: Oxford University Press.
Partridge, John G. 1992. Linguistic reflections on the English crossword puzzle. In Claudia Blank (ed.), Language and civilization. A concerted profusion of essays and studies in honour of Otto Hietsch, 495–504. Frankfurt am Main: Peter Lang.
Pepicello, William J. 1980. Linguistic strategies in riddling. Western Folklore 39(1). 1–16.
Pfister, Manfred. 1985. Konzepte der Intertextualität. In Ulrich Broich & Manfred Pfister (eds.), Intertextualität: Formen, Funktionen, anglistische Fallstudien, 1–30. Tübingen: Niemeyer.
Pham, Teresa. 2014. Intertextuelle Referenzen auf Shakespeare. Eine kognitiv-linguistische Untersuchung. Münster: LIT Verlag.
Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech & Jan Svartvik. 1985. A comprehensive grammar of the English language. Harlow: Longman.
Riffaterre, Michael. 1981. Interpretation and undecidability. New Literary History 12(2). 227–242.
Rolf, Eckard. 1993. Die Funktion der Gebrauchstextsorten. Berlin & New York: de Gruyter.
Rusbridger, Alan (ed.). 11.–16.05.2013. Quick crossword No. 13,418–13,422. London: Guardian Media Group.
Scheufele, Dietram A. & David Tewksbury. 2007. Framing, agenda setting, and priming: The evolution of three media effects models. Journal of Communication 57. 9–20.
Schlepper, Wolfgang. 1981. Confusing poet makes fine stuff (5): The “wrestle with words and meanings” in the crossword puzzle. In Hans-Jürgen Diller, Stephan Kohl, Joachim Kornelius, Erwin Otto & Gerd Stratmann (eds.), anglistik & englischunterricht. Vol. 15, 61–80. Trier: WVT Wissenschaftlicher Verlag Trier.
Schubert, Christoph. 2012. Englische Textlinguistik. Eine Einführung. 2nd edn. Berlin: Erich Schmidt.
Schulte-Middelich, Bernd. 1985. Funktionen intertextueller Textkonstitution. In Ulrich Broich & Manfred Pfister (eds.), Intertextualität: Formen, Funktionen, anglistische Fallstudien, 197–242. Tübingen: Niemeyer.
Scott, W. T. & H. O’Donnell. 1998. Recovering meaning from chaos? Word play and the challenge of sense. In William Pencak & J. Ralph Lindgren (eds.), New approaches to semiotics and the human sciences: Essays in honor of Roberta Kevelson, 203–239. New York: Peter Lang Publishing.
Simpson, John A. & Edmund S. C. Weiner (eds.). 2015. Oxford English dictionary online. Oxford: Oxford University Press. http://oed.com/ (accessed 27 January 2015).
Skinner, Kevin. 2008. How to solve cryptic crosswords. London: Right Way, Constable & Robinson.
Stephenson, Hugh. 2007. Secrets of the setters. How to solve the Guardian crossword. London: Atlantic Books.
Stocker, Peter. 1998. Theorie der intertextuellen Lektüre: Modelle und Fallstudien. Paderborn: Ferdinand Schöningh.


Stratmann, Gerd. 1995. Kreuzworträtsel. In Rüdiger Ahrens, Wolf-Dietrich Bald & Werner Hüllen (eds.), Handbuch Englisch als Fremdsprache (HEF), 192–195. Berlin: Erich Schmidt.
Underwood, Geoffrey, Caroline Deihim & Viv Batt. 1994. Expert performance in solving word puzzles: From retrieval cues to crossword clues. Applied Cognitive Psychology 8. 531–548.
Weisskirch, Robert S. 2006. An analysis of instructor-created crossword puzzles for student review. College Teaching 54(1). 198–201.
Witte, Kenneth L. & Joel S. Freund. 1995. Anagram solution as related to adult age, anagram difficulty, and experience in solving crossword puzzles. Aging, Neuropsychology, and Cognition 2(2). 146–155.

Section II: Cross-register comparison

While the studies in Section I concentrated on single registers, Section II provides cross-register comparisons, in which the distinctive features and markers of registers can be identified with great accuracy and perspicuity by means of juxtaposition. As the contributions will show, such comparisons are particularly revealing when the registers under discussion are from clearly divergent domains. The fact that each of the three papers in Section II includes academic writing demonstrates that this register is highly distinctive and therefore well-suited as a yardstick for text-linguistic collation.

Christina Sanchez-Stockhammer’s study “Punctuation as an indication of register: Comics and academic texts” establishes a link to the papers by Rolf Kreyer and Teresa Pham in Section I, since it also analyses a register from popular culture, in this case the language of comics. At the same time, this contribution enters uncharted linguistic territory by focusing on punctuation as a register marker, which has been widely neglected so far despite its pervasiveness in written discourse. The study is based on two small-scale corpora, namely AcadText, a corpus of journal articles, and CoCo, a comic corpus, both of which were designed and compiled for register comparison by the author. It is shown that different punctuation marks have varying functions and deviant frequencies in relation to the written or spoken mode prominent in the registers. As a result, features of punctuation are suggested as a valid and necessary extension of Biber and Conrad’s (2009) model of register analysis.

In her paper “Linking up register and cognitive perspectives: Parenthetical constructions in academic prose and experimentalist poetry”, Martina Lampert chooses a specific linguistic feature as the standard of register comparison. By concentrating on the syntactic construction of parenthesis, she draws an analogy between a minimalist poem by E. E. Cummings and a scientific research paper within the framework of a microscopic qualitative analysis. She picks two registers which are located at the opposite ends of a continuum of written discourse and pays attention to punctuation marks as well, in this case to parenthetical round brackets (so-called lunulae). Since situational features of register description are closely linked to cognitive principles, a correspondence is established between Biber’s register analysis and Leonard Talmy’s cognitive semantic approach. Lampert concludes by arguing that parenthesis should be included in Biber and Conrad’s (2009) list of lexico-grammatical features relevant to register investigation.


The study “Cohesive devices across registers and varieties: The role of medium in English” by Stella Neumann and Jennifer Fest combines the comparative analysis of academic writing, administrative writing, broadcast discussions, conversations and exams with regional variation. The term “regional” is used here and in the paper in the broader Hallidayan sense grouping variation by the speakers’ geographical background as opposed to functional variation varying by context of use, not by user. Based on data from the International Corpus of English, functional variation is investigated within the six L1 and L2 Englishes of Singapore, Hong Kong, India, Canada, Jamaica and New Zealand. An examination of the lexico-grammatical features of pronouns, conjunctions and lexical density sheds new light on the use of cohesive ties across both varieties and registers. In particular, quantitative surveys show that there are significant differences in the frequency of the cohesive items between spoken and written registers. Along these lines, it becomes obvious that an exhaustive discussion of any regional or national variety of English needs to take into account register variation as well, so that text linguistics is shown to be an indispensable complement to sociolinguistics. Moreover, this paper builds a bridge to Section III, in which the interrelation between regional and register variation is further elucidated.

Christina Sanchez-Stockhammer

Punctuation as an indication of register: Comics and academic texts

Abstract: The currently most established definition of a register is the one developed by Douglas Biber in numerous publications (e.g. Biber 1988, 1995, 2006), namely “a variety associated with a particular situation of use” (Biber and Conrad 2009: 6). The delimitation of individual registers such as telephone conversations or newspaper editorials is based on their situational context, their lexical and grammatical characteristics and the functional relationship obtaining between context and language. While Biber’s multidimensional approach already considers a multitude of different lexico-grammatical features as potential indicators of register, this paper adds a new perspective by exploring a feature type which has not been taken into account so far in the different versions of his model, namely punctuation. After discussing the functions of various punctuation marks, the paper presents the corpus-based evidence of a small-scale study on two registers tending towards the extremes of the spoken – written dimension, namely academic texts and comics. To this end, the corpus AcadText was compiled for the present study by analogy to the comic component of the comic corpus CoCo (described in Sanchez-Stockhammer 2012), which comprises excerpts from Superman, Batman and Uncle Scrooge and considers the text occurring in text boxes with narration, inside speech bubbles, as onomatopoeia superimposed on the pictures etc. The results show that some punctuation marks (such as exclamation marks and round brackets) correlate strongly with spoken and written style respectively and barely occur in the contrasting register. Furthermore, even in those cases where the results are quantitatively similar, differences in usage become obvious upon closer consideration – e.g. the dominant use of commas after introductory interjections or proper nouns with vocative function in comics as compared to more varied uses of that punctuation mark in academic texts. These results suggest that punctuation is indicative of register indeed, and that it makes sense to introduce punctuation as an additional category in Biber’s register model.


1 Introduction: Punctuation and register

Many features of language occur in both speech and writing, but some are specific to one of these two modalities: thus phonetic assimilation phenomena and intonation are by necessity restricted to the spoken modality, since they concern auditory phenomena, whereas punctuation immediately comes to mind as a visual linguistic feature that only occurs in writing. While it is sometimes claimed that punctuation acts as a substitute for prosody and pauses in the written modality, Meyer (1987: 69) notes that “punctuation is at best a rather crude reflection of the complexities of prosody” and that the relation between the two is unsystematic. Thus commas are sometimes but not always used in contexts where one would expect a pause in speech – and sometimes they occur in contexts without a prosodic juncture: for example, the sentence

(1) Those who are fond of sleeping late make unreliable workers.

is usually spoken with a pause after late, but it does not contain a comma if common spelling conventions are adhered to (Meyer 1987: 70). By contrast, the sentence

(2) A couple of the males made good comedy, too.

is realised with a comma but arguably not produced with a pause in speech (Meyer 1987: 71). This raises the question whether the reverse relation between punctuation as the primary feature and prosody as its realisation in speech can also be postulated. One of the few cases in which a feature of the written modality is claimed to be rendered in oral communication is that of so-called air quotes, which are drawn into the air manually while speaking and which “intermodally” refer the listeners/receivers to the printed source of a spoken quote (cf. Lampert 2013). However, air quotes rely on visual gestures rather than prosody. By contrast, punctuation marks are produced orally on some occasions, such as when separating whole numbers from decimals, e.g. in

(3) one point five.

However, in such cases it is actually the terms referring to the punctuation marks that are realised in the spoken modality rather than the corresponding function of the punctuation mark. A third conceivable option regarding the relation between punctuation and prosody is that there is none: for instance, Nunberg (1990: 7) argues that punctuation has no correspondence in speech and that it exploits “the particular expressive resources that graphical presentation makes available” in order to serve the requirements of written communication. Yet whatever the relation between speech and writing is recognised to be, what remains is that punctuation constitutes a characteristic feature of written language. This raises the question whether it is possible to recognise general principles of punctuation underlying all written language use, which are common to all written registers, or whether it is more appropriate to consider more specific tendencies in the use of punctuation marks in particular communicative situations. For instance, the combination of punctuation marks seems acceptable in the sentence

(4) She did what…???!!!

whereas one is extremely unlikely to encounter an example such as

(5) The increasing evidence that language processing is sensitive to lexical and structural co-occurrences at different levels of granularity and abstraction has led to the hypothesis that lexical and structural processing may be unified…???!!!

in actual language usage – at least not in the original context of use.1 Various explanations can be advanced for this: for instance, the first sentence is very short and therefore lends itself to the incredulous intonation associated with such a cluster of punctuation marks far better than the second sentence with its complex structure. Possibly even more importantly, the second sentence contains information situating it in the register of written academic language (it has been adapted from Snider’s 2009 article “Similarity and structural priming”), and it would seem that the punctuation above is unusual for an academic text to say the least. This discrepancy between the constructed example above and readers’ expectations suggests that language users tend to expect particular types of punctuation mark and their combination in some types of text rather than in others. If that is the case, then it should also be possible to use punctuation marks as an indication or even marker of individual registers – a hypothesis that will be explored in the remainder of this paper.

1 By contrast, it is conceivable to encounter the example in an online discussion forum or blog with reference to unclear academic writing. (I am grateful to my anonymous reviewer for pointing this out to me.) In that case, however, the sentence (which is a quotation) and the punctuation marks (which represent a comment) are situated on different linguistic levels. This is yet another example of the more general observation that texts with a metalinguistic function in the sense of Jakobson (1985) may depart from common usage. As a consequence, texts on linguistics should ideally be avoided in the compilation of general-language corpora.

Following Peters (2004: 447), the present contribution distinguishes between word punctuation (comprising e.g. hyphens and apostrophes occurring within unspaced sequences of letters) and sentence punctuation, and it concentrates on the latter. Sentence punctuation is usually characterised by the use of a space on that side of the punctuation mark which is not directly attached to a preceding or following sequence of letters and comprises

full stop                 .
question mark             ?
exclamation mark          !
comma                     ,
semicolon                 ;
colon                     :
dash                      –
slash                     /
suspension dots           …
single quotation marks    ‘’
double quotation marks    “”
round brackets            ()
square brackets           []
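As a rough illustration of how the inventory above could be put to work in a corpus study, the following Python sketch tallies sentence punctuation marks and normalises the counts per 1,000 words. The function names, the per-1,000-words normalisation and the heuristic for excluding word punctuation (a mark flanked by letters on both sides) are assumptions made for this sketch; they are not part of the procedure described in this paper.

```python
from collections import Counter

# Marks from the inventory above. Paired marks are counted per character,
# which is a simplification chosen for this sketch.
SENTENCE_MARKS = set(".?!,;:–/…‘’“”()[]")


def is_word_internal(text: str, i: int) -> bool:
    """Return True if the mark at index i has a letter directly on both sides."""
    return (
        0 < i < len(text) - 1
        and text[i - 1].isalpha()
        and text[i + 1].isalpha()
    )


def sentence_punctuation_counts(text: str) -> Counter:
    """Count sentence punctuation marks, skipping word-internal ones."""
    return Counter(
        ch
        for i, ch in enumerate(text)
        if ch in SENTENCE_MARKS and not is_word_internal(text, i)
    )


def per_thousand_words(counts: Counter, text: str) -> dict:
    """Normalise raw counts to occurrences per 1,000 whitespace-separated words."""
    n_words = len(text.split())
    if n_words == 0:
        return {}
    return {mark: 1000 * n / n_words for mark, n in counts.items()}


# Invented example sentence (not taken from AcadText or CoCo).
example = "Wait, what?! It’s a well-known trick (or so they say)."
counts = sentence_punctuation_counts(example)
print(per_thousand_words(counts, example))
```

The heuristic is deliberately crude: a slash in and/or or the first full stop in e.g. would be skipped as word-internal, so any real study would need manual checks of the automatic counts.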

Register, the second concept which needs to be defined for the empirical study presented here, is used with different meanings in the literature (cf. Schubert, this volume). The most commonly used definitions of register are based on the work of Douglas Biber. In numerous publications (e.g. Biber 1988, 1995, 2006), his use of the term has developed from what might be called a synonym of genre (Biber 1995: 9–10 about Biber 1988) to “a variety associated with a particular situation of use” (Biber and Conrad 2009: 6), i.e. a concept comprising all situation-dependent variation in language use, regardless of the level of specialisation (Biber and Conrad 2009: 32), but with specific sub-registers displaying less variation than more general registers (Biber and Conrad 2009: 33). In Biber’s model (for a summary cf. Schubert, this volume), register features occur throughout texts from a particular register and are more frequent in the target register than in most other registers. Thus the passive voice is not restricted to academic writing and may occur in different types of text, but it is particularly frequent in that register. Register features can be structures on any linguistic level, from words to syntactic constructions. The occurrence of specific lexico-grammatical features in registers is attributed to their functionality (Biber 2006: 11): they are believed to be “particularly well suited to the purposes and situational context of the register” (Biber and Conrad 2009: 6). The co-occurrence of features is therefore interpreted as reflecting their shared functions (Biber 1995: 30). With regard to the features under consideration, Biber’s approach has evolved in the course of time:




– Both Biber (1988: 73–75) and Biber (1995: 94–96) consider 16 major categories comprising 67 linguistic features:
1) Tense and aspect markers
2) Place and time adverbials
3) Pronouns and pro-verbs
4) Questions
5) Nominal forms
6) Passives
7) Stative forms
8) Subordination features
9) Prepositional phrases, adjectives and adverbs
10) Lexical specificity
11) Lexical classes
12) Modals
13) Specialised verb classes
14) Reduced forms and dispreferred structures
15) Co-ordination
16) Negation.

– These are reduced to seven major categories in Biber (2006: 241):
1. vocabulary distributions (e.g., the number of different words in classroom teaching versus textbooks), including the distributional classifications of words from the four content word classes (e.g., common vs. rare nouns, common vs. rare verbs);
2. grammatical part-of-speech classes (e.g., nouns, verbs, first and second person pronouns, prepositions);
3. semantic categories for the major word classes (e.g., activity verbs, mental verbs, existence verbs);
4. grammatical characteristics (e.g., nominalizations, past tense verbs, passive voice verbs);
5. syntactic structures (e.g., that relative clauses, to complement clauses);
6. lexico-grammatical associations (e.g., that-complement clauses and to-complement clauses controlled by communication verbs vs. mental verbs);
7. lexical bundles – i.e. recurrent sequences of words.

– Biber and Conrad (2009: 78–82), by contrast, classify their 75 subcategories (some of which can be split up further) into 15 major categories:
1) Vocabulary features
2) Content word classes
3) Function word classes
4) Derived words
5) Verb features
6) Pronoun features
7) Reduced forms and dispreferred structures
8) Prepositional phrases
9) Coordination
10) Main clause type
11) Noun phrases
12) Adverbials
13) Complement clauses
14) Word order choices
15) Special features of conversation.

Without going into detail about what these various categories represent precisely, it becomes immediately obvious that punctuation or other orthographic characteristics (such as capitalisation) do not figure among the distinctive features treated in any of Biber’s approaches, in spite of the fact that Biber (1995: 29) maintains that “[a]ny linguistic feature having a functional or conventional association can be distributed in a way that distinguishes among registers”. This raises the question whether there are any arguments supporting the deliberate exclusion of punctuation as a distinctive feature.

Based on Biber’s definition above, one might consider arguing that punctuation does not constitute a linguistic feature – but this is hard to maintain: while punctuation is restricted to the written modality, it is used nonetheless to represent linguistic meaning (cf. below). Punctuation marks may even reverse the meaning of a sentence completely; compare

(6) The Democrats say the Republicans are sure to win the next election.

in which the Republicans are the assumed victors, as against

(7) The Democrats, say the Republicans, are sure to win the next election.

In the second example, the Democrats are expected to be victorious (cf. Runkel and Runkel 1984: 34). In view of its meaning-distinguishing function, punctuation should consequently be considered a linguistic feature. If punctuation had no conventional or functional association, as required by the definition of linguistic features above, it should be possible to use all punctuation marks interchangeably. This is, however, not the case (cf. the next section).

Since punctuation is restricted to writing, using it as a feature would seem to have the disadvantage of disregarding all registers belonging to spoken language. This is, however, only true to a certain extent, since spoken texts may be transcribed (e.g. in interviews for magazines or in corpora), and punctuation is conventionally inserted for the convenience of the reader in such cases. The relation between the two dimensions is clarified by Söll and Hausmann (1985: 17),




who distinguish between the medium of realisation (auditory vs. visual code) as opposed to the characteristics of conception (spoken vs. written style). Punctuation is thus only present in the visual code but may be used in texts belonging either to the spoken or to the written style. Söll and Hausmann’s distinction is thus useful e.g. in view of the possibilities offered by computer-mediated communication, which may combine the visual code with some kind of spoken style. Note also that Biber and Conrad’s (2009: 78–82) long list of linguistic features includes a subcategory “Special features of conversation”, which is restricted to a subgroup of registers with a tendency towards oral realisation and includes e.g. pauses, fillers and backchannels. As a consequence, the addition of a subcategory “Punctuation”, which applies to registers in the visual code only, would appear to be legitimate.

Furthermore, one should not overlook the fact that Biber and Conrad (2009: 63) speak of a “list of features that you might consider” in register analysis, which means that they do not claim completeness. They also state that “[C]onsulting a corpus-based reference grammar is useful for deciding which features to study”. Since punctuation is only marginally treated in such grammars, possibly in view of written language’s widely assumed status as a secondary system (cf. e.g. Bloomfield 1933: 21), this may have led to its omission from the most influential model of register so far. To conclude, there are no convincing reasons for excluding punctuation as a possible register feature. Instead, it is argued in the following that there are several good reasons for considering it.

2 Functions of the punctuation marks

Biber’s approach is based on the premise that “linguistic features co-occur in texts because they reflect shared functions” (Biber 1995: 30). This means that it should be possible to establish a link between the punctuation marks occurring in texts (and their functions) and the various lexico-grammatical register features discussed in the previous literature (with their corresponding functions linking them to the situational context and the communicative purpose of the respective register). If that were indeed the case, it should be possible to make an informed guess about (or even recognise) the register of a text based solely on the punctuation marks occurring in that text. The following illustrative passages are extracts from example texts used in Biber and Conrad (2009). Since these “illustrate the linguistic patterns found in previous large-scale analyses of these registers” (Biber and Conrad 2009: 64), they can be considered prototypical representatives

146 

 Christina Sanchez-Stockhammer

of the corresponding registers and should also fulfil that role with regard to punctuation.

: . . . : ? : . : . : ? [ ] : ? : .

Figure 1: Punctuation from text A

Figure 1 constitutes a sequence of punctuation marks which were extracted from a short text (cf. below) by deleting everything except the punctuation marks. Spaces were then added to make the punctuation marks more clearly discernible. Even in this reduced format, which is void of any lexical or syntactic content, it is possible to form some idea about the communicative situation of the text. The task is made easier if paragraph breaks are conserved as well:

: . . .
: ?
: .
: .
: ? [ ]
: ?
: .

Figure 1a: Punctuation from text A with paragraph breaks
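The reduction behind Figures 1 and 1a can be approximated programmatically. The following is a minimal sketch, not the procedure used in the study: the function name `punctuation_skeleton` and the character inventory are assumptions made for illustration, and apostrophes are dropped because they do not appear in Figure 1. The sample consists of three turns from Text A.

```python
import string

# Assumption: a "punctuation mark" is any ASCII punctuation character plus a
# few typographic ones. Apostrophes are ignored, as they are absent from Figure 1.
PUNCTUATION = set(string.punctuation) | set("“”‘’…–—")
IGNORED = set("'’")


def punctuation_skeleton(text: str) -> str:
    """Delete everything except punctuation marks, keeping line breaks.

    The remaining marks on each line are separated by single spaces so that
    they are easier to discern.
    """
    reduced_lines = []
    for line in text.splitlines():
        marks = [ch for ch in line if ch in PUNCTUATION and ch not in IGNORED]
        reduced_lines.append(" ".join(marks))
    return "\n".join(reduced_lines)


# Three turns from Text A, one turn per line.
sample = (
    "Eric: Oh is she?\n"
    "Judith: Yeah.\n"
    "Elias: Here, do you want one? [offering a candy]"
)
print(punctuation_skeleton(sample))
```

Unlike Figure 1, this sketch also retains the comma after "Here", since it keeps every character in its inventory.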

The most striking feature is presumably the occurrence of a colon at the beginning of every line, which is followed by either full stops or a question mark, thereby suggesting an interactive communicative situation. Indeed, the text is part of a conversation between a group of friends walking to a restaurant, which is included in the Longman Spoken and Written English Corpus:

Judith: Yeah I just found out that Rebekah is going to the University of Chicago to get her PhD. I really want to go visit her. Maybe I’ll come out and see her.
Eric: Oh is she?
Judith: Yeah.
Eric: Oh good.
Elias: Here, do you want one? [offering a candy]
Judith: What kind is it?
Elias: Cinnamon.

Text A: Text sample 1.1 from the LSWE Corpus (Biber and Conrad 2009: 7–8)

The colons in the full text are actually not line-initial but follow the names of the speakers, just as they would in the scripted version of a play. Following the same type of convention, the information referring to the extra-linguistic context has



Punctuation as an indication of register: Comics and academic texts  

 147

been added in square brackets at the end of one line. The punctuation marks are thus strongly indicative of conversation.

: ! . : ? : ! : ! : < . > . ! : < . > ? : !

Figure 2: Punctuation from text B

The same is true of Figure 2. The large number of exclamation marks, colons, question marks and (this time angled) brackets in Text B makes it highly unlikely that the text should be a tax declaration document or newspaper article. While the fact that it is an excerpt from a drama – i.e. scripted speech – and no transcript of a conversation cannot be deduced from punctuation alone, the oral dimension of the text emerges by analogy to Text A.

RUTH: I want to go! I promised Chris Burns I’d meet him.
BEATRICE: Can’t you understand English?
RUTH: I’ve got to go!
BEATRICE: Shut up!
RUTH: I don’t care. I’M GOING ANYWAY!
BEATRICE: WHAT DID YOU SAY?
TILLIE: Mother!

Text B: Text sample 1.7 from Biber and Conrad (2009: 20): Paul Zindel’s 1970 drama The Effect of Gamma Rays on Man in the Moon Marigolds

This raises the question what typical register features are linked to punctuation. For example, the large number of first and second person pronouns typical of spoken conversation (Biber and Conrad 2009: 7–8) – which is supported by the prototypical extracts above – cannot be derived from punctuation. By contrast, another characteristic linguistic feature can: the pervasiveness of questions, which are usually marked by sentence-final question marks in many (but not all) transcripts of spoken language, e.g. in Text A, and in texts that are written to be spoken (e.g. Text B). The presence of question marks can thus be linked to the presence of questions: both are indicative of interaction (cf. Biber and Conrad 2009: 7–8). Since questions favour the production of answers as the privileged second pair part (Levinson 1983: 307), full stops following question marks are likely to represent not only statements but answers. This assumption is supported by Texts A and B above. According to Biber (1988: 227), questions “indicate a concern with interpersonal functions and involvement with the addressee”. It follows from this that they should be more frequent in registers involving that


function, occurring e.g. more frequently in riddles2 than in front-page newspaper articles (cf. Biber and Conrad 2009: 7–8) and also in scripted or transcribed conversation. The analysis of the four main types of pragmatic discourse function and the syntactic sentence types in Quirk et al. (1985: 803–804) with the punctuation marks used in the examples of that grammar reveals that this correlation is no coincidence: there is a strong link between
– statements (which mainly convey information), declaratives (in which the subject usually precedes the verb) and full stops, e.g. The Prime Minister resigned.
– questions (which usually seek information), interrogatives (characterised by inversion, e.g. of subject and operator, or sentence-initial wh-question words) and question marks, e.g. Did the Prime Minister resign? or What did the Prime Minister do?
– directives (which are mainly used to instruct someone to do something), imperatives (which have no subject and whose verb is in the base form) and exclamation marks, e.g. Leave me alone!
– exclamations (in which speakers express the extent to which they are impressed), exclamatives (which begin with what/how and usually have no subject-verb inversion) and exclamation marks, e.g. What a funny hat!
It therefore seems safe to claim that the punctuation marks closing sentences follow a prototype-based distribution (cf. e.g. Rosch 1973, 1975) with an ideal exemplar in the centre of the category and fuzzy boundaries in its periphery. The latter would include less typical uses, such as (8) I’d love a cup of tea.

which is a declarative from the perspective of syntax but pragmatically a directive, inciting the hearer to serve a hot drink (Quirk et al. 1985: 804). The punctuation with a full stop in Quirk et al. (1985) for this particular example seems to suggest that in doubtful cases, punctuation follows the syntactic rather than the pragmatic perspective. While the use of an exclamation mark does not seem to be entirely excluded in this particular example (even if an informal internet search confirms the full stop as the norm), other indirect speech acts such as Searle’s (1975: 73) famous (9) Can you pass the salt?

which is syntactically a question but actually a directive, definitely require the syntactically-based question mark. By contrast, the use of an exclamation mark making (10) Can you pass the salt!

slightly more explicitly directive would seem quite unusual. As a consequence, we may conclude that there is a strong correlation between punctuation marks and particular grammatical structures – even more than with discourse functions, but often (in direct speech acts), both aspects will coincide. The communicative purposes of a register determine its discourse functions and the syntactic structures associated with these – which are in turn linked to particular prototypical punctuation marks. However, some registers may simply not require particular types of expression: for instance, instruction manuals do not usually engage in mutual interaction with their readers. As a consequence, one would not expect them to contain any questions and consequently no question marks (except, possibly, the occasional rhetorical question to guide their readers more vividly). Note, however, that the conventions of particular registers may require the use of particular punctuation marks in spite of communicative purposes or favoured syntactic sentence types which would prototypically result in the use of a different punctuation mark: thus recipes are directive and use a considerable number of verbs in the imperative (cf. Arendholz et al. 2013), but they rarely contain any exclamation marks. This would seem to imply that the conventions associated with particular registers can override more general punctuation tendencies. The next extract of punctuation also belongs to a highly conventionalised register.

( ). ( ) , ( . , . , . ). , ( ; . ). , ( , ; . ; . ; . ; ). ( ) , ( . ). . . ( ) .

Figure 3: Punctuation from text C

2 Note, however, that puzzles need not necessarily be phrased as questions, e.g. in the case of crosswords (cf. Pham, this volume), which tend not to use question marks.


This sample is not only characterised by its complete lack of question marks and exclamation marks but also by a large proportion of full stops and brackets, many commas and even some semicolons. It comes from the introduction to a scientific research article and is thus situated clearly towards the extreme of the written dimension of language conception.

Hybridization between species can severely affect a species status and recovery (Rhymer & Simberloff 1996). Threatened species (and others) may be directly affected by hybridization and gene flow from invasive species, which can result in reduced fitness or lowered genetic variability (Gilbert et al. 1993, Gottelli et al. 1994, Wolf et al. 2001). In other cases, hybridization may provide increased polymorphisms that allow for rapid evolution to occur (Grant & Grant 1992; Rhymer et al. 1994). Species can also be influenced indirectly, because hybridization may affect the conservation status of threatened species and their legal protection (O’Brien & Mayr 1991a, 1991b; Jones et al. 1995; Allendorf et al. 2001; Schwartz et al. 2004; Haig & Allendorf 2005). The Northern Spotted Owl (Strix occidentalis caurina) is a threatened subspecies associated with rapidly declining, late-successional forests in western North America (Gutierrez et al. 1995). Listing of this subspecies under the U.S. Endangered Species Act (ESA) attracted considerable controversy because of concern that listing would lead to restrictions on timber harvest.

Text C: Text sample 6.13 from Biber and Conrad (2009: 163): Scientific research article (Genetic identification of Spotted Owls … , Conservation Biology, 2004).

While scientific research attempts to answer research questions, these are usually formulated indirectly, with the consequence that the number of direct questions and the ensuing question marks is relatively low (although not necessarily zero). Exclamation marks, by contrast, seem to be practically excluded in this register. This is presumably because the discourse functions usually associated with that punctuation mark (cf. Quirk et al. 1985: 803–804 above) contradict the general principles of academic research: it is neither directive (at least not overtly) nor concerned with the expression of emotions such as being impressed. These conventions are communicated between researchers, e.g. by supervisors marking their students’ papers or by means of style guides.3 The occurrence of large numbers of full stops is not only due to the focus of research papers on transmitting information but also to the frequent occurrence of the abbreviation et al., which is rarely found outside academia, in this particular passage. The use of brackets is also highly conventionalised: with few exceptions containing additional explanations, most brackets contain references to other texts. This supports the view that particular punctuation marks tend to correlate with particular registers, and that some punctuation marks are employed following register-specific conventions which are particularly adequate for the communicative needs of the register in question. In academic research, this includes the need to refer to previous research in a clear and unobtrusive way.

If we take all of the above into account, a question that emerges is whether there are any general functions of punctuation marks which may be put to specific ends in individual registers. According to Huddleston and Pullum (2002: 1729–1730), punctuation can be ascribed four main functions from a general perspective:
– indicating boundaries (e.g. full stops mark the end of sentences)
– indicating status (e.g. question marks indicate that a sentence is a question)
– indicating omission (e.g. …)
– indicating linkage (e.g. commas mark that units belong together).4
A more specific but nonetheless brief overview of the functions of individual punctuation marks is provided by Seely (2007: 16–124): the
● full stop
  ○ marks the ends of sentences
  ○ marks complete groups of words
  ○ ends abbreviations
  ○ acts as a separator in e-mail and website addresses
● question mark
  ○ marks the end of a question
  ○ marks statements as doubtful or questionable, e.g. in brackets
● exclamation mark
  ○ ends exclamations
  ○ ends loud or shouted direct speech
  ○ ends sentences expressing amusement
  ○ is used in brackets to express amusement or irony
● comma
  ○ separates items in lists
  ○ encloses sentence parts parenthetically
  ○ marks the divisions between the clauses in complex sentences
  ○ separates sections of sentences or numbers consisting of more than four digits to make them easier to read
  ○ introduces or ends direct speech
● semicolon
  ○ lists items which are very long
  ○ marks a break between two parts of a sentence, which are usually finite clauses that could stand on their own, in order to show the close link between them
● colon
  ○ introduces lists
  ○ introduces direct speech or quotations
  ○ separates two parts of a sentence of which the first leads on to the second
● dash
  ○ encloses sentence parts parenthetically
  ○ introduces something which further develops or exemplifies what has been written before
  ○ introduces asides by the writer
  ○ shows interruptions or break-offs in mid-sentence (in direct speech)
● slash
  ○ indicates alternatives
  ○ shows a range
  ○ is used in some abbreviations (e.g. c/o)
● suspension dots
  ○ reduce the length of quotations
  ○ show incompleteness in direct speech
● quotation marks
  ○ separate direct speech, titles or quotations or ideas marked as not being the author’s
● brackets
  ○ indicate that the words enclosed within are not essential to the meaning of the sentence but provide supplementary information.

3 Note, however, that very popular style guides giving advice on academic research, such as Booth et al. (2008), do not mention punctuation (merely style), and others, such as Swales and Feak (2010: 27), limit themselves to the discussion of semicolons, colons, dashes and commas.

4 For a more detailed theoretical account of the guide functions of punctuation cf. Patt (2013).

Even if this account necessarily simplifies a more complex situation, it provides a good point of departure for the consideration of more specific uses of the punctuation marks. Since full stops are used at the end of statements, they seem to represent a relatively unmarked punctuation mark. They do, however, change their function and become more marked as soon as they are combined into suspension dots, which signal omission. Question marks are apparently only placed at the end of direct questions, and direct questions always end with a question mark. Even the seemingly exceptional sceptical use listed above can be interpreted as shorthand for a question such as “Is that true?”, e.g. in (11) There is no such thing as a free lunch. (?)




In most other cases, however, the relation is not as unequivocal, because the punctuation marks have several functions (some of which may overlap with the functions of other punctuation marks): as we have seen, colons can be used to set off the name of characters in a play from their text, but very frequently, they are followed by explanations or specifications and they can therefore commonly be found in registers with an argumentative function, such as academic papers. Alternatively, additional information may be included in brackets or following a dash,5 but different degrees of formality are associated with the various punctuation marks. According to Seely (2007: 84), brackets are “the most formal (and most obvious) way of showing parenthesis”, commas are “less forceful” and dashes “the least formal”. This seems to imply that a superficial analysis of punctuation marks does not suffice: it is not enough to simply count the number of commas, question marks etc. (not even if the number of words in the texts is taken into consideration), but it is also necessary to consider their individual functions and possibly even their stylistic value. This is the only means of identifying highly conventionalised register-specific uses, such as initial exclamation marks expressing negation (e.g. !interesting = not interesting) in “hacker-influenced interactions” (Crystal 2001: 90) or the specialised use of double quotation marks in comics (cf. below).

3 Punctuation in comics vs. academic texts

In order to confirm or reject the hypothesis that punctuation can serve as an indication of register and to identify register-specific usage of punctuation, a small-scale empirical study was conducted. Since register characteristics become most obvious if very different registers are analysed contrastively (Biber and Conrad 2009: 8), a register with a relatively strong tendency towards spoken conceptualisation (namely comics) was contrasted with a register tending towards the written extreme (namely academic texts). For the first of these, the comic component of CoCo, the Comic Corpus described in Sanchez-Stockhammer (2012), was used.

5 Cf. Lampert (this volume) for a detailed treatment of parenthesis.


Table 1: The Comic Corpus (CoCo) texts (cf. Sanchez-Stockhammer 2012: 68)

Text            Words   Sentences   Words per sentence
Batman          868     153         5.67
Superman        744     101         7.37
Uncle Scrooge   774     140         5.53

The language in comics considered in the compilation of CoCo occurs in headings, text boxes with narration, speech bubbles, thought bubbles and subtitles (common particularly in cartoons – which are part of CoCo but were not considered in the present study), as onomatopoeia superimposed on the pictures and as written language within the picture (e.g. inscriptions on signs; cf. Sanchez-Stockhammer 2012: 58–59). Combinations of punctuation marks were also encoded – notably suspension dots, which can also be considered a complex punctuation mark. Neither emoticons (e.g. < :-) >) nor obscenicons (e.g. ) as emotionally loaded combinations of punctuation marks occurred in the dataset.6 Non-linguistic semiotic means (such as the shapes of bubbles used to indicate that their content is spoken, thought, shouted etc.) were not taken into consideration, either.

The corpus of academic texts AcadText was compiled specifically for the present study. It contains three research articles from high-quality journals: one theoretical text (Schneider 2003), one empirical study (Juhasz et al. 2003) and one text by Biber and two co-authors, namely Susan Conrad and Randi Reppen (Biber et al. 1994).7 Following the same approach as in the compilation of the comic corpus wherever possible, all full sentences (including footnotes) and tables were taken from the first two pages with numbers ending in zero from each article. End-of-line hyphens were deleted and m-dashes flanked by spaces. Word-internal bracketing, e.g. in (semi-)automatic, was deleted so as not to skew the automated counts. While full stops, question marks and quotation marks counted as sentence endings, colons and semicolons were considered sentence-internal. Headings and rows in tables counted as one sentence each. It becomes immediately obvious that the number of words per sentence is considerably larger in the academic texts than in the comics.

Table 2: The Corpus of Academic Texts (AcadText)

Text                  Words   Sentences   Words per sentence
Biber et al. (1994)   892     35          25.49
Juhasz et al. (2003)  1,037   40          25.93
Schneider (2003)      1,103   25          44.12

6 While the absence of emoticons can be explained by the fact that the multimodality of comics permits the representation of facial expression in a more detailed manner by the drawn faces of the interlocutors, the absence of obscenicons from the corpus is presumably due to chance. However, since the expression of anger in comic strips seems to use mainly question marks and exclamation marks from the set of the punctuation marks, while frequently using symbols (e.g. , , , , and ) and also drawings of spirals etc. (cf. Law 2010), the treatment of obscenicons belongs into the periphery of the use of punctuation marks anyway.

7 Since academic English is a register with a particularly strong lingua franca element and since all articles in AcadText come from high-quality journals and have consequently undergone intense editing, the native language of the authors was expected to play only a marginal role. While the individual author Schneider has a German-language background, either all or the majority of the authors of the jointly written articles were working at universities in English-speaking countries at the time of publishing.
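The words-per-sentence figures in Tables 1 and 2 follow directly from the reported word and sentence counts. The sketch below recomputes them as a consistency check. The helper name `words_per_sentence` is an assumption, and `Decimal` with half-up rounding is used here only so that a value such as 25.925 rounds to 25.93 rather than being affected by binary floating-point representation.

```python
from decimal import Decimal, ROUND_HALF_UP

# (words, sentences) as reported in Table 1 (CoCo) and Table 2 (AcadText).
COUNTS = {
    "Batman": (868, 153),
    "Superman": (744, 101),
    "Uncle Scrooge": (774, 140),
    "Biber et al. (1994)": (892, 35),
    "Juhasz et al. (2003)": (1037, 40),
    "Schneider (2003)": (1103, 25),
}


def words_per_sentence(words: int, sentences: int) -> Decimal:
    """Mean sentence length in words, rounded half up to two decimals."""
    ratio = Decimal(words) / Decimal(sentences)
    return ratio.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for text, (words, sentences) in COUNTS.items():
    print(f"{text:<22}{words_per_sentence(words, sentences)}")
```

Under these assumptions the recomputed values agree with the published columns, ranging from 5.53 (Uncle Scrooge) to 44.12 (Schneider 2003).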

Since language in comics is heavily constrained by spatial restrictions and mainly contains the written representation of spoken-style language from conversations between speakers, comics as a register should contrast with what is already known from previous research about more prototypically written registers – such as academic texts. From a statistical perspective, the hypothesis H1 is therefore that comics and academic texts should differ in their use of the punctuation marks. H0 is consequently that comics and academic texts do not differ in this respect. In view of the assumed register-specifics, we can formulate the following more specific expectations regarding punctuation in comics: one may expect
1. a relatively large proportion of question marks and exclamation marks (due to the spoken character of this register)
2. no quotation marks (because direct speech is already marked as such by its inclusion in speech bubbles)
3. few commas (because the sentences in comics are presumably relatively short due to spatial restrictions)
4. few semicolons (for the same reason as for the commas)
5. few colons (due to spatial restrictions and the fact that the speakers in a conversation are indicated by the pointed side of speech bubbles in contrast to usual scripted conversation)
6. fewer brackets than dashes (because these represent the most and least formal punctuation marks indicating parenthesis according to Seely 2007: 84)
7. a certain number of suspension dots (in order to permit longer sentences to continue in the following speech bubble).
By contrast, academic texts as a written register are expected to contain
1. a very small proportion of question marks and exclamation marks (due to the written character of this register)
2. a certain proportion of quotation marks (in order to mark passages that were taken over verbatim from another author)
3. many commas (because the sentences in academic texts are presumably relatively long due to the complexity of the subjects treated)
4. many semicolons (for the same reason as for the commas)
5. many colons (because these provide links between sentences and are also used to refer to precise pages in references)
6. more brackets than dashes (because these represent the most and least formal punctuation marks indicating parenthesis according to Seely 2007: 84)
7. a few suspension dots (signalling omission in quotations).

For the quantitative analysis of the punctuation marks, all letters and numbers in the original corpus texts were deleted, and the punctuation marks were counted semi-automatically by using the “replace” function in Microsoft Word. The results in Table 3 were normalised by dividing the absolute results by the number of words in the respective texts, then multiplying them by a thousand (in order to increase readability) and finally rounding them up or down to yield full numbers.
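The normalisation step just described (absolute count divided by the number of words, multiplied by 1,000 and rounded to a full number) can be written out as follows. This is an illustrative sketch only: the function name `per_thousand` is an assumption, half-up rounding is one possible reading of "rounding up or down", and the raw counts in the usage lines are hypothetical, since the chapter reports normalised values rather than absolute counts.

```python
def per_thousand(count: int, n_words: int) -> int:
    """Occurrences per 1,000 words, rounded half up to a full number.

    Integer arithmetic is used so that exact halves are always rounded up:
    floor(count * 1000 / n_words + 1/2) == (2000 * count + n_words) // (2 * n_words)
    """
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return (2000 * count + n_words) // (2 * n_words)


# Hypothetical raw counts for a hypothetical 500-word text (illustration only).
hypothetical_counts = {"question marks": 7, "commas": 31, "semicolons": 0}
for mark, count in hypothetical_counts.items():
    print(f"{mark}: {per_thousand(count, 500)} per 1,000 words")
```

Normalising in this way makes texts of different length comparable, which matters here because the six texts range from 744 to 1,103 words.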



Table 3: Normalised results (divided by the number of words per text, multiplied by 1,000 and rounded)

                                Comics                           Academic texts
                                Batman  Superman  Uncle Scrooge  Biber et al.  Juhasz et al.  Schneider
Full stops                      78      50        4              53            69             24
Question marks                  20      22        14             0             1              0
Exclamation marks               53      36        134            0             0              0
Commas                          60      62        37             62            72             71
Semicolons                      0       0         0              4             1              2
Colons                          0       1         0              1             0              11
Dashes                          16      3         0              1             0              1
Slashes                         0       0         0              4             0              0
Suspension dots                 40      43        18             0             0              1
Single quotation marks (pairs)  0       0         0              3             0              9
Double quotation marks (pairs)  2       5         1              0             0              1
Round brackets (pairs)          0       0         0              18            41             12
Square brackets (pairs)         0       0         0              0             0              0
Apostrophes                     58      43        71             1             4              5
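One simple way of screening Table 3 for punctuation marks that separate the two registers is to look for rows in which all three texts of one register show zero while at least two texts of the other register show a value above zero. The sketch below applies only this mechanical criterion to the normalised values of Table 3; differences that are obvious without any zero values (such as those for question marks or suspension dots) still require inspection by eye. The names `TABLE3` and `one_sided_rows` are assumptions introduced for illustration.

```python
# Normalised values from Table 3, in the order
# Batman, Superman, Uncle Scrooge | Biber et al., Juhasz et al., Schneider.
TABLE3 = {
    "Full stops": [78, 50, 4, 53, 69, 24],
    "Question marks": [20, 22, 14, 0, 1, 0],
    "Exclamation marks": [53, 36, 134, 0, 0, 0],
    "Commas": [60, 62, 37, 62, 72, 71],
    "Semicolons": [0, 0, 0, 4, 1, 2],
    "Colons": [0, 1, 0, 1, 0, 11],
    "Dashes": [16, 3, 0, 1, 0, 1],
    "Slashes": [0, 0, 0, 4, 0, 0],
    "Suspension dots": [40, 43, 18, 0, 0, 1],
    "Single quotation marks (pairs)": [0, 0, 0, 3, 0, 9],
    "Double quotation marks (pairs)": [2, 5, 1, 0, 0, 1],
    "Round brackets (pairs)": [0, 0, 0, 18, 41, 12],
    "Square brackets (pairs)": [0, 0, 0, 0, 0, 0],
    "Apostrophes": [58, 43, 71, 1, 4, 5],
}


def is_one_sided(absent: list[int], present: list[int]) -> bool:
    """True if `absent` is all zero and `present` has at least two non-zero values."""
    return all(v == 0 for v in absent) and sum(v > 0 for v in present) >= 2


def one_sided_rows(table: dict[str, list[int]]) -> list[str]:
    """Marks that are absent from one register but recur in the other."""
    selected = []
    for mark, values in table.items():
        comics, academic = values[:3], values[3:]
        if is_one_sided(comics, academic) or is_one_sided(academic, comics):
            selected.append(mark)
    return selected


print(one_sided_rows(TABLE3))
```

With the values above, the screen selects exclamation marks (comics only) as well as semicolons, single quotation marks and round brackets (academic texts only).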

For each line (i.e. for each punctuation mark), shaded cells indicate intra-group similarity and inter-group dissimilarity between comics and academic texts. This is either based on a very obvious difference in the results (e.g. for the suspension dots) or, in some cases, on the presence of at least two values larger than zero in one type of register as against all-zero in the three texts from the other register (e.g. for the semicolons). Note that the number of quotation marks and brackets corresponds to the number of pairings of these punctuation marks. This is because it obligatorily takes two exemplars to set off parentheses – in contrast to dashes or commas, which may open a parenthesis closed by the final punctuation mark in a sentence, e.g. a full stop (cf. Lampert 2011: 91–92). While an alternative single-punctuation-mark use of brackets can be imagined, namely when a single closing bracket is employed to set off the introductory ordering letters in lists, such as


a) xx b) yy c) zz, the fact that this type of usage did not occur in the corpus made it unnecessary to establish a more detailed distinction here. If the results from Table 3 are analysed in relation to the hypotheses formulated above, the following findings emerge: (i) As expected, there is a marked difference in the use of question marks and exclamation marks in comics and academic texts: only one academic text contains a single question mark at the end of the sentence (12) What function do beginning and ending lexemes assume in compound recognition?

and no text from this register uses any exclamation marks. This is in line with the usual correlation of these two punctuation marks with conceptually spoken language: all the comic texts contain both question and exclamation marks, although the proportion varies considerably, with results ranging from 14 to 134 instances. (ii) The discussion of quotation marks requires a distinction between single and double quotation marks. As for the distribution of the single quotation marks, their analysis made it necessary to distinguish manually between single quotation marks and the formally identical apostrophes. Since apostrophes are word-internal punctuation marks, they were only included in the analyses because of this necessary distinction, but they actually yielded interesting results: while both academic texts and comic texts contain a small number of stylistically neutral genitives (4 in Superman, 3 in Batman, 2 in Uncle Scrooge), the majority of the large amount of apostrophes in the comic texts either marks informal contractions (e.g. won’t) or omissions or shortenings characteristic of informal language usage, e.g. (13) With a swoop to his left an’ a peck to th’ right, he catches rat finks way out west!

However, it seems that there is currently a tendency for an increasing number of academic texts to use contractions, too, e.g. Moore and Notz (2006: 236, Let’s) or Mithun (2012: 53, I’m). No pairs of single quotation marks were used in the comic texts, as expected, but they occasionally occur in the academic writing (12 pairs in two texts). This result may also be variety-dependent to a certain extent: according to Seely (2007: 60–62), there is a tendency for British English usage to prefer single quotation marks over double quotation marks, whereas American English has the opposite tendency – codified e.g. in The MLA Style Manual (Achtert and Gibaldi 1985: 80).




Note, however, that the article by Schneider, which uses single quotation marks, appeared in Language, which is an American journal. The analysis of the article by Biber et al. beyond the passage included in the corpus shows that a considerable proportion of single quotation marks enclose no quotations but paraphrases of meaning, e.g. in (14) an analysis of adjectives marking ‘certainty’

or words which are used metalinguistically, e.g. (15) any global characterizations of ‘General English’ should be regarded with caution

Contrary to expectations, double quotation marks are almost nonexistent in the AcadText corpus, with only one pair in one text: (16) we need to remember that ‘nations are mental constructs, “imagined communities” ’ which are constructed discursively […] (Wodak et al. 1999:4).

and it becomes clear that these are merely used to mark quotation marks within a quotation whose reference is given later in the text; the convention being that single quotation marks are doubled in this case and vice versa (cf. Achtert and Gibaldi 1985: 80; Sanchez-Stockhammer, forthcoming). While this quasi-absence of double quotation marks from AcadText may be attributed to the small size of the random sample or the conventions of individual publishers, chance cannot explain the other unexpected finding, namely the relative frequency of double quotation marks in the comic corpus (8 pairs; at least one per text). Since direct speech is already marked as such by its inclusion in speech bubbles, the double quotation marks must have a different function here: indeed, the quotation marks in the comics are used in their general (academic) function and serve to quote the speech of others. Thus the utterance (17) Maybe next time, master Bruce.

is countered by (18) Not “maybe”, Alfred.

Double quotation marks are also employed in the comics to refer to the metalinguistic use of words, e.g. (19) Funny, I didn’t think you even knew the word “honest,” Penguin.


In Superman, double quotation marks are additionally used on some occasions in narrative boxes to indicate the direct speech or thought of a character not shown in the current panel itself, but whose identity can be deduced from context or from the fact that suspension dots are linking the end of an utterance marked with quotation marks to its beginning in a panel on the previous page (cf. below). (iii) Contrary to expectations, no marked difference was observable in the use of commas: while the figures are lower for comics overall, they are still surprisingly close to the results obtained for the academic texts. However, a more detailed text-based analysis reveals that commas are mainly used with very specific functions in comics: a very large proportion separate off proper nouns with vocative function from the remainder of the sentence, e.g. in (20) Toyman, you maniac!

This use is completely missing in the academic texts. Alternatively, commas occur after introductory interjections in the comics, e.g. in (21) Man, would you look at THAT!

in another use that was not found in the academic writing. These register-specific uses explain why commas occur relatively frequently in the comic texts. The most frequent use of commas in comics which is also to be expected in academic texts (but is not too frequent in the sample) is the delimitation of sentence-initial adverbials, e.g. in (22) According to the contract, they are RABBIT eggs for your children, King!

(iv) Semicolons, by contrast, only occur in the academic writing, e.g. in Schneider (2003): (23) traces of the previous stage will still be found; that is, some insecurity remains

Since they are absent from the sample of comic texts – presumably because most of their uses require relatively long sentences – they can generally be used as an indication of register with regard to the spoken/written dimension. (v) Surprisingly, the number of colons does not vary greatly between the comics and the academic texts considered. Only Schneider (2003) stands out, since it is the only one among the three academic texts to indicate the precise pages in text-internal references that do not affect quotations. (vi) Neither sample contained any square brackets. As expected, not a single pair of round brackets was used in the comic corpus – in contrast to the academic



Punctuation as an indication of register: Comics and academic texts  


texts, where brackets are commonly used to indicate references. The very high count in Juhasz et al. (2003), with 41 pairs of round brackets, arises because a large part of the passage randomly included in the AcadText corpus consists of the results section, in which relevant figures and examples are added in brackets, e.g. in (24) high-frequency beginning lexemes were responded to quicker than low-frequency beginning lexemes, t1(27) = ± 3.78, p < .01, t2(18) = ± 2.02, p = .059.

While the quasi-absence of dashes from the academic papers in contrast to a larger proportion in CoCo seems to support the view that there is a difference in formality between these two punctuation marks, the quantitative difference is not as marked as one might have expected. Furthermore, the analysis of the texts reveals that dashes are frequently used in consecutive pairs in the Batman comics and also in Superman, which raises the number of dashes. In many cases, the combination < -- > seems to indicate a longer pause, e.g. in (25) But her insides are all right -- no bleeding there.

This use of dashes represents a function which is not usually required in academic texts. (vii) The difference in frequency between the use of suspension dots in comics and academic texts is far more pronounced than expected: the only academic text using them is Schneider (2003) in one instance where omission in a quoted passage is indicated: (26) ‘the discursive constructs of nations and national identities … primarily emphasize national uniqueness and intra-national uniformity but largely ignore intra-national differences’ (Wodak et al. 1999:4).

This is a use which is highly unlikely to occur in comics. However, the low frequency of suspension dots in the sample of academic texts seems to suggest that quotations are usually extracted in shorter portions and that omissions are avoided. This is supported by the quotations in AcadText, all of which represent extracts from individual sentences only, e.g. the following series of quotations from Schneider (2003): (27) a case of ‘identity revision’ triggered by the insight that one’s traditional identity turns out to be ‘manifestly untrue’ or at least ‘consistently unrewarding’ (Jenkins 1996:95)


Comics, by contrast, use suspension dots very frequently (all texts employ them between 18 and 43 times) and often in order to create cohesion by their occurrence not only at the end of an utterance which is interrupted in one panel, e.g. in (28) You might be stronger and faster than I am right now, Parasite…

but also at the beginning of the continued speech or thought in the next panel: (29) …but you’ve barely had forty-eight hours to practice using my powers.

Such interruptions are not merely attributable to spatial restrictions, it seems, but also to the fact that the picture in the new panel corresponds more closely to the action indicated in the second part, such as a punch with a fist in the Superman example above. The differences between the use of punctuation marks in the texts from the comic corpus and the academic texts are even more striking if considered graphically. Figure 4 summarises the features which are characteristic of comics (question marks, exclamation marks, suspension dots and apostrophes); Figure 5 those which are more typical of academic writing (semicolons, single quotation marks and round brackets).

Figure 4: Punctuation marks occurring more frequently in comics than in academic texts

It may therefore come as a surprise that this striking difference between the two registers cannot be backed statistically: non-parametric statistical tests for independent samples were carried out in SPSS in order to compare the medians between groups (i.e. comics vs. academic texts), but even the Mann–Whitney U




test yielded no significant results for any of the variables (e.g. question marks) due to the small number of texts considered. Nonetheless, the graphically immediately obvious difference between comics and academic texts in Figures 4 and 5 permits the tentative conclusion that the use of punctuation in different registers can be employed as a register feature. At the same time, these results call for further empirical research, which is extremely likely to provide statistical backing for the more than obvious tendencies observed in this explorative study.
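Why a sample of this size cannot reach significance can be illustrated with a small sketch. Assuming three texts per group (as the corpus lists suggest) and using invented frequencies rather than the figures of this study, the following Python snippet enumerates all possible group assignments to obtain an exact two-sided p-value for the Mann–Whitney U statistic. The function names are hypothetical, and the enumeration is a stand-in for the SPSS procedure, not a reproduction of it.

```python
from itertools import combinations


def u_statistic(group_a, group_b):
    """Mann-Whitney U for group_a: each pair with a > b counts 1, a tie 0.5."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )


def exact_two_sided_p(group_a, group_b):
    """Share of all relabellings whose U deviates from its expected value
    at least as much as the observed U does."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    expected = n_a * n_b / 2
    observed_dev = abs(u_statistic(group_a, group_b) - expected)

    as_extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(u_statistic(a, b) - expected) >= observed_dev:
            as_extreme += 1
    return as_extreme / total


# Hypothetical question-mark counts per text (NOT the study's figures):
# every comic outranks every academic text, the most extreme case possible.
comics = [30, 25, 41]
academic = [2, 0, 5]

p = exact_two_sided_p(comics, academic)
print(p)
```

With three texts in each group there are only 20 ways of assigning the six values to two groups, so even a perfect separation yields a two-sided p-value of 2/20 = 0.1, which lies above the conventional 0.05 threshold regardless of how large the observed differences are.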

Figure 5: Punctuation marks occurring more frequently in academic texts than in comics

4 Conclusion

Punctuation is a completely underresearched feature in register studies at the time of writing: thus Barbieri’s extensive annotation of major register and genre studies in Biber and Conrad’s Appendix A (2009: 271–295) does not mention punctuation a single time in the column “features under investigation”. It is only in Barbieri’s summary of Crystal’s (2001) major findings that there is a minor reference to it, when “minimal punctuation” is found to be one of the “common characteristics of internet registers” (Biber and Conrad 2009: 289). However, the empirical analysis of two register-specific corpora in the present study – one of comics and one of academic texts – suggests that certain types of punctuation tend to occur more frequently in certain types of register and that punctuation can therefore be employed as an indication of register. For instance, some punctuation marks correlate strongly with spoken and written style respectively and barely occur in the contrasting register. While question marks, exclamation marks, suspension dots and apostrophes are far more frequent in comics


than in academic texts, the latter use a larger proportion of semicolons, single quotation marks and round brackets. Furthermore, even in those cases where the results are similar from a quantitative perspective, differences in usage emerge upon closer consideration: for instance, comics tend to use commas after introductory interjections or proper nouns with vocative function, whereas academic texts make more varied use of that punctuation mark. Further research into this topic is required to establish the register-distinctive functions of the punctuation marks in more detail and for a larger number of registers. Biber’s distinction between different registers is “based on the premise that most formal differences reflect functional differences” (Biber 1995: 136). Nonetheless, he claims that his multidimensional approach differs from the studies of his predecessors in that he does not conduct a functional analysis in the first place so as to identify characteristic linguistic features. Instead, he states that he “first identifies groups of co-occurring features and subsequently interprets them in functional terms” (Biber 1988: 24). While this seems to contradict an approach such as the one used in the present study at first sight, one should not forget that Biber’s analyses presuppose a list of linguistic features which were then subjected to statistical analyses. Taking into account that he reviewed “previous research to identify potentially important linguistic features” in his preliminary analysis (Biber 1988: 64) and that these are understood as features “that have been associated with particular communicative functions and therefore might be used to differing extents in different types of text” (Biber 1988: 71–72), it becomes clear that he is not correlating random phenomena but only the results of previous functional analyses – even if these were carried out by other researchers. 
In this sense, the present study can be regarded as a legitimate suggestion for the extension of the original model. Within such a framework, punctuation is on a level with the 15 other major categories such as “Special features of conversation” (Biber and Conrad 2009: 82). “Punctuation” is thus tentatively suggested as category 16 with the following subordinate features (some of which did not prove distinctive for comics vs. academic texts but may play a more important role with regard to the differentiation between other registers):

1. full stop
2. question mark
3. exclamation mark
4. comma
5. semicolon
6. colon
7. dash
8. slash
9. quotation marks (single, double)
10. brackets (e.g. round, square, angled)
11. word-internal punctuation (apostrophes, hyphens)
12. combinations of punctuation marks (e.g. suspension dots, emoticons).
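As a rough sketch of how some of the features in the proposed category 16 might be counted automatically, the snippet below tallies a handful of them in a text string. The feature names, the regular expressions and the normalisation per 1,000 words are assumptions made for illustration; they do not reproduce the procedure used in this study, and a real analysis would need finer rules (e.g. to tell apostrophes from single quotation marks, or dashes from the en dashes in page ranges).

```python
import re
from collections import Counter

# Illustrative patterns only. Order matters: suspension dots and double
# dashes must be matched before single full stops.
PATTERNS = [
    ("suspension dots", r"\.{3}|…"),
    ("dash", r"--|[–—]"),
    ("question mark", r"\?"),
    ("exclamation mark", r"!"),
    ("semicolon", r";"),
    ("colon", r":"),
    ("comma", r","),
    ("full stop", r"\."),
]


def count_punctuation(text):
    """Count each feature once, blanking matched marks so that later,
    more general patterns cannot count them again."""
    counts = Counter()
    remaining = text
    for name, pattern in PATTERNS:
        remaining, n = re.subn(pattern, " ", remaining)
        counts[name] = n
    return counts


def per_thousand_words(counts, text):
    """Normalise raw counts to a rate per 1,000 words, where a word is any
    whitespace-separated token containing a letter or digit."""
    n_words = sum(1 for tok in text.split() if any(ch.isalnum() for ch in tok))
    return {name: 1000 * n / n_words for name, n in counts.items()}


# Examples (25), (28) and (21) from the comic corpus, quoted in this chapter.
sample = (
    "But her insides are all right -- no bleeding there. "
    "You might be stronger and faster than I am right now, Parasite… "
    "Man, would you look at THAT!"
)
counts = count_punctuation(sample)
rates = per_thousand_words(counts, sample)
print(counts["dash"], counts["suspension dots"], counts["comma"])
```

Normalising to a common base matters because the texts in the two corpora differ in length; raw counts alone would not be comparable across registers.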

In a very wide reading, the division of a text into paragraphs could also be considered as punctuation (cf. Huddleston and Pullum 2002: 1725). According to Nunberg (1990: 17), “punctuation must be considered together with a variety of other graphical features of the text, including font- and face-alternations, capitalization, indentation and spacing”, all of which are said to fulfil a similar function. To this can be added the use of italics and bold print. At first sight, these features seem to go beyond the purely linguistic means and to unduly emphasise the visual and multimodal aspect of written language – but they sometimes find a correspondence in spoken language in pauses, stress, intonation etc., even if it is not completely systematic (cf. above). What makes the proposed category 16 special is the fact that the register features listed therein are not lexico-grammatical, like the other features included in Biber’s models up to the time of writing. Some of the punctuation features correlate with lexico-grammatical features (e.g. question marks with syntactic questions), which are in turn typical of specific registers (e.g. conversations). However, this does not mean that punctuation is a secondary register feature. Many other punctuation marks correlate with more abstract categories; e.g. quotation marks with quotations, which may take practically any lexical or syntactic form. Furthermore, it is normal that “linguistic features co-occur in texts because they reflect shared functions” (Biber 1995: 30). This does not necessarily imply that one should receive more weight than the other. As a consequence, punctuation is considered a register feature in its own right. In 1988, Biber (71–72) states for register analysis that “the goal is to include the widest possible range of potentially important linguistic features”. The empirical analysis presented here clearly suggests punctuation as such a feature. 
However, the proposed addition of punctuation to the set of categories is not to be regarded as any form of criticism of the original model, but merely as the suggestion of a valuable category to add to the long list of previously used features.

5 References

Achtert, Walter S. & Joseph Gibaldi. 1985. The MLA style manual. New York: The Modern Language Association of America.


Arendholz, Jenny, Wolfram Bublitz, Monika Kirner & Iris Zimmermann. 2013. Food for thought – or, what’s (in) a recipe? A diachronic analysis of cooking instructions. In Cornelia Gerhardt, Maximiliane Frobenius & Susanne Ley (eds.), Culinary linguistics: The chef’s special, 119–137. Amsterdam: Benjamins.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University Press.
Biber, Douglas. 1995. Dimensions of register variation: A cross-linguistic comparison. Cambridge: Cambridge University Press.
Biber, Douglas. 2006. University language: A corpus-based study of spoken and written registers. Amsterdam: Benjamins.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge University Press.
Bloomfield, Leonard. 1933. Language. New York: Holt.
Booth, Wayne C., Gregory G. Colomb & Joseph M. Williams. 2008. The craft of research. 3rd edn. Chicago: University of Chicago Press.
Crystal, David. 2001. Language and the internet. Cambridge: Cambridge University Press.
Halliday, Michael A.K. 1978. Language as social semiotic: The social interpretation of language and meaning. London: Arnold.
Huddleston, Rodney & Geoffrey K. Pullum. 2002. The Cambridge grammar of the English language. Cambridge: Cambridge University Press.
Jakobson, Roman. 1985. Closing statement: Linguistics and poetics. In Robert E. Innis (ed.), Semiotics: An introductory anthology, 145–175. Bloomington: Indiana University Press.
Lampert, Martina. 2011. Attentional profiles of parenthetical constructions: Some thoughts on a cognitive-semantic analysis of written language. International Journal of Cognitive Linguistics 2(1). 81–106.
Lampert, Martina. 2013. Say, be like, quote (unquote), and the air-quotes: Interactive quotatives and their multimodal implications. English Today 29(4). 45–56.
Law, Gwillim. 2010. Grawlixes past and present. http://www.statoids.com/comicana/grawlist.html (accessed 15 July, 2014).
Levinson, Stephen C. 1983. Pragmatics. Cambridge: Cambridge University Press.
Meyer, Charles F. 1987. A linguistic study of American punctuation. Frankfurt am Main: Peter Lang.
Mithun, Marianne. 2012. The deeper regularities behind irregularities. In Thomas Stolz et al. (eds.), Irregularity in morphology (and beyond), 39–59. Berlin: Akademie.
Moore, David S. & William I. Notz. 2006. Statistics: Concepts and controversies. New York: W.H. Freeman.
Nunberg, Geoffrey. 1990. The linguistics of punctuation. Menlo Park, CA: CSLI.
Patt, Sebastian. 2013. Punctuation as a means of medium-dependent presentation structure in English: Exploring the guide functions of punctuation. Tübingen: Narr.
Peters, Pam. 2004. The Cambridge guide to English usage. Cambridge: Cambridge University Press.
Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech & Jan Svartvik. 1985. A comprehensive grammar of the English language. London: Longman.
Rosch, Eleanor. 1973. On the internal structure of perceptual and semantic categories. In Timothy E. Moore (ed.), Cognitive development and the acquisition of language, 111–144. New York: Academic Press.




Rosch, Eleanor. 1975. Cognitive representations of semantic categories. Journal of Experimental Psychology, General 104(3). 192–233.
Runkel, Philip Julian & Margaret Runkel. 1984. A guide to usage for writers and students in the social sciences. Totowa, New Jersey: Rowman & Allanheld.
Sanchez-Stockhammer, Christina. 2012. Comicsprache – leichte Sprache? In Daniela Pietrini (ed.), Die Sprache(n) der Comics, 55–74. Munich: Meidenbauer.
Sanchez-Stockhammer, Christina. Forthcoming. The transformative power of copying in language. In Corinna Forberg & Philipp W. Stockhammer (eds.), The transformative power of the copy: A transcultural and interdisciplinary approach. Heidelberg: Heidelberg Publishing.
Searle, John. 1975. Indirect speech acts. In Peter Cole & Jerry L. Morgan (eds.), Syntax and semantics. Vol. 3: Speech acts, 59–82. New York: Academic Press.
Seely, John. 2007. Oxford A–Z of grammar and punctuation. Oxford: Oxford University Press.
Snider, Neal. 2009. Similarity and structural priming. In Niels Taatgen & Hedderik van Rijn (eds.), Proceedings of the 31st annual conference of the Cognitive Science Society, 815–820. Austin, TX: Cognitive Science Society.
Söll, Ludwig & Franz Josef Hausmann. 1985. Gesprochenes und geschriebenes Französisch. 3rd edn. Berlin: Erich Schmidt.
Swales, John M. & Christine B. Feak. 2010. Academic writing for graduate students: Essential tasks and skills. 2nd edn. Ann Arbor: The University of Michigan Press.
Trudgill, Peter. 2000. Sociolinguistics: An introduction to language and society. 4th edn. London: Penguin.
Wardhaugh, Ronald. 2002. An introduction to sociolinguistics. 4th edn. Oxford: Blackwell.

Corpora:

Comic Corpus (CoCo):
Re-print: Englisch lernen mit Batman. Bad Guys Gallery. 2007. Munich: Berlitz.
Re-print: Englisch lernen mit Superman. Up, up and away! 2007. Munich: Berlitz.
Walt Disney’s Uncle $crooge. No. 376. April 2008. York (PA): Gemstone.

Corpus of Academic Texts (AcadText):
Biber, Douglas, Susan Conrad & Randi Reppen. 1994. Corpus-based approaches to issues in applied linguistics. Applied Linguistics 15. 169–189.
Juhasz, Barbara, Matthew S. Starr, Albrecht W. Inhoff & Lars Placke. 2003. The effects of morphology on the processing of compound words: Evidence from naming, lexical decisions and eye fixations. British Journal of Psychology 94. 223–244.
Schneider, Edgar. 2003. The dynamics of new Englishes: From identity construction to dialect birth. Language 79. 233–281.

Martina Lampert

Linking up register and cognitive perspectives: Parenthetical constructions in academic prose and experimentalist poetry

Abstract: This paper will explore the possibility of linking up Biber’s register analysis and Talmy’s cognitive semantics, based on the assumption that some fundamental cognitive principles inform situational features and hence would, in part, determine linguistic characteristics. As one case in point, two samples of parenthetical constructions from opposite written registers, academic science writing and minimalist poetry, are scrutinised in an initial qualitative analysis. The study identifies both a general structural and functional similarity in the examples selected for illustration, suggesting that no significant register distinction will ensue, while the parenthetical pattern is likely to exhibit a substantial cross-medial difference between speech and writing. These preliminary findings invoke properties of the human cognitive architecture as well as evolutionary specifics of the language modalities as critical parameters of influence and would speak for their recognition as potential determinants of register and, in turn, for a principled compatibility of the two linguistic approaches.

1 Introduction

In this paper, I will present some arguments for linking up Douglas Biber’s register analysis with a recent (re)conceptualization of register as a cognitive construct framed in Leonard Talmy’s cognitive semantics, suggesting that the traceable principled compatibility of these two major approaches to linguistic analysis might open up some promising insights. In his forthcoming The Attention System of Language1, Talmy advances the view that register, generally couched in terms of “types of speech situations”,

1 As always, I am grateful to Len Talmy for the privilege of granting me access to a very substantial current draft version of this forthcoming book; unless otherwise indicated, all quotes are from this work, and the references to this unformatted draft lack page numbers.

Martina Lampert, Johannes Gutenberg University Mainz


may allow for a consistent re-analysis as speaker attitude, for instance, “toward [a lexical item’s] core meaning itself; toward the speech participants (the speaker himself, the addressee, or the relation between the two); or toward the current circumstance”. That is, in a cognitive semantics perspective, register distinctions would become conceivable as backgrounded speaker role, or attitude, for that matter, which are introjected into the minds of participants, thus inevitably involving attention and memory as relevant cognitive categories. To illustrate: what might best be treated at root as a speaker’s attitude of respect toward the addressee – or a speaker’s attitude of solemnity about the circumstance – could also be interpreted as the presence of a formal situation that triggers the use of a formal register.

The fundamental significance of register for any appropriate analysis of any linguistic item that surfaces in Talmy’s explication ties in with Biber’s belief that “all linguistic descriptions”, such as, for instance, “collocational studies of particular words […] must include consideration of register differences as a central organizing parameter, if they hope to achieve an accurate account of the patterns of use” (Gray 2013: 361). Accordingly, “register differences should be an essential component of any investigation of language use” (Gray 2013: 369). These two statements, then, concur on the view that, in general, any linguistic construction inheres a register ‘signature’. Moreover, Biber’s and Talmy’s approaches might in fact be read as suggestive of such link-up, precisely as they are seen to converge in acknowledging the major role of both medial and cognitive determinants of linguistic patterns: introjected in participants’ minds, cognitive parameters appear to effectively constrain pertinent situational characteristics, as, e.g., Biber’s (1988: 160) remark tracing medial-distinctive effects back to “different cognitive constraints on the speakers and writers” unambiguously demonstrates – apart from and additional to the hardwired effectors of the medium and the tangible properties of the setting in their specific interdependence. Capitalizing on their essentially evolutionary ‘design’, Talmy (2007b) furthermore recognises the prime significance of the options and constraints of both the production and reception circumstances, while attention proves the single most decisive determinant among the situational specifics in communicative interactions to shape a linguistic item’s representational format and its functional potential. As a case in point, I will focus on a much neglected though highly pervasive phenomenon in language – what I have suggested to call parenthetical constructions (cf. Lampert 1992: 16 and chapter 2 below). 
To give a cursory impression of the pattern’s range in structural variability, the following examples, exclusively from academic writing, are in order. It should be noted that they are all in line




with the formal prototype, as demarcated by parentheses in the schematic illustration (7) below. Examples (1) and (2) are taken from Nunberg (1999) and demonstrate a typical sub-clausal as well as an alleged marginal sentential instance. The two sub-morphemic exemplars (3) and (4), found in titles of scholarly articles, are likewise deemed to be peripheral members of the category, while (5) and (6), retrieved from the academic sub-corpus of the COCA, testify to the principled unconstrainedness of the format even in the formal register.

(1) Yet for all these changes, there is a continuity here, too, in the way that change is (sometimes heatedly) debated and (sometimes grudgingly) accommodated.

(2) And there is a large number of common words for talking about the language itself, for example slang, usage, jargon, succinct, and literate. (It is striking how many of these words are particular to English. No other language has an exact synonym for slang, for example, or a single word that covers the territory that literate covers in English, from “able to read and write” to “knowledgeable or educated”.)

(3) Robertson, John M., Chi-Wei Linn, Joyce Woodford, Kimberly, K. Danos, and Mark A. Hurst. 2001. The (Un)Emotional Male: Physiological, Verbal, and Written Correlates of Expressiveness. The Journal of Men’s Studies 9, 393–412.

(4) Widdowson, Peter. 1990. W(h)ither English? In Martin Coyle, Peter Garside, Malcolm Kelsall & John Peck (eds.), Encyclopaedia of literature and criticism, 1221–1236. London: Routledge.

(5) He took pianists, guitarists and harpists in stride, but expressed shock at “13 young lady violinists (!), 1 young lady violist (!!), 4 violoncellists (!!!) and 1 young lady contrabassist (!!!!).

(6) While ego orientation did not emerge as a significant predictor of likelihood to aggress in any of the three groups, significant correlations were found between ego orientation and likelihood to aggress for boys, r (????) =.20, p

(7) Academic Social Science:
(a) “A Strategy for Hong Kong Industries, Inc.”
(b) “The prospect of mediation in resolving construction disputes”
(c) “The Rehabilitation Development Coordinating Committee and the Future of Services Concerning People with Disabilities in Hong Kong”

Moreover, IDEAS metaphors, based on this expected topical diversity8, occur with varying elaborations from discipline to discipline and also functionally contribute to academic writing in different ways, as will be demonstrated below. From the cross-varietal perspective, it is difficult to establish which variety presents itself as most metaphorical on the basis of data concerning one conceptual domain. Additionally, although Hong Kong is clearly characterised by the highest frequency of IDEAS metaphors, these metaphors still show up in comparable numbers in Indian and Singaporean academic writing. Therefore, what is of greater interest here is the consideration of metaphorical variation beyond frequency. Kövecses (2010: 216) states that “two languages may share the same conceptual metaphor, but the metaphor is elaborated differently in the two languages”. For instance, the conceptual metaphors THE BODY IS A CONTAINER FOR THE EMOTIONS and ANGER IS FIRE have an attested existence in both Hungarian and English; in Hungarian the body with fire inside is often elaborated as a pipe – an elaboration that does not appear to be at work in conventional English metaphors of this kind (Kövecses 2010: 216). By extending this notion to the study of varieties, it is possible to establish variation along the lines of this kind of elaboration. For instance, IDEAS were conceptualised in Hong Kong academic Humanities as MORAL GUIDES, illustrated by (8) to (11) in the following section, which was not found to be part of the mappings for the other varieties. Whether or not these differences have an overall characterising role for the study of varieties remains to be seen and would require more extensive research. However, this does indicate a starting point from which to consider overall metaphorical variation along the variety divide. Specifically, it helps to create a basis for separating those metaphors and metaphorical expressions that are ubiquitous to all varieties from potentially variety-specific conceptualisations or at the very least variety-specific domain preferences. This will be briefly considered in the next section, which is followed by a closer look at metaphor variation and function from the sub-register perspective on New English academic writing.

8 In the study of metaphor, we should not underestimate the problematic aspect of topical diversity, which is related to the design of the ICE corpora. As far as the author of the present paper is aware, the ICE texts, despite being carefully selected as representative examples of the text types comprising the general design of the ICE project, were not selected on the basis of topic similarity. Thus, ICE-based research into metaphor may run into the problem of absence of a domain, not because a variety does not make use of this domain, but because it just so happens that the topics of the texts selected do not make use of it. This factor, along with the smaller nature of the ICE components, does in the long run present difficulties for more extensive research into metaphor variation, for which more frequencies for a particular domain may be required. However, in terms of register research, ICE’s design is still the best option for comparative study of varieties and thus has been used in the present study.

Metaphors in New English academic writing

6 Discussion

6.1 Metaphor across New English varieties

In distributional terms, it is clear that the IDEAS domain is conceptualised by all categories in all three varieties, with the exception of the IMAGES category (no instances in Singapore academic writing), which did not contribute many metaphors in general (four for Hong Kong and three for India). Because all varieties make use of nearly all source domain categories to conceptualise IDEAS, I conclude that there is no great difference between the varieties, especially in regard to the non-presence of a certain domain. Nevertheless, the similarities in domain exploitation for IDEAS metaphors do not necessarily exclude potentially variety-specific conceptualisations. If we consider differences in terms of the various entailments or elaborations apparent in shared domain mappings, it becomes clear that varieties, in fact, display a certain degree of variation. Consider (8) to (11) below from Hong Kong academic Humanities:

(8) identity as a woman depends on the specific social regulatory ideals by which female bodies are trained and formed

(9) it is widely accepted that general principles serve to guide moral conduct and decisions

(10) Ethical behaviour is guided by the ethical ideal of caring and not by principles or rules.

(11) we are under the guidance of the ethical ideal, that vision of the best self.


 Barbara Güldenring

These extracts illustrate an elaboration that could be represented by the conceptual metaphor IDEAS ARE MORAL GUIDES, which occurs a total of 11 times in the Hong Kong corpus. While (8) to (11) display a personification of IDEAS (represented by principles, ideal and vision) in that they are pursuing a uniquely human activity, that is, serving as a good example of moral behaviour or actively guiding and training, this elaboration is not present in Indian and Singaporean academic writing and, thus, has the potential to be variety-specific. IDEAS ARE MORAL GUIDES belongs to a more general metaphor, IDEAS ARE DOMINANT PEOPLE that, by contrast, has been attested for all varieties and shows no major tendency towards a certain elaboration: (12) general principles do not always determine what is appropriate (13) writers […] are very much influenced by the theories of black Aesthetics (14) Whereas once EAP was dominated by the concept of registers

Another example for a variety-specific elaboration comes from Singapore, which is the only variety to conceptualise IDEAS as a TEACHER, illustrated by (15): (15) a controversial issue could be either a good or bad teacher by affecting learning through its contents or through its dynamics.

Although this TEACHER conceptualisation is unique to the Singaporean corpus, the notion that IDEAS can impact an individual or a society in a positive or negative manner, as illustrated in (15), is still part of metaphors found in all varieties, for instance: (16) IDEAS ARE PEOPLE WHO HELP (a) translation as the ideological handmaid of imperialism (b) they [principles] may all work together to facilitate the use of language (c) What would soothe her is […] the thought that his action in comforting her is a response to her need (17) IDEAS ARE PEOPLE WHO HARM (a) many forms of oppressive ideologies (b) Ayer’s notion of philosophy deprives philosophy of its empirical content (c) understanding of human phenomena are sometimes distorted by […] political beliefs, ideology and sheer ethnocentrism.




All in all, these metaphors show that, despite the potential for individual preference for certain elaborations, such as IDEAS AS MORAL GUIDES or TEACHERS, New English varieties, specifically in academic writing, tend to draw from the same conceptual pool, that is, their metaphors display more conceptual similarities than differences. This is perhaps not so different from varieties traditionally conceived of as more “standard”, such as British or American English, which would also speak to the strong conventional nature of the academic register, to which I turn in the following.

6.2 Discussion: Metaphor across academic sub-registers

Distributional differences in IDEAS metaphors can also be identified from the sub-register perspective. One obvious observation relates to the distribution of PEOPLE metaphors. Humanities emerges as the most clearly metaphorical sub-register for the conceptualising of this domain, followed by Social Science and Natural Science. Incidentally, each variety individually displays the same tendency, as portrayed in Table 4.

Table 4: Distribution of IDEAS ARE PEOPLE metaphors

            Academic      Academic          Academic
            Humanities    Natural Science   Social Science
Hong Kong   59            4                 7
India       24            3                 11
Singapore   30            2                 12
Total:      113           9                 30

All varieties consistently place Humanities on the more metaphorical side and Natural Science on the less metaphorical side of the continuum, with Social Science somewhere in between. This is a general trend for most other categories9, illustrated for the second most prominent ontological metaphor, IDEAS ARE OBJECTS, by Table 5.

9 The exceptions are 1) India academic Natural Science and Social Science, which both contain three IDEAS ARE ARTEFACTS metaphors; 2) Hong Kong and India academic Natural Science, which have more IDEAS ARE ARTEFACTS metaphors than Social Science; and 3) India academic Natural Science, which has one IDEAS ARE IMAGES metaphor, whereas India Social Science has none. However, the frequencies involved here are very small and do not necessarily detract from the general trend.


Table 5: Distribution of IDEAS ARE OBJECTS metaphors

            Academic      Academic          Academic
            Humanities    Natural Science   Social Science
Hong Kong   47            2                 14
India       27            4                 5
Singapore   21            1                 10
Total:      95            7                 29
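The column totals in Tables 4 and 5, and the share of each sub-register in the overall count, can be rechecked with a few lines of Python. This is only an illustrative sketch: the dictionary layout and the helper names `column_totals` and `shares` are assumptions introduced here and are not part of the original study; the counts themselves are copied from the two tables. The shares are based on raw counts and are not normalised for the size of the sub-corpora.

```python
# Counts copied from Tables 4 and 5.
# Order of values in each list: Hong Kong, India, Singapore.
# The data layout and helper names are illustrative assumptions.
TABLE_4_PEOPLE = {
    "Humanities": [59, 24, 30],
    "Natural Science": [4, 3, 2],
    "Social Science": [7, 11, 12],
}
TABLE_5_OBJECTS = {
    "Humanities": [47, 27, 21],
    "Natural Science": [2, 4, 1],
    "Social Science": [14, 5, 10],
}


def column_totals(table):
    """Sum the three variety counts for each academic sub-register."""
    return {sub: sum(counts) for sub, counts in table.items()}


def shares(table):
    """Percentage of all metaphors in the table that fall in each sub-register."""
    totals = column_totals(table)
    grand_total = sum(totals.values())
    return {sub: round(100 * n / grand_total, 1) for sub, n in totals.items()}


if __name__ == "__main__":
    for label, table in [("IDEAS ARE PEOPLE", TABLE_4_PEOPLE),
                         ("IDEAS ARE OBJECTS", TABLE_5_OBJECTS)]:
        print(label, column_totals(table), shares(table))
```

Keeping the per-variety counts as lists, rather than storing only the totals, makes it easy to confirm that each variety on its own ranks Humanities above Social Science above Natural Science.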

Despite these frequency differences, the New English academic register as a whole displays a common characteristic, in that it makes use of well-established conventional metaphors, of the kind we would expect to find in other academic English varieties. Examples (18) to (20) represent conventional metaphors from the OBJECTS category that feature in all academic sub-registers:

(18) IDEAS ARE CONTAINERS
(a) The interesting aspect […] lies in its intra-Asian comparative literature perspective (Humanities)
(b) one of the major problem in the lanthanide f-f intensity theory (Natural Science)
(c) Such a view hides in it subtle dangers. (Social Science)
(19) IDEAS ARE OBJECTS IN CONTAINERS
(a) the eternal world become latent dream-thoughts stored in the unconscious psyche (Humanities)
(b) Medical personnel […] should keep this concept in mind (Natural Science)
(c) the word “wealth” […] now occupies the vacated slot in the dirt dictionary as an unworthy concept (Social Science)
(20) IDEAS ARE POSSESSIONS
(a) Full cognizance should be given to the influences on the curriculum planner (Humanities)
(b) the panoramic view gives the idea that it is more slopy and undulating (Natural Science)
(c) The argument […] was given another perspective (Social Science)

However, in view of the topical diversity sketched out above, the mere fact that these metaphors can occur in all academic sub-registers (albeit for some in small numbers) is not necessarily an indication of similarity in the way these metaphors are elaborated. Just as we entertained the notion of variety-specific conceptualisations, we can consider discipline-specific conceptualisations, or at least




preferences, by examining those metaphors from the OBJECTS category that are not found in all academic sub-registers. When IDEAS are conceptualised as OBJECTS in general, there is more of a tendency in the Humanities, first and foremost, and in Social Science, secondly, to highlight certain qualities, whereas in Natural Science no specific qualities are attributed to IDEAS as OBJECTS. To exemplify this, consider the following qualities, which can be formulated as individual mappings:

(21) IDEAS ARE MOVEABLE OBJECTS
(a) Wittgenstein has replaced Kant’s concept of mind by language (Humanities)
(b) One view, advanced in the 1920s and 1930s (Social Science)
(22) IDEAS ARE VISIBLE OBJECTS
(a) Although the concept was never defined formally, it is clear on the basis of these answers (Humanities)
(b) financial statements that show a “true and fair view” (Social Science)

Examples (21) to (22) may not illustrate metaphors in the strictest “discipline-specific” sense due to their occurrence in two separate sub-registers, or they may be an indication of Social Science containing texts of a more “Humanities” nature than a “Natural Science” one. Nevertheless, when considering the frequency of these metaphors, it becomes apparent that Humanities shows a slight preference for them over Social Science, since IDEAS ARE MOVEABLE OBJECTS occurs 8 times in the Humanities and 6 times in Social Science, while IDEAS ARE VISIBLE OBJECTS occurs 18 times in the Humanities and only 4 times in Social Science. In fact, by taking a closer glance at the latter category, we can see a perhaps more suitable candidate for a discipline-specific elaboration, because IDEAS are not only VISIBLE OBJECTS in the Humanities, but also represented as VISIBLE OBJECTS that were previously hidden from view and, by their revealing, have attained the VISIBLE quality:

(23) IDEAS ARE VISIBLE OBJECTS (PREVIOUSLY HIDDEN FROM VIEW)
(a) the article is a legitimate attempt at establishing rapports de fait […] shedding light on certain issues
(b) Subsequently they have exposed this notion as a historical and ideological construct
(c) This paper aims to put a step towards that by highlighting certain pragmetic [sic] principles, some of which may go otherwise unnoticed


This particular elaboration makes up 72.2 % of IDEAS ARE VISIBLE OBJECTS (13 out of 18) in the Humanities texts and perhaps points to a functional role for this metaphor in this academic sub-register. Humanities texts, often in introductory sections, typically inform the reader about the history of ideas involved in the discussion of the topic at hand. For instance, if we consider the greater context of (23c), a linguistic paper from the India corpus entitled “Pragmatic Principles and Language”, it becomes clear that IDEAS ARE VISIBLE OBJECTS (PREVIOUSLY HIDDEN FROM VIEW) functions to locate the paper within these previous ideas and accentuate its contribution to these ideas:

(24) Philosophers have found pragmatics to be quite close to what they have called “ordinary language analysis”. They have often used isolated insights about the working of language in solving philosophical riddles without paying much attention to many of the underlying pragmatic principles of the language that they are using. As they have primarily concerned themselves with the theories of meaning, rules, and other related issues, they were forced to study pragmatics of language incidentally without which they would not have found it possible to explain, for example, what is “meaning”. A fuller understanding of pragmatic aspects of the working of language is yet to be achieved despite numerous attempts by philosophers and linguists. This paper aims to put a step towards that by highlighting certain pragmetic [sic] principles, some of which may go otherwise unnoticed.

By conceptualising IDEAS (principles) as becoming VISIBLE OBJECTS in need of revelation (highlighting, go otherwise unnoticed), it becomes obvious to the reader that the present article’s aim is to fill those knowledge gaps left by previous “philosophers and linguists”. Incidentally, the other 12 metaphors of this kind found in the Humanities texts function in exactly the same way. This is not necessarily evidence for Natural Science or Social Science texts being completely void of this metaphorical function, despite the data indicating a clear preference for it in the Humanities, which is most likely due to the nature of the topics that these texts comprise. At this point, we could entertain the possibility that metaphors of this kind act systematically as metaphorical “register features” (cf. Biber and Conrad 2009; Schubert, this volume; Sanchez-Stockhammer, this volume) due to a register’s or, in this case, sub-register’s preference for this particular mapping and function. Furthermore, the more extensively we investigate the relationship between metaphor and register within the study of varieties of English, we could conceive of the existence of metaphorical “register markers” (cf. Biber and Conrad 2009; Schubert, this volume; Sanchez-Stockhammer, this volume), whose uniqueness




is not only determined by the register in which they prominently feature, but also perhaps by the extent to which a variety is nativised.10 Nevertheless, the present data provides insight into another preference and, thus, another potential metaphorical register feature that can be seen in IDEAS ARE PEOPLE, particularly those that stretch beyond the sentence boundary over a larger portion of the text. Consider (25) below, which serves as an example of how metaphors can influence textual structuring, that is, how they contribute to the cohesion as well as coherence of a text more significantly in Humanities than in Natural Science and Social Science:

(25) It is time for courses to introduce controversial issues in management studies. A controversial issue covers new grounds. It enhances the learning process. It could facilitate further the practice of examining, analyzing and deciding skills. However, if not carefully introduced, controversial issues could generate a disproportionate degree of confusion, and result in demotivating the students. As such, the introduction of a controversial issue in the curriculum would have to be properly managed because a controversial issue could be either a good or bad teacher by affecting learning through its contents or through its dynamics.

We have encountered this metaphor before as IDEAS ARE TEACHERS (15) and determined that it is a metaphor specific to the Singapore corpus. However, in (25) we see that it functions to promote the coherence of the text, because an IDEA (issue) is portrayed as having all those teacher-like qualities one could expect when encountering a real teacher: A good teacher covers new grounds (topic-wise), enhances the learning process, facilitates the practice of skills, while a bad teacher can generate confusion and demotivate students. These qualities are attributed to IDEAS via the repeated presence of the metaphor IDEAS ARE TEACHERS, which is then directly stated at the end of the passage, acting as a summary of sorts. Here, it is also conceivable to consider this metaphor’s function in creating cohesion due to the fact that almost every instantiation of IDEAS (issue(s), it) is embedded in the same metaphor throughout the passage and all are linked by language pertaining to both helpful attributes of a teacher (e.g. enhancing learning and facilitating practice of skills) as well as negative attributes (e.g. generating confusion and demotivating). This is different for Natural Science and Social Science texts, which do not give such prominence to IDEAS metaphors, and, in doing so, leave little room for them to structure their respective texts in this manner. Again, from this perspective, it seems to make more sense to talk about potential metaphorical “register features” over “register markers” (cf. Schubert, this volume).

10 For extensive discussion about nativisation and the extent to which a variety, as it is developing, orientates itself towards the English input variety, cf. Schneider’s “Dynamic Model” (Schneider 2007, 2003). Furthermore, research is currently being completed by the author of the present paper exploring the relationship between metaphor and nativisation and, thus, considering to what extent a variety, e.g. Indian English, behaves metaphorically differently from its traditional input variety, British English, for certain target domains, e.g. EMOTIONS.

7 Conclusions

The assumption behind the present study is that metaphor is a characteristic and functional feature of the academic register. Although this study focuses on metaphors conceptualising a single domain, it shows that, despite traditional notions of the metaphorical poverty of this register, academic writing is by no means void of metaphorical language, which, in turn, indicates the presence of conceptual metaphors. In particular, New English academic writing, as represented by the ICE components under investigation, makes use of conventional metaphors that can be encountered in academic writing associated with more traditional varieties of English. This is perhaps the result of the highly revised and edited production circumstances and international reach of this register, which, taken together, may discourage more variety-specific conceptualisations in favour of conventional metaphors intelligible to speakers of all varieties and non-native speakers alike. Despite this conventionality, it is nevertheless possible to point out potentially variety-specific conceptualisations by taking a finer-grained look at how a variety elaborates on a more general metaphor. In fact, it is perhaps on this level of analysis that metaphorical variation across varieties can be encountered in general. In order to provide more evidence for this, research on other domains and with other varieties is required. From the sub-register perspective, it is possible to pinpoint the most metaphorical discipline for a specific domain, e.g. Humanities as most metaphorical for the IDEAS domain. Nevertheless, if other domains were examined, it could very well be the case that a completely different academic sub-register emerges as the most metaphorical.
Furthermore, for metaphorical variation across the disciplines in this study of New English academic writing, at this stage it is possible to identify potential candidates for metaphorical “register features” rather than metaphorical “register markers” due to the fact that none in the data were exclusive to one specific academic sub-register, although a preference for certain metaphors can be determined. This also requires more research, which would most certainly benefit from the inclusion of other sub-registers or comparison




with metaphorical data from popular texts pertaining to the Humanities, Natural Sciences and Social Sciences, which the ICE corpora also provide. In terms of their functional properties, a metaphor conceptualising a certain domain may exhibit functional features that can only be demonstrated for a particular sub-register, like signalling a paper’s contribution to a body of research in the Humanities. However, here again, further research can improve on the study of metaphorical function by adhering more strictly to a “census” technique, such as MIPVU, as well as relying on texts that do not display such a topical diversity as the ICE components do. Additionally, recent work in metaphorical variation and the varieties11 exploits the advantages of using a significantly larger corpus, like Davies’ (2013) Corpus of Global Web-Based English (GloWbE), in order to make more extensive frequency-based claims about variety-specific domain preferences as well as to contribute to research into web registers (cf. Biber and Egbert, this volume) from the cross-variety perspective12. All things considered, employing metaphor as a feature to investigate both variety-based and register variation has the potential to provide many more insights into the nature of these highly relevant fields of study.

References

Anthony, Laurence. 2012. AntConc (Version 3.3.5) [Computer Software]. Tokyo, Japan: Waseda University. http://www.antlab.sci.waseda.ac.jp/
Archer, Dawn, Andrew Wilson & Paul Rayson. 2002. Introduction to the USAS Category System. http://ucrel.lancs.ac.uk/usas/usas%20guide.pdf (accessed 5 May 2011).
Berber Sardinha, Tony. 2012. An assessment of metaphor retrieval methods. In Fiona MacArthur, José Luis Oncins-Martínez, Manuel Sánchez-García & Ana María Piquer-Píriz (eds.), Metaphor in use: Context, culture, and communication, 21–50. Amsterdam & Philadelphia: John Benjamins.
Berber Sardinha, Tony. 2007. Metaphor in corpora: A corpus-driven analysis of Applied Linguistics dissertations. Rev. Brasileira de Lingüística Aplicada 7(1). 11–35.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: CUP.
Black, Max. 1954. Metaphor. Proceedings of the Aristotelian Society 55. 273–294.
Cameron, Lynne. 2003. Metaphor in educational discourse. London: Continuum.
Davies, Mark. 2013. Corpus of global web-based English. http://corpus.byu.edu/glowbe/.

11 Cf. Díaz-Vera’s (2015) study on various conceptualisations of LOVE in India, Pakistan and Nigeria.
12 GloWbE provides an opportunity to efficiently compare 20 distinct varieties of English worldwide, of which the bulk could be categorised as belonging to the “New Englishes”.


Deignan, Alice. 2005. Metaphor and corpus linguistics. Amsterdam & Philadelphia: John Benjamins.
Díaz-Vera, Javier E. 2015. Love in the time of corpora. Preferential conceptualizations of love in world Englishes. In Vito Pirrelli, Claudia Marzi & Marcello Ferro (eds.), Word structure and word usage. Proceedings of the NetWordS final conference, 161–165. http://ceur-ws.org/Vol-1347/paper37.pdf (accessed 13 May 2015).
Drewer, Petra. 2003. Die kognitive Metapher als Werkzeug des Denkens. Zur Rolle der Analogie bei der Gewinnung und Vermittlung wissenschaftlicher Erkenntnisse. Tübingen: Narr.
Goatly, Andrew. 1997. The Language of metaphors. London & New York: Routledge.
Hardie, Andrew, Veronika Koller, Paul Rayson & Elena Semino. 2007. Exploring a semantic annotation tool for metaphor analysis. In Matthew Davies, Paul Rayson, Susan Hunston & Pernilla Danielsson (eds.), Proceedings of the Corpus Linguistics 2007 Conference, 1–12. http://corpus.bham.ac.uk/corplingproceedings07/paper/49_Paper.pdf (accessed 19 August 2011).
Jäkel, Olaf. 1997. Metaphern in abstrakten Diskurs-Domänen. Eine kognitiv-linguistische Untersuchung anhand der Bereiche Geistestätigkeit, Wirtschaft und Wissenschaft. Frankfurt am Main: Peter Lang.
Kövecses, Zoltán. 2010. Metaphor: A practical introduction, 2nd edn. Oxford: OUP.
Krennmayr, Tina. 2011. Metaphor in newspapers. Utrecht: LOT.
Lakoff, George & Mark Johnson. 2003 [1980]. Metaphors we live by, 2nd edn. Chicago & London: Chicago UP.
Nelson, Gerald. 1996. The design of the corpus. In Sidney Greenbaum (ed.), Comparing English worldwide: The International Corpus of English, 27–35. Oxford: Clarendon.
Partington, Alan. 1998. Patterns and meanings: Using corpora for English language research and teaching. Amsterdam & Philadelphia: John Benjamins.
Platt, John, Heidi Weber & Ho Mian Lian. 1984. The New Englishes. London: Routledge.
Pragglejaz Group. 2007. A practical and flexible method for identifying metaphorically-used words in discourse. Metaphor and Symbol 22(1). 1–39.
Rayson, Paul. 2009. Wmatrix: A web-based corpus processing environment, Computing Department, Lancaster University. http://ucrel.lancs.ac.uk/wmatrix/
Römer, Christine. 2000. Metaphern in der Wissenschaftssprache: Bildfelder der sprachwissenschaftlichen Fachkommunikation. In Josef Bayer & Christine Römer (eds.), Von der Philologie zur Grammatiktheorie, 353–365. Tübingen: Max Niemeyer.
Schneider, Edgar W. 2007. Postcolonial English: Varieties around the world. Cambridge: CUP.
Schneider, Edgar W. 2003. The dynamics of New Englishes: From identity construction to dialect birth. Language 79(2). 233–281.
Semino, Elena. 2008. Metaphor in discourse. Cambridge: CUP.
Semino, Elena, Alice Deignan & Jeannette Littlemore. 2013. Metaphor, genre, and recontextualization. Metaphor and Symbol 28(1). 41–59.
Skorczynska, Hanna & Alice Deignan. 2006. Readership and purpose in the choice of economics metaphors. Metaphor and Symbol 21(2). 87–104.
Steen, Gerard J., Aletta G. Dorst, J. Berenike Herrmann, Anna Kaal, Tina Krennmayr & Trijntje Pasma. 2010. A method for linguistic metaphor identification. From MIP to MIPVU. Amsterdam & Philadelphia: John Benjamins.
Stefanowitsch, Anatol. 2006. Words and their metaphors: A corpus-based approach. In Anatol Stefanowitsch & Stefan Th. Gries (eds.), Corpus-based approaches to metaphor and metonymy, 63–106. Berlin & New York: Mouton de Gruyter.




Wolf, Hans-Georg & Frank Polzenhagen. 2009. World Englishes: A cognitive sociolinguistic approach. Berlin & New York: Mouton de Gruyter.
Zichler, Csilla. 2010. Metaphern in der Wissenschaftssprache. Sprachtheorie und germanistische Linguistik 20(1). 95–112.

Steffen Schaub

The influence of register on noun phrase complexity in varieties of English

Abstract: This study explores noun phrase (NP) complexity variation in registers of regional varieties of English. The focus is on the description of NP complexity in four registers (academic writing, conversation, unscripted speeches and social letters) across five regional varieties of English (Canada, Hong Kong, India, Jamaica, Singapore). For that, noun phrases are extracted from a register-stratified subsample of the International Corpus of English and annotated for NP complexity based on a four-way categorisation system: i) unmodified, ii) premodified only, iii) postmodified only, iv) pre- and postmodified. The results corroborate the strong influence of register on NP complexity, depending on two situational characteristics: communicative purpose (informational vs. interactional) and mode (written vs. spoken). Finally, it is assessed whether NP complexity is a viable marker of regional variation in comparative varieties research.

Steffen Schaub, University of Marburg

1 Introduction

This study explores noun phrase (NP) complexity variation in registers of regional varieties of English. There are three motivations for pursuing this particular research topic: the lack of descriptive work on the noun phrase in varieties of English, a growing interest in register variation in English varieties research and awareness of the strong influence of register on NP structure. These motivations are discussed in more detail in the following.

Descriptive work on the regional varieties of English has developed a focus on comparison. With the emergence of comparable linguistic corpora, such as the International Corpus of English (ICE), linguists have compared individual varieties against a normative ‘yardstick’ (usually British English) or against each other. Most of the attention has been devoted to phonology, lexis and morphosyntax. Interest in the latter was mainly guided by investigations of ‘non-standard’ features, i.e. features reported to occur in Englishes around the world that do not occur in the norm-providing standard varieties. The task is to re-evaluate early feature reports based on anecdotal observation (e.g. Platt, Weber and Ho 1984)


and to confirm their validity using empirical means. With regard to the noun phrase across regional varieties of English, three ‘non-standard’ features are frequently mentioned in surveys and grammatical descriptions: noun pluralisation (Platt, Weber and Ho 1984; Ahulu 1998; Hall, Schmidtke and Vickers 2013), use of the article system (Sand 2004; Lamidi 2007; Wahid 2013; Sand forthc.), and subject-verb concord (Asante 1995; Ahulu 1998; Blair and Collins 2001; Sand forthc.). Other, less frequently reported phenomena include variation in the pronoun system (Lamidi 2007; Kortmann and Lunkenheimer 2013), the expression of possession (Kortmann and Lunkenheimer 2013) and adjective comparison (Kortmann and Lunkenheimer 2013). More recently, interest in the noun phrase across varieties of English has moved beyond the investigation of isolated morphosyntactic features. Brunner (2014) introduces NP modification patterns as a marker of regional variation across varieties of English. He compares NP structures in British, Kenyan and Singapore English and finds that “[i]n Singapore English, premodified NPs are significantly overrepresented [while] in Kenyan English, postmodifiers are more frequent than premodifiers” (Brunner 2014: 44). He attributes these preferences to contact influence from the indigenous languages of the respective areas, based on their typological profiles (head-final vs. head-initial word order). These findings are drawn from the register of spontaneous spoken conversation, which is “arguably the least stylized and can therefore be expected to be susceptible to contact-induced language change” (Brunner 2014: 30). In order to substantiate the claim that preferences in NP modification are the result of language contact, it is necessary to study more registers to see if these tendencies can be confirmed. The notion of ‘register’ is a relatively recent addition to research into varieties of English. 
Register is defined here, in accordance with Biber and Conrad (2009: 6), as “a variety associated with a particular situation of use (including particular communicative purposes)”. So far, English varieties have mainly been handled as homogeneous entities conveniently defined by the borders of political nation-states rather than linguistic criteria, but this is not due to a lack of awareness. Already in early reports we find observations that take register variation into consideration. Platt, Weber and Ho (1984: 49), for instance, frequently differentiate between written and spoken as well as formal and colloquial language when discussing individual features, e.g.: “It is common in some New Englishes to mark the plural of the noun more often in writing and in more formal speech. There would be less marking in colloquial speech”. Nevertheless, for much of World Englishes research, the nation-state variety remained the preferred level of comparison. Macro-scale projects such as the Electronic World Atlas of Varieties




of English (Kortmann and Lunkenheimer 2013) show that demarcating varieties even at this general level produces a large number of distinct entities.1 An essential component of register research is the analysis of grammatical features and their function in particular registers (see Schubert, this volume). With the emergence of comparable, computer-readable corpora in the 1990s, which are also subdivided into genres, it is possible to move beyond anecdotal observation and to verify hypotheses about register variation systematically. For instance, Sand (2004) compares article use across varieties and concludes that “differences across text types are observable and genre differences within one variety are practically always more pronounced than overall variation across varieties” (Sand 2004: 294–295). A growing number of studies extend Biber’s (1988) multidimensional approach to register variation to the study of regional varieties of English (see, for instance, Balasubramanian 2009; Xiao 2009; Neumann 2012; Neumann and Fest, this volume). Balasubramanian (2009: 4) specifically addresses register variation in Indian English, arguing that just as traditionally recognized ‘native’ varieties of English are recognized for the variation within them, so too, should the emerging new varieties. The ‘native’ varieties of English are recognized for the differences within them stemming from region, social status, and reason for use or register […] to name just a few variables. […] Any study of a new variety of English, then, should focus on identifying the variation within it, (and not just on describing a set of features that characterize the national variety), and provide detailed descriptions of the national variety […].

Xiao (2009) explores variation across twelve registers and five varieties using the multidimensional analysis (MDA) approach developed by Biber (1988). The study encompasses 141 grammatical and semantic features. Xiao concludes that “variations in language use involve regional varieties as well as variants in different registers and along different dimensions” (Xiao 2009: 447). In sum, register differences are increasingly addressed in English varieties research, and it becomes clear that the influence of register on the overall structural variation of regional English varieties must be taken into account. The connection between register and NP complexity has been demonstrated repeatedly. Aarts (1971) analyses NP complexity across four different text types and concludes that NP complexity correlates with syntactic function: While the subject slot prefers ‘light’ noun phrases, the object slot prefers ‘heavy’ ones. In

1 The eWAVE database covers 76 mostly national varieties of English, including, however, a number of localised dialectal varieties, for instance East Anglian English or Appalachian English (Kortmann and Lunkenheimer 2013).


addition, Aarts found a tendency for heavy noun phrases to be much less frequent in spoken than in written texts. The latter point is taken up by de Haan (1993), who confirms Aarts’ (1971) hunch about the relation between NP complexity and text type. De Haan (1993) further investigates the combined influence of text type and syntactic function on NP complexity, and finds that, in some cases, the two reinforce each other, while in other cases they cancel each other out. Halliday (1989) argues that spoken language is no less complex than written language, but that the complexity is located differently. While spoken language has a more elaborate clausal structure, in written language, the complexity lies in the constituents below the clausal level, foremost in what he calls the nominal group. Nominals, in writing, carry “the meat of the message” (Halliday 1989: 72). Schäpers (2009), using a corpus of spoken and written British English, confirms that “[n]oun phrases are more complex in written language with regard to premodification, postmodification, and both pre- and postmodification” (2009: 153). On the level of registers, Biber et al. (1999) find that almost 60 % of noun phrases in academic prose have a modifier, while only 15 % of noun phrases in conversation are modified (Biber et al. 1999: 578). In general, academic prose is characterised by a more frequent use of nouns than conversation (Biber and Conrad 2009: 116–117). The linguistic differences between these two registers, Biber and Conrad argue, can be explained on the basis of their different situational characteristics: while the purpose of conversation is to develop personal relationships, academic prose focuses on communicating information (Biber and Conrad 2009: 109). To sum up, the strong connection between NP complexity and register has been confirmed in various studies of British and American English. The present study combines the three interconnected research interests outlined above. 
NP complexity is systematically compared across five varieties of English (Canadian English, Indian English, Jamaican English, Hong Kong English and Singapore English) and four registers (academic writing, conversation, unscripted speeches and social letters). The regional varieties reflect diverse sociocultural and linguistic backgrounds. The registers were selected as counterparts based on two situational characteristics, namely mode (spoken vs. written) and communicative purpose (information vs. interaction).2

2 Although the texts are meant to represent the extremes of these two situational characteristics, a strict line cannot be drawn. For example, social letters may also be used to inform, for instance in work-related exchange between colleagues. Likewise, unscripted speeches contain interactional elements, as will be evident from the discussion of personal pronouns below.



The influence of register on noun phrase complexity in varieties of English 


Table 1: Situational characteristics of registers in sample

Mode      Communicative purpose
          informational         interactional
written   academic writing      social letters
spoken    unscripted speeches   conversation

Based on the discussion above, a number of tentative hypotheses can be formulated. First, it is expected that register exerts a strong influence on NP complexity. Matched with the two situational characteristics mode and communicative purpose, NP complexity is likely to increase a) from interactional to informational texts, and b) from spoken to written texts. For our four registers, this yields the following:
– Academic writing is expected to show the highest frequency of complex noun phrases. This is mainly due to the informational character, the high level of formality and the careful planning and revision during the production process.
– Conversation is a highly interactive face-to-face exchange between two or more parties. Due to these situational characteristics, a higher frequency of pronouns, particularly personal pronouns, is expected. Furthermore, conversation is expected to contain the lowest frequency of complex noun phrases of all four registers, both due to mode and communicative purpose.
– Unscripted speeches are expected to show a higher degree of NP complexity than conversation. This is due to the formal and informational character of unscripted speeches. However, complexity is expected to be lower than in academic writing because of the spoken mode.
– Social letters are expected to contain more complex noun phrases than conversation because they are written and are planned and possibly revised during production. The level of NP complexity, however, is expected to be lower than in academic writing, because the communicative purpose of social letters is to interact.

A second motivation of the present study is to further explore the potential of NP complexity as a marker of regional variation, especially in the light of a register-sensitive comparison (see the discussion in Section 4). Due to the exploratory nature of the study, the results are not tested for statistical significance.


 Steffen Schaub

2 Methodology

The present section describes the data and the annotation process used in the following analysis. Section 2.1 discusses various categorisation systems used to mark NP complexity and introduces the system used in the analysis to follow. Section 2.2 describes the corpus data and the annotation process.

2.1 Categorising NP complexity

There are a number of methods for categorising NP complexity. The simplest is a binary distinction into ‘simple’ and ‘complex’ noun phrases, although the line is drawn differently by different authors. The most common understanding of this two-way distinction distinguishes between the presence and the absence of modification; in other words, all pre- and/or postmodified noun phrases are ‘complex’,3 while the remaining are ‘simple’. Some authors (de Haan 1993; Biber et al. 1999: 573–655) distinguish four classes of complexity (unmodified, premodified, postmodified, pre- and postmodified), with determination being optional for all four types. A more elaborate system is used in Jucker (1992: 259–260), whose annotation scheme not only specifies the type of head noun and modification(s), but also records the structural depth of the noun phrase, i.e. the degree of embedding in the modification.

The present study makes use of the categorisation system developed in de Haan (1993), which is also used in Biber et al. (1999: 573–655). It distinguishes four classes of NP complexity: class 1 comprises all noun phrases that lack modification, including pronouns, proper nouns, as well as unmodified common nouns. In the analysis to follow, class 1 is further subclassified: personal pronouns have been identified as a word class that is highly sensitive to register, so that a finer distinction of class 1 into personal pronouns on the one hand and other types of NP heads on the other is desirable. Class 2 includes all noun phrases that are premodified only. Class 3 includes all noun phrases that are postmodified only. Finally, class 4 includes all noun phrases that are both pre- and postmodified. As a slight modification to de Haan (1993) and Biber et al. (1999), class 4 here also includes multi-head coordinated constructions, e.g. the men and women. All four classes optionally contain determination.
Although four classes are distinguished, the discussions below make occasional reference to the binary simple–complex distinction referred to above. The former is identical with class 1, while the latter comprises classes 2–4. The system is summarised in Table 2.

3 In the present paper, determination is not treated as modification.

Table 2: Categorisation system for NP complexity (based on de Haan 1993); the (+) symbol indicates possible multiple instances

Simple NPs    Class 1   (DET)   –         HEAD      –
Complex NPs   Class 2   (DET)   PREM(+)   HEAD      –
              Class 3   (DET)   –         HEAD      POSTM(+)
              Class 4   (DET)   PREM(+)   HEAD(+)   POSTM(+)
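As a rough illustration, the four-way scheme summarised in Table 2 can be expressed as a small decision procedure. The sketch below is not the annotation tool used in the study (the chapter describes a spreadsheet-based annotation); the `NounPhrase` record and its field names are hypothetical and serve only to make the class boundaries explicit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NounPhrase:
    """Hypothetical record of one top-level NP (field names are illustrative)."""
    premodifiers: int = 0   # number of premodifying elements
    postmodifiers: int = 0  # number of postmodifying elements
    heads: int = 1          # more than 1 for coordinated multi-head NPs


def complexity_class(np: NounPhrase) -> int:
    """Return the complexity class (1-4); determiners play no role."""
    if np.heads > 1:
        return 4  # coordinated multi-head constructions are assigned to class 4
    has_pre = np.premodifiers > 0
    has_post = np.postmodifiers > 0
    if has_pre and has_post:
        return 4
    if has_post:
        return 3
    if has_pre:
        return 2
    return 1


def is_complex(np: NounPhrase) -> bool:
    """Binary view: class 1 is 'simple', classes 2-4 are 'complex'."""
    return complexity_class(np) > 1
```

Under these assumptions, an unmodified NP such as your voice falls into class 1, while a coordinated construction such as the men and women is placed in class 4 regardless of its modifiers, in line with the modification to de Haan (1993) described in Section 2.1.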

2.2 Corpus and annotation

The analysis to follow in Section 3 is based on a sample of 8,000 noun phrases taken from five components of the International Corpus of English: Canadian English (CAN), Indian English (IND), Jamaican English (JA), Hong Kong English (HK) and Singapore English (SIN). The varieties were selected in order to represent both traditional and ‘new’ Englishes, while at the same time covering different regions of the world. For each variety, texts from four registers were included: academic writing (from the sub-register ‘humanities’), conversation, social letters and unscripted speeches. For each register in each variety, three text units of 2,000 words each were selected at random. The resulting sub-corpus is a selection of 60 text units stratified across four registers and five varieties, totalling approximately 120,000 words.

In the following, I will describe the annotation process in more detail. First, the noun phrases were marked in the raw data using a simple bracket-and-label system. Only top-level noun phrases were marked; in other words, noun phrases that are embedded in larger noun phrases were not marked separately. As an illustration of the marking system, consider the sample sentence in (1a) and its marked version in (1b). Note how the embedded NP the line in the larger NP the other end of the line is not marked individually.

(1a) Was a pleasant surprise to hear your voice again from the other end of the line.
(1b) Was [NP a pleasant surprise] to hear [NP your voice] again from [NP the other end of the line].
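The bracket-and-label marking illustrated in (1b) lends itself to simple automatic extraction. The following is a minimal sketch, assuming that brackets are never nested (only top-level NPs are marked) and that the label is always the literal string NP; the function names are illustrative and not taken from the study.

```python
import re

# Matches "[NP ... ]" with no nested brackets inside, since only
# top-level noun phrases are marked in the annotation described above.
NP_PATTERN = re.compile(r"\[NP\s+([^\[\]]+?)\s*\]")


def extract_nps(marked_text: str) -> list[str]:
    """Return the text of every marked top-level NP, in order of occurrence."""
    return [match.group(1) for match in NP_PATTERN.finditer(marked_text)]


def length_in_words(np_text: str) -> int:
    """Length in orthographic words, approximated by whitespace splitting."""
    return len(np_text.split())


marked = ("Was [NP a pleasant surprise] to hear [NP your voice] "
          "again from [NP the other end of the line].")
nps = extract_nps(marked)
lengths = [length_in_words(np) for np in nps]
```

For the sentence in (1b), this yields three noun phrases with lengths of 3, 2 and 6 orthographic words. Whitespace splitting is only an approximation of orthographic word counts and would need refinement for contracted or hyphenated forms.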

Randomisation was introduced at two steps in the annotation process. First, as stated above, for each register–variety combination, three textual units were picked at random. In these textual units, all noun phrases were marked. Second, a sample of 400 NPs for each register–variety combination was extracted randomly, adding up to a total of 8,000 NPs. In the second step of the annotation process, the extracted noun phrases were annotated in a spreadsheet: the annotation includes the variables complexity, based on the four-way categorisation system outlined in Section 2.1, as well as variety, register and length (in orthographic words).
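The second randomisation step amounts to stratified random sampling without replacement. The sketch below shows one way this could be done; it is not the procedure actually used in the study, and the data layout (triples of variety, register and NP text) and the fixed seed are assumptions made for the sake of a reproducible example.

```python
import random
from collections import defaultdict

VARIETIES = ("CAN", "HK", "IND", "JA", "SIN")
REGISTERS = ("academic writing", "conversation",
             "social letters", "unscripted speeches")


def stratified_sample(marked_nps, per_cell=400, seed=0):
    """Draw `per_cell` NPs at random from every (variety, register) cell.

    `marked_nps` is an iterable of (variety, register, np_text) triples.
    The seed is an assumption added here for reproducibility.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for variety, register, np_text in marked_nps:
        cells[(variety, register)].append(np_text)

    sample = {}
    for cell, items in cells.items():
        if len(items) < per_cell:
            raise ValueError(f"cell {cell} has only {len(items)} NPs")
        sample[cell] = rng.sample(items, per_cell)  # without replacement
    return sample
```

With five varieties and four registers, 20 cells of 400 NPs each give the total of 8,000 NPs reported above.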

3 Results

Table 3 shows the frequencies of the four complexity classes across the four registers for all five varieties combined. In general, simple NPs without modification (class 1) are most frequent overall (5,084 tokens or 64 %). Complex NPs (classes 2 to 4) are considerably less frequent: NPs with premodification (13 %) and postmodification (14 %) are relatively equally frequent, while NPs with both pre- and postmodification are the least frequent class (9 %).

Table 3: NP complexity across registers (class 1 = unmodified NPs incl. pronouns; class 2 = premodified NPs; class 3 = postmodified NPs; class 4 = pre- and postmodified NPs and coordinated multi-head NPs)

          Conversation      Unscripted speeches   Social letters    Academic writing   Total
Class 1   1,559 (77.95 %)   1,291 (64.55 %)       1,402 (70.10 %)     832 (41.60 %)    5,084 (63.55 %)
Class 2     209 (10.45 %)     249 (12.45 %)         249 (12.45 %)     334 (16.70 %)    1,041 (13.01 %)
Class 3     159 (7.95 %)      300 (15.00 %)         209 (10.45 %)     466 (23.30 %)    1,134 (14.18 %)
Class 4      73 (3.65 %)      160 (8.00 %)          140 (7.00 %)      368 (18.40 %)      741 (9.26 %)
Total     2,000 (100 %)     2,000 (100 %)         2,000 (100 %)     2,000 (100 %)      8,000 (100 %)

The frequencies of the four classes vary with regard to register: simple NPs (class 1) are frequent in conversation (78 %), unscripted speeches (65 %) and social letters (70 %), but relatively infrequent in academic writing (42 %). Analogously, complex NPs (classes 2–4) are relatively infrequent in conversation (22 %) and highly frequent in academic writing (58 %). Taking into consideration the two situational characteristics of the registers as defined in the introduction (mode and communicative purpose), NP complexity increases from spoken to written mode: social letters have a higher mean NP complexity4 than conversation (1.54 compared to 1.37), and academic writing has a higher mean NP complexity than unscripted speeches (2.18 compared to 1.66). In addition, NP complexity increases from interactional to informational communicative purpose: unscripted speeches have a higher mean NP complexity than conversation (1.66 compared to 1.37), while academic writing has a higher mean NP complexity than social letters (2.18 compared to 1.54).

Table 4: NP complexity across varieties

          CAN     HK      IND     JA      SIN     Total
Class 1   1,034     989   1,001     997   1,063   5,084
Class 2     193     251     216     173     208   1,041
Class 3     210     219     227     267     211   1,134
Class 4     163     141     156     163     118     741
Total     1,600   1,600   1,600   1,600   1,600   8,000

Table 4 shows the distribution of complexity classes across the varieties for all registers combined. Class 1 is the most frequent and class 4 is the least frequent in all varieties (with a relatively low value in Singapore English). Looking at classes 2 and 3, the frequencies are differently balanced across varieties: while most varieties have a higher frequency of class 3, Hong Kong English shows a tendency towards class 2. Furthermore, the frequencies of classes 2 and 3 are relatively balanced in some varieties (Indian English, Canadian English, Singapore English), while in others there is greater divergence (Jamaican English, Hong Kong English). Both Table 3 and Table 4 provide a general overview of NP complexity distributions across register and variety. They allow the formulation of first tentative conclusions, such as variety-specific tendencies towards particular classes (e.g. pre- or postmodified NPs). As a second step, it is necessary to look at the distribution of NP classes across both varieties and registers simultaneously.

4 Mean NP complexity is defined here as a numeric value ranging from 1.0 to 4.0. It is the sum of complexity values of n noun phrases divided by n. The higher the mean value, the more frequently we find ‘complex’ noun phrases, i.e. classes 2–4.
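The definition of mean NP complexity given above can be made concrete with the class frequencies from Table 3. The following sketch simply weights each class number by its frequency; the dictionary layout and function names are illustrative.

```python
# Class frequencies per register, taken from Table 3 (classes 1 to 4).
TABLE_3 = {
    "conversation":        [1559, 209, 159, 73],
    "unscripted speeches": [1291, 249, 300, 160],
    "social letters":      [1402, 249, 209, 140],
    "academic writing":    [832, 334, 466, 368],
}


def mean_complexity(class_counts):
    """Sum of complexity values of all NPs divided by the number of NPs."""
    total = sum(class_counts)
    weighted = sum(cls * n for cls, n in enumerate(class_counts, start=1))
    return weighted / total


def complex_share(class_counts):
    """Proportion of NPs belonging to classes 2-4."""
    return 1 - class_counts[0] / sum(class_counts)


means = {register: mean_complexity(counts) for register, counts in TABLE_3.items()}
```

Computed this way, the unrounded means are 1.373 for conversation, 1.5435 for social letters, 1.6645 for unscripted speeches and 2.185 for academic writing, which agree with the figures cited in the text up to rounding.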


Table 5: NP complexity in conversation across all varieties

          CAN   HK    IND   JA    SIN
Class 1   314   298   303   312   332
Class 2    36    59    43    27    44
Class 3    33    31    34    47    14
Class 4    17    12    20    14    10

Table 6: NP complexity in unscripted speeches across all varieties

          CAN   HK    IND   JA    SIN
Class 1   286   234   263   233   275
Class 2    33    78    41    45    52
Class 3    56    51    67    82    44
Class 4    25    37    29    40    29

Table 7: NP complexity in social letters across all varieties

          CAN   HK    IND   JA    SIN
Class 1   275   293   258   301   275
Class 2    55    47    54    36    57
Class 3    31    41    48    37    52
Class 4    39    19    40    26    16

Table 8: NP complexity in academic writing across all varieties

          CAN   HK    IND   JA    SIN
Class 1   159   164   177   151   181
Class 2    69    67    78    65    55
Class 3    90    96    78   101   101
Class 4    82    73    67    83    63

Tables 5 to 8 show the distribution of the complexity classes for each individual register across all varieties. In the following sections, the registers are discussed separately.




3.1 Academic writing

Academic writing yields the highest frequency of complex NPs across all classes (2–4). This is expected, as academic writing is characterised by dense information packaging (due to its informational communicative purpose) and carefully planned and revised production, both of which facilitate the use of complex NPs. In academic writing, NPs contain elaborate pre- and postmodification, and they typically contain the majority of lexical content of a sentence. Examples (2) to (6) illustrate typical uses of noun phrases in academic writing (NPs are emphasised in bold).

(2) The left side of Ayearst’s diptych reproduces in painstaking detail, and with close attention to seventeenth-century techniques of glazing, Rembrandt’s fragmentary Anatomy Lesson of Dr Joan Deijman of 1656, now in the Rijksmuseum, Amsterdam. (ICE-CAN:W2A-001#10:1)
(3) The integration of these two perspectives can form a more comprehensive picture of the person of Jesus Christ. (ICE-HK:W2A-005#14:1)
(4) The whole misunderstanding about Hume’s philosophical position is the outcome of his treatment of causation that is often misunderstood. (ICE-IND:W2A-001#58:1)
(5) The casual centrality of the ‘supernatural’ in Brodber’s fiction is also an excellent example of the writer’s adaptation of marginalised thematic concepts from the oral tradition which she legitimises in the very process of ‘writing them up’. (ICE-JA:W2A-005#X14:1)
(6) Though Wittgenstein was mainly concerned with the problem of philosophical explanation, his writings on the relation between language and thought and language and meaning have tremendous implications for both the theory and practice of linguistic science. (ICE-SIN:W2A-005#48:1)

Analogously, academic writing has the lowest frequency of class-1 (or ‘simple’) NPs in our sample (832 tokens or 41.6 %). The relatively low frequency of unmodified noun phrases can likewise be accounted for by the informational character of the register: unmodified noun phrases carry less information than modified ones. Personal pronouns are particularly uncommon: only 225 tokens (11 % of all NPs in academic writing) are realised by personal pronouns, the most frequent being it (61 tokens) and I (33 tokens). 1st and 2nd person pronouns are rare, which can be attributed to the fact that interaction in academic texts is uncommon. The 2nd person pronoun you is particularly rare, since no specific addressee is involved.

With regard to regional variation, I find that academic writing is largely homogeneous across varieties. Few differences appear to exist with regard to pronouns, although two exceptions are worth a brief discussion here. The first person singular pronoun I occurs more frequently in some varieties (Hong Kong English: 15; Jamaican English: 10) than in others (Canadian English: 2; Indian English: 4; Singapore English: 2). However, it would be premature to attribute a more personal writing style to the Hong Kong and Jamaican English varieties based on such low absolute frequencies. Secondly, looking at the frequencies of you, it is noteworthy that the sample contains six occurrences in Singapore English, while the remaining varieties have zero occurrences. A closer look at the data reveals that all occurrences of you in Singapore English originate from one text unit, which is not an academic text in the traditional sense, but instead could best be described as a guide to real estate investment in Singapore. This text unit is characterised by a much more interactive style of writing; it frequently addresses the reader directly and makes use of imperatives, e.g. Take advantage of this law (ICE-SIN:W2A-001#48:1), or Invest your CPF savings in property (ICE-SIN:W2A-001#49:1). Whether such a text constitutes an instance of academic writing, much less in the humanities, is debatable. Nevertheless, the text could be clearly distinguished from other texts of the same register on the basis of one grammatical feature.

There are slight indications of regional variation in the distribution of the complex NP classes, for instance the relative overuse of class 2 and underuse of class 3 in Indian English. Overall, however, there appears to be little variation in academic writing across varieties. This can be interpreted in two ways: one, there is no discernible difference between regional varieties for this register. An argument in favour of this interpretation would be that the homogeneity of the register, and by extension its conformity on an international level, is guaranteed by the publication process.
A second interpretation is that the level of abstraction in categorising NP complexity, as it is used in this analysis, is too superficial to bring to light any discernible differences; in other words, although there may be no differences across regional varieties on the superficial level of abstraction assumed here, significant distributional differences might be observed when, for instance, specifying the types of modification involved. At this point, however, we have to conclude that we cannot find regional variation with regard to NP complexity in academic writing.

3.2 Conversation

Conversation has the highest frequency of simple noun phrases of all registers in the study (78 %). This is in line with Biber et al., who find that ca. 85 % of all NPs in their conversation data have no modifier (Biber et al. 1999: 578). Of the class-1 NPs in conversation, more than half are personal pronouns (857 tokens, or 55 %). This also confirms Biber et al.’s finding that “pronouns are slightly more common than nouns in conversation” (Biber et al. 1999: 235). The relatively frequent reliance on pronouns is due to the “shared situation and personal involvement of the participants” (Biber et al. 1999: 235).

Class-2 NPs are the most common type of modified noun phrase in conversation. They account for 10 % of the NPs. With regard to premodification, Biber et al. find that the vast majority of premodification sequences in noun phrases does not exceed two words (Biber et al. 1999: 597). This is confirmed in the present analysis: the average length (in orthographic words) of class-2 NPs in conversation is 3.2 (including head and any determiners). This means that premodification amounts to 1–2 words on average. The most common type of premodification is by adjective or noun, optionally including a determiner, as the examples below illustrate.

(7) Uhm because David does say that hiking boots make an enormous difference not slide on anything (ICE-CAN:S1A-001#3:1:A)
(8) Sometimes uhm the people uh sorry people of India they are they belong to different communities and they have their separate cultures (ICE-IND:S1A-005#62:1:B)

Longer class-2 NPs (>3 words) are uncommon and usually the result of correction or coordination, as can be seen in examples (9) and (10) below. Proper cases of multiple premodification, as in examples (11) and (12), are rare. This is because the real-time analysis of longer premodification sequences places a heavy cognitive burden on the listener, rendering spoken communication ineffective.5

(9) I know because I I can’t talk to an answering machine telephone answering machine three-words (ICE-HK:S1A-009#4:1:D)
(10) nine hundred but on average about four hundred five hundred dollars both lah the reception and the sanctuary (ICE-SIN:S1A-001#33:1:A)
(11) A very bright cheerful smiling face (ICE-IND:S1A-001#108:1:A)
(12) We are entirely functional loving human beings (ICE-CAN:S1A-009#54:1:B)

Postmodified NPs (class 3) are relatively uncommon in conversation (8 %). Postmodification tends to be longer than premodification. The mean length of class-3 NPs is 7.1 words (as compared to 3.2 for class-2 NPs). This value is relativised to some extent when looking at the median, which is 5. Subtracting head and optional determiner, this means that the length of postmodification averages between 3 and 4 words. The higher mean value (7.1) is caused by rare instances of complex postmodification, as in examples (13) and (14).

(13) Uhh I remember my friend Mendela that beautiful millionaire meatpacker from Saskatoon who was so nice to me when I was a young man […] (ICE-CAN:S1A-009#85:1:A)
(14) Naturally if Mitterand President Mitterand [sic] can run his government for a period of ten years uh why India cannot have a government consisting of some party national party national party representing the national capital or some progressive elements in some some political parties like Congress-I Congress-S or even Janata Dal with some radical members belonging to communist party or socialist party (ICE-IND:S1A-005#19:1:A)

5 See Quirk et al. (1985: 1039): “Considerable left-branching is possible in the noun phrase, […] although comprehension becomes more difficult as the complexity of left-branching increases”.

Finally, class-4 NPs are extremely rare in conversation, accounting for only 4 % of all noun phrases in the data. The most frequent type is a combination of a one-word (nominal or adjectival) premodification plus postmodification by a short prepositional phrase (usually with of), as the following examples illustrate:

(15) But what is after the road No the other side of the road (ICE-SIN:S1A-001#88:1:B)
(16) I said I behave as if this might be the last day of my life […] (ICE-CAN:S1A-009#88:1:A)
(17) […] and you would have seen a different spin to the thing (ICE-JA:S1A-009#X67:1:A)

Orthographically longer class-4 NPs are often the result of multiple coordination or performance phenomena, including repetitions, repairs and hesitations. Example (18) is a coordinated list of postmodified NPs, which contains several repairs and repetitions as well as a hesitation marker (uh).

(18) Political exchange tourist exchange tourist exchange or scholars exchange of scholars or exchange of technocrats (ICE-IND:S1A-005#37:1:A)6

Comparing the frequencies of the conversation data across varieties, we observe distributional differences, which are mainly the result of individual varieties over- or underusing certain complexity classes. We can pinpoint a) a relative overuse of class-2 NPs in Hong Kong English, b) an underuse of class 2 in Jamaican English, c) an overuse of class 3 in Jamaican English, and d) an underuse of class 3 in Singapore English. Looking at the data, however, it is difficult to identify a pattern which explains the over- or underuse (see discussion in Section 4).

6 The example in (18) is assigned the complexity value 4, as it is a coordinated (multi-head) construction (see Section 2.1). A ‘cleaned-up’ version of the noun phrase could be political exchange, tourist exchange or exchange of scholars or exchange of technocrats.




3.3 Unscripted speeches

Unscripted speeches are characterised by their spoken mode, a spontaneous, conversation-like production situation and the informational and/or persuasive communicative purpose. With regard to NP complexity, unscripted speeches rank between conversation and academic writing. While NP complexity is expected to be high due to the register’s informational communicative purpose, it is expected to be low because it is unscripted and spoken. The result is an intermediate level of NP complexity with slightly higher frequencies in the three complex noun phrase types, as compared to conversation.

Unscripted speeches have the third-highest frequency of class-1 NPs in the sample (1,291 tokens or 65 %). Personal pronouns constitute about half of the class-1 NPs (683 tokens or 53 %). The most frequent personal pronouns are I (162), you (126) and it (109). The reliance on personal pronouns can be related to the setting, since speeches usually take place in public in front of an audience and speakers use personal pronouns to create an impression of interaction between themselves and the audience. Furthermore, speeches frequently have the purpose of persuading the audience, which is facilitated by direct references, such as I and you. Examples (19) and (20) illustrate the kind of direct addressing typically found in speeches.

(19) Okay don’t think that they’re going to give you time okay after your job interview Don’t think they’re going to take care of you in a very big way okay (ICE-CAN:S2A-021#29–30:1:A)
(20) You have to vote more opposition strong opposition not only to establish opposition in parliament Make opposition part of our political culture not only that but also an effective an effective hammer over the head of PAP If you don’t do that what will happen You can bet your last dollar after this election prices will sure to go up (ICE-SIN:S2A-021#34–37:1:A)

With regard to complex noun phrases, unscripted speeches have the second-highest overall frequency in the sample (35 %). This is due to the informational communicative purpose of speeches, which necessitates the use of modified noun phrases to convey information. The overall level of NP complexity is higher in speeches than in social letters, despite the latter being written. In direct comparison, unscripted speeches and social letters make equally frequent use of premodification, while in classes 3 and 4, unscripted speeches surpass social letters. As with the rarity of long premodification sequences in conversation, the stronger reliance on postmodification than on premodification in unscripted speeches can be explained on the basis of easier comprehensibility of right-branching (see Quirk et al. 1985: 1039).

266 

 Steffen Schaub

Comparing the results across varieties, the following observations are noteworthy: assuming an even distribution, the frequency of premodified NPs (class 2) is relatively low in Canadian English (33 tokens) and high in Hong Kong English (78 tokens). Furthermore, postmodified NPs (class 3) are relatively frequent in Jamaican English (82 tokens), but infrequent in Singapore English (44).
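The notion of relative over- and underuse under an even distribution can be made explicit as the ratio of an observed frequency to the mean frequency of that class across the five varieties. The sketch below applies this to classes 2 and 3 in unscripted speeches (Table 6). The ratio measure itself is an assumption made for illustration; the chapter does not specify a formula.

```python
VARIETIES = ("CAN", "HK", "IND", "JA", "SIN")

# Class 2 and class 3 frequencies in unscripted speeches (Table 6).
UNSCRIPTED = {
    2: (33, 78, 41, 45, 52),
    3: (56, 51, 67, 82, 44),
}


def usage_ratios(counts):
    """Observed count divided by the count expected under an even distribution."""
    expected = sum(counts) / len(counts)
    return {variety: n / expected for variety, n in zip(VARIETIES, counts)}


ratios = {cls: usage_ratios(counts) for cls, counts in UNSCRIPTED.items()}
```

On this measure, a value above 1 indicates relative overuse and a value below 1 relative underuse. For class 2, Hong Kong English comes out at roughly 1.57 and Canadian English at roughly 0.66; for class 3, Jamaican English reaches roughly 1.37 and Singapore English roughly 0.73.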

3.4 Social letters

Class-1 NPs are by far the most frequent noun phrase class in social letters, constituting between 65 % and 75 % of all NPs in each 400-NP variety sample. Personal pronouns form the majority of class-1 NPs (ranging from 52 % to 61 % across varieties). This can be attributed to the interactional character of social letters, which mainly rests on the frequent use of I and you. The frequencies of class-2 and class-3 NPs are relatively balanced, with a slight preference for class 2. Class 4 is the least frequent noun phrase type in this register across all varieties, with the exception of Canadian English. Constructions in this category show a range of variation. A typical kind of class-4 construction is the multi-head NP coordinated with and or or. Class-4 NPs which are not coordinated are often nouns premodified by one adjective or noun and postmodified by a prepositional phrase, as in the examples (21) to (23). Complex noun phrases in social letters are very similar to those found in conversation and form a contrast to the lexically heavy class-4 NPs found in academic writing.

(21) I hope that I will be able to come to Kolhapur in the first week of Jan. (ICE-IND:W1B-002#47:1)
(22) My point is that if one can love the other person without calculate what one can get back from the relationship, this will be the greatest love of all. (ICE-HK:W1B-001#144:5)
(23) The team is still waiting for a final reply from the administration of this university but I’m not optimistic. (ICE-SIN:W1B-001#148:2)

More complex examples are rare in social letters. Long, heavily modified noun phrases clearly originate from letters with an academic background, as example (24) illustrates.

(24) I would need a formal invitation from you for collaboration with specific reference to the project & [sic] that it would not involve financial liabilities for the University. (ICE-IND:W1B-005#7:1)

In general, the register category of ‘social letters’ in ICE contains heterogeneous content, with some letters discussing everyday activities (e.g. basketball practice, reports from an exchange year) and others clearly coming from an academic context (correspondence between students and professors). NP complexity is higher in the latter. It remains debatable whether one text category should include both subtypes. Comparing NP complexity across varieties, there is relative underuse of premodified NPs in Jamaican English, overuse of postmodified NPs in Singapore English, and overuse of pre- and postmodified NPs in Canadian English and Indian English.

4 NP complexity across varieties

In this section, I review the potential of NP complexity as a marker of variation across regional varieties of English. As discussed in the introduction, the field is currently in the process of shifting from studies of regional nation-state varieties as holistic entities towards acknowledging register variation. Any comparative study of varieties of English, it is argued, must take register into account. The inclusion of register leads to a more discriminating picture of structural preferences in regional varieties of English. Such preferences may occur in a) one or more varieties and one specific register, b) one or more varieties and several registers (with shared situational characteristics), and c) one or more varieties as a whole (i.e. in all registers). The preceding sections already isolated the first, namely variety-plus-register-specific preferences, such as the relative overuse of premodified NPs in Hong Kong conversational data. Regarding the second (registers that share one situational characteristic), we can identify a number of variety-specific tendencies. Again assuming even distribution within NP classes across varieties, the following tendencies can be observed:
– relative overuse of premodified (only) NPs in spoken Hong Kong English
– relative overuse of postmodified (only) NPs in spoken Jamaican English
– relative underuse of postmodified (only) NPs in spoken Singapore English
– relative underuse of premodified (only) NPs in interactional Jamaican English.

These preferences can be matched with descriptions of varieties of English: for instance, the relative overuse of premodification in spoken Hong Kong English is in line with the description in Setter, Wong and Chan (2010: 61). Although this approach enables us to isolate structural preferences of NP complexity for particular varieties and register situations, these tendencies do not take into account other factors influencing NP complexity, such as syntactic function, and have to be interpreted with caution.

The explanation most commonly offered for the emergence of structural innovations in varieties of English (in particular, postcolonial or New Englishes) is language contact (also called ‘transfer’ or ‘cross-linguistic influence’). Gut (2011: 105) points out that “as yet there exists no reliable method of quantifying the relative contribution of cross-linguistic influence on any structure produced by language learners”. This is especially true for NP modification patterns, which are strongly influenced by other factors, such as register and syntactic function. In addition to that, NP modification patterns can only be identified in the form of (statistical) preferences and are thus not directly identifiable as the result of contact-induced change (unlike, for instance, loanwords). The approach in the present study is suitable for detecting candidates for such structural tendencies. However, more factors need to be included and weighed against each other in order to confirm these preferences (see Schilk and Schaub forthc.).
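The matching of variety-specific tendencies across registers that share a situational characteristic, as described earlier in this section, can be sketched as follows. The frequencies are those of classes 2 and 3 in Tables 5 to 8. The cut-off values (a ratio of at least 1.25 for overuse and at most 0.80 for underuse, required in both registers of a pair) are purely illustrative assumptions, since the chapter states no thresholds.

```python
VARIETIES = ("CAN", "HK", "IND", "JA", "SIN")

# Frequencies of class 2 (premodified only) and class 3 (postmodified only)
# per register, in the variety order given above (Tables 5 to 8).
COUNTS = {
    "conversation":        {2: (36, 59, 43, 27, 44), 3: (33, 31, 34, 47, 14)},
    "unscripted speeches": {2: (33, 78, 41, 45, 52), 3: (56, 51, 67, 82, 44)},
    "social letters":      {2: (55, 47, 54, 36, 57), 3: (31, 41, 48, 37, 52)},
    "academic writing":    {2: (69, 67, 78, 65, 55), 3: (90, 96, 78, 101, 101)},
}

# Register pairs sharing one situational characteristic (see Table 1).
GROUPS = {
    "spoken":        ("conversation", "unscripted speeches"),
    "written":       ("social letters", "academic writing"),
    "interactional": ("conversation", "social letters"),
    "informational": ("unscripted speeches", "academic writing"),
}

OVER, UNDER = 1.25, 0.80  # illustrative thresholds, not from the chapter


def ratio(register, cls, index):
    """Observed count relative to an even distribution across varieties."""
    counts = COUNTS[register][cls]
    return counts[index] * len(counts) / sum(counts)


def shared_tendencies():
    """Flag variety/class pairs that deviate the same way in both registers."""
    found = []
    for group, registers in GROUPS.items():
        for cls in (2, 3):
            for index, variety in enumerate(VARIETIES):
                ratios = [ratio(register, cls, index) for register in registers]
                if all(r >= OVER for r in ratios):
                    found.append((group, variety, cls, "overuse"))
                elif all(r <= UNDER for r in ratios):
                    found.append((group, variety, cls, "underuse"))
    return found
```

With these particular thresholds, the sketch flags the same four tendencies listed above. Other cut-off values would yield a different set of candidates, which underlines that such tendencies have to be interpreted with caution.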

5 Conclusion and outlook

This study systematically compared NP complexity in a selection of registers and across a range of regional varieties of English. The results, based on data from five varieties of English, corroborate the strong connection between NP complexity and register. Across all regional varieties, NP complexity correlates with two situational register characteristics:
– communicative purpose: NP complexity increases from interactional to informational registers, and
– mode: NP complexity increases from real-time and spoken to planned and written registers.

Overall, NP complexity is largely homogeneous within registers across the regional varieties. Consistency is higher in registers which have stricter codification (e.g. academic writing). Nevertheless, assuming even distribution across all varieties, it is possible to isolate individual varieties which show relative over- or underuse of particular NP structures. Furthermore, it is possible to match such preferences for pairs of registers that share situational characteristics. NP complexity has already been established as a register marker, and, it is argued here, is a viable marker of regional variation on the register level.



The influence of register on noun phrase complexity in varieties of English 


There are numerous ways in which subsequent research can improve on the study presented here. First, the database of noun phrases has to be extended to provide a more solid empirical foundation. Second, by adding further annotation to the data, such as syntactic function, type of head noun, and type of modification, more fine-grained statements about differences in NP complexity across varieties are possible. This study has also shown that random selection of text units from the International Corpus of English for the purposes of a register analysis is not desirable. The texts included in some of the register categories in ICE are too heterogeneous. Instead, text units have to be carefully selected in order to ensure compatibility across varieties. Finally, any variety-specific structural preferences have to be matched against the typological inventory found in the substrate languages. Only then is it possible to draw any connections to the possible origin of such preferences, and to substantiate claims about structural transfer.

6 References

Aarts, Flor G. A. M. 1971. On the distribution of noun-phrase types in English clause-structure. Lingua 26. 281–293.
Ahulu, Samuel. 1998. Grammatical variation in international English. English Today: The International Review of the English Language 14(4). 19–25.
Asante, Mabel Yeboah. 1995. Ghanaian English: Motivation for divergence from the standard in certain grammatical categories. Tübingen: Eberhard Karls University Tübingen dissertation.
Asante, Mabel Yeboah. 2012. Variation in subject-verb concord in Ghanaian English. World Englishes 31(2). 208–225.
Balasubramanian, Chandrika. 2009. Register variation in Indian English. Amsterdam: John Benjamins.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: CUP.
Biber, Douglas, Stig Johansson, Geoffrey N. Leech, Susan Conrad & Edward Finegan. 1999. Longman grammar of spoken and written English. 9th impr. (2011). Harlow: Longman.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: CUP.
Blair, David & Peter Collins. 2001. English in Australia. Amsterdam: John Benjamins.
Brunner, Thomas. 2014. Structural nativization, typology and complexity: Noun phrase structures in British, Kenyan and Singaporean English. English Language and Linguistics 18. 23–48.
Fludernik, Monika & Bernd Kortmann (eds.). 2012. Proceedings: Anglistentag 2011 Freiburg. Trier: Wissenschaftlicher Verlag Trier.
Gut, Ulrike. 2011. Studying structural innovations in new English varieties. In Joybrato Mukherjee & Marianne Hundt (eds.), Exploring second-language varieties of English and learner Englishes: Bridging a paradigm gap (Studies in corpus linguistics 44), 101–124. Amsterdam: John Benjamins.


Haan, Pieter de. 1993. Noun phrase structure as an indication of text variety. In Andreas H. Jucker (ed.), The noun phrase in English: Its structure and variability, 85–106. Heidelberg: Winter.
Hall, Christopher J., Daniel Schmidtke & Jamie Vickers. 2013. Countability in World Englishes. World Englishes 32(1). 1–22.
Halliday, Michael A. K. 1989. Spoken and written language. 2nd edn. Oxford: OUP.
Jucker, Andreas H. 1992. Social stylistics: Syntactic variation in British newspapers (Topics in English Linguistics 6). Berlin: Mouton de Gruyter.
Jucker, Andreas H. (ed.). 1993. The noun phrase in English: Its structure and variability. Heidelberg: Winter.
Kortmann, Bernd & Kerstin Lunkenheimer (eds.). 2013. The electronic world atlas of varieties of English. Leipzig: Max Planck Institute for Evolutionary Anthropology. http://ewave-atlas.org (accessed 28 February 2015).
Lamidi, Mufutau T. 2007. The noun phrase structure in Nigerian English. Studia Anglica Posnaniensia: An International Review of English Studies 43. 237–250.
Mukherjee, Joybrato & Marianne Hundt (eds.). 2011. Exploring second-language varieties of English and learner Englishes: Bridging a paradigm gap (Studies in corpus linguistics 44). Amsterdam: John Benjamins.
Neumann, Stella. 2012. Applying register analysis to varieties of English. In Monika Fludernik & Bernd Kortmann (eds.), Proceedings: Anglistentag 2011 Freiburg, 75–94. Trier: Wissenschaftlicher Verlag Trier.
Platt, John, Heidi Weber & Mian Lian Ho. 1984. The new Englishes. London: Routledge and Kegan Paul.
Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech & Jan Svartvik. 1985. A comprehensive grammar of the English language. 4th edn. London: Longman.
Sand, Andrea. 2004. Shared morpho-syntactic features in contact varieties of English: Article use. World Englishes 23(2). 281–298.
Sand, Andrea. forthc. Angloversals? Shared morpho-syntactic features in contact varieties of English. Amsterdam: John Benjamins.
Schäpers, Uta Katharina Elisabeth. 2009. Nominal versus clausal complexity in spoken and written English: Theory and description (English Corpus Linguistics 8). Frankfurt: Peter Lang.
Schilk, Marco & Steffen Schaub. forthc. Noun phrase complexity across varieties of English: Focus on syntactic function and text type. English World-Wide 37(1).
Setter, Jane, Cathy Wong & Brian Chan. 2010. Hong Kong English. Edinburgh: Edinburgh UP.
Wahid, Ridwan. 2013. Definite article usage across varieties of English. World Englishes 32(1). 23–41.
Xiao, Richard. 2009. Multidimensional analysis and the study of world Englishes. World Englishes 28(4). 421–450.

Valentin Werner

Real-time online text commentaries: A cross-cultural perspective

Abstract: In the area of electronically-mediated communication, real-time online text commentaries (OTCs) as a new specialised register have become popular as an alternative to traditional broadcasting. OTCs have been recognised as “mediated quasi-interaction” (Chovanec 2010) and a hybrid genre showing characteristics of spoken discourse within a written mode (Jucker 2006), as well as a characteristic combination of simultaneous information and entertainment (“infotainment”), where familiarity or “pseudo-intimacy” (O’Keeffe 2006; cf. Chovanec 2008) between commentator and the audience is created. This contribution helps to situate this emerging register from a cross-cultural perspective. I use OTCs by English and German media outlets from the EURO 2012 football championship to tackle the following issues with the help of a corpus-linguistic approach: (i) What are register-specific structural features of OTCs? (ii) Are there any culture-specific aspects along language boundaries or the dimension “intended readership”? I also consider the interaction of layout and content, production circumstances, and the influence of recent developments (such as the incorporation of Twitter messages) on reporting styles.

0 Introduction

Real-time online text commentaries (henceforth OTCs)¹ have become more and more popular² and represent an alternative to traditional live TV and radio broadcasting, reporting and commenting on live events controlled for duration, location and topic (cf. Siever 2011: 171), in particular major sports events. As the name implies, they are usually categorised as a written form of web communication (cf. Biber and Egbert, this volume) and are similar to (we)blogs in that they consist of individual consecutive postings (cf. Grieve et al. 2010: 303). While previous research has recognised the narrative properties and analysed the vocabulary and morphosyntax of football reportage in general (Brandt and Quentin 1983; Ghadessy 1988; Hennig 2000; Krone 2005; Müller 2007; Levin 2008), others have noted that OTCs as “mediated quasi-interaction” (Fairclough 1995: 40) constitute a hybrid register: They show characteristics of spoken discourse within a written mode (Jucker 2006; cf. Lakeberg, this volume) and are an interesting combination of simultaneous information and entertainment (“infotainment”). Thus, familiarity or “pseudo-intimacy” (O’Keeffe 2006: 92; cf. Chovanec 2008, 2010; Jucker 2010) between the commentator and the audience is created.

Two further issues are important for establishing OTCs as a register, defined (following Biber 1988) as language variety by situational (i.e. non-linguistic) characteristics (see also Schubert, this volume). First, “situational context tends to exert functional pressures on linguistic output” (Grieve et al. 2010: 315), which implies there should be common linguistic features traceable across different OTCs, particularly if they report on the same matches. Second, there is the contrastive view. It was hypothesised, albeit for other types of football reportage, that “[t]ypological differences between […] two languages are expected to be neutralised to a certain degree” (Krone 2005: 51) when texts from two languages fulfil the same function (in our case, football match reportage). Others (e.g. Müller 2007: 44), however, have emphasised that cultural differences may lead to noticeable stylistic differences.

Starting from these observations, this paper will address the following aspects with the help of a corpus-linguistic approach:
(i) What are register-specific structural features of OTCs on different levels of linguistic analysis?
(ii) Are there any culture-specific aspects along language boundaries or the dimension “intended readership”; or do OTCs rather form a relatively uniform cross-linguistic/cross-cultural register?

Further aspects addressed are the interaction of the layout of OTCs with their content as well as the influence of very recent developments (such as the incorporation of Twitter messages) on the style of reporting. After a few notes on data and methodology, the present study first sets out to locate OTCs as a register in general terms in Section 2. Section 3 provides an analysis of the language of OTCs, focusing on vocabulary and collocations and related semantic aspects, discourse features and potential implications of the interaction of format and textual commentary. A discussion of OTCs as a cross-cultural register follows in Section 4, while Section 5 sums up the results and presents some generalisations as well as avenues of further research.

1 Alternative labels are live text commentary (LTC), live blogging, live ticker, news ticker and the more sport-specific minute-by-minute report (MBM) or live match tracker.
2 According to a recent survey, OTCs have become “the default format for covering major breaking news stories, sports events, and scheduled entertainment news”, even surpassing online articles and picture galleries in popularity (Thurman and Walters 2013: 82; cf. also Wells 2011). The growing importance of the format is revealed both by the sheer number of OTCs (almost 150 per month for The Guardian) and also in terms of page view counts, which are at least twice as high for OTCs compared to articles and galleries. User reports seem to confirm that OTCs are the online format par excellence to track stretches of live events, as more than 35 % of respondents follow OTCs continuously. Nearly two fifths of all OTCs are sports-related (Thurman and Walters 2013: 82–95). Data for Der Spiegel are in line with these general findings as OTC football reportage receives more than 1 million clicks per match (See , accessed 20 April 2013).

Valentin Werner, University of Bamberg

1 Data and methodology

While previous research on OTCs has been dedicated almost exclusively to football reportage from The Guardian (Chovanec 2008, 2009, 2010, 2011; Perez-Sabater et al. 2008; but cf. Jucker 2006, 2010), the present analysis is based on OTCs of two English (The Sun, henceforth SUN; The Guardian, henceforth GUAR) and two German (Bild, henceforth BILD; Der Spiegel (online), henceforth SPON) media outlets, all stemming from the coverage of the UEFA 2012 EURO Championship Finals. This facilitates the comparison of the reportage along language boundaries as well as along the dimension of intended readership.

The print versions of both SUN and BILD can be categorised as tabloids, predominantly aimed at a working-class readership (see also Höke 2007). Another common feature is their circulation, with approximately 2.4 million (SUN) and 2.5 million (BILD) copies sold on a daily basis, making them the most popular papers in England and Germany respectively. In contrast, GUAR and SPON can be viewed as quality-press products, primarily catering to a middle-class readership. Their circulation, with 0.2 million daily (GUAR) and 1.0 million weekly (SPON), is less extensive. The online versions of all four sources are amongst the news sites visited most often nationwide (Press Gazette 2013; see also ). For the present analysis it is presumed that the intended readership of the online version roughly corresponds to the intended readership of the printed version (cf. Newsworks 2013a, 2013b).


To have a comparable dataset, the corpus includes a total of 36 match reports for the English and the German squads (amounting to a token count of 120,414 words; see the appendix for a detailed list). In the first instance, the main focus of the structural analysis lies on the English OTCs, which are organised in a linear way, usually in reverse order and time-stamped (see further Section 2). For the extraction of the data, the running text chunks with the corresponding time stamps were manually copied and saved as plain text files in order to exclude unwanted meta-data and to make them machine-readable. Subsequently, these text files were loaded into Wmatrix (Rayson 2008; see ). This online annotation tool provides automatic part-of-speech tagging with the CLAWS 7 tagset () as well as semantic annotation with USAS (). In addition, it offers various concordancing, wordlist and keyword functions. For the analysis of n-grams, for keyword analyses and for concordance searches, AntConc 3.3.5w () was used both for the English and the German data. In addition, the webpages containing the OTCs were saved in order to access paratextual features such as Twitter feeds, tables, graphs or integrated videos and to assess their potential influence on the main text. The rationale behind including this data is the growing trend in linguistics that “[e]ver more phenomena that would previously have been termed paralinguistic, in the sense of accompanying but only weakly influencing linguistic form and expression, are now being moved into the center of concern” (Bateman 2012: 3990). Therefore, the present corpus can be seen as multimodal.
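The n-gram step of this workflow can be illustrated with a short script. This is only a sketch under stated assumptions and does not reproduce what AntConc or Wmatrix do internally: it assumes that each post sits on one line of the plain text file and begins with a minute stamp such as 12' (the real file layout is not documented here), and it uses a deliberately crude tokenizer. The names `parse_posts`, `tokenize` and `ngram_counts` are hypothetical, and the second sample line is invented.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed post format: a minute stamp such as "12'" followed by the post text.
POST_RE = re.compile(r"^(\d+)['’]\s+(.*)$")


def parse_posts(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (minute, text) pairs for lines that match the assumed format."""
    posts = []
    for line in lines:
        match = POST_RE.match(line.strip())
        if match:
            posts.append((int(match.group(1)), match.group(2)))
    return posts


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep letter sequences (with inner ' or -)."""
    return re.findall(r"[^\W\d_]+(?:['’-][^\W\d_]+)*", text.lower())


def ngram_counts(posts: List[Tuple[int, str]], n: int = 2) -> Counter:
    """Count n-grams within each post, never across post boundaries."""
    counts: Counter = Counter()
    for _, text in posts:
        tokens = tokenize(text)
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


sample = [
    # First line: a post quoted in this chapter (ger_den_1706_sun).
    "1’ KICK OFF Germany, in their all-white kit, start the game kicking from right to left",
    # Second line: invented for illustration.
    "3' Germany start the game well",
]

posts = parse_posts(sample)
bigrams = ngram_counts(posts, n=2)
for gram, freq in bigrams.most_common(3):
    print(" ".join(gram), freq)
```

Counting within posts rather than across them keeps n-grams from spanning two unrelated postings, which seems a sensible default for time-stamped commentary, although a dedicated concordancer may handle boundaries differently.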

2 OTCs as a register

2.1 Electronically-mediated communication and sports reportage

Broadly speaking, in the scarce amount of work available to date, the style of football reportage has been described as resembling conversation (cf. Ferguson 1983: 156–157), but some have highlighted its monologic quality, emphasising its narrative properties where the commentator acts as mediator and filter (Brandt and Quentin 1983: 21; Hennig 2000: 44). A comparison of OTCs and traditional types of live reportage in terms of a summary overview of results from previous analyses (Perez-Sabater et al. 2008; Chovanec 2008, 2010; Jucker 2010; Thurman and Walters 2013) yields the picture displayed in Table 1.



Table 1: Comparison of traditional registers of sports/football live reportage with OTCs

STRUCTURAL FEATURES / LINGUISTIC FEATURES | Radio | TV | OTC
Event-related versus non-event-related sections | | |
Unscripted | | |
Channels (visual/aural/textual) | ✗/✓/✗ | ✓/✓/(✓) | (✓)/✗/✓
Temporal limitation | | |
Narrative style | | |
Monologic structure (one-to-many) | | | (✓)
Orality/informality/casual tone | | | (✓)
Jargon/slang/idioms | | |
Formulaic language | | |
Ellipsis | | |


Table 1 shows the shared characteristics of both OTCs and traditional live reportage as events in mass communication, while humour is another broad communication strategy characteristically used in all types. Owing to the features listed above, sports reportage generally has been described as some kind of “entertainment” genre, even though its primary function arguably is to report factual content (Brandt and Quentin 1983: 20; Chovanec 2011: 253–254). However, a number of differences emerge on account of the channel of distribution (web page mainly with textual content + interactive elements) and of the particular properties of electronic communication (e.g. the staging of familiarity,³ see further Section 3.2 and Jucker 2010: 66). Above all, a point worth noting is the way in which the recipients consume media forms such as OTCs. They are produced fairly quickly and without many corrections as the commentator is under time pressure due to the co-extensive nature of the event described and its description (Jucker 2010: 64).⁴ Likewise, the consumption is quick and cursory, as is the case with many other electronic offerings (Dürscheid 1999: 21).

These findings suggest that there are areas of both overlap and divergence between OTCs and traditional forms of sports reportage. In addition to the aspects mentioned in the foregoing, it will be shown in the following how OTCs can be further related to the domains of sports and news reportage, but why they should be categorised as a separate, fairly institutionalised, register serving a discourse community (O’Keeffe 2006: 19, 29).

3 According to Dürscheid (1999: 23), the staging of familiarity (and the resulting “pseudo-intimacy” between participants; O’Keeffe 2006; see further Section 3.2) in written electronic communication is characterised by an apparent closeness of those involved in such a communicative situation. This is due to the immediacy of the exchanges via the electronic medium, which is supported by the use and acceptance of features typically occurring in the spoken mode.
4 Indeed, typos, interpretable as a typical feature of online production under time constraints, repeatedly occur in all of the OTCs analysed (see e.g. examples (41), (56) and (59) below).

2.2 Layout and production

The fundamental difference between commentators in traditional and in electronic media (including OTCs) is the loss of their ‘gatekeeping’ function. With the advent of internet communication, reporters are supposed to transfer, modularise and visualise information without any prioritising (Jucker 2005: 17). OTCs seem to be a nearly perfect format to achieve this, while another of the defining properties is their immediacy and speed and a particular ‘live’ atmosphere, highly valued by the online audience (Simons 2011: 180; Thurman and Walters 2013: 95). That OTCs in practice actually represent a new form of journalism can also be deduced from the fact that the task of creating the input is more often than not assigned to a freelance journalist or intern rather than to a regular editorial staff member. Economic considerations also play a role here, of course.

OTCs as a rule are composed in an editorial office in front of a TV screen and only rarely in the football stadium (Holger Müller, personal communication). In the majority of cases a single commentator is responsible for the coverage, who acts as the voice in the OTC. That means he introduces himself and refers to himself in the first person. At times, however, a person mirroring and choosing readers’ mailings for inclusion in the commentary may support the commentator. This person may also be responsible for taking care of any technical issues occurring during the reportage (Thurman and Walters 2013: 91–92; see below for other interactive elements).⁵

5 The corpus even contains a few meta-comments on technical issues during production, after the conventional layout and the technical platform apparently had been changed: Yes, yes this looks a bit different to our usual minute-by-minute reports, but rather than moan about change, why not embrace it? Or moan about it privately. I’m just a drone who’s following orders and doing what he’s told. And besides, I quite like it, because I can put in big red quotation marks… (ukr_eng_1906_guar); I do love this new headline facility… (ukr_eng_1906_guar)




Figure 1: Commentary and overview section of the SUN OTC (from swe_eng_1506_sun; , accessed 12/07/2012, 10:21)

Jucker defines OTCs as a “complex combination of visual and textual features […] giv[ing] the recipient not only a narrative account of the events so far, but also an overview of the situation at present” (2010: 59). Typically, the textual information is shown in reverse chronological order, with the most recently added post appearing at the top of the page (see Figure 1 for an example).⁶ This post-by-post (or minute-by-minute) reporting style is supposedly a fairly recent development illustrating the influence of structure on activity (O’Keeffe 2006: 31). This means that the special properties of OTCs as a form of electronically mediated communication have an impact on the style of reporting. In fact, OTCs surprisingly resemble a certain type of after-match report which appeared in printed publications as early as the 1950s (see Figure 2).

Figure 2: Excerpt from Kicker FUSSBALL-ILLUSTRIERTE (1954) adapted from Burkhardt (2010: 11)

What is new, however, are the opportunities offered by the technology to use a similar reporting style for live reportage, and the additional options the electronic medium offers. Sometimes the readers have the choice to filter the textual data to catch up quickly on the most important events in the match (i.e. goals, fouls and substitutions). Other elements that could be added (usually outside the frame or area where the main commentary appears) are links and embedded audiovisual content (Thurman and Walters 2013: 83). In football reportage in particular, the majority of OTCs offer sections, tabs or links on the score (also of simultaneous matches) and scorers, current and starting team line-up and on general statistics (shots on goal, cards, ball possession, etc.). One of the most intricate OTCs is offered by SPON, where readers can also retrieve the real-time statistics for each individual player. This OTC further includes “heatmaps” (see Figure 3) showing the positions/operating range of the individual player or of the full team on the pitch.

6 The content management system of a media outlet may allow reversing the anti-chronological order once the event has finished, so that the report appears as a kind of article readable from top to bottom. For instance, this is the case with GUAR (Thurman and Walters 2013: 92) but does not apply to the other OTCs explored in this study. Occasionally, earlier postings are corrected or altered in order to make them more readable after the description of the actual event (e.g. during half-time breaks or before the order is reversed; Simons 2011: 181). Thus, OTCs are a register that is both dynamic and static (Chovanec 2010: 239).

Figure 3: Heatmap of the English team (left) and Italian central midfielder Andrea Pirlo (right) in SPON (from ita_eng_2806_spon; , accessed 02/07/2012, 10:28)

The presence of all of these elements appears to suggest a secondary importance of the textual data of the commentary (cf. Jucker 2005: 17). Actually, the paratextual elements are also mainly textual (that is, they encode information orthographically) and present factual information. This might determine the style and content of the commentary, as factual information is constrained to the paratextual elements (Perez-Sabater et al. 2008: 251; cf. Bateman 2012: 3985). Occasionally (this mainly applies to GUAR), these additional elements are used for mere entertainment purposes without any direct relation to the event described (Thurman and Walters 2013: 85). In any case, it is necessary to consider the combination and interplay between these two categories in a linguistic analysis (see Section 3.3 below). Generally speaking, we can describe OTCs as examples of mash-ups of different journalistic styles (reporting, commenting, glossing; cf. Simons 2011: 179).

Turning to the common layout of the commentary, we can establish the following simplified scheme (in chronological order), abstracted from the four OTC types investigated:

Table 2: OTC phases and their typical content

Phase | Typical content
“Appetiser” (published a few days or hours in advance) | Statements on the relevance of the match
Preamble/preview | Self-introduction of the commentator, welcoming the readers, match-related interview passages
Background information | Team line-ups, tactics, referees, results in previous encounters, description of atmosphere, jersey colours, national anthems
Commentary | Play-by-play description and comment, half-time summary and preview (readers’ comments)
Summary and overall match comment | Consequences for teams, naming goal scorers and order of scoring
Outlook | Next fixture of the team(s)
Goodbye |

This highly structured layout in large parts corresponds to the progression in traditional football reportage, but OTCs usually finish shortly after the actual match coverage and lack post-match comments and interviews commonly found in radio and especially on TV (cf. Ferguson 1983: 154). Note the differences between the individual OTCs: while the posts of some media (e.g. from SUN) are always organised in the same fixed way (preview – early team news – head to head – the ref – etc.) and are apparently prepared in advance (cf. Simons 2011: 180–181), the data from the other media outlets suggest that they take a more liberal approach and leave the exact arrangement of the posts (particularly in the phases before the actual commentary begins) up to the commentator.⁷ The length of the individual phases may vary. For example, the length of the pre-match coverage ranges between 176 (swe_eng_1506_spon) and 2,969 words (ita_eng_2406_guar), GUAR overall being most verbose in this respect (see Figure 4) and particularly when matches of the English squad are reported (for further quantitative assessment of OTCs, see Section 3.3 below).

7 Boundaries between the (idealised) phases are blurred at times, so that information typically found in one phase may also appear somewhere else. For example, information on jersey colours may appear within the first minutes of the actual match commentary, as illustrated by 1’ KICK OFF Germany, in their all-white kit, start the game kicking from right to left (ger_den_1706_sun).

Word count | GUAR | SUN | BILD | SPON
AVG | 1258.3 | 696.9 | 606.4 | 558.9
AVG ENG | 1708.5 | 771.25 | 534.25 | 298.5
AVG GER | 898.2 | 637.4 | 664.2 | 767.2

Figure 4: Length (in words) of pre-match commentary (AVG = overall average; AVG ENG = average of England match reports; AVG GER = average of Germany match reports)
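The three rows of values behind Figure 4 can be cross-checked against each other. The sketch below assumes that each outlet's 9 reports split into 4 England and 5 Germany matches (an inference from the 36 reports across four outlets, not a figure stated in the text) and tests whether the overall average is the report-weighted mean of the two sub-averages. The helper name `pooled_average` is hypothetical.

```python
# Average pre-match word counts per outlet, as given for Figure 4.
avg_eng = {"GUAR": 1708.5, "SUN": 771.25, "BILD": 534.25, "SPON": 298.5}
avg_ger = {"GUAR": 898.2, "SUN": 637.4, "BILD": 664.2, "SPON": 767.2}
avg_all = {"GUAR": 1258.3, "SUN": 696.9, "BILD": 606.4, "SPON": 558.9}

# Assumption (not stated in the text): 4 England and 5 Germany reports
# per outlet, i.e. 9 reports x 4 outlets = 36 reports in total.
N_ENG, N_GER = 4, 5


def pooled_average(outlet: str) -> float:
    """Weight each sub-average by its assumed number of match reports."""
    total_words = N_ENG * avg_eng[outlet] + N_GER * avg_ger[outlet]
    return total_words / (N_ENG + N_GER)


for outlet, reported in avg_all.items():
    print(f"{outlet}: pooled {pooled_average(outlet):.1f}, reported {reported}")
```

Under this assumption the pooled values agree with the reported overall averages to one decimal place, which also explains why the overall average is not simply the midpoint of the England and Germany averages.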

The phases before the match actually starts serve at least two important communicative functions. First, the ‘appetiser’ section is a device to incite interest in readers and to emphasise the relevance of the match. (1) and (2) can be seen as typical posts.

(1) A titanic clash awaits. (ger_ita_2806_sun)
(2) Deutschland gegen Niederlande, das ist der Klassiker, das Non-Plus-Ultra im Fußball, ach, was sag ich, der heilige Gral bei dieser EM. Ich begrüße Sie herzlich zu diesem Top-Event (ger_ned_1306_spon)⁸

A second function, also applicable to the background information phase, is to directly address and accommodate the readers into the spectacle and make them part of the match. In this regard OTCs are quite similar to traditional mass media, which aim at linking “the significant and the mundane” (Gerhardt 2006: 131), that is, the allegedly spectacular match and the allegedly ordinary everyday life of the readers. (3) and (4) nicely illustrate this point.

8 Translation: Germany versus the Netherlands, that’s the classic, the non-plus-ultra of football – what am I saying, the Holy Grail of this European Championship. A warm welcome to this top event.


(3) Good evening, everybody. Are ya nervous? Are ya? (ukr_eng_1906_guar)
(4) Die Nationalhymnen. Gänsehaut für jeden Fußballfan. Was für eine Stimmung. (ukr_eng_1906_bild)⁹

The commentary can be viewed as the core part, with the main communicative function of conveying factual information, although further functions, such as entertainment (see below), should not be discounted. OTCs usually finish with a summary and overall match comment, potentially aimed at members of the audience who only look for a quick round-up of the match and who do not want to read the full coverage.

2.3 Audience participation

Studies of internet communication have always recognised its multimedial nature in the sense that textual data rarely appears in isolation (Dürscheid 1999: 28–29), and the same naturally applies to OTCs. Another dimension of multimediality is the opportunity of interacting with commentators before and while the match coverage is in progress. The question is whether this has ramifications for the structure and content of OTCs.

On the one hand, Chovanec (e.g. 2008) has convincingly shown that audience mail-ins constitute an essential element of OTC football reportage. In addition, he has found that readers’ comments and their citing by the commentator are rarely directly related to the gameplay and thus constitute a second layer of “gossip” with a social rather than an informative function. This considerably extends the scope of the OTC beyond the provision of factual information (as its primary purpose) and is testimony to the entertainment function OTCs can carry. As only a selection of readers’ mails are presented and addressed and, more often than not, reduced to clichés (Chovanec 2008: 260), he labels this type of discourse “quasi-conversational interactions” (Chovanec 2011: 252). Readers may participate in the creation of the content of the OTC, but only at the discretion of the commentators (or their aides; see above). Given that commentator and contributing readers usually do not know each other personally, casual conversation is only simulated to a certain extent. However, the general applicability of Chovanec’s findings is limited as his analyses are restricted to GUAR data only (see also Thurman and Walters 2013: 85).

9 Translation: The national anthems. Goosebumps for every football fan. What an atmosphere.




On the other hand, the advent and growing popularity of genuinely interactive internet applications (the so-called “web 2.0” technologies) could have led to a widespread integration of these into OTCs as another “webby” form of communication, creating dynamic content. The most popular application, potentially also most adapted to OTCs as another immediate form of journalism (cf. Chovanec 2010: 239), is the microblogging service Twitter (). Despite its presence on the market since 2006, only one of the OTCs considered in the present study, SPON, has reserved some space for Tweets (that is, Twitter posts). This area (called “Live-Fanblock”, ‘live fan section’) is placed prominently next to the main commentary box (see Figure 5).

Figure 5: Main commentary and Tweets in SPON (from ger_ita_2406_spon; accessed 02/07/2012, 10:29)

Commentators actively encourage readers to participate, as in (5), but they do not cite readers’ Tweets in the main commentary. The one exception to this rule is presented below as (6).

(5) Jetzt ist es amtlich – Klose, Schürrle und Reus spielen von Beginn an. Twittert der DFB. Sollten Sie auch den Drang verspüren, ihren Kommentar via Twitter in den Live-Fanblock rechts nebenan zu Tickern, so benutzen Sie bitte den Hashtag #gergre (ger_gre_2206_spon)10

10 Translation: Now we know for sure – Klose, Schürrle and Reus are in the starting line-up. Twitters the DFB (= the German football association). Should you also feel the urge to post your comments to the Live-Fanblock to the right, please use the hashtag #gergre


 Valentin Werner

(6) PS: Mein Tweet des Abends: Dehnen ist gut für die Bänder, Bender ist schlecht für die Dänen – @wintersjon! In diesem Sinne, gute Nacht! (ger_den_1706_spon)11

Therefore, rather than engaging in quasi-conversation in the sense defined above, Tweets in SPON should be viewed as truly parallel comment, where readers can express their (unfiltered) opinion and post links. Although OTCs in GUAR do not comprise a formalised way of incorporating Twitter comparable to the “Live-Fanblock” of SPON, commentators refer to Tweets in a similar fashion as they do with regard to mails (that is, with added comment), albeit rarely in the present data (see (7)). (7) Over on the Twitter @ianapplegate has this suggestion. Maybe they should at least give Esperanto a go? Can anyone even speak Esperanto? (ger_den_1706_guar)

It emerges from the analysis that, at present, no unequivocal answer can be given to the question as to whether interactive elements influence OTC commentary. However, it could be shown (i) that the extent of how much reader-generated content influences the style and content of OTCs varies considerably and (ii) that different OTCs have different approaches towards interactivity. While two (SUN, BILD) do not provide any opportunity for the readers to get involved, OTC reportage in GUAR provides extensive, though filtered, reader-generated content and related comments, and thus yields a quasi-conversational structure as defined above. The most direct approach arguably is taken by SPON, where Tweets are displayed unfiltered as a by-commentary right next to the commentator’s text. However, the latter does not usually refer to the former in any way, so audience participation could be viewed as constrained in another way.12

11 Translation: PS: My Tweet of the night: Stretching is good for the ligaments, Bender is bad for the Danes – @wintersjon! In this spirit, good night! Note: In the German version, the author of the Tweet exploits the homophony /bendɐ/ between Bänder (‘ligaments’) and Bender (player’s name) for a comic effect. 12 Even if the extent of filtering varies, both ways of incorporating interactive elements presumably take account of a point made in audience studies of other media types. To be precise, Gerhardt (2006: 129) maintains that the audience consists of “active social agents whose lives do not come to a halt when they are exposed to a mass medium”. Accordingly, it could be argued that OTCs with interactive elements take a socially more adequate approach towards their readers. This view is also supported by Simons (2011: 156), asserting that modern audiences have developed a feeling of being entitled to participation and interaction. Therefore, it is argued that state-of-the-art journalistic practice is liable to incorporate social media in order to render mass media production and use a shared experience. A related point of minor importance is that OTCs sometimes also serve as some kind of by-medium to TV broadcasts where a commentator adds




3 The language of OTCs

3.1 Vocabulary, collocations and semantics

3.1.1 General picture

Like traditional types of sports reportage (cf. Ghadessy 1988: 19), OTCs can be expected to contain a substantial amount of technical vocabulary to describe the gameplay. An exploration of the most frequent content words reveals that items can be broadly categorised into what is shown in Table 3.

Table 3: Categories of content words amongst the top 100 wordlist created with AntConc

Category | Examples for GUAR + SUN | Examples for SPON + BILD
Names of teams (geographical location) | England, Germany, Sweden, France, Portugal, Ukraine, Italy | England, Deutschland (‘Germany’), Italien (‘Italy’), Portugal, deutschen (‘German’)
Temporal location | min, time, (first/second) half, after | Minute (‘minute’), jetzt (‘now’), dann (‘then’), heute (‘today’), nach (‘after’)
Sports-/game-related terms | ball, goal, shot, corner, side, kick, area, cross, chance, post, team, game | Ball (‘ball’), Tor (‘goal’), Ecke (‘corner’), (gelbe) Karte (‘(yellow) card’), Strafraum (‘penalty area’), Flanke (‘cross’), Spiel (‘game’), Wechsel (‘substitution’)
Names of players and coaches | Hart, Rooney | Hart, Gomez, Klose, Özil, Neuer, Löw

Overall, the comparison between the most frequent content words in English and German OTCs reveals some striking similarities (especially as regards the first three categories in Table 3), but with a slight change in national focus (as regards the players’ names). Note also that the expression of movement, location and direction figures prominently in terms of function words – mainly prepositions – amongst the highly frequent lexical items (e.g. right, up, left, down, back, over,

“colour commentary” to the “action” on the screen. This is especially salient in designated OTCs on particular shows, for instance such as the regular SPON OTC on “Tatort”, a popular German crime series.


against, to, in, for, from, on, at, by, into vs. in, auf, mit, von, zu, im, aus, an, bei, gegen, vor, nach, zum, ab, am, über, durch, zur, ins). These findings can be closely related to a semantic keyword analysis in Wmatrix, where the English OTC data are compared against the spoken and written BNC sampler. In this quantitative perspective, salient semantic areas emerge. These are ‘competition’, ‘numbers’ (usually related to spatial and temporal orientation), ‘warfare, defence and the army; weapons’, ‘violent/angry’, ‘chance, luck’, ‘long, tall and wide’, ‘success’, ‘failure’, ‘anatomy and physiology’, illustrated by examples (8) to (15) respectively.13

(8) As it stands, Portugal will go through with a better head-to-head record. (ger_den_1706_sun)
(9) And how England love that decision, because the second effort is sent right onto Lescott’s head, eight yards out, level with the left-hand post. (fra_eng_1106_guar)
(10) That was Klose’s 64th goal for Germany four off Gerd Muller’s record and he almost made it 65 moments later, following up a loose ball and sweeping in a low shot that was kicked behind at the near post by the besieged Sifakis. (ger_gre_2206_guar)
(11) Evra whips a cross into the England area from the left. (fra_eng_1106_guar)
(12) It’s high-stakes major-championship Holland versus Germany. (ger_ned_1306_guar)
(13) Germany also prevailed in the third-place play-off at World Cup 2006, winning 3-1 in Stuttgart. (ger_por_0906_sun)
(14) Designated scapegoat for when it all goes wrong: Pedro Proenca (Portugal). (ita_eng_2406_sun)
(15) He curls a cross onto the head of Gomez, but the big striker’s header is weak and wafted miles to the left of the target. (ger_ita_2806_guar)

The analysis of highly frequent content items and the semantic keyword analysis suggest that OTCs do not fundamentally differ from other forms of football reportage, particularly radio reportage, as “good playing, moments of risk, significant points of heightened competition” (Ferguson 1983: 156–157) receive the most extensive coverage. This can be deduced, for example, from the high salience of the ‘success’ and ‘failure’ semantic tags or the high frequencies of the names of players usually involved when chances in a game occur; that is, strikers/offensive players (Rooney, Özil, Klose, Gomez) and goalkeepers (Hart, Neuer). Levin (2008: 146) has pointed out that “traditions developed in sports commentary are often unintelligible to the uninitiated”, one reason being that commentators rely on formulaic language with specialised meanings. In order to test

13 Some of the findings of the corpus software may be due to the metaphorical processes involved (cf. also the usage of the terms shot, target and squad, captain, etc.). It is controversial whether “football is war” metaphors still apply or whether they have conventionalised (see also Section 3.1.2).




this claim, I compared the ten most frequent 4-grams in the material for both languages, as shown in Table 4.

Table 4: The ten most frequent 4-grams extracted with AntConc

Rank | Freq. (GUAR+SUN) | 4-gram (GUAR+SUN) | Freq. (SPON+BILD) | 4-gram (SPON+BILD)
1 | 41 | the edge of the | 13 | Meter vor dem Tor (‘meters before the goal’)
2 | 25 | edge of the area | 11 | auf der anderen Seite (‘on the other side’)
3 | 25 | on the edge of | 11 | aus der zweiten Reihe (‘from the second row’)
4 | 19 | down the inside-right | 10 | Tooor für Deutschland, X:X (‘goal for Germany, X:X’)
5 | 16 | the inside-right channel | 8 | auf dem rechten Flügel (‘on the right wing’)
6 | 14 | down the right and | 7 | in der zweiten Hälfte (‘in the second half’)
7 | 14 | from the edge of | 6 | da war mehr drin (‘there was more in it’)
8 | 13 | down the inside-left | 6 | doch der Ball geht (‘but the ball goes’)
9 | 13 | in the first half | 6 | im Strafraum an den (‘in the penalty area at the’)
10 | 12 | for the first time | 6 | Meter vor dem Kasten (‘meters before the goal’)
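The kind of 4-gram ranking reported in Table 4 can be approximated outside AntConc. The following is a minimal sketch only: the tokenisation rule (lowercasing, keeping hyphenated forms such as inside-right as one token) and the function names are assumptions made for illustration and need not match the settings used for the study.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: tokens are runs of letters (umlauts included), optionally
    # joined by an internal hyphen or apostrophe; everything is lowercased.
    return re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", text.lower())


def top_ngrams(texts, n=4, k=10):
    # Count n-grams within each commentary entry, never across entries.
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return [(" ".join(gram), freq) for gram, freq in counts.most_common(k)]


# Invented sample entries, not taken from the corpus.
sample = [
    "He collects the ball on the edge of the area.",
    "A shot from the edge of the area flies wide.",
]

for gram, freq in top_ngrams(sample, n=4, k=3):
    print(freq, gram)
```

Counting per entry rather than over one concatenated text avoids n-grams that straddle two unrelated commentary entries; whether the original extraction did the same is not stated.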

According to the absolute usage frequencies, English OTCs apparently use formulaic expressions much more than the German ones. A particularly common collocation (see ranks 1, 2 and 3 in Table 4), better represented as a 6-gram, is on the edge of the X.14 Levin’s (2008) findings can be confirmed insofar as somebody reading OTC reportage has to have (i) knowledge about conventions and a mental image as regards the layout of a football pitch and (ii) knowledge of football-related jargon. Fact (i) is especially illustrated by the English data, where the majority of the 4-grams describe movement and/or position, and (ii) especially by the German data, where technical terms (partly also related to position) such as Strafraum (‘penalty area’), Flügel (lit. ‘wing’; ‘outer part of the pitch’) or aus der zweiten Reihe (lit. ‘from the second row’; ‘from far away’) appear. The present data therefore suggest that it is not merely “goal scoring and measuring time” (Levin 2008: 146) where formulaic language is employed, although some of the items included in Table 4 (e.g. in the first half; for the first time; Tooor für Deutschland, X:X; in der zweiten Hälfte) support Levin’s claim. A related aspect is the extended reliance on informal and slang items (Perez-Sabater et al. 2008: 242; cf. Ferguson 1983: 156–157), exemplified by Kasten (‘goal’, lit. ‘box, case’) in Table 4. A recent study on informality (Burkhardt 2010: 14–15) has identified a long-standing tradition of dialectal and informal influence as regards (German) football language, and a similar situation in English appears highly likely. Indeed, the OTC data from both languages confirm a general tendency towards informal usage, as examples (16) to (19) show (see also below):

(16) Neat turn from Ozil who twists in the box before feeding Khedira for a low 20-yarder, which Sifakis parries. (ger_gre_2206_sun)
(17) (…) on the sideline Joachim Low is waving his hands around in frustration like an eejit. (ger_gre_2206_guar)
(18) Huiuiui, dieser Reus hat sich einiges vorgenommen. Diesmal rutscht ihm das Spielgerät über den Schlappen und fliegt zwei Meter am rechten Außenpfosten vorbei. (ger_gre_2206_bild)15
(19) Fortakis hält einfach mal drauf. Neuer hält einfach mal fest. (ger_gre_2206_spon)16

14 Realisations for X occurring in the data are D, six yard box, England box, Italy penalty area, Sweden penalty area, penalty area.

3.1.2 Intended readership

Lexical differences along the dimension “intended readership” are harder to determine. First of all, a quantitative assessment of the lexical density of OTCs (see Table 5) shows only marginal differences between languages and individual OTCs (SD = 1.50) and standardised type/token ratio values approximating values normally found in written data (e.g. of the written components of the International Corpus of English).

15 Translation: Huiuiui, this Reus guy is up for something. This time, the playing device (infml.) slides over his worn-out shoe/slipper and misses the right outer post by two meters. 16 Translation: Fortakis just shoots. Neuer just saves.




Table 5: Standardised type/token ratios (TTR) calculated with frequencies from AntConc

 | GUAR | SUN | BILD | SPON
std. TTR | 45.88 | 42.50 | 45.51 | 42.93
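For orientation, a standardised type/token ratio is commonly obtained by averaging the TTR of consecutive, equally sized stretches of text, which removes the dependence on overall text length. The sketch below illustrates that idea under stated assumptions: the chunk size of 1,000 tokens and the function name are illustrative and are not taken from the study. The second part shows that the population standard deviation of the four values in Table 5 matches the reported SD of 1.50.

```python
from statistics import mean, pstdev


def standardised_ttr(tokens, chunk_size=1000):
    """Mean type/token ratio (in %) over consecutive full chunks of tokens.

    Assumption: trailing tokens that do not fill a whole chunk are ignored.
    """
    chunks = [
        tokens[i:i + chunk_size]
        for i in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    if not chunks:
        raise ValueError("text is shorter than one chunk")
    return mean(100 * len(set(chunk)) / len(chunk) for chunk in chunks)


# Values as reported in Table 5.
table5 = {"GUAR": 45.88, "SUN": 42.50, "BILD": 45.51, "SPON": 42.93}

# Population standard deviation across the four OTCs.
spread = pstdev(table5.values())
print(f"SD = {spread:.2f}")
```

Using the sample standard deviation instead (statistics.stdev) would give a somewhat larger figure, so the reported value appears to be the population variant.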

In fact, keyword analyses contrasting the vocabulary of the two OTCs respectively (GUAR vs. SUN and SPON vs. BILD) yield a very diverse picture. First, a look at the top 100 keyness words of GUAR vs. SUN (and vice versa) reveals some (groups of) characteristic items. Commentators for GUAR seem to have a preference for technical terms such as tiki-taka or its ad-hoc (mock) variant (das) bundestikiundtaka17 to describe the particular playing style the Spanish and German teams are known for. On a related note, the acronym TBOF (‘two banks of four’), referring to the traditional tactical formation of the England squad, reaches a high keyness rating. Another conspicuous item in the GUAR data is beard. Here, an idiosyncratic use of the GUAR commentator, again from the Germany vs. Greece match, is responsible for its salience. While at the beginning of the coverage the player Salpingidis is introduced with the metonymic nickname beard to be feared, as in example (20), at a later point in the match, we can witness a process of personification and the reference merely by a physiological feature is taken as established, as can also be seen from the capitalisation of the term in example (21).18 (20) Gekas will go up front, with the beard to be feared, Salpingidis moving to the right of midfield. (ger_gre_2206_guar) (21) The Beard To Be Feared slides a cool low penalty to the right as Neuer goes the other way. (ger_gre_2206_guar)
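The keyword comparisons (GUAR vs. SUN, SPON vs. BILD) rest on a keyness statistic, but the text does not say which one was used. A widely used option, assumed here purely for illustration, is the log-likelihood measure that contrasts a word’s frequency in a target corpus with its frequency in a reference corpus. The function name and the counts in the usage example are hypothetical.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness of one word (assumed measure, see lead-in).

    freq_* are the word's frequencies, size_* the corpus sizes in tokens.
    """
    total_size = size_target + size_ref
    total_freq = freq_target + freq_ref
    # Expected frequencies if the word were spread evenly over both corpora.
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    ll = 0.0
    for observed, expected in (
        (freq_target, expected_target),
        (freq_ref, expected_ref),
    ):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Hypothetical counts: a word occurring 30 times in a 40,000-token corpus
# and 5 times in a 28,000-token reference corpus.
score = log_likelihood(30, 40_000, 5, 28_000)
print(round(score, 2))
```

Scores above 3.84 correspond to p < 0.05 at one degree of freedom; ranking all words by this score yields a keyness list of the kind referred to in the text.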

In contrast, we can generalise from the SUN vs. GUAR keyness list that SUN commentators more often than not refer to players by their first names (Mario, Bastian, Antonio, Manuel, Mesut, Cristiano, Miroslav, etc.) and employ more war-/aggression-related terminology (e.g. fires, impact, strike, shot, kill, onslaught), although it might be argued that some of these items have become conventionalised metaphors. Puns on players’ names and ad-hoc formations are a common feature of all OTCs and illustrate creative language use in this type of sports commentary (see also Section 3.2 on discourse features below; cf. Golebiowski 2012: 58):

(22) It’s Robben-esque at times from Ibrahimovic (…) (swe_eng_1506_guar)
(23) “It’s Goetzille.” Who needs Xaviesta? (ger_gre_2206_guar)
(24) Super Mario was brilliant at times for Manchester City this season (…) (ita_eng_2406_sun)
(25) Immer wieder Mad Mario. (ita_eng_2406_spon)
(26) THE LAHM BELLS ARE RINGING (ger_ned_1306_sun)
(27) LACKING in KLAAS (ger_ned_1306_sun)
(28) Schewagol (ukr_eng_1906_bild)
(29) Kjaer has the ball toe-poked pass him by Muller with the result that Muller is mullered to the ground by the Dane. (ger_den_1706_guar)

17 Burkhardt (2010: 14) presents an overview of the genesis of the term tikitaka. Consider also the word formations das bundestikiundtakafussball (ger_por_0906_guar); I fell asleep after 63 minutes and have only just woken up from a tiki-taka-induced snooze (ita_eng_2406_guar) or Because over-intellectualising Spain’s tiki-totalitarianism isn’t going to be enough when you try to big this up in ten years’ time, I can tell you that for nothing (ger_ita_2806_guar).
18 Cf. the following references to England striker Wayne Rooney: Dicke Chance für Mister Haupthaar! (‘Big opportunity for Mister scalp hair!’; ita_eng_2406_bild); Wieder kommt das lebende Haartransplantat Rooney angeflogen, doch sein Kopfball ist eher eine Rettungstat denn ein Torversuch. (‘Again the living hair transplant Rooney is approaching, but his header is more of a save than an attempt on target.’; ita_eng_2406_spon).

The keyword analysis of the German OTCs shows that BILD is much more prone to using dialectal and jargon words than SPON. Two illustrative instances are references to Ball (‘ball’) and Tor (‘goal’). While the standard variants (i.e. Ball and Tor) rank high in the keyness list of SPON, within the top 100 keyness items of BILD a variety of informal terms both for the former (e.g. Kugel ‘bowl’, Leder ‘leather’, Pille ‘pill’, Murmel ‘marble’)19 and the latter (e.g. Kasten ‘box’, Hütte ‘shed’) occur. On a related note, other salient items worth mentioning due to their high keyness in BILD are Schlappen (‘foot’; lit. ‘worn-out shoe/slipper’) or Dampfhammer (‘fast shot on goal’; lit. ‘steam hammer’). This does not mean, however, that SPON commentators do not use informal or jargon items, as the occurrence of some other words listed in Burkhardt (2010) shows (see examples (30) to (32)) – they are just used less frequently.

(30) Also Balotelli sollte heute besser keinen Elfer mehr schießen (ita_eng_2406_spon)20
(31) De Rossi schießt, Hart lässt prallen, Balotelli feuert aus kurzer Distanz drauf, wieder Hart und dann muss Monotolivo das Ding im Nachschuss machen (ita_eng_2406_spon)21

19 The Kicktionary (Schmidt 2007), a multilingual dictionary of football terms, includes Kugel and Leder (in addition to Spielgerät (‘the thing to play with’)); cf. Neuer faustet das Spielgerät weg (‘Neuer punches the ball away’; ger_por_09_06_bild), but not Pille and Murmel.
20 Translation: Well, Balotelli rather shouldn’t shoot any more penalties (infml.) today.
21 Translation: De Rossi shoots, Hart rebounds the ball, Balotelli fires from a short distance, again Hart and then Montolivo must score [lit. make the thing] in the follow-up.




(32) Garmash wagt einen Distanzschuss und knallt aus 30 Metern vom linken Flügel aus auf das Tor. (ukr_eng_1906_spon)22

3.2 Discourse features

Again relating to in-group knowledge (see also Gerhardt 2006: 140; O’Keeffe 2006: 155) required by the audience, an earlier analysis has identified “Britishness” (Chovanec 2008: 261) as common ground of the cross-references in GUAR OTCs. Some of these findings can be extended to OTCs from other media outlets. In-group knowledge is required by the reader whenever commentators refer or allude to particular players, coaches or commentators not part of the current game or action (and their alleged characteristics, statements or achievements). Examples (33) to (38) illustrate that this happens in OTCs of all kinds.

(33) Call it the Crouch Effect, if you will. (swe_eng_1506_guar)
(34) The full-back likes attacking more than defending, apparently, so appears to be the Portuguese equivalent of Glen Johnson. (ger_por_0906_sun)
(35) Gomes slides in, Gascoigne at the Euro 96 semi style, but can’t get his boot to the ball. (ger_ned_1306_guar)
(36) Aber Kroos mit einer Christian-Rahn-Gedächtnis-Ecke. (ger_ita_2806_spon)23
(37) Pirlo kommt trotzdem an den Ball, macht aber den Robben. (ger_ita_2806_spon)24
(38) Balotelli will den Ibrahimovic machen. (ita_eng_2406_bild)25

In the GUAR data, this is also often observable in the readers’ comments included in the actual OTC. A similar effect is created by numerous references to scenes from other games and to other teams, as shown in examples (39) to (43). (39) Mellberg produces a tackle not too dissimilar to Bobby Moore’s famous one on Jairzinho in the 1970 World Cup. (swe_eng_1506_guar) (40) He makes it to penalty area before old hand Mellberg stops him in his tracks with a challenge akin to Moore on Pele, 1970. (swe_eng_1506_sun) (41) I just had a horrible premonition of Balotelli making this match his Maradona ’86 moment and crushing us single-handledly [sic] because he feels like it (ita_eng_2406_ guar)

22 Translation: Garmash tries a distance shot and rifles the ball from 30 meters from the left wing towards the goal. 23 Translation: But Kroos with a Christian-Rahn-memorial corner. 24 Translation: Pirlo gets the ball anyway, but does the Robben. 25 Translation: Balotelli wants to do the Ibrahimovic.


(42) Doch im Gegensatz zum FC Bayern nimmt keiner Reißaus oder zeigt auf den Anderen. (ita_eng_2406_bild)26 (43) Schlecht war die deutsche Mannschaft gegen Portugal eigentlich nur im Jahr 2000. Damals setzte es ein 0:3. Aber die Abwehrspieler hießen auch Rehmer oder Nowotny. (ger_por_0906_spon)27

While the intertextual28 references listed above are not restricted to OTCs from GUAR, it is there that they occur most frequently (see Table 6).

Table 6: Average number of intertextual references per match report

 | GUAR | SUN | BILD | SPON
cross references | 6.67 | 2.78 | 2.44 | 4.78

This is also due to another unique feature of GUAR OTCs, which is reference to popular culture (e.g. actors, movie titles etc.) by both commentators and audience comments, as exemplified in (44) or (45): (44) See you in 10 minutes for more of the same, or the most dramatic twist since The Crying Game/The Usual Suspects/Fight Club/Turner & Hooch. (ger_gre_2206_guar) (45) Now that Walcott has replaced Ron Perlman England might actually win. (ita_ eng_2406_guar)

All this nicely illustrates the extensive additional knowledge required to become an actual part of the game, or rather its mediated presentation (see also Gerhardt 2006: 140). In other types of media, commentators deliberately employ intertextual references as one way to create “pseudo-intimacy”, that is, “some sense of common identity and nationality or some other familiarity built up through frequent ‘contact’” (O’Keeffe 2006: 92)29 and this seems to be the case also in OTC reportage, most clearly in the GUAR data.

26 Translation: But in contrast to Bayern Munich nobody runs away or points to somebody else.
27 Translation: The only time the German team actually was bad against Portugal was in the year 2000. They got defeated 0:3. But the defenders were called Nowotny and Rehmer.
28 Intertextuality is conceived of in broad terms, including e.g. previous matches, scenes, other players etc. as (non-linguistic) pre-texts. In addition, this intertextuality may also comprise stereotyped (national) clichés requiring generalised cultural knowledge, such as “[…] but Andreas Brehme has to be the best Left Back,” says John Duffy. “He had a few problems in the hairstyle department, mind, but what German doesn’t?” (swe_eng_1506_guar).
29 Cf. also Ferguson’s term “dialog on stage” (1983: 156).




Another remarkable discourse feature already extensively covered by Jucker (2006: 128) is what he labels “parlando prosodics”: in the written medium the commentator imitates “spoken language through exclamations, capitalisation, graphical indication of vowel lengthening […] and hesitations”.30 For reasons of space, suffice it to say that also the current dataset yields a range of examples and that these realisations can be found in OTCs of any provenance (see examples (46) to (50)). (46) Gooooooooooooooooal! but in the other game. (ger_den_1706_guar) (47) They couldn’t, could they??? (ger_ita_2806_sun) (48) Peeeeeeep! Peeeeeeep! Peeeeeeeeeeeeeeep! Nothing more to report here folks. (ger_den_1706_guar) (49) Aber gut, es bedeutet immerhin: GLEICH GEHT ES LOS! (ger_ger_2206_spon)31 (50) Rooooooooney zahlt zurück. (ukr_eng_1906_bild)32
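The typographic cues named in Jucker’s definition (vowel lengthening, capitalisation, expressive punctuation) lend themselves to a simple automatic search. The following sketch is an illustration only; the regular expressions, thresholds (three or more identical vowels, two or more punctuation marks, at least two consecutive all-caps words) and names are assumptions and would need tuning on real data.

```python
import re

# Assumed, deliberately simple patterns for three "parlando" cues.
PATTERNS = {
    # a word containing the same vowel three or more times in a row
    "vowel_lengthening": re.compile(
        r"\b\w*([aeiouäöü])\1{2,}\w*\b", re.IGNORECASE
    ),
    # two or more exclamation/question marks in sequence
    "expressive_punctuation": re.compile(r"[!?]{2,}"),
    # at least two consecutive words written entirely in capitals
    "capitalisation": re.compile(r"\b[A-ZÄÖÜ]{2,}(?:\s+[A-ZÄÖÜ]{2,})+\b"),
}


def parlando_features(entry):
    # Return every match of every pattern found in one commentary entry.
    return {
        name: [m.group(0) for m in pattern.finditer(entry)]
        for name, pattern in PATTERNS.items()
    }


# Entries quoted from examples (46), (47) and (49) above.
examples = [
    "Gooooooooooooooooal! but in the other game.",
    "They couldn’t, could they???",
    "Aber gut, es bedeutet immerhin: GLEICH GEHT ES LOS!",
]

for entry in examples:
    hits = {k: v for k, v in parlando_features(entry).items() if v}
    print(hits)
```

Requiring at least two all-caps words keeps acronyms such as DFB from being counted as emphatic capitalisation, at the cost of missing single shouted words.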

Therefore, Perez-Sabater et al.’s (2008: 255) finding that prosody is usually not typographically marked in OTCs from British newspapers has to be revised. In addition, commentators indicate spoken modes of discourse by other means such as (i) question tags, (ii) interjections and (iii) hesitation markers (or combinations of these), all typically found in speech (cf. Chovanec 2008). Examples (51) to (54) illustrate the first type and are commonly used as rhetorical questions or as a means to convey surprise. (51) You’d fancy that run continuing this year, no? (ger_por_0906_guar) (52) Motta reißt Kroos um, Italien bekommt Freistoß. Häh? (ger_ita_2806_spon)33 (53) Oh no they didn’t! Football eh? (ger_gre_2206_sun) (54) Wenn man sowas übersteht, kann doch nichts mehr schiefgehen, oder? (ger_ por_0906_spon)34

The wide range of interjections found in the data fulfils a similar function of simulating spoken discourse. Again, they occur across all OTCs, as examples (55) to (59) show.

30 Expressive punctuation, exemplified in (47), could also be added to the list of parlando prosodics and may thus be seen as a characteristic register feature (cf. Sanchez-Stockhammer, this volume). 31 Translation: But well, at least this means: IT’S ABOUT TO START! 32 Translation: Rooooooooney pays back. 33 Translation: Motta knocks Kroos down, Italy gets a free kick. Eh? 34 Translation: If you get over such a thing, nothing can go wrong, right?


(55) Blimey, Liberopoulos is a man on a mission. (ger_gre_2206_sun) (56) Oooooooooh. A ball as delicious as your mother’s Sunday roast is swung into the box from Ozil but it goes out for a corer [sic]. (ger_den_1706_guar) (57) Boah! Kann man das bitte nochmal in Zeitlupe sehen? (ger_ned_1306_bild)35 (58) Drei Minuten gibt es obendrauf! Puuh, das ist viel! (ger_por_0906_spon)36 (59) Oh Gott, was macht den [sic] Müller da? (ger_den_1706_spon)37

In the above instances, medium determines content, or at least its typographical representation, and many of the discourse features listed contribute to the creation of “pseudo-intimacy”, also meaning that both commentator and audience “pretend the relationship is not mediated and is carried on as though it were face-to-face” (O’Keeffe 2006: 92).

3.3 Interaction of text and other elements

Another aspect that has largely escaped researchers’ attention is the interaction between formal layout/paralinguistic phenomena and textual/linguistic content. For the four OTCs under investigation, this indeed plays a role. It was already indicated above that some of the OTCs come with many additional features such as team statistics, heatmaps, etc. Thus, it could be hypothesised that the more paralinguistic material is present, the shorter the individual OTCs are.38 This potential interaction can be measured quantitatively by considering absolute token counts (average number of words per match reported) and relating these values to the presence of further structural elements.

Table 7: Average token number per match report

 | GUAR | SUN | BILD | SPON
Average token number | 4,646 | 3,105 | 2,580 | 3,047

35 Translation: Boah! Can we see this in slow motion again? 36 Translation: Three minutes of additional time! Phew, that’s a lot! 37 Translation: Oh my god, what’s Müller doing there? 38 The present analysis applies a “micro-level approach” (Santini et al. 2010: 11); that is, only elements reachable within one click and which are part of the actual OTC are included (excluding ads and general navigation tabs, etc.).




Table 7 shows the relevant frequencies, and a “wordiness hierarchy” along the lines GUAR > SUN > SPON > BILD emerges, which suggests that the German OTCs are shorter on average. Two aspects are worth considering here: in addition to the textual commentary, the different OTCs rely on various other forms of presentation of match-related information, all allocated to different areas on the page or reachable by clicking on a tab (see Section 2.2 above). Table 8 gives an overview of presence or absence of these features.

Table 8: Comparative overview of presence/absence of paratextual features (columns: GUAR, SUN, BILD, SPON; the individual presence/absence marks are not reproduced here)

Features compared (rows): Textual commentary; Match score and goal scorers; Parallel matches and scores; Team line-ups; Live table; Tactical formations; “Event” filter or timeline (goals, cards, substitutions); Team and player statistics; Player positions/“heatmaps”; Player ratings; Referee statistics; Area for Tweets

While Table 8 shows that there are some basic elements for all OTCs (match score and goal scorers, team line-ups, statistics), it also illustrates a fundamental structural split between GUAR and the remaining three OTCs. GUAR emerges as the one with least additional informational elements, necessitating, in turn, a more explicit, or “wordy” style of reportage. The other OTCs, in contrast, rely more on iconographic and tabular representations (see also Figures 6 and 7), which provides a first explanation for the lower number of tokens in these.


Figure 6: Team line-up, statistics and heatmap from SPON (fra_eng_1106_spon; accessed 02/07/2012, 10:30)

A second decisive point is that GUAR focuses on the entertainment aspect (Chovanec 2010: 242), whereas the other three OTCs are more informational in the sense that they provide an extended range of factual information and statistics. This might also be the reason why the individual entries in the commentary are short, as noted by Jucker (2010: 58–60).39 GUAR, in contrast, not only has longer individual entries than the other OTCs, but relies extensively on readers’ comments and replies by the commentator, comprising up to one third of the textual material (in number of words). Another characteristic feature of GUAR is the incorporation of pictures, video clips and links only indirectly related to the actual match, which rather serve to support the entertainment function. The other OTCs do not incorporate audience participation at all (SUN, BILD) or do so in a more direct manner, via Twitter messages displayed next to the main commentary (SPON), thus creating another layer of commentary (see Section 2.3 above), which breaks the uni-directionality of the communication.

39 However, the span (in terms of word length) across the OTCs is considerable and can range from just a few words (e.g. Ecke Deutschland ‘corner Germany’; ger_por_0906_bild) to more than 125 tokens.




Figure 7: Timeline, statistics and commentary from SUN (ita_eng_2406_sun; accessed 02/07/2012, 10:20)


4 Discussion: Cross-cultural aspects

Having considered some linguistic and structural aspects of OTCs, this section addresses the question as to whether OTCs should be seen as a cross-cultural register or whether differences are salient along the dimensions of regional provenance or intended readership. Based on the findings from the previous sections, a diverse picture emerges.

A first area with considerable overlap is the general structure of reportage. Many elements (e.g. an “appetiser” section; see Section 2.2) occur universally, and the other components of a textual match report are also similar in principle. This is determined to some extent by the fact that all OTCs report on the same event, a football match, with a fixed duration and thematic focus (Siever 2011: 171), so that a certain congruence could be expected. However, with respect to content, GUAR is more extensive in its pre-match coverage of England matches, while the German OTCs use more words to describe Germany playing. Word counts in SUN, however, are relatively indifferent to the type of match reported (see Section 2.2). The picture changes slightly when we consider the average word counts for the full reports on matches by either England or Germany, as shown in Figure 8.

OTC     AVG      AVG ENG   AVG GER
GUAR    4746.5   5646      3847
SUN     3147.2   3525.5    2768.8
BILD    3059.8   3132.8    2986.8
SPON    2528.0   2054.5    3001.4

Figure 8: Overall average word count and according to team playing (AVG = overall average; AVG ENG = average of England match reports; AVG GER = average of Germany match reports)
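The contrast between “home” and other matches in Figure 8 can be made explicit with a little arithmetic. The following is a minimal sketch, not part of the original analysis: the word counts are copied from Figure 8, while the function name `relative_gap`, the `HOME` mapping and the `base` parameter are assumptions introduced purely for illustration.

```python
# Average word counts per OTC, copied from Figure 8.
AVERAGES = {
    "GUAR": {"eng": 5646.0, "ger": 3847.0},
    "SUN": {"eng": 3525.5, "ger": 2768.8},
    "BILD": {"eng": 3132.8, "ger": 2986.8},
    "SPON": {"eng": 2054.5, "ger": 3001.4},
}

# Assumed "home" team per OTC: England for the English outlets,
# Germany for the German outlets.
HOME = {"GUAR": "eng", "SUN": "eng", "BILD": "ger", "SPON": "ger"}


def relative_gap(otc: str, base: str = "other") -> float:
    """Return the home-minus-other word-count difference in percent.

    base="other": difference relative to the reports on the other team.
    base="home":  difference relative to the reports on the home team.
    A negative value means the home team receives fewer words.
    """
    home = HOME[otc]
    other = "ger" if home == "eng" else "eng"
    counts = AVERAGES[otc]
    denominator = counts[other] if base == "other" else counts[home]
    return 100.0 * (counts[home] - counts[other]) / denominator


for name in AVERAGES:
    print(f"{name}: {relative_gap(name):+.1f}% (base: other team), "
          f"{relative_gap(name, base='home'):+.1f}% (base: home team)")
```

The size of each percentage depends on which count serves as the base, so the sketch reports both. Only the sign is unaffected by this choice, and it is negative for BILD alone.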

Both GUAR and SPON are more extensive in their coverage of the “home” team (these commentaries comprise approximately one third more words than commentaries of the respective other), while this tendency is less clear for SUN (approximately one quarter more words for England matches) and even slightly reverse for BILD. Thus, despite claims that audiences of new media are “potentially global” (O’Keeffe 2006: 16), this finding indicates some kind of persisting “national allegiances”.

Turning to the lexicon and collocations, the analysis above revealed that content and function vocabulary are broadly comparable across languages. Equally, OTCs of all types rely on formulaic language, which could be expected in light of earlier research on football discourse. From a quantitative perspective, however, English OTCs tend to use these combinations more than German OTCs, in particular when referring to the location of the action on the pitch. Other commonalities are, first, the usage of slang terms and informal items typical of football language in general. Second, a comparison of the type-token ratios did not yield any significant differences. Thus, one of the points mentioned above, namely the restricted lexical range of this particular register and that especially OTCs associated with yellow press papers (SUN, BILD) are “simple” as regards lexical content, has to be qualified to a certain extent.

An area where the OTCs clearly diverged along the dimension “intended audience” emerged in the keyness analysis. Both the English and the German OTCs yielded some inner differentiation: the former as to a higher salience of war-related metaphors in SUN, the latter as to a higher salience of dialectal and jargon vocabulary in BILD. Given the quantitative evidence, it is highly unlikely that this is a chance finding. Rather, it may be interpreted as an adaptation of the SUN and BILD commentators to the presumed language use of their intended readership. Whether this adaptation is deliberate or intuitive remains a matter of speculation. Puns on players’ names and creative ad-hoc formations can be found across all OTCs, however.

Discourse features represent a further area where differences and similarities could be observed.
On the one hand, the salience of football- and culture-related intertextual references as identified by Chovanec (2008) for GUAR OTCs could also be traced in the other OTCs considered, thus representing another uniting feature. However, these references are most frequent in GUAR and SPON, suggesting that both the creation of an in-group atmosphere and the often-related entertainment aspect are more important in the quality-press related OTCs. On the other hand, the present study confirmed and extended earlier research positing the staging of orality as a trademark feature of OTCs, showcasing creative manipulation of restrictions of the written medium, while no cultural specificity of this phenomenon can be claimed on the basis of the present data (see Perez-Sabater et al. 2008: 256 for a comparison of English, Spanish and French). Finally, with regard to the interaction between the textual commentary and other elements of the OTCs, it was evident that all OTCs apart from GUAR rely on an extended range of supplementary features (mainly tabular and iconographic), while GUAR may compensate for this lack of factual information with a more


extensive description in the textual commentary. In addition, GUAR and, with qualifications, SPON can be viewed as more “entertaining” or “fan-like”, while SUN and BILD are more factual (although the latter pair uses more jargon). This division reproduces Jucker’s (2010: 69) categorisation of OTCs. By way of summary, we can posit that there are indeed many commonalities transcending borders (set by cultural specificity and intended readership), but there is also room for variability both within and across language boundaries. This highlights the flexibility of the register despite the formal constraints of the electronic medium.
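The keyness comparison referred to in this section can be illustrated with a short calculation. This section does not restate which keyness statistic was used, so the sketch below simply assumes the log-likelihood measure that is widely used for keyword comparison in corpus linguistics. The function name and all arguments are hypothetical.

```python
import math


def log_likelihood(freq_a: int, size_a: int, freq_b: int, size_b: int) -> float:
    """Log-likelihood keyness of one item in corpus A versus corpus B.

    freq_a / freq_b: occurrences of the item in each corpus.
    size_a / size_b: total number of tokens in each corpus.
    (Assumed statistic; the chapter section does not specify its measure.)
    """
    total = freq_a + freq_b
    # Expected frequencies if the item were spread evenly over both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2.0 * score
```

An item with the same relative frequency in both corpora scores 0. Values above 3.84 exceed the chi-square critical value for p < 0.05 with one degree of freedom, which is the usual minimum threshold for treating an item as key.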

5 Summary and conclusion

Above all, OTCs emerged from the analysis as a “webby” genre that has gained prominence within the last decade as an immediate form of online journalism, particularly suited to live coverage of sports events. Production circumstances were established to be markedly different from those of traditional sports reportage and it was shown that OTCs can be viewed as an amalgamation of different journalistic, or, speaking more broadly, discursive styles (narration, description, opinion, quasi-conversation, etc.; see further Biber and Egbert, this volume). Some OTCs relied on an extended number of paratextual elements and the data suggested a split picture as regards the potential influence of audience participation (both in terms of “web 2.0” applications and via other channels) on the reporting. While two (SUN, BILD) did not take account of readers’ contributions, SPON had a designated paratextual element (the “Live-Fanblock” containing Tweets), where the audience could express their views as some kind of parallel comment, and GUAR occupied an intermediate position as comments (usually sent-in mails) were frequently quoted and referred to, albeit in a mediated and filtered form. An overall comparison of OTCs and traditional forms of sports reportage indicated that the former should be identified as a new and specific register. At the same time, this showcased the “interweaving of old and new formats” as posited by O’Keeffe (2006: 27) as one of the general properties of newly emerging registers.

Turning to language-related aspects, the present study first showed by way of a lexical and semantic analysis that OTCs do not fundamentally differ from other types of football reportage in their use of technical vocabulary.
Second, the exploration of n-grams revealed the importance of position-related collocations and furthermore of informal and slang vocabulary, while differences between the individual OTCs, especially along the dimension “intended readership”, were clearly evident. In contrast, the consideration of discourse features showed a remarkable overlap between the four OTCs, while intertextual references were found to be most salient in OTCs with “entertainment” as a communicative function (GUAR and SPON). However, there were some instances with limitations posed by the electronic (written) format, in particular as regards the staging of orality prominent in OTCs.

While all OTCs shared a similar general structure, GUAR emerged as “the odd one out”. It was the one using the most words but the fewest paratextual elements, one potential explanation being that there the entertainment function is strongest, while the other OTCs provided more factual information, supported through tabular and iconographic elements. This highlighted the need to consider the interaction between format and content and the communicative aim of the individual OTCs as well as the tension between information and entertainment emblematic of modern media discourse (cf. Fairclough 1995: 10).

No definitive answer could be given to the second guiding research question as to whether OTCs can be seen as a cross-cultural register. Rather, OTCs emerged as a highly diversified form of reportage. Formal constraints and the similar structure of the matches reported determined similarity to a certain extent. However, the present analysis revealed (mostly quantitative) diversity and flexibility, both across (e.g. as regards length of the coverage of the “home” team) and within (e.g. as to reliance on informal and slang items) languages. I suggest this is again mainly due to the communicative aim of the individual OTCs and adaptation towards their intended audience.

For a future exploration, it would be desirable to obtain a better insight into the receptive dimension,40 for instance in terms of eye-tracking experiments establishing how fast users read the OTC text and which elements (statistics, textual commentary, icons etc.) they focus on.
From a linguistic point of view, further areas worth considering in more detail are creative language use (see example (60)) as well as metonymies (see examples (20) and (21) above) and metaphors (see example (61) for a musical metaphor; cf. also Burkhardt 2010; Küster 2010: 32; Lewandowski 2012).

(60) The German fans are ole-ing. (ger_gre_2206_guar)
(61) Martin Olsson setzt sich auf links mit einem tollen Solo gegen Walcott und Johnson durch […] (swe_eng_1506_bild)41

40 This could also include a case study focusing on the linguistic properties and functions of the “twitterese” mentioned above.
41 Translation: Martin Olsson prevails against Walcott and Johnson on the left with a great solo.


While the present study offered a select comparison of German and English OTCs, an analysis including even more OTCs from other languages and intended audiences may help to establish a more fine-grained typology of OTCs worldwide, potentially also considering diachronic developments. In this connection, it remains to be seen whether audience participation, found to be relatively restricted in the present study, will play a more important role in the future and whether further technological developments (e.g. in terms of an integration of TV and OTC reportage) will have an impact on the style of reporting.

References

Bateman, John A. 2012. Multimodal corpus-based approaches. In Carol A. Chapelle (ed.), The encyclopedia of applied linguistics, 3983–3991. Oxford: Wiley-Blackwell.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: CUP.
Brandt, Wolfgang & Regina Quentin. 1983. Zeitstruktur und Tempusgebrauch in Fussballreportagen des Hörfunks [Temporal structure and tense use in radio football reportage]. Marburg: Elwert.
Burkhardt, Armin. 2010. Abseits, Kipper, Tiqui-Taca: Zur Geschichte der Fußballsprache in Deutschland [Offside, keeper, tiki-taka: The history of football language in Germany]. Der Deutschunterricht 62(3). 2–16.
Chovanec, Jan. 2008. Enacting an imaginary community: Infotainment in on-line minute-by-minute sports commentaries. In Eva Lavric, Gerhard Pisek, Andrew Skinner & Wolfgang Stadler (eds.), The linguistics of football, 255–268. Tübingen: Narr.
Chovanec, Jan. 2009. ‘Call Doc Singh’: Textual structure and coherence in live text sports commentaries. In Olga Dontcheva-Navratilova & Renata Povolná (eds.), Coherence and cohesion in spoken and written discourse, 124–137. Newcastle: Cambridge Scholars.
Chovanec, Jan. 2010. Online discussion and interaction: The case of live text commentary. In Leonard Shedletsky & Joan E. Aitken (eds.), Cases on online discussion and interaction: Experiences and outcomes, 234–251. Hershey: IGI Global.
Chovanec, Jan. 2011. Humor in quasi-conversations: Constructing fun in online sports journalism. In Marta Dynel (ed.), The pragmatics of humour across discourse domains, 243–264. Amsterdam: Benjamins.
Dürscheid, Christa. 1999. Zwischen Mündlichkeit und Schriftlichkeit: Die Kommunikation im Internet [Between speech and writing: Communication on the Internet]. Papiere zur Linguistik 60(1). 17–30.
Fairclough, Norman. 1995. Media discourse. London: Arnold.
Ferguson, Charles A. 1983. Sports announcer talk: Syntactic aspects of register variation. Language in Society 12(2). 153–172.
Gerhardt, Cornelia. 2006. Moving closer to the audience: Watching football on television. Revista Alicantina de Estudios Ingleses 19. 125–148.
Ghadessy, Mohsen. 1988. The language of written sports commentary: Soccer – a description. In Mohsen Ghadessy (ed.), Registers of written English: Situational factors and linguistic features, 17–51. London: Pinter.




Golebiowski, Adam. 2012. Wortverschmelzungen und Sportsprache: Zur Kreativität im Wortbildungsbereich [Blends and the language of sport: Creativity in word formation]. In Janusz Taborek, Artur Tworek & Lech Zielinski (eds.), Sprache und Fußball im Blickpunkt linguistischer Forschung [Language and football in the view of linguistic analysis], 51–61. Hamburg: Kovač.
Grieve, Jack, Douglas Biber, Eric Friginal & Tatjana Nekrasova. 2010. Variation among blogs: A multi-dimensional analysis. In Alexander Mehler, Serge Sharoff & Marina Santini (eds.), Genres on the web: Computational models and empirical studies, 303–322. Dordrecht: Springer.
Hennig, Mathilde. 2000. Tempus und Temporalität in geschriebenen und gesprochenen Texten [Tense and temporality in written and spoken texts]. Tübingen: Niemeyer.
Höke, Susanne. 2007. Sun vs. Bild: Boulevardpresse in Großbritannien und Deutschland [Sun vs. Bild: Yellow press in Great Britain and Germany]. Saarbrücken: VDM.
Jucker, Andreas. 2005. News discourse: Mass media communication from the seventeenth to the twenty-first century. In Janne Skaffari, Matti Peikola, Ruth Carroll, Risto Hiltunen & Brita Warvik (eds.), Opening windows on texts and discourses of the past, 7–21. Amsterdam: Benjamins.
Jucker, Andreas. 2006. Live text commentaries: Read about it while it happens. In Jannis K. Androutsopoulos, Jens Runkehl, Peter Schlobinski & Torsten Siever (eds.), Neuere Entwicklungen in der linguistischen Internetforschung [Recent developments in linguistic internet research], 113–131. Hildesheim: Olms.
Jucker, Andreas. 2010. ‘Audacious, brilliant!! What a strike!’ Live text commentaries on the internet as real-time narratives. In Christian R. Hoffmann (ed.), Narrative revisited: Telling a story in the age of new media, 57–77. Amsterdam: Benjamins.
Krone, Maike. 2005. The language of football: A contrastive study of syntactic and semantic specifics of verb usage in English and German match commentaries. Stuttgart: Ibidem.
Küster, Rainer. 2010. ‘Im Tabellenkeller brennt noch Licht’: Metaphern in der Fußballsprache [At the bottom of the table there’s still some light: Metaphors in football language]. Der Deutschunterricht 62(3). 26–37.
Levin, Magnus. 2008. ‘Hitting the back of the net just before the final whistle’: High-frequency phrases in football reporting. In Eva Lavric, Gerhard Pisek, Andrew Skinner & Wolfgang Stadler (eds.), The linguistics of football, 143–155. Tübingen: Narr.
Lewandowski, Marcin. 2012. Football is not only war: Non-violence conceptual metaphors in English and Polish soccer language. In Janusz Taborek, Artur Tworek & Lech Zielinski (eds.), Sprache und Fußball im Blickpunkt linguistischer Forschung [Language and football in the view of linguistic analysis], 79–96. Hamburg: Kovač.
Müller, Torsten. 2007. Football, language and linguistics: Time-critical utterances in unplanned spoken language, their structures and their relation to non-linguistic situations and events. Tübingen: Narr.
Newsworks. 2013a. The Guardian. http://www.newsworks.org.uk/The-Guardian (accessed 20 April 2013).
Newsworks. 2013b. The Sun. http://www.newsworks.org.uk/The-Sun (accessed 20 April 2013).
O’Keeffe, Anne. 2006. Investigating media discourse. London: Routledge.
Perez-Sabater, Carmen, Gemma Pena-Martinez, Ed Turney & Begona Montero-Fleta. 2008. A spoken genre gets written: Online football commentaries in English, French, and Spanish. Written Communication 25(2). 235–261.


Press Gazette. 2013. UK national newspaper sales: Relatively strong performances from Sun and Mirror. http://www.pressgazette.co.uk/uk-national-newspaper-sales-relatively-strong-performances-sun-and-mirror (accessed 21 May 2013).
Rayson, Paul. 2008. From key words to key semantic domains. International Journal of Corpus Linguistics 13(4). 519–549.
Santini, Marina, Alexander Mehler & Serge Sharoff. 2010. Riding the rough waves of genre on the web: Concepts and research questions. In Alexander Mehler, Serge Sharoff & Marina Santini (eds.), Genres on the web: Computational models and empirical studies, 3–30. Dordrecht: Springer.
Schmidt, Thomas. 2007. The Kicktionary: A multilingual resource of the language of football. In Georg Rehm, Andreas Witt & Lothar Lemnitzer (eds.), Data structures for linguistic resources and applications, 189–196. Tübingen: Narr.
Siever, Torsten. 2011. Texte i. d. Enge: Sprachökonomische Reduktion in stark raumbegrenzten Textsorten [Constricted texts: Language-economical reduction in heavily space-constrained text types]. Frankfurt am Main: Lang.
Simons, Anton. 2011. Journalismus 2.0 [Journalism 2.0]. Konstanz: UVK.
Thurman, Neil & Anna Walters. 2013. Live blogging: Digital journalism’s pivotal platform. Digital Journalism 1(1). 82–101.
Wells, Matt. 2011. How live blogging has transformed journalism: The benefits and the drawbacks of the open-to-all digital format. http://www.guardian.co.uk/media/2011/mar/28/live-blogging-transforms-journalism (accessed 13 April 2013).

Appendix

Associated files: guar = The Guardian; sun = The Sun; bild = Bild; spon = Der Spiegel. Commentators are listed where available.

Match                   Match day    Commentators (if available)                                  Associated files
Germany – Portugal      09/06/2012   GUAR: N/A; SUN: N/A; BILD: N/A; SPON: Christian Paul         ger_por_0906_xx
Germany – Netherlands   13/06/2012   GUAR: N/A; SUN: N/A; BILD: N/A; SPON: Jan Reschke            ger_ned_1306_xx
Germany – Denmark       17/06/2012   GUAR: Ian McCourt; SUN: N/A; BILD: N/A; SPON: Mike Glindmeier   ger_den_1706_xx
Germany – Greece        22/06/2012   GUAR: Rob Smyth; SUN: N/A; BILD: N/A; SPON: Lukas Rilke      ger_gre_2206_xx
Germany – Italy         28/06/2012   GUAR: N/A; SUN: N/A; BILD: N/A; SPON: Mike Glindmeier        ger_ita_2806_xx
France – England        11/06/2012   GUAR: Scott Murray; SUN: N/A; BILD: N/A; SPON: Christian Paul   fra_eng_1106_xx
Sweden – England        15/06/2012   GUAR: Jacob Steinberg; SUN: N/A; BILD: N/A; SPON: N/A        swe_eng_1506_xx
Ukraine – England       19/06/2012   GUAR: Barry Glendenning; SUN: N/A; BILD: N/A; SPON: N/A      ukr_eng_1906_xx
Italy – England         24/06/2012   GUAR: N/A; SUN: N/A; BILD: N/A; SPON: Mike Glindmeier        ita_eng_2406_xx
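The file identifiers in the appendix appear to follow a regular pattern: two team codes, the match day as DDMM, and a source code that replaces the placeholder xx (as in ger_por_0906_bild). Assuming this reading of the naming scheme is correct, an identifier can be unpacked as sketched below. The class `MatchFile`, the function `parse_file_id` and the default year are illustrative assumptions, not part of the corpus documentation.

```python
from dataclasses import dataclass
from datetime import date

# Source codes as glossed in the appendix.
SOURCES = {
    "guar": "The Guardian",
    "sun": "The Sun",
    "bild": "Bild",
    "spon": "Der Spiegel",
}


@dataclass(frozen=True)
class MatchFile:
    first_team: str   # e.g. "ger"
    second_team: str  # e.g. "por"
    match_day: date
    source: str       # key of SOURCES


def parse_file_id(file_id: str, year: int = 2012) -> MatchFile:
    """Split an identifier such as 'ger_por_0906_bild' into its parts.

    The year is not encoded in the identifier; 2012 is assumed here
    because all match days listed in the appendix fall in June 2012.
    """
    first_team, second_team, ddmm, source = file_id.split("_")
    if source not in SOURCES:
        raise ValueError(f"unknown source code: {source!r}")
    day, month = int(ddmm[:2]), int(ddmm[2:])
    return MatchFile(first_team, second_team, date(year, month, day), source)
```

The placeholder form ending in xx is rejected on purpose, since it stands for a set of files rather than a single commentary.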

Javier Pérez-Guerra

Word order is in order here: A diachronic register analysis of syntactic markedness in English

Abstract: In line with multidimensional proposals under which registers can be stylistically and/or situationally defined by paying attention to the frequency of a selection of linguistic features, this study explores the connection between syntactic markedness at the level of the clause and stylistic characterisation in a number of registers in the history of English. In particular, this chapter investigates three syntactic constructions leading to syntactically marked clausal designs which do not conform to subject-verb-complement word order: left dislocation, topicalisation and subject-inversion/extraposition. The data, retrieved from multi-register parsed corpora, show that the distribution of these constructions correlates with the degree of stylistic specificity and conventionalisation of the registers. Specifically, those registers in which these constructions are particularly frequent feature more specific situational or stylistic choices related to literacy or subject-/participant-involvement. Of the three constructions, topicalisation has proved to have the least radical consequences for the syntax of the clause, and this correlates with its even distribution across registers.

1 Introduction1

1 I am grateful to the following institutions for generous financial support: the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund (grant no. FFI2013-44065-P), and the Autonomous Government of Galicia (grant no. GPC2014/060).

Javier Pérez-Guerra, University of Vigo

The linguistic analysis of registers/genres/text types in a language has always been controversial, possibly because of the intangible status of such key concepts (see Schubert, this volume). As Swales (1990: 33) points out when he refers specifically to genres, “[t]he word [‘genre’] is highly attractive – even to the Parisian timbre of its normal pronunciation – but extremely slippery”. A first terminological remark thus seems in order here as regards the definition of ‘register’, which constitutes the research topic in this study. Following, for example, Taavitsainen (2001), who maintains that genres are based on “external evidence in the context of culture” (140; my italics), where “external evidence” refers to the conventions that have become institutionalised “so that they can function […] as ‘horizons of expectation’ for readers to know what to expect and models of writing for authors” (141), I will use ‘genre’ when I refer exclusively to the cultural and/or social dimension of a given textual category. ‘Register’ will be used here with a focus on the way in which the internal linguistic features of texts are codified in a given text or category of texts, which matches Taavitsainen’s (2001: 141) term ‘text type’. Even though text types and genres commonly go hand in hand since the linguistic characterisation of a textual category prototypically leads to the latter’s conventionalisation and specialisation in fulfilling a certain discursive, communicative or social function, Taavitsainen herself recalls Fairclough’s (1992: 126) claim that a “genre [on occasions] implies not only a particular text type, but also particular processes of producing, distributing and consuming texts”, which broadens the notion of genre and covers elements which lie beyond the scope of this chapter.

Such lack of definition of concepts such as register, genre or text type has led to multi-faceted studies in this area, adopting a number of different theoretical frameworks. On some occasions, linguists have addressed the linguistic analysis of registers by focusing on the core or prototypical communicative purposes attributed to these in (quite often traditional) stylistics. For example, Swales (1990: 46) notes that “[t]he principal criterion that turns a collection of communicative events into a [register] is some shared set of communicative purposes”.
In Halliday’s (1978: 122) Systemic Functional Grammar, registers (genres, in their terminology) are analysed in terms of three variables: their content (or ‘field’), the participants (‘tenor’) and the channel of communication (‘mode’), that is, three dimensions which focus on the communicative elements and purposes involved in a given register. On other occasions, in an approach that will be used in the present chapter, the study of registers has been addressed through focusing on empirically-observable stylometric features (e.g. type-token ratios, length of syllables, words, sentences, paragraphs) which are themselves said to reflect higher-level concepts such as lexical or syntactic complexity, lexical richness and ornamentation, etc. In Biber and Conrad (2009) the two basic approaches just summarised, which I refer to, respectively, as the ‘communicative’ and the ‘language-based’ views, are embodied in a taxonomy which identifies three perspectives on text varieties (see, for a brief overview, their Table 1.1): (i) style, which analyses aesthetic and authorial preferences in a given text or group of texts; (ii) genre, which focuses on the conventional linguistic devices specific to



Diachronic register analysis of markedness 


a text variety (e.g. ‘genre markers’ such as Dear Sir in a letter); and (iii) register, which, as already pointed out, deals with the linguistic characteristics common within a text variety, and also with the situation of use of the variety, as will be argued later. The taxonomy is described in more detail in Dorgeloh (this volume; Section 2 in particular) and Schubert (this volume).

So far I have equated register with the language-based characterisation of a given textual category. In this scenario, a further dimension of register must be brought into play. In line with previous proposals couched in the multidimensional tradition, Biber and Conrad (2009: 6) claim that the linguistic characteristics of the textual categories, materialised by means of pervasive and frequent linguistic features, are “well suited to the purposes and situational context of the register”. That said, this chapter adheres to such a twofold view of text varieties, that is, both language-based and situational, and, within a register-centred approach (as suggested in, for example, Biber 1995a: 1), focuses on the study of a number of texts in an attempt to explore register variation over the course of the history of English. On the one hand, I will describe a number of textual categories by exploring their dependency on a list of structural features, thus adhering to what is commonly understood by ‘text type’, that is, “grouping of texts that are similar in their linguistic form” (Biber 1988: 170) or, in other words, codifications of linguistic features (Taavitsainen 2001: 141). On the other hand, I will connect the language-based characteristics of the texts with their situational interpretation, thus accepting, for example, Virtanen’s (2010: 57) claim that such linguistic features “clearly relate to the form that [discourse functions] will take through aggregates of linguistic exponents of the particular text strategies that are associated with them”.
The situational interpretation (or, more precisely, the functional interpretation) of the linguistic characteristics of a given text type will lead to the latter’s status as a ‘register’, in Biber’s terminology. This approach departs from, for example, Dorgeloh and Wanner’s (2010: 10) terminological account, summarised in Figure 1, where ‘register’ is used as a cover term for text type, genre and style, and sticks to a twofold characterisation of register which mainly comprises both text type and genre in Dorgeloh and Wanner’s sense.


Figure 1: Register, text type, genre and style in Dorgeloh and Wanner (2010)

This chapter will focus on register variation and, more specifically, on the relevance of syntax for this issue. In this respect, Dorgeloh and Wanner (2010) observe that register is “language variation beyond the limits of semantic equivalence, which is why syntax […] provides a promising area of study” (8) and that “[i]t is form, and here morphosyntactic form in particular, that constitutes ‘a prior condition for reasoning about [register]’” (9). In this scenario, under the philosophy of Biber’s (1988, 1995a) groundbreaking multifactorial, multidimensional model, this study will combine the main approaches to the analysis of registers already mentioned, that is, communicative and more language-based (syntactic) standpoints, in that findings from the latter will be associated with a corresponding functional interpretation (or dimensional interpretation, as Biber puts it). In other words, by investigating the spread of a number of objectively identified linguistic constructions in a selection of registers, and by interpreting the statistical results of (co-)occurrence, this study will not only shed some light on the functional interpretation of registers but also detect diachronic variation across them. Furthermore, this chapter will suggest some kind of link between syntactic markedness and the degree of (functional) conventionalisation or specialisation of registers.

This chapter, then, focuses on the analysis of registers in English while also describing variation in the recent history of the language. It also aims to consider the application of some of the assumptions of Biber’s model to syntactic strategies at a supra-phrasal level. In Section 2, I will very briefly summarise the features of the multidimensional model which constitutes the inspiration for the study, and then present the case study and its specific methodology. The results are discussed in Section 3. Section 4 offers a summary of the investigation plus some suggestions for further avenues for research.




2 The case study

Biber’s model, which has inspired this study, is based on three theoretical assumptions, summarised in Schubert (this volume) and recapped here only for introductory purposes: (i) the distinctive characteristics of a register are derived from inherent tendencies affecting the statistical productivity of a number of linguistic features; (ii) the patterns of these (co-)occurring features portray underlying dimensions of variation on which texts differ significantly; and (iii) these dimensions can be interpreted in terms of the social, situational and text-functional roles that their constitutive features have been found to play in previous research. As summarised in Biber (1995b), the sixty-seven features used in the first applications of the model belonged to different fields of linguistic analysis: syntactic (causal subordination, coordination, deletion of complementiser that, wh subject relativisers, pied-piped prepositions, stranded prepositions, participial adverbial clauses), grammatical (morphosyntactic categories such as nouns, adjectives, prepositions, demonstratives) and lexical categories (hedges, amplifiers, emphatics), as well as other metrics such as word length and type/token ratio. As noted above, the factors or families of features lead to the dimensions which are interpreted situationally.

Biber and Conrad (2009: 51) established the pillars of the methodology: the need for a comparative approach, for quantitative analysis and for a representative sample. First, as regards the comparative approach, this study investigates three syntactic constructions, described in Section 2.1, by assessing two variables which will allow for comparison and contrast: diachrony and register. Second, the need for quantitative analysis has been accomplished by the empirical methodology described in Section 2.2.
Third, the whole survey is driven by data retrieved from multi-register balanced corpora, as a means of attaining empirical representativeness and significance.
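To compare constructions across registers and periods, raw counts are usually converted to rates relative to text length. The sketch below assumes normalisation per 1,000 words, a common practice in corpus work that this section does not itself spell out. All names are hypothetical, and the figures are placeholders invented for illustration; they are not results from this chapter.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (register, period)


def per_thousand(count: int, words: int) -> float:
    """Normalised frequency: occurrences per 1,000 words."""
    if words <= 0:
        raise ValueError("word total must be positive")
    return 1000.0 * count / words


def normalise(counts: Dict[Key, Dict[str, int]],
              words: Dict[Key, int]) -> Dict[Key, Dict[str, float]]:
    """Convert raw construction counts to rates per 1,000 words."""
    return {
        key: {cx: per_thousand(n, words[key]) for cx, n in by_cx.items()}
        for key, by_cx in counts.items()
    }


# Placeholder numbers invented for illustration only.
toy_counts = {
    ("letters", "period A"): {"TOP": 12, "LFD": 3, "SUBJ-LAST": 20},
    ("statutes", "period A"): {"TOP": 10, "LFD": 1, "SUBJ-LAST": 45},
}
toy_words = {("letters", "period A"): 10_000, ("statutes", "period A"): 20_000}

rates = normalise(toy_counts, toy_words)
```

Keying the table by (register, period) keeps the two comparison variables named above, register and diachrony, explicit in the data structure.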

2.1 The linguistic variables

The study outlined in this chapter reports on a construction-driven analysis of historical registers in English by looking at supra-phrasal variables or features which have not thus far been explored in the literature. As pointed out in Section 1, early studies by Biber and his colleagues (and practically all subsequent studies derived from these) are based on counts of lexical features. In fact, even those syntactic features which operate at the clause- or sentence-level were singled out by computing the frequency of lexical items such as complementiser that, specific (causal) conjunctions, members of the closed set of English prepositions, relativisers which, who, etc. In this chapter, and this makes this study particularly innovative, I concentrate on syntactic supra-phrasal variables, specifically word-order phenomena, which cannot be determined by focusing on the occurrence ratios of specific lexical elements. Following the multidimensional model, these will be given so-called social or functional interpretations which will pave the way for the detection of diachronic variation in English as far as sentence linearisation is concerned.

As regards the variables to be analysed here, I have focused on syntactic markedness at the level of the clause. From (at least) a statistical standpoint, the default organisational schema of a declarative clause in English is subject-verb-(complement), this being the most versatile design of the clause from the point of view of information structure and processing. Deviation from such a schema implies some degree of markedness. In particular, in what follows I will focus on three syntactic strategies which, first, lead to marked designs as far as word order is concerned and, second, involve elements other than the subject in sentence-initial position. Since this methodology aims to determine not only strictly linguistic but also social or situational variation in the language, I will follow Virtanen (2004: 12) in her claim that “the sentence-initial slot itself constitutes a rich source of discourse meanings precisely because of its cognitive relevance for our processing capacities and memory constraints”.
The three constructions are:
(i) Topicalisation (TOP), in which a (marked) constituent is in sentence-initial position ‒ example (1) below illustrates the topicalisation of the that-clause object that I had received such from Edward,
(ii) Left dislocation (LFD), with a (marked) non-argument constituent in sentence-initial position ‒ in (2), the constituent he that thynkethe it a harde thynge to agre to the conclusion is a left-dislocated noun phrase which corefers with the pronominal object hym in the ensuing main clause,
(iii) What I call other ‘subject-last’ strategies (SUBJ-LAST), which contain (marked) non-subject constituents in sentence-initial pre-verbal position. The SUBJ-LAST strategy comprises basically those examples of subject-verb inversion and subject-extraposition ‒ example (3) below illustrates subject-verb inversion, with the subject complement very great in sentence-initial position and the subject following the verb; example (4), in which the that-clause that for x. yeres then next folowyng sevãll Comyssions of Sewers shuld be made to dyv~s p~sones functions as the (logical) subject of the sentence and occurs in sentence-final position, involving the insertion of expletive it in sentence-initial (preverbal) position, exemplifies subject-extraposition.2

Diachronic register analysis of markedness 

 313

(1) [That I had received such from Edward]i also I need not mention ∆i (Austen-180X,187.621) [TOP]
(2) […] but [he that thynkethe it a harde thynge to agre to the conclusion,]i it behoueth hymi to shew eyther that some false thynge hath gone before, (BOETHCO-E1-H,99.610) [LFD]
(3) […] and very great was [my pleasure in going over the house and grounds]Subject. (Austen-180X,168.182) [SUBJ-LAST, subject inversion]
(4) yt was enacted ordeigned and graunted by auctorite of the same p~liament, [that for x. yeres then next folowyng sevãll Comyssions of Sewers shuld be made to dyv~s p~sones]Subject, (Statutes(II):524) [SUBJ-LAST, subject extraposition]

As already pointed out, the strategies TOP, LFD and the so-called SUBJ-LAST constructions investigated here have been chosen because they are syntactically marked since they do not comply with the default subject-verb(-complement) design. In particular, their markedness is basically due to the location occupied by the subjects, which are not clause-initial when constituents are topicalised (TOP) or left-dislocated (LFD), when verbs and subjects swap positions (subject-verb inversion, a type of SUBJ-LAST) or when the subjects are placed in clause-final position (subject-extraposition, another instance of the SUBJ-LAST construction). Since subject placement is the trigger for these constructions, in line with the above-mentioned consequences which the unmarked placement of the subject has for the processing and interpretation of clauses and sentences, in what follows I will provide a very brief overview of the informative and/or communicative properties of the strategies TOP, LFD and SUBJ-LAST.

First, TOP merits attention in register analysis because this syntactic strategy involves a specific arrangement of the clause which is not only syntactic but also informative. Following Virtanen (2004: 80–82) [my italics],

Starting points are assumed to be light, small in size, and consist of given information. The reader’s main inferencing effort is expected to take place later in the sentence […]. Secondly, elements placed at the outset of a sentence also help readers anticipate what is to come as they pinpoint what the sentence is about and how it relates to the discourse topic (…). Furthermore, it is occasionally profitable to start with what is regarded as ‘crucial information’

2 TOP, LFD and the constructions within the frame of the SUBJ-LAST strategy have been approached from different perspectives in, for example, Virtanen’s (2010) qualitative scrutiny of sentence openers in narrative texts and in Kreyer’s (2010) paper on sentence-initial locatives in inversion constructions, in which a qualitative perspective on the description of the so-called ‘immediate-observer effect’ function is adopted.


[…] Sentence-initial adverbials […] tend to form chains of text-strategic markers which have two basic functions in the discourse. They help create coherence and at the same time they signal text segmentation.

Virtanen thus summarises the informative function of TOP, that is, introducing constituents which do not convey given information in a position which is reserved for given elements according to the given-new principle. This analysis of TOP is in keeping with Prince (1981: 128), who highlights the salient status of topicalised constituents. Prince claims that TOP implies “inference on the part of the hearer that the entity represented by the initial NP stands in a salient partially-ordered set relation to some entity or entities already evoked in the discourse-model”. Furthermore, she contends that “if the entity evoked by the leftmost NP represents an element of some salient set, make the set-membership explicit”.

Second, the discourse functions which have been attributed to LFD in the literature can be reduced to two: (i) a ‘simplifying’ function, according to which a constituent conveying discourse-new information can be placed in sentence-initial position, and (ii) a ‘poset’ function. As for the simplifying function, Prince (1997: 138–139) contends that LFD can “simplify discourse processing by removing a Discourse-new entity from a position in the clause which favors Discourse-old entities, replacing it with a Discourse-old entity (i.e. a pronoun)”. In the same vein, Gundel (1985) and Geluykens (1992) claim that LFD introduces a new topic into discourse. On the other hand, Prince (1997: 138–139) maintains that sentences containing left-dislocated phrases “trigger an inference that the entity represented by the initial NP stands in a salient partially-ordered set relation to some entities already in the discourse-model”, and that this favours the so-called poset function. In other words, the left-dislocated constituent resumes a number of referents previously evoked in the sentence by introducing a new expression which activates earlier (thus, informatively given or old) referents. 
In short, like TOP, LFD implies the placement of a new constituent in sentence-initial position, the main difference between TOP and LFD being that the former selects an extralinguistic referent already evoked in discourse and marks it as informatively salient, whereas LFD constituents seldom refer to topics which have already been introduced in the discourse. Third, as already stated, the SUBJ-LAST constructions involve examples of subject-verb inversion and subject-extraposition, illustrated, respectively, in (3) and (4) above. As regards subject-verb inversion, it is commonly acknowledged in the literature (e.g. Green 1980: 583; Birner 1994: 241; Dorgeloh 1997: 46) that the informative principle given-new is not at work in subject-verb inversion, since the preverbal constituent conveys information which is salient in the discourse,




whereas the subject is informatively anti-prominent or, in other words, materialises referents which have already been evoked. In fact, Takahashi (1992: 138) contends that subject-verb inversion fulfils a “Subtopically-Presentational-Focus-emphasizing function”, that is, it accommodates (discourse-new) presentational constituents in sentence-initial position and relegates to sentence-final or postverbal position discourse-given grammatical subjects. Bolinger (1992: 294) emphasises the focusing or presentational effect of inversion when he says that it locates the informatively non-prominent subject almost physically ‘on-stage’.

The second SUBJ-LAST construction considered in this chapter is subject-extraposition. Its function is claimed to be different from that of subject-verb inversion (see, for instance, McCawley 1988), since, as a newness device, subject-extraposition accommodates informatively new subjects in final position, thus keeping track of given-new. However, the empirical analysis of extraposed subjects from Late Middle to Present-Day English in Pérez-Guerra (2005: 349–350) shows that information structure is not a decisive factor in explaining subject-extraposition since 60 to 70 percent of the extraposed subjects in this study are informatively referring and the information conveyed by sentence-medial constituents (mostly subject complements) in the examples of subject-extraposition is less referring in nature than that carried by the extraposed subjects.3 In consequence, it can be concluded that both subject-verb inversion and subject-extraposition are mostly new-given constructions and can be subsumed under SUBJ-LAST in the present approach.

This section has provided a basic characterisation of LFD, TOP and SUBJ-LAST in terms of information structure. The syntactically marked organisation of these constructions correlates with their deviation from entrenched informative rules such as given-new. 
In short, informatively new and/or salient constituents are placed sentence-initially in LFD, TOP and SUBJ-LAST structures, where one would expect elements conveying given information, and informatively given subjects are preferred in postverbal and/or final position in the SUBJ-LAST construction type.

3 The data in Pérez-Guerra (2005: 350) confirm that the determinant of subject-extraposition is not end-focus but end-weight. The strategy of extraposition is, then, redistributional in the sense that its main role is to place long clausal subjects in final position and thus preserve the unmarked subject-verb(-complement) pattern from having non-prototypical material in sentence-initial position.


2.2 The data and the methodology

The data for the present study were retrieved from the following corpora:
– the Penn-Helsinki Parsed Corpus of Middle English, second edition (1150–1500; henceforth PPCME2; Kroch and Taylor 2000),
– the Penn-Helsinki Parsed Corpus of Early Modern English (1500–1710; PPCEME; Kroch et al. 2004),
– the Penn Parsed Corpus of Modern British English (1700–1914; PPCMBE; Kroch et al. 2010).

The periods to be investigated are Middle (ME), Early Modern (EModE) and Late Modern English (LModE), that is, the periods following the initiation of the process of word-order syntacticisation or fixation in English around the default pattern subject-verb(-complement) in declarative clauses. These corpora were selected because, first, they are multi-register and, as noted above, this accommodates the need for representativeness. Second, they are parsed corpora following (almost) identical parsing conventions. These make use of part-of-speech and syntactic tagsets based on what we might call a shallow version of Principles-and-Parameters. To give an example from the corpora, (5’) plots the graphical adaptation of the parsed version of the sentence in (5) from PPCMBE:

(5) a serious cheerfulness; that is the right mood in this as in all cases. (CARLYLE1835,2,278.374)

(5’) ( (1 IP-MAT (2 NP-LFD (3 D a) (5 ADJ serious) (7 N cheerfulness))
                 (9 , ;)
                 (11 NP-SBJ-RSP (12 D that))
                 (14 BEP is)
                 (16 NP-OB1 (17 D the) (19 ADJ right) (21 N mood))
                 (23 PP (24 P in)
                        (26 NP (27 D this)
                               (29 PP (30 P as)
                                      (32 PP (33 P in)
                                             (35 NP (36 Q all) (38 NS cases))))))
                 (40 . .))

(5’) includes part-of-speech tagging (e.g. lexical morphosyntactic categories such as D(eterminer), ADJ(ective), N(oun) or P(reposition)) and syntactic annotation (e.g. phrasal categories such as IP for Inf(lection) phrase ‒ basically corresponding in the Principles-and-Parameters model to the category clause ‒, NP for noun phrase and PP for prepositional phrase, as well as functional labels such as OB1




for object, LFD for left-dislocated constituent and RSP for resumptive, that is, the proform which corefers in the clause with the left-dislocated material).

LFD is parsed as such in the corpora, which means that the data can be retrieved automatically by means of specific software. In this case, the raw empirical results of the search had to undergo extensive manual revision. Thus LFD was retrieved by means of the (CorpusSearch) query in (6), which identifies clauses (or IPs) dominating left-dislocated constituents.

(6) node: IP*
    query: (IP* Doms *-LFD)
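The retrieval logic behind (6) can be approximated outside CorpusSearch. What follows is only a minimal Python sketch and not the software used for this study: it assumes a simplified bracketing of (5’) without the numeric node indices, and the helper names `parse`, `descendants` and `clauses_with_lfd` are hypothetical. Shell-style wildcards from the standard-library `fnmatch` module stand in for the `*` of the query.

```python
from fnmatch import fnmatchcase

def parse(text):
    """Turn a Penn-style bracketed string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # skip the opening bracket
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())   # embedded constituent
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1                          # skip the closing bracket
        return (label, children)

    return node()

def descendants(tree):
    """Yield every constituent dominated by `tree`, at any depth."""
    for child in tree[1]:
        if isinstance(child, tuple):
            yield child
            yield from descendants(child)

def clauses_with_lfd(tree):
    """Return the IP* nodes dominating a *-LFD node (cf. 'IP* Doms *-LFD')."""
    nodes = [tree, *descendants(tree)]
    return [n for n in nodes
            if fnmatchcase(n[0], "IP*")
            and any(fnmatchcase(d[0], "*-LFD") for d in descendants(n))]

# Simplified version of (5'), node indices removed (an assumption of this sketch).
SENTENCE = ("(IP-MAT (NP-LFD (D a) (ADJ serious) (N cheerfulness)) (, ;) "
            "(NP-SBJ-RSP (D that)) (BEP is) "
            "(NP-OB1 (D the) (ADJ right) (N mood)) "
            "(PP (P in) (NP (D this) (PP (P as) (PP (P in) "
            "(NP (Q all) (NS cases)))))) (. .))")

print([label for label, _ in clauses_with_lfd(parse(SENTENCE))])
```

A sketch of this kind only reproduces the automatic step; the manual revision of the hits described above is not modelled.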

A very small number of examples of LFD in my database are not nominal,4 as is the case in (7) below, which contains a left-dislocated prepositional phrase and a resumptive pronoun governed by a preposition in the main clause:

(7) But of the tree of the knowledge of good and euill, thou shalt not eate of it: (AUTHOLDE2-H,II,1G.155)

By contrast, many of the examples parsed as LFD in the corpora which contain non-(pro)nominal resumptives have not been considered in this study. Examples of such constructions are given in (8) to (10), in which the resumptives are, respectively, then, yet and so:

(8) […] but if it worke vpon it selfe, as the Spider worketh his webbe, then it is endlesse, (BACON-E2-H,1,20R.49)
(9) […] and though he suffer’d only the name of a slave, and had nothing of the toil and labour of one, yet that was sufficient to render him uneasy; (BEHN-E3-H,193.231)
(10) And as these Languages ought to be well understood, so they shou’d be learn’d in as short a Time as may be. (ANON-1711,3.6)

4 A few examples from the database contain TOP and LFD of that-clauses. As regards LFD, since such that-clauses are resumed by a (pro)nominal copy, they fit the concept of LFD as established in this study. An example of a left-dislocated that-clause is given in (i):
(i) [That false Locks as they call them of some Hair, being by curling or otherwise brought to a certain degree of driness, or of stiffness, will be attracted by the flesh of some persons, or seem to apply themselves to it, as Hair is wont to do to Amber or Jet excited by rubbing.]i Of thisi I had a Proof in such Locks worn by two very Fair Ladies that you know. (BOYLE-E3H,27E.93)

As regards TOP, which was not specifically tagged in the corpora used here, the CorpusSearch queries in (11) and (12) were used to retrieve examples, respectively, of topicalised complements (more specifically, nominal objects, subject predicatives5 and prepositional/adverbial complements6 occurring before nominal subjects) and adjuncts (prepositional and adverb phrases preceding nominal subjects). As already pointed out, some of the examples retrieved by the queries had to be excluded manually, since they were not correct instantiations of TOP.

(11) node: IP-MAT*
     query: (IP-MAT* iDoms NP-OB*|NP-SPR) AND (IP-MAT* iDoms NP-SBJ*) AND (NP-OB*|NP-SPR precedes NP-SBJ*)
(12) node: IP-MAT*
     query: (IP-MAT* iDoms PP*|ADVP*) AND (IP-MAT* iDoms NP-SBJ*) AND (PP*|ADVP* precedes NP-SBJ*)
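The precedence condition shared by (11) and (12) can be illustrated with a short Python sketch. It is an approximation under stated assumptions rather than a reimplementation of CorpusSearch: a matrix clause is reduced to the list of labels of its immediate daughters, the `|` in the queries is read as a choice between alternatives, and the function names and the toy label sequences are hypothetical.

```python
from fnmatch import fnmatchcase

COMPLEMENTS = ("NP-OB*", "NP-SPR")   # fronted material targeted by (11)
ADJUNCTS = ("PP*", "ADVP*")          # fronted material targeted by (12)

def matches_any(label, patterns):
    """True if the label matches at least one wildcard pattern."""
    return any(fnmatchcase(label, p) for p in patterns)

def fronted_before_subject(daughters, fronted, subject="NP-SBJ*"):
    """True if a daughter matching `fronted` precedes the first subject."""
    for i, label in enumerate(daughters):
        if fnmatchcase(label, subject):
            return any(matches_any(d, fronted) for d in daughters[:i])
    return False                      # no subject among the immediate daughters

# Toy daughter sequences (invented for illustration, not corpus data).
print(fronted_before_subject(["NP-OB1", "NP-SBJ", "VBD"], COMPLEMENTS))
print(fronted_before_subject(["NP-SBJ", "VBD", "NP-OB1"], COMPLEMENTS))
```

The sketch only looks at the first subject daughter; how the real `precedes` function treats clauses with several matching nodes is not modelled here, which is one more reason why hits of this kind need manual checking.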

Finally, with respect to SUBJ-LAST, the CorpusSearch query in (13) retrieved matrix IPs or clauses containing at least the following two immediate constituents: sentence-final noun phrases functioning as subjects and pronominal (expletive) subjects.

(13) node: IP-MAT*
     query: (IP-MAT* iDomsLast NP-SBJ) AND (NP-SBJ iDoms !PRO)

Table 1 provides the raw figures of the distribution of the three constructions under analysis (the TOP data in Table 1 only includes topicalised complements for reasons which will be explained below). Figure 2 sets out the frequencies for LModE normalised to 1,000 clauses (or IPs):

5 An (archaic) illustration of a clause introduced by a topicalised object predicative is Male and female created he them (ERV-OLD-1885,1,20G.66).

6 My database includes only a small number of examples of topicalised prepositional complements (in (i)) and adverbial complements (in (ii)):
(i) To them may be applied what St. James says on a like occasion (BURTON-1762,2,5.116)
(ii) In the inward Frame the various Passions, Appetites, Affections, stand in different Respects to each other. (BUTLER-1726,235.69)




Table 1: Totals of LFD, TOP and SUBJ-LAST constructions from ME to LModE

           LFD      TOP      SUBJ-LAST   Clauses
PPCME2     1,638    1,878    2,989        74,092
PPCEME       575      359      611        34,896
PPCMBE       369      352      677        60,100
Total      2,582    2,589    4,277       169,088
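The normalisation to 1,000 clauses can be reproduced from the raw totals in Table 1. The following Python sketch is offered only as an illustration: the dictionary layout and the function names are assumptions of this example, and the resulting figures are corpus-wide rates, which need not coincide with the means over registers reported in Tables 2 to 4.

```python
# Raw totals transcribed from Table 1.
TABLE_1 = {
    "PPCME2": {"LFD": 1638, "TOP": 1878, "SUBJ-LAST": 2989, "Clauses": 74092},
    "PPCEME": {"LFD": 575, "TOP": 359, "SUBJ-LAST": 611, "Clauses": 34896},
    "PPCMBE": {"LFD": 369, "TOP": 352, "SUBJ-LAST": 677, "Clauses": 60100},
}
CONSTRUCTIONS = ("LFD", "TOP", "SUBJ-LAST")

def per_thousand(hits, clauses):
    """Frequency of a construction normalised to 1,000 clauses (IPs)."""
    return 1000 * hits / clauses

def normalise(table):
    """Normalised frequency of each construction in each corpus."""
    return {corpus: {c: per_thousand(row[c], row["Clauses"])
                     for c in CONSTRUCTIONS}
            for corpus, row in table.items()}

def column_total(table, column):
    """Sum of one column over the three corpora (the 'Total' row)."""
    return sum(row[column] for row in table.values())

for corpus, rates in normalise(TABLE_1).items():
    print(corpus, {c: round(rate, 2) for c, rate in rates.items()})
```

Recomputing the column totals in the same way offers a quick consistency check on the transcription of the table.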

Figure 2: Normalised frequencies of LFD, TOP and SUBJ-LAST constructions in LModE

Since, as Figure 2 shows, the frequencies of topicalised adjuncts (TOP_adj), as in (14) below, and of complements (TOP_compl), in (1) above, differ greatly, I have opted for focusing exclusively on topicalised complements, whose proportion is closer and thus comparable to that of the LFD and the SUBJ-LAST constructions. In this vein, since the criterion for the distinction between complement and adjunct is syntactic (and semantic) selection by the verb, in what follows I will consider only those examples of topicalised constituents which are subcategorised by the verb (e.g. objects, prepositional complements, adverbial complements, predicative complements). (14) [After that a childe is come to seuen yeres of age,]Adjunct I holde it expedient that he be taken from the company of women (ELYOT-E1-H,23.27)


The proportions of LFD, TOP and SUBJ-LAST were analysed in all the registers in the corpora, namely Biography, Diary, Drama, Education, Fiction, Handbook, History, Law, Letters, Philosophy, Science, Sermon, Religious treatises, Travelogue, Trials and Romance. Due to their archaic style and clausal organisation, I did not include Bible texts. Also, given that comparison with other Fiction texts in the later periods is impracticable, the Fiction material in ME was not analysed. Following Culpeper and Kytö’s (2010: 16–18) typology of registers, those listed above can be argued to provide an overall view of the English language in its recent history: (i) writing-related registers such as Science, Law, Education, Religious treatises, that is, registers which are primarily attested in the written form; (ii) speech-purposed registers, designed to be articulated orally (either read out or performed), like Drama and Sermons; (iii) speech-like texts in the Diaries, Letters and Biographies, which contain features of “communicative immediacy” (Culpeper and Kytö 2010: 17); and (iv) speech-based registers, based on actual real-life speech events, here illustrated by the Trials. The normalised frequencies of the three constructions in all the registers are given in Tables 2, 3 and 4 for ME, EModE and LModE, respectively.

Table 2: Normalised frequencies (/1,000 IPs) of LFD, TOP and SUBJ-LAST in ME

                    LFD      TOP      SUBJ-LAST
Biography           15.11    34.62     41.93
Handbook            16.65    11.10     19.26
History              9.25    13.26     36.30
Law                 30.70    38.28     17.54
Philosophy          44.18    16.83     21.71
Religious treat.    29.32    31.95     35.12
Romance              5.76    17.10    109.40
Sermons             29.05    31.57     23.38
Travelogue          14.79    15.09     72.72
Mean                21.65    23.31     41.93




Table 3: Normalised frequencies (/1,000 IPs) of LFD, TOP and SUBJ-LAST in EModE

              LFD      TOP      SUBJ-LAST
Biography     30.33    13.31    19.22
Diary          4.07     5.63    25.42
Drama          4.18     8.12    19.77
Education     24.29    10.05     9.12
Fiction        8.77    10.96    59.90
Handbook      33.44     7.17     5.30
History       19.55    17.46    21.03
Law           91.65     8.15     3.64
Letters       14.65     8.99     4.50
Philosophy    26.20    16.16     3.50
Science       41.50     9.94    16.89
Sermon        32.95     6.71    18.17
Travelogue     6.47     5.55    29.29
Trials         6.28     4.63    12.42
Mean          24.59     9.49    17.73

Table 4: Normalised frequencies (/1,000 IPs) of LFD, TOP and SUBJ-LAST in LModE

              LFD      TOP      SUBJ-LAST
Biography      4.34     3.67     5.67
Diary          1.61     5.37     5.55
Drama          1.84     2.61    15.96
Education      8.97     5.12     7.04
Fiction        5.99     5.99    54.17
Handbook       6.87     5.62     5.31
History        5.29     4.70    15.88
Law           11.94     3.47    13.86
Letters        2.42     3.70     4.12
Philosophy    16.27    15.18     5.42
Science        3.49     2.33     3.72
Sermon        29.07    10.65    13.10
Travelogue     1.13     4.75     9.73
Trials         1.85     4.51     0.21
Mean           7.22     5.55    11.41

With a view to determining the statistical role of each construction in the periods under investigation, Figure 3 below displays the frequencies of the three constructions and reveals that, in line with the syntacticisation of subject-verb(-complement) word order in English, they all decrease considerably over time. More


specifically, whereas LFD accounted for approximately 20 to 25 examples (per 1,000 IPs) in ME and EModE, its normalised frequency is 7 clauses in LModE. As regards TOP, around 23 clauses per 1,000 contained topicalised complements in ME, this normalised frequency falling to just under 10 in EModE and to between 5 and 6 in LModE (see the means in Tables 3 and 4). Finally, sentence-final subjects are also rare in LModE, when approximately 11 clauses (per 1,000 IPs) belong to the SUBJ-LAST construction type (Table 4), whereas in ME this was the most frequent of the three patterns, at a normalised frequency of 42. These proportions evince the statistically marked condition of the three syntactic strategies and thus their potential status as markers of other functional or situational roles. I will return to the connection between markedness and situational delimitation in Section 3.
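The period means discussed here can be recomputed from the per-register values in Tables 2 to 4. The Python sketch below is merely illustrative: the data layout and the function names are assumptions of this example, and the means are unweighted, that is, every register counts equally regardless of its size.

```python
from statistics import mean

# Per-register normalised frequencies (/1,000 IPs), transcribed column by
# column from Tables 2 (ME), 3 (EModE) and 4 (LModE).
FREQ = {
    "ME": {
        "LFD": [15.11, 16.65, 9.25, 30.70, 44.18, 29.32, 5.76, 29.05, 14.79],
        "TOP": [34.62, 11.10, 13.26, 38.28, 16.83, 31.95, 17.10, 31.57, 15.09],
        "SUBJ-LAST": [41.93, 19.26, 36.30, 17.54, 21.71, 35.12, 109.40,
                      23.38, 72.72],
    },
    "EModE": {
        "LFD": [30.33, 4.07, 4.18, 24.29, 8.77, 33.44, 19.55, 91.65, 14.65,
                26.20, 41.50, 32.95, 6.47, 6.28],
        "TOP": [13.31, 5.63, 8.12, 10.05, 10.96, 7.17, 17.46, 8.15, 8.99,
                16.16, 9.94, 6.71, 5.55, 4.63],
        "SUBJ-LAST": [19.22, 25.42, 19.77, 9.12, 59.90, 5.30, 21.03, 3.64,
                      4.50, 3.50, 16.89, 18.17, 29.29, 12.42],
    },
    "LModE": {
        "LFD": [4.34, 1.61, 1.84, 8.97, 5.99, 6.87, 5.29, 11.94, 2.42, 16.27,
                3.49, 29.07, 1.13, 1.85],
        "TOP": [3.67, 5.37, 2.61, 5.12, 5.99, 5.62, 4.70, 3.47, 3.70, 15.18,
                2.33, 10.65, 4.75, 4.51],
        "SUBJ-LAST": [5.67, 5.55, 15.96, 7.04, 54.17, 5.31, 15.88, 13.86,
                      4.12, 5.42, 3.72, 13.10, 9.73, 0.21],
    },
}

def period_means(freq):
    """Unweighted mean over registers for each period and construction."""
    return {period: {c: mean(values) for c, values in columns.items()}
            for period, columns in freq.items()}

def lower_in_lmode(means, construction):
    """True if the LModE mean of a construction is below its ME mean."""
    return means["LModE"][construction] < means["ME"][construction]

MEANS = period_means(FREQ)
for period, columns in MEANS.items():
    print(period, {c: round(m, 2) for c, m in columns.items()})
```

Comparing only the first and the last period is a deliberate simplification, since the change need not be monotonic across all three periods for every construction.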

Figure 3: Frequencies of LFD, TOP and SUBJ-LAST in ME (PPCME2), EModE (PPCEME) and LModE (PPCMBE)

3 Analysis of the data

In this section I employ what Biber (2013) would call both a ‘linguistic variationist’ approach, in which the register itself is taken as a variable, and a ‘text-linguistic’ perspective, according to which the registers or the texts are the research objects. In other words, the small-scale multifeature analysis which is developed in this chapter aims, first, at describing register variation across time and, second, at profiling the situational or functional roles played by three marked word-order designs in the various registers from which the data were extracted.




The section is organised as follows: 3.1 deals with the distribution of the LFD data. Section 3.2 focuses on the analysis of the TOP examples from the database. Finally, Section 3.3 considers the diachronic progression of the SUBJ-LAST constructions under investigation.

3.1 Left dislocation and register

Figures 4, 5 and 6 contain the normalised frequencies (per 1,000 clauses) of LFD in, respectively, ME, EModE and LModE. Table 5 provides an overview of the frequency of LFD per register.7

Figure 4: LFD in ME (the dotted line plots the mean normalised frequency)

7 In the columns containing the registers with lower/higher proportions of LFD, TOP and SUBJ-LAST in, respectively, Tables 5, 6 and 7 I have included a selection of the registers occurring either before (lower proportions) or after (higher proportions) the dotted line expressing the mean normalised frequency of the distribution in the figures preceding the tables. As the figures reveal, the grouping of registers resulting from the classification into those exhibiting more or fewer examples of the constructions under investigation is not neat and, in consequence, in order to determine the connection between register and syntactic markedness I have considered only those registers which are more representative for that purpose.


Figure 5: LFD in EModE

Figure 6: LFD in LModE




Table 5: LFD and registers across time

         LOWER PROPORTIONS                     HIGHER PROPORTIONS
ME       Romance                               Religious treatises, Law, Philosophy
EModE    Diary, Drama, Trials, Letters         Science, Law
LModE    Travelogue, Diary, Trials, Letters    Philosophy, Law

In light of the proportions of sentences containing left-dislocated constituents in initial position in ME, the following conclusions can be reached: first, the registers which are stylistically less literate (Biography, Romance, Travelogue), that is, those which demand on the reader’s part fewer technical understanding skills and linguistic abilities, contain a lower number of examples of LFD and, second, the registers which are stylistically more literate (Law, Philosophy, Religious treatises) contain more examples of LFD.8 The fact that Sermons (and possibly this can also be applied to the type of texts contained in the Philosophy historical registers, with predominant speech-related/purposed status due to the inclusion of the dialogues in Boethius’ De Consolatione Philosophiae) are grouped with the more literate registers implies that the distribution of LFD is conditioned by register literacy (the more literate the register is, the greater the frequency of LFD) and not by the production circumstances associated with either the spoken or the written medium. As for EModE and LModE, the relative proportions of LFD per register are quite similar and reinforce the view that stylistic literacy also seems to be the significant factor in these periods. As shown in Table 5, this tendency is relatively stable across time.
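The split of registers around the mean normalised frequency, on which the grouping above rests, can be sketched as follows. The Python example uses the ME values for LFD from Table 2; the function name and the data layout are assumptions of this sketch, and the two resulting lists are fuller than the selections given in Table 5, which contains only the registers judged most representative.

```python
from statistics import mean

# Normalised frequencies of LFD in ME (/1,000 IPs), transcribed from Table 2.
LFD_ME = {
    "Biography": 15.11, "Handbook": 16.65, "History": 9.25, "Law": 30.70,
    "Philosophy": 44.18, "Religious treatises": 29.32, "Romance": 5.76,
    "Sermons": 29.05, "Travelogue": 14.79,
}

def split_at_mean(freqs):
    """Sort registers by frequency and split them at the mean value."""
    cutoff = mean(list(freqs.values()))
    ranked = sorted(freqs, key=freqs.get)
    lower = [register for register in ranked if freqs[register] < cutoff]
    higher = [register for register in ranked if freqs[register] >= cutoff]
    return lower, higher

lower, higher = split_at_mean(LFD_ME)
print("lower proportions: ", lower)
print("higher proportions:", higher)
```

On these figures Sermons fall on the same side of the mean as Law, Philosophy and Religious treatises, which is the grouping discussed in the preceding paragraph.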

8 The adscription of the historical registers under investigation to the more/less literate options is based on stylistic pervasiveness within the text types. Even though the degree of stylistic hybridity is noteworthy in some of the registers (see my comments in Section 4), in order to determine connections between register and productivity of LFD, I have adhered to the taxonomy ±literate by relying on the style which is dominant in the texts studied.


From a theoretical perspective, LFD is a strategy which disrupts the unmarked organisation of the clause. First, as already pointed out, subjects are not sentence-initial in contexts of LFD. Second, the constituents in sentence-initial position in LFD contexts (that is, the constituents which are left-dislocated) do not fulfil a syntactic function within the clause or, in other words, cannot be syntactically integrated with the ensuing clause. In fact, LFD is possibly the only syntactic strategy in English which enables the allocation in a clause of a constituent which is semantically connected with the clause and yet syntactically untethered to it. Consequently, the syntax of LFD leads to the characterisation of this construction as a highly marked syntactic device in English. From this perspective, I will argue below, and at greater length in Section 4, that linguistic markedness can be claimed to be closely connected with functional specificity in register analysis, and that this paves the way for the consideration that LFD is a formal indicator of stylistic literacy, at least in the recent history of English. Couched in the terminology of multidimensional register analysis, LFD can be taken as a linguistic feature which positively contributes to the minus-plus dimension ‘less literate versus more literate’.

3.2 Topicalisation and register

Following the outline in Section 3.1, Figures 7, 8 and 9 show the distribution of TOP in, respectively, ME, EModE and LModE in the database. Table 6 summarises the results by classifying the registers into those in which TOP is frequent and those with low levels of TOP.



Figure 7: TOP in ME

Figure 8: TOP in EModE


Figure 9: TOP in LModE

Table 6: TOP and registers across time

         LOWER PROPORTIONS                   HIGHER PROPORTIONS
ME       Handbook, History, Travelogue       Religious treatises, Biography, Law
EModE    Trials, Travelogue, Diary, Sermon   Biography, Philosophy, History, Fiction
LModE    Science, Drama, Law, Biography      Philosophy, Sermon, Diary, Fiction

The distribution of TOP over different registers in ME is considerably more complicated than the partitioning of registers according to the frequencies of LFD, since the families of registers resulting from the grouping in Figure 7 do not lead to an easy explanation in terms of, for example, narrative versus expository status, written versus speech-based nature or dialogic versus monologic character. The binomial condition of less formal versus more formal/literate could possibly constitute the baseline for the assessment of the cline in Figure 7, with less formal registers (for example, Handbook, Travelogue and the speech-purposed




Philosophy texts) in the group of registers containing fewer examples of TOP, and more formal registers (Religious treatises and Law) with many more instances of TOP. Nonetheless, Figures 8 and 9, which provide the information corresponding to, respectively, EModE and LModE, and Table 6, with an overview of the prevailing trends over time, reveal that TOP is no longer a textual marker in Modern English, in that it is a frequent syntactic device found in registers like Law and History, commonly classified as formal registers, and in Fiction or Diary, which are indisputably less formal. The data thus make clear the textually unmarked status of TOP as a functional or situational marker. As mentioned in Section 3.1, in an attempt to give value to the connection between the distribution of formal linguistic features and the situational or functional status of registers, I would like to establish a link between the unmarked textual condition of TOP, resulting from the analysis of the data, and the linguistic characterisation of TOP as a syntactic device in English. Syntactically, TOP involves the promotion of a constituent (either complement or modifier/adjunct) to sentence-initial position, which does not imply the violation of the unmarked subject-verb design of the English declarative clause. In Section 4 I will hold the position that if a given linguistic feature (in this research, a construction type) does not trigger a significant level of linguistic (here, syntactic) markedness, then a blatant functional or situational interpretation derived from the occurrence of the feature will not necessarily be at work. What I will be hypothesising later, although I am aware that this demands further research, is that linguistic markedness runs parallel to consistent functional specificity. If this is indeed the case, it would further emphasise the empirical relevance of multidimensional approaches.

3.3 Subject-last constructions and register

This section provides statistical information corresponding to the SUBJ-LAST constructions analysed in this chapter, namely subject-inversion and subject-extraposition. Figures 10, 11 and 12 display the distribution of SUBJ-LAST across time and Table 7 summarises the groups of registers depending upon the frequency of SUBJ-LAST.


Figure 10: SUBJ-LAST in ME

Figure 11: SUBJ-LAST in EModE




Figure 12: SUBJ-LAST in LModE

Table 7: SUBJ-LAST and registers across time

         LOWER PROPORTIONS            HIGHER PROPORTIONS
ME       Handbook, Law, Philosophy    Travelogue, Romance
EModE    Philosophy, Law              Fiction, Diary, Travelogue, Drama
LModE    Science, Trials, Handbooks   Drama, History, Fiction

Both the previous figures and Table 7 show that the frequencies of the SUBJ-LAST constructions investigated in this study are somehow connected to the degree of subject-involvement, as evinced by the registers in the database. In registers such as Law and Science (and many Handbooks in the LModE corpus), which prototypically avoid speaker/writer- or hearer/reader-oriented linguistic features, one finds fewer examples of SUBJ-LAST constructions. By contrast, practically all the registers in the rightmost column in Table 7 (Travelogue, Romance, Fiction, Diary, Drama) would be described as subject-oriented registers in the traditional stylometric literature and do contain many examples classifiable as SUBJ-LAST in this


study. Furthermore, such a functional characterisation of the registers which are more prominent as far as the frequency of the variable SUBJ-LAST is concerned is strikingly stable in the periods explored here. The finding reported in the previous paragraph reinforces the connection between, on the one hand, the highly marked syntax of a construction and, on the other, its substantive functional defining role. Two remarks seem in order here: first, SUBJ-LAST constructions by definition wreak havoc on the unmarked syntactic design of English clauses, since their syntactic subjects are placed in final postverbal position. Second, the data reflect that the frequency of SUBJ-LAST is a strong indicator of the degree of participant-involvement of a given register. Briefly, then, syntactic markedness and functional priming have been shown to go hand in hand in the recent history of English also as far as subject-inversion and subject-extraposition are concerned.

4 Summary and concluding remarks

This study has drawn on the multidimensional assumption that registers are (basically) linguistic units which can be associated with specific functional, textual and stylistic interpretations. This is in line with Biber and Conrad’s (2009: 1) well-known ‘register perspective’, which “combines an analysis of linguistic characteristics that are common in a text variety with analysis of the situation of use of the variety”. In this study I have explored the premise that a set of linguistic constructions, in particular three syntactic strategies with marked word-order designs in English, can be taken as markers of the functional, textual and stylistic characterisation of registers. The three constructions investigated here are topicalisation, left dislocation and extraposition.

This study has shown, first, that LFD is a linguistic strategy which has been associated with literate registers from ME to LModE. This is a weighty finding, since the connection between LFD and textual literacy is not in keeping with the conversational character which is attributed to LFD in Present-Day English in the literature. To give an example, Biber et al. (1999: 957–958) claim that “Prefaces [LFD] […] are almost exclusively conversational features […] Prefaces are a sign of the evolving nature of conversation”. Second, it was found that TOP can be described as a literacy strategy in ME which has become progressively more textually unmarked in Modern English. Finally, the so-called SUBJ-LAST constructions investigated in this chapter are claimed to feature subject-hearer involvement.

I have also suggested that the data serve to illustrate the link between linguistic markedness and situational definition. It was proposed that those constructions which are syntactically most marked as far as word order is concerned constitute hallmarks of well-defined situational interpretations of the registers in which they occur at an appropriate frequency. In this respect, since TOP does not significantly alter the unmarked subject-verb(-complement) organisation of the English clause, it has thus been shown not to trigger a register-specific situational interpretation and, as already reported, has been defined as a textually unmarked linguistic device. By contrast, the occurrence of LFD and SUBJ-LAST in sentences which end up exhibiting syntactically marked word-order designs has been related to specific situational interpretations: LFD evinces register literacy and SUBJ-LAST is a marker of subject- or participant-involvement in a register. The study concludes that word-order strategies can be added to the list of linguistic features, units or variables on which register analysis can rely.

This notwithstanding, a final remark is in order here to acknowledge the high level of heterogeneity in the registers which the statistical analysis of the texts has identified. First, hybridity in registers is sometimes a formal or a linguistic issue. In this respect, Biber and Finegan (1988: 3) recognise that for some registers “greater linguistic differences exist among texts within the categories than across them” – to give some examples, in this chapter I noted both the speech-related status of some Philosophy texts and the differences in subject-involvement among modern Handbooks. Second, as contended by writers such as Virtanen (2010: 58) when she says that “texts are seldom unitype; text types usually appear in embedded hybridized forms, resulting in multiple texts”, the multidimensional model must be able to encompass the existence of texts and even text types which are not prototypical indicators of a given situational or textual interpretation.
Finally, as recognised in Biber and Conrad (2009: Chapter 7), hybridity also underlies the classification of texts into registers – see also Biber & Egbert (this volume) for an experiment on the classification of (mostly) hybrid internet registers. Virtanen (2010: 76) also refers to this when she says that “[o]ne and the same text type can be put to use in very different genres [registers], and one and the same genre easily manifests texts that can be related to very different types”. The model would thus benefit from the statistical analysis of individual texts by means of factorial or logistic regression techniques.

To conclude, two issues have been left for further research. On the one hand, the validity of the findings in this study should be tested by extending the time span of the investigation to include Present-Day English data. In this respect, parsed corpora of contemporary English would provide empirical evidence of the issues raised in this chapter. On the other hand, a key issue in historical register variation, one pointed out in Biber and Conrad (2009: 166), is the distinction between language change and register variation. As recognised in Lijffijt et al. (2012), the null assumption in diachronic textual studies has usually been that a single-register corpus provides homogeneous linguistic data over time with regard to unique functional or situational implications. Were this the case, variation in corpus studies would lead straightforwardly to the observation of general diachronic change in language. By contrast, if the defining linguistic and/or stylistic features of registers were claimed to be subject to change over time, then linguistic register variation would not necessarily imply diachronic change of the language’s grammar. This leads us to the conclusion that corpus-based register analysis will benefit from fine-grained analyses of the data in order to detect qualitative inconsistencies which are, on occasions, blurred by the statistical results.

References

Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University Press.
Biber, Douglas. 1995a. Dimensions of register variation. A cross-linguistic comparison. Cambridge: Cambridge University Press.
Biber, Douglas. 1995b. On the role of computational, statistical, and interpretive techniques in a multi-dimensional analysis of register variation. A reply to Watson. Text 15(3). 341–370.
Biber, Douglas. 2013. Register as a predictor of linguistic variation. Paper presented at ‘Register revisited: New perspectives on functional text variety in English’ International Conference, University of Vechta, 27–29 June.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge University Press.
Biber, Douglas & Jesse Egbert. This volume. Towards a user-based taxonomy of web registers.
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999. Longman grammar of spoken and written English. London: Longman.
Birner, Betty J. 1994. Information status and word order: An analysis of English inversion. Language 70(2). 233–259.
Bolinger, Dwight. 1992. The role of accent in extraposition and focus. Studies in Language 16(2). 265–324.
Culpeper, Jonathan & Merja Kytö. 2010. Early Modern English dialogues: Spoken interaction as writing. Cambridge: Cambridge University Press.
Dorgeloh, Heidrun. 1997. Inversion in modern English: Form and function. Amsterdam: John Benjamins.
Dorgeloh, Heidrun. This volume. The interrelationship of register and genre in medical discourse.
Dorgeloh, Heidrun & Anja Wanner. 2010. Introduction. In Heidrun Dorgeloh & Anja Wanner (eds.), Syntactic variation and genre, 1–26. Berlin: Mouton de Gruyter.
Fairclough, Norman. 1992. Discourse and social change. Cambridge: Cambridge University Press.
Geluykens, Ronald. 1992. From discourse process to grammatical construction: On left-dislocation in English. Amsterdam: John Benjamins.
Green, Georgia M. 1980. Some wherefores of English inversion. Language 56. 582–601.




Gundel, Jeanette K. 1985. ‘Shared knowledge’ and topicality. Journal of Pragmatics 9(1). 83–107.
Halliday, Michael A. K. 1978. Language as social semiotic. London: Edward Arnold.
Kreyer, Rolf. 2010. Syntactic constructions as a means of spatial representation in fictional prose. In Heidrun Dorgeloh & Anja Wanner (eds.), Syntactic variation and genre, 277–303. Berlin: Mouton de Gruyter.
Kroch, Anthony & Ann Taylor. 2000. Penn-Helsinki Parsed Corpus of Middle English, second edition.
Kroch, Anthony, Beatrice Santorini & Lauren Delfs. 2004. Penn-Helsinki Parsed Corpus of Early Modern English.
Kroch, Anthony, Beatrice Santorini & Ariel Diertani. 2010. Penn-Helsinki Parsed Corpus of Modern British English.
Lijffijt, Jefrey, Tanya Säily & Terttu Nevalainen. 2012. CEECing the baseline: Lexical stability and significant change in a historical corpus. In Jukka Tyrkkö, Matti Kilpiö, Terttu Nevalainen & Matti Rissanen (eds.), Studies in variation, contacts and change in English. Vol. 10: Outposts of historical corpus linguistics: From the Helsinki Corpus to a proliferation of resources. Helsinki: University of Helsinki (Research unit for Variation, Contacts and Change in English). http://www.helsinki.fi/varieng/series/volumes/10/lijffijt_saily_nevalainen (accessed 9 February 2015).
McCawley, James D. 1988. The syntactic phenomena of English. Vols. 1, 2. Chicago: The University of Chicago Press.
Pérez-Guerra, Javier. 2005. Word order after the loss of the verb-second constraint or the importance of Early Modern English in the fixation of syntactic and informative (un-)markedness. English Studies 86(4). 342–369.
Prince, Ellen F. 1981. Topicalization, focus-movement, and Yiddish-movement: a pragmatic differentiation. In Danny K. Alford, Karen Ann Hunold & Monica A. Macaulay (eds.), Proceedings of the Seventh Annual Meeting of the Berkeley Linguistics Society, 249–264. Berkeley: Berkeley Linguistics Society.
Prince, Ellen F. 1997. On the functions of left-dislocation in English discourse. In Akio Kamio (ed.), Directions in functional linguistics, 117–144. Philadelphia: John Benjamins.
Schubert, Christoph. This volume. Introduction: current trends in register research.
Swales, John M. 1990. Genre analysis: English in academic and research settings. Cambridge: Cambridge University Press.
Taavitsainen, Irma. 2001. Changing conventions of writing: The dynamics of genre, text types, and text traditions. European Journal of English Studies 5(2). 139–150.
Takahashi, Kunitoshi. 1992. Constructionally presentational sentences. Lingua 86. 119–148.
Virtanen, Tuija. 2004. Point of departure: Cognitive aspects of sentence-initial adverbs. In Tuija Virtanen (ed.), Approaches to cognition through texts and discourse, 78–97. Berlin: Mouton de Gruyter.
Virtanen, Tuija. 2010. Variation across texts and discourses: Theoretical and methodological perspectives on text type and genre. In Heidrun Dorgeloh & Anja Wanner (eds.), Syntactic variation and genre, 53–84. Berlin: Mouton de Gruyter.

Index

academic writing 4, 8, 10, 137–138, 139–165, 169–191, 195, 200–201, 206–211, 215–217, 221, 223–247, 251, 254–269
air traffic control communication 67–73, 75, 79–80, 82–83
air traffic management 69–70, 75, 79
attention system 169, 178–179
attenuation effect 172, 190
audience participation 282, 284, 296, 300, 302
automatic register/genre identification 20, 23, 39
Aviation English 10, 17, 67–83
brackets 137, 139, 142, 147, 150–153, 156–157, 160–162, 164–165, 173
cognitive linguistic(s) 111, 113, 130
cognitive representation 113, 121, 124, 129–130
cognitive semantics 169–170, 173, 176, 178, 191
cohesion/cohesive 3, 114–115, 120, 138, 196, 202–205, 209, 213–218, 245
comic 2, 10, 137, 139, 153–165
comma 139–140, 142, 150–157, 160, 164, 185
conceptual metaphor/conceptual metaphor theory (CMT) 88, 221, 224–226, 227, 230–234, 238, 240, 246
conjunction 138, 195, 203–205, 209–215, 217, 311
contrastive linguistics 7, 10
cross-cultural/cross-linguistic register 2, 7, 222, 268, 271, 273–274, 298, 301
description 9, 24, 26, 28–38
diachronic 1, 5–6, 10, 88, 221–222, 307–334
dialect 2, 4, 6–7, 82–83, 102, 196–197, 237, 288, 290, 299
discourse hybridity 17, 44, 49, 52, 55–56, 62
discourse type 44, 47, 62
discussion 24–25, 28–29, 31–32, 34, 37–38
dislocation 222, 307, 312, 323, 332, see also left dislocation
divided attention 189–191
double referentiality/doubly referential 122, 131–132
dual nature 181, 189
electronic communication 275
electronic media 276, see also medium, electronic medium
electronically-mediated 271, 278, 300
exclamation mark 118, 139, 142, 147–153, 155–158, 162–164
extraposition 9, 222, 307, 312–315, 329, 332
face-to-face conversation 2, 72–73, 83, 255
football language 272, 274, 279–280, 282, 286, 288, 299–300
frame 130–133
genre 1, 2, 4–5, 8, 17, 20–21, 23, 43–62, 88, 95, 123, 142, 163, 189, 227, 253, 271, 275, 300, 307–309, 333
hip-hop 10, 17–18, 87–109
hybrid(ity) 33, 43–45, 49–53, 55–59, 271, 325, 333
––hybrid register 19, 22–23, 27–28, 30–32, 36–40, 44, 62, 222, 272
ICE see International Corpus of English (ICE)
illness blog 17, 43, 48–52, 58–61
infotainment 271–272
intercultural communication 2
International Corpus of English (ICE) 138, 195–196, 199–201, 203, 205, 218, 221, 223, 228–229, 231–234, 237–247, 251, 257, 261–269, 288
internet/web 9–10, 17, 19–40, 50–52, 89, 92, 113, 151, 163, 218, 222, 230, 247, 271–302, 333
intertextual/intertextuality 18, 111–133, 292, 299, 301
inversion 100, 148, 307, 312–315, 329, 332
left dislocation (LFD) 312–326, 328, 332–333
lexical density 138, 195, 203, 205, 208–217, 288
lyrics 18, 25–26, 30–31, 87–109
marked(ness) 49, 57, 83, 118, 129, 152, 155, 158, 161, 200, 222, 307–334
MDA see multidimensional analysis (MDA)
medical case report 17, 50, 52–53, 57, 59, 61
medical discourse 17, 43–62
medium 4, 6–7, 9, 57, 72, 91, 114, 138, 145, 170, 172–175, 180–181, 195–219, 222, 279, 294
––electronic medium 275, 300
––medium of print 176, 188–189
––spoken medium 181, 215
––written medium 174, 182, 196, 202, 210, 293, 299, 325
metaphor(ical) 2, 5, 10, 88, 119, 187, 189, 221, 223–247, 290, 299, 301, see also conceptual metaphor/conceptual metaphor theory (CMT)
multidimensional 6–7, 139, 164, 253, 307, 309–312, 329, 332–333
multidimensional analysis (MDA) 4, 6, 9–10, 253, 326
narration 9, 31, 37–38, 44, 48, 52, 56–59, 139, 154, 300
narrative/narrativity 24–25, 28–40, 43–45, 47–62, 160, 177, 272, 274–275, 277, 328
New English(es) 7, 10, 221, 223, 252, 257, 268
newspaper writing 2, 139, 147–148, 222, 293
noun phrase (NP) 9–10, 103, 114, 129–130, 144, 221, 251–269, 312, 314, 316, 318
noun phrase complexity/NP complexity 221, 251–269
opinion 24–26, 29, 31–40, 177, 284, 300
OTCs see real-time online text commentaries (OTCs)
paratext(ual) 274, 279, 295, 300
parenthetical construction 137, 151–152, 169, 170–175, 179–186, 189–191
persuasion 9, 26, 28, 30–33, 36–39, 265
plain Aviation English 10, 17, 67–83
popular music/pop songs 2, 18, 87–89, 91–92, 94–96, 98–99, 105, 125, 127–128
pronominal reference 56, 59–60, 202–203, 205, 216, 312
pronoun 9, 49, 56, 59, 93, 114, 120, 138, 143, 147, 195, 203–212, 214–216, 227, 252, 255–258, 314, 317
––personal pronoun 60, 88, 93, 103–105, 114, 185, 203, 206–207, 255–256, 261–262, 265–266
punctuation 90, 119, 137, 139–165, 171, 173, 175, 183
quasi-conversation 282, 284, 300
question mark 118, 142, 146–153, 155–158, 162–165
raters 19, 23, 27, 30–39
real-time online text commentaries (OTCs) 2, 10, 222, 271–302
reference/referential 59, 122–124, 131–132, 180, 186, 203
regional variation 6–7, 10, 138, 196, 215, 219, 251–252, 255, 261–262, 268
SFL see Systemic Functional Linguistics (SFL)
sociolect 3
sociolinguistic approach 3, 6–7, 112, 138
specialised registers 10, 17, 67–83
spoken mode 9, 24–25, 137, 172–173, 179, 181, 189, 255, 265, 275, 293
standard(s) of textuality 115, 122
standardised phraseology 17, 67, 70–83
style 1, 4–5, 123, 139, 145, 150, 155, 158, 163, 179, 183, 210–211, 262, 271, 273–279, 284, 295, 300–302, 308–310, 320
sub-register 4, 17–18, 19–40, 44, 73–74, 88, 105, 111, 122–133, 142, 176, 221, 223–247, 257
suspension dots 142, 152, 154, 156–157, 160–165
synchronic 221
Systemic Functional Linguistics (SFL) 3, 8, 196–198
teaching 8, 87
text 1–5, 8–9
time adverbials 49, 56–59, 143
topic 3, 8, 33, 48–50, 58, 60–62, 74, 80, 82–83, 92–95, 114, 176–177, 237–238, 242, 272
topicalisation (TOP) 222, 307, 312–334
Twitter 271–274, 283–284, 296, 301
unmarked 60, 119, 129, 152, 313, 315, 326, 329, 332–333
variational text linguistics 1, 221
variety 2–11, 23, 40, 43–62, 71–78, 82–83, 138–139, 142, 158, 195–219, 221, 223–247, 251–269, 272, 308–309, 332
––regional variety 1, 195, 217–218, 221, 251–254, 262, 267–268
––text(ual) variety 1, 7, 44, 50, 54, 56, 111, 227, 309, 332
variety-specific 198–199, 205, 231, 239–240, 242, 246–247, 259, 267, 269
web see internet/web
word order 9–10, 144, 222, 252, 307, 312, 316, 321–322, 332–333
World Englishes 6, 223–224, 252
written mode 24–25, 113, 172, 181, 190, 271