‘The Conditioned and the Unconditioned’
Late Modern English texts on philosophy
including a CD-ROM containing A Corpus of English Philosophy Texts (CEPhiT)

Edited & compiled by Isabel Moskowich, Gonzalo Camiña Rioboo, Inés Lareo and Begoña Crespo
University of A Coruña

John Benjamins Publishing Company
Amsterdam / Philadelphia
The paper used in this publication meets the minimum requirements of the American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ansi z39.48-1984.
As of late 2018, the Corpus of English Philosophical Texts (CEPhiT) is also accessible online at the Repositorio Universidade Coruña - http://hdl.handle.net/2183/21847
doi 10.1075/z.198
Cataloging-in-Publication Data available from Library of Congress: lccn 2015041505
isbn 978 90 272 1229 0 (Hb)
isbn 978 90 272 6217 2 (e-book)
Cover photograph: Rob Hurson (CC BY-SA) https://www.flickr.com/photos/robhurson/14153958438
© 2016 – John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.
John Benjamins Publishing Co. · https://benjamins.com
Table of contents
Acknowledgements
Introduction
Chapter 1. Philosophers and scientists from the Modern Age: Compiling the Corpus of English Philosophy Texts (CEPhiT)
    Isabel Moskowich
Chapter 2. Genre categorisation in CEPhiT
    Begoña Crespo
Chapter 3. Editorial policy in the Corpus of English Philosophy Texts: Criteria, conventions, encoding and other marks
    Gonzalo Camiña and Inés Lareo
Chapter 4. Infrastructure for analysis of the CEPhiT corpus: Implementation and applications of corpus annotation and indexing
    Andrew Hardie
Chapter 5. On the shoulders of giants: An overview on the discussion of science and philosophy in Late Modern times
    Marina Dossena
Chapter 6. Abstractness as diachronic variation in CEPhiT: Biber’s Dimension 5 applied
    Leida Maria Monaco
Chapter 7. Authorial presence in late Modern English philosophical writing: Evidence from CEPhiT
    Elena Seoane
Chapter 8. The status of seem in the nineteenth-century Corpus of English Philosophy Texts (CEPhiT)
    Francisco Alonso-Almeida and Inés Lareo
Chapter 9. Explaining the use of if… then… structures in CEPhiT
    Luis Puente Castelo
Index
The CD-ROM, provided with the print edition of the book, contains the following:
The manual (a pdf file you should read first)
Clientlauncher.jar (the executable programme)
Cephit.cct (the corpus for Mac and Linux users)
Cephitwindows.cct (the corpus for Windows users)
EULA license
A DTD folder
A LIB folder
A TTF folder (free DejaVu fonts)
DejaVu fonts license

Requirements:
At least 1 GB of RAM, a 1.5 GHz processor and 500 MB of free space on your hard disk.
Java Runtime Environment 1.6.0 or a later version.
DejaVu fonts (also provided on this CD).
Acknowledgements
We hereby acknowledge the support from the University of A Coruña for granting us funding for books and for attending conferences during the period 2009–2012. It was at those conferences that we met both national and international colleagues interested in our work who have always supported and encouraged us. We want to express our gratitude to them for their comments and feedback. We are especially indebted to the Faculty of Philology of our University for providing us with all the necessary facilities and a wonderful library to carry out the research here presented. Last but not least, we want to thank the working team of MuStE (Research Group for Multidimensional Corpus-based Studies in English) for their enthusiastic collaboration in the compilation of the Corpus of English Philosophy Texts (CEPhiT): Iria Bello Viruega, María José Esteve Ramos, Paula Lojo Sandino, Leida Maria Monaco, Ana Montoya Reyes, Luis Miguel Puente Castelo, Leticia Regueiro Naya, Estefanía Sánchez Barreiro and Sofía Zea Alvarez.
The editors
Introduction
Following the Corpus of English Texts on Astronomy, CETA (published by John Benjamins in 2012), the Research Group for Multidimensional Studies on English (MuStE)1 has compiled a similar corpus containing samples of texts on philosophy written in English during the late Modern period (1700–1900). This new Corpus of English Philosophy Texts (CEPhiT) in its beta version was the basis for the pilot studies presented in this book. The idea that every scientific field is likely to have its own writing traditions and conventions pervades the volume and the subcorpus itself. That is why the Coruña Corpus of English Scientific Writing (CC) is formed by a collection of several subcorpora, each of them containing samples of texts published between 1700 and 1900 and each corresponding to a different scientific discipline. Philosophy is the second discipline selected for the compilation of scientific texts and, consequently, the second subcorpus in the Coruña Corpus. The reasons behind the compilation principles of the whole Coruña Corpus have been extensively dealt with elsewhere (Moskowich & Crespo, 2007; Crespo & Moskowich, 2010) and those specific to CEPhiT can be found in Chapter 1 of this book (see also Moskowich, 2011). As for this volume, it has been divided into two clearly different parts. The first part is an account of the general situation and context in which the texts were produced, of the theoretical and practical decisions taken during compilation, and of the general characteristics that make CEPhiT distinctive. In the second part, the essays deal with different research questions, mostly though not exclusively pragmatically oriented, that can be answered by using this subcorpus. Therefore, the first part contains Chapters 1 to 4.
In Chapter 1, Isabel Moskowich offers, as already mentioned, a detailed account of the compilation principles applied to the Corpus of English Philosophy Texts so that any researcher is aware of the reasons behind the development of what he or she is using and, therefore, can decide how to exploit it. This chapter describes in some detail each of the parameters the compilers wanted to enhance except, perhaps, that of genre. The latter is the topic of Chapter 2 by Begoña Crespo. In it, she not only provides an account of the genres contained in CEPhiT and of how texts can be grouped in sets according to this variable, but also explores the history of such genres in English and how each of them has a particular function and use depending, more often than not, on external and social factors. Chapter 3, by Inés Lareo and Gonzalo Camiña, two of the compilers and members of the MuStE group, presents some of the issues we have had to deal with as editors of the texts. It was not only a question of which particular extract of a text to include but also a question of how to reproduce the peculiarities of late Modern English writing. This chapter also offers a general overview of the Coruña Corpus Tool, the search engine which runs with each subcorpus within the CC. Its utilities are largely explored in this part of the volume. An excellent complement to Chapter 3 is the one written by Andrew Hardie, a researcher from the University of Lancaster. He shows how versatile CEPhiT is, since it can be used with different information retrieval tools, as is the case with CQPweb. Chapter 5, by Marina Dossena, works as a bridge to the second part of the book, where the essays are of a more clearly applied nature. As an expert in late Modern English, she offers an overview of the period and of how new terms relating to science and philosophy are coined (and registered in the OED). She also compares CEPhiT with the Corpus of Modern Scottish Writing (CMSW) since both cover approximately the same time-span. The rest of the works in this volume, then, are pilot studies exploring different aspects of the English language as presented in philosophical texts. Such is the case of the paper analysing different linguistic features considered to provide texts with the abstractness so characteristic of scientific writing.
In Chapter 6, Leida Maria Monaco explores the way in which the features contained in Biber’s Dimension 5 are realised. Her analysis resorts to the different variables contained in the metadata files for each sample, namely genre, sex of the author and time, focusing especially on the last one. With a research interest similar to Monaco’s, Elena Seoane looks into authorial presence in some of the samples contained in CEPhiT in Chapter 7. In her microanalysis the author studies the use of personal pronouns, the rhetorical functions of verbs used with these pronouns and other linguistic devices to ascertain authorial involvement in philosophy writing. Francisco Alonso and Inés Lareo concentrate on the use of the verb seem in the nineteenth-century texts of the corpus. This pragmatic account of the texts analysed is illustrated with numerous examples that constitute good evidence of how authors writing on philosophical topics made use of devices transmitting evidentiality and/or epistemic modality.

1. www.udc.es/grupos/muste
The volume concludes with a chapter on a syntactic topic written by Luis Puente. The author focuses his attention on the conditional structure if … then, its functional evolution in eighteenth- and nineteenth-century philosophy texts, its conditions of use and the still possible scholastic influence as a reason to justify its predominance over if structures. This book is the result of the combined efforts of both compilers and individual authors who have kindly agreed to contribute a chapter. If the part concerned with the description of the corpus has not been convincing enough about the many possibilities CEPhiT offers as material for linguistic research, the pilot studies presented here, and others by future researchers, could serve to confirm that this is actually so. As Cheyne (1705: iii), one of the authors contained in CEPhiT, expressed in his preface to Philosophical principles of natural religion (…): “If my Performance excite others, of more Leisure and Capacity to do it the Justice it deserves, I have obtain’d the end of my Ambition”.
The editors
Chapter 1
Philosophers and scientists from the Modern Age
Compiling the Corpus of English Philosophy Texts (CEPhiT)
Isabel Moskowich
University of A Coruña
1. Introduction

Writing conventions, along with communication in general, have undergone considerable changes over the last two decades. The irruption of the Internet in the academic world is perhaps comparable in terms of the strength and extent of its effects to the scientific revolution that took place at the beginning of the Modern Era. Driven increasingly by an ethos of “publish or perish”, science now operates more than ever in the written mode, so that any idea that remains unpublished might as well not exist at all. However, it is not only the medium used to transmit scientific knowledge that has changed (electronic journals replacing paper ones) but also the way in which words, structures and conventions in general are themselves used in each discipline of science. The changes which occur in writing practices are not necessarily random; such practices are historical artefacts which can be traced back to the 1600s (Hyland, 1998: 18) and which, over the course of time, have been subject to changing discursive rules as they adapted to meet the requirements of the times. From the seventeenth century onwards, and with the increase of literacy, different types of readership emerge in response to different discursive patterns. One of the main consequences of the shift to Modern Science during the seventeenth century (Valle, 1999; Hoskin, 1999; Beal, 2004) was a radical change in the way knowledge and technical advances were conveyed. The Coruña Corpus of English Scientific Writing (henceforth CC) tries to reflect this, and includes samples of printed texts belonging to different domains in which language and discourse are used by scientists as a way of negotiating knowledge.
doi 10.1075/z.198.01mos © 2016 John Benjamins Publishing Company
There seems to be a general human need for people to classify things around them, and the realm of knowledge itself is no exception here. For this reason taxonomies and terminology have been repeatedly reinvented and replaced over the course of time, as history and historical context provoked changes which led to the demand for new perspectives on knowledge and events. The very notion of philosophy, understood as the “advanced knowledge or learning, to which the study of the seven liberal arts was regarded as preliminary in medieval universities” (OED), would itself not survive unchanged as a science or field of knowledge into the Modern Age, but would be redefined and subdivided differently at different times. Many European universities, for example, adopted a threefold division into natural, moral and metaphysical philosophy, whereas in other cases and other institutions the term philosophy would be employed more broadly and generally to make reference to other subjects leading to the degree of Master of Arts. The use of the term relating to the seven liberal arts seems to decline throughout the eighteenth century (OED), entailing a re-classification and greater specialisation of disciplines, as illustrated by the fact that the term natural philosophy would come to be replaced by others such as biology, zoology, etc. in the following century (see Chapter 2). Examples in Ayenbite of Inwit (1340) confirm that at the time the English word philosophy already denoted the branch of knowledge dealing with the principles of human behaviour, the study of morality, and ethics. 
Later, under the influence of French philosophy, the term was narrowed; by the eighteenth century philosophy was no longer seen as the study of the whole Universe, but of rational thought, as opposed to revealed knowledge and religion.1 The interest of the Corpus of English Philosophy Texts (CEPhiT) lies primarily in how ideas were transmitted, and although modern philosophy had by this time abandoned to a large extent the notion of knowledge as a divine gift, slowly turning to the rationalistic and empiricist tendencies coming from Europe and from England itself, the Modern English period was still one of prescriptive tendencies (Valle, 1999; Moessner, 2001). Part of this changing mode of thought can be seen in the samples compiled in CEPhiT, one of the sub-corpora of the CC.2 Its characteristics, as reflected in the specific writing conventions adopted by writers of the period, will be described in the pages that follow.
1. Burke (1770), one of the authors in CEPhiT, uses the term precisely in this sense.
2. The Coruña Corpus is a long-term project and is being completed in stages. The Corpus of English Philosophy Texts (CEPhiT) is the second part.
2. General outline

The ways in which the CC as a whole is intended to complement other diachronic and domain-specific corpora in terms of time-span and register have been set out extensively elsewhere (Crespo & Moskowich, 2006; Moskowich, 2012a; Moskowich & Crespo, 2007). Our aim in compiling the specific corpus described here is to make it possible for scholars to explore the negotiation of knowledge between authors and readers as well as to study the changing conventions and linguistic strategies used to this end. As already mentioned, Hyland (1998: 18) affirms that the broad linguistic practices found in scientific texts can be dated from the 1600s. Authors at the time, most of them members of the Royal Society of London, saw the need to establish certain discursive rules to separate the exposition of hypotheses from that of proven facts (Allen, Qin & Lancaster, 1994; Gotti, 1996). The application of these new rules entailed the birth of new formats and genres reflecting new trends of thought (see Chapter 2). The texts on philosophy gathered in CEPhiT reflect the importance of the observation of phenomena as well as the replacement of authoritative statements by the deductive method, although perhaps less clearly so than in other disciplines. The transition from the scholastic tradition, however, is gradual; indeed, our corpus includes texts on moral philosophy that are deeply and evidently indebted to Scholasticism. The influence of the Reformation can be seen in the movement away from science as being enshrined in the words of unquestionable authorities to the opening up of science to new approaches. This is reflected, for instance, in the work by Greene (1727), one of the authors in CEPhiT, influenced by the increasing importance of observation and experimentation to confirm facts. Contrasting with this, we have also compiled samples such as those by Mary Astell and Mary Wollstonecraft, which show a completely different way of thinking and champion more radical ideas.
The relationship between philosophy and society is also manifested in several works included in CEPhiT, and is perhaps seen more clearly here than in other fields. The work by George Cheyne (1705), Philosophical principles of natural religion: containing the elements of natural philosophy, and the proofs for natural religion, arising from them, for example, well illustrates this need to present evidence for claims made, rather than to rely on the supposedly veridical statements of revered, ancient scholars.
3. The rationale behind CEPhiT

The classification of knowledge we are familiar with today is different from that of the late Modern period (1700–1900). This different taxonomy is one of the principal difficulties encountered in the selection of representative samples of scientific language, and for this reason the classification of disciplines published by UNESCO in 1988 was used as a starting point for all the sub-corpora in the CC. Table 1 sets out the disciplines chosen for compilation. Some of them have been re-allocated, since in these cases there is no exact correlation between the present-day conception of a scientific field and the one existing in the Modern period. The degree of branching and specialisation of present-day science cannot be reflected in the compilation of eighteenth- and nineteenth-century texts. Therefore, Table 1 illustrates the distribution of disciplines proposed for the CC and the different corpora being compiled:

Table 1. Disciplines and sub-corpora in the Coruña Corpus

Field             UNESCO disciplines                                           Coruña Corpus discipline   Subcorpus
Natural sciences  Astronomy                                                    Astronomy                  CETA
                  Biology, Botanics, Zoology                                   Life sciences              CELiST
                  Physics                                                      Physics                    CETePH
                  Biochemistry, Chemistry                                      Chemistry                  CECheT
Humanities        Philosophy, History of science and technology                Philosophy                 CEPhiT
                  History, Archaeology, Numismatics, Palaeography, Genealogy   History                    CHET
                  Modern languages                                             Linguistics                CETeL
CEPhiT is the second subcorpus within the CC, following the Corpus of English Texts on Astronomy (CETA) (Moskowich et al., 2012). Since each of the sub-corpora in the CC is devoted to a different scientific discipline or domain they can be considered specialised (Connor & Upton, 2004). In line with all the material in the CC, CEPhiT includes 10,000-word samples of prose texts together with one metadata file per sample. All items that are non-analysable linguistically (figures, tables, formulae, etc.) have been excluded from the extracts. Metadata files, in turn, contain information on the life and socio-historical context of the author as well as a description of the main characteristics of the text compiled, including cross-references to other texts in the CC. Factors relating to extra-linguistic variables such as age, sex and place of education of authors, and the genre/text-type of each of the compiled samples, are also recorded in the metadata files (Crespo & Moskowich, 2010). Both text samples and metadata files have been encoded in XML format following TEI guidelines. The two basic ideas pervading the whole CC, as already detailed for CETA (Moskowich & Crespo, 2012a), are balance and representativeness (McEnery & Wilson, 1996; Biber et al., 1998: 251–253), and these, hence, are reflected in the compilation of CEPhiT. Since the compilation principles of the CC in general, as well as some of its sub-corpora, have been dealt with elsewhere (Moskowich 2009a, 2009b, 2010, 2011, 2012a; Crespo & Moskowich, 2008; Moskowich & Parapar, 2008; Moskowich & Crespo 2004, 2007, 2010, 2012a) these will not be discussed further here. Our previous experience with CETA and the different pilot studies we have published using it (Lareo, 2009; Bello, 2010; Crespo & Moskowich, 2010; Alonso, 2012; Banks, 2012; Cantos & Vázquez, 2012; Gray & Biber, 2012; Lareo, 2012; Moskowich, 2012c; Biber & Gray, 2013; Camiña, 2013; Crespo, 2013; Moskowich, 2013; Puente-Castelo & Monaco, 2013; Moskowich & Monaco, 2014) have confirmed that 1,000-word samples are not really enough for the study of variation within the scientific register (Biber, 1993), largely because many of the samples contained in our corpus are not technical or scientific in the same sense as those we might find in present-day English, the scientific register not being as standardised then as it is nowadays.
Since the beginning of the project in 2003, then, we have aimed at compiling two 10,000-word text files per decade, and thus each century in CEPhiT is represented by approximately 200,000 words. For the two samples drawn from each decade, we have resorted to first editions whenever possible. When this was not possible, and assuming that language change can be observed within 30-year periods (Kytö, Rudanko & Smittenberg, 2000: 92), we have chosen samples that were published less than thirty years after the work’s initial date of publication. In this way, CEPhiT shares the structure and mark-up conventions used for the whole project, which have proved to be extremely useful for research purposes in that idiosyncrasies typical of particular authors and arising from the problems of translation have been avoided. As with CETA, we have tried to collect extracts from different parts of the works sampled, excluding prefaces and dedications, which we do not consider to be consistently scientific in their content. Introductions, central chapters and conclusions are more or less equally represented. We have also tried to compile a similar number of words and samples for each century, arriving at a total of 200,022 words for the eighteenth century and 201,107 for the nineteenth. However, not all genres/text types or other variables, such as sex or place of education of the author, are equally represented, given that their presence in the corpus is itself a mirror of text production at the time and not a question of authorial choice. Table 2 below shows the overall distribution of words:

Table 2. Words in CEPhiT

Eighteenth century   200,022
Nineteenth century   201,107
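The sampling arithmetic described above can be checked with a short sketch. It is only an illustration of the stated design (two 10,000-word samples per decade, later editions accepted when published less than thirty years after the first); the function and constant names are hypothetical and do not correspond to any tool distributed with the corpus.

```python
# Illustrative sketch of the sampling design described in the text.
# All names here are hypothetical; they are not part of the Coruña Corpus Tool.

WORDS_PER_SAMPLE = 10_000      # approximate size of each text sample
SAMPLES_PER_DECADE = 2         # two samples are aimed at per decade
DECADES_PER_CENTURY = 10
MAX_EDITION_LAG_YEARS = 30     # later editions must fall within this window

# Word counts per century as reported in Table 2.
ACTUAL_WORDS = {"eighteenth": 200_022, "nineteenth": 201_107}


def target_words_per_century() -> int:
    """Nominal number of words per century implied by the design."""
    return WORDS_PER_SAMPLE * SAMPLES_PER_DECADE * DECADES_PER_CENTURY


def deviation_from_target(actual_words: int) -> int:
    """Difference between a reported word count and the nominal target."""
    return actual_words - target_words_per_century()


def edition_is_eligible(first_published: int, edition_year: int) -> bool:
    """True for a first edition, or for a later edition published
    less than thirty years after the work first appeared."""
    lag = edition_year - first_published
    return 0 <= lag < MAX_EDITION_LAG_YEARS


if __name__ == "__main__":
    for century, words in ACTUAL_WORDS.items():
        print(century, words, deviation_from_target(words))
```

Under these assumptions the nominal target is 200,000 words per century, so the figures in Table 2 exceed it by 22 and 1,107 words respectively, which is consistent with samples being "approximately" 10,000 words long.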
It is worth noting that selection has often been determined by the availability of texts, although in the last few years an increasing number of copyright-free images of texts has become available. All the information regarding the samples and their sources is provided in a section called “about the text” in the metadata files.

4. Time-span represented

CEPhiT covers the same time-span as the CC and its other sub-corpora. The time limits set for text selection are based on extra-linguistic considerations such as certain landmarks in scientific thought, rather than on landmarks in language change. Indeed, it is changes in scientific thought that inevitably lead to changes in the way in which knowledge is conveyed in the discourse of science (Moskowich & Parapar, 2008). CEPhiT has been compiled, then, by selecting samples of texts published between 1700 and 1900, that is, the late Modern English period (Rissanen, 2012: 147).3 The initial date here reflects the outburst of the scientific revolution, the foundation of the Royal Society of London, and with it, of course, the publication of basic guidelines on how to present scientific works to the members of the Royal Society based on the notions of clarity and simplicity.

3. Alternative dates, such as 1660, 1725, 1776 or even 1800 (Görlach, 1994: 22), have been posited as the point of transition between early and late Modern English. Other authors (van Gelderen, 2006: 11) have proposed 1750 as the starting point for Modern English, a period stretching until today. It is true that from the eighteenth century English scholars tend to use prescribed forms regardless of their dialectal origin, and that regional and social dialects now come to be considered inferior (Freeborn, 1992: 180). Also, it is in the eighteenth century that we observe an explosion in the publication of all sorts of pamphlets, grammars and articles aimed at linguistic improvement.
As can be seen in Table 3 below, the earliest texts date back to 1700 (Mary Astell) and 1705 (George Cheyne), a moment at which the old epistemological patterns of Scholasticism are undergoing a radical transformation (Taavitsainen & Pahta, 1997). This starting point in our time-span also coincides with the new inductive method of reasoning, which one of the authors included in CEPhiT, John Stuart Mill (1845), would go on to systematise. Empiricism also promoted the development of science outside the universities for the first time. These social and epistemological changes brought about the need for a new language in the transmission of science and scientific practice (Swales, 1990), and it is this emerging language which we have tried to capture in the compilation of CEPhiT. No text published later than 1900 has been included. Different events occurred around the turn of the century that proved crucial in the history of science. Among these we might list the discovery of the electron by Thomson in 1896, the crisis arising from the mechanical physics of Mach, Kirchhoff and Boltzmann in this same year, Planck’s announcement of quantum mechanics, and Einstein’s publication in 1905 of a paper proposing the Special Theory of Relativity (be it his own idea or Mileva Marić’s). All these developments brought with them the need to modify the discursive patterns of science, simplifying the prose and resorting to distinctive structures and vocabulary, just as had happened two centuries earlier. This change in discourse was formally announced by Thomas Huxley at the 1897 International Congress of Mathematics, and thus the turn of the century seems an appropriate end-point for our corpus.

5. Authors in CEPhiT

Although the sampling method and overall principles of compilation governing the CC are applied in all cases and across all sub-corpora, the availability of texts and the very nature of the disciplines meant that the sub-corpora differ as regards variables such as sex, genre and geographical distribution. Table 3 below lists all the authors contained in CEPhiT together with the title of their work and the year of publication. All the authors whose works are sampled in CEPhiT would have been working and writing under the influence of specific extra-linguistic conditions, and these would have contributed to their way of writing. Such conditions, in that they are relevant to the study of writers’ discursive habits, have a decisive influence on the nature of the corpus, and will be discussed in the following paragraphs.
Table 3. Authors in CEPhiT Year
Author
Text sampled
1700 Astell, Mary
Some reflections upon marriage.
1705 Cheyne, George
Philosophical principles of natural religion: containing the elements of natural philosophy, and the proofs for natural religion, arising from them.
1710 Dunton, John
Athenianism: or, the new projects of Mr. John Dunton.
1717 Collins, Anthony
A Philosophical Inquiry Concerning Human Liberty.
1727 Greene, Robert
The principles of the philosophy of the expansive and contractive forces. Or an inquiry into the principles of the modern philosophy, that is, into the several chief rational sciences, which are extant.
1730 Kirkpatrick, Robert
The golden rule of divine philosophy: with the discovery of many mistakes in the religions extant.
1733 Balguy, John
The law of truth: or, the obligations of reason essential to all religion. To which are prefixed, some remarks supplemental to a late tract; entitled, Divine rectitude.
1736 Butler, Joseph
The analogy of religion, natural and revealed, to the constitution and course of nature. To which are added two brief dissertations: I. Of personal identity. II. Of the nature of virtue.
1740 Turnbull, George
The principles of moral philosophy. An enquiry into the wise and good government of the moral world: in which the continuance of good administration, and of due care about virtue, for ever, is inferred from present order in all things, in that part.
1748 Hume, David
Philosophical essays concerning human understanding. By the author of the essays moral and political.
1754 Bolingbroke, Henry
The Philosophical Works of the late Right Honorable Henry St. John, Lord Viscount Bolingbroke.
1755 Hutcheson, Francis
A system of moral philosophy, in three books.
1764 Reid, Thomas
An inquiry into the human mind, on the principles of common sense.
1769 Ferguson, Adam
Institutes of moral philosophy. For the use of students in the college of Edinburgh.
1770 Burke, Edmund
Thoughts on the cause of the present discontents. Dublin. [Dublin]: London: printed for J. Dodsley.
1776 Campbell, George
The philosophy of rhetoric.
Chapter 1. Compiling CEPhiT
Table 3. (continued) Year
Author
Text sampled
1783 Macaulay, Catharine
Treatise of the immutability of moral truth.
1790 Smellie, William
The philosophy of natural history.
1792 Wollstonecraft, Mary
Vindication of the Rights of Woman.
1793 Crombie, Alexander
An essay on philosophical necessity.
1801 Belsham, Thomas
Elements of the philosophy of the mind and of moral philosophy: to which is prefixed a compendium of logic.
1810 Stewart, Dugald
Philosophical Essays.
1811 Kirwan, Richard
Metaphysical Essays; containing the principles and fundamental objects of that science.
1820 Brown, Thomas
Lectures on the philosophy of the human mind.
1824 Phillips, Sir Richard
Two dialogues between an Oxford tutor and a disciple of the commonsense philosophy: relative to the proximate causes of material phenomena.
1830 Mackintosh, Sir James
Dissertation on the progress of ethical philosophy, chiefly during the seventeenth and eighteenth centuries.
1835 Hampden, A course of lectures introductory to the study of moral philosophy: Renn Dickson delivered in the University of Oxford, in Lent Term, 1835. 1838 Powell, Rev. Baden
The connexion of natural and divine truth: or, the study of the inductive philosophy, considered as subservient to theology. The Saturday Mazine.
1845 Mill, John Stuart
An examination of Sir William Hamilton’s philosophy and of the principal philosophical questions discussed in his writings.
1846 Combe, George
Moral philosophy, or the duties of man considered in his individual, domestic and social capacities.
1855 Lyall, William
Intellect, the Emotions, and the Moral Nature.
1860 Slack, Henry James
The philosophy of progress of human affairs.
1862 Simon, T. Collyns
On the Nature and Elements of the External World: Or, Universal Immaterialism, Fully Explained and Newly Demonstrated.
Isabel Moskowich
1866 Mansel, Henry Longueville
The philosophy of the conditioned: comprising some remarks on Sir William Hamilton’s philosophy, and on Mr. J. S. Mill’s examination of that philosophy.
1874 Woodward, Thomas Best
A treatise on the nature of man, regarded as triune; with an outline of the philosophy of life.
1874 Balfour, Arthur James
A defence of philosophic doubt.
1885 Seth Pringle-Pattison, Andrew
Scottish philosophy: a comparison of the Scottish and German answers to Hume.
1890 Mackenzie, John Stuart
An Introduction to Social Philosophy.
1893 Bonar, James
Philosophy and political economy in some of their historical relations.
1898 Hodgson, Shadworth Hollway
The metaphysic of experience.
6. Genres represented in the corpus
Previous studies based on the CC, as well as on other corpora, have shown that scientific writing is subject to variation depending, among other factors, on genre (as a way of socialising, hence with external functions) and on text-type (the internal characteristics of texts) (García-Izquierdo & Montalt, 2002). That said, we can assume that texts belonging to the same discipline or dealing with similar subject-matter may nevertheless exhibit differences according to text-type/genre (see Nwogu, 1990; Myers, 1990; Bhatia, 1993). The taxonomy we have applied to samples does not rely on linguistic features exclusively; on the contrary, we have also used epistemological and social features. Thus, the corpus contains texts broadly representing the three epistemological levels of writing identifiable today: highest (typical of research articles and abstracts), high (abstracts in abstracting journals and informative scientific articles) and medium (specialised non-academic articles) (Fortanet et al., 1998). It has been argued (Moskowich, 2011) that the CC is concerned with para genres, that is, genres belonging to a single professional community (Monzó, 2002: 141), rather than with genres themselves. The labels applied for the classification of text-types in CETA, based on Görlach (2004), are also the ones used in CEPhiT.4 As shown in Table 4, the texts selected for CEPhiT group into only a small number of genres, fewer than in other disciplines, such as astronomy and life sciences.
Table 4. Genres in philosophy texts
Genres in CEPhiT   Samples
Treatise           22
Essay              10
Textbook            1
Lecture             5
Dialogue            1
Article             1
The ascription of a sample to one genre or another is always potentially debatable, since genres are “as family members” who “are related in various ways without necessarily having any single feature in common by all” (Fowler, 1982: 41). For this reason the functional text categories (genres) in other sub-corpora of the CC do not coincide with those found in CEPhiT. Our samples of philosophy texts are limited to six types, two fewer than in other disciplines (Moskowich, 2012). Writing in certain domains seems to rely on just a few types of texts, whereas in others a wider range is preferred. Subject-matter could here be claimed as the determining factor behind such restrictions. The category “Treatise”, for example, seems to be the one favoured by late modern philosophy authors, as seen in Table 4; another genre, “Textbook”, while very popular among astronomers in CETA (Moskowich, 2011 & 2012), is found only once in CEPhiT. “Essay” is the next most favoured after “Treatise”, indicating a firm preference for more formal formats. Besides texts with an identifiable informative function (the most common), CEPhiT also contains texts of an instructive and even of an entertaining nature, as represented by “Lecture”, “Dialogue” and “Article”. The categories described by Görlach (2004: 88) were already in use at the time during which these texts were produced5 and it is for this reason, following the same scheme used in CETA, that they have been applied here.
4. For a discussion on the difficulties of such classification see Moskowich (2011).
5. We can attest the use of some of these formats as early as the end of the fourteenth century (see Chapter 2 and Moskowich, 2011).
Graph 1. Proportion of words per genre (chart title: Distribution of words per TT/Genre; legend: Treatise, Essay, Textbook, Lecture, Dialogue, Article)
In order to ensure that CEPhiT contains samples of these six categories, all the samples were considered carefully, along with their prefaces and the texts from which they were extracted – thus avoiding what Rissanen (1989) terms “the philologist’s dilemma”.6 Having confirmed that the allocation of each sample to a specific category was appropriate, total word counts for each category were calculated; these are expressed as percentages in Graph 1. As Graph 1 shows, 54% of all CEPhiT samples correspond to the category “Treatise”. A more detailed look at the distribution of categories over time reveals that it differs markedly across the two centuries. Such differences, it can be argued, reflect the changes in text production as a response to the needs of social reality and trends in the negotiation of knowledge. This information is displayed in the following tables and graphs. As shown by Tables 5 and 6, as well as by their corresponding graphs (2 and 3), nineteenth-century authors resorted to a wider variety of genres than authors in the preceding century. This may be because philosophy in the nineteenth century had come to be considered as simply another field of knowledge, and thus deserved to be known and circulated at different educational and cultural strata in society, rather than restricted to a select few. The information from the metadata files in CEPhiT offers a portrait of the nineteenth century as a period during which philosophy reached a larger readership by using a wider range of genres.
6. A more detailed account is provided in Chapter 2.
Table 5. Words per genre in eighteenth-century CEPhiT
Genre      Number of words
Essay       60,213
Treatise   129,745
Textbook    10,064
Table 6. Words per genre in nineteenth-century CEPhiT
Genre      Number of words
Essay       40,251
Lecture     50,307
Treatise    90,393
Dialogue    10,084
Article     10,072
Graph 2. Words per genre in eighteenth-century philosophy texts (chart title: Words per TT/Genre (18th c.); legend: essay, treatise, textbook)
Graph 3. Proportion of words per genre in nineteenth-century philosophy texts (chart title: Words per TT/Genre (19th c.); legend: essay, lecture, treatise, dialogue, article)
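The proportions plotted in Graphs 1 to 3 can be recomputed from the word counts in Tables 5 and 6. The following is a minimal sketch in Python, not the procedure used by the compilers; the names `WORDS_18TH`, `WORDS_19TH` and `genre_shares` are illustrative only, and the figures are simply those printed in the two tables.

```python
# Word counts per genre, copied from Tables 5 and 6.
WORDS_18TH = {"Essay": 60_213, "Treatise": 129_745, "Textbook": 10_064}
WORDS_19TH = {"Essay": 40_251, "Lecture": 50_307, "Treatise": 90_393,
              "Dialogue": 10_084, "Article": 10_072}


def genre_shares(*tables):
    """Merge the per-century tables and return (share of all words in %, total)."""
    totals = {}
    for table in tables:
        for genre, words in table.items():
            totals[genre] = totals.get(genre, 0) + words
    grand_total = sum(totals.values())
    shares = {genre: 100 * words / grand_total for genre, words in totals.items()}
    return shares, grand_total


shares, grand_total = genre_shares(WORDS_18TH, WORDS_19TH)
print(f"Total words: {grand_total:,}")
# List genres from the largest share to the smallest.
for genre, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{genre:<9} {pct:5.1f}%")
```

On these figures the two tables add up to 401,129 words, and the treatise share comes out at just under 55%, which is close to the proportion quoted for Graph 1.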
7. Sex of authors as a variable
As might be expected, not many texts on philosophy were written by women in the late Modern English period. The whole sub-corpus contains only three such samples, representing just 8% of the total word count (see Table 7); their authors are Mary Astell (1700), Catharine Macaulay (1783) and Mary Wollstonecraft (1792).
Table 7. Number of words per sex in CEPhiT
Female    30,194
Male     370,935
Women are seldom mentioned in books about the history of science. In certain spheres of life public female activity was not common, and this included publishing. Works on philosophy by women were particularly infrequent. However, other fields of science were regarded as even more masculine than philosophy, and women’s work was often not taken seriously (Herrero 2007: 75). Excluded from the official conduct of science, women who wanted to learn had to do so by reading, by listening to other women and, occasionally, by listening to men in places other than institutions of “official knowledge”, where women were not allowed. These social conditions constituted almost insurmountable boundaries for women authors. In certain scientific fields, such as astronomy, one response was for women not to sign their own work (Moskowich, 2012, 2012a). Although women participated intensively in science, they often did so as mere assistants. Some scientific institutions, in fact, did not admit women until the second half of the twentieth century. Such a reality, one in which women are not (at least officially) recognised, is reflected in CEPhiT. In fact, no women writing on philosophy in the nineteenth century have been included, and thus a total of only 30,194 words of female writing represent the period prior to the beginning of the suffrage movement.
Graph 4. Words/sex in CEPhiT (chart title: Words per sex in CEPhiT; legend: Female, Male)
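The percentage quoted for female authors follows directly from Table 7, and the same table offers a simple consistency check against the genre tables. The sketch below is illustrative only: the per-century totals are obtained here by summing the rows of Tables 5 and 6, and the names `WORDS_BY_SEX`, `WORDS_BY_CENTURY` and `percentages` are not part of the corpus metadata.

```python
# Word counts copied from Table 7.
WORDS_BY_SEX = {"Female": 30_194, "Male": 370_935}

# Per-century totals obtained by summing the rows of Tables 5 and 6.
WORDS_BY_CENTURY = {"18th c.": 200_022, "19th c.": 201_107}


def percentages(counts):
    """Return each label's share of the summed counts, in per cent."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


sex_share = percentages(WORDS_BY_SEX)
# Both breakdowns describe the same corpus, so their totals should coincide.
same_total = sum(WORDS_BY_SEX.values()) == sum(WORDS_BY_CENTURY.values())

print(f"Female share: {sex_share['Female']:.1f}%")
print("Tables agree on the total word count:", same_total)
```

The female share works out at about 7.5%, which rounds to the 8% given in the text, and both breakdowns yield the same overall word count.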
Graph 4 presents the information in Table 7 in a clearer way, in which the absence of women from the world of philosophical knowledge is more visibly evident.
8. Geographical distribution of CEPhiT authors
Variation depending on different sociolinguistic variables, such as geographical origin, can also be studied with CEPhiT. For this reason we have tried to compile samples from authors whose linguistic habits regarding geography could be traced. To this end, we have selected English-speaking authors writing in English, avoiding any translations, even those made by the authors themselves. This in turn makes it necessary to gather information about authors and their lives, not least because sometimes in the history of science writers have been labelled as British when in fact their education in Britain started only very late, at a point when their linguistic strategies had already been set in another language. Thus, by geographical distribution of authors we refer primarily to the places where they were educated and where they acquired those linguistic habits which would subsequently be found in their writings as sampled in CEPhiT. The distribution of authors according to the geographical variable can be seen in Table 8. Although in other sub-corpora of the CC samples from American authors abound, none have been included in this sub-corpus. At the time the great bulk of work on philosophy was being produced in Europe, while North America was recovering from the effects of a very turbulent eighteenth century. Similarly, during the nineteenth century Americans were more deeply concerned with the practical application of scientific advances than with theoretical disquisitions. The same information in Table 8 is set out in Graph 5 below. Extra-linguistic factors played an important part in philosophical movements, and this was more so in the case of Europe; American scientific writing, by contrast, was notable in other fields.
Graphs 6 and 7 illustrate that, at the same time, a different geographical distribution can be observed in the two centuries covered by CEPhiT. It is clear that social and political changes have a deep impact on the development of science and the language of science. The way in which CEPhiT has been sampled allows for the representation of such social and political shifts. For example, aside from the impact of the American war, the fact that during the eighteenth century Ireland lived through the Protestant Ascendancy meant that the native Irish population was excluded from power and public life (Claydon & McBride, 1999); with England as the coloniser, it is little wonder that most scientific texts were published by English authors, with few Irish authors in evidence at the time.
Table 8. Geographical origin of authors in CEPhiT
Year  Author                         Place of education
1700  Astell, Mary                   England
1705  Cheyne, George                 Scotland
1710  Dunton, John                   England
1717  Collins, Anthony               England
1727  Greene, Robert                 England
1730  Kirkpatrick, Robert            Unknown
1733  Balguy, John                   England
1736  Butler, Joseph                 England
1740  Turnbull, George               Scotland
1748  Hume, David                    Scotland
1754  Bolingbroke, Henry             England
1755  Hutcheson, Francis             Ireland/Scotland
1764  Reid, Thomas                   Scotland
1769  Ferguson, Adam                 Scotland
1770  Burke, Edmund                  Ireland
1776  Campbell, George               Scotland
1783  Macaulay, Catharine            England
1790  Smellie, William               Scotland
1792  Wollstonecraft, Mary           England
1793  Crombie, Alexander             Scotland
1801  Belsham, Thomas                England
1810  Stewart, Dugald                Scotland
1811  Kirwan, Richard                Ireland
1820  Brown, Thomas                  England/Scotland
1824  Phillips, Sir Richard          England
1830  Mackintosh, Sir James          Scotland
1835  Hampden, Renn Dickson          England
1838  Powell, Rev. Baden             England
1845  Mill, John Stuart              England/Scotland
1846  Combe, George                  Scotland
1855  Lyall, William                 England
1860  Slack, Henry James             England
1862  Simon, T. Collyns              Ireland/England
1866  Mansel, Henry Longueville      England
1874  Woodward, Thomas Best          Ireland
1874  Balfour, Arthur James          Scotland/England
1885  Seth Pringle-Pattison, Andrew  Scotland
1890  Mackenzie, John Stuart         Scotland
1893  Bonar, James                   Scotland
1898  Hodgson, Shadworth Hollway     England
Graph 5. Provenance of authors in CEPhiT (chart title: Geographical distribution per words in CEPhiT; legend: England, Scotland, Ireland, Unknown)
Graph 6. Geographical distribution in the eighteenth century (chart title: Geographical distribution per words in CEPhiT (18th c.); legend: England, Scotland, Ireland, Unknown; data labels as printed: 10%, 5%, 45%, 40%)
Graph 7. Geographical distribution in the nineteenth century (chart title: Geographical distribution per words in CEPhiT (19th c.); legend: England, Scotland, Ireland)
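The per-century breakdown behind Graphs 6 and 7 can be approximated from Table 8. The sketch below rests on two simplifying assumptions that are not stated in the chapter: it counts samples rather than words (the graphs are plotted per words), and it assigns an author with a dual entry such as “Ireland/Scotland” to the first place listed. The names `TABLE_8` and `tally` are illustrative, and “Unk” is written out as “Unknown”.

```python
from collections import Counter

# (year, place of education) pairs transcribed from Table 8.
TABLE_8 = [
    (1700, "England"), (1705, "Scotland"), (1710, "England"), (1717, "England"),
    (1727, "England"), (1730, "Unknown"), (1733, "England"), (1736, "England"),
    (1740, "Scotland"), (1748, "Scotland"), (1754, "England"),
    (1755, "Ireland/Scotland"), (1764, "Scotland"), (1769, "Scotland"),
    (1770, "Ireland"), (1776, "Scotland"), (1783, "England"), (1790, "Scotland"),
    (1792, "England"), (1793, "Scotland"), (1801, "England"), (1810, "Scotland"),
    (1811, "Ireland"), (1820, "England/Scotland"), (1824, "England"),
    (1830, "Scotland"), (1835, "England"), (1838, "England"),
    (1845, "England/Scotland"), (1846, "Scotland"), (1855, "England"),
    (1860, "England"), (1862, "Ireland/England"), (1866, "England"),
    (1874, "Ireland"), (1874, "Scotland/England"), (1885, "Scotland"),
    (1890, "Scotland"), (1893, "Scotland"), (1898, "England"),
]


def tally(rows, start, end):
    """Count samples per first-listed place for start <= year < end."""
    return Counter(place.split("/")[0] for year, place in rows if start <= year < end)


c18 = tally(TABLE_8, 1700, 1800)
c19 = tally(TABLE_8, 1800, 1900)

for label, counts in (("18th c.", c18), ("19th c.", c19)):
    total = sum(counts.values())
    summary = ", ".join(f"{place} {100 * n / total:.0f}%"
                        for place, n in counts.most_common())
    print(f"{label}: {summary}")
```

Under these assumptions the eighteenth-century tally is 9 samples for England, 8 for Scotland, 2 for Ireland and 1 of unknown provenance out of 20, that is, 45%, 40%, 10% and 5%, the same four values that appear as data labels on Graph 6. Because the samples are of roughly equal length, a word-based count should give similar proportions, although this is not verified here.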
9. Validity of CEPhiT
In addition to some general works which describe this subcorpus (Crespo & Moskowich, 2010, 2011a, 2011b; Moskowich, 2012a, 2012b), a number of pilot studies have shown that CEPhiT, like its sister corpora within the CC, is a reliable source for the study of the evolution of scientific writing in the changing field of modern philosophy. The corpus has only recently been finalised, and hence few studies using it have as yet been published. Those that exist can be grouped according to the linguistic aspects they deal with. The lexicon of science, the morphology of specialised terminology and other semantic implications have been explored from different perspectives in Camiña-Riobóo (2010a, 2010b, 2010c), Camiña & Lareo (in this volume) and Puente (in this book); pragmatic approaches have been adopted in Monaco (2010, in this book), Alonso (2013), Alonso & Lareo (in this book) and Seoane (in this book), and syntactic ones in Bello-Viruega (2010), Monaco & Puente-Castelo (2013), Puente-Castelo & Monaco (2013) and Alonso & Lareo (in this book). In addition, socio-linguistic variables included in CEPhiT have been used for a number of studies, including Crespo (2011, 2014), Moskowich (2013), and Camiña-Riobóo (2011). Other works, such as Moskowich (2009a), Monaco (2010), Lareo (2011), Crespo (2015), Crespo & Moskowich (2013, forthcoming), Moskowich & Monaco (2014), and Moskowich & Crespo (2012b, 2012c, 2013, 2014) focus on different aspects of discourse. Finally, several recent MA and doctoral dissertations have been written using CEPhiT, and others are currently being written. All such work attests to the value of the corpus for research.
References
Allen, Bryce, Jian Qin & Frederick Wilfrid Lancaster (1994). Persuasive Communities: A Longitudinal Analysis of References in the Philosophical Transactions of the Royal Society, 1665–1990. Social Studies of Science, 24/2, 279–310. doi: 10.1177/030631279402400204
Alonso-Almeida, Francisco (2012). An analysis of hedging in eighteenth-century English astronomy texts. In I. Moskowich & B. Crespo (Eds.). Astronomy ‘playne and simple’: The writing of science between 1700 and 1900 (199–220). Amsterdam: John Benjamins.
Alonso Almeida, Francisco (2013). An analysis of the grammatical and pragmatic values of seem in the Corpus of English Philosophy Texts. CILC 13. University of Alicante, 13–16 March 2013.
Banks, David (2012). Thematic structure in eighteenth century astronomical texts. A study of a small sample of articles from the Corpus of English Texts on Astronomy. In I. Moskowich & B. Crespo (Eds.). Astronomy ‘playne and simple’: The writing of science between 1700 and 1900 (221–238). Amsterdam: John Benjamins. doi: 10.1075/z.173.11ban
Beal, Joan (2004). English in Modern Times 1700–1945. London: Arnold.
Bello Viruega, Iria (2010). A diachronic study of nominalizations in Astronomy and Philosophy texts. 6th International Contrastive Linguistics Conference. Berlin: Freie Universität Berlin.
Bhatia, Vijay K. (1993). Analysing genre: Language use in professional settings. London and New York: Longman.
Biber, Douglas (1993). Representativeness in Corpus Design. Literary and Linguistic Computing, 8/4, 243–257. doi: 10.1093/llc/8.4.243
Biber, Douglas, Susan Conrad & Randi Reppen (1998). Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
Biber, Douglas & Bethany Gray (2013). Being Specific about Historical Change: The Influence of Sub-Register. Journal of English Linguistics, 41, 104–134. doi: 10.1177/0075424212472509
Camiña-Riobóo, Gonzalo (2010a). New nouns for new ideas. In M. L. Gea-Valor, I. García & M. J. Esteve (Eds.). Linguistic and Translation Studies in Scientific Communication (157–176). Bern: Peter Lang.
Camiña-Riobóo, Gonzalo (2010b). The language of women scientists in the 18th century: a morphological approach. 2nd International Corpus Linguistics Conference (CILC10). Universidade da Coruña.
Camiña-Riobóo, Gonzalo (2010c). Rewriting philosophical theories in the eighteenth century: forming new nouns to express new ideas in the Corpus of English Philosophy Texts (CEPhiT). In M. L. Gea-Valor, I. García & M. J. Esteve (Eds.). Linguistic and Translation Studies in Scientific Communication. Bern: Peter Lang.
Camiña-Riobóo, Gonzalo (2011). New nouns in the scientific register of late Modern English: a corpus-based approach. 3rd International Corpus Linguistics Conference (CILC11). Universidad Politécnica de Valencia.
Camiña, Gonzalo (2013). Noun Formation in the Scientific Register of Late Modern English: A Corpus-Based Approach (PhD. Diss.). http://ruc.udc.es/handle/2183/11703.
Cantos, Pascual & Nila Vazquez (2012). Subject Specific Vocabulary in Astronomy Texts: A Diachronic Survey of the Corpus of English Texts on Astronomy. In I. Moskowich & B. Crespo (Eds.). Astronomy ‘playne and simple’: The writing of science between 1700 and 1900 (123–154). Amsterdam: John Benjamins. doi: 10.1075/z.173.07can
Claydon, Tony & I. McBride (Eds.) (1999). Protestantism and National Identity: Britain and Ireland, c. 1650–c. 1850. Cambridge: Cambridge University Press.
Crespo, Begoña (2011). Persuasion markers and ideology in eighteenth-century philosophy texts (CEPhiT). Lenguas para Fines específicos, 17, 199–228.
Crespo, Begoña (2013). Locutionary Acts in Nineteenth-Century Astronomy Writing: Observ Explained through CETA. In N. Vázquez (Ed.). Creation and Use of Historical English Corpora in Spain (35–57). Newcastle: Cambridge Scholars Publishing.
Crespo, Begoña (2014). Female Authorial Voice: Discursive Practices in Prefaces to Scientific Works. In M. Gotti & D. S. Giannoni (Eds.). Corpus-based Analysis and Teaching of Specialised Discourse. Bern: Peter Lang.
Crespo, Begoña (2015). Women writing science in the eighteenth-century: a preliminary approach to their language in use. Anglica. An International Journal of English Studies, 24/2, 103–127.
Crespo, Begoña & Isabel Moskowich (2006). Latin Forms in Vernacular Scientific Writing: Code-Switching or Borrowing?. In R. MacConchie et al. (Eds.). Selected Proceedings of the 2005 Symposium on New Approaches in English Historical Lexis (Hel-Lex) (51–59). Somerville, MA: Cascadilla Press.
Crespo, Begoña & Isabel Moskowich (2008). Advances in the Coruña Corpus. ICAME 29. University of Zurich.
Crespo, Begoña & Isabel Moskowich (2010). CETA in the context of the Coruña Corpus. Literary and Linguistic Computing, 25/2, 153–164. doi: 10.1093/llc/fqp038
Crespo, Begoña & Isabel Moskowich (2011a). Compiling English Philosophy Texts (CEPhiT). Helsinki Corpus Festival. University of Helsinki.
Crespo, Begoña & Isabel Moskowich (2011b). CEPhiT: Texts ‘Concerning Human Understanding’. ICAME 32. University of Oslo.
Crespo, Begoña & Isabel Moskowich (2013). The Corpus of English Philosophy Texts (CEPhiT) and a study on persuasion strategies. 36th International Conference of AEDEAN. Universidad de Málaga.
Crespo, Begoña & Isabel Moskowich (forthcoming). Involved in writing science: nineteenth-century women in the Coruña Corpus. International Journal of Language and Linguistics.
Crespo, Begoña et al. (Eds.) (2010). Mujer y Ciencia: Historia de una desigualdad. München: Lincom.
Connor, Ulla & Thomas A. Upton (2004). Introduction. In U. Connor & T. A. Upton (Eds.). Discourse in the Professions. Perspectives from corpus linguistics. Amsterdam/Philadelphia: John Benjamins. doi: 10.1075/scl.16
Fortanet, Inmaculada, Santiago Posteguillo, Juan Carlos Palmer & Juan Francisco Coll (1998). Disciplinary variations in the writing of research articles in English. In I. Fortanet, S. Posteguillo, J. C. Palmer & J. F. Coll (Eds.). Genre Studies in English for Academic Purposes (59–78). Castelló: Universitat Jaume I.
Fortanet, Inmaculada, Santiago Posteguillo, Juan Carlos Palmer & Juan Francisco Coll (Eds.) (1998). Genre Studies in English for Academic Purposes. Castelló: Universitat Jaume I.
Fowler, Alastair (1982). Kinds of Literature. An Introduction to the Theory of Genres and Modes. Oxford: Clarendon Press.
Freeborn, Denis (1992). From Old English to Standard English. London: Macmillan.
García-Izquierdo, Isabel & Vicent Montalt i Resurrecció (2002). Translating into textual genres. Retrieved 27 February 2013 from www.lans-tts.be/img/NS1/P135-143.PDF.
Gea-Valor, María Luisa, Isabel García Izquierdo & María José Esteve (Eds.) (2010). Linguistic and Translation Studies in Scientific Communication. Bern: Peter Lang.
Gelderen, Elly van (2006). A History of the English Language. (Rev. ed.). Amsterdam: John Benjamins. doi: 10.1075/z.135
Görlach, Manfred (1994). The Linguistic History of English. London: Macmillan. doi: 10.1007/978-1-349-25684-6
Görlach, Manfred (2004). Text Types and the history of English. Berlin/New York: Mouton de Gruyter. doi: 10.1515/9783110197167
Gotti, Maurizio (1996). Robert Boyle and the language of science. Milan: Guerini Scientifica.
Gotti, Maurizio & Davide S. Giannoni (Eds.) (2014). Corpus Analysis for Descriptive and Pedagogical Purposes. Bern: Peter Lang.
Gray, Bethany & Douglas Biber (2012). The emergence and evolution of the pattern N + PREP + V-ing in historical scientific texts. In I. Moskowich & B. Crespo (Eds.). Astronomy ‘playne and simple’: The writing of science between 1700 and 1900 (181–198). Amsterdam: John Benjamins. doi: 10.1075/z.173.09gray
Herrero López, Concepción (2007). Las mujeres en la investigación científica. Criterios, 8, 73–96.
Hoskin, Michael (Ed.) (1999). The Cambridge concise history of astronomy. Cambridge: Cambridge University Press.
Hyland, Ken (1998). Hedging in Scientific Research Articles. Amsterdam: John Benjamins. doi: 10.1075/pbns.54
Kytö, Merja, Juhani Rudanko & Erik Smitterberg (2000). Building a Bridge between the Present and the Past: A Corpus of 19th-century English. ICAME, 24, 85–97.
Lareo, Inés (2009). Make-collocations in Nineteenth-Century Scientific English. Studia Neophilologica, 81/1, 1–16. doi: 10.1080/00393270802083067
Lareo, Inés (2011). Sociedad, educación y ciencia en los siglos 18 y 19: científicas británicas y americanas en el Coruña Corpus. In B. Crespo et al. (Eds.). Mujer y Ciencia: Historia de una desigualdad (42–68). München: Lincom.
Lareo, Inés (2012). A corpus-driven approach to explore the use of complex predicates in 18th century English scientific writings. In I. Moskowich & B. Crespo (Eds.). Astronomy ‘playne and simple’: The writing of science between 1700 and 1900 (155–180). Amsterdam/Philadelphia: John Benjamins. doi: 10.1075/z.173.08lar
Lorenzo Modia, María Jesús (Ed.) (2008). Proceedings from the 31st AEDEAN Conference. A Coruña: Universidade da Coruña.
MacConchie, Rod W. et al. (Eds.) (2006). Selected Proceedings of the 2005 Symposium on New Approaches in English Historical Lexis (Hel-Lex). Somerville, MA: Cascadilla Press.
McEnery, Tony & Andrew Wilson (1996). Corpus Linguistics. Edinburgh: Edinburgh University Press.
Moessner, Lilo (2001). Genre, text type, style, register: A terminological maze?. European Journal of English Studies, 5/2, 131–138. doi: 10.1076/ejes.5.2.131.7312
Monaco, Leida Maria (2010). Epistemic possibility and necessity in Modern scientific English: A first approach to modal auxiliaries in philosophy texts. 2nd International Corpus Linguistics Conference (CILC 10). Universidade da Coruña.
Monaco, Leida Maria & Luis Puente Castelo (2013). Conditionals in 18th-century philosophy texts: A corpus-based study. Corpus Linguistics Conference 2013. University of Lancaster.
Monzó Nebot, Esther (2002). La Profesió del Traductor Juridic i Jurat: Descripció Sociòlogica del Professional i Anàlisi Discursiva del Trasgenere. Unpublished PhD Dissertation. Castelló: Universitat Jaume I.
Moskowich, Isabel (2007). Exploiting the Coruña Corpus. Encuentro de Investigadores: Lingüística Histórica Inglesa (Genres, Registers and Text-Types: a Historical Approach to English Scientific Writing). Universidade da Coruña.
Moskowich, Isabel (2009a). Sobre la recopilación de corpus: la experiencia del Coruña Corpus. CILC 09. Universidad de Murcia.
Moskowich, Isabel (2009b). Científicos de la Edad Moderna en el Coruña Corpus of Scientific Writing: vida y obra, obra y vida. Lecture at the Instituto de Historia de la Medicina y de la Ciencia López Piñero. Valencia.
Moskowich, Isabel (2010). On the Coruña Corpus: Under construction. CILC 10. Universidade da Coruña.
Moskowich, Isabel (2011). “The Golden Rule of Divine Philosophy” exemplified in the Coruña Corpus of English Scientific Writing. Lenguas para fines específicos, 17, 167–198.
Moskowich, Isabel (2012a). CETA as a tool for the Study of Modern Astronomy in English. In I. Moskowich & B. Crespo (Eds.). Astronomy ‘playne and simple’: The Writing of Science between 1700 and 1900 (35–56). Amsterdam: John Benjamins.
Moskowich, Isabel (2012b). “A smooth homogeneous globe” in CETA: Compiling Late Modern Astronomy Texts in English. In N. Vázquez (Ed.). Creation and Use of Historical English Corpora in Spain (21–37). Newcastle: Cambridge Scholars.
Moskowich, Isabel (2012c). Patterns of English Scientific Writing: adjectives and other building-blocks. In I. Moskowich & B. Crespo (Eds.). Astronomy ‘playne and simple’: The Writing of Science between 1700 and 1900 (79–92). Amsterdam: John Benjamins. doi: 10.1075/z.173.05mos
Moskowich, Isabel (2013). Eighteenth-century female authors: women and science in the Coruña Corpus of English Scientific Texts. Australian Journal of Linguistics, 33/4, 467–487. doi: 10.1080/07268602.2013.857570
Moskowich, Isabel & Leida Maria Monaco (2014). In M. Gotti & D. S. Giannoni (Eds.). Corpus Analysis for Descriptive and Pedagogical Purposes (203–224). Bern: Peter Lang.
Moskowich, Isabel & Begoña Crespo (2004). Presenting the Coruña Corpus: A Collection of Samples for the Historical Study of English Scientific Writing. 2nd Late Modern English Conference (LMEC2). Universidad de Vigo.
Moskowich, Isabel & Begoña Crespo (2007). Presenting the Coruña Corpus: A Collection of Samples for the Historical Study of English Scientific Writing. In J. Pérez Guerra et al. (Eds.). ‘Of Varying Language and Opposing Creed’: New Insights into Late Modern English (341–357). Bern: Peter Lang.
Moskowich, Isabel & Begoña Crespo (Eds.) (2012a). Astronomy ‘playne and simple’: The Writing of Science between 1700 and 1900. Amsterdam: John Benjamins. doi: 10.1075/z.173
Moskowich, Isabel & Begoña Crespo (2012b). Involved in writing science: nineteenth-century women in the Coruña Corpus. ICAME 33. University of Leuven.
Moskowich, Isabel & Begoña Crespo (2012c). Reporting knowledge: information in nineteenth-century women works from the Coruña Corpus. ICAME 33. University of Leuven.
Moskowich, Isabel & Begoña Crespo (2013). Apparently, stance can be found in scientific writing. A survey from the Coruña Corpus of English Scientific Writing. Late Modern English Conference. Università di Bergamo.
Moskowich, Isabel & Begoña Crespo (2014). Stance is present in scientific writing, indeed. Evidence from the Coruña Corpus of English Scientific Writing. Token, A Journal of English Linguistics 3.
Moskowich, Isabel & Leida Maria Monaco (2014). Abstraction as a Means of Expressing Reality: Women Writing Science in late Modern English. In M. Gotti & D. S. Giannoni (Eds.). Corpus Analysis for Descriptive and Pedagogical Purposes (203–224). Bern: Peter Lang.
Moskowich, Isabel & Javier Parapar López (2008). Writing Science, Compiling Science. The Coruña Corpus of English Scientific Writing. In M. J. Lorenzo Modia (Ed.). Proceedings from the 31st AEDEAN Conference (531–544). A Coruña: Universidade da Coruña.
Moskowich, Isabel et al. (comps.) (2012). Corpus of English Texts on Astronomy. Amsterdam: John Benjamins.
Myers, Greg (1990). Writing Biology: Texts in the Social Construction of Scientific Knowledge. Madison: University of Wisconsin Press.
Nwogu, Kevin N. (1990). Discourse Variation in Medical Texts: Scheme, Theme and Cohesion in Professional and Journalistic Accounts. Nottingham: University of Nottingham.
OED Online. Oxford: Oxford University Press. http://dictionary.oed.com (Accessed 28 April 2009).
Pérez Guerra, Javier et al. (Eds.) (2007). ‘Of Varying Language and Opposing Creed’: New Insights into Late Modern English. Bern: Peter Lang.
Puente Castelo, Luis & Leida Maria Monaco (2013). Conditionals and its functions in women’s scientific writing. Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers. http://ac.els-cdn.com/S1877042813041554/1-s2.0S1877042813041554-main.pdf?_tid=b43c9a54-b8eb-11e3-b7d6-00000aacb360&acdnat= 1396281003_cbc97a7f03dfcac1206853e0ea72b359.
Rissanen, Matti (1989). Three problems connected with the use of diachronic corpora. ICAME Journal, 13, 16–19.
Rissanen, Matti (2012). Grammaticalisation, contact and corpora. On the development of adverbial connectives in English. In I. Hegedus & A. Fodor (Eds.). English Historical Linguistics 2010: Selected Papers from the Sixteenth International Conference on English Historical Linguistics (ICEHL 16), Pécs, 23–27 August 2010 (131–151). Amsterdam: John Benjamins. doi: 10.1075/cilt.325.06ris
Swales, John (1990). Genre analysis: English in academic and research settings. Cambridge: Cambridge University Press.
Taavitsainen, Irma & Päivi Pahta (1997). The Corpus of Early English Medical Writing. ICAME Journal, 21, 71–78.
UNESCO (1988). Proposed International Standard Nomenclature for Fields of Science and Technology. Paris: UNESCO/ROU257 rev. 1.
Valle, Ellen (1999). A Collective Intelligence. The Life Sciences in the Royal Society as a Scientific Discourse Community, 1665–1965. Turku: Anglicana Turkuensia.
Vázquez, Nila (Ed.) (2013). Creation and Use of Historical English Corpora in Spain. Newcastle: Cambridge Scholars Publishing.
Chapter 2
Genre categorisation in CEPhiT
Begoña Crespo
University of A Coruña
…philosophy arises when the struggle for existence has given place to a life of leisure. (Erdmann, cited in Bonar 1893: 3)
doi 10.1075/z.198.02cre © 2016 John Benjamins Publishing Company
The communicative function of language has long been of interest to scholars, and debate continues on concepts which form part of this function, among them rhetorical patterns, language structures, target readership, author-reader relationship, context of situation and purpose (Duszak, 1997; Littlejohn & Fox, 2008). Within the realm of discourse studies and pragmatics, a range of approaches have been taken to clarify the concepts of genre and text type, as a means of producing a typological classification on which to base further analyses (Rissanen, 1996; Görlach, 2004). This includes the seemingly straightforward task of categorising texts for corpus compilation, which is what the members of the Research Group for Multi-Dimensional Corpus-based Studies in English (MuStE) have been doing. This chapter is organised as follows: in Section 1 I will explain the principles that have guided the categorisation of the different samples in the present corpus. Section 2 will deal with the notions of genre and text type themselves. In the following section we will see how our understanding of genre as a socially-constrained format is manifested in the classification of text samples. The chapter concludes with some final remarks.
1. Categorising samples: The procedure
As already noted by Moskowich in Chapter 1, one of the extralinguistic variables that can be found in the different sub-corpora of the Coruña Corpus of English Scientific Writing (CC henceforth) is that of genre or text type. Generally, the two terms have been used interchangeably, with no hint of a difference between them. Yet some authors have attempted to establish such a difference (Biber, 1993; Trosborg, 1997; Lee, 2001; Taavitsainen, 2001, 2004; Moessner, 2001). At this point, looking specifically at the different modes of writing which authors resort to, it is perhaps worth setting out the view which has guided the classification of CC texts in this sense. The samples from philosophy texts compiled in CEPhiT (see Appendix I) have been classified according to four main criteria:
1. The author’s choice and an explicit determination of what he or she was writing, as stated in various possible kinds of prefatory material: prefaces, dedications, introductions, forewords.
2. The organization of the text to which the sample belonged, that is, its rhetorical construction, the purpose of writing the text, the target audience, the medium (written, oral texts which were written down), the level of technicality or complexity.
3. The author himself/herself, his/her academic background and position in society.
4. The addressee.
Only criterion 1 was taken as definitive. The remaining criteria were used as complementary, except where the author’s ascription of the text was not explicitly mentioned. In that situation, criteria 2, 3 and 4 were all taken into consideration. Examples (1), (2) and (3) below illustrate the authors’ indication of the genre they were using:
(1) The following Treatise may in some Measure, claim the Honour of Your Lordship’s Patronage, as being undertaken in Obedience to your Commands, and… (Cheyne, 1705: Dedication)
(2) The Reader is desired to excuse these Remarks thrown in his Way. As they are Supplemental to another Tract, I might have found another Place for them. But since they appear no improper Introduction to the following Essay; I chose to prefix them here. (Balguy, 1733: iii)
(3) These Lectures, it will be perceived, have immediately in view the class of hearers to whom they were addressed; but it is hoped, at the same time, they may be generally useful to any who have not yet sufficiently thought on the nature of Moral Science, or of its real importance and interest. (Hampden, 1835: vi)
Given these criteria, and decisions based on previous literature, what the MuStE group has produced appears to be a taxonomy not of text types but of genres within a particular register, more precisely the scientific register, in which there is a great deal of variability. In the pages that follow I will attempt to justify the corpus compilers’ decision to classify samples into genres.
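The decision procedure described in this section can be sketched in code. The following is only an illustrative sketch, not the compilers' actual workflow: the class `SampleEvidence`, its field names and the function `classify` are hypothetical, and the way criteria 2, 3 and 4 are combined (a simple majority among the genres they suggest, with ties left undecided) is an assumption, since the chapter states only that all three were taken into consideration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SampleEvidence:
    """Evidence collected for one sample (hypothetical field names)."""
    author_label: Optional[str] = None       # criterion 1: genre named in prefatory material
    organisation_hint: Optional[str] = None  # criterion 2: genre suggested by text organisation
    background_hint: Optional[str] = None    # criterion 3: genre suggested by the author's profile
    addressee_hint: Optional[str] = None     # criterion 4: genre suggested by the addressee


def classify(evidence: SampleEvidence) -> Tuple[Optional[str], str]:
    """Return (genre, basis). Criterion 1 is definitive when present."""
    if evidence.author_label:
        return evidence.author_label, "criterion 1"

    # Assumption: criteria 2-4 are combined by simple majority.
    hints = [h for h in (evidence.organisation_hint,
                         evidence.background_hint,
                         evidence.addressee_hint) if h]
    if not hints:
        return None, "undecided"

    ranked = Counter(hints).most_common()
    # A tie between the two best-supported genres is left to the compiler.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None, "undecided"
    return ranked[0][0], "criteria 2-4"
```

For instance, a sample whose dedication calls the work a "Treatise" would be classified as a treatise whatever the other criteria suggest, which mirrors the priority given to the author's own ascription.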
Chapter 2. Genre categorisation in CEPhiT
2. Genre, not text type

According to the above criteria, our understanding of genre coincides with Swales’ (1990: 58) definition:

a class of communicative events, the members of which share some set of communicative purposes. These purposes are recognised by the expert members of the parent discourse community, and thereby constitute the rationale for the genre. This rationale shapes the schematic structure of the discourse and influences and constrains choice of content and style.
This definition is reinforced by Martin’s view of genre (2000: 13) as a social and cultural artefact within the framework of Systemic Functional Linguistics, a view which has sought to define genres as “staged, goal-oriented social processes through which social subjects in a given culture live their lives”. Given the social and communicative nature of any genre, the corpus compilers agree with Lee (2001: 38) when he argues that “[A]t present, all corpora use only external criteria to classify texts”. The reason for this is explained by Atkins, Clear and Ostler (1992: 5): “The initial selection of texts for inclusion in a corpus will inevitably be based on external evidence primarily … A corpus selected entirely on internal criteria would yield no information about the relation between language and its context of situation.” And this is precisely one of the aims in the compilation of the CC, which seeks to provide information about eighteenth- and nineteenth-century scientific English and the external circumstances that favoured its development. All these views are reflected in Moskowich’s previous claims (2011: 182) that:

Our classification of samples for the corpus is not based on linguistic features exclusively but on epistemological features and social factors too. As compilers, we have tried to include extracts from different epistemological levels, that is to say, from different levels of specificity (more or less informative), addressed to a more or less specialised target readership. This could be compared to Fortanet et al.’s (1998) classification of highest to medium epistemological level.
Görlach (2004: 88) has also claimed that proper definitions of each genre are necessary prior to text collection so as to ensure that corpora contain representative samples of the material under analysis. He mentions eight categories, proposing the definitions shown in Table 1. This classification has been used as a starting point in our compilation of scientific texts but, as mentioned above, other parameters have also been taken into account.
Table 1. Görlach’s classification of text types

Text type        Definition
Article          Non-fictional composition or dissertation in a newspaper, journal or read at a conference
Essay            Short prose composition, first draft
Lecture          Formal discourse delivered to students. Piece of writing intended to be read aloud
Treatise         Discussion of a topic including some methodological issues
Dialogue         Literary work in conversational form
Textbook         Book used as a standard reference work
Letter           Written communication (not necessarily sent by post)
Encyclopaedia    Book containing information in all branches of knowledge, arranged alphabetically
Extra-linguistic considerations are intimately connected to the linguistic system itself and form part of the necessary criteria in the process of compiling a corpus. The relevance of considering external circumstances in a possible classification of genre had already been discussed by Biber (1988: 170), for whom genres are “categories determined on the basis of external criteria relating to the speaker’s purpose and topic; they are assigned on the basis of use rather than on the basis of form”. The key term here is ‘use’: it is the context of use that leads us to a particular classification, a tenet underlined by Bawarshi & Reiff (2010). Paraphrasing Martin (2000), they highlight the social nature of genre “by showing how social purposes/motives are linked to text structures, and how these are realized as situated social and linguistic actions within register” (Bawarshi & Reiff, 2010: 33).

Genres can then be defined as socio-cognitive slots in the communicative process, which every author fills according to situational or contextual parameters. They can be adapted to the type of addressee and consequently to different levels of technicality (degree of specialisation), and can present a particular rhetorical organisation (format used to display the information). Essentially, then, everything relating to language choice (vocabulary, structures and constructions) is a defining characteristic of text types, which we can understand as being different individual realizations of a particular genre. Biber & Finegan’s (1989: 6) contribution to the distinction between genre and text type disentangles the terms thus: “genre distinctions do not adequately represent the underlying text types of English …; linguistically distinct texts within a genre represent different text types; linguistically similar texts from different genres represent a single text type.” The emphasis here is on the existence of variation within genres and text types.
Bearing this distinction in mind, it is important when compiling a corpus to concentrate first of all on the situational parameters of language variation, because these can be established prior to text collection (Biber, 1993). As Biber (1993: 245) contends:

there is no a priori way to identify linguistically defined types … [however,] the results of previous research studies, as well as on-going research during the construction of a corpus, can be used to assure that the selection of texts is linguistically as well as situationally representative.
The above claims are reflected in the methodology used by the CC compilers when classifying the collected samples. This means that prior to text selection we considered all the material at our disposal so as to proceed with a selection of samples as representative of the period as possible. These are general compilation principles, and can be applied to all sub-corpora.

Some of the external considerations which demarcate the genres included in CEPhiT, that is to say, the indexical markers of genre, are time-span, discourse community, level of scientific literacy within the epistemic community, and an author’s statement of purpose. As for the first marker, Boris Tomashevsky argues that a logical classification of genres “is always historical, that is to say, it is correct only for a specific moment of history” (cited in Bordwell, 1989: 147). Since CEPhiT contains samples from works produced between 1700 and 1900, the ascription of samples to genre categories will have to consider the communicative formats of this period on the grounds of both primary sources (the authors themselves) and secondary sources (other scholars’ research on the history of science, historical discourse analysis, etc.). In the same vein, Hyland (1998: 18) states that writing practices are historical artefacts which meet the discursive requirements of a particular period. Such writing practices vary depending on different types of readership and the way in which meaning is negotiated between the concerned parties. Therefore, the first marker here is linked to the second: discourse community.

In a discourse community¹ a group of people share some communicative practices, both at the written and oral levels. Belonging to a discourse community normally entails common knowledge of a specific field or area. The members of this social grouping have the same communicative goals (Swales, 1990), and the shared rules of the group serve as prescriptive sets of identifiers both for those inside and outside the circle. As members of the discourse community, author and addressee (Moessner, 2001) can develop a more or less interactive relationship manifested through the use of different genres. Both participants can be understood to be present during the writing of a text, depending on the function that the text is intended to fulfil. Gotti (2011: 124) observes that:

Indeed, the propagation of discourse conveying new information about specialised facts or events to a social group sharing intellectual and professional interests implied the adoption of various textual forms, each with its own specific pragmatic aim so as to carry out different communicative functions and meet the expectations of a large number of non-homogenous addressees.

¹ The concept of discourse community originally arose from the notions of speech community and interpretive community, in the literature dealing with communities of language use. Over the last decade, however, this concept has been overtaken by the sociolinguistic community of practice, a notion first developed by Jean Lave and Etienne Wenger (Lave and Wenger, 1991; Wenger, 2000), in which members share interests and goals and are mutually engaged, guaranteeing effective communication within the community. As Penelope Eckert (2006: 1) has put it, “A community of practice is a collection of people who engage on an ongoing basis in some common endeavor”.
In this sense, an author’s socio-academic background, his or her specialist discipline and the statement of purpose when first designing a scientific work – namely, informative, developmental (for research purposes), didactic, compilatory (reporting function) or introductory – all characterise the generic format the work is to follow. In close connection, decisions as to the design of the work are also made on the basis of a consideration of the addressee: average literate reader, learner (taking into account different levels in the learning process), colleague, women. Graph 1 below represents these ideas:
[Graph 1: the source (author) addresses a target readership or audience, and each type of addressee is associated with a set of genres. Colleague: Treatise, Letter, Article, Essay. Average reader: Other. Learner, women: Textbook, Lecture, Dialogue.]

Graph 1. Genres in late Modern Scientific English
In the case of CEPhiT we find several examples of authors leaving clues as to their target readership in the prefaces to their works. Belsham (1801), for example, refers to his pupils, and Hampden (1835) presents his lectures by indicating who the hearers are. Seth (1885) also mentions his audience.

Graph 1 also illustrates the role played by ideology and power (Bhatia, 1993) in the construction and use of genres. The ideology of British Empiricism is present in the way the information is transmitted, following Bacon and Boyle’s premises of plainness, conciseness, brevity and clarity of exposition, with references to authorities and an account of the experimental method based on a three-step procedure of action, observation and mathematical representation, as a means of creating confidence and reliability. As Belsham (1801: v) explains in the preface to his Elements of the Philosophy of Mind, and of Moral Philosophy:

The formality of syllogistic reasoning is indeed justly laid aside in modern composition: but the ability to define correctly, to think justly, to analyse a complex process of argumentation, to detect plausible sophistry, and to arrange ideas and reasonings in a clear and luminous method, will always be of use.
Philips (1824: vi) includes “some fundamental doctrines in his Dialogues” since he “has attempted to lay a solid foundation of natural philosophy”. Ideological concerns are also conveyed through the forms and linguistic constructions used to manifest these tenets. Power is felt from the outset of the process: the author is in a higher position with regard to the addressee, and this power is a function of knowledge. In addition, Swales (1990) and Bhatia (1993) have claimed that considerations of ideology and power can be deduced from genres given the connection between language, culture and social goals associated with them.

In any particular discipline, as with philosophy, participants in the discourse form part of an epistemic community. There are different levels of scientific literacy within this community: authors, learners and laypeople. On a hierarchical scale these levels would be represented in the following way:

Author
    other authors
    learners (beginner, intermediate, upper)
    laypeople
Authors are placed at the same level in the exchange of knowledge, and occupy the higher ranks of the knowledge pyramid. This means that the language used and the concepts, ideas or phenomena explained in their writings for one another can be of maximum difficulty. Conceptual and language complexity, then, might well be seen in genres such as treatises, essays and articles, which are intended for other professionals or specialists. They are more expository, and allow for the presentation of authorial views, generating discussion. Moreover, the habit of including references to other scholars in these texts reveals a kind of interaction which strengthens the ties among the members of the scientific network (Gunnarsson, 2009: 66). This itself generates discipline-oriented specialisation within the general scientific milieu, thus contributing to the creation of a group identity.

Learners of various levels of knowledge require the organisation of content in a didactic and illustrative way. Yet it is true that in the eighteenth and nineteenth centuries, especially in the former, learners with access to school or university were in a select minority. Textbooks and dialogues served as a means of teaching philosophy to addressees here. Lectures taught at universities were also useful vehicles for the dissemination of philosophical knowledge. The different mode of transmission (written for textbooks; oral for dialogues and lectures) may have developed typical characteristics in each case (Carter-Thomas, 2004), the written form exhibiting a more overt authorial presence.

Finally, the third level in the knowledge pyramid involves laypeople. Yet there are no specific genres for a wide transmission of philosophy, which was not the concern of illiterate or semi-literate people at a time when knowledge was not directly associated with “covering basic needs”; thinking, observing, reacting, claiming or stating were not key elements in the everyday struggle to survive (see initial quotation).

Closely connected to the target audience/readership is what I have termed author’s statement of purpose.
On occasions, what the authors intend with their writings is clearly stated in the preface, dedication or introduction to the work. The goal they expect to achieve itself helps to determine the genre selected for the transmission of philosophical issues. The following is from Turnbull’s preface:

AND accordingly what I now publish, is an attempt (in consequence of such observations as I have been able to make, or have been led to by others) to vindicate human nature, and the ways of GOD to man, by reducing the more remarkable appearances in the human system to excellent general laws: i.e. To powers and laws of powers, admirably adapted to produce a very noble species of being in the rising scale of life and perfection. (Turnbull, 1740: preface iv)
The social and contextual nature of the four parameters that intervene in the classification of scientific works (period, discourse community, scientific literacy and author’s goal) leads us to the claim that these texts belong to particular genres. In addition, since all these parameters are gradable, it is possible to identify a variety of genres within the realm of scientific discourse.
3. Genres in the Corpus of English Philosophy Texts (CEPhiT)

As Trosborg (1997: 6) has stated:

Texts used in a particular situation for a particular purpose may be classified using every-day labels such as a guidebook, a nursery rhyme, a poem, a business letter, a newspaper article, a radio play, an advertisement, etc. Such categories are referred to as genres.
The compilers have also identified a variety of communicative situations which were channelled through different genres when working with these philosophy samples. However, the study of the context and the participants involved in the production of philosophy texts preceded genre ascription.

As mentioned in Chapter 1, the English term philosophy, “advanced knowledge or learning, to which the study of the seven liberal arts was regarded as preliminary in medieval universities” (OED), dates back to c. 1325. The subject matter of the discipline we nowadays call philosophy was originally divided into dialectic and physics, the former entailing the study of logic and the latter concerned with the study of mankind and the natural world from a general point of view. During the eighteenth century this second branch would be known as natural philosophy, and would base its arguments on quantitative reasoning. Studies in natural philosophy were paralleled by studies in natural history which, on the contrary, made use of qualitative and descriptive explanations. In the nineteenth century these terms were replaced by the term “science”, which would itself be divided into physical (biology, chemistry, and physics) and social (psychology, anthropology, sociology) sciences. In fact, it was Whewell (1794–1866), a founding member of the British Association for the Advancement of Science, who coined the term “scientist”, replacing expressions such as “natural philosopher” and “man of science”. The very term science gained its modern meaning when it came to be associated with the new scientific method of acquiring knowledge through experience.

Most of the texts we have included in CEPhiT belong to what we today call philosophy (philosophy of language, philosophy of mind, metaphysics, ethics, morals, natural or divine theology), understood as rational thought as opposed to revealed knowledge or any kind of knowledge subject to religious beliefs.
On some occasions, samples belonging to other fields (botany, mechanics or geology) closer to the eighteenth-century perspective of Natural philosophy were included. There were two main ideological trends in the field of natural philosophy during the Age of Reason: British Empiricism and Continental Rationalism. Although traces of the scholastic tradition can still be observed in some texts, the samples contained in CEPhiT try to represent the Empiricist vein of work, based on the observation of phenomena as well as on the use of deductive methods:

It is confessed, that the utmost effort of human reason is to reduce the principles, productive of natural phenomena, to a greater simplicity, and to resolve the many particular effects into a few general causes, by means of reasonings from analogy, experience, and observation. (David Hume, An Enquiry Concerning Human Understanding (section IV, part I))
Philosophy, as understood in the eighteenth and nineteenth centuries, then, is the core of this corpus. The philosophical trends of the period themselves determine how authors understand nature and reality and, consequently, will set the landmarks for the use of a particular language when writing about anything ‘scientific’. The ‘Empiricist Philosophy’ of eighteenth-century England criticised any verbal, speculative or dialectic thoughts typical of previous periods. Conversely, it supported experimentation and experience as the fundamental, unique pillars of truth, which was to be transmitted through a specific kind of language based on univocal representations of nature and reality. It is precisely this ideological construct that generates a new and different scientific register from the time of Empiricism onwards. The advance of knowledge through science was one of the ideological pillars of the Enlightenment. All in all, the different subject-matter that could be gathered under the label “philosophy” might have had an influence on the writing styles of authors. As mentioned in Section 1 above, I contend that authors’ personal views of the subject-matter, their goals, ideological backgrounds and relations to the reader will determine the choice of a particular genre. In his preface to Elements of the Philosophy of Mind, and of Moral Philosophy, Belsham (1801) exemplifies these principles: – Empiricist methodology: The formality of syllogistic reasoning is indeed justly laid aside in modern composition: but the ability to define correctly, to think justly, to analyse a complex process of argumentation, to detect plausible sophistry, and to arrange ideas and reasonings in a clear and luminous method, will always be of use. (Belsham, 1801: v)
– Author’s goal: The author’s sole end was the investigation and diffusion of useful truth, and his desire was, not to influence his pupils to adopt his own opinions, but to excite in them a spirit of inquiry, and to assist and encourage them to think, and to judge for themselves. (Belsham, 1801: i)
– Addressee: The following sheets contain the substance of a course of lectures, which the author delivered to his pupils, upon some of the most interesting subjects which can occupy the attention of the human mind. (Belsham, 1801: i)
Following the criteria mentioned in the preceding sections, the compilers of CEPhiT have traced six different genres (see Chapter 1), the distribution of which is as follows: 22 Treatises, 10 Essays, 5 Lectures, 1 Textbook, 1 Dialogue and 1 Article. Treatise, Essay and Lecture, then, are the most common genres, whereas Textbook, Dialogue and Article are all limited to just one sample each. Since one of the aims of the CC is to represent as closely as possible the reality of text production, these figures might be explained in terms of the nature of the discipline here. Authors writing about philosophy during the modern period seem by and large to prefer the treatise, with essays coming next, which points to a very clear preference for more formal genres.

The writing of treatises is attested as early as the end of the fourteenth century, and the term is first recorded with the meaning “a book or writing which treats of some particular subject” (OED). In modern times, however, the meaning also includes the idea of a book “containing a formal or methodological discussion or exposition of the principles of the subject”. Olmsted (1841) offers a definition of treatise as a piece of writing “in which the deepest research is united with that clearness of exposition, which constitutes the chief ornament of a work intended for elementary instruction” (emphasis added). Here he sets out clearly the object, process and purpose of the treatise, which will help us in defining the prevailing characteristics of the genre.

Essay is defined by Balfour (1879: v) as “a piece of destructive criticism, formed by a series of arguments of a highly abstract character”. The purpose in this case is quite different (destructive criticism) and so too is the procedure (abstract argumentation).
The term “essay” was used by Bacon (1605) in his work Of the proficience and aduancement of learning, diuine and humane as “[A] trial, testing, proof; experiment”, but it was later used in the seventeenth-century meaning:

A composition of moderate length on any particular subject, or branch of a subject; originally implying want of finish, ‘an irregular undigested piece’ (Johnson), but now said of a composition more or less elaborate in style, though limited in range. The use in this sense is apparently taken from Montaigne, whose Essais were first published in 1580. (OED)
In addition, Hickey (2010: 108) claims that “[I]n form, the eighteenth-century essay occupies a stylistic space between the letter and the dialogue” and that “[T]here is a set of practices associated with essay writing” (Hickey, 2010: 109) which makes essay writing different from other genres, as has already been described.

Lectures are characterised by being presented through the oral medium. The general definition given by the OED explains that a lecture is “[A] discourse given before an audience upon a given subject, usually for the purpose of instruction”. From time to time these lectures were adapted to be published in written form, and five such examples are found in CEPhiT.

In 1832, an Association was formed by the industrious classes of Edinburgh, for obtaining instruction in useful and entertaining knowledge, by means of lectures, to be delivered in the evenings after business-hours. These lectures were designed to be popular with regard to style and illustration, but systematic in arrangement and extent. …The audience amounted to between five and six hundred persons of both sexes. In twenty lectures, addressed to such an audience, only a small portion of a very extensive field of science could be touched upon. It was necessary also to avoid, as much as possible, abstract and speculative questions, and to dwell chiefly on topics simple, interesting, and practically useful… And also for the occasional omission of that rigid application of the principles on which the work is founded, to the case of every duty, which would have been necessary in a purely scientific treatise. As, however, my hearers were not, in general, regular students, but persons engaged in practical business, who could not be supposed to have always at command a distinct recollection of their previous knowledge, it became necessary for me to restate these principles at considerable length. (Combe, 1846: iii)
The genre Dialogue is defined by the OED as “[A] literary work in the form of a conversation between two or more persons” and as a “literary composition of this nature; the conversation written for and spoken by actors on the stage; hence, in recent use, style of dramatic conversation or writing”. Dialogues, then, are oral discussions on scientific topics, or may just be the form adopted by the author to present a simple and familiar version of a particular topic.

In the seventeenth century, letters written along the lines of essays were quite common, used as a quick means of communicating interesting findings among scholars (Harmon & Gross, 2000). Letters with an essay format were the origin of the journal article, the publication of which included editor reviewing. Moreover, they were considered a broadly informal form of communication. With the evolution of science, informal or personal letters gave way to a more formal vehicle of communication: the journal article. Early articles typically described an individual observation of a natural phenomenon. They were normally short and were presented according to the Baconian framework of plainness and simplicity (Harmon & Gross, 2010). Banks (2010: 1) also notes that letters were a relatively quick and cheap means of communication:
Although these letters were sent from individual to individual, they were not really personal letters in the present day sense. It was understood that these letters could, indeed in many cases should, be copied, sent on, read at meetings of intellectual societies, and so on. Networks of correspondence were built up on this basis. The fact of something having been written in a letter could even be used in a priority dispute. It was in this context that the first two vernacular journals of an academic nature were created, both in 1665.
Similarly, Allen, Qin & Lancaster (1994: 293–294) have pointed out that “… books were the paramount medium of scholarly communication in the seventeenth and eighteenth centuries. In the nineteenth century, scientific journals achieved equal importance with books, then gradually assumed a dominant role.”

3.1 Genres per century
The picture offered by the range of genres used by philosophy authors included in CEPhiT varies considerably if we examine each century in detail, as was pointed out in Moskowich’s chapter (this book). In the eighteenth century just three genres were employed by authors writing on philosophical concerns and, as we can see in Tables 2 and 3 below, the treatise is roughly twice as frequent as the essay and far exceeds the textbook, both in number of samples and in number of words:

Table 2. Genres in the eighteenth century per samples

Genre       Number of texts
Essay       6
Treatise    13
Textbook    1

Table 3. Genres in the eighteenth century per number of words

Genre       Number of words
Essay       60,213
Treatise    129,745
Textbook    10,064
In the nineteenth century, there is a greater variety of formats in comparison with the previous century, as illustrated by the incorporation of lectures, dialogues and articles (Görlach, 2004). However, the predominant format continues to be treatise, followed by lecture and essay. Tables 4 and 5 below show the distribution of genres by samples and number of words in the nineteenth century:
Table 4. Genres in the nineteenth century per samples

Genre       Number of texts
Treatise    9
Essay       4
Lecture     5
Dialogue    1
Article     1

Table 5. Genres in the nineteenth century per number of words

Genre       Number of words
Essay       40,251
Lecture     50,307
Treatise    90,393
Dialogue    10,084
Article     10,072
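The figures in Tables 2 to 5 can be cross-checked against the overall distribution reported earlier in this section (22 treatises, 10 essays, 5 lectures, 1 textbook, 1 dialogue and 1 article). The short sketch below does this; the variable and function names are illustrative only, and the per-sample averages it computes are a derived measure that the chapter itself does not report.

```python
from collections import Counter

# Number of samples per genre (Tables 2 and 4).
samples_18th = Counter({"Essay": 6, "Treatise": 13, "Textbook": 1})
samples_19th = Counter({"Treatise": 9, "Essay": 4, "Lecture": 5,
                        "Dialogue": 1, "Article": 1})

# Number of words per genre (Tables 3 and 5).
words_18th = {"Essay": 60213, "Treatise": 129745, "Textbook": 10064}
words_19th = {"Essay": 40251, "Lecture": 50307, "Treatise": 90393,
              "Dialogue": 10084, "Article": 10072}

# Overall distribution as stated in the running text of Section 3.
reported_overall = {"Treatise": 22, "Essay": 10, "Lecture": 5,
                    "Textbook": 1, "Dialogue": 1, "Article": 1}

# Adding the two centuries should reproduce the reported distribution.
overall = samples_18th + samples_19th
tables_agree_with_text = dict(overall) == reported_overall


def mean_words_per_sample(words, samples):
    """Average sample length per genre (derived, not reported in the chapter)."""
    return {genre: words[genre] / samples[genre] for genre in samples}


means_18th = mean_words_per_sample(words_18th, samples_18th)
means_19th = mean_words_per_sample(words_19th, samples_19th)
```

Each century contributes 20 samples, and the per-century counts add up to the overall distribution given in the text, so the tables and the prose are mutually consistent.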
As the overall size of the reading public increased, so too did scientific literacy. It is not only the number but also the diversity of potential readers that explains the variety of formats in the nineteenth century. Increasing scientific literacy demanded an array of different communication channels which could be adapted to different contexts. The expansion of the discipline ran parallel to the expansion of the formats in which it was transmitted. Once again, then, we see the social nature of genres. This confirms the observations of Gross et al. (2002: 137–138), who note that nineteenth-century prose is addressed to amateurs as well as professionals. They also comment on the persistent difference between nineteenth-century writing, and “the highly compressed, neutral, monotonal prose” of late-twentieth-century science. However, the so-called impersonal tone of some texts shows a tendency towards a more “homogeneous communicative style”. This can be seen clearly in the use of “title and author credits, headings, equations segregated from text, visuals provided with legends, and citations standardised as to format and position,” as well as in standardised introductions and conclusions, that is, in the formal presentation of anything scientific.
4. Final remarks

By way of conclusion I would like to restate the idea of genre as a socially-determined concept, which perhaps conflicts with the more linguistic notion of text type, but which is also adapted to the conventions of rhetoric. It is in this sense that the classification of samples in the CC and, therefore, in CEPhiT, has been based on extra-linguistic parameters such as the author’s own description of the nature of the text, the author and his context, the type of readership, and the discipline. This relates to the very essence of what a genre is, a socio-cognitive slot to be filled with the demands of the participants (writer-readership) through the communicative process. As society evolves and literacy increases, the formats to convey science also change so as to reflect the requirements of each particular moment in time.
References

Allen, Bryce, Jian Qin & Frederick Wilfrid Lancaster (1994). Persuasive Communities: A Longitudinal Analysis of References in the Philosophical Transactions of the Royal Society, 1665–1990. Social Studies of Science, 24/2, 279–310. doi: 10.1177/030631279402400204
Atkins, Sue, Jeremy Clear & Nicholas Ostler (1992). Corpus Design Criteria. Literary and Linguistic Computing, 7/1, 1–16. doi: 10.1093/llc/7.1.1
Banks, David (2010). The beginnings of vernacular scientific discourse: genres and linguistic features in some early issues of the Journal des Sçavans and the Philosophical Transactions. E-rea, Revue électronique d’études sur le monde Anglophone.
Bawarshi, Anis S. & Mary Jo Reiff (2010). Genre: An Introduction to History, Theory, Research, and Pedagogy. The WAC Clearing House, retrieved February 21, 2013, from http://wac.colostate.edu/books/bawarshi_reiff/
Bhatia, Vijay K. (1993). Analysing Genre: Language Use in Professional Settings. London: Longman.
Biber, Douglas & Edward Finegan (1989). Drift and Evolution of English Style: A History of Three Genres. Language, 65, 487–517. doi: 10.2307/415220
Biber, Douglas (1988). Variation across Speech and Writing. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511621024
Biber, Douglas (1993). Representativeness in Corpus Design. Literary and Linguistic Computing, 8/4, 243–257. doi: 10.1093/llc/8.4.243
Bordwell, David (1989). Making Meaning: Inference and Rhetoric in the Interpretation of Cinema. Cambridge: Harvard University Press.
Borg, Erik (2003). Key Concepts in ELT Discourse Community. ELT Journal, 57/4, 398–400. doi: 10.1093/elt/57.4.398
Carter-Thomas, Shirley (2004). Specialised Syntax for Specialised Texts? A Comparison of the Preferred Syntactic Patterns in Proceedings Article and Conference Presentation Introductions. Barcelona: Actes du Colloque GLAT 2004, 11–21, retrieved February 12, 2013 from http://hal.archives-ouvertes.fr/docs/00/40/82/49/PDF/GLATart2004d.pdf.
Duszak, Anna (1997). Cross-cultural academic communication: a discourse-community view. In Duszak, Anna (Ed.), Culture and Styles of Academic Discourse (11–39). Berlin/New York: Mouton de Gruyter. doi: 10.1515/9783110821048
Duszak, Anna (Ed.) (1997). Culture and Styles of Academic Discourse. Berlin/New York: Mouton de Gruyter. doi: 10.1515/9783110821048
Eckert, Penelope (2006). Communities of practice. In K. Brown (Ed.), Encyclopedia of Language and Linguistics (683–685). Oxford: Elsevier. doi: 10.1016/B0-08-044854-2/01276-1
Görlach, Manfred (2004). Text Types and the history of English. Berlin/New York: Mouton de Gruyter. doi: 10.1515/9783110197167
Gotti, Maurizio (2011). The development of specialized discourse in the Philosophical Transactions. In I. Taavitsainen & P. Pahta (Eds.), Medical Writing in Early Modern English (204–220). Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511921193.012
Gross, Alan G., Joseph E. Harmon & Michael Reidy (2002). Communicating Science: The Scientific Article from the 17th Century to the Present. Oxford: Oxford University Press.
Gunnarsson, Britt-Louise (2009). Professional discourse. London/New York: Continuum.
Harmon, Joseph & Alan Gross (2000, August 5). The Scientific Article: From Galileo’s New Science to the Human Genome. Retrieved from http://fathom.lib.uchicago.edu/2/21701730/.
Harmon, Joseph & Alan Gross (2010). The Craft of Scientific Communication. Chicago: The University of Chicago Press. doi: 10.7208/chicago/9780226316635.001.0001
Hickey, Raymond (2010). Eighteenth-Century English: Ideology and Change. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511781643
Hyland, Ken (1998). Hedging in Scientific Research Articles. Amsterdam: John Benjamins. doi: 10.1075/pbns.54
Lee, David (2001). Genres, Registers, Text Types, Domains, and Styles: Clarifying the Concepts and Navigating a Path through the BNC Jungle. Language Learning & Technology, 5/3.3, 37–72.
Littlejohn, Stephen W. & Karen A. Foss (2008). Theories of Human Communication. Belmont, CA: Thomson and Wadsworth.
Martin, James R. (2000). Analysing genre: functional parameters. In F. Christie & J. R. Martin (Eds.), Genre and Institutions. Social Processes in the Workplace and School (3–39). London: Continuum.
Moessner, Lilo (2001). Genre, text type, style, register: A terminological maze?. European Journal of English Studies, 5/2, 131–138. doi: 10.1076/ejes.5.2.131.7312
Moskowich, Isabel (2011). “The Golden Rule of Divine Philosophy” exemplified in the Coruña Corpus of English Scientific Writing. Lenguas para fines específicos, 17, 167–198.
Oxford English Dictionary online. 1989. Oxford: Oxford University Press. Third edition, retrieved December 15, 2012, from http://dictionary.oed.com/
Olmsted, Denison (1841). Letters on Astronomy, addressed to a lady in which the elements of the science are familiarly explained in connexion with its literary history. With numerous engravings. Boston: Marsh, Capen, Lyon and Webb.
Rissanen, Matti (1996). Genres, Texts and Corpora in the Study of Medieval English. In Jürgen Klein & Dirk Vanderbeke (Eds.), Anglistentag 1995 Greifswald. Proceedings (229–242). Tübingen: Max Niemeyer Verlag.
Swales, John (1990). Genre analysis: English in Academic and Research Settings. Cambridge: Cambridge University Press.
Taavitsainen, Irma (2001). Changing Conventions of Writing: The Dynamics of Genres, Text Types, and Text Traditions. European Journal of English Studies, 5/2, 139–150. doi: 10.1076/ejes.5.2.139.7309
Taavitsainen, Irma (2004). Genres of secular instruction: a linguistic history of useful entertainment. Miscelánea: A Journal of English and American Studies, 29, 76–94.
Trosborg, Anna (1997). Register, Genre and text type. In A. Trosborg (Ed.), Text Typology and Translation (3–23). Amsterdam: John Benjamins. doi: 10.1075/btl.26.03tro
Trosborg, Anna (Ed.) (1997). Text Typology and Translation. Amsterdam: John Benjamins. doi: 10.1075/btl.26
Appendix

Table 1. Eighteenth-century authors and texts

| Year | Author | Title of work sampled | Genre |
|---|---|---|---|
| 1700 | Astell, Mary | Some reflections upon marriage. | Essay |
| 1705 | Cheyne, George | Philosophical principles of natural religion: containing the elements of natural philosophy, and the proofs for natural religion, arising from them. | Treatise |
| 1710 | Dunton, John | Athenianism: or, the new projects of Mr. John Dunton. | Treatise |
| 1717 | Collins, Anthony | A Philosophical Inquiry Concerning Human Liberty. | Treatise |
| 1727 | Greene, Robert | The principles of the philosophy of the expansive and contractive forces. Or an inquiry into the principles of the modern philosophy, that is, into the several chief rational sciences, which are extant. In seven books. By Robert Greene. | Treatise |
| 1730 | Kirkpatrick, Robert | The golden rule of divine philosophy: with the discovery of many mistakes in the religions extant. | Treatise |
| 1733 | Balguy, John | The law of truth: or, the obligations of reason essential to all religion. To which are prefixed, some remarks supplemental to a late tract; entitled, Divine rectitude. | Essay |
| 1736 | Butler, Joseph | The analogy of religion, natural and revealed, to the constitution and course of nature. To which are added two brief dissertations: I. Of personal identity. II. Of the nature of virtue. | Treatise |
| 1740 | Turnbull, George | The principles of moral philosophy. An enquiry into the wise and good government of the moral world: in which the continuance of good administration, and of due care about virtue, for ever, is inferred from present order in all things, in that part… | Treatise |
| 1748 | Hume, David | Philosophical essays concerning human understanding. By the author of the essays moral and political. | Essay |
| 1754 | Bolingbroke, Henry | The Philosophical Works of the late Right Honorable Henry St. John, Lord Viscount Bolingbroke. Published by David Mallet, Esq; Volume I. | Essay |
| 1755 | Hutcheson, Francis | A system of moral philosophy, in three books. | Treatise |
| 1764 | Reid, Thomas | An inquiry into the human mind, on the principles of common sense. | Treatise |
| 1769 | Ferguson, Adam | Institutes of moral philosophy. For the use of students in the college of Edinburgh. By Adam Ferguson, LL.D. | Textbook |
| 1770 | Burke, Edmund | Thoughts on the cause of the present discontents. Dublin. [Dublin] | Treatise |
| 1776 | Campbell, George | The philosophy of rhetoric. | Essay |
| 1783 | Macaulay, Catharine | Treatise of the immutability of moral truth. | Treatise |
| 1790 | Smellie, William | The philosophy of natural history. | Treatise |
| 1792 | Wollstonecraft, Mary | Vindication of the Rights of Woman. | Treatise |
| 1793 | Crombie, Alexander | An essay on philosophical necessity. | Essay |
Table 2. Nineteenth-century authors and texts

| Year | Author | Title of work sampled | Genre |
|---|---|---|---|
| 1801 | Belsham, Thomas | Elements of the philosophy of the mind and of moral philosophy: to which is prefixed a compendium of logic. | Lecture |
| 1810 | Stewart, Dugald | Philosophical Essays. | Essay |
| 1811 | Kirwan, Richard | Metaphysical Essays; containing the principles and fundamental objects of that science. | Essay |
| 1820 | Brown, Thomas | Lectures on the philosophy of the human mind. | Lecture |
| 1824 | Phillips, Sir Richard | Two dialogues between an Oxford tutor and a disciple of the common-sense philosophy: relative to the proximate causes of material phenomena. | Dialogue |
| 1830 | Mackintosh, Sir James | Dissertation on the progress of ethical philosophy, chiefly during the seventeenth and eighteenth centuries. | Treatise |
| 1835 | Hampden, Renn Dickson | A course of lectures introductory to the study of moral philosophy: delivered in the University of Oxford, in Lent Term, 1835. | Lecture |
| 1838 | Powell, Rev. Baden | The connexion of natural and divine truth: or, the study of the inductive philosophy, considered as subservient to theology. The Saturday Magazine. | Treatise |
| 1845 | Mill, John Stuart | An examination of Sir William Hamilton’s philosophy and of the principal philosophical questions discussed in his writings. | Treatise |
| 1846 | Combe, George | Moral philosophy, or the duties of man considered in his individual, domestic and social capacities. | Lecture |
| 1855 | Lyall, William | Intellect, the Emotions, and the Moral Nature. | Treatise |
| 1860 | Slack, Henry James | The philosophy of progress of human affairs. | Article |
| 1862 | Simon, T. Collyns | On the Nature and Elements of the External World: Or, Universal Immaterialism, Fully Explained and Newly Demonstrated. | Treatise |
| 1866 | Mansel, Henry Longueville | The philosophy of the conditioned: comprising some remarks on Sir William Hamilton’s philosophy, and on Mr. J. S. Mill’s examination of that philosophy. | Treatise |
| 1874 | Woodward, Thomas Best | A treatise on the nature of man, regarded as triune; with an outline of the philosophy of life. | Treatise |
| 1874 | Balfour, Arthur James | A defence of philosophic doubt. | Essay |
| 1885 | Seth Pringle-Pattison, Andrew | Scottish philosophy: a comparison of the Scottish and German answers to Hume. | Lecture |
| 1890 | Mackenzie, John Stuart | An Introduction to Social Philosophy. | Essay |
| 1893 | Bonar, James | Philosophy and political economy in some of their historical relations. | Treatise |
| 1898 | Hodgson, Shadworth Hollway | The metaphysic of experience. | Treatise |
Chapter 3
Editorial policy in the Corpus of English Philosophy Texts
Criteria, conventions, encoding and other marks
Gonzalo Camiña and Inés Lareo
University College Cork / University of Vigo
1. General remarks

One of the difficulties faced in compiling and transcribing the samples in the Corpus of English Philosophy Texts has been differences in spellings. With the most recent samples it was possible to scan the original source texts and to use an Optical Character Recognition (OCR) process to obtain digital text. With the older sources, indeed with all the texts from the eighteenth century, manual retyping of the original work was required: the poor quality of the source material, including poorly defined characters, ink blots and worn-out paper, together with the use of non-standard characters such as the long s (ſ), its italicised form and the ligatured digraph ct, rendered OCR unfeasible, as can be seen in Figure 1.

The corpus, then, has been built using both typing and OCR, in both cases followed by revision and encoding. Final versions of texts have in fact been revised three times, although it is possible even now that some small inconsistencies might remain. The mark-up language employed to tag the texts is the eXtensible Markup Language (XML), because of its flexibility and its potential to be further developed, modified and exported, as explained in Section 2 below.

In the corpus we have tried to keep the computerised output text as faithful to the original as possible, notwithstanding the difficulties here. The final output of every text constitutes an electronic third layer or step, following the first step of the author’s original, and the second step of the printer. Our aim has been to find and maintain a balance between two options: (a) showing the text in the way it originally was, and (b) offering researchers the possibility of manipulating the data in the texts in an open, flexible and productive way. To this end some editorial decisions have been necessary, and these will now be described.
doi 10.1075/z.198.03cam © 2016 John Benjamins Publishing Company
Figure 1. Original text
1.1 Headers
Prior to the analysable body of every text, as seen in the Info Display window (see the CCT Manual) in the Coruña Corpus Tool (CCT), we have included headers (see Figure 2) that contain information about the file. The first section of the box, in light grey, includes the name of the file in the format [discipline, year, author, pages] with an XML extension. It is followed by the full name of the research group involved with the corpus, the sponsors and the director. In the middle section, in a darker grey, the user can find the name of the subcorpus (CEPhiT) and the amount of analysable material in the file. The third section, again in a lighter grey, contains the official name of the research project carried out at the Universidade da Coruña under the direction of Isabel Moskowich-Spiegel. The header box concludes with the bibliographic references of the sampled extract. A reduced version of the full title of the text is given immediately beneath the header box. This is because the full titles sometimes cover a full page. The full titles of all the texts can be seen in the Metadata section of the CCT. The name of the author and the year of publication can also be seen at the beginning of every sample.
Figure 2. Header
1.2 Fonts
Despite the fact that the original texts share neither the same font family nor the same font size, we have opted for one consistent, uniform Arial Unicode MS font throughout the whole corpus, which can be exported and displayed on most computers. Internet browsers such as Microsoft Internet Explorer, Mozilla Firefox, Apple Safari and Opera, among others, are thus fully supported. Any problems in the display are due to the lack of compatibility of a browser with the Unicode standard (see Section 2), since the corpus fully complies with the guidelines of this standard. Also, we have offered a generous 13-point font size, which can be read easily, even on ‘mini laptops’ or tablets (Figure 3):
Figure 3. Fonts displayed in Firefox
1.3 Page numbers
The original numbering of the text has been retained, even when the sample does not start on page one. Nevertheless, to be consistent, all page numbers for all texts will appear centred on the screen in a bold font type between blank lines. Figure 4 illustrates how page numbers are shown in the final layout document:
Figure 4. Page numbers
1.4 Titles of chapters and sections
The Info Display window of the CCT shows the titles of chapters and sections centred on the screen in a larger bold blue font, to make the visual revision of texts easier, as shown in Figure 5:
Figure 5. Titles
1.5 Paragraphs and lines
The length of the original paragraphs has been respected; however, lines will adapt their length to the width of the Info Display window. This is so that the CCT can get rid of unnecessary and misleading spaces at the end of lines. The previous example has been shrunk to illustrate this in Figure 6; such shrinking of the text allows the user to open more than one window on the screen:
Figure 6. Line display
1.6 Analysable items
As described above, the number of items in the header refers to all those elements in a text that can be analysed by the CCT. Items that cannot be analysed from a syntactic point of view, such as isolated numbers and symbols, and items that did not originate with the author are not taken into account by the CCT; since they nevertheless belong to the text, they can still be viewed in the Info Display window. Non-analysable elements are distinguished from analysable items in that the former appear in red. Figures 7 and 8 illustrate how symbols and text fragments quoted from other authors are shown in the XML layout:
Figure 7. Quotations
Figure 8. Non-analysable elements
Obviously, the CCT will show our editorial marks, such as [quotation], [note], [fragment], and so on. These will be explained in Section 3.

1.7 Omissions and amendments
Despite our commitment to respect the original versions of the samples as far as possible, we have nonetheless decided to omit editorial material such as page headers, footers and some marginal notes, since they are not part of the body of the text written by the author and thus do not represent her/his linguistic habits. Also, a good number of the older texts presented an extra blank space before some punctuation signs such as colons and semi-colons. For the sake of consistency we have removed those spaces: they do not have any effect on the printed text, but they do have an impact on a text in computerised format. More generally, punctuation signs do not appear in the Search Window of the CCT. Finally, a small number of spelling errors have been corrected, in that these are likely to have been made by the printer rather than the author. We have considered the different spellings over time and checked all items in the Oxford English Dictionary. Those items impossible to identify, or missing elements, have been marked as [unclear].

2. Mark-up language

The language used to encode CEPhiT is XML, a subset of the Standard Generalized Markup Language (SGML), which was used, for example, to create the second edition of The Oxford English Dictionary. We have used XML simply because it is far easier to implement than SGML. Indeed, for this reason it is widely used nowadays for deriving document specifications and building general-purpose applications, such as those for electronic commerce, RSS, XHTML and so on. The great advantages of XML are its ease of use, flexibility and cross-platform operability. This means that any text encoded with XML can be stored, modified, expanded, viewed and shown on any computer running any operating system.

In order to tag our texts, we have made use of the guidelines provided by the Text Encoding Initiative (TEI). In our corpus, we have incorporated some of the tags proposed by TEI in order to provide CEPhiT users with a structure that allows them to carry out research accurately. The Unicode Consortium, developer of the Unicode Standard, supplies the characters necessary to represent most symbols, including mathematical and astronomical ones, which allows CEPhiT to offer a representation close to the original text. All texts have been encoded using the open-source text editor Emacs (see Figure 9), which is free, powerful, extensible and customisable. We would like to acknowledge the work of the TEI Consortium, the Unicode Consortium and the developers of Emacs over the years, which has helped us greatly in our project.
Figure 9. The Emacs editor
2.1 Tags contained in CEPhiT
The header of the XML file shown in Section 1 above (Figure 2) contains a whole set of TEI tags that will not be explained here, due to their complexity. The following list provides a simplified version of the TEI tags contained in the main body of the XML file:
TAG MEANING
Page numbers.
Division of chapters and sections.
Titles of chapters and sections.
Paragraph.
Line.
Abbreviation.
Italics.
Footnote. The elements contained in this tag will not be analysed by the CCT, but will be in red and only shown in the Info Display window.
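As a rough illustration of how a TEI-style body of this kind might be processed outside the CCT, the following Python sketch reads a small fragment and separates analysable text from footnote content. The element names used here (`div`, `head`, `pb`, `p`, `hi`, `note`) are common TEI names and are an assumption, since the tag names of the list above are not reproduced in this text; the sample fragment is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment; element names and content are assumptions.
SAMPLE = """<div>
  <head>Of Liberty</head>
  <pb n="53"/>
  <p>Man is a <hi rend="italic">necessary</hi> agent.<note>An editorial aside.</note> He acts upon inference.</p>
</div>"""


def analysable_text(elem):
    """Collect the text under elem, skipping the content of <note> children."""
    parts = []
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        if child.tag != "note":
            parts.append(analysable_text(child))
        # Text that follows a child element (its tail) belongs to the parent.
        if child.tail:
            parts.append(child.tail)
    return "".join(parts)


root = ET.fromstring(SAMPLE)
# Normalise whitespace so that line breaks in the file do not matter.
paragraphs = [" ".join(analysable_text(p).split()) for p in root.iter("p")]
page_numbers = [pb.get("n") for pb in root.iter("pb")]

print(paragraphs)
print(page_numbers)
```

The footnote text is dropped from the analysable string but remains in the XML tree, which loosely mirrors the behaviour described for footnotes: visible in the display, excluded from analysis.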
Apart from these TEI tags we have included a set of editorial marks in order to make analysis straightforward. They are discussed in the following sections.

3. Editorial marks and decisions

As mentioned above, we have selected samples of around 10,000 words taken from different parts of those works included in CEPhiT. Although we have tried to be faithful to the original texts, some changes were necessary to render searches on the CCT comprehensible. This section will deal with the different text elements that appear in each part of the text: page, paragraph and word.

One important decision was that of including editorial information between square brackets, [ ], to make the work of the researcher easier. In those rare cases in which an author has in fact used square brackets, these have been replaced by parentheses. Square brackets in our corpus enclose data such as the place of quotations, figures, formulae, etc. in the original text. But at the same time they are used to disambiguate homographic forms that the CCT might otherwise consider a word. For instance, the Roman numeral I has been represented between brackets to avoid miscounting it as the personal pronoun I. Hence, in the wordlist generated by the CCT the first person pronoun would appear, for example, as i-325 (indicating that the author has used the first person pronoun 325 times) and the Roman numeral I would appear as [i]-20 (the Roman numeral I has been found 20 times). Along with Roman numerals, abbreviations and formulae are also enclosed in these brackets. For instance, the phrase the points BE will appear as the points [BE]; the abbreviation for number, No, will appear as [no] to distinguish it from the negative particle no, etc. (see Figures 10 and 11):
Figure 10. Use of square brackets
Figure 11. Roman number
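The effect of the bracket convention on a wordlist can be sketched in a few lines of Python. This is not the CCT's own implementation, which is not described here; the tokenising pattern and the sample sentence are assumptions made for illustration, and the output format simply imitates the i-325 / [i]-20 style mentioned above.

```python
import re
from collections import Counter

# Either a bracketed item such as [I] or [BE], or a plain (possibly hyphenated) word.
TOKEN = re.compile(r"\[[^\]]+\]|[A-Za-z]+(?:-[A-Za-z]+)*")


def wordlist(text):
    """Count lower-cased tokens, keeping bracketed items as separate types."""
    return Counter(tok.lower() for tok in TOKEN.findall(text))


# Invented sample sentence using the bracket convention.
sample = "I think that in Book [I] the points [BE] are, as I said, of [no] use; no more."
counts = wordlist(sample)

for item in ("i", "[i]", "no", "[no]", "[be]"):
    print(f"{item}-{counts[item]}")
```

Because the brackets are kept as part of the token, the pronoun and the Roman numeral never collapse into one type, and the same holds for no and [no].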
Although some of the editorial marks written in square brackets are displayed not only in the CCT Info Display window but also in the Search window, they are not considered for the final count of items/words of each text.

3.1 Pages
The original page number and content of the texts compiled in CEPhiT have been kept in the final files. As mentioned above, the location of tables, figures, formulae etc. has been marked, but these non-text elements have been disregarded. The inclusion of the first word of the next page at the bottom of the current page was also commonly found in eighteenth-century texts. These repeated words have also been excluded from CEPhiT final files.

3.2 Paragraphs
Original paragraphs, but not lines, are retained in CEPhiT files, and this implies the exclusion of truncated words at the end of a line. There is only one case in which the original form of paragraphs cannot be respected: footnotes. TEI restrictions prevent the division of a footnote into different paragraphs. Therefore, the information included in footnotes is written in one single paragraph in the CEPhiT file. TEI restrictions also affect the place where a footnote appears in the Info Display window of the CCT. Footnotes are placed below the word they refer to, in a separate paragraph. We have also placed all note references after the word they refer to. This makes the electronic text easier to understand. A further decision concerning notes is the exclusion of editorial notes, since they do not represent the author’s language.
The following editorial marks are used to add information to the final file or to make researchers’ work easier:

[note] / [endnote]
As already mentioned, footnotes are placed on a separate line after the reference mark used by the author (a number, a letter, a symbol, etc.). They are easy to identify in the Info Display window because they have a different font size (see Figure 12):
Figure 12. Info Display window [note]
Figure 13. Search window [note]
However, when working with the Search window, and because of the TEI restrictions for footnotes, they cannot be recognised at first sight since they are inserted in the main body of the text. To avoid this, we have included two editorial marks to identify the beginning and the end of a note (see Figure 13).

[quotation]
Since only the words reflecting the language of the authors are included in the corpus, we have deleted quotations from the original texts. The mark [quotation] is used instead of the original words to facilitate the understanding of the CCT Search window (see Figure 14):
Figure 14. CCT Search window
Figure 15. CCT Info display window. Editorial mark [quotation]
However, many deleted quotations can be seen in red in the Info Display window, to the right, where this mark precedes the original quotation (see Figure 15). When the quotation is presented in direct speech, the mark [quotation] is used twice: once before the part of the quotation that precedes reporting forms such as said he, includes she, mentioned they, etc., and once after those forms, before the part of the quotation that follows. In both cases the corresponding part of the original quotation appears in red (meaning deleted), as seen in this sample from Collins (1717: 53):

And this was ARISTOTLE’S ſenſe of ſuch aƈtions of Man. [quotation] As ſays he [quotation] in arguing we neceſſarily aſſent to the inference or concluſion drawn from premiſes, ſo if that arguing relate to praƈtiſe, we neceſſarily aƈt upon ſuch inference or concluſion.

Sometimes, however, when the quotation itself is not necessary for the comprehension of the text, its content is not included at all, not even in the Info Display window, and only the editorial mark [quotation] is visible.

The language in which the quotation appears in the original texts has also been identified. When the editorial mark [quotation] is used, the quotation is written in English. When any other language is used in the quotation, the editorial mark [quot] followed by the name of the corresponding language is used. Marks such as the following can therefore appear:

[quotlat], when the original language is Latin.
[quotgreek], when the original language is Greek.
[quotrussian], when the original language is Russian, etc.
The inclusion of these marks and the original quotation, displayed in the Info Display window, allows researchers to study intertextuality and language variation in scientific writing, because relevant searches can be made in the Search window (see CCT Manual).

[fragment]
This mark is used to identify deleted parts of the original text. For instance, the beginning or the end of the sample could have been omitted because their sentences began on the previous page or continued on the following page (see Figure 16). This does not apply to quotations that have their own identifying mark. This mark is also used when a word or fragment is written in a non-Latin alphabet in the original text. Thus, words that cannot be represented by the Latin alphabet are marked as [fragmentgreek] or [fragment, followed by the corresponding language] (see Figure 17).
Figure 16. Editorial mark [fragment]
Figure 17. Editorial mark [fragmentgreek]
[unclear]
This mark is used instead of any item that cannot be clearly read in the original text (see Figure 18).
Figure 18. Editorial mark [unclear]
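The naming scheme of the marks described in this section (a base mark, optionally followed by a language name) lends itself to mechanical recognition. The sketch below is one possible way to classify a subset of the marks in Python; the regular expression and the function name `classify` are illustrative assumptions and not part of the CCT, and marks such as [poem] or [figure] are deliberately left out.

```python
import re

# Base marks taken from the chapter; the pattern itself is an illustrative assumption.
MARK = re.compile(r"\[(quotation|quot|fragment|note|endnote|unclear)([a-z]*)\]")


def classify(mark):
    """Return (kind, language) for an editorial mark, or None if unrecognised."""
    m = MARK.fullmatch(mark)
    if m is None:
        return None
    kind, lang = m.group(1), m.group(2)
    if kind == "quotation":
        # [quotation] on its own signals a quotation in English.
        return ("quotation", "english") if not lang else None
    if kind == "quot":
        # [quot] must be followed by a language name, e.g. [quotlat].
        return ("quotation", lang) if lang else None
    if kind == "fragment":
        # [fragment] may or may not carry a language name.
        return ("fragment", lang or None)
    # [note], [endnote] and [unclear] take no language suffix.
    return (kind, None) if not lang else None


sample = "[quotation] As says he [quotation] in arguing [quotlat] and [fragmentgreek] or [unclear]."
marks = [classify(m.group(0)) for m in MARK.finditer(sample)]
print(marks)
```

A search for all quotations in a given language then reduces to filtering the resulting pairs, which is the kind of query the text above associates with the study of intertextuality.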
3.3 Words
As texts from different centuries are included in CEPhiT, one of the decisions taken by the compilers was to keep original spellings, e.g., the ligatured digraph ct and the long s (ſ), also mentioned earlier in this chapter (see Figure 19):
Figure 19. Spelling variants
The CCT has been developed taking these spelling variants into consideration. Therefore, when a searched word could have had two or more different spellings, the CCT shows all the possible spellings and distinguishes them as different types (see Figure 20, bottom row):
Figure 20. Search window. Spelling variants
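One simple way to relate spelling variants while still keeping them apart as distinct types is to group the attested forms under a normalised key. The Python sketch below assumes a deliberately tiny normalisation table (long s to s, and the character used for the ct ligature in the Collins sample to c); it is not the CCT's algorithm, and the token list is invented.

```python
from collections import defaultdict

# Assumed, minimal normalisation table: long s and the ligature stand-in character.
NORMALISE = str.maketrans({"ſ": "s", "ƈ": "c"})


def group_variants(tokens):
    """Map each normalised form to the sorted list of distinct spellings attested for it."""
    groups = defaultdict(set)
    for tok in tokens:
        groups[tok.translate(NORMALISE)].add(tok)
    # Keep only forms that really occur in more than one spelling.
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


tokens = ["ſenſe", "sense", "aƈtions", "actions", "inference", "ſenſe"]
variants = group_variants(tokens)
print(variants)
```

Each original spelling survives unchanged in the output, so counts per variant can still be reported separately, as the Search window is described as doing.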
As we have not kept the original lines of the texts for CEPhiT, truncated words have been eliminated as such, with both parts written together. Hyphens, then, are used only for hyphenated compound words and these can also be searched for (see Figure 21):
Figure 21. Hyphenated compound words
Therefore, when a hyphen has been used as a layout mark by the author or printer, an EM-dash has been used to replace it (see Figure 22):
Figure 22. Use of EM-dashes
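The decision to drop original line breaks means that a word truncated at the end of a printed line has to be rejoined, while a genuine hyphenated compound that happens to fall at a line end must keep its hyphen. The sketch below shows one way this could be done in Python, assuming the transcriber supplies a list of known compounds; the function `join_lines` and that list are hypothetical, and the compilers' actual procedure (largely manual) is not documented here.

```python
def join_lines(lines, compounds=frozenset()):
    """Join printed lines into one paragraph, rejoining words split at line ends."""
    out = ""
    for line in lines:
        line = line.strip()
        if out.endswith("-"):
            # Word that would result from keeping the line-end hyphen.
            head = out.rsplit(" ", 1)[-1]
            first = line.split(" ", 1)[0]
            candidate = (head + first).lower().strip(".,;:")
            if candidate in compounds:
                out += line            # genuine compound: keep the hyphen
            else:
                out = out[:-1] + line  # truncated word: drop the hyphen
        else:
            out = (out + " " + line) if out else line
    return out


printed = [
    "containing the ele-",
    "ments of natural philosophy, and the common-",
    "sense philosophy.",
]
paragraph = join_lines(printed, compounds={"common-sense"})
print(paragraph)
```

Without the compound list the two cases cannot be told apart from the line break alone, which is why the sketch treats every unlisted line-end hyphen as a truncation.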
4. List of editorial marks used in CEPhiT

[quotation]; [quotlat]; [quotgreek]; [quot+name of the corresponding language]
[note] / [endnote]
[fragment]; [fragmentgreek]; [fragment+name of the corresponding language]
[unclear]
[poem]
[figure]
[formula]
[table]
[missing pages]

5. Concluding remarks

The members of the research group MuStE have gone through numerous discussions before reaching an agreement on the criteria and conventions explained in this chapter. It has been a long and arduous process in which we explored a wide range of possibilities to fill the gaps left by TEI. Our first and foremost goal has always been to keep these criteria and conventions simple and as closely connected to the standards as possible, in order to offer the users of our corpus the best options to carry out their research with the least difficulty possible. Notwithstanding this, some editorial decisions had to be made in particular situations to guarantee the exploitation of the corpus at its full potential, given the nature of the texts included in the corpus, especially those dating from an earlier period. We are aware that the compromise reached by the team members in the use of editorial marks in CEPhiT may be upgraded, or even made unnecessary, by future revisions of TEI. Apart from this, we are open to suggestions from users and would be glad to obtain their feedback on our corpus and tool. All in all, and for the time being, we honestly believe that the Coruña Corpus in general, and the subcorpus CEPhiT in particular, along with the CCT and the criteria adopted here may provide fairly solid and manageable grounds to research the scientific register of the English language in the eighteenth and nineteenth centuries.
Chapter 4
Infrastructure for analysis of the CEPhiT corpus
Implementation and applications of corpus annotation and indexing
Andrew Hardie
Lancaster University
1. Introduction
While the CEPhiT corpus is a newly available resource, many of the resources that can be exploited to enhance the analysis of this kind of early Modern English data are long-established. Such resources include, on the one hand, different levels of automated corpus analysis which can expand the range of procedures available to the researcher; and on the other hand, tools for swift, effective and powerful corpus indexing and querying. In this chapter, I will describe an assemblage of both types of tool and explain how they have been brought together in service of the analytic goals of the corpus creators. Section 2 explains the different forms of annotation that we apply, in the order they are applied: first spelling regularisation, then part-of-speech tagging, and finally semantic tagging and lemmatisation. Section 3.1 explains the data model employed by Open Corpus Workbench (CWB), an indexing and query program able to handle the resulting multilayered annotation, and Section 3.2 illustrates how the CQPweb user interface to CWB makes the resulting sophisticated search system accessible to and usable by a wide range of researchers and other users. In sum, the version of the corpus annotated by the toolchain and indexed into CQPweb provides an alternative to CCT as a means of analysing CEPhiT (and, thus, potentially other components of the Coruña Corpus) whose affordances and capabilities only partially overlap those of CCT.
doi 10.1075/z.198.04har © 2016 John Benjamins Publishing Company
2. Corpus annotation
2.1 The purpose of corpus annotation
The advantages of applying various sorts of analytic annotation to corpus data as a preliminary step to undertaking serious research have been well-rehearsed in the literature (see for instance Leech, 1997; Leech & Smith, 1999) and do not need to be explained at length here. When “tags” of various kinds are applied to corpus data, they enable the two fundamental operations of corpus analysis – frequency counts and concordances – to be undertaken at levels of abstraction higher than that of the orthographic word form. For instance, lemmatisation allows frequency lists to be compiled at the level of the dictionary headword, which may subsume many independent wordforms. An example of this is that in the British National Corpus (BNC), the wordform break occurs 9,118 times, but the verb lemma break occurs 18,614 times; there are clear conceptual reasons why we might wish to have the frequency of breaks, broke, broken tallied in with that of the precise wordform break. On the concordancing side, the definition of search terms for abstract grammatical constructions such as the passive or the perfect is much more easily achieved operating at the level of part-of-speech tags than at the level of wordforms. Conversely, annotation can disambiguate the wordforms: break is both a noun and a verb, and broken is both a participle and an adjective – POS tags allow us to distinguish the different grammatical functions when searching or compiling frequency lists. Moreover, when annotation is applied standardly and the resulting tagged text is made available to a research community, there are additional advantages in terms of (1) avoiding the duplication of labour (since annotation can be applied once and used by many researchers) and (2) consistency of analysis (since all analysts are working from the same archived set of taggings). Naturally, this applies only to the levels of annotation which are by their nature applicable in many different types of research. 
There are many other kinds of annotation which a researcher might apply for the specific purposes of their own research questions. This latter type of annotation is not what is being discussed here. Rather, I will confine my attention to four types of word-level annotation which are very widely utilised in many different kinds of linguistic study, namely: spelling regularisation, part-of-speech tagging, semantic tagging, and lemmatisation.
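The grouping effect of lemmatisation on frequency counts, described above with the example of break, can be sketched in a few lines of Python. The token list below is invented purely for illustration and is not drawn from the BNC or from CEPhiT; a real lemmatiser would supply the second member of each pair.

```python
from collections import Counter

# Hypothetical toy data: (wordform, lemma) pairs as a lemmatiser might supply them.
tagged_tokens = [
    ("break", "break"), ("broke", "break"), ("breaks", "break"),
    ("broken", "break"), ("break", "break"), ("philosophy", "philosophy"),
]

# Frequency list at the level of the orthographic wordform.
wordform_freq = Counter(form for form, _ in tagged_tokens)

# Frequency list at the level of the lemma: inflected forms are tallied together.
lemma_freq = Counter(lemma for _, lemma in tagged_tokens)

print(wordform_freq["break"])  # occurrences of the exact wordform only
print(lemma_freq["break"])     # occurrences of every form grouped under the lemma
```

The two counters differ only in the key used for counting, which is the sense in which annotation raises the level of abstraction of an otherwise identical operation.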
2.2 Spelling regularisation
The function of spelling regularisation considered on its own terms is a straightforward instance of the general grouping and disambiguation functions noted above. Given that the same wordform may be spelled in several different ways in early Modern English, due to the lack of consistent spelling standards in that period, it is obviously advantageous if the corpus can provide direct means for unifying searches and frequency counts for different variants. For example, in CEPhiT we observe the spelling cou’d for could, as well as the modern spelling. An analyst approaching the CEPhiT texts and interested in the use of modal verbs would not necessarily think to run a concordance for cou’d as well as could. Even if the analyst is aware of the spelling variation issue, it is still often difficult to come up with a full list of all possible variants on the basis of intuition. It is much more effective if a search of the corpus for any spelling form of could also automatically retrieves all instances of the other variants, in this case cou’d. This is most effectively achieved by regularising all forms to the (single, standard) modern spelling, which can then be used as a key to access all the other variants. A key point of spelling regularisation annotation is that, for the regularised forms to be most useful for searching, the original spelling must be replaced by the regularised form. However, it is customary to retain the original spelling of any amended word in the corpus, since the actual orthography may often be of considerable scholarly interest, and if the original forms are retained within the annotation, the original orthography of the text is retrievable either automatically or manually.
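The principle of using the regularised spelling as a search key while keeping the original spelling recoverable can be illustrated with a minimal Python sketch. The token records and the field names reg and orig are assumptions made for this example; they do not reproduce the actual CEPhiT encoding.

```python
# Hypothetical tokens: each pairs the regularised form with the original spelling.
tokens = [
    {"reg": "could", "orig": "cou’d"},
    {"reg": "could", "orig": "could"},
    {"reg": "such", "orig": "ſuch"},
]

def find(tokens, query):
    """Return the positions of tokens whose regularised form matches the query."""
    return [i for i, tok in enumerate(tokens) if tok["reg"] == query.lower()]

# One search on the modern spelling retrieves every variant.
hits = find(tokens, "could")

# The original orthography is still available for each hit.
originals = [tokens[i]["orig"] for i in hits]
print(hits, originals)
```

Searching on the reg field means the analyst never has to enumerate variants by intuition, while the orig field preserves the evidence needed for orthographic study.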
For instance, in the following brief extract from the CEPhiT text “phil 1700 Astell 42-89”, we can see that the word cou’d appears only within an attribute of the XML element that marks a regularisation; it is thus “fenced off”, so to speak, from the main searchable part of the text, but is still recoverable:
nor give an impartial By-ſtander (could ſuch an one be found) any occaſion from thence to ſuſpeƈt
(In a fully regularised version of this text, the words By-ſtander, ſuch, occaſion and ſuſpeƈt would also be altered to their corresponding modern versions.) However, spelling regularisation is not only of use for making searches and frequency counts of wordforms more tractable in corpora such as CEPhiT. It also has an essential role as a feed-in to the other steps of annotation. As noted above, this chapter presents a chain of tools which includes part-of-speech tagging, semantic tagging and lemmatisation. In many types of corpus annotation, one form of annotation acts as the input for another. This is the case, as we will see below, for part-of-speech tagging, which – as well as being of much use in its own right –
acts as the input to semantic tagging. But it is spelling regularisation for which the function of the annotation as an input to subsequent layers of annotation is perhaps most significant. Given that most existing annotation tools are designed primarily if not solely for contemporary English, the rate of accuracy that they can achieve on the non-standard orthography of the early modern period is much lower than that typically reported for contemporary English. For example, typical state-of-the-art part-of-speech taggers for English attain in excess of 95% accuracy. Such an accuracy rate cannot be expected working from early modern spellings. This is because one of a tagger’s major resources is a part-of-speech lexicon for many of the most common wordforms in a language; when spelling variation prevents the tagger from matching a word against its lexicon, the base of information that the tagger can use to classify that word is drastically reduced. On the other hand, if early modern texts are regularised prior to tagging, then the accuracy rate that can be achieved is much closer to that attainable with contemporary English. The system for spelling regularisation applied in the toolchain described here is VARD (the “VARiant Detector”) described by Baron et al. (2009) and under continuous development at Lancaster. VARD uses lexical lookup to identify wordforms in a text which do not seem to be valid words of contemporary English, and then applies a range of techniques to generate possible regularisations. Critically, although a “default” set of data for Early Modern English regularisation has been established, it is also possible to train VARD through its interactive interface to take account of the particular spelling issues found in any particular text. 
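The general idea of detecting variants by lexical lookup and then proposing candidate regularisations can be sketched as follows. This is emphatically not VARD's algorithm or data: the miniature lexicon, the two character rewrite rules and the similarity cutoff are all invented for illustration, and the fuzzy-matching fallback simply uses Python's standard difflib module.

```python
import difflib

# Hypothetical miniature lexicon of contemporary spellings.
LEXICON = {"could", "such", "occasion", "an", "one", "be", "found"}

# Hypothetical rewrite rules; real rule sets are far larger and context-sensitive.
REWRITES = [("ſ", "s"), ("’d", "ld")]

def is_variant(word):
    """A word is flagged as a possible variant if the lexicon does not contain it."""
    return word.lower() not in LEXICON

def candidates(word):
    """Propose modern spellings: first by rewrite rules, then by fuzzy matching."""
    rewritten = word.lower()
    for old, new in REWRITES:
        rewritten = rewritten.replace(old, new)
    if rewritten in LEXICON:
        return [rewritten]
    return difflib.get_close_matches(rewritten, sorted(LEXICON), n=3, cutoff=0.8)

for word in ["cou’d", "ſuch", "could"]:
    print(word, is_variant(word), candidates(word) if is_variant(word) else [])
```

Even in this toy form the sketch shows why training matters: the quality of the output depends entirely on the lexicon and the rules supplied, which is what interactive training adjusts for a particular text.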
The version of CEPhiT made available as described in Section 3.2 has been regularised using the default training only, but it is possible that future researchers may find it useful to explicitly train a more accurate set of parameters to remove any remaining unevenness in the spelling regularisation.
2.3 Part-of-speech tagging
Part-of-speech tagging is the assignment to each word of a single label indicating that word’s grammatical category membership. It is the longest-established form of corpus annotation, with the first efforts in the direction of automatic taggers going back as far as the early 1960s (see Klein & Simmons, 1963; Stolz et al., 1965), and the first practical, high-accuracy tagging software emerging in the 1980s (for instance, Garside et al., 1987; Church, 1988; DeRose, 1988; Cutting et al., 1992; Karlsson et al., 1995). The part-of-speech tagger which is used standardly at Lancaster University is version 4 of the CLAWS tagger (Leech et al., 1994), which was originally
developed to tag the 100 million word BNC. CLAWS is a hybrid tagger, meaning that, when faced with a word which might potentially have more than one part of speech, it uses both linguistic rules and a probabilistic model (in the form of a Hidden Markov Model) to work out, based on context, which of those tags ought to be applied. The system of classification or tagset that CLAWS uses has, like CLAWS itself, evolved over time; the most generally used output tagset – and the one employed here – is referred to as C7 (see footnote 1). This is a relatively fine-grained tagset which, as well as distinguishing major parts of speech, also draws subsidiary distinctions between any sub-groups of words which may at least potentially have different (morpho-)syntactic behaviour. For instance, the major class of determiners is divided into three major subcategories: determiners (e.g. this, that, any, some), after-determiners (e.g. former, same), and before-determiners (e.g. all, half, both), representing the positions these forms may take in a noun phrase with a complex determiner. This gives us the tags DD, DA and DB, each of which is further subcategorised (e.g. for number). Likewise modal (VM) and primary auxiliary (VB, VD, VH) verbs are both separately distinguished from lexical verbs (VV). This level of detail, applied across the board, results in an analytic scheme of 137 categories. The C7 tagset is hierarchically designed. That is, each tag consists of a string of letters, where the leftmost letter indicates the major wordclass (such as D for determiner or N for noun or V for verb), and each subsequent letter indicates a further level of subdivision. Because of this, it is possible to operate at multiple levels of generality. For example, though there are a total of 31 different verb tags, one can easily search for “any verb” in a C7 tagged corpus simply by searching for “any tag beginning with V” – in regular expression syntax, for instance, this would be expressed as V.*
However, complex search languages such as regular expression syntax may often be outside the capabilities of the novice or non-technical corpus linguist. For this reason, the compilers of the most recent edition of the BNC devised a simpler tagset which records only the major word category, such as SUBST (i.e. Noun), VERB, ADJ, and so on. In our toolchain, this system is generalised and applied to all output from the CLAWS tagger, meaning that the simple tags are always available alongside the full linguistic complexity of C7. Thus, the needs of a broader set of users are addressed.
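Both ways of generalising over the hierarchical tagset can be illustrated briefly in Python. The token list is invented, and the mapping from the leading letter of a C7 tag to a simplified tag is a deliberately crude assumption made for this sketch: the mapping actually used in the toolchain is not spelled out here and is unlikely to depend on the first letter alone. The label OTHER is likewise a placeholder of this example, not a tag from any real scheme.

```python
import re

# Invented tokens paired with C7 tags mentioned in this chapter.
tagged = [("could", "VM"), ("is", "VBZ"), ("done", "VDN"), ("this", "DD"), ("all", "DB")]

# "Any tag beginning with V", as in the regular expression given in the text.
ANY_VERB = re.compile(r"V.*")
verbs = [word for word, tag in tagged if ANY_VERB.fullmatch(tag)]

# Hypothetical first-letter mapping to simplified tags (illustrative only).
SIMPLE = {"V": "VERB", "N": "SUBST", "J": "ADJ"}

def simplify(tag):
    """Reduce a fine-grained tag to a major word category, or OTHER if unmapped."""
    return SIMPLE.get(tag[0], "OTHER")

print(verbs)
print([(tag, simplify(tag)) for _, tag in tagged])
```

Storing the simplified tag as a separate annotation, rather than computing it at query time, is what allows a user to search for VERB without writing any regular expression at all.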
1. See http://ucrel.lancs.ac.uk/claws7tags.html
2.4 Semantic tagging
The underlying principle of semantic tagging is the same as part-of-speech tagging, namely, the assignment to tokens of labels representing categories within an analytic schema. The difference is that in the case of semantic tagging, the tags represent categories of meaning – and the tagset as a whole thus represents some ontology or way of dividing up all possible meanings into various domains or concepts. Semantic tagging is a fundamentally harder task than part-of-speech tagging. First, the need for a base of built-in knowledge is greater; when a part-of-speech tagger encounters an unknown word, it can make a guess on the category based on morphological form, but the morphological form of a word typically gives little or no clue as to the semantic field it belongs in. Moreover, while a word can be ambiguous for both grammatical category and semantic field (in the latter case, due to polysemy or homonymy) the contextual factors that allow us to resolve the grammatical categories tend to be much more directly local – and thus more amenable to computational analysis – than are the contextual factors that allow human beings to resolve the meanings of polysemous tokens. For this reason, we must be prepared to work with a higher error rate when analysing the output of a semantic tagger than when working with part-of-speech tags. Nevertheless, semantic tags can be extremely useful, especially for detecting patterns across groups of semantically-related words which are individually too rare for the pattern to appear as significant unless they are considered collectively. The UCREL Semantic Analysis System (USAS; see Rayson et al., 2004; Rayson, 2008) has been under development at Lancaster University since the early 1990s. Over time, the semantic lexicon which is its major linguistic knowledge-base has steadily expanded until it is able to cover a very large fraction of the words in any given text. 
USAS is also able to make use of CLAWS tags as one of the factors in its input (since the set of semantic fields to which a given word used as a noun belongs may be different to the set of semantic fields to which the same word used as a verb belongs, for instance). The format of both the semantic lexicon, and the output applied to the text, is a string of possible semantic analyses where the first analysis on the list is considered the most likely. The USAS tagset (see footnote 2) is, like the C7 tagset, hierarchical in nature. Its top level is a division of the whole range of possible meanings into 21 categories, indicated by (mostly arbitrary) letters. Each of these categories is then divided into subcategories represented by numbers, which may be divided yet again, for up to four levels of distinctions in all. In total, the tagset consists of 232 tags. However, this does not exhaust the possibilities of the analytic scheme, because these tags may be further modified by flags such as + or – to indicate a positive or negative position on a semantic scale, or m, f or n to indicate gender on words where gender is a relevant semantic feature. Two different semantic tags can also be joined together by a slash, to indicate double category membership. This ontology was originally based on that of the Longman Lexicon of Contemporary English (McArthur, 1981) but has evolved substantially.
2. See http://ucrel.lancs.ac.uk/usas
Let us consider this system in more concrete terms, taking the phrase printer’s device as an example. Tagged by USAS, this phrase comes out as follows.
printer Q1.2/S2mf Q1.2/O2
’s Z5
device O2 X4.2 Q1.1
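The notation used in this example (alternative analyses separated by spaces, joint membership marked by a slash, and flags appended to a tag) can be unpacked mechanically. The following Python sketch is an illustration only: the function names and the regular expression are assumptions of this example and are not part of USAS itself.

```python
import re

# One letter for the top-level field, optional dotted numbers, then any flags.
TAG = re.compile(r"^(?P<field>[A-Z])(?P<sub>\d+(?:\.\d+)*)?(?P<flags>.*)$")

def parse_tag(tag):
    """Split one tag such as 'S2mf' into field letter, subdivisions and flags."""
    match = TAG.match(tag)
    if match is None:
        raise ValueError(f"not a recognisable semantic tag: {tag!r}")
    levels = match.group("sub").split(".") if match.group("sub") else []
    return {"field": match.group("field"), "levels": levels, "flags": match.group("flags")}

def parse_analyses(analyses):
    """Turn 'Q1.2/S2mf Q1.2/O2' into a list of analyses, each a list of parsed tags."""
    return [[parse_tag(part) for part in analysis.split("/")]
            for analysis in analyses.split()]

for analysis in parse_analyses("Q1.2/S2mf Q1.2/O2"):
    print(analysis)
```

Keeping the analyses in their original order matters, since the first one in the list is the one the tagger considers most likely.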
The word printer has two possible semantic tags, both of them dual-membership tags. The first half of each tag is Q1.2, “Paper documents and writing”, a subcategory of the major domain Q “Linguistic actions, states and processes”. The first analysis conjoins that category with S2mf, indicating “People” (where the mf indicates it could be either a male or female person) whereas the second analysis is conjoint with O2, “Objects generally”. Thus, the two tags represent the two senses of printer, the profession and the mechanism respectively. The genitive marker, like all function words, receives the tag Z5 “Grammatical bin”. Finally, device receives three possible analyses: O2 “Objects generally”; X4.2 “Mental objects: Means, method”; and Q1.1 “Communication in general”. The level of detail in the semantic analysis scheme is such that for some purposes – for example, statistical calculations – the categories can actually become too fine-grained. For that reason, it is sometimes preferable to (a) ignore all analyses except the first and (b) ignore subsidiary joint categories. Thus, semantic tagging effectively produces two annotations for each token: the full semantic analysis, and the broader, less detailed analysis that categorises every word one way and one way only.
2.5 Lemmatisation
Lemmatisation is conceptually the most straightforward of the word-level annotations; every word is, simply, annotated with the lemma to which it belongs (where each lemma is represented by its morphological base form; see footnote 3). Thus, searches and frequency counts can be based on grouping together all the separate inflectional forms of a single word, rather than just on the raw orthographic wordforms. However, even this relatively simple operation can raise issues regarding what is, and what is not, considered to be part of the same lemma. For example, in the scheme of lemmatisation applied by the toolchain under discussion, it is considered that a part-of-speech category shift (for instance, from a verb to a noun) is a derivational process that creates a new lemma. This does not usually create much confusion, since part-of-speech information can be included or excluded as necessary. For example, the wordform break can be either a noun or a verb, and in either case it will be lemmatised as break (since that is the base form whether it is a noun or a verb). On the other hand, when the lemmatisation is combined with the part-of-speech analysis, we have two separate lemmata, typically represented as break_SUBST and break_VERB. However, things are not always that straightforward. The word broken, for instance, can be tagged either as a participle or as an adjective. If it is tagged as a participle, it is lemmatised as break – since a participle is an inflectional form of a verb base. On the other hand, if it is tagged as an adjective, broken is lemmatised as broken – since the category shift is deemed, by the assumption of the scheme of analysis, to be a derivational process that creates a new lemma. So the lemmatisation does not always group elements together; in some cases it can draw distinctions between elements that are orthographically identical, in line with the general disambiguating role that all forms of word-level annotation may perform.
3. It is also computationally straightforward, and in the toolchain described here is applied simultaneously with the semantic tags by USAS.
3. Indexing and querying
3.1 Applying the CWB data model
The process of annotation described above results in a corpus with several layers of annotation. As annotation becomes multilayered, the practical difficulties of searching and analysing the corpus grow. A plain-text corpus, or one with only a single layer of word-level annotation, is easily queried by software tools which operate by searching for strings of characters in the input files. For complex, multilayered annotation, such as that which has been applied to CEPhiT by the toolchain described above, another solution is needed. Open Corpus Workbench (CWB; see Evert & Hardie, 2011) is a system designed to deal with this issue. In CWB, a corpus is conceptualised as a set of attributes of different types. A positional attribute, for instance, represents an analysis which has a particular value at every token-position in the corpus – such as the actual words of the text, or some collection of tags applied to those words. On the other hand, a structural attribute represents some region in the corpus with a
defined beginning and end point (such as a sentence, paragraph, or text – that is, the kinds of things typically represented by XML markup). This data model is realised in the format that corpus files must have to be indexed by CWB. Consider the following simple example of this file format:
It	PP	it
was	VBD	be
an	DT	a
elephant	NN	elephant
.	SENT	.
This is a columnar format, where each column represents a particular word-level annotation (i.e. a positional attribute, or p-attribute), and each line represents a single token in the corpus. Punctuation marks are treated as separate tokens. When the data is indexed, each token is assigned a corpus position number, beginning with 0; so in the example above the tokens It, was, an, elephant and . will be assigned the corpus positions 0, 1, 2, 3, and 4 respectively. Note that XML tags are not considered as part of this token stream, so that the tokens was and an are adjacent despite the intervening tag. The data example above contains three p-attributes, one represented by each column. The leftmost column normally contains the actual word tokens of the corpus, as in this example. Then, the second and third columns each contain a word-level annotation – in this case, a part-of-speech tag and a lemma. Between each column is a single tab character. It is possible to add as many p-attributes as necessary for whatever other token-level annotation is available in a given corpus, such as phonemic or phonetic transcription, morphological annotation, part-of-speech tagging, lemmatisation, or semantic tagging. As well as positional attributes, CWB’s data model includes structural attributes. Structural attributes (s-attributes) are associated not with single tokens, but rather with spans that have a given start and end point in the corpus. They are thus equivalent to XML elements. Typical things encoded as s-attributes are sentences (as shown by the sentence element in the short example above), paragraphs, and utterances; when text boundaries are included in a corpus index, these are also specified as s-attributes; phrases can also be represented as sets of s-attributes (e.g. the noun phrase an elephant in the example above). Because of the close correspondence between s-attributes and XML elements, XML-style start and end tags are used to represent s-attributes in the input format, as shown above.
Each XML start or end tag is placed on a separate line in the input data.
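The relationship between the columnar input, corpus positions and structural attributes can be made concrete with a short Python sketch. It is not CWB's own encoding program, merely an illustration of the data model. The element names s (sentence) and np (noun phrase) used in the sample string are assumptions of this example, chosen to match the kinds of span discussed above.

```python
# Sample input in the columnar format; the <s> and <np> tag names are assumed.
VERTICAL = "\n".join([
    "<s>",
    "It\tPP\tit",
    "was\tVBD\tbe",
    "<np>",
    "an\tDT\ta",
    "elephant\tNN\telephant",
    "</np>",
    ".\tSENT\t.",
    "</s>",
])

def read_vertical(text):
    """Return (tokens, spans): p-attribute tuples and (name, start, end) regions."""
    tokens = []     # index in this list = corpus position
    spans = []      # s-attributes with inclusive start and end positions
    open_tags = []  # stack of (name, start position) for currently open elements
    for line in text.splitlines():
        if line.startswith("</"):
            name, start = open_tags.pop()
            if name != line[2:-1]:
                raise ValueError(f"mismatched end tag: {line}")
            spans.append((name, start, len(tokens) - 1))
        elif line.startswith("<"):
            # Tags occupy no corpus position; they only mark where a span begins.
            open_tags.append((line[1:-1], len(tokens)))
        else:
            tokens.append(tuple(line.split("\t")))
    return tokens, spans

tokens, spans = read_vertical(VERTICAL)
print(tokens)
print(spans)
```

Because tags never consume a position, was and an receive consecutive position numbers even though a start tag stands between them in the input.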
Given a corpus such as CEPhiT which is encoded as XML according to the TEI standard, there is a close correspondence between the XML mark-up of the files as distributed and the input format required for CWB. The main difference is that in true XML, whitespace is nearly always irrelevant to the structure of the data, but in columnar format whitespace characters are used to indicate tokenisation and to delimit word-level attributes. Because whitespace is significant, each XML element must occur on a line on its own, with the opening angle bracket as the very first thing on the line. A more subtle discrepancy between true XML and CWB’s input format is that CWB has no notion of a corpus file header providing meta-information about a text document. An XML file header may incorporate elements that themselves have textual content, such as the following standard component of the CEPhiT file headers:
phil 1727 Greene 1-13.xml
Research Group for Multidimensional Corpus-based Studies in English
Xunta de Galicia, Universidade da Coruña, Ministerio de Ciencia e Innovación and Deputación da Coruña
Dir. by Isabel Moskowich-Spiegel
Such a header must be recoded before CWB indexing, since in the format given, CWB would not be able to distinguish the textual content of the header fields from the actual text of the corpus file. A typical approach is to remove the header entirely and instead add text-level metadata to a separate database. This process is discussed further in the following section. In sum, by breaking a corpus up into separately-indexed and separately-queriable attributes, CWB makes the task of working with a heavily-annotated corpus drastically more tractable. The index in question is a data structure which makes it possible for the results of a query to be located in the corpus without the program having to work sequentially through the entire underlying text from beginning to end. While indexing is an absolute necessity for very large corpora, even for corpora the size of CEPhiT it is a considerable convenience. However, the complexity of the data model – and the fairly heavy-duty query language which CWB’s concordance program, the Corpus Query Processor or CQP, uses to grant access to all of the various attributes – means that a CWB-indexed corpus requires quite a large amount of technical know-how to use effectively. To get around this problem, CWB corpora can be accessed via the graphical user interface system, CQPweb.
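The benefit of indexing each attribute separately can be illustrated with a simple inverted index in Python. This is a conceptual sketch only and makes no claim about CWB's actual index files or algorithms; the word and lemma values are those of a short sentence, What then is to be done ?, with its lemmas.

```python
from collections import defaultdict

# Two positional attributes for the same seven corpus positions.
words = ["What", "then", "is", "to", "be", "done", "?"]
lemmas = ["what", "then", "be", "to", "be", "do", "PUNC"]

def build_index(values):
    """Map each attribute value to the list of corpus positions where it occurs."""
    index = defaultdict(list)
    for position, value in enumerate(values):
        index[value].append(position)
    return index

# Each attribute gets its own index, built once at indexing time.
lemma_index = build_index(lemmas)

# A query on the lemma is answered by lookup, not by scanning the whole text.
positions = lemma_index["be"]
hits = [words[p] for p in positions]
print(positions, hits)
```

Because every attribute shares the same position numbering, a hit found through one attribute can be displayed or filtered using any other.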
3.2 CQPweb
CQPweb (see Hardie, 2012) is a web-based front-end interface to CWB. It is designed to be drastically more user-friendly than accessing CWB directly, since it follows the well-established and widely-used interface of the BNCweb software (Hoffmann et al., 2008). CQPweb was first developed as a tool to support the teaching of corpus linguistics, but has found applications in a wide variety of research projects as well. CQPweb can be installed on a researcher’s own computer, but it can also be installed on a public server; Lancaster University’s CQPweb server (see footnote 4) makes available a wide variety of corpora developed by Lancaster and by other research centres. CQPweb places some additional demands on the data formatting, beyond those inherent to CWB. The corpus must be divided into texts, which must be indicated with <text> elements. Each text must have an ID code, consisting only of letters and numbers, and represented in the input file by an “id” attribute on the <text> elements. This requires recoding of the CEPhiT structure based on TEI. Most notably, the long text-identifiers such as “phil 1727 Greene 1-13.xml” must be replaced by shorter letter-and-number codes such as “phil1”. Overall, then, when all the annotation processes described above are applied sequentially to CEPhiT, with the file headers removed and recoded separately, and with ID codes added to the text-delimiting XML, the resulting file format looks like this:
What	DDQ	what	Z8	PRON	what_PRON	Z8 Z5	What
then	RT	then	N4	ADV	then_ADV	N4 Z5 T1:2	then
is	VBZ	be	A3	VERB	be_VERB	A3u Z5	is
to	TO	to	Z5	PREP	to_PREP	Z5	to
be	VBI	be	Z5	VERB	be_VERB	Z5 A3u	be
done	VDN	do	A1:1:1	VERB	do_VERB	A1:1:1 G2:2d	done
?	YQUE	PUNC		STOP	PUNC_STOP		?
The columns in the example data above represent the following annotations:
4. http://cqpweb.lancs.ac.uk
1. Word token (with regularised spelling where applicable)
2. Part-of-speech tag (C7)
3. Lemma
4. Semantic tag
5. Simplified part-of-speech tag
6. Lemma plus simplified part-of-speech tag
7. Full semantic analysis
8. Unregularised word token
In addition, the following XML elements are present in the corpus (some but not all of these are visible in the short sample above):
– <text> – as noted above.
– <s> – the CLAWS tagger produces sentence boundary annotations as part of the process of assigning part-of-speech tags, and these are preserved in the output.
– Other XML elements – are preserved from the TEI input of CEPhiT, namely p, del, abbr, pb, emph, div[0-7], and head. (There is not sufficient space here to go into full detail on the use of these XML tags in CEPhiT.)
The ID codes are used to link each text to an entry in a database of text-level metadata. This must be imported into CQPweb at the time of indexing. The metadata must be formatted as a single text file, with each text’s information stored on a single line, and with fields of metadata divided into columns (as with the CWB input format, all columns must be delimited by tabs). The first column contains the ID code which appears in the <text> elements of the underlying corpus, and is used to link metadata to text. As many additional fields as required can then be provided. Fields can be of two sorts: free text, where the content of the field is expected to be different for every text, and classifications, where the field takes one of a limited number of values, each of which represents a category in some classification scheme. In a classification field, the values must consist only of short, letter-and-number-only labels, which can be linked to full descriptions elsewhere. In the case of CEPhiT, the CQPweb metadata is drawn not only from the corpus headers, but also from the separate XML metadata files.
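The constraints on ID codes and on the metadata file described above can be sketched in Python as follows. The numbering scheme (assigning phil1, phil2 and so on in the order the files are listed), the function names and the choice of fields are all assumptions made for illustration; they are not the procedure actually used to prepare the CEPhiT metadata table.

```python
import re

# ID codes and classification labels may contain only letters and numbers.
ID_PATTERN = re.compile(r"^[A-Za-z0-9]+$")

def make_ids(filenames, prefix="phil"):
    """Assign short codes in listing order (a hypothetical numbering scheme)."""
    return {name: f"{prefix}{i}" for i, name in enumerate(filenames, start=1)}

def metadata_line(text_id, fields):
    """Build one tab-delimited metadata line with the ID code in the first column."""
    if not ID_PATTERN.match(text_id):
        raise ValueError(f"ID code must be letters and numbers only: {text_id!r}")
    for value in fields:
        if "\t" in value or "\n" in value:
            raise ValueError("field values may not contain tabs or line breaks")
    return "\t".join([text_id, *fields])

filename = "phil 1727 Greene 1-13.xml"
ids = make_ids([filename])

# Free-text fields: original filename and publication year.
line = metadata_line(ids[filename], [filename, "1727"])
print(line)
```

Keeping the original filename as a free-text field means the short ID code can always be traced back to the source file it replaced.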
The classifications inherent in the corpus are (1) author sex (using the one-word labels male and female), (2) genre (using labels including Treatise, Essay, Textbook and so on) and (3) publication decade (using labels 1700s, 1710s, etc.). The publication decade is not stored explicitly in the original XML metadata; rather, the publication date is represented as a specific year. However, from the point of view of CQPweb, the year of publication is a free-text field, expected to vary greatly with few or no texts having the same value. In order to make decade of publication usable as a
discrimination factor within CQPweb, it is necessary to represent the decade as a classification field as well as the underlying year as a free-text field (see footnote 5). Finally, publication century is stored as an additional classification field, grouping decades together. The other metadata fields, all free text, are as follows:
– Filename in the original corpus (name of untagged source XML file)
– Short title of the text
– Full title of the text
– Original word count of the sample
– Publication date (year)
– Author’s name
– Author’s place of education (represented across three fields)
– Author’s birth and death dates
– Author’s occupation
– Author’s age when the text was published
– Place of publication (as printed in the text itself; often includes the name of the printer)
– Description of the section of the original text included in the sample within CEPhiT
– A short descriptive paragraph “about the text”
– A short descriptive paragraph “about the author”.
The screenshots in Figures 1 and 2 show the CQPweb installation of CEPhiT in action. Figure 1 is the result of a concordance search for the query string _J* philosophy – that is, the noun philosophy preceded by any adjective. This query illustrates the ability to search for phrases in which certain “slots” are specified by part-of-speech tags and not by traditional search terms. Figure 2 exemplifies CQPweb’s statistical collocation display. The list of collocates shown here is based on a query for {philosophy} alone – searching for the lemma of philosophy and thus including instances of the word philosophies. The collocates have been extracted using the mutual information statistic (which picks out pairs of items where the frequency of the words together is high relative to the frequencies of those items apart). CQPweb has many more functions than can be illustrated in the space available here. However, the outputs in Figures 1 and 2 do serve to demonstrate how, by indexing the CEPhiT corpus within CQPweb, access to the multiple layers of annotation – here, part-of-speech tag and lemma – has been made much more
5. I gratefully acknowledge the efforts of Leida Maria Monaco in the creation of the reformatted metadata table for CQPweb insertion.
Andrew Hardie
Figure 1. A CQPweb concordance output: search term _J* philosophy
Figure 2. A CQPweb statistical collocation output: search term {philosophy}
straightforward and user-friendly for researchers without any background in corpus methods. Likewise, advanced analytic techniques (including collocations, semantic collocations, and grammatical pattern queries) are readily available through the affordances of the indexed corpus.
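As a rough illustration of the mutual information statistic mentioned above, the following Python sketch computes a score of the form log2(observed / expected) from raw frequencies. It is a minimal sketch under stated assumptions, not CQPweb's own implementation: the function name `mutual_information`, the `window` parameter and the toy frequencies are hypothetical and are not taken from CEPhiT.

```python
import math


def mutual_information(pair_freq, node_freq, collocate_freq, corpus_size, window=1):
    """Return log2(observed / expected) for a node-collocate pair.

    `window` is the number of token positions around the node in which
    the collocate is counted (a hypothetical parameter for this sketch).
    """
    if min(pair_freq, node_freq, collocate_freq, corpus_size, window) <= 0:
        raise ValueError("all frequencies must be positive")
    # Expected co-occurrence frequency if the two items were independent.
    expected = node_freq * collocate_freq * window / corpus_size
    return math.log2(pair_freq / expected)


# Toy, made-up frequencies for illustration only.
score = mutual_information(
    pair_freq=20, node_freq=500, collocate_freq=80, corpus_size=400_000
)
print(f"MI score: {score:.2f}")
```

A pair scores highly when it co-occurs much more often than the separate frequencies of its members would predict, which is the property described in the text.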
Chapter 4. Infrastructure for analysis of the CEPhiT corpus
4. Conclusion Much longer works than this chapter can be – and indeed have been – written on the topics of corpus annotation (e.g. Garside et al., 1987) and indexing (e.g. Evert & Hardie, 2011). The purpose of this chapter, rather than to provide a fully detailed account of any of the annotation issues discussed, has been to build up an overall picture of the preliminary technical steps that can be used to implement a corpus infrastructure – an infrastructure which enables modes of analysis to which many corpus users, especially those without extensive technical expertise, might perhaps not otherwise have access. It bears emphasis that none of the tools discussed in this chapter are new; all are tried-and-tested with extensive pedigrees as infrastructure for corpus linguistic research. What has been demonstrated here is how the combination of these tools into a single assemblage can significantly facilitate the analysis of corpora such as CEPhiT.
References

Baron, Alistair, Paul Rayson & Dawn Archer (2009). Automatic standardization of spelling for historical text mining. In Proceedings of Digital Humanities 2009 (309–312). Bethesda, MD: University of Maryland, USA.
Church, Kenneth (1988). A stochastic parts program and noun phrase parser for unrestricted text. In Proceedings of the second conference on Applied Natural Language Processing (136–143). Austin, Texas: Association for Computational Linguistics. doi: 10.3115/974235.974260
Cutting, Doug, Julian Kupiec, Jan Pederson, & Penelope Sibun (1992). A practical part-of-speech tagger. In L. Ahringerg, N. Dahlback & A. Jonsson (Eds.). Proceedings of the third conference on Applied Natural Language Processing (133–140). Trento: Association for Computational Linguistics.
DeRose, Steven Joseph (1988). Grammatical category disambiguation by statistical optimization. Computational Linguistics, 14/1, 31–39.
Evert, Stefan & Andrew Hardie (2011). Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium. In M. Davies, P. Rayson, S. Hunston & P. Danielsson (Eds.). Proceedings of the Corpus Linguistics 2011 conference (1–21). Birmingham: University of Birmingham. http://www.birmingham.ac.uk/documents/college-artslaw/corpus/conference-archives/2011/Paper-153.pdf.
Garside, Roger, Geoffrey Leech & Geoffrey Sampson (1987). The Computational Analysis of English: A Corpus-based Approach. Harlow: Longman.
Hardie, Andrew (2012). CQPweb – combining power, flexibility and usability in a corpus analysis tool. International Journal of Corpus Linguistics, 17/3, 380–409. doi: 10.1075/ijcl.17.3.04har
Hoffmann, Sebastian, Stefan Evert, Nicholas Smith, David Lee & Ylva Berglund Prytz (2008). Corpus Linguistics with BNCweb – a Practical Guide. Frankfurt am Main: Peter Lang.
Karlsson, Fred, Atro Voutilainen, Juha Heikkilä, & Arto Anttila (Eds.) (1995). Constraint Grammar: a language-independent system for parsing unrestricted text. Berlin: Mouton de Gruyter. doi: 10.1515/9783110882629
Klein, Sheldon & Robert F. Simmons (1963). A computational approach to grammatical coding of English words. Journal of the Association for Computing Machinery, 10, 334–347. doi: 10.1145/321172.321180
Leech, Geoffrey, Roger Garside & Michael Bryant (1994). CLAWS4: The tagging of the British National Corpus. In N. Calzolari (Ed.), Proceedings of the 15th International Conference on Computational Linguistics (COLING 94) (622–628). Kyoto, Japan. doi: 10.3115/991886.991996
Leech, Geoffrey (1997). Introducing Corpus Annotation. In R. Garside, G. Leech & A. McEnery (Eds.). Corpus Annotation (1–18). London: Longman.
Leech, Geoffrey & Nicholas Smith (1999). The use of tagging. In H. van Halteren (Ed.). Syntactic Wordclass Tagging (23–36). Dordrecht: Kluwer Academic Publishers. doi: 10.1007/978-94-015-9273-4_3
McArthur, Tom (1981). Longman Lexicon of Contemporary English. London: Longman.
Rayson, Paul, Dawn Archer, Scott Piao & Tony McEnery (2004). The UCREL semantic analysis system. In Proceedings of the workshop on Beyond Named Entity Recognition: Semantic labelling for NLP tasks in association with the LREC 2004 (7–12). Lisbon: LREC.
Rayson, Paul (2008). From key words to key semantic domains. International Journal of Corpus Linguistics 13/4, 519–549. doi: 10.1075/ijcl.13.4.06ray
Stolz, Walter S., Percy H. Tannenbaum & Frederik V. Carstensen (1965). A stochastic approach to the grammatical coding of English. Communications of the ACM (Association for Computing Machinery) 8/6, 399–405. doi: 10.1145/364955.364991
Chapter 5
On the shoulders of giants An overview on the discussion of science and philosophy in Late Modern times Marina Dossena
University of Bergamo
To make experiments, at random, is not to philosophize; it becomes philosophy, only when the experiments are made with a certain view. (CEPhiT, Brown, 1820)
doi 10.1075/z.198.05dos © 2016 John Benjamins Publishing Company

1. Introduction: New perspectives on Late Modern English

In the first decade of the twenty-first century Late Modern times finally gained the place they had always deserved in histories of the English language (see Beal, Fitzmaurice & Hodson, 2012: 201–202): an increasing number of studies presented in volumes, journals and conference proceedings testify to the growing attention that the eighteenth and nineteenth centuries have elicited among scholars. Besides, such attention has enabled a few popular myths to be dispelled; first of all, Late Modern English is no longer seen as so close to Present-Day English as to be less worthy of investigation than other periods, despite its undoubted specificities. Nor is it any longer seen as the time of blinkered prescriptivism, where rules were set down to remove ‘barbarous, provincial’ traits in phonology, grammar and vocabulary. While stressing the much greater value of standardised forms, prescriptive comments had to refer to a status quo which was – albeit indirectly and always negatively – assumed to exist and is described between the lines. As a result, Late Modern documents, such as spelling books, letter-writing manuals, grammars and dictionaries, are now seen as very valuable sources against which to assess the uses found in both literary and non-literary genres.

However, Late Modern times did give great attention to opportunities for ‘improvement’, whether this concerned farming, city dwelling, or indeed language. This holistic approach is observed, for instance, in John Sinclair’s Statistical Account of Scotland (1791–1799), for which parish ministers were invited to answer a set of 160 questions on the geography and topography of their parish, its
natural resources, its population and its production (see Plackett, 1986). Language was also taken into consideration, and it may be interesting to trace the roots of Scottish Standard English in what is sometimes described as “neither English nor Scottish, but a mixture of both” (Dumfries), or “a dialect of the Scottish and English blended together” (Banff). More attention was possibly given to place-names: in the First Circular Letter to the Clergy of the Church of Scotland, of 25th May 1790 (reported by Broadie, 1997: 566), Sinclair included the following questions in Section IV “Miscellaneous questions”:

117. Has the parish any peculiar advantages or disadvantages?
118. What language is principally spoken in it?
119. From what language do the names of places in the parish seem to be derived?
120. What are the most remarkable instances of such derivations?
In the same decades interest in etymologies and the genealogy of languages had been growing considerably. This was another very valuable contribution to contemporary knowledge, the roots of which stretch back to the seventeenth century, and which is indebted to the new attitude to investigations that had been developing in those years and would give a remarkable boost to science.

In the seventeenth century the establishment of the Royal Society had contributed greatly to the creation of a new discursive mode concerning scientific investigation: the important academic dialogues in the form of letters that would also feature in the Philosophical Transactions – written in English, instead of Latin – were a crucial channel of communication for scholarly findings that authors wished to circulate more quickly and accessibly than in treatises (see Atkinson, 1999). Though Latin continued to play an important part, especially in international exchanges, English acquired importance in the dissemination of knowledge and argument throughout the country, and its innovations left their mark on more general uses.

In this respect, the timelines showing the number of words first recorded by the Oxford English Dictionary (henceforth OED) within different time periods provide a fascinating picture of the indebtedness of English vocabulary to Late Modern times, and particularly to its latter decades. In general, over the seventeenth, eighteenth and nineteenth centuries, as many as 150,566 new items or meanings were recorded for the first time: 50,918 in the seventeenth century, 24,839 in the eighteenth century, and 74,809 in the nineteenth century – a figure corresponding to almost 50% of the total, which is indicative of the momentum that the nineteenth century had in the history of the language. If we then focus our attention on science and philosophy, we find the pattern to be similar, with a definite
Table 1. New lexical items or meanings first recorded in the OED (1600–1899)

| Period | All: No. | All: % | Sciences: No. | Sciences: % | Philosophy: No. | Philosophy: % |
|---|---|---|---|---|---|---|
| 1600–1649 | 29,410 | 19.53 | 8,034 | 8.79 | 310 | 13.90 |
| 1650–1699 | 21,508 | 14.28 | 8,921 | 9.76 | 370 | 16.59 |
| 17th c. | 50,918 | 33.82 | 16,955 | 18.56 | 680 | 30.49 |
| 1700–1749 | 11,332 | 7.53 | 5,578 | 6.11 | 120 | 5.38 |
| 1750–1799 | 13,507 | 8.97 | 8,240 | 9.02 | 163 | 7.31 |
| 18th c. | 24,839 | 16.50 | 13,818 | 15.12 | 283 | 12.69 |
| 1800–1849 | 32,101 | 21.32 | 21,179 | 23.18 | 444 | 19.91 |
| 1850–1899 | 42,708 | 28.36 | 39,414 | 43.14 | 823 | 36.91 |
| 19th c. | 74,809 | 49.69 | 60,593 | 66.32 | 1,267 | 56.82 |
| Total | 150,566 | 100 | 91,366 | 100 | 2,230 | 100 |
increase of items in the seventeenth century, a much smaller increase in the eighteenth century, and a clear surge in the nineteenth century1 – see Table 1. Percentages also show the relatively greater impact of scientific vocabulary in the nineteenth century, while in the seventeenth century more philosophical vocabulary was recorded for the first time. Among such new items we find ear-mindedness (1888), East Coast fever (1881), educational psychology (1865), ego (1894), agnostic (1869), anthropocentric (1863), self-realization (1874) and subjectivism (1857). Further back in time items like the following had been recorded for the first time: comprehensive (1614), consciousness (1605), anamnesis (1656), lemmatical (1665), ontology (1663), acoustics (1684), adipose (1653), aesthetics (1770), snow-storm (1771) and typhoid fever (1789).

As regards specific disciplines, geology contributed Jurassic in 1831 and Triassic in 1841, the same year in which dinosaur appears to have been first recorded. According to the OED, palaeontology is first recorded in 1836, while bacteria are first discussed in 1847. As for silicon, it was first named in 1817, replacing silicium. Alessandro Volta’s very early experiments with electricity in the late eighteenth century led to volt (1873); watt (1882) was called thus in honour of James Watt, following a process that had also been at work in the case of galvanism (1797), from the name of Luigi Galvani, who first described these phenomena in 1792. Explorations led to the discovery of kangaroos (1770) and koala bears (1808), eucalypti (1788) and sequoias (1866).

Towards the end of Late Modern times, new discoveries and inventions gave centre stage to fields in which interest was growing, and which were a function of the new outlook on the world that was developing on both sides of the Atlantic: modernity was unfolding, and language was of course a mirror held up to culture.

Although Early Modern scientific English has long been a very valuable object of study (see for instance Valle, 2004; Gotti, 2006, and Taavitsainen, 2007), Late Modern times have only recently begun to be investigated (among the earliest studies, see for instance Moskowich, 2011 and Crespo, 2011). In addition, relatively little attention has been given so far to the dissemination of news of discoveries among the general public rather than just within a specialised circle of scholars.2

The Enlightenment was a crucial stage in European history, and the role played by English-speaking scholars cannot be overestimated. The ways in which they corresponded with other scholars, the ways in which their views were circulated and discussed, and their overall contribution to the history of ideas also left a remarkable trace in the ways in which argumentative discourse in English acquired new value. Science and language, though on the surface belonging to different domains, grew to be seen as closely related – findings could not be circulated unless appropriate linguistic tools were available; in turn, language could be ‘improved’, so as to reflect the rationality that the new methods of investigation advocated so forcefully.

1. The second half of the nineteenth century appears to have given the greatest contribution to English vocabulary since the year 1000. Dossena (2012a) has outlined how nineteenth-century English lexis was profoundly influenced by both scientific discoveries and by innovations introduced by new legislation (for example concerning the press and the post office). Not only were new terms introduced, but some changed their connotation or meaning, or instead became obsolete. As a result, both specialised and non-specialised users could rely on new expressive tools.
Within this framework, Scotland is well-known to have had a crucial part and it may not be accidental that one of the most important eighteenth-century ‘societies’, the Select Society, was established in Edinburgh for “the pursuit of philosophical enquiry and the improvement of the members in the art of speaking”.3 Indeed, many of its members were among the strongest advocates of the cause against Scotticisms.4

In 1755 the Select Society set up the Edinburgh Society for the Encouragement of Arts, Sciences, Manufactures and Agriculture, which later became an independent organisation with its own officers, meetings, and subjects for debate.5 Other societies also aimed to disseminate knowledge: the Philosophical Society of Edinburgh, for instance, was established in 1737 as a spin-off of the Society for the Improvement of Medical Knowledge, which discussed and published accounts of ‘discoveries and improvements’ in medicine and science. In 1783 this became part of the newly formed Royal Society of Edinburgh, which continues to this day.6

Both north and south of the border, scientific enquiry was accompanied by reflections of a more philosophical nature; in many cases these could be quite radical – Mary Wollstonecraft’s Vindication of the Rights of Woman (1792) was a landmark for the feminist movement of later decades, while William Godwin’s An Enquiry Concerning Political Justice (1793) bridged philosophical and political enquiry in the wake of the French Revolution. Though the events in France were an immense scare in Britain, they nonetheless made thinkers aware of the fact that a new approach to society was developing after the American Revolution: for better or worse, the world was changing. Indeed, in the nineteenth century societal and technological changes affected the cultural landscape in very significant ways: not only did the reading public expand considerably, but also new forms of participation in political and economic activities ensured that a new group of speakers and writers should develop their skills to meet new communicative requirements.

1.1 New resources for new research questions

The fairly recent attention to Late Modern times outlined at the beginning of this contribution has also enabled scholars to go beyond the secondary materials to which we have already referred, i.e. dictionaries, grammars and manuals, and collect new corpora by means of which to investigate usage in authentic documents.7

2. The first studies to consider this topic are Bertacca (2010), Chiavetta (2010) and Dossena (2010a and 2010b). Dossena’s earlier work (2002, 2003, and 2008) has underlined the strategies adopted by specialised users in the dissemination of information for non-specialised users in political, administrative, and legal discourse.
3. The Select Society was established on 22 May 1754 by the painter Allan Ramsay and held weekly meetings in the Advocates’ Library – see http://enlightenment.nls.uk/clubs-and-societies/source-2 (accessed Feb. 2015).
4. The struggle to improve language, however, was not without occasional misgivings, as in the case of James Boswell, who wished to approximate southern standard forms both in phonology and in lexical usage, but also wanted to preserve Scots vocabulary on account of its antiquarian value, to the point that he began a specimen of a Scots dictionary (see Dossena, 2005 and Rennie, 2011 and 2012). In fact, Scots vocabulary was stigmatised in all non-literary registers. Smollett, for instance, thus commented on Home’s Experiments on Bleaching (1756): “The language in some places is a little uncouth. – We meet with some Scottish words and measures, which an English reader will be at a loss to understand. Such as tramp for treading under foot, lint for flax, dreeper for a dripping-stand, bittling for a beetling, mutchkin for a pint, chopin for a quart, Scots pint for two quarts, Scots Gallon for sixteen quarts, etc.” (The Critical Review 1: 114, also quoted by Basker, 1993: 87).
5. See http://enlightenment.nls.uk/clubs-and-societies/source-3 (accessed Feb. 2015).
6. See http://enlightenment.nls.uk/clubs-and-societies/source-4 (accessed Feb. 2015).
7. An overview of recent corpora dealing with Late Modern English is provided by Dossena (2015: 4–6).
Some of these corpora are more general in scope, while others focus on specific genres; for instance, leaving aside non-British varieties, the Corpus of Modern Scottish Writing 1700–1945 (henceforth CMSW) comprises more than 350 documents and ca. 5.5 million words of text in the following genres: journalism, verse and drama, personal writing, and administrative, expository, instructional, imaginative and religious prose. On the other hand, more specific tools have been created on correspondence (see for instance Dossena, 2012b and 2012c) and – what is more relevant here – on specialised discourse, with the texts collected in the Coruña Corpus of English Scientific Writing.

This contribution aims to discuss the texts in the philosophy branch of the Coruña Corpus, the Corpus of English Philosophy Texts (henceforth CEPhiT), in order to investigate the main linguistic strategies employed by Late Modern authors to present their sources and express their attitude to them. These findings will be discussed together with those provided by CMSW, so as to enable potentially fruitful comparisons. The choice to focus on the discursive relationship between the texts and the sources to which they refer, whether to adopt their views or to criticise them, but always to build on them, derives from the consideration that this relationship is deemed to play a crucial part in the success of the argument. Authors select the sources they wish to quote on the basis of their validity for their own purposes, though this may imply a challenge. In addition, reference to specific sources contributes to their popularization, as contents are summarised and therefore selective processes of information transmission are seen to be at work. Indeed, some texts in CEPhiT are specifically designed to be instructional, as they are lectures or parts of courses.
As a result, the corpus includes diverse modes of discourse used to present scientific contents to a public whose level of competence and specialization can vary considerably. Ranging from quintessentially scientific publications aimed at a public of experts to publications targeted at a wider public of learners, the corpus enables a broad range of investigations. My methodological approach comprises both quantitative and qualitative analyses, following the state-of-the-art theoretical framework of corpus-assisted discourse studies.

2. Opinion, point of view and authorial presence in discussions of science and philosophy

Late Modern times were no strangers to lively controversy – already for many decades pamphlets on religious, political and social issues had been both very widespread and vociferous (Brownlees, 2006); at the same time, there was an ongoing and very fruitful scientific debate (especially, though not exclusively, within
the Royal Society – see above) on new discoveries and recent experiments. Also ‘antiquarians’ with an interest in language (the ancestors of modern philologists), grammarians and lexicographers responded to each other’s work, in a kind of virtual dialogue that summarised the opponent’s views in order to challenge them, highlighting where such views could be shown to be faulty, and thus emphasizing the encoder’s (counter-)argument (see Dossena, 2003 and 2006). Philosophy was of course not excluded from these scholarly debates. The English Short Title Catalogue lists more than 3000 texts published between 1700 and 1899 with the element ‘philosoph*’ (i.e., ‘philosophy’ or ‘philosophical’) in the title: 271 in Scotland and 3025 in England (92% of the total, 230 of which in the Philosophical Transactions of the Royal Society). As for ‘scien*’ (i.e., ‘science’, ‘sciences’ or ‘scientific’), there are 3512 entries for England (94% of the total) and 230 for Scotland – while the total number increases, percentages are similar, though there is a slight increase in the number of texts of scientific interest published in England. While the ideal recipients of pamphlets were all the readers with an interest in the topic, letters and essays exchanged among scholars had a more limited circulation (whether they were investigating what nowadays we would call ‘hard sciences’, but which at the time was labelled ‘natural philosophy’, or language and its historical roots, or moral and ethical issues). Treatises and lectures targeted readers who could be peers or learners, but who could nonetheless be assumed to be informed of the basic issues under discussion. In any case, whether for pedagogical or argumentative reasons, other authors’ views were presented and supported, or actually challenged with counterarguments, in more or less clearly personalised ways. 
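The regional shares quoted above for the English Short Title Catalogue can be recomputed from the raw title counts. The short Python sketch below does so; the helper name `share` is hypothetical, and the counts are simply those given in the text for England and Scotland.

```python
def share(part, *others):
    """Return `part` as a percentage of the sum of `part` and `others`."""
    total = part + sum(others)
    return 100 * part / total


# Title counts as quoted in the text: (England, Scotland).
philosoph_england = share(3025, 271)
scien_england = share(3512, 230)

print(f"philosoph*: {philosoph_england:.1f}% of titles published in England")
print(f"scien*:     {scien_england:.1f}% of titles published in England")
```

Rounded to whole numbers, the two values correspond to the 92% and 94% reported in the text.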
According to the studies summarised by Lewis (2012: 906), Late Modern scientific discourse witnessed a decrease in authorial presence. This is shown, first and foremost, by the decrease of attitudinal and modal features, and the adoption of a more impersonal style in conflict management. In what follows, an analysis of the ways in which sources are identified, validated and contrasted in CEPhiT will enable us to verify the extent to which greater or lesser authorial presence is conveyed by means of other lexical and pragmatic choices. As for the texts available in CMSW that will also be discussed, they are included in the ‘instructional prose’ section, which comprises nearly 421,000 words, while CEPhiT includes ca. 400,000 – see the summary in Table 2 below.8
8. www.scottishcorpus.ac.uk/CSMW/search/results.php?genre[]=Instructional+prose& search=Search (accessed Feb. 2015).
Table 2. Instructional prose texts in CMSW

| Author | Title | Date | Word count |
|---|---|---|---|
| James Geddes | Essay on the Composition and Writing of the Antients | 1748 | 73,839 |
| David Fordyce | Theodorous: Dialogue Concerning the Art of Preaching | 1752 | 34,335 |
| James Hogg | The Shepherd’s Guide – A Practical Treatise on Diseases of Sheep | 1807 | 56,443 |
| Mary Somerville | The Connexion of the Physical Science | 1834 | 101,792 |
| Anonymous | The Philosophy of Courtship and Marriage | 1844 | 12,966 |
| James Lorimer | Hand-Book of the Law of Scotland | 1859 | 141,591 |
| Total | | | 420,966 |

2.1 Sources as a starting point
According to the information provided in the CEPhiT metadata, in eighteenth-century materials the following texts and authors are cited once: The Bible, Anaxagoras, Antonious, Aristotle, Arrianus, Baxter, Beattie, Bellini, Bernier, Bolingbroke, Bonnet, Borelli, Boyle, Brachmans, Brown, Bufon, Butler, Clark, Cudworth, Dacler, De Sally, Edwards, Euclid, Franklin, Garth, Hales, Harris, Herodotus, Hersan, Hume, King, Linnaeus, Ludwig, Maddox, Milton, Newcastle, Ovid, Philips, Phillips, Pitcairn, Pitt, Price, Quintilian, Raii, Reaumur, Rousseau, Saussure, Simonides, Simplicius, Smith, Socrates, Strabo, Themistocles, Trembley, Verulam and Young.
Other authors are cited twice (Addison, Berkeley, Clarke, Descartes, Gregory, Hartley, Hobbes, Hutcheson, Lord Bacon, Mandeville, Newton, Plato, Plutarch, Pope, Priestley and Reid), while Cicero and Malebranche are cited three times, and reference to Locke is made in as many as seven CEPhiT texts.

In nineteenth-century texts the number of authors to whom reference is made in the metadata is twice as high: 157 as opposed to 75 in eighteenth-century texts. Locke is still among the most cited philosophers, being mentioned in nine texts, but Reid gets even more references, as he is mentioned ten times. Berkeley gets eight mentions, Hume seven, while Newton, Plato and Stewart are mentioned six times; Aristotle and Smith are referred to five times, while a larger number of authors is mentioned four times (Bacon, Brown, Descartes, Hamilton, Hartley, Kant, Mill, Socrates and Spencer); finally, others are mentioned two or three times (Clarke, Cousin, Edgeworth, Leibniz, Martineau, Priestley and Wagner as regards the former; Butler, Cudworth, Hutcheson, Paley and Whewell as regards the latter).

As many as 29 authors are mentioned in both eighteenth- and nineteenth-century texts; among them, there are names who have made the history of science
and philosophy, such as Aristotle, Plato, Descartes, Hobbes, Hume, Locke, Newton and Socrates, but also Addison, Anaxagoras, Bacon, Berkeley, Bolingbroke, Brown, Butler, Carlyle, Cicero, Clarke, Cudworth, Harris, Hartley, Hutcheson, Mandeville, Milton, Pope, Price, Priestley, Reid and Smith.

Clearly, the authors mentioned in the metadata are a function of what the texts themselves are about, both in a quantitative and a qualitative perspective: while some texts, such as those by Astell, Dunton, Collins, Greene, Kirkpatrick, Balguy, Hutcheson and Hodgson, mention no other authors, other extracts refer to more than twenty other authors: this is the case, for instance, of Mackintosh and Slack, who discuss ethics and progress respectively, and take a relatively broad perspective on the topics at hand. In general, more references to other authors are seen to occur in the metadata concerning nineteenth-century texts: on average, each text cites twelve other authors, while eighteenth-century texts cite five each.

When individual extracts are examined, however, we find that actual references to specific authors are slightly different, and the picture is more complex; though more than two hundred names are cited, not all of them are mentioned in the metadata – among these we have, for instance, Galileo, Spallanzani, Torricelli, Boyle, Kepler, Averroes, Avicenna, Luther and Calvin. In many cases, such references are made in order to illustrate a point – see Examples (1)–(3) below:

(1) Cicero has ſome paſſages to the purpoſe of this argument. Says he, [quotlat]9 […] Also that illuſtrious Reformer LUTHER, ſays, in his Treatiſe againſt Freewill, [quotlat] […] And our learned [Dr]. SOUTH ſays, [quotation] (CEPhiT, Collins, 1717)

(2) When Torricelli, […] – proceeding on the observation previously made, by Galileo, with respect to the limited height to which water could be made to rise in a pump, […] made his equally memorable experiment with respect to the height of the column of mercury supported in an inverted tube, and found, on comparison of their specific gravities, the columns of mercury and water to be exactly equiponderant, it is evident that he was led to the experiment with the mercury by the supposition, that the rise of fluids in vacuo was occasioned by some counterpressure. (CEPhiT, Brown, 1820)

(3) Redi’s experiments and remarks turned the attention of philoſophers to the minuter tribes of animals. In the courſe of a few years, accordingly, ſeveral eminent men aroſe. Reaumur, Bonnet, Trembley, Ellis, Spalanzani [sic], and a multitude of other writers, opened new views with regard to the manners and œconomy of animated beings. (CEPhiT, Smellie, 1790)
9. The text of the quotation is not given in CEPhiT.
As for the texts in the instructional prose section of CMSW, a smaller number of sources appears to be cited: Hume, Locke and Descartes are never mentioned, while Newton is mentioned 17 times in Mary Somerville’s The Connexion of Physical Science (1834); Aristotle is mentioned once in David Fordyce’s Theodorous: Dialogue Concerning the Art of Preaching (1752) and eighteen times in James Geddes’s Essay on the Composition and Writing of the Antients, Particularly Plato (1748), in which – predictably – Plato is mentioned 388 times, possibly making him the most frequently cited author (he also occurs twice in Fordyce’s text and once in Somerville’s); see the examples below:

(4) This being the chief uſe of oratory, according to Plato’s doctrine, let us next ſee the method he propoſes for acquiring it (CMSW, Geddes, 1748)

(5) Ariſtotle juſtly obſerves, “that the diction, (viz. in proſe-writing) ought neither to be entirely ſtrict conſtant meaſure, nor altogether void of rythmus (CMSW, Geddes, 1748)

(6) It has been proved by Newton, that a particle of matter, placed without the surface of a hollow sphere, is attracted by it in the same manner as if the mass of the hollow sphere, or the whole matter it contains, were collected in its centre. (CMSW, Somerville, 1834)
2.2 Source evaluation in CEPhiT and CMSW
Both CEPhiT and CMSW texts often refer to sources adding their authors’ views on them – in (5), for instance, we are told that Aristotle’s observation is reputed ‘just’, while in (1) sources are defined as ‘illustrious’ and ‘learned’; also in the examples below approval is implied in the choice of qualifiers like ‘distinguished’ and ‘great’; finally, in (9) the paragraph concludes with a general praise not only of Calvin, but also of his followers (“the most devout and moral portion of the Christian world”):

(7) In the first moiety of the middle age, the darkness of Christendom was faintly broken by a few thinly-scattered lights. […]; and a series of distinguished Mahometans, among whom two are known to us by the names of Avicenna and Averroes, translated the Peripatetic writings into their own language, expounded their doctrines in no servile spirit to their followers, and enabled the European Christians to make those versions of them from Arabic into Latin, which in the eleventh and twelfth centuries gave birth to the scholastic philosophy. (CEPhiT, Mackintosh, 1830)
Chapter 5. On the shoulders of giants
(8) [Descartes’s] “cogito,” I think, is just a state of consciousness, and went for nothing more with Descartes himself. This great philosopher has been charged, as we have already hinted, with a logical fallacy in his famous argument, with assuming the very existence which is proved. (CEPhiT, Lyall, 1855)
(9) It happened by a singular accident, that the schoolmen of the twelfth century, who adopted [Augustin’s] theology, […] had recourse for the exposition and maintenance of their doctrines to the writings of Aristotle, the least pious of philosophical theists. […] The principles of his rigorous system […] were taught in the schools; […]; and seldom heard of by laymen till the systematic genius and fervid eloquence of Calvin rendered them a popular creed in the most devout and moral portion of the Christian world. (CEPhiT, Mackintosh, 1830)
Reference to sources therefore needs to be seen in the context of the discursive framework within which it occurs, so as to assess whether the author endorses, challenges or at least criticises them. Authorial presence may emerge not only in reporting verbs, but also in other lexical choices, such as conjunctions, adjectives and adverbs, the co-occurrence of which creates the kind of semantic prosody that guides the reader’s interpretation and elicits agreement. In what follows basic quantitative data are presented, concerning items which appear to be particularly interesting on account of their relative frequency or indeed infrequency.
Table 3 below presents the frequency of reporting verbs in the two corpora under investigation; as we can see, prove is very significant, while a relatively neutral term like say is of course the most frequent one. Interestingly, refute and deny appear to be more frequent than affirm, which may be indicative of the argumentative nature of the texts, as seen in (10) and (11) below:
(10) When we think we are refuting the argument on which a particular doctrine rests, we may discover that that argument is in fact only a part of the scaffolding that has been used in the erection of the structure, the true foundation of which is to be found in some entirely different principle. How little, for instance, would the doctrines of Proudhon or of Karl Marx be affected by the withdrawal of the Hegelian dialectic! If that were removed, the form of the doctrines would no doubt require to be modified; but the substance of them would in most cases be found to depend on considerations that are completely different, and that have indeed but little reference to philosophy at all. (CEPhiT, Mackenzie, 1890)
(11) They who consider Berkeley to have denied the reality of things, must, before they can make that charge, have first themselves denied the reality of the mind. (CEPhiT, Simon, 1862)
Marina Dossena
Table 3. Reporting verbs in CEPhiT and CMSW (instructional prose 1700–1899)
Norm. = normalised frequency (per 10,000 words)
                    CEPhiT          CMSW
Item                No.     Norm.   No.     Norm.
Affirm              89      2.23    14      0.33
Answer (v.+n.)      87      2.18    27      0.64
Argue               48      1.20    16      0.38
Ask                 50      1.25    23      0.55
Assume              83      2.08    56      1.33
Claim (v.+n.)       37      0.93    131     3.11
Conclude            79      1.98    45      1.07
Define              45      1.13    18      0.43
Demonstrate         18      0.45    1       0.02
Deny                91      2.28    8       0.19
Prove               166     4.15    221     5.25
Refute              9       0.23    5       0.12
Reply               20      0.50    29      0.69
Say                 287     7.18    116     2.76
Show                83      2.08    107     2.54
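The normalised figures in Table 3 can be checked with a few lines of arithmetic. The sketch below is only illustrative: the corpus sizes (400,000 words for CEPhiT and 421,000 words for the CMSW instructional prose sample) are not stated in the text but are back-calculated from the table itself, and the function name `per_10k` is invented for the example.

```python
from decimal import Decimal, ROUND_HALF_UP

# ASSUMPTION: corpus sizes (in words) inferred from Table 3, not given in the text.
CORPUS_SIZE = {"CEPhiT": 400_000, "CMSW": 421_000}


def per_10k(raw_count: int, corpus_words: int) -> Decimal:
    """Return occurrences per 10,000 words, rounded half-up to two decimals."""
    rate = Decimal(raw_count) * 10_000 / Decimal(corpus_words)
    return rate.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Raw counts for three reporting verbs, copied from Table 3.
raw_counts = {
    "prove": {"CEPhiT": 166, "CMSW": 221},
    "say": {"CEPhiT": 287, "CMSW": 116},
    "deny": {"CEPhiT": 91, "CMSW": 8},
}

for verb, counts in raw_counts.items():
    rates = {corpus: per_10k(n, CORPUS_SIZE[corpus]) for corpus, n in counts.items()}
    print(verb, rates["CEPhiT"], rates["CMSW"])
```

Decimal arithmetic with half-up rounding is used here because binary floating point can round values such as 7.175 downwards, which would not reproduce the published figures.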
When scholars present their arguments, they may challenge views which are held to be erroneous, false or incorrect – very important qualifiers which, however, occur less frequently than absurd, inconsistent and unreasonable (see Table 4).
Table 4. Adjectives in CEPhiT and CMSW (instructional prose 1700–1899)
Norm. = normalised frequency (per 10,000 words)
                    CEPhiT          CMSW
Item*               No.     Norm.   No.     Norm.
Absurd              80      2.00    2       0.05
Actual              41      1.03    43      1.02
Apparent            71      1.78    80      1.90
Authoritative       3       0.08    0       0.00
Certain             448     11.20   224     5.32
Clear               81      2.03    46      1.09
Consistent          18      0.45    5       0.12
Contradictory       6       0.15    4       0.10
Correct             25      0.63    17      0.40
Definite            39      0.98    11      0.26
Deliberate          10      0.25    8       0.19
Enlightened         17      0.43    11      0.26
Erroneous           23      0.58    5       0.12
Evident             114     2.85    71      1.69
Experimental        11      0.28    3       0.07
False               25      0.63    21      0.50
Hypothetical        8       0.20    2       0.05
Inconceivable       20      0.50    3       0.07
Inconsiderable      7       0.18    3       0.07
Inconsistent        46      1.15    17      0.40
Incontestable       1       0.03    0       0.00
Incontrovertible    0       0.00    3       0.07
Incorrect           1       0.03    0       0.00
Inductive           29      0.73    5       0.12
Informed            7       0.18    8       0.19
Intelligible        23      0.58    12      0.29
Obscure             16      0.40    6       0.14
Plain               67      1.68    34      0.81
Preposterous        1       0.03    1       0.02
Proper              137     3.43    162     3.85
Reasonable          13      0.33    35      0.83
Remarkable          30      0.75    48      1.14
Speculative         22      0.55    0       0.00
True                374     9.35    167     3.97
Unconditional       0       0.00    4       0.10
Unconditioned       27      0.68    0       0.00
Undeniable          6       0.15    2       0.05
Unintelligible      8       0.20    7       0.17
Unquestionable      3       0.08    2       0.05
Unreasonable        26      0.65    7       0.17
Unrivalled          3       0.08    1       0.02
Unthinkable         3       0.08    0       0.00
Valuable            27      0.68    23      0.55
Wrong               61      1.53    31      0.74
* Comparative and superlative forms are counted together with zero (0) forms.
For the expression of disapproval, appeals to reason and logic appear to be made more frequently than simply presenting a proposition as unacceptable; similarly, approval is often boosted by means of adverbs which stress the certainty, truth and validity of the predication (see Table 5):
Table 5. Adverbs in CEPhiT and CMSW (instructional prose 1700–1899)
Norm. = normalised frequency (per 10,000 words)
                    CEPhiT          CMSW
Item                No.     Norm.   No.     Norm.
Absolutely          18      0.45    12      0.29
Actually            29      0.73    46      1.09
Admirably           4       0.10    5       0.12
Apparently          31      0.78    13      0.31
Certainly           83      2.08    65      1.54
Clearly             64      1.60    32      0.76
Constantly          18      0.45    0       0.00
Demonstrably        1       0.03    0       0.00
Demonstratively     1       0.03    0       0.00
Entirely            87      2.18    80      1.90
Evidently           43      1.08    8       0.19
Exactly             53      1.33    48      1.14
Hardly              48      1.20    32      0.76
Indeed              276     6.90    116     2.76
Invariably          28      0.70    16      0.38
Justly              56      1.40    11      0.26
Perhaps             212     5.30    70      1.66
Plainly             34      0.85    10      0.24
Possibly            13      0.33    23      0.55
Precisely           29      0.73    12      0.29
Properly            70      1.75    34      0.81
Purely              43      1.08    0       0.00
Quite               81      2.03    79      1.88
Reasonably          17      0.43    9       0.21
Seemingly           4       0.10    1       0.02
Simply              70      1.75    31      0.74
Speculatively       3       0.08    0       0.00
Strictly            29      0.73    18      0.43
Surely              58      1.45    11      0.26
Totally             29      0.73    29      0.69
Truly               71      1.78    29      0.69
Unavoidably         9       0.23    1       0.02
Undoubtedly         22      0.55    4       0.10
Unquestionably      13      0.33    2       0.05
Wholly              68      1.70    34      0.81
Zealously           1       0.03    0       0.00
In the examples that follow, the co-occurrence of adjectives and adverbs in the evaluation of propositions is illustrated in cases in which the connotation is positive (12)–(16) and others in which it is negative (17)–(19), while (20) contrasts styles:
(12) Cousin has shewn triumphantly that [Descartes] did not mean an argument at all, and that he was sensible that the truth “I exist,” was one independent of all argument. [quotfrench] are his own words, as given by Cousin, [quotfrench] (CEPhiT, Lyall, 1855)
(13) memory, as Hume truly says, is the real source of personal identity. (CEPhiT, Seth, 1885)
(14) Plutarch likewiſe affirms this to be the Notion of Pythagoras, and Aldobrand and Menage very truly Remark that all Philoſophers, as well as Pythagoras, maintain’d that the World was animated, and had an Intelligence annex’d to it, excepting Democritus and Epicurus; (CEPhiT, Greene, 1727)
(15) that from the exerciſe of reaſon, knowledge and virtue naturally flow, is equally undeniable, if mankind be viewed colleƈtively. (CEPhiT, Wollstonecraft, 1792)
(16) In a truly splendid paper read before the Royal Society on the 21st of November, 1833, [Herschel] gives the places of 2500 nebulæ and clusters of stars. (CMSW, Somerville, 1834)
(17) It was reserved for [Dr]. Reid to show, that these principles are not only unsupported by any proof, but contrary to incontestable facts; nay, that they are utterly inconceivable from the manifest inconsistencies and absurdities which they involve. (CEPhiT, Stewart, 1810)
(18) what can be more irrational and inconſiſtent, than to be able to refuſe our aſſent to what is evidently true to us, and to aſſent to what we ſee to be evidently falſe, and thereby inwardly give the lye to the underſtanding? (CEPhiT, Collins, 1717)
(19) All the Attempts of others before [Mr]. Newton, to explain the regular and conſtant Appearances of Nature, were moſt of ’em Ungeometrical, and all of ’em ſo inconſiſtent or unintelligible, that it was as hard to allow their Poſtulata as to conceive the thing which they pretended to account for from them. (CEPhiT, Cheyne, 1705)
(20) Whereas the Strain of former Sermons was either flat or low, […]; or elſe the Style ſwelled into a ridiculous kind of Bombaſt, and ſometimes an unintelligible Jargon; the Compoſitions of this new Race of Preachers, were more according to the genuine Simplicity and Beauty of Nature. (CMSW, Fordyce, 1752)
Views can also be corrected using in truth or properly; in such cases, authors present what is alleged to be a better assessment or description of the phenomena at hand – see (21)–(24):
(21) This connexion, neceſſary at firſt, continued long after convenient; and properly conduƈted might indeed, in all ſituations, be an uſeful inſtrument of Government. (CEPhiT, Burke, 1770)
(22) In no caſe can light be properly ſaid to be obſcure, for that is a contradiƈtion; what is ſo called is properly a weak light. (CEPhiT, Kirwan, 1811)
(23) The reception of the motion by the other bodies and atoms in each case, constituted the re-action, which, in truth, is nothing more than the reception of the motion parted with. (CEPhiT, Phillips, 1824)
(24) Reaſon is properly no Principle or Spring of Action at all: (CMSW, Fordyce, 1752)
Another important element in Table 5 above is the high frequency of perhaps, the hedging function of which is to make the proposition epistemically tentative, thus making it less face-threatening for the recipient. Similarly, expressions of modesty are an important pragmatic move – see (25)–(28) below:
(25) My Deſign in the following Diſcourſe is not to diſpute againſt any Scheme of thoſe who admit the Exiſtence of a Deity, I intend only to ſhew, that this preſent ſtate of things cou’d not have been from all Eternity, […]; and to make it plain that naturally and of itſelf it tends to Diſſolution; (CEPhiT, Cheyne, 1705)
(26) The force of [Hume’s] argument on this subject, as well as of that alleged by Berkeley to disprove the existence of matter, (both of which I consider as demonstratively deduced from Locke’s Theory,) I propose to examine afterwards in a separate Essay. At present, I only wish to infer from what has been stated, that, […]. (CEPhiT, Stewart, 1810)
(27) all further ſemblance of inquiry is perhaps but a jargon of words, if I miſtake not, without meaning, I am ſure without uſe. (CMSW, Geddes, 1748)
(28) perhaps there never was a pair who did not differ essentially in many important points. (CMSW, Anon., 1844)
2.3 Involving readers
Although not all texts are in the form of a dialogue, a virtual dialogue with the reader is nonetheless established by means of textual choices like rhetorical questions and hypothetical answers – the reader is indirectly invited to answer the former and thus discover the coincidence of views with the author; as for the latter, the author offers them to pre-empt the reader’s potential challenges, or to introduce another author’s views on the matter. Especially in the first two cases, the establishment of common ground is sought, as Examples (29)–(30) may illustrate:
(29) what, I ask, are we to think of those persons who, like Berkeley’s adversaries, imagine that an intelligent and sensible man may very well deny the reality of the table he is writing at – may very well think that it does not, really and in point of fact, exist in the way in which he and we perceive it to be existing; and all this without being a whit the less an intelligent and sensible man? Is there not something most preposterous in this? (CEPhiT, Simon, 1862)
(30) Behold, I ſhould anſwer, the natural effeƈt of ignorance! (CEPhiT, Wollstonecraft, 1792)
(31) Mr Bryden then proceeds to answer, at some length, all the common opinions, or rather the old opinions, concerning the origin of this disease; (CMSW, Hogg, 1807)
A similar process appears to be at work when humorous anecdotes are introduced, inviting readers to share a benevolent laugh at the expense of bygone days of much less ‘positive science’ and greater credulity – see (32) below:
(32) The action of occult powers was sometimes of an extraordinary nature: thus Pierius recommends that a patient stung by a scorpion should sit upon an ass, with his face to its tail, so that the poison may pass from the man to the beast! and the stories of Sir Kenelm Digby’s sympathetic powder show how completely credulity had usurped the place of accurate observation. These, and similar facts that might be cited by the score, exhibit the weakest part of the metaphysical stage, and indicate the necessity of its downfall through the growth of positive science; but a very erroneous view would be formed of its value as a contribution to progress if other aspects were not surveyed. (CEPhiT, Slack, 1860)
In such cases, arguments are conducted on less personalised terms. Authors can and often do take responsibility for their views, as is evident when the first person singular pronoun is employed, or when common ground is sought, for instance with the choice of an inclusive we. However, modesty moves like those seen above and actual understatements make authorial presence less conspicuous; this is the case, for instance, when hardly is employed:
(33) It is evident that human society may be made the object of such a study, just as any other collection of facts may; and the utility of such a study, within its own limits, is hardly now in danger of being overlooked. (CEPhiT, Mackenzie, 1890)
(34) We could hardly expect to find the modern notion of an infinity of wants, conceived as an essential quality of human beings. (CEPhiT, Bonar, 1893)
(35) I can hardly think it compatible with the Conſtitution of human Nature, to purſue Ill as ſuch (CMSW, Fordyce, 1752)
Authorial presence is also found to be less obvious when ‘facts’ are left to speak for themselves, as statements are presented as evident, clear or inconsistent with other findings:
(36) it would clearly be erroneous to say of such a being that the immediate causes of the sensations which constitute his perception of the candle were permanent possibilities of sensation (since by hypothesis the possibilities are all converted into actualities); and it would clearly be absurd to say that these sensations were self caused; and it would be altogether impossible to say that they were not caused at all. (CEPhiT, Balfour, 1879)
(37) It was already observed, how inconsistent this account of the origin of our ideas, as given by Locke, Berkeley, and Hume, is, with some conclusions to which we were led, in a former part of this discussion; (CEPhiT, Stewart, 1810)
(38) the viſual indications of the magnitudes of the ſun and moon are not falſe, though inexaƈt; but a judgment that their real and apparent magnitudes would be the ſame at a nearer diſtance would be inconſiſtent with the reſults of experienced aſſociations, and therefore falſe. (CEPhiT, Kirwan, 1811)
(39) there is a sort of false simplification in the introduction of hypotheses, which itself aids the illusion of the mystery. (CEPhiT, Brown, 1820)
(40) the laws of terrestrial magnetism deduced from the formulæ of M. Biot, are inconsistent with those which belong to a permanent magnet, but that they are perfectly concordant with those belonging to a body in a state of transient magnetic induction (CMSW, Somerville, 1834)
3. Concluding remarks
This overview of some linguistic strategies employed in the presentation, discussion and validation or challenge of sources in CEPhiT and CMSW texts has highlighted a few interesting traits. First of all, we have seen that sources are seldom presented without further qualification, which sets the tone for the interpretation of the proposition. Other authors are praised while their findings are underlined, or criticised, albeit indirectly, when their views are the starting point for a counterargument. To some extent, this is predictable; however, what is more relevant is the way in which adjectives, adverbs, hedges and boosters interact with each other in order to offer a consistent picture which may elicit the readers’ consensus. In particular, boosting adverbs like certainly and invariably are much more frequent than hedges in both CEPhiT and CMSW, which highlights the importance of presenting propositions as valid and quite reliable – a powerful reader-involving and persuasive strategy. At the same time, the frequency of perhaps points to the importance of signalling modest hesitation when subjective views are offered.
The validation of views appears to be particularly emphatic when these can be shown to rely on facts and experiments. It is in such cases that what is unquestionable, undeniable, unreasonable or unintelligible is outlined with the least authorial presence: science and reason are given centre stage, and it may not be accidental that adjectives with a positive connotation are significantly more frequent than adjectives with a negative one, not least because the latter are dispreferred as being more face-threatening.
In the age of positivism, trust in rational knowledge was a very important principle both in scientific and in philosophical enquiries, as was faith in the developments that were unfolding and that were recorded in new vocabulary. Twenty-first century readers, like a few contemporary commentators, are aware that, in many respects, that was not an entirely golden age, but it is an age to which we are greatly indebted – perhaps much more than has been acknowledged so far.
References
Primary sources
CMSW, Corpus of Modern Scottish Writing 1700–1945, retrieved 12 Dec. 2012, from www.scottishcorpus.ac.uk/CSMW/.
English Short Title Catalogue, retrieved 11 Dec. 2012, from http://estc.bl.uk.
The Coruña Corpus of English Scientific Writing: Corpus of English Philosophy Texts (CEPhiT), retrieved 10 Dec. 2012, from www.udc.es/grupos/muste/corunacorpus/index.html.
The Oxford English Dictionary, retrieved 16 Dec. 2002, from www.oed.com.
Secondary sources
Atkinson, Dwight (1999). Scientific discourse in sociohistorical context: The Philosophical Transactions of the Royal Society of London, 1675–1975. London: Erlbaum.
Basker, James G. (1993). Scotticisms and the problem of cultural identity in eighteenth-century Britain. In J. A. Dwyer & R. B. Sher (Eds.), Sociability and society in eighteenth-century Scotland (81–95). Edinburgh: The Mercat Press.
Beal, Joan C., Susan M. Fitzmaurice & Jane Hodson (2012). Introduction. English Language and Linguistics. Special issue: Selected papers from the 4th International Conference on Late Modern English, 16/2, 201–207.
Bertacca, Antonio (2010). The language of Charles Darwin’s Red Notebook. In N. Brownlees, G. Del Lungo & J. Denton (Eds.), The language of public and private communication in a historical perspective (83–99). Newcastle: Cambridge Scholars.
Broadie, Alexander (1997). The Scottish Enlightenment: An Anthology. Edinburgh: Canongate.
Brownlees, Nicholas (Ed.) (2006). News discourse in Early Modern Britain. Bern: Peter Lang.
Chiavetta, Eleonora (2010). “A simple and popular description”: Popularization of natural science in the Natural History Rambles of J. G. Wood. In N. Brownlees, G. Del Lungo & J. Denton (Eds.), The language of public and private communication in a historical perspective (344–358). Newcastle: Cambridge Scholars.
Conrad, Susan & Douglas Biber (2001). Variation in English: Multi-Dimensional Studies. Essex: Pearson Education.
Crespo, Begoña (2011). Persuasion markers and ideology in eighteenth-century philosophy texts (CEPhiT). Revista de Lenguas para Fines Específicos, Special Issue: Diachronic English for Specific Purposes, 17, 199–228.
Dossena, Marina (2002). Sides of the same coin: Ellipsis, redundancy and political discourse in institutional websites. In G. Iamartino, M. Bignami & C. Pagetti (Eds.), The economy principle in English: Linguistic, literary, and cultural perspectives (122–136). Milano: Unicopli.
Dossena, Marina (2003). Modality and argumentative discourse in the Darien pamphlets. In M. Dossena & C. Jones (Eds.), Insights into Late Modern English (283–310). Bern: Peter Lang.
Dossena, Marina (2005). Scotticisms in grammar and vocabulary. Edinburgh: John Donald.
Dossena, Marina (2006). Forms of argumentation and verbal aggression in the Darien pamphlets. In N. Brownlees (Ed.), News discourse in Early Modern Britain (235–254). Bern: Peter Lang.
Dossena, Marina (2008). “The times they’re a-changing”: The Abolition of Feudal Tenure (Scotland) Act 2000 and linguistic strategies of popularization. In V. Bhatia, C. Candlin & P. Evangelisti Allori (Eds.), Language, culture and the law. The formulation of legal concepts across systems and cultures (187–206). Bern: Peter Lang.
Dossena, Marina (2010a). “Be pleased to report expressly”: The development of public style English in nineteenth-century business and official correspondence. In R. Hickey (Ed.), Eighteenth-century English. Ideology and change (293–308). Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511781643.016
Dossena, Marina (2010b). “We beg to suggest”: Features of legal English in Late Modern business letters. In N. Brownlees, G. Del Lungo & J. Denton (Eds.), The language of public and private communication in a historical perspective (46–64). Newcastle: Cambridge Scholars.
Dossena, Marina (2012a). Late Modern English – Semantics and lexicon. In L. Brinton & A. Bergs (Eds.), HSK 34.1 – Historical linguistics of English (887–900). Berlin: De Gruyter.
Dossena, Marina (2012b). The study of correspondence: Theoretical and methodological issues. In M. Dossena & G. Del Lungo Camiciotti (Eds.), Letter writing in Late Modern Europe (13–30). Amsterdam: John Benjamins. doi: 10.1075/pbns.218.02dos
Dossena, Marina (2012c). “I write you these few lines”: Metacommunication and pragmatics in nineteenth-century Scottish emigrants’ letters. In U. Busse & A. Hübler (Eds.), Investigations into the meta-communicative lexicon of English. A contribution to historical pragmatics (45–63). Amsterdam: John Benjamins. doi: 10.1075/pbns.220.06dos
Dossena, Marina (2015). Introduction. In M. Dossena (Ed.), Transatlantic Perspectives on Late Modern English (1–12). Amsterdam: John Benjamins. doi: 10.1075/ahs.4.002int
Dossena, Marina (Ed.) (2015). Transatlantic Perspectives on Late Modern English. Amsterdam: John Benjamins. doi: 10.1075/ahs.4
Dossena, Marina & Charles Jones (Eds.) (2003). Insights into Late Modern English. Bern: Peter Lang.
Dossena, Marina & Susan M. Fitzmaurice (Eds.) (2006). Business and official correspondence: Historical investigations. Bern: Peter Lang.
Dossena, Marina & Gabriella Del Lungo Camiciotti (Eds.) (2012). Letter writing in Late Modern Europe. Amsterdam: John Benjamins. doi: 10.1075/pbns.218
Gotti, Maurizio (2006). Communal correspondence in Early Modern English: The Philosophical Transactions network. In M. Dossena & S. M. Fitzmaurice (Eds.), Business and official correspondence: Historical investigations (17–46). Bern: Peter Lang.
Lewis, Diana M. (2012). Late Modern English – Pragmatics and discourse. In L. Brinton & A. Bergs (Eds.), HSK 34.1 – Historical linguistics of English (901–915). Berlin: De Gruyter.
Moskowich, Isabel (2011). “The golden rule of divine philosophy” exemplified in the Coruña Corpus of English Scientific Writing. Revista de Lenguas para Fines Específicos, Special Issue: Diachronic English for Specific Purposes, 17, 167–198.
Plackett, Robin L. (1986). The Old Statistical Account. Journal of the Royal Statistical Society, Series A (General) 149/3, 247–251. doi: 10.2307/2981555
Rennie, Susan (2011). Boswell’s Scottish dictionary rediscovered. Dictionaries: Journal of the Dictionary Society of North America, 32, 94–110.
Rennie, Susan (2012). Boswell’s dictionary update. Dictionaries: Journal of the Dictionary Society of North America, 33, 205–207. doi: 10.1353/dic.2012.0010
Swales, John (1990). Genre analysis: English in academic and research settings. Cambridge: Cambridge University Press.
Taavitsainen, Irma (2007). “Joyful news out of the Newfoundland world”: Medical and scientific news reports in Early Modern England. In A. H. Jucker (Ed.), Early Modern English news discourse: Newspapers, pamphlets and scientific news discourse (189–204). Amsterdam: John Benjamins. doi: 10.1075/pbns.187.14taa
Thompson, Sandra A. (1982). The passive in English: A discourse perspective. Unpublished Ms.
Valle, Ellen (2004). “The pleasure of receiving your favour”: The colonial exchange in eighteenth-century natural history. Journal of Historical Pragmatics, 5/2, 315–338. doi: 10.1075/jhp.5.2.09val
Chapter 6
Abstractness as diachronic variation in CEPhiT
Biber’s Dimension 5 applied*
Leida Maria Monaco
University of A Coruña
The characteristic impersonal style of English scientific writing (Bazerman, 1984; Hyland, 1995) appears to be conveyed, chiefly, by the abundance of lexical and syntactic abstractness markers found in contemporary academic prose (Biber, 1988, 1995). Diachronic studies (Atkinson, 1999; Biber, 2001; Biber & Finegan, 1997, 2001a) have shown an evolution of the scientific register from “moderately” to “highly abstract” along the late Modern English period. Less specialised prose such as essays, or sermons, on the other hand, has been proved to evolve in the contrary direction (Biber & Finegan, 1989; 2001a). This paper analyses the presence of abstractness markers in the Corpus of English Philosophy Texts (CEPhiT). The corpus extends from 1700 to 1898 and covers a wide range of genres and topics with varying degrees of specialisation. The intention is to apply Biber’s (1988) Multi-Dimensional Analysis, measuring relative degrees of ‘abstractness’ or ‘impersonality’ as manifested through a set of linguistic features found in the texts, in an attempt to detect diachronic variation and variation across genres. Results are compared to those found in a contemporary reference corpus (Biber, 1988) in order to consider the extent to which they vary with regard to contemporary academic prose as well as to the whole range of variability found among present-day English registers.
* This research was funded by the Spanish Ministerio de Ciencia e Innovación (FPU grant, reference number AP2009-3206) and the Universidade da Coruña, through its special programme for Senior Research Groups. These grants are hereby gratefully acknowledged. doi 10.1075/z.198.06mon © 2016 John Benjamins Publishing Company
1. Introduction
Scientific writing in the late Modern English period saw a gradual shift from the authoritative scholastic dictum to the reasoned argumentation of the deductive method. As early as the seventeenth century, Bacon, Sprat and Wilkins had proposed a set of rules on how science should be written; rules which, according to Bazerman (1984), have been a major influence on modern scientific discourse as expressed through a number of commonplaces, such as:
1. the scientist must remove himself from reports of his own work and thus avoid all use of the first person;
2. scientific writing should be objective and precise, with mathematics as its model;
3. scientific writing should shun metaphor and other flights of rhetorical fancy to seek an univocal relationship between word and object; and
4. the scientific article or report should support its claims with empirical evidence from nature, preferably experimental. (Bazerman, 1984: 163–164)
Likewise, Robert Boyle, one of the founders and key early members of the Royal Society of London, set out a number of principles for academic prose, insisting on a “plain and unadorn’d way of Writing” with “expressions [which] should be rather clear and significant, than curiously adorn’d” (Hunter & Davies (1999), in Moessner, 2009: 66). Gotti (2001, 2003, 2005) explores Boyle’s language in the light of the genre ‘experimental essay’, highlighting five main characteristics: brevity, lack of assertiveness, perspicuity, simplicity of form, and objectivity. Notwithstanding, although these same precepts appear to be adhered to in present-day academic discourse and may be seen as “actual descriptions of what scientific writing has been and currently is” (Bazerman, 1984: 164), several sociohistorical studies – Atkinson (1996, 1999), Biber & Finegan (1989, 1997, 2001a), Biber (2001), Taavitsainen & Pahta (2004), Crespo (2011), Gray & Biber (2012), Biber & Gray (2013), and many more – have demonstrated that scientific prose has been constantly (re)forming itself according to, and in response to, continuously changing socioeconomic and rhetorical conventions. In fact, a close examination of any two samples from different stages in the development of scientific writing – even those belonging to the same scientific discipline and sharing subject matter – can be sufficient to spot significant differences not only in their lexicon, but also in their structural organisation and form of argumentation, suggesting epistemological changes that must have gradually taken place along the period separating the two samples.1 The increasing compilation of diachronic 1. See, for instance, Banks (2005); Lareo & Montoya (2007); Seoane (this volume).
Chapter 6. Abstractness as diachronic variation in CEPhiT 101
corpora – not only large multi-register corpora, such as the Helsinki Corpus of English Texts (1991; Kytö & Rissanen, 1992) or ARCHER (Biber et al., 1994), but also specialised ones, such as the Corpus of Early English Medical Writing (CEEM; Taavitsainen et al., 2010), or the Corpus on English Texts on Astronomy (CETA; Moskowich & Crespo, 2012) – has made it possible to give a more detailed and comprehensive account of the changes occurring in scientific discourse over time, encouraging diachronic studies on a larger scale in order to track general patterns of variation. Specifically, it has been demonstrated that selected written English registers have followed two different variation patterns along the past four centuries, with some popular genres such as letters, essays or sermons, addressed to a more widespread audience, evolving towards an even more ‘popular’, or ‘oral’ style, and, conversely, with certain specialised types of writing such as expository prose, intended for a more restricted, literate discourse community, becoming even more specialised, or ‘literate’, along the late Modern English period (Biber & Finegan, 1989, 1997, 2001a). This progressive specialisation of expository prose (i.e. 
medical, scientific, or legal) was characterised – among other features – by an increasingly frequent use of passive constructions, as also demonstrated in Atkinson (1996, 1999) and Biber (2001), thus causing scientific discourse to become more “abstract”, or “impersonal”, gradually shifting from author-centred to object- centred.2 At the same time, however, selected non-specialised kinds of prose, including those dealing with matters of philosophical and even academic nature (such as sermons or essays), but presumably intended for a less professionally- restricted audience, appear to have gradually developed a considerably less abstract and more involved, or personal, style.3 Given these two patterns of diachronic variation, and taking into account that the texts analysed in the present study, classified as late Modern English philosophy,4 cover topics standing as far apart as the propagation of light and the immortality of the human soul, and belong to a variety of generic formats representing different levels of knowledge (see Moskowich, this volume; Crespo, this volume), it seems somewhat challenging to predict whether such an all-encompassing scientific register as the one represented in CEPhiT would have evolved in one or 2. This increasing specialisation of expository prose seems to explain the gradual consolidation of passive constructions as a recognised high-frequency feature of scientific discourse (Visser, 1973; Riley, 1991; Seoane & Williams, 2006). 3. For authorial presence in CEPhiT, see Seoane (this volume), Crespo & Moskowich (forthcoming). 4. Following the conventions of the UNESCO (1988) classification of sciences; see Moskowich (2011, this volume).
102 Leida Maria Monaco
another direction over the centuries. Thus, the aim of this paper is to analyse abstractness markers (such as those described in Biber’s (1988, 1995) Multidimensional approach) in CEPhiT, in an attempt to detect variation along two centuries (1700–1898) and across some of the genres represented in the corpus. Section 2 briefly describes the theoretical framework of the present study and lists the selection criteria for the linguistic features used in the analysis. The corpus and methodology are described in Section 3, while the findings are discussed in Section 4, followed by some concluding remarks (Section 5).
2. Multidimensional Analysis
The central assumption of Multidimensional Analysis is that some linguistic features (lexical, syntactic or semantic) tend to co-occur with a certain frequency in the texts, and that each set of these co-occurring linguistic features has a common underlying discursive function. Developed originally by Biber (1988), the method entails the use of a multivariate statistical technique, known as factor analysis,5 applied to a corpus (which, ideally, should be grammatically and semantically parsed). Factor analysis transforms “a large number of original variables, in this case the frequencies of linguistic features” into “a small set of derived variables, the ‘factors’” (Biber, 1988: 79), as a result of the co-occurrence patterns of those linguistic features in the corpus. Very roughly, some linguistic features load very strongly (and therefore cluster) on a factor and less strongly on others, and only those presenting the highest loadings are counted for each factor as they show the most frequent co-occurrence. This co-occurrence is interpreted as reflecting an underlying discursive function shared by the cluster of features. 
Typically, groups of features cluster on two poles of a factor, meaning that they occur in texts in a complementary pattern: where one cluster of features appears, the other does not, and vice versa.6 The two poles of a factor represent opposite discursive functions (e.g. involved vs. non-involved; narrative vs. non-narrative), and, thus, each factor is interpreted with regard to this function as a ‘dimension’, or continuous scale, of variation.7
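The notion of co-occurrence underlying this method can be illustrated with a minimal sketch. It does not perform factor analysis; it only computes the pairwise Pearson correlation between feature frequencies, which is the kind of information a factor analysis takes as its starting point. The frequencies, the feature names and the five hypothetical texts are all invented for illustration and are not taken from CEPhiT or from Biber (1988).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of frequencies."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# INVENTED frequencies per 1,000 words for five hypothetical texts
freqs = {
    "agentless_passives": [8.0, 12.0, 15.0, 18.0, 20.0],
    "conjuncts": [1.0, 2.5, 3.0, 4.5, 5.0],
    "first_person_pronouns": [30.0, 22.0, 15.0, 9.0, 5.0],
}

# Features that rise and fall together correlate positively (same pole);
# features in complementary distribution correlate negatively (opposite poles).
same_pole = pearson(freqs["agentless_passives"], freqs["conjuncts"])
opposite_pole = pearson(freqs["agentless_passives"], freqs["first_person_pronouns"])
print(round(same_pole, 2), round(opposite_pole, 2))
```

In this toy data set the first pair correlates strongly and positively, while the second pair correlates strongly and negatively, which mirrors the "where one cluster of features appears, the other does not" pattern described above.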
5. See Gorsuch (1983); also, Tabachnick & Fidell (2007), cited as a more recent source in Gray (2011). 6. “Positive” features vs. “negative” features. Polarities are conventional and may be reversed (see Biber & Finegan, 1989: 490, for instance). This is not particularly relevant for the present study (see Note 8) and therefore will not be dealt with here. 7. See Biber (1988: 79–97) for a detailed description of factor analysis.
Chapter 6. Abstractness as diachronic variation in CEPhiT 103
In his pioneering study, carried out on a large multi-register corpus of English, Biber (1988: 122) identified six such dimensions, giving them the following functional labels: (1) Involved vs. informational production, (2) Narrative vs. non-narrative concerns, (3) Elaborated vs. situation-dependent reference, (4) Overt expression of persuasion, (5) Abstract vs. non-abstract information, and (6) On-line informational elaboration. Individual texts, registers, or text types can be compared along these dimensions of variation. This means that some texts, or registers, are more informational than others, more narrative than others, and may present a more abstract style than others, depending on the linguistic features that co-occur most (or least) frequently in each of them. This six-dimensional model has been subsequently used as a reference for a large number of register variation studies (Biber & Finegan, 1989; 1997; 2001a; 2001b; Atkinson, 1996; 1999; Conrad, 1996; 2001; Csomay, 2000; Carkin, 2001), although in other cases new multidimensional analyses based on Biber’s method were carried out from scratch, resulting in different dimensions of variation, often characteristic of a particular register (Biber, 2001; Biber et al., 2004; Xiao, 2009; Gray, 2011). Ideally, any two texts or registers should be compared with regard to several dimensions of variation, in that no single dimension is sufficient to account for the whole range of variability between them (Biber, 1988: 24). However, in some cases one or two particular dimensions may prove especially relevant when describing certain registers. In the present study we shall focus on Dimension 5, “Abstract vs. non-abstract information”. 
The reason for our choice is that Biber’s 1988 and subsequent studies have revealed – once more – that present-day English academic prose is characterised by a high co-occurrence of abstractness markers, which makes Dimension 5 particularly relevant in a diachronic analysis of English scientific register. This dimension of variation is composed of six linguistic features, considered to convey impersonal or abstract style.8 They are listed below, each of them accompanied by an example extracted from CEPhiT:
– agentless passives: (1) It has been prov’d likewiſe in the preceding Chapter… (Cheyne, 1705: 51)
– by-passives: (2) Conſiderable progreſs had been made by the ancient Greeks and Romans… (Campbell, 1776: 19)
8. All six linguistic features listed here convey abstract style. There is no complementary group of linguistic features (at the opposite pole) conveying non-abstract style in this dimension.
– past participial WHIZ-deletions (or past participial post-nominal modifiers): (3) It is justly characterised as a method framed in conformity to experience… (Powell, 1838: 11)
– adverbial past participial clauses: (4) Practically viewed, all theology which teaches a benefit derivable from the “sacrament,” other than in the sense of “life,” is clearly unsound. (Woodward, 1874: 240)
– conjuncts: (5) …and conſequently, if the Motion of theſe Atoms ariſes from ‘emſelves, they muſt all follow the ſame Direƈtions… (Cheyne, 1705: 8)
– certain adverbial subordinators: (6) it is a direƈt Rule (…) whereby all Men ought to ſquare their Lives and Aƈtions (Kirkpatrick, 1730: 12)
Apparently, although by- and agentless passives differ in their thematic functions (Thompson, 1982; Weiner & Labov, 1983; Smith & Seoane, 2013), they nevertheless appear to share a more basic function, as is shown by their strong co-occurrence in the factor analysis; this is similar for the two types of past participial subordinate clauses (Biber, 1988: 112), which in turn act as a means of syntactic compression and have also been found to abound in academic writing (Chafe & Danielewicz, 1987; Greenbaum, 1988; Granger, 1997). Overall, Dimension 5 is largely based on the co-occurrence of passive constructions, “all used to present propositions with reduced emphasis of the agent… [but] to give prominence to the patient of the verb, the entity acted upon, which is typically a non-animate referent and is often an abstract concept rather than a concrete referent” (Biber, 1988: 112), suggesting a strong link between the passive structure and an underlying function of abstractness to the point that abstract may be regarded here as a synonym of passivised (Atkinson, 1999: 125). Conjuncts and adverbial subordinators appear to have a complementary function, which is to mark complex logical relations among clauses within a densely informational discourse (see Ochs, 1979; Biber, 1988: 112, 239; Atkinson, 1999: 125). The description of the corpus material and the methodological steps used in this study are explained in the next section.
3. Corpus and methodology
The present study analyses the forty texts included in CEPhiT, extending from 1700 to 1900. The two variables used here are time and genre (see Moskowich (2011, this volume) and Crespo (this volume) for a complete description of the corpus). The variable time was used to compare the two centuries. For a more detailed diachronic analysis, each century was divided into three subperiods, as it has been demonstrated in previous research that thirty years are enough to attest linguistic change (Kytö et al., 2000), and dimension scores have then been obtained for each group of 6 to 7 texts, representing a ca. thirty-year span within the two centuries analysed. Furthermore, the variable time was also used along with the variable genre in order to trace the diachronic evolution of the two genres most abundantly represented in CEPhiT, as well as the only ones present in both centuries: treatise and essay. Even so, the genre lecture (present only in the nineteenth century, but represented by five samples) was also used for a more complete (non-diachronic) comparison across genres. The remaining genres (eighteenth-century textbook, and nineteenth-century article and letter) were only included in the overall diachronic analysis. Each of the six linguistic features listed in the previous section has been searched for in the corpus using the Coruña Corpus Tool concordancer (Moskowich & Parapar, 2008; Camiña & Lareo, this volume). Given that CEPhiT had not yet been tagged for grammatical categories at the time this study was carried out, all automatically obtained occurrences underwent manual disambiguation. Searches for passive constructions included forms ending in -ed/’d (for all regular past participles, such as placed, moved), -t (affixt, confest), -’n (stol’n) and all the irregular past participles listed in A Comprehensive Grammar of the English Language (Quirk et al., 1985: 115–120). 
Disregarded cases include simple past or perfect aspect, verb complements, causative constructions, participial adjectives or different lexical categories with the same endings, as well as get and become passives.9 For the sake of consistency in categorising, and following Biber (1988: 111–112), the four passive constructions were counted separately. In other words, by-passives were counted as such only when preceded by BE. Hence, past participle WHIZ-deletions and adverbial clauses were not counted as by-passives even when they were followed by an agent, as in:
(7) the high and pure physical philosophy inculcated by Bacon, and practically followed up by Galileo, Newton, and their successors, soon established the dominion of principles… (Powell, 1838: 12)
9. Although get and become passive constructions appear to fulfil a similar (if not the same) function as be passives, they were not included in Biber’s (1988) study. For this reason, we have opted not to count them for the present study as their inclusion would alter the results of the present analysis with regard to the reference corpus.
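The retrieval step for BE-passives described in this section can be sketched roughly as follows. This is not the Coruña Corpus Tool and not the procedure actually used in the study; it is a simplified illustration under stated assumptions. The function names are hypothetical, the list of irregular participles is a tiny illustrative subset rather than the full list in Quirk et al. (1985), and -t spellings are handled through an explicit set containing only the two forms mentioned above. Like the original searches, the sketch over-generates, so its output would still require manual disambiguation.

```python
import re

BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}
# ASSUMPTION: tiny illustrative subset of irregular past participles
IRREGULAR = {"made", "given", "taken", "seen", "known", "found"}
# -t spellings cited in the chapter; a real search would need many more
T_FORMS = {"affixt", "confest"}
# Regular -ed / -'d endings and the contracted -'n ending (stol'n)
SUFFIXES = ("ed", "'d", "’d", "'n", "’n")

# A word is a run of letters, optionally followed by apostrophe + letters
WORD = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)?")


def looks_like_participle(word):
    """Rough test for a candidate past participle form."""
    w = word.lower()
    return w in IRREGULAR or w in T_FORMS or w.endswith(SUFFIXES)


def find_be_passive_candidates(sentence):
    """Return (be_form, participle, label) for each BE + participle pair.

    A pair is labelled 'by-passive' only if 'by' follows immediately;
    otherwise it is labelled 'agentless'. Candidates need manual checking.
    """
    tokens = WORD.findall(sentence)
    hits = []
    for i in range(len(tokens) - 1):
        if tokens[i].lower() in BE_FORMS and looks_like_participle(tokens[i + 1]):
            followed_by_by = i + 2 < len(tokens) and tokens[i + 2].lower() == "by"
            label = "by-passive" if followed_by_by else "agentless"
            hits.append((tokens[i], tokens[i + 1], label))
    return hits


print(find_be_passive_candidates("It has been prov’d likewiſe in the preceding Chapter"))
print(find_be_passive_candidates(
    "Conſiderable progreſs had been made by the ancient Greeks and Romans"))
```

Because the sketch requires a preceding form of BE, participles such as inculcated and followed up in Example (7) are not retrieved as by-passives, which is consistent with the counting criterion stated above.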
After disambiguation, raw counts of the occurrences were normalised to frequencies per 1,000 words and eventually standardised to a mean score of 0.0 and a standard deviation of 1.0, permitting their comparison on a relative scale with regard to the mean. As our intention was to compare our results to those obtained in Biber’s 1988 study, relative standardised scores were calculated for each linguistic feature with regard to their mean scores in the contemporary reference corpus (see Section 4). These relative standardised scores were then added together in order to calculate a single mean dimension score, either for the whole corpus, or for a group of texts, according to the variable analysed. Results are described in what follows.
4. Analysis of data
4.1 Time variable
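The normalisation, standardisation and summation steps described at the end of Section 3 can be sketched as follows. The function names are hypothetical. The reference means and standard deviations are assumptions: they are entered here from memory of the descriptive statistics in Biber (1988) and should be checked against that source before any real use. The eighteenth-century frequencies are those reported in Table 1.

```python
# ASSUMPTION: (mean, standard deviation) per 1,000 words in the multi-register
# reference corpus; to be verified against Biber's (1988) descriptive statistics.
REFERENCE = {
    "agentless passives": (9.6, 6.6),
    "by-passives": (0.8, 1.3),
    "WHIZ-deletions": (2.5, 3.1),
    "past participle clauses": (0.1, 0.4),
    "conjuncts": (1.2, 1.6),
    "adverbial subordinators": (1.0, 1.1),
}


def normalise(raw_count, n_words, basis=1000):
    """Convert a raw count into a frequency per `basis` words."""
    return raw_count * basis / n_words


def standardise(freq, ref_mean, ref_sd):
    """Express a frequency in standard deviations from the reference mean."""
    return (freq - ref_mean) / ref_sd


def dimension_score(freqs, reference=REFERENCE):
    """Sum the standardised scores of all features in the reference table."""
    return sum(standardise(freqs[name], *reference[name]) for name in reference)


# Eighteenth-century mean frequencies per 1,000 words, as given in Table 1
C18 = {
    "agentless passives": 15.2,
    "by-passives": 1.8,
    "WHIZ-deletions": 2.3,
    "past participle clauses": 1.2,
    "conjuncts": 4.1,
    "adverbial subordinators": 2.0,
}

print(round(dimension_score(C18), 1))
```

Under these assumed reference values the sum comes out at roughly 7.0, in line with the eighteenth-century dimension score in Table 1. The sketch also makes visible why a rare feature with a small reference standard deviation, such as past participle clauses, can weigh heavily in the dimension score even when its absolute frequency is low.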
Table 1 below presents the frequencies of occurrence for six linguistic features across two centuries in CEPhiT on two different scales. The left column (Mean) shows normalised mean frequencies per 1,000 words of text, whereas the central column (R/T) and the right column (R/AP) show mean standardised scores with regard to a contemporary reference corpus in each case: total multi-register corpus used in Biber’s 1988 study, and a subcorpus of Academic Prose taken from this multi-register corpus, respectively.10 As explained in Biber & Finegan (2001b: 111–113), normalised frequencies and standardised scores make it possible to assess variation on two complementary levels: within the corpus, and across corpora, respectively. Standardised scores are a transformation of normalised frequencies to a different scale, where 0.0 represents the mean frequency in the overall sample (or, here, in the reference corpus), and each unit of 1.0, either plus or minus, represents one standard deviation either above or below that mean. In this study, standardised scores measure the position of normalised frequencies of occurrence of a linguistic feature in CEPhiT with
10. The contemporary reference corpus used in Biber’s 1988 study consists of a sample of the LOB (Johansson et al., 1978; Johansson, 1982) and London-Lund (Svartvik & Quirk, 1980; Johansson, 1982) corpora, plus a few texts, including 481 2,000-word texts from twenty-three spoken and written registers. See Biber (1988: 66); also, Biber & Finegan (2001b) where Biber’s (1988) corpus is also used as a reference corpus.
Table 1. Mean frequencies (per 1,000 words) and mean standardised scores for six linguistic features across two centuries with regard to two reference corpora (R/T = total multi-register reference corpus; R/AP = academic prose reference corpus)

                            18th century            19th century
Linguistic feature          Mean    R/T    R/AP     Mean    R/T    R/AP
agentless passives          15.2    0.8    –0.2     15.4    0.9    –0.2
by-passives                  1.8    0.8    –0.1      1.4    0.5    –0.3
WHIZ-deletions               2.3   –0.1    –0.9      3.7    0.4    –0.5
past participle clauses      1.2    2.8     1.2      1.5    3.6     1.6
Total passive structures    20.5                    22.0
conjuncts                    4.1    1.8     0.5      5.3    2.6     1.1
adverbial subordinators      2.0    0.9     0.1      1.7    0.6    –0.1
TOTAL                       26.6    7.0             29.0    8.5
regard to the overall frequency of occurrence of that feature in a reference corpus. For this purpose, descriptive statistics have been taken from Biber (1988: 77–78 and 246–269).11 The two reference corpora are therefore used here as two external points of comparison in order to measure variation in CEPhiT as relative to the whole range of register variation in English, and also with respect to contemporary academic prose. Returning to Table 1, we can see that normalised frequencies and standardised scores tell us different stories. The former indicate variation across the centuries within CEPhiT, showing that agentless passives are by far the most frequently occurring linguistic feature in both periods, with a score markedly higher than the rest, followed by conjuncts and past participial WHIZ-deletions. The latter increase dramatically (>60%) in the nineteenth century, whereas conjuncts increase more moderately, as do past participle clauses (1.0 standard deviation above the mean in contemporary academic prose. The overall picture shows that, while almost all the six linguistic features are relatively frequent in CEPhiT with regard to the whole range of register variation in English (with the exception of past participle WHIZ-deletions in the eighteenth century), still agentless passives, by-passives, and past participle WHIZ-deletions appear to be all relatively infrequent with respect to their occurrence in present-day academic prose. This analysis also demonstrates how, when measured on different scales, different linguistic features stand out in each case. 
While agentless passives present a markedly high mean frequency with regard to other linguistic features in normalised figures, indicating that they occur much more frequently than others within the corpus, on a standardised scale the linguistic feature that clearly stands out from the rest is past participle clauses, which indicates that these occur more frequently in CEPhiT than they do in the reference corpora, reflecting also an apparently higher preference for syntactic compression in late Modern English philosophical writing than in contemporary academic prose. Mean dimension scores are formed out of the six standardised scores, relative to the multi-register reference corpus, added together, so that they are comparable to the mean dimension scores obtained in Biber’s (1988) study for any other registers. The mean Dimension 5 scores for CEPhiT across the two centuries (7.0 and 8.5 for the eighteenth and nineteenth centuries, respectively) are graphically represented in Figure 1, with comparable dimension scores (in parentheses) for selected subregisters of present-day academic prose from Biber’s (1988) study. Dimension 5 represents a scale, ranging from “non-abstract” to “abstract”, with zero representing the mean point (i.e., neither “abstract” nor “non-abstract”, but unmarked for abstractness) with regard to the whole range of variability across the different English registers. Thus, any two registers are not only more or less abstract with regard to each other, depending on the position they occupy on the axis, but also relatively abstract, or non-abstract, with regard to the overall distribution mean. Figure 1 shows that both the eighteenth and nineteenth-century sections of CEPhiT, as well as the contemporary academic prose reference subregisters, are all positively marked for Dimension 5 in that they are all considerably
[Figure 1: vertical scale from Non-abstract to Abstract; plotted series AGENTLESS, BY, WHIZ, PP, CONJ, ADV and TOTALD for CEPhiT (overall); reference points for Natural Sciences, General and Humanities Academic Prose]
Figure 1. Dimension 5 scores for 18th and 19th century CEPhiT plus mean standardised scores of individual linguistic features, with comparable dimension scores (in parentheses) for selected academic prose subregisters from Biber (1988: 152, 189)
above zero. In other words, both CEPhiT subsections and the present-day academic prose subregisters are all relatively abstract,12 albeit with different degrees of abstractness. With regard to present-day academic prose, eighteenth-century philosophy (7.0) already exceeds by four units in abstractness the branch belonging to the humanities (3.0), whereas in the nineteenth century it reaches the dimension score of present-day natural sciences (ca. 8.5). This could be interpreted in terms of the general increase in the use of passive constructions in CEPhiT, as was outlined in Table 1. However, it should be also borne in mind that a dimension score depends directly on the weight given to certain linguistic features on a relative scale with regard to their overall distribution in the reference corpus, rather than on higher or lower frequencies on an absolute scale. In this case, and as can also be seen in Figure 1, the relatively high Dimension 5 scores in CEPhiT appear to be due, chiefly, to two linguistic features with markedly higher relative scores: past participle clauses and conjuncts. Both of them visually parallel
12. Conversely, other comparable registers included in Biber’s (1988) study, such as face-to-face conversations, personal letters, or romantic fiction – although not represented in Figure 1 for reasons of space – have negative scores (below zero) on Dimension 5, and are all, therefore, relatively non-abstract (see Biber, 1988: 151–154).
[Figure 2: vertical scale from Non-abstract to Abstract; plotted series AGENTLESS, BY, WHIZ, PP, CONJ, ADV and TOTALD across six periods]
Figure 2. Diachronic evolution of CEPhiT along Dimension 5 across six periods of ca. 30 years plus mean standardised scores of individual linguistic features
the evolution of the dimension scores, as also do past participle WHIZ-deletions, contributing to the general increase in abstractness of CEPhiT. A more detailed picture of the diachronic evolution of late Modern English philosophy along Dimension 5 can be seen on Figure 2. The two centuries are here subdivided in ca. thirty-year spans, with mean dimension scores being calculated for each group of texts. Figure 2 shows a steady movement upwards along Dimension 5 in the eighteenth century and the first third of the nineteenth, with a sudden decrease in the mid-1800s, followed by another rise towards the end of the century. Once more, it appears evident from the evolution of the individual linguistic features how these sharp changes in direction are conveyed chiefly by the dramatic rise and fall of relative scores of conjuncts and past participle clauses, which in the mid-to-late eighteenth century occur in a complementary distribution contributing to a more balanced and steady increase of the dimension scores, paralleled by a moderate rise of agentless passives, by-passives and past participle WHIZ-deletions. In the nineteenth century, by contrast, passive constructions (including past participle clauses), as well as conjuncts, begin a movement downwards, but eventually the last two features – which now go hand in hand and keep carrying most of the weight – change back their direction and that of the dimension scores.
[Figure 3: vertical axis showing frequencies per 1,000 words; plotted series include Overall passive structures, Conjuncts, Adverbs and Agentless passives across six periods]
Figure 3. Diachronic evolution of individual linguistic features, plus the four passive constructions added together, across six periods of ca. 30 years (frequencies per 1,000 words)
It should be noted that the general increase of abstractness in late Modern English philosophy texts over the two centuries does not exactly equal a steady diachronic increase in passivisation (even though, as outlined above, nineteenth-century CEPhiT does show an overall increase in passive constructions). Normalised frequencies in Figure 3 present a more accurate picture of the distribution of passive structures, conjuncts and adverbial subordinators across time. All four passive constructions move upwards throughout the eighteenth century and then, some by the 1800s (agentless and by-passives, and past participle clauses), and some after 1830 (past participle WHIZ-deletions), begin a moderate but steady movement down, with a resulting 50% decrease in the overall passive structures by the end of the nineteenth century from what they had initially gained (that is, from a starting mean frequency of 17.6 in the first period, to one of 23.6 in the fourth period, and decreasing eventually to 20.5). This gradual change may be illustrated through Examples (8), (9) and (10) below:
(8) GOD himſelf does not require our Obedience at this rate, he lays before us the goodneſs and reaſonableneſs of his Laws, and were there any thing in them whoſe Equity we could not readily comprehend, yet we have this clear and ſufficient Reaſon on which to found our Obedience, that nothing but what’s Juſt and Fit, can be enjoyn’d by a Juſt, a Wiſe and Gracious GOD, but this is a
Reaſon will never hold in reſpeƈt of Men’s Commands unleſs they can prove themſelves infallible, and conſequently Impeccable too. It is therefore very much a Man’s Intereſt that Women ſhould be good Chriſtians, in this as in every other Inſtance, he who does his Duty finds his own account in it; Duty and true Intereſt are one and the ſame thing, and he who thinks otherwiſe is to be pitied for being ſo much in the Wrong… (Astell, 1700: 87–88)
(9) Now as the exiſtence of all ſenſible objeƈts conſiſts in their being perceived by ſenſe, it is plain that ſuch of them as cannot exiſt ſeparately from each other, cannot be perceived by ſenſe ſeparately from each other; or, in other words, do not admit of ſeparate ſenſations; conſequently, ſince ideas are copies of ſenſations, neither can ſeparate, that is abſtraƈt, ideas of ſuch bodies exiſt. Therefore, we can have no idea of colour abſtraƈted from extenſion, on which it ſeems to be ſpread; nor of a colour or magnitude, &c. common to different objeƈts, abſtraƈted from all thoſe objeƈts; nor of colour, figure, or magnitude in general, abſtraƈted from and excluding all their reſpeƈtive varieties: thus we cannot have an idea of a colour that is neither red, white, yellow, &c. or ſome ſhade or mixture of theſe… (Kirwan, 1811: 373–374)
(10) Wisdom, temperance, justice, and courage, are the divine; and, if a city possess them, the others are added to it; but wanting them, it will want the rest also. The rest, or human blessings, are health, beauty, strength, and wealth. Elsewhere in the Laws, he says three things are of concern to men – mind, body, and estate (…), and that is their order of importance. Wealth, therefore, though only in the third rank, is recognised by Plato as an element of real necessity and rationality in human life when it is intelligently and moderately used, and not blindly heaped up, without reference to the chief ends of life. 
The difficulty is that Plato does not directly define it, and therefore we have really to deal with wealth in two senses of the word; namely, outward goods and an excessive accumulation of them, or, in short, wealth and excessive wealth. (Bonar, 1893: 12)
In Example (8), taken from an essay from the very beginning of the eighteenth century, abstractness features do not appear to occur frequently (7% in total). Example (10), being an excerpt from a treatise written in the late nineteenth century, shows a more moderate use of abstract features (30% with regard to lectures, and almost 40% with regard to nineteenth-century treatises and lectures). By contrast, by-passives appear to abound more in eighteenth-century essays, whereas past participle WHIZ-deletions are more frequent in treatises, particularly in the nineteenth century. Past participle clauses have the lowest frequencies (especially in eighteenth-century essays), as do adverbial subordinators across the three genres. Even so, past participle clauses increase almost 80% in nineteenth-century treatises. As for conjuncts, the second most frequently occurring linguistic feature in CEPhiT, these remain relatively
13. In Biber’s 1988 study, “Involved vs. informational” are opposite poles of Dimension 1, which is often used along with Dimension 5 on a complementary basis. Several new MD analyses (such as Biber, 2001; Biber et al., 2004) resulted in factor solutions with abstractness features from the 1988 Dimension 5 loading as negative features on the new “Involvement” dimension, with the resulting binary opposition “involved vs. informational/abstract”.
Table 2. Mean frequencies (per 1,000 words) of six linguistic features across three genres and two centuries (Treatise and Essay in the 18th and 19th centuries and Lecture in the 19th century)

                            Treatise            Essay               Lecture
Linguistic feature          18th c.   19th c.   18th c.   19th c.   19th c.
agentless passives          14.7      14.3      15.3      19.8      14.9
by-passives                  1.7       1.3       2.1       1.3       1.7
WHIZ-deletions               2.4       3.9       1.9       3.5       3.3
past participle clauses      0.9       1.6       1.6       1.4       1.4
Total passive structures    19.7      21.1      20.9      26.0      21.3
conjuncts                    4.1       4.5       4.3       7.3       5.2
adverbial subordinators      2.2       1.6       1.8       2.0       1.5
TOTAL                       26.0      27.2      27.0      35.3      28.0
Table 3. Mean standardised scores of six linguistic features across three genres and two centuries (Treatise and Essay in the 18th and 19th centuries and Lecture in the 19th century) with regard to two present-day reference corpora (R/T = total multi-register reference corpus; R/AP = academic prose reference corpus)

                            Treatise                      Essay                         Lecture
                            18th c.        19th c.        18th c.        19th c.        19th c.
Linguistic feature          R/T    R/AP    R/T    R/AP    R/T    R/AP    R/T    R/AP    R/T    R/AP
agentless passives          0.8    –0.3    0.7    –0.4    0.9    –0.2    1.5     0.4    0.8    –0.3
by-passives                 0.7    –0.2    0.4    –0.4    1.0     0.1    0.4    –0.4    0.7    –0.2
WHIZ-deletions              0.0    –0.8    0.4    –0.5   –0.2    –1.0    0.3    –0.6    0.3    –0.6
past participle clauses     2.1     0.7    3.7     1.7    3.9     1.8    3.2     1.4    3.2     1.4
conjuncts                   1.8     0.5    2.1     0.7    1.9     0.6    3.8     2.0    2.5     1.0
adverbial subordinators     1.0     0.2    0.6    –0.1    0.7     0.0    0.9     0.1    0.4    –0.2
TOTAL                       6.2            7.8            8.2           10.1            7.8
stable in treatises but experience an important growth of almost 70% in essays in the nineteenth century, reaching a level 37% above the overall nineteenth-century mean. Table 3, on the other hand, presents relative standardised scores. As in the general distribution within CEPhiT, past participle clauses once more constitute the only linguistic feature occurring relatively frequently across the genres and in both centuries with regard to its distribution in present-day academic prose, even though all six linguistic features appear to be relatively frequent (with the exception of past participle WHIZ-deletions in eighteenth-century essays) with respect to
the whole range of variability in the multi-register corpus. The other three passive structures are relatively infrequent in all three genres with regard to contemporary academic prose, except for agentless passives in nineteenth-century essays (0.4 standard deviations above the mean). Adverbial subordinators are infrequent in nineteenth-century treatises and in lectures relative to their occurrence in present-day academic prose, whereas conjuncts appear to be relatively frequent with regard to the two reference corpora and, just as in their general distribution across CEPhiT, and as is also the case with past participle clauses, carry most of the weight of the dimension scores, exceeding their relative overall distribution mean by almost 4.0 standard deviations. Resulting dimension scores for each genre are represented graphically in Figure 4, with comparable scores for selected present-day subregisters of academic prose indicated in parentheses:
[Figure 4: vertical scale from Non-abstract to Abstract; plotted points for Overall CEPhiT, TREATISES, ESSAYS and LECTURES; reference points for Technology/Engineering, Natural Sciences, General and Humanities Academic Prose]
Figure 4. Dimension 5 scores for three genres and two centuries in CEPhiT, with comparable dimension scores (in parentheses) for selected Academic Prose subregisters from Biber (1988: 152, 189)
While both treatises and essays present a general increase in abstractness in the nineteenth century, each of them occupies a different place on the scale, with
treatises being relatively less abstract than the average of CEPhiT, whereas essays, already in the eighteenth century, stem from a more abstract point (8.2) than nineteenth-century treatises and lectures (7.8 each of them), reaching in their final stage a score comparable to one of the most technical – and hence impersonal, or abstract – representatives of present-day academic prose register (Technology / Engineering). This internal variation across genres is remarkable in that, although both Treatise and Essay can be regarded as belonging to a common epistemological category and may be therefore considered as genres normally used within a specialised/professional epistemic community, presumably addressed to an already well-instructed readership (as opposed, for instance, to Lecture and Textbook, intended for learners; see Crespo, this volume), still Essay appears to be considerably more specialised than Treatise in both centuries – if, like Biber, we assume that abstractness is an indication of specialisation. From hence it also appears quite clear that, overall, late Modern English philosophical writing presents a diachronic evolution very similar to that of the specialised registers analysed in Biber & Finegan (1997, 2001a) and Atkinson (1996, 1999), and that, unlike the late Modern English essays included in Biber & Finegan’s 1989 initial study, CEPhiT essays clearly gain, instead of losing, abstraction over time.14 This contradiction might be due – at least in part – to the rather high internal variation across the essays themselves in CEPhiT, which, considering the small number of essay samples in our corpus, suggests that we should be rather cautious when making statements about the behaviour of the genre. Notwithstanding, another possible reason may lie in a difference in the use of terminology. 
In Biber & Finegan (1989: 488), the authors explain that they use the term genre “for those varieties readily distinguished by native speakers, corresponding to situational differences in purpose, mode, speaker/listener relationship, etc.”, whereas in their subsequent 1997 and 2001a studies they replace genre by register.15 Here, conversely, genres are treated as categories related to the communicative purposes (or classificatory criteria) of the author, the nature of the epistemic community, and the degree of specialisation or instruction of the reader, and, as such, genre is related to form or text-type (such as essays, treatises, or letters); register, on the other hand, is considered a broader category, seen as a variety of language pertaining to a specific epistemic domain (such as scientific prose,
14. Internal variation in nineteenth-century essays, though, shows that the markedly high dimension score is chiefly conveyed by extremely high scores within the period 1800–1830, combined with much more moderate scores within the period 1874–1898. This, in principle, is consistent with the general moderate decrease in abstractness along the nineteenth century, although mean dimension scores for each century indicate the contrary movement.
15. As consistent with this shift in terminology in Biber (1995).
Chapter 6. Abstractness as diachronic variation in CEPhiT 117
which is the language of science; see Halliday (1988); Swales (1990); Johnstone (2002), and, once more, Crespo, this volume). Genres and registers would not be, thus, comparable on the same scale here; rather, a register (such as scientific prose) may be written in a variety of genres, whereas a genre (such as Essay) may be used for different registers.16 Finally, it is also worth noting that Biber & Finegan explicitly refer to essays as an example of non-specialised prose, as opposed to expository (specialised) prose, whereas in this study we have been dealing with texts classified as “scientific”, albeit belonging to humanities and dealing with an extended variety of subjects of different levels of specialisation. The essays in CEPhiT may not, therefore, be directly compared to Biber & Finegan’s essays, as the scope of the term appears to differ in the two studies. Rather, it seems to be the case that the relative degree of specialisation of a genre (in this case, essay) must largely depend on the register (specialised vs. non-specialised) in which it is used. In the light of these findings, it might be interesting to carry out a further analysis of the diachronic evolution of essays and treatises across different scientific registers, ideally on more than one dimension of variation. 5. Concluding remarks This study has analysed patterns of diachronic variation in eighteenth and nineteenth-century CEPhiT as conveyed through six linguistic features, belonging to Biber’s (1988) Dimension 5, labelled “Abstract vs. Non-Abstract Information”. Normalised frequencies have been used to account for variation across the two centuries and across three genres within the corpus, showing that, by far, the most frequently occurring abstractness marker is the agentless passive, particularly in nineteenth-century essays. 
On the other hand, standardised scores relative to two contemporary reference corpora have been used to measure variation with regard to the overall range of cross-register variability in English, as well as with regard to present-day academic prose. On a relative scale, the two most frequently occurring abstractness markers appeared to be past participle clauses and conjuncts. The rest of passive structures, though relatively frequent with regard to their distribution across the different registers, are relatively infrequent with respect to their occurrence in present-day academic prose. Even so, the high scores for past participle clauses and conjuncts contribute to relatively high dimension scores in 16. Even though the dichotomy genre/register is not discussed here in depth, this study was carried out within clear terminological boundaries, based primarily on Halliday (1988) and, also, Taavitsainen & Pahta (2004).
118 Leida Maria Monaco
all subsections of CEPhiT, even though these present variation with regard to one another. The relative abundance of past participle clauses in our corpus indicates a greater presence of syntactic compression in late Modern English philosophical writing, compared to present-day academic prose, as seen through the linguistic features analysed. Normalised frequencies also have shown that the presence of passive structures in CEPhiT increases along the eighteenth century and then begins a moderate decrease in the early-to-mid nineteenth century. This is only partially reflected in the dimension scores, the latter being heavily influenced by conjuncts, which fluctuate along the whole period analysed. Still, both overall and genre dimension scores show a general increase in abstractness across the two centuries, particularly in the part of the corpus represented by the genre Essay which appears to be markedly more abstract than the genres Treatise and Lecture. Our findings seem to confirm that, despite the fact that CEPhiT is rather heterogeneous as far as subject matter is concerned, presenting various degrees of specialisation, late Modern English philosophy evolves across the centuries in a way consistent with other specialised registers, analysed in Biber & Finegan (1997, 2001a) and Atkinson (1996, 1999), even though the a relative – yet unsteady – decrease of the dimension scores across smaller subperiods suggests a possible change of direction in the twentieth century, presumably towards a more moderately abstract position on the scale, closer to the one occupied by present-day humanities academic prose.
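The standardised scores mentioned above can be illustrated with a minimal sketch. It assumes the procedure usually associated with Biber (1988): each normalised frequency is standardised against the mean and standard deviation of a reference corpus, and the standardised values of the features on a dimension are summed. The function names, the three feature labels chosen and all numerical values below are hypothetical placeholders for illustration only; they are not figures from CEPhiT or from the reference corpora.

```python
from typing import Dict, Tuple


def standardised_score(freq: float, ref_mean: float, ref_sd: float) -> float:
    """Z-score of a normalised frequency relative to a reference corpus."""
    if ref_sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    return (freq - ref_mean) / ref_sd


def dimension_score(
    freqs: Dict[str, float],
    reference: Dict[str, Tuple[float, float]],
) -> float:
    """Sum of the standardised scores of all features listed in `freqs`.

    `reference` maps each feature to (mean, standard deviation) in the
    reference corpus.
    """
    return sum(
        standardised_score(freq, *reference[feature])
        for feature, freq in freqs.items()
    )


# Hypothetical reference statistics: (mean, standard deviation) per feature.
reference = {
    "conjuncts": (1.0, 2.0),
    "agentless_passives": (10.0, 5.0),
    "past_participle_clauses": (0.5, 1.0),
}

# Hypothetical normalised frequencies for one text sample.
sample_freqs = {
    "conjuncts": 3.0,
    "agentless_passives": 15.0,
    "past_participle_clauses": 2.5,
}

print(dimension_score(sample_freqs, reference))
```

A positive result in such a sketch would place the sample above the reference mean on the dimension, which is the sense in which the scores discussed here are read as "more abstract".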
References
Atkinson, Dwight (1996). The Philosophical Transactions of the Royal Society of London, 1675–1975: A sociohistorical discourse analysis. Language in Society, 25, 333–371. doi: 10.1017/S0047404500019205
Atkinson, Dwight (1999). Scientific discourse in sociohistorical context. The Philosophical Transactions of the Royal Society of London, 1675–1975. Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.
Banks, David (2005). The case of Perrin and Thomson: An example of the use of a mini-corpus. English for Specific Purposes, 24/2, 201–211. doi: 10.1016/j.esp.2004.01.001
Bazerman, Charles (1984). Modern evolution of the experimental report in physics: Spectroscopic articles in Physical Review, 1893–1980. Social Studies of Science, 14, 163–196. doi: 10.1177/030631284014002001
Biber, Douglas (1988). Variation across speech and writing. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511621024
Biber, Douglas (1995). Dimensions of register variation: A cross-linguistic comparison. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511519871
Biber, Douglas (2001). Dimensions of variation among eighteenth-century speech-based and written registers. In D. Biber & S. Conrad (Eds.), Variation in English: Multi-Dimensional Studies (200–214). Essex: Pearson Education.
Biber, Douglas & Edward Finegan (1989). Drift and the evolution of English style: A history of three genres. Language, 65/3, 487–517. doi: 10.2307/415220
Biber, Douglas & Edward Finegan (1997). Diachronic relations among speech-based and written registers in English. In T. Nevalainen & L. Kahlas-Tarkka (Eds.), To explain the present: Studies in the changing English language in honour of Matti Rissanen (253–275). Helsinki: Mémoires de la Société Néophilologique de Helsinki.
Biber, Douglas & Edward Finegan (2001a). Diachronic relations among speech-based and written registers in English. In D. Biber & S. Conrad (Eds.), Variation in English: Multi-Dimensional Studies (66–83). Essex: Pearson Education.
Biber, Douglas & Edward Finegan (2001b). Intra-textual variation within medical research articles. In D. Biber & S. Conrad (Eds.), Variation in English: Multi-Dimensional Studies (108–123). Essex: Pearson Education.
Biber, Douglas & Bethany Gray (2013). Being specific about historical change: The influence of sub-register. Journal of English Linguistics, 41/2, 104–134.
Biber, Douglas, Edward Finegan & Dwight Atkinson (1994). ARCHER and its challenges: Compiling and exploring A Representative Corpus of Historical English Registers. In U. Fries, P. Schneider & G. Tottie (Eds.), Creating and using English language corpora. Papers from the 14th International Conference on English Language Research on Computerized Corpora, Zurich 1993 (1–13). Amsterdam: Rodopi.
Biber, Douglas, Susan Conrad, Randi Reppen, Pat Byrd, Marie Helt, Victoria Clark, Viviana Cortes, Eniko Csomay & Alfredo Urzua (2004). Representing language use in the university: Analysis of the TOEFL 2000 Spoken and Written Academic Language Corpus. (ETS TOEFL Monograph Series, MS-25). Princeton, NJ: Educational Testing Service.
Carkin, Susan (2001). Pedagogic language in introductory classes: A multi-dimensional analysis of textbooks and lectures in Biology and Macroeconomics. Unpublished PhD Dissertation. Flagstaff: Northern Arizona University.
Chafe, Wallace & Jane Danielewicz (1987). Properties of Spoken and Written Language. In R. Horowitz & S. J. Samuels (Eds.), Comprehending Oral and Written Language (83–113). San Diego: Academic Press.
Conrad, Susan (1996). Academic discourse in two disciplines: Professional writing and student development in biology and history. Unpublished doctoral dissertation. Flagstaff: Northern Arizona University.
Conrad, Susan (2001). Variation among disciplinary texts: a comparison of textbooks and journal articles in biology and history. In D. Biber & S. Conrad (Eds.), Variation in English: Multi-Dimensional Studies (94–107). Essex: Pearson Education.
Crespo, Begoña (2011). Persuasion markers and ideology in eighteenth century philosophy texts. Revista de Lenguas para Fines Específicos, 17, 199–228.
Crespo, Begoña & Isabel Moskowich (forthcoming). Involved in writing science: nineteenth-century women in the Coruña Corpus.
Csomay, Eniko (2000). Academic lectures: An interface of an oral and literate continuum. Novelty, 7, 30–46.
Gorsuch, Richard L. (1983). Factor analysis. Hillsdale, NJ: Erlbaum.
Gotti, Maurizio (2001). The experimental essay in Early Modern English. European Journal of English Studies, 5/2, 221–239. doi: 10.1076/ejes.5.2.221.7307
Gotti, Maurizio (2003). Specialized discourse. Linguistic features and changing conventions. Bern: Peter Lang.
Gotti, Maurizio (2005). Investigating specialized discourse. Bern: Peter Lang.
Granger, Sylviane (1997). On identifying the syntactic and discourse features of participle clauses in Academic English: Native and non-native writers compared. In J. Aarts, I. de Mönnink & H. Wekker (Eds.), Studies in English Language and Teaching (185–198). Amsterdam: Rodopi.
Gray, Bethany (2011). Exploring academic writing through corpus linguistics: When discipline tells only part of the story. Unpublished doctoral dissertation. Flagstaff: Northern Arizona University.
Gray, Bethany & Douglas Biber (2012). The emergence and evolution of the pattern N + PREP + V-ing in historical scientific texts. In I. Moskowich & B. Crespo (Eds.), Astronomy ‘playne and simple’: The writing of science between 1700 and 1900 (181–198). Amsterdam: John Benjamins. doi: 10.1075/z.173.09gray
Greenbaum, Sidney (1988). Syntactic devices for compression in English. In J. Klegraf & D. Nehls (Eds.), Essays on the English Language and Applied Linguistics on the occasion of Gerhard Nickel’s 60th birthday (3–10). Heidelberg: Julius Groos Verlag.
Halliday, Michael A. K. (1988). On the language of physical science. In M. Ghadessy (Ed.), Registers of written English (162–178). London: Pinter Publishers.
Hunter, Michael & Edward B. Davies (Eds.) (1999). The Works of Boyle. London: Pickering and Chatto.
Hyland, Ken (1995). The author in the text: Hedging scientific writing. Hong Kong Papers in Linguistics and Language Teaching, 18, 33–42.
Johansson, Stig (Ed.) (1982). Computer corpora in English language research. Bergen: Norwegian Computing Centre for the Humanities.
Johansson, Stig, Geoffrey Leech & Helen Goodluck (1978). Manual of information to accompany the Lancaster-Oslo/Bergen Corpus of British English, for use with digital computers. Oslo: Department of English, University of Oslo.
Johnstone, Barbara (2002). Discourse Analysis. Malden: Blackwell.
Kytö, Merja & Matti Rissanen (1992). A Language in Transition: The Helsinki Corpus of English Texts. ICAME Journal, 16, 7–27.
Kytö, Merja, Juhani Rudanko & Erik Smitterberg (2000). Building a bridge between the present and the past: A corpus of 19th-century English. ICAME Journal, 24, 85–97.
Lareo, Inés & Ana Montoya Reyes (2007). Scientific writing: Following Robert Boyle’s principles in experimental essays – 1704 and 1998. Revista Alicantina de Estudios Ingleses, 20, 119–137.
Moessner, Lilo (2009). The influence of the Royal Society on 17th century scientific writing. ICAME Journal, 33, 65–87.
Moskowich, Isabel (2011). “The golden rule of divine philosophy” exemplified in the Coruña Corpus of English Scientific Writing. Revista de Lenguas para Fines Específicos, 17, 167–197.
Moskowich, Isabel & Begoña Crespo (Eds.) (2012). Astronomy ‘playne and simple’. The writing of science between 1700 and 1900. Amsterdam: John Benjamins. doi: 10.1075/z.173
Moskowich, Isabel & Javier Parapar (2008). Writing science, compiling science: The Coruña Corpus of English Scientific Writing. In M. J. Lorenzo Modia (Ed.), Proceedings from the 31st AEDEAN Conference (531–544). A Coruña: Universidade da Coruña.
Ochs, Elinor (1979). Planned and unplanned discourse. In T. Givón (Ed.), Discourse and syntax (51–80). New York: Academic Press.
Oxford English Dictionary (1989). 2nd ed. Online version retrieved October 14, 2012 from http://www.oed.com (Accessed 28 November 2012).
Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech & Jan Svartvik (1985). A comprehensive grammar of the English language. London: Longman.
Riley, Kathryn (1991). Passive voice and rhetorical role in scientific writing. Journal of Technical Writing and Communication, 21, 239–257. doi: 10.2190/Y51Y-P6QF-3LCC-4AUH
Rissanen, Matti (Project leader), Merja Kytö (Project secretary); Leena Kahlas-Tarkka, Matti Kilpiö (Old English); Saara Nevanlinna, Irma Taavitsainen (Middle English); Terttu Nevalainen, Helena Raumolin-Brunberg (Early Modern English) (Comps.) (1991). The Helsinki Corpus of English Texts. Helsinki: Department of Modern Languages, University of Helsinki.
Seoane, Elena & Christopher Williams (2006). Changing the rules: A comparison of recent trends in English in Academic Scientific Discourse and Prescriptive Legal Discourse. In M. Dossena & I. Taavitsainen (Eds.), Diachronic perspectives on domain-specific English (255–276). Bern: Peter Lang.
Smith, Nicholas & Elena Seoane (2013). Categorizing syntactic constructions in a corpus. In M. Krug & J. Schlüter (Eds.), Research methods in language variation and change (212–227). Cambridge: Cambridge University Press.
Svartvik, Jan & Randolph Quirk (Eds.) (1980). A Corpus of English Conversation. Lund: CWK Gleerup.
Taavitsainen, Irma & Päivi Pahta (Eds.) (2004). Medical and scientific writing in Late Medieval English. Cambridge: Cambridge University Press.
Taavitsainen, Irma, Päivi Pahta, Turo Hiltunen, Martti Mäkinen, Ville Marttila, Maura Ratia, Carla Suhr & Jukka Tyrkkö (Eds.) (2010). Early Modern English Medical Texts. CD-ROM. Amsterdam: John Benjamins. doi: 10.1075/z.160
Tabachnick, Barbara & Linda Fidell (2007). Using multivariate statistics (5th ed.). Boston, MA: Pearson.
Visser, Frederik Theodoor (1973). An historical syntax of the English language. Second Half. Vol. III. Leiden: Brill.
Weiner, E. Judith & William Labov (1983). Constraints of the agentless passive. Journal of Linguistics, 19, 29–58. doi: 10.1017/S0022226700007441
Xiao, Richard (2009). Multidimensional analysis and the study of World Englishes. World Englishes, 28/4, 421–450. doi: 10.1111/j.1467-971X.2009.01606.x
Chapter 7
Authorial presence in late Modern English philosophical writing
Evidence from CEPhiT*
Elena Seoane
University of Vigo
The soft sciences are characterised by a complex interplay of rhetoric devices used with a twofold purpose: to minimize the researcher’s role in the construction of knowledge, and to allow the researcher to step in and show commitment to the propositions expressed, express solidarity with readers, and gain acceptance for his or her ideas (Ivanic, 1998; Hyland, 2008). The aim of this paper is to examine authorial visibility and identity in one of the soft sciences, philosophy, as represented in the Corpus of English Philosophical Texts (CEPhiT), a highly representative corpus of late Modern English Philosophical texts. It will explore the frequency and function of impersonal passives and other impersonal-subject constructions as well as the frequency and role of first and second person pronouns (Kuo, 1999). Since the corpus contains texts from every decade in the eighteenth and nineteenth centuries, diachronic analyses will also be carried out to identify changes in authorial identity over the course of late Modern English. Findings from late Modern English will be compared with those reported in the literature for Present-Day English with the aim of ascertaining whether diachronic linguistic changes in the discipline involve changes within the genre or a change to a new genre. Microanalyses of texts will also account for intradisciplinary divergence.
* I am grateful to Cristina Suárez-Gómez for her suggestions for improvements to this paper. I am also grateful to the following institutions for their generous financial support: Spanish Ministry of Science and Innovation and the Regional Government of Galicia (Grant Nos. FFI2014-53930-P, FFI2014-51873-REDT and GPC2014/060).
doi 10.1075/z.198.07seo © 2016 John Benjamins Publishing Company
1. Introduction
As we know, academic writers not only convey ideational content in their work, but also project a personal identity through which they attempt to gain credibility. Historically, there has always been tension between the need for impersonality and objectivity in academic writing and the fact that research, in that it is carried out by individuals, is intrinsically a personal and subjective endeavour (Hyland, 2001, 2002). Such tension can be observed, for example, in the historical vacillation between a detached and impersonal writing style, seen most clearly in the use of the impersonal passive and impersonal-subject constructions, and a more subjective and author-centred type of discourse, with authorial self-mention and other forms of person reference (Taavitsainen et al., 2002). Major differences between disciplines are attested in the rhetorical preferences of academic writers: while impersonality is commonly considered a defining quality of hard sciences, where research is assumed to be purely empirical, the soft sciences are less technical and more interpretative in nature, and thus a more explicitly involved and personal stance is typically expected (Hyland, 2008). In other words, the hard sciences tend to base their arguments on theoretical models and experimental results, and thus tend to produce more impersonal texts; in the soft disciplines (humanities, social sciences), the personal interpretation of beliefs and facts is often central and writers must rely to a greater extent on personal projection in the text. This is achieved through abundant use of interpersonal features to evoke credibility in the authorial voice. Within the soft sciences, philosophy stands out as particularly inclined to use interpersonal features: “in philosophy in particular, a writer’s style is a significant element of both his or her immediate credibility in the paper and wider reputation in the discipline” (Hyland, 2001: 220–221; cf. also Hyland, 2006: 37–38).
This paper is concerned with the tension between detached scientificity and authorial involvement observed in late Modern English (LModE) philosophical writing, as represented in the Corpus of English Philosophical Texts (CEPhiT; cf. Chapter 1 this volume). It explores how philosophical writers of the period created scientific personas in their texts through the use of interpersonal features that would convey credibility and a sense of belonging to the community. For this purpose, the study identifies one of the most effective interpersonal features, namely self-reference through use of first person pronouns, as well as the use of impersonal passives, which, on the contrary, serve to avoid self-mention, and impersonal-subject constructions, which also minimise writer presence. It also analyses diachronic variation and intradisciplinary divergence in the period. Finally, it draws on comparisons with equivalent data from other Present-Day English (PDE) corpora of philosophical writing in order to reveal changes in discourse
in this discipline (Hyland, 1999, 2004, 2006, 2008; Biber & Conrad, 2009; Hundt et al., 2012). At the heart of this study is the desire to shed light on a fascinating question raised by Biber & Conrad (2009: 166), that of how to distinguish between diachronic change within a genre and change towards a different genre (see also Hundt et al., 2012). Examining a central discourse feature of philosophical writing, that of authorial presence, and comparing it with PDE data, is a means of addressing this question. I begin with a brief outline of the analytical framework of this study, the concept of disciplinary community, before going on to describe the notion of authorial stance (Section 2). In Section 3 I discuss findings, and in Section 4 I offer some conclusions.

2. Disciplinary community and authorial stance
Communication is essentially a social activity, and writing science is no different. For the scientific writer, language is not just an instrument for the communication of an independent reality, but constitutes a form of social reality: texts are seen as stabilised sites of social interaction and knowledge building (Canagarajah, 2002: 161). Over the last two decades, and fostered by the publication of Swales’ (1990) Genre Analysis, an abundance of research on the differences in the various conventions across academic disciplines has confirmed that discourses cohere to the concept of disciplinary community (Bizzell, 1982; Hyland, 2006: 17–18). Swales (1990: 24–27) describes disciplinary communities as (1) having common public goals, (2) possessing mechanisms of intercommunication among members, (3) displaying active participation in these mechanisms, (4) having one or more genres, (5) owning specific lexis and (6) having members with a suitable degree of content and discoursal expertise.
On this view, each community develops its own ways of understanding the world and of talking about it, as can be seen in the fact that they have their own norms, categorizations, bodies of knowledge, sets of conventions and methods of inquiry. Each community formulates and negotiates knowledge differently, and these differences concern key aspects in the production of discourse: knowledge of a cultural and interpersonal situation, knowledge of the world, and knowledge of the texts and appropriate discourse conventions (Hyland, 2006: 18). This approach to scientific discourse has focused on variation across disciplines, presenting the discourse community as a homogeneous, autonomous and self-contained unit. Other authors, however, have highlighted the fact that members of a community are not homogeneous: each writer brings to their discourse their own values, experiences and interests as acquired through their own
personal backgrounds. These play a role in their interactional styles in a specific community, and contribute to changes in such discourses. As Canagarajah (2002: 166) notes, “the history of the community cannot be forgotten. Through time it acquires resources, statuses, and interests that also motivate its activity.” Bazerman (1988, 1991) also portrays discourse communities as heterogeneous processes subject to change triggered by internal and external sources. For him, disciplines are open historical systems which incorporate and reflect multiple sociohistoric trends. Hence, this view emphasises the notion of discourse community as an interactive process that is permanently reshaped by the tensions between its members and institutions, and characterised by diversity and flexibility (see also Prior, 1998: 26; Starfield, 2001: 133–136). Whether we picture disciplinary communities as static entities or as constantly changing processes, the notion of the disciplinary community offers us a framework for conceptualising the conventions and practices which influence academic communication. Writing as a member of a discipline involves textualizing work in a way that colleagues can recognise as their disciplinary discourse. Such community constraints on discourse both restrict how something can be said and authorise the writer as someone competent to say it. By using the practices that define the community, the writer is presenting him or herself as someone competent to contribute, as a legitimate member of the group. “We are more likely to persuade readers of our ideas if we frame our messages in ways which appeal to appropriate community recognised relationships” (Hyland, 2006: 21). The community-based approach to disciplinary discourse, therefore, focuses on the importance of communicating as a member of the community one wishes to engage with. 
As such, academic writing involves sets of rhetorical choices which most closely correspond to the community’s assumptions, methods and bodies of knowledge. Among the discipline specific conventions which serve the purpose of signalling membership to that community, authorial stance is central, since it constitutes one of the defining properties of discourse communities (Hyland, 2004, 2006: 23, 2008: 21). The notion of stance models the way in which writers convey their involvement in texts: “Stance is a writer’s community-recognised persona as expressed through his or her rhetorical choices, conveying epistemic and affective judgements, opinions and degrees of commitments to what they say. It therefore suggests something of how authors construct a credible academic identity” (Hyland, 2009: 111; cf. also Ivanic, 1998). Stance, therefore, has to do with the way in which writers incorporate their personal authority into their arguments by conveying their judgements, opinions, evaluations and commitments, or, on the contrary, step back from the texts and conceal their involvement. The presence of the author is codified in various writer-oriented features, including hedges (modals like might, adverbs like possibly and perhaps), boosters (verbs such as
demonstrate, adverbs like definitely), markers of evaluation and attitude (e.g. believe, doubt), and self-mention, to be analysed in this paper (Hyland, 2001: 216, 2004: 15, 2006: 29). As already mentioned, comparisons between the soft and hard sciences show that one of the differences between them is the way writers represent themselves, with those in soft fields taking more involved and personal positions than in the case of the hard sciences (Hyland, 2000, 2005). In his research, Hyland (2000, 2005, 2006) found that the soft fields contain 75 per cent more stance items than engineering and science papers; across all his studies, the discipline exhibiting the highest proportion of stance features is philosophy. Authorial stance, therefore, is very much framed and restricted by the discipline concerned and the options it makes available for author intrusion. However, some margin of choice in decisions relating to authorial presence remains: presence or absence of the author, however expressed, is a conscious choice by writers in the adoption of a particular stance and authorial identity (Hyland, 2006: 32, 2008: 20). For the analysis of stance, I have focused on the presence of self-mention, which can be identified in the use of first person singular and plural pronouns, and on the avoidance of self-mention, which can be expressed in various ways, the most prominent being the use of agentless passives whose agent is the author(s) of the text. Other impersonalizing strategies aimed at avoiding self-mention and minimizing authorial presence are impersonal-subject constructions, that is, active clauses having as subject impersonal uses of the pronouns one, you, we, they, someone, somebody, anyone, anybody and the nouns men and people with impersonal meaning. 
As a complementary and subsidiary focus of enquiry I also examined the role of direct reader address forms, essentially second person pronouns, which are associated with the expression of reader engagement, as well as the use of agentless passives with the purpose of eliding reference to the researchers or the scientific community in general, which contribute to making texts more impersonal.

3. Authorial presence in CEPhiT
In order to identify potential changes in authorial stance over the course of late Modern English, I selected the first four texts from CEPhiT (see Appendix), dating from 1700, 1705, 1710 and 1717, and the last four, from 1885, 1890, 1893 and 1898. Each text contains around 10,000 words, thus yielding a total of 80,555 words. I used the Coruña Corpus Tool software (CCT) to retrieve general frequencies of the relevant pronouns and nouns. CCT identified the following items: (i) first person pronouns I, we, me and us, as well as the possessives my, our, mine,
ours and the reflexives myself, ourselves; (ii) second person pronouns you, your, yours, yourself and yourselves; (iii) third person plural pronouns they, them, their, theirs and themselves; (iv) pronoun one; (v) indefinite pronouns someone, somebody, anyone, anybody, etc.; (vi) nouns author, reader, man, men and people. The search for passive constructions was carried out manually, since instances of be are far too numerous for concordance programmes to be useful in the identification of passives. I agree with Hyland (2009: 110, 126–127) that corpus analysis is useful in offering a comprehensive description of community practices, since it helps to identify recurrent discoursal features and relations between linguistic choices and the contexts in which they occur. The ideal methodology, however, is that in which corpus analyses and intensive studies of texts are carried out, so that both macro and micro analyses can be combined. Microanalyses are especially important in a study like this because all the elements concerned here (pronouns, passives) can have a number of semantic references and functions in academic writing (cf. Kuo, 1999). For example, we, as illustrated below, can refer to the authors (exclusive we), to authors and readers (inclusive we), to an unidentified group of people (impersonal we) and to humanity in general. For this reason, having first identified the relevant items, I then analysed them manually, classifying them according to the following two broad categories: (a) those indicating involvement of the author(s), and (b) those indicating omission of the author(s).

a. Author involvement
– I, me, my, mine, myself (Example (1)).
– Exclusive we (and us, our, ours, ourselves) referring to the writers (excluding the readers and other community members); see Example (2).
– Together with author pronouns, I also found references to the author (3) and they have been included in the count (cf. Table 1).

(1) To this I anſwer, that tho’ it may be very true, that nothing in this Univerſe is aƈtually at abſolute reſt, but that every thing is in ſome degree of Motion; (CEPhiT 1705 Cheyne 752)
(2) As a separate study, Economics comes very late; and in our historical retrospect we shall be looking for answers to questions which have not always been consciously present to the authors embraced in our scrutiny. (CEPhiT 1893 Bonar 271)
(3) So far as the author knows, this is the first attempt to present a view of the relations of philosophy and economics through the whole of their history, and the absence of guiding models must be to some extent his excuse for the shortcomings of his work. (CEPhiT 1893 Bonar 1418)
b. Absence of author – Agentless passives whose elided agent is the author(s) of the text (4). – Impersonal we (and us, our, ours, ourselves), as in (5).1 – Impersonal you (and your, yours, yourself, yourselves) as in (6) as opposed to exclusive you, which refers to the reader (7), and to hidden you or directives (8). – Impersonal one (9) as opposed to the numeral. – Impersonal use of they, someone, somebody, anyone, anybody, etc. (10). – Impersonal use of man, men, people (11). (4) How closely this reproduces Locke on the one hand, and how nearly it anticipates Hume on the other, hardly needs to be pointed out. (CEPhiT 1885 Seth 417) (5) We are eaſily ſatisfy’d we and our own immediate Parents have not been for ever; but few of us go farther, we take this World as we find it, without troubling our Heads who made it, or whether it was made or not. ( CEPhiT 1705 Cheyne 103) (6) All advice and all reaſonings would be of no uſe to him. You might offer arguments to him, and lay before him pleaſure and pain; and he might ſtand unmov’d like a rock. (CEPhiT 1717 Collins 4911) (7) I cou’d reckon up many other Differences more; but theſe are enough to let you ſee what vaſt Diſparity there is, betwixt the Platonick Love of the Ancients, and that of modern Puritan Lovers, and how little Reaſon they have to uſurp either the Example of Socrates, or the Mode of Plato for their Patronage: (CEPhiT 1710 Dunton 5543) (8) On the whole subject, however, compare Caird’s Critical Philosophy of Kant, Introduction, [chap]. [i]., and Essays in Philosophical Criticism, [p]. 8 [seq]. and [p]. 41. (CEPhiT 1890 Mackenzie 9676) (9) As to the Opinion of the World, tho’ one cannot ſay it is always juſt, yet generally it has a Foundation; (CEPhiT 1700 Astell 8677) (10) No Body can well bear to have their Anceſtors affronted, nor their Pedigree deſpiſed; (CEPhiT 1705 Cheyne 146)
1. Besides the exclusive and impersonal uses mentioned so far, we can also be used inclusively, as will be explained below, and in a more abstract way to refer to mankind in general, as in: If we will thinking or deliberating on a ſubjeƈt, or will reading, or walking, or riding, we find we muſt do thoſe aƈtions, unleſs ſome external Impediment, as an apoplexy or ſome ſuch intervening cauſe, hinders us; (CEPhiT 1717 Collins 1082).
130 Elena Seoane
(11) and yet to be fully ſatisfy’d of the truth of this Hypotheſis, a Man muſt underſtand the particular Mechaniſm of the whole Syſtem of things, and of every individual Appearance. (CEPhiT 1705 Cheyne 1862)
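Because the items listed above are defined by their reference rather than by their form (the same we may be exclusive, inclusive or impersonal), automatic retrieval can only collect candidate forms, which then have to be classified by reading each hit in context. The following minimal sketch illustrates that first step only. It is not the procedure used for this study: the word list, the function name find_candidates and the sample sentences (modernised from examples (3) and (5)) are assumptions made for the illustration.

```python
import re
from collections import Counter

# Hypothetical list of forms to retrieve. Deciding whether a hit is
# exclusive, inclusive or impersonal still has to be done by hand.
CANDIDATES = ["I", "we", "us", "our", "ours", "ourselves",
              "you", "your", "yours", "one", "the author"]

def find_candidates(text):
    """Count whole-word matches for each candidate form in `text`."""
    counts = Counter()
    for form in CANDIDATES:
        # "I" is matched case-sensitively; all other forms ignore case.
        flags = 0 if form == "I" else re.IGNORECASE
        pattern = r"\b" + re.escape(form) + r"\b"
        counts[form] = len(re.findall(pattern, text, flags))
    return counts

sample = ("So far as the author knows, this is the first attempt. "
          "We take this World as we find it, without troubling our Heads.")
hits = find_candidates(sample)
```

For texts transcribed with historical spellings, variant forms (for instance spellings with long s) would have to be added to the list before searching.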
In what follows, I will first present the results of the macroanalyses of the corpus data, and then turn to a detailed microanalysis of the texts, looking in particular at those yielding discordant results. Table 1 below sets out the frequency of self-mention items per text. The total columns offer the total number of items and their normalised frequency per 10,000 words.

Table 1. Number of self-mention elements per text-type (NF per 10,000 words in brackets)

              1700   1705   1710   1717   TOTAL
I                4     31     97     53   185 (46.3)
excl. we         1     16      2      1   20 (5.0)
the author       –      –      –      –   –
TOTAL            5     47     99     54   205 (51.3)

              1885   1890   1893   1898   TOTAL
I                4      9      –     11   24 (6.0)
excl. we        69    124     27     21   241 (60.3)
the author       –      –      3      –   3 (0.8)
TOTAL           73    133     30     32   268 (67.0)
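The normalised frequencies in Table 1 can be recomputed from the raw totals. The sketch below is only an illustration: the function name normalised_frequency is hypothetical, and the figure of 40,000 words per century is an assumption inferred from the bracketed values in the table, not a corpus size stated in this section. Half-up rounding is used because it reproduces the printed values.

```python
from decimal import Decimal, ROUND_HALF_UP

def normalised_frequency(count, corpus_words, base=10_000):
    """Occurrences per `base` words, rounded half-up to one decimal place."""
    value = Decimal(count) * Decimal(base) / Decimal(corpus_words)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

# Assumption: four samples per century adding up to about 40,000 words.
WORDS_PER_CENTURY = 40_000

nf_18th = normalised_frequency(205, WORDS_PER_CENTURY)  # all self-mention, 18th c.
nf_19th = normalised_frequency(268, WORDS_PER_CENTURY)  # all self-mention, 19th c.
```

Under that assumption the two totals come out at 51.3 and 67.0, the values given in the TOTAL rows of Table 1.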
The data in Table 1 indicate a notable increase of self-mention in late Modern English, from 51.3 (per 10,000 words) in the eighteenth century to 67.0 in the nineteenth century. This increase accords with the findings reported in studies of philosophical writing in Present-Day English. Hyland looks at self-mention in PDE philosophical texts in several studies and finds 65 first person pronouns per 10,000 words in one of his corpora (Hyland, 1999; 2004) and 57 in the other (cf. Hyland, 2006; 2008), both figures clearly higher than the one found here for the eighteenth century (51.3). This comparison reveals that, as philosophical writing evolved, authorial presence became more central, implying that authors relied more on the projection of a credible author, one that would base arguments on his or her own opinions and beliefs, and would take responsibility for the interpretations presented.
The data obtained from CEPhiT are also in tune with observations in Biber & Conrad (2009) that the expression of personal attitudes and evaluations on the part of speakers is more frequent in recent periods than in historical periods in the text-types they examine: drama, letters, newspapers and medical prose. However, within this general trend towards the increasing use of stance features, Biber & Conrad (2009: 173) note that medical prose shows a different development: a steady increase in frequency in the eighteenth, nineteenth and early twentieth century, and a radical decrease in the second half of the twentieth century, reaching
levels of authorial stance considerably lower than those of the nineteenth century (see Note 2). The data in Biber & Conrad (2009) are comparable to those from CEPhiT, with an increase in the use of stance features from the eighteenth to the nineteenth century. However, the values obtained by Hyland for PDE philosophical texts, 65 and 57, show not a radical decrease in authorial reference in the second part of the twentieth century, but only a slight one. A very likely reason for this difference is that the research articles examined in Biber & Conrad are medical articles, and, as mentioned in the introduction, the hard sciences in the twentieth century are well-known for their scarce use of stance features as compared to the soft sciences. For this reason, the differences between the levels found here for the nineteenth century and those reported by Hyland for PDE are not as pronounced as could be expected from a hard-science discourse.
Parallel to this increase in self-assertion we also observe authors referring to themselves increasingly as we, despite the fact that each text in the corpus is written by a single author. In the eighteenth century I clearly predominates, with 46.3 items per 10,000 words as against 5 for we, whereas in the nineteenth century I drops to a frequency of 6 per 10,000 and we rises to 60.3. The difference between the eighteenth and nineteenth centuries in the use of I and exclusive we is statistically significant (χ² = 305.24; p < .0001; see Note 3). Such a switch may be a self-effacing move indicating a wish to reduce personal attributions, but it might also be a means of claiming communality and authority (Hyland, 2001: 27). Given the general increase in authorial presence observed in the corpus, the latter interpretation is the more likely one. Table 1 also shows important intertextual differences, especially regarding the relatively high number of singular pronouns in the 1710 text and the high proportion of plural pronouns in the 1890 text.
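The chi-square value reported for I versus exclusive we can be checked from the century totals in Table 1 (185 and 20 for the eighteenth century, 24 and 241 for the nineteenth). The sketch below is an illustration under an assumption: the text does not say how the test was computed, and a 2×2 test with Yates' continuity correction is assumed here. The function name chi_square_2x2 is hypothetical.

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction; the term is not allowed to go negative.
        diff = max(diff - n / 2, 0)
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / denominator

# Rows: eighteenth and nineteenth century; columns: I and exclusive we.
chi2_corrected = chi_square_2x2(185, 20, 24, 241)
chi2_uncorrected = chi_square_2x2(185, 20, 24, 241, yates=False)
```

Under these assumptions the corrected statistic is about 305.24, which agrees with the figure given in the text; without the correction it is about 308.5. Either value, with one degree of freedom, lies far beyond the threshold for p < .0001.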
The microanalyses at the end of this section will help us account for the divergent data from the 1710 and 1890 texts.
2. Hundt et al. (2012: 236) offer similar findings: they compare two texts from the field of hard sciences, one from the eighteenth century and another from the twentieth century. They find a development in scientific writing from a more involved, personal style to a more impersonal and informational one. They do not study the nineteenth century, and for this reason their data are not comparable to ours.
3. The number of examples for the author is too low to apply chi-square tests.
Together with authorial presence in terms of frequency of self-mention, we might also explore the rhetorical function of first person pronouns in our corpus. For this purpose I have examined the verbs which co-occur with the first person and thus indicate distinct representations of authorial reference in the texts (cf. Appendix). Broadly speaking, writers use first person markers with three discourse functions (cf. Hyland 1999: 118–119):
i. To organise arguments and structure their texts, with verbs such as answer, ask, conclude, discuss, introduce (12):
(12) But let us put a caſe of true equality or Indifference, and what I have aſſerted will more manifeſtly appear true (CEPhiT 1717 Collins 697)
ii. To discuss research activities, with verbs such as demonstrate, examine, prove, publish, show (13):
(13) We have prov’d §XI. of the preceding Chapter, that Motion is no more eſſential to Matter than Reſt, that of it ſelf it can never bring it ſelf into Motion, that it wou’d for ever continue in the ſtate it is put in, and, if it was from all Eternity at reſt, it would continue ſo for ever; (CEPhiT 1705 Cheyne 473)
iii. To indicate attitudes towards findings or theoretical positions, with verbs such as believe, consider, maintain, regard, think (14):
(14) This, I think, is a complete enumeration of the essential elements distinguishable in the supposed process-content of consciousness, considered in its character of an act of choice or immanent volition (CEPhiT 1898 Hodgson 7765)
The verbs falling outside these three categories have been classified as ‘other’ (cf. Appendix). Table 2, below, sets out the percentage of verbs used with each of these functions in each text; for the text from 1700 I provide only raw numbers, since the verbs are so infrequent that expressing them as percentages does not seem helpful.

Table 2. Rhetorical function of verbs used with first person pronouns

                      1700    1705    1710    1717    1885    1890    1893    1898
Organise arguments       3   24.3%   19.2%   42.5%   41.4%   44.7%   40.0%   72.7%
Research activities      0   48.6%   21.2%   15.0%   31.0%   31.6%   28.0%   18.1%
Attitudes                1   18.9%    9.6%   32.5%   12.1%    1.3%   12.0%    9.1%
Other                    1    8.1%   50.0%   10.0%   15.5%   10.5%   20.0%       –
The discrepancies observed in texts 1710 and 1890 can only be explained by a detailed study of these texts, which is undertaken at the end of this section. As for the remaining texts, we see a predominance of verbs used with author agents to organise arguments and to explain research activities, as is expected in scientific discourse. The first group could be said to increase its frequency over time, as
if metadiscourse language were steadily becoming more habitual (as in Example (15)), while the others display a fairly heterogeneous distribution.
(15) I will take the liberty of stating it in some detail. We may begin by following Reid for a little in his own method of attacking the question, and then proceed to gather up the results in our own way. (CEPhiT 1885 Seth 6930)
Of note here is the very last text, from 1898, where all the verbs fall within the categories expected from scientific writings, with none belonging to other semantic fields, thus indicating perhaps an increasing specialization of the vocabulary in scientific discourse and a decrease in the degree of freedom allowed to the writer in a specific disciplinary community. This is at least the tendency in the hard sciences, as observed by Biber & Conrad (2009: 166): “science research articles […] have become much more narrowly defined in terms of textual conventions.”
The interactive character of academic writing, which allows the writer to build convincing arguments, includes not only authorial stance but also reader engagement (cf. Hyland, 2004), and for this reason I have also analysed engagement in our texts. Engagement has to do with the writer’s wish to include readers in the discourse as active participants, in order to show disciplinary solidarity, guide the reader through particular interpretations, and predict possible objections. Findings from the corpus, shown in Table 3, include the following: inclusive we, which refers to writers, readers and all members of the community (16); exclusive you (cf. (7) above); hidden you (8); and the reader or reader, another form of direct reader address (17).
(16) And as they, moreover, succeed or introduce one another in an orderly and coherent way, we gradually learn to recognise constant conjunctions, to which we give the name of laws of nature. (CEPhiT 1885 Seth 394)
(17) I own, Reader, the Women-Haters will be ready to ſay, Theſe five Hundred Letters (being a Correſpondence between two Perſons of a different Sex, and one of ‘em marry’d) are Light, Vain, Airy; (CEPhiT 1710 Dunton 7542)
It is noteworthy that the texts showing the highest proportion of stance features also display a high proportion of reader involvement, namely the texts from 1710 and 1890. The diachronic evolution is also towards a rise in use, with the difference between the eighteenth and nineteenth centuries in the use of all engagement features being statistically significant (χ² = 132.4; p < .0001). This increase is to be expected, given the findings of Hyland for PDE philosophical texts: 110 items per 10,000 words in Hyland (2006 and 2008) and 118 in Hyland (1999). These findings indicate that the increase observed in authorial presence is part of an increase in the interactive character of philosophical texts in this period.
Table 3. Number of engagement strategies per text type (NF per 10,000 words in brackets)

                1700   1705   1710   1717   TOTAL
Incl. we           –     20     29     16   65 (16.3)
Excl. you          4      –     33      –   37 (9.3)
Hidden you         4      –      1      2   7 (1.8)
(the) reader       –      1     29      2   32 (8.0)
TOTAL             40     21     92     20   173 (43.3)

                1885   1890   1893   1898   TOTAL
Incl. we          34    121     19     37   211 (52.8)
Excl. you          –      –      –      –   –
Hidden you         –      2      8      –   10 (2.5)
(the) reader       –      –      1      –   1 (0.3)
TOTAL             34    123     28     37   222 (55.5)
So far we have dealt with elements that signal authorial (and reader) presence. Other strategies are available to writers who want to step back and hide their involvement, the most common being agentless passives whose recoverable agent is the author of the text (cf. Example (4)). Up to 5.5 per cent of the passives in the texts were found with this function. Other linguistic devices available are impersonal-subject constructions, as exemplified in (9) to (11) above. These subjects refer to people in general, not to a specific agent; this vague meaning minimizes authorial presence and reduces the writer’s responsibility for his/her claims. Their frequency can be seen in Table 4.

Table 4. Number of impersonal strategies per text-type (NF per 10,000 words in brackets)

                                        1700   1705   1710   1717   TOTAL
Passive Author(s)                          1     16      7      5   29 (7.3)
Imp. you                                   7      –      8      1   16 (4.0)
Imp. one                                   2      –      1      –   3 (0.8)
Imp. we                                    –      7      –      –   7 (1.8)
Imp. many, somebody, anybody, etc.         –      9      1      –   10 (2.5)
Imp. a man, no man, some men, people       –      3      1      –   4 (1.0)
TOTAL                                     10     35     18      6   87 (21.8)

                                        1885   1890   1893   1898   TOTAL
Passive Author(s)                         23     30     10     21   84 (21.0)
Imp. you                                   –      –      –      –   –
Imp. one                                   –      –      –      –   –
Imp. we                                    –      –      –     11   11 (2.8)
Imp. many, somebody, anybody, etc.         –      –      –      –   –
Imp. a man, no man, some men, people       –      –      –      –   –
TOTAL                                     23     30     10     32   95 (23.8)
If we compare the results in Table 4 for impersonalizing strategies with those in Table 1 for self-mention, we see that the proportion of impersonal constructions is much lower than the frequency of self-mention: 21.8 (per 10,000 words) in the eighteenth century as compared to 51.3 self-mentions, and 23.8 against 67 in the nineteenth century. Moreover, despite the increase in use of impersonalising
strategies seen in Table 4, the gap between presence and absence of self-mention grows over time. Philosophical texts in late Modern English, therefore, already show the tendency witnessed in Present-Day English towards a strong authorial presence in contrast to the hard sciences.
Among the impersonal strategies there is a clear predominance of passives over impersonal-subject constructions, probably because the latter have normally been associated with informal text-types while passives tend to feature in formal contexts such as those under study here. From a diachronic perspective, impersonal-subject constructions are seen to practically disappear from the corpus (with the exception of impersonal we in one text) and their function is taken over by the agentless passive. This is in tune with the general evolution found in the language towards a preponderance of impersonal passives over impersonal-subject constructions, especially in formal texts (cf. Seoane, 2000).
Before we turn to the microanalyses of the texts from 1710 and 1890, I would like to mention the frequent use of impersonal agentless passives whose agent is an unidentified group of philosophers and members of the scientific community. This does not serve the specific purpose of hiding the author of the text but undoubtedly lends an impersonal flavour to a text, helping to focus on scientific arguments and facts rather than human agents, as in (18). Its use is frequent and on the increase, from 7.5 occurrences per 10,000 words in the eighteenth century to 26.3 in the nineteenth century.
(18) Thought which is thus characterised cannot easily be made an object of scientific study. (CEPhiT 1890 Mackenzie 415)
In Tables 1 to 3 the texts from 1710 and 1890 stand out in that they display data which diverge from those of the other texts. I will therefore take a closer look at these two texts, with the aim of accounting for these differences.
The 1710 text is Athenianism, by John Dunton (see Appendix), and the author is the protagonist of the “project” he wants to narrate, namely the “Platonick Love (or innocent Pleasure) I found in corresponding with her [Madam Singer] for Six Years.” The text is crowded with references to himself, as author and protagonist of the story, as can be seen in the following excerpt:
(19) So that ‘tis to Madam Singer I owe my Beſt (or Intelleƈtual) Pleaſures, and in a great Meaſure my Love to Scribling (being never pleas’d but when I was writing to her or hearing from her). Nay, I might truly ſay, my very Athenian Oracle it ſelf, had never been ſo kindly receiv’d (as to come to a Tenth Edition) [note] Including all thoſe 20 Single Volumes firſt Printed by my ſelf, under the Title of Athenian Mercury, and ſince Re-printed by [Mr]. Bell, under the Title of Athenian Oracle. (CEPhiT 1710 Dunton 385)
As might be expected from the subjective topic of the text, and from the fact that it is narrated from a first person perspective, the rhetorical functions of the first person pronouns are radically different from their functions in the other texts (cf. Table 2): 50% of the verbs are not typical of scientific writings but belong to the category ‘other’, pertinent examples including live, love, protect, strip and wed. The same is the case with those nouns that collocate with first person possessive pronouns (my, our; cf. Kuo, 1999: 135; Hyland, 2001: 223). As can be seen in the appendix, possessive pronouns do not occur at all in three of the eight texts, and have a low frequency in the remaining five, with the exception of the texts from 1710 and 1890. Most of the accompanying nouns in the 1710 text are not typical of scientific discourse, as was the case with the verbs; thus we find courtship, amour, wedding, sex, chats.
Another characteristic that sets the text from 1710 apart is the frequent reference to the Reader, with 29 instances of a total of 33 in the corpus, and second person you, referring also to readers, with 33 out of 37 occurrences in the corpus. These allusions to the reader make this text very interactive, with a style more typical of a dialogue than of a written text for an unknown audience. This can be clearly seen in the excerpt below, where the author engages in an imaginary conversation with the reader:
(20) or ſuppoſe, Reader, you had neither Ears to hear your Lady ſpeak, nor Eyes to ſee her Beauty, ſhall you not therefore be ſubjeƈt to the Impreſſions of Love – If you anſwer, No, I can alledge divers born deaf and blind that have been wounded: If you grant this, then [hidden you] confeſs the Heart muſt have his Hope, which is neither ſeeing nor hearing – (CEPhiT 1710 Dunton 5736)
In most cases the author directly appeals to the reader, using Reader, with a capital initial and separated by commas, as a vocative. The author is trying to engage the reader and bring him closer to his own position, as if trying to gain the reader’s trust, empathy and understanding, at times even asking explicitly for his support (Reader, be charitable now, … CEPhiT 1710 Dunton 2196). We can see this in the frequent use of know and see with second person pronouns, an attempt to make readers adopt the author’s perspective and a means of claiming a common background between reader and author. He is also trying to address foreseeable criticism or expectations on the part of the reader, as in Examples (21) and (22). This occurs six times in the text.
(21) I know, Reader, you’ll ſay, That Platonick Love has min’d half the Female Sex, (…). To this I anſwer, Platonick Love (or a Tender Friendſhip between Perſons of a different Sex) is not only innocent (…) (CEPhiT 1710 Dunton 1146)
(22) Perhaps, Reader, you’ll here expeƈt I ſhou’d deſcribe the Purity of that Love which ſuch profeſs who diſtinguiſh themſelves from the Herd of ſenſual Inamoratos (CEPhiT 1710 Dunton 2767)
Given the strong presence of both author and reader, it is hardly surprising that the proportion of impersonal constructions in this text is among the lowest in the corpus (cf. Table 4). CEPhiT also offers useful metadata on texts, that is, explicit information about the origin, education and profession of the author. As can be seen in the appendix, most of the authors in our corpus sample are philosophers, writers and freethinkers, occupations that imply a certain kind of educational background. Of Dunton, however, we are told only that he was a bookseller; perhaps it is not unreasonable to hypothesize that he was alien to the disciplinary community and did not care – or indeed know how – to follow closely the textual conventions of the particular genre. He might have chosen, rather, to follow a dialogic style typical of other genres or periods. This would be an example of individual divergence within a disciplinary discourse, an example of independent creativity not wholly shaped by common practices.
The text from 1890 is An Introduction to Social Philosophy by John Stuart Mackenzie (see Appendix). The very first paragraph sets the tone of the text (cf. (23) below), which deals with how philosophers, and among them the author (normally referred to as we), should deal with social philosophy. It is, then, little wonder that most sentences have we as topic, instead of philosophical processes, acts, elements or ideas. It contains a total of 254 instances of we; 124 of these refer to the author exclusively, 121 are inclusive, referring to author and readers (as in Example (23) below), and nine refer to mankind in general, with the meaning ‘we as humans’ (see Note 1).
(23) IT is my object in this essay to define, in a broad and general way, the scope and limits of the application of philosophical principles to social questions. I wish to ascertain in what respects we may hope for light and guidance from philosophy in dealing with such questions, and in what respects it is necessary that we should look for light and guidance elsewhere. (CEPhiT 1890 Mackenzie 3772)
In contrast with the text from 1710, the high number of first person pronouns here does not preclude the occurrence of a high number of impersonal strategies (cf. Table 4): there are 30 passives eliding reference to the author, and 68 passives suppressing reference to scientists. The author and other philosophers are clearly the focus of the essay and are given an agent role in most sentences, even if the agent is elided and only recoverable from the context; see (24), which displays a
combination of agentless passive clauses where the retrievable agent is the author with active clauses having exclusive (authorial) we as subject:
(24) But the possibility of such an ultimate exclusion need not concern us here. For our purpose it seems clear enough that these four distinct species of science must be recognised. We may distinguish them by saying that the first seeks the attainment of knowledge; the second, of understanding; the third, of insight; and the fourth, of wisdom. (CEPhiT 1890 Mackenzie 8344)
Finally, it is important to note that the text from 1890 presents the lowest percentage of attitude verbs in the whole corpus; despite the overwhelming use of authors and scientists in general as agents, the tone of the text is not especially evaluative or attitudinal, as is shown by the fact that most verbs belong to the realm of presenting objective scientific arguments and activities. Similarly, the nouns in the 1890 text denote scientific thinking and processes (study, inquiry, discussion, line of thought, principles, investigation, method).
4. Conclusions
This paper has studied the diachronic evolution of authorial presence in LModE philosophical writings as represented in CEPhiT. The findings show that authorial self-mention in late Modern English is lower than in Present-Day English philosophical writings, and that there is a statistically significant increase in the use of self-mention during the period. In addition, the corpus shows a switch in use from I to we, although all texts are single-authored. The main rhetorical functions of the verbs used with authors as subject are typical of scientific writings, since they essentially organise arguments, discuss research activities and processes and also express attitudes and evaluations. An analysis of the strategies available to avoid author involvement – passives with the author as elided agent, impersonal-subject constructions – shows that these are always less frequent than self-mention and that the gap between self-mention and author avoidance grows throughout the period.
The increasing presence of the author in philosophical texts is in tune with Hyland’s (2006) observations about the differences between the soft and hard sciences, whereby writers in the humanities and social sciences appear to adopt far more explicitly involved standpoints, actively supporting arguments and intervening to evaluate material and express points of view, focusing less on methods and theories and referring more to social actors and processes. More interestingly, the corpus also shows intradisciplinary variation. We have focused on two texts in particular; one of them (from 1710) is a personal narration
where the author is also the protagonist of the story and establishes a dialogue with the reader, to whom the author makes direct and explicit reference very frequently. The discourse function of self-mention here is far from scientific, in that the verbs used do not refer to scientific arguments, processes or methods. The metadata provided in CEPhiT reveal that the author of this text is unique in our corpus sample, in that he is the only writer without an academic background, which was perhaps a determining factor in his writing not following the stylistic conventions of the discipline so closely. The other text dates from 1890 and is notable in that it contains a very high proportion of both self-mention and impersonal strategies. A close analysis of the text shows that it concentrates on the role of scientists (authors often included), and for that reason the role of agent (present or elided) tends to be filled by members of the scientific community.
What these two texts show is that variation within disciplinary communities is characteristic of LModE scientific writing. In other words, the findings here challenge the static idea of disciplines as autonomous, unmovable objects and are in tune with sociohistoric theories in which disciplines are seen as open systems allowing for individuality and variation. Here, intradisciplinary variation seems to indicate greater dependence on other factors, such as education, professional background, and expertise. In other words, it shows that community members rarely comprise a uniform and undifferentiated mass, and even if they all try to align themselves with the knowledge-making practices of their community, which validates the notion of disciplinary community as an essential one in the production and interpretation of texts, this is not the only dimension to the discoursal alternatives available to them.
Their diverse experiences and biographies, alongside their diverse goals and methods when engaging in writing, also shape their discourses.
In the Introduction I mentioned a very interesting question by Biber & Conrad (2009: 166): “How can researchers distinguish between change within a genre/register versus change to become a different genre/register?” The study of authorial reference in LModE philosophical writings throws some light on this question: if we compare authorial presence in Late Modern and Present-Day English we observe a statistically significant increase, and also an increasing gap between frequent self-mention and infrequent impersonalizing strategies. Despite this difference in linguistic styles and authorial presence, I do not see grounds to argue that these are two distinct genres and registers, because they differ in neither (1) the rhetorical functions that self-mention has, nor (2) communicative purpose. What may have changed, and this would be interesting for future research, is the degree of intradisciplinary variation found in Late Modern as compared to Present-Day English.
References
Bazerman, Charles (1988). Shaping written knowledge: The genre and activity of the experimental article in science. Madison: University of Wisconsin Press.
Bazerman, Charles (1991). How natural philosophers can cooperate: The literary technology of coordinated investigation in Joseph Priestley’s History and Present State of Electricity (1767). In Ch. Bazerman & J. Paradis (Eds.), Textual dynamics of the professions: Historical and contemporary studies of writing in professional communities (13–44). Madison: University of Wisconsin Press.
Bazerman, Charles & James Paradis (Eds.) (1991). Textual dynamics of the professions: Historical and contemporary studies of writing in professional communities. Madison: University of Wisconsin Press.
Biber, Douglas & Susan Conrad (2009). Register, genre and style. (Cambridge Textbooks in Linguistics). Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511814358
Bizzell, Patricia (1982). Cognition, convention and certainty: What we need to know about writing. Pre/Text, 3, 213–241.
Canagarajah, A. Suresh (2002). Critical academic writing and multilingual students. (Michigan Series on Teaching Multilingual Writers). Ann Arbor: The University of Michigan Press.
Hundt, Marianne, David Denison & Gerold Schneider (2012). Relative Complexity in scientific discourse. English Language and Linguistics, 16/2, 209–240. doi: 10.1017/S1360674312000032
Hyland, Ken (1999). Disciplinary discourses: Writer stance in research articles. In C. N. Candlin & K. Hyland (Eds.), Writing: texts, processes and practices (99–121). London: Longman.
Hyland, Ken (2000). Disciplinary discourses: Social interactions in academic writing. London: Longman.
Hyland, Ken (2001). Humble servants of the discipline? Self-mention in research articles. English for Specific Purposes, 20, 207–226. doi: 10.1016/S0889-4906(00)00012-0
Hyland, Ken (2002). Authority and invisibility: Authorial identity in academic writing. Journal of Pragmatics, 34, 1091–1112. doi: 10.1016/S0378-2166(02)00035-8
Hyland, Ken (2004). Engagement and disciplinarity: The other side of evaluation. In G. Del Lungo (Ed.), Academic discourse: New insights into evaluation (13–30). Amsterdam: Peter Lang.
Hyland, Ken (2006). Disciplinary differences: Language variation in academic discourses. In K. Hyland & M. Bondi (Eds.), Academic discourse across disciplines (17–45). Frankfurt: Peter Lang.
Hyland, Ken (2008). Disciplinary voices: Interactions in research writing. English Text Construction, 1/1, 5–22. doi: 10.1075/etc.1.1.03hyl
Hyland, Ken (2009). Corpus informed discourse analysis: the case of academic engagement. In M. Charles, S. Hunston & D. Pecorari (Eds.), Academic Writing: at the Interface of Corpus and Discourse (110–128). London: Continuum.
Hyland, Ken & Marina Bondi (Eds.) (2006). Academic discourse across disciplines. Frankfurt: Peter Lang.
Ivanic, Roz (1998). Writing and identity: The Discoursal Construction of Identity in Academic Writing. Amsterdam: John Benjamins. doi: 10.1075/swll.5
Kuo, Chih H. (1999). The use of personal pronouns: Role relationships in scientific journal articles. English for Specific Purposes, 18, 121–138. doi: 10.1016/S0889-4906(97)00058-6
Prior, Paul A. (1998). Writing / Disciplinarity. A Sociohistorical account of literate activity in the academy. London: Lawrence Erlbaum Associates.
Seoane, Elena (2000). Impersonalising strategies in Early Modern English. English Studies, 81/2, 102–116. doi: 10.1076/0013-838X(200003)81:2;1-T;FT102
Starfield, Sue (2001). ‘I’ll go with the group’: Rethinking ‘discourse community’ in EAP. In J. Flowerdew & M. Peacock (Eds.), Research perspectives in English for Academic Purposes (132–147). Cambridge: Cambridge University Press. doi: 10.1017/CBO9781139524766.012
Swales, John M. (1990). Genre analysis: English in academic and research settings. Cambridge: Cambridge University Press.
Taavitsainen, Irma, Päivi Pahta, Noora Leskinen, Maura Ratia & Carla Suhr (2002). Analysing scientific thought-styles: What can linguistic research reveal about the history of science? In H. Raumolin-Brunberg, M. Nevala, A. Nurmi & M. Rissanen (Eds.), Variation past and present (251–270). (Mémoires de la Société Néophilologique de Helsinki LXI). Helsinki: Société Néophilologique.
Appendix
CEPhiT texts analysed
1700 Some reflections upon marriage: Occasion’d by the Duke and the Dutchess of Mazarine’s Case; Which is also consider’d. By Mary Astell (writer, London). Essay
1705 Philosophical Principles of Natural Religion: Containing the Elements of Natural philosophy, and the Proofs for Natural Religion, Arising from them. By George Cheyne (medical doctor, Edinburgh). Treatise
1710 Athenianism. By John Dunton (Bookseller, London). Treatise
1717 A Philosophical Inquiry Concerning Human Liberty, by Anthony Collins (Cambridge, freethinker). Treatise
1885 Scottish Philosophy. A Comparison of the Scottish and German Answers to Hume. By Andrew Seth Pringle-Pattison (Edinburgh, philosopher). Lecture
1890 An Introduction to Social Philosophy. By John Stuart Mackenzie (Edinburgh/Cambridge, philosopher). Essay
1893 Philosophy and Political Economy in some of their Historical Relations. By James Bonar (Glasgow/Oxford, Civil servant and economist). Treatise
1898 The Metaphysic of Experience. In four books. By Shadworth Hollway Hodgson (Oxford, Philosopher). Treatise
Rhetorical functions of verbs used with first person pronouns

1700
Organise arguments: Say, Speak
Research activities: –
Attitudes: Think
Others: Be tempted

1705
Organise arguments: Answer, Ask, Come, Mean, Pass over, Say, Speak, Write
Research activities: Bring to evince, Call, Demonstrate, Examine, Form, Instance, Know, Observe, Produce, Prove, Show
Attitudes: Be certain, Consider, Find (‘consider’), Think
Others: Allow, Be satisfied, Ruin

1710
Organise arguments: Answer, Describe, Hint, Inquire, Introduce, Mean, Portray, Run On, Say, Sustain, Tell, Treat of, Write
Research activities: Behold, Call, Know, Prove, Publish, See
Attitudes: Assure, Believe, Challenge, Judge, Think
Others: Admire, Allow, Be, Blame, Carry, Choose, Court, Desire, Engage, Fall, Find, Follow, Forget, Give, Hope, Imitate, Keep, Live, Love, Make Love, Owe, Own, Promise, Protect, Set, Strip, Turn, Value, Wed

1717
Organise arguments: Add, Advance, Answer, Ask, Assert, Conclude, Draw, Reply, Run Through, Say
Research activities: Confirm, Observe, Prove, Show
Attitudes: Conceive, Consider, Contend, Defend, Grant, Maintain, Propose, Take (‘consider’), Think
Others: Allow, Do, Hope, Wave

1885
Organise arguments: Arrive, Begin, Come, Concern, Give quotation, Go beyond, Go further, Mean, Mention, Pass, Say, Speak, Start, Trace
Research activities: Apply, Call, Develop, Find, Have knowledge, Know, Perceive, Read, See, See
Attitudes: Attribute, Consider, Discard, Recognize, Suppose
Others: Be willing, Do, Follow, Leave, Precluded, Select, Take

1890
Organise arguments: Ask, Assume, Be guided, Be led, Begin, Deal with, Define, Devote, Discuss, Distinguish, Go, Inquire, Notice, Pass, Reach, Refer, Say, Start, Understand
Research activities: Analyse, Ascertain, Confined, Discover, Dispose, Effect, Exercise, Caution, Find, Glance at, Know, Look forwards, Make an effort, Make progress, Observe, Secure, See, Take to pieces, Think out
Attitudes: Be misled, Consider, Despair, Recognize, Regard, Suppose, Think
Others: Be, Be satisfied, Hope, Succeed, Wish

1893
Organise arguments: Add, Be led, Begin, Conclude, Gather that, Go as far as, Notice, Pass over, Say
Research activities: Examine, Find, Gather, Look for, See, Seek
Attitudes: Consider, Doubt, Infer
Others: Be Surprised, Depend, Follow, Take

1898
Organise arguments: Begin, Come (to the question of), Define, Do with (‘deal with’), Have a grasp of, Lay a distinction, Lay bare, Mean, Return to, Say, Speak, Use (the word)
Research activities: Be in search of, Call, Take (the elements)
Attitudes: Be aware of, Think
Nouns used with first person possessive pronouns
1700: –
1705: –
1710: friendship, obligation, love, oracle, wife, marriage, courtship, amour, correspondence, wedding, sex, chats, coat, address, age, project, undertaking, writing, letter, intention, understanding, ease, case, hearts, Platonicks
1717: arguments, subject, reader, opinion
1885: making, record, definition, attention, way
1890: object, study, method, sympathies, hopes, aims, wishes, search, time, modes of thought, inquiry, discussion, line of thought, observation, principles, fancy, investigation, experience, material, purpose, business
1893: retrospect, questions, own mind
1898: –
Chapter 8
The status of seem in the nineteenth-century Corpus of English Philosophy Texts (CEPhiT) Francisco Alonso-Almeida and Inés Lareo
University of Las Palmas de Gran Canaria / University of Vigo
doi 10.1075/z.198.08alo © 2016 John Benjamins Publishing Company
1. Introduction
This chapter offers an analysis of the functions of seem in the nineteenth-century section of the Corpus of English Philosophy Texts (CEPhiT), one of the sub-corpora of the Coruña Corpus. It considers the several meanings of the verb with specific regard to linguistic context and the possible effect of this on the preference for seem over semantically related forms, such as appear. We will analyse occurrences of seem, describing the verb’s taxonomy in accordance with Aijmer’s (2009) classification of the evidential system. In many ways, the values of seem are related to those of modal verbs (Aijmer, 2009). However, we will depart significantly from this modal sense of seem, and in our description of examples from CEPhiT we will follow Aijmer (2009), de Haan (2007) and Cornillie (2007a), among others, in showing the evidential, rather than epistemic, meaning of this verbal form.
The chapter is structured as follows. Section 2 describes the meanings of seem. Section 3 presents a historical overview of this verb. Section 4 briefly discusses the relationship between evidentiality and epistemic modality; there we set out our understanding of these two concepts and the way in which we apply them to the study of seem. Section 5 presents the corpus and the methodology of the study. The sections that follow present the results of the corpus analysis and then discuss them. Finally, we give the conclusions drawn from our study.
2. The meanings of seem
Aijmer (2009) provides one of the most comprehensive recent studies of seem, focussing on translation in order to disambiguate the meaning of this verbal form. She defines seem as a perception verb related to other perception verbs, such as see, sound, and feel, but showing “a vaguer meaning” and with additional meanings closer to evidential verbs of cognition, i.e. think (Aijmer, 2009: 72). The entry for seem, v. in the OED gives the following meanings (with examples taken from the same source; thorns in these excerpts have been replaced by